Video to Rich Article Generator

I want to input a video and get a rich article with images as output. For example, I might record a "how to" video such as `how to make software mockups with GIMP`, or a "breaking news" video such as `new LLM breaks all benchmarks`. The video will be a screencast in which I demonstrate something, such as how to use a piece of software or websites I am discussing. I will demonstrate visually and talk about what I am doing at the same time. Then I want to input the video as an MP4, and I would like the agents to do the following to convert it into an article:

1. Transcribe my speech. This generates a transcription of what I say, with timestamps.
2. Decide on six key sections. My agents work best when given a clear number of actions, so I suggest that an LLM reviews the transcription and proposes six chapters/sections to divide the video into. These may be steps (in the case of a `how to` video) or simply key takeaways (in the case of a `news` video). I may mention them explicitly as steps or key sections, or the LLM may need to use its own judgement. Each section should be returned with a timestamp, a title, and a brief summary of what is discussed in it.
3. Rewrite the transcription. Next, rewrite the transcription (since it was written by an LLM and is imperfect).
4. Generate thumbnail images. Next, extract thumbnail images from the video at the six timestamps from step 2. These six images are intended to be inserted into the article.
5. Generate the article. Finally, take the rewritten transcription and insert the thumbnail images to produce the final article.
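To make the hand-off between steps 2, 4, and 5 concrete, here is a minimal Python sketch. Everything named in it is an assumption for illustration: the `Section` fields, the `thumb-N.jpg` file naming, and the Markdown output format are not part of the brief. The thumbnail step only builds an `ffmpeg` command (using the standard `-ss`, `-i`, and `-frames:v` options) and does not run it.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """One of the six sections proposed in step 2 (field names are assumptions)."""
    start_seconds: float
    title: str
    summary: str
    body: str = ""  # rewritten transcription for this section (step 3)


def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS.mmm, a format ffmpeg accepts for -ss."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def thumbnail_command(video_path: str, section: Section, index: int) -> list[str]:
    """Build (but do not run) an ffmpeg command that grabs one frame (step 4)."""
    return [
        "ffmpeg", "-y",                                   # overwrite existing output
        "-ss", format_timestamp(section.start_seconds),   # seek to the section start
        "-i", video_path,
        "-frames:v", "1",                                 # write a single video frame
        f"thumb-{index}.jpg",                             # hypothetical naming scheme
    ]


def build_article(title: str, sections: list[Section]) -> str:
    """Interleave section titles, thumbnails, and rewritten text (step 5)."""
    parts = [f"# {title}", ""]
    for index, section in enumerate(sections, start=1):
        parts += [
            f"## {section.title}",
            "",
            f"![{section.title}](thumb-{index}.jpg)",
            "",
            section.body or section.summary,  # fall back to the summary if no body yet
            "",
        ]
    return "\n".join(parts)
```

Each command list could be passed to `subprocess.run(cmd, check=True)` once `ffmpeg` is installed. Keeping the command builder separate from execution means the section data can be checked before any frames are extracted.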


subagent1

subagentX-refined

subagentXmermaid

https://static.aiz.ac/1723651668-mermaid/mermaid-1.png