I want to build an AI agent that returns screenshot videos and images relating to a product/website URL.
So I enter a product/company name (variable1) along with a website URL for that product/company (variable2), and the agent will return various screenshot videos and images relating to this product/website.
I think we can use 5 subagents, each individually very simple but together quite comprehensive:
1. First, we take the website URL (variable2) and return a screenshot video of the website.
2. Second, I want a screenshot video of the relevant YouTube search results. To do this, we take the product name (variable1) and insert it into a YouTube URL template (https://www.youtube.com/results?search_query=product+name, e.g. https://www.youtube.com/results?search_query=deepseek+r1). We simply tell the task to drop variable1 into the YouTube template URL, and the resulting URL is then passed to the screenshot-video task, so that we extract a screenshot video of the specific URL https://www.youtube.com/results?search_query=product+name .
3. Third, I want us to search the web for images that relate to the product in question.
4. Fourth, I again want screenshot videos, this time of news URLs, which involves a few steps: first we find news URLs about the product, then an LLM returns the top 2 most relevant ones, and then we take screenshot videos of those 2. So this subagent has 2 stages: searching for news URLs, then passing those URLs to the `video screenshot of URL` skill and asking it to record a screenshot video of #1, then #2.
5. Fifth, a simple final LLM task takes all previously created content as input: the screenshot video of the website, the screenshot video of the YouTube URL, the images of the product, and the screenshot videos of the 2 news URLs. All of it can be returned as a single JSON object.
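For step 1, here is a rough sketch of what the "screenshot video of a URL" subagent could look like, assuming we use Playwright's built-in video recording (the scroll distance, timing, and output directory are all my guesses; `pip install playwright && playwright install chromium` first):

```python
from pathlib import Path

def record_website_video(url: str, out_dir: str = "videos") -> str:
    """Load `url`, scroll through the page, and return the path of the recorded video."""
    # Imported inside the function so the module loads even where Playwright
    # isn't installed yet.
    from playwright.sync_api import sync_playwright

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        # record_video_dir turns on video capture for every page in this context.
        context = browser.new_context(record_video_dir=out_dir)
        page = context.new_page()
        page.goto(url, wait_until="networkidle")
        # Scroll in steps so the recording shows the whole page, not just the fold.
        for _ in range(5):
            page.mouse.wheel(0, 600)
            page.wait_for_timeout(500)
        video_path = page.video.path()
        context.close()  # closing the context finalizes the video file
        browser.close()
    return video_path
```

The same function serves steps 2 and 4 as well, since they all reduce to "record a video of this URL".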
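For step 2, building the YouTube search URL from variable1 is just string templating; a sketch, with `quote_plus` handling spaces the way YouTube's own search box does:

```python
from urllib.parse import quote_plus

def youtube_search_url(product_name: str) -> str:
    """Drop the product name (variable1) into the YouTube results URL template."""
    return "https://www.youtube.com/results?search_query=" + quote_plus(product_name)
```

For example, `youtube_search_url("deepseek r1")` gives `https://www.youtube.com/results?search_query=deepseek+r1`, which is then handed to the screenshot-video task.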
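Step 3 depends on whichever image-search API we wire in, so I'll leave that call out; as a placeholder, here's the filtering half, assuming the upstream API hands back dicts with `url` and `title` keys (that shape and the limit of 6 are assumptions):

```python
def filter_product_images(results, product_name, limit=6):
    """Keep image results whose title mentions the product.

    `results` is a list of {"url": ..., "title": ...} dicts from whatever
    image-search API sits upstream (hypothetical here).
    """
    name = product_name.lower()
    hits = [r for r in results if name in r.get("title", "").lower()]
    return hits[:limit]
```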
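For step 4, the LLM ranking call itself is pipeline-specific, but the selection logic after it is simple. A sketch, assuming the ranking LLM returns `(url, relevance_score)` pairs (that interface is my assumption):

```python
def top_two_news_urls(candidates):
    """Given [(url, relevance_score), ...] from the ranking LLM, return the
    two most relevant URLs, most relevant first."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [url for url, _ in ranked[:2]]
```

The two returned URLs are then fed, one at a time, to the `video screenshot of URL` skill.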
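And for step 5, the final aggregation into JSON could look like this (the field names are my assumption; rename them to match whatever the other subagents emit):

```python
import json

def assemble_report(product_name, website_video, youtube_video, image_urls, news_videos):
    """Collect every subagent's output into one JSON document."""
    report = {
        "product": product_name,
        "website_screenshot_video": website_video,
        "youtube_screenshot_video": youtube_video,
        "product_images": image_urls,
        "news_screenshot_videos": news_videos,
    }
    return json.dumps(report, indent=2)
```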