Cut out news website XY

I want to research and build a database of news websites, take screenshots of the sites, and then cut out the x/y co-ordinates of the most important, most recently updated and/or most newsworthy parts of each website (eg above the fold, there might be a "TOP NEWS" container which runs from x=100, y=300 to x=500, y=900).

I will enter a single keyword topic, then the Agent will research news websites on the topic and generate a shortlist of possible URLs (eg "10x top news websites about xyz"). That shortlist will then be filtered down by taking a screenshot of each website and checking whether there is a rectangular x/y region (eg a "TOP NEWS" or "LATEST" container) that can be crawled. This container should have an aspect ratio of between 1:3 and 3:1. We will then crop the screenshot to these x/y co-ordinates and save the cropped image to our database.

We will then categorise all of these websites (perhaps by looking at the Similarweb traffic rank, doing keyword research for relevance, and perhaps crawling the text content of the page, then passing all this info to an LLM to give the site a final title, a description, and perhaps 1-5 related keywords and 6-20 secondary keywords).

Finally, this database, containing the original full screenshot image, the x/y cropped image, final title, description, related keywords (comma separated) and secondary keywords (comma separated), will be saved in a CSV file, ready to be exported.
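As a rough illustration of the aspect-ratio filter and the final CSV export described above, here is a minimal Python sketch using only the standard library. The names `Box`, `is_croppable`, `CSV_COLUMNS` and `to_csv`, the column names, and the example file paths are all assumptions made for illustration; the brief does not prescribe them. It also assumes the 1:3 and 3:1 limits are inclusive and that the ratio is width:height.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Pixel rectangle on a screenshot: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def width(self) -> int:
        return self.x2 - self.x1

    @property
    def height(self) -> int:
        return self.y2 - self.y1


def is_croppable(box: Box) -> bool:
    """Return True if width:height lies between 1:3 and 3:1 (inclusive, assumed)."""
    if box.width <= 0 or box.height <= 0:
        return False
    # Integer cross-multiplication avoids floating-point edge cases at the limits.
    return box.height <= 3 * box.width and box.width <= 3 * box.height


# Hypothetical column names, following the fields listed in the brief.
CSV_COLUMNS = [
    "full_screenshot",
    "cropped_image",
    "title",
    "description",
    "related_keywords",
    "secondary_keywords",
]


def to_csv(rows: list[dict]) -> str:
    """Serialise rows to CSV text, joining keyword lists with commas."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    for row in rows:
        out = dict(row)
        out["related_keywords"] = ", ".join(row["related_keywords"])
        out["secondary_keywords"] = ", ".join(row["secondary_keywords"])
        writer.writerow(out)  # the csv module quotes fields that contain commas
    return buf.getvalue()


if __name__ == "__main__":
    # The "TOP NEWS" example from the brief: x=100, y=300 to x=500, y=900.
    top_news = Box(100, 300, 500, 900)
    print(is_croppable(top_news))  # True (400 wide by 600 tall, i.e. 2:3)
```

The crop itself is left out here because it needs an imaging library; any tool that accepts a (left, top, right, bottom) rectangle could consume the `Box` values directly.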


final-output-comps..1

final-output-comps..2

final-output-comps GPT..3

CLAUDE:- final-output-and-component-outputs



claude 3 - extract final output


extract component outputs


Claude - research-ideas-and-input-variables


GPT - research-ideas-and-input-variables


template-ideas


claude 3 - searches


Example CSV File News Website Data Title Description Keywords
Top News Website Screenshot Layout Analysis
News Website Top Stories Container Cropped Image Example
News Website Metadata Best Practices Title Description Keywords
Keyword Research Report Template News Industry

subagent-summary


subagent-summary-amended-for-actual-tasks..1

subagent-summary-amended-for-actual-tasks..2

subagent-summary-amended-for-actual-tasks GPT..3

subagent-summary-amended-for-actual-tasks


random-autocomplete


claude 3 - mermaid-for-all





SubAgent #1

claude 3 - subagent1


subagent1-refined


subagent1refined-repeating


Subagent1 task-ID-numbers


subagent1-output-variables


subagent1refined-repeating-json-core


subagent1mermaid






SubAgent #2

claude 3 - subagent2


subagent2-refined


subagent2refined-repeating


Subagent2 task-ID-numbers


subagent2-output-variables


subagent2refined-repeating-json-core


subagent2mermaid






SubAgent #3

claude 3 - subagent3


subagent3-refined


subagent3refined-repeating


Subagent3 task-ID-numbers


subagent3-output-variables


subagent3refined-repeating-json-core


subagent3mermaid


prompts-required



subagent1-json-array


subagent1-json-array-mermaid


subagent2-json-array


subagent2-json-array-mermaid