What Shall We Build Next?

Let me break this down for SUBAGENT 1: TRANSCRIPTION GENERATOR.

A) SUBAGENT SUMMARY: 
A specialized transcription agent that converts a tutorial video into a detailed, timestamped text transcript, marking speaker utterances roughly every 3-8 seconds so the text can be synchronized with the video content.

B) FINAL TASK OUTPUT:
A text file containing timestamped transcription with the following specific format:
- Timestamps in [00:00:00] format
- Text segments of approximately 3-8 seconds each
- Clear paragraph breaks between timestamped segments
- UTF-8 encoded text file
- Includes speaker utterances, pauses, and relevant audio cues
- Maintains chronological order of speech

C) SUBAGENT INPUT:
- Primary Input: URL of MP4 video file
- Secondary Input (optional): Any specific transcription requirements or focus areas

E) SUBAGENT TASK SUMMARY:
1. MP4 Input > Skill #196 (Extract MP3 Audio From MP4 File) > MP3 URL
2. MP3 URL > Skill #198 (Get Transcription Of MP3 With Timings) > Initial Transcription Text
3. Initial Transcription > Skill #190 (Write or rewrite text based on instructions) [with prompt to format and clean transcript] > Final Formatted Transcript
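
As a rough illustration, the three-step chain above can be written as plain function composition. This is only a sketch: the three functions are hypothetical local stand-ins for Skills #196, #198, and #190 (the real skills run inside the agent platform), and the values they return are placeholders.

```python
from typing import Callable, List

# Hypothetical stand-ins for the platform skills. They only mimic the
# shape of each step (string in, string out), not the real behavior.
def extract_mp3(mp4_url: str) -> str:
    """Stand-in for Skill #196: swaps the file extension for .mp3."""
    return mp4_url.rsplit(".", 1)[0] + ".mp3"

def transcribe(mp3_url: str) -> str:
    """Stand-in for Skill #198: returns a placeholder raw transcript."""
    return f"[00:00:00] (raw transcript of {mp3_url})"

def format_transcript(raw: str) -> str:
    """Stand-in for Skill #190: trims whitespace as a trivial cleanup."""
    return raw.strip()

def run_chain(value: str, stages: List[Callable[[str], str]]) -> str:
    """Feed each stage's output into the next stage, in order."""
    for stage in stages:
        value = stage(value)
    return value

# The example URL is made up for illustration.
result = run_chain(
    "https://example.com/tutorial.mp4",
    [extract_mp3, transcribe, format_transcript],
)
print(result)
```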

F) SILOS:
SILO 1: AUDIO EXTRACTION
- Purpose: Extract clean audio from video
- Input: MP4 URL
- Process: Skill #196
- Output: MP3 URL

SILO 2: BASIC TRANSCRIPTION
- Purpose: Generate raw transcription
- Input: MP3 URL from Silo 1
- Process: Skill #198
- Output: Raw timestamped transcription

SILO 3: TRANSCRIPT FORMATTING
- Purpose: Clean and format transcript
- Input: Raw transcription from Silo 2
- Process: Skill #190 with formatting prompt
- Output: Final formatted transcript
- Formatting Rules:
  * Consistent timestamp format [00:00:00]
  * Clear paragraph breaks
  * Proper punctuation
  * Removal of unnecessary timestamps (maintaining 3-8 second intervals)
  * Proper marking of speaker changes or significant pauses
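
The timestamp and interval rules above could also be checked or applied deterministically instead of relying only on a prompt. The sketch below assumes that [00:00:00] means hours, minutes, and seconds, and that the raw transcription arrives as (start time in seconds, text) pairs; both are assumptions, since the output shape of Skill #198 is not specified here. It enforces only the 3-second lower bound, because splitting segments longer than 8 seconds would need word-level timings.

```python
from typing import List, Tuple

Segment = Tuple[float, str]  # (start time in seconds, spoken text)

def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as [HH:MM:SS] (assumed format)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"

def merge_segments(segments: List[Segment], min_gap: float = 3.0) -> List[Segment]:
    """Fold a segment into the previous one when it starts less than
    min_gap seconds after the previously kept timestamp."""
    merged: List[Segment] = []
    for start, text in segments:
        if merged and start - merged[-1][0] < min_gap:
            prev_start, prev_text = merged[-1]
            merged[-1] = (prev_start, f"{prev_text} {text}")
        else:
            merged.append((start, text))
    return merged

def render(segments: List[Segment]) -> str:
    """One timestamped paragraph per segment, separated by blank lines."""
    return "\n\n".join(f"{format_timestamp(s)} {t}" for s, t in segments)

raw = [(0.0, "Welcome"), (1.5, "to the tutorial."), (4.0, "First, open the app.")]
print(render(merge_segments(raw)))
```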

This workflow ensures we get a clean, properly formatted transcript that can be easily used by subsequent subagents in the larger workflow. The separation into silos allows for quality control at each stage and makes it easier to troubleshoot if any issues arise.

Let me break down the StepExtractor subagent in detail:

A) SUBAGENT SUMMARY:
A specialized parser that receives a timestamped video transcription and condenses it into exactly six discrete tutorial steps, each with a timestamp.

B) FINAL TASK OUTPUT:
A structured JSON array containing exactly 6 step objects, where each object contains:
{
  "step_number": (1-6),
  "step_title": "concise action title",
  "timestamp": "MM:SS",
  "description": "2-3 sentence explanation",
  "transcript_excerpt": "relevant quote from original"
}
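
Because the later subagents depend on this exact shape, it may be worth validating the JSON before passing it on. The following is a minimal sketch of such a check, not part of the platform: the function name and the error messages are illustrative, and the MM:SS pattern assumes at least two minute digits.

```python
import json
import re
from typing import Any, Dict, List

REQUIRED_KEYS = {"step_number", "step_title", "timestamp",
                 "description", "transcript_excerpt"}
TIMESTAMP_RE = re.compile(r"^\d{2,}:[0-5]\d$")  # MM:SS

def validate_steps(raw_json: str) -> List[Dict[str, Any]]:
    """Parse the step array and raise ValueError if it breaks the spec."""
    steps = json.loads(raw_json)
    if not isinstance(steps, list) or len(steps) != 6:
        raise ValueError("expected a JSON array of exactly 6 steps")
    for expected_number, step in enumerate(steps, start=1):
        if not isinstance(step, dict):
            raise ValueError(f"step {expected_number} is not an object")
        missing = REQUIRED_KEYS - set(step)
        if missing:
            raise ValueError(f"step {expected_number} is missing {sorted(missing)}")
        if step["step_number"] != expected_number:
            raise ValueError(f"step {expected_number} has the wrong step_number")
        if not TIMESTAMP_RE.match(str(step["timestamp"])):
            raise ValueError(f"step {expected_number} has a malformed timestamp")
    return steps
```

Running this right after the Final Formatting step would catch a five-step or mis-keyed response before the screenshot subagent consumes it.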

C) SUBAGENT INPUT:
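- Complete timestamped transcription from Subagent 1, referred to here as VideoTranscriber (expected in the format "MM:SS - spoken text"; Subagent 1 emits [00:00:00] timestamps, so these need converting first)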
- Original video duration (in seconds)
- Tutorial topic/title

E) SUBAGENT TASK SUMMARY:
1. Initial Analysis:
Use Skill #223 (Powerful LLM) to analyze full transcription and identify ALL possible tutorial steps
Input: transcription + prompt "Analyze this tutorial transcript and identify ALL distinct instructional steps"
Output: Text list of potential steps

2. Step Condensing:
Use Skill #190 (Write/rewrite text) to condense into exactly 6 major steps
Input: full step list + prompt "Condense these steps into exactly 6 major tutorial steps"
Output: 6 step descriptions

3. Timestamp Extraction:
Use Skill #223 to match each condensed step with most relevant timestamp from original
Input: 6 steps + original transcription + prompt "Find most relevant timestamp for each step"
Output: Steps with timestamps

4. Final Formatting:
Use Skill #190 to structure everything into final JSON format
Input: All previous outputs + prompt "Format into JSON with specified structure"
Output: Final JSON array

F) SILOS:
The subagent runs linearly, with no parallel processing, through four distinct phases:
1. Analysis Silo: Understanding full content (Skill #223)
2. Condensing Silo: Creating exactly 6 steps (Skill #190)
3. Timing Silo: Timestamp matching (Skill #223)
4. Formatting Silo: JSON structuring (Skill #190)

This structure is designed so that, regardless of input length or complexity, we get exactly 6 well-formed, timestamped steps that capture the key tutorial elements while maintaining chronological accuracy.
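
Chronological accuracy is something that can be verified in code rather than left to the LLM. As a small sketch, assuming the step timestamps are MM:SS strings and the video duration is given in seconds (as listed in the subagent input), a check could look like this. The function names are illustrative.

```python
from typing import List

def mmss_to_seconds(timestamp: str) -> int:
    """Convert an MM:SS string to a number of seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)

def check_chronology(timestamps: List[str], duration_seconds: int) -> List[str]:
    """Return a list of problems; an empty list means the order is fine."""
    problems: List[str] = []
    previous = -1
    for number, timestamp in enumerate(timestamps, start=1):
        current = mmss_to_seconds(timestamp)
        if current > duration_seconds:
            problems.append(f"step {number}: {timestamp} is past the end of the video")
        if current < previous:
            problems.append(f"step {number}: {timestamp} is earlier than the previous step")
        previous = current
    return problems

print(check_chronology(["00:05", "00:40", "01:10", "02:00", "02:45", "03:30"], 240))
```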

Let me break down Subagent 3, the screenshot extractor, according to the requested format.

A) SUBAGENT SUMMARY: 
A specialized extractor that captures exactly six high-quality screenshot images from specific timestamps of a tutorial video, naming and organizing them systematically for use in the final how-to article.

B) FINAL TASK OUTPUT:
Six individual PNG image files:
- Named systematically as step1.png through step6.png
- Each saved at 1024x1024 resolution (the fixed size used throughout this workflow)
- Stored on server with accessible URLs
- Accompanied by a JSON metadata file containing:
  * Image filenames
  * Original timestamps
  * Image URLs
  * Brief description of what each image shows

C) SUBAGENT INPUT:
1. MP4 video URL of the tutorial
2. JSON array containing six timestamp markers identified by Subagent 2, converted to seconds (Subagent 2 reports them as MM:SS)
3. Brief description for each timestamp explaining what should be visible in that frame

D) SUBAGENT TASK SUMMARY:
For each of the six required screenshots:
1. Input MP4 > Skill #194 (Cut Small Section From MP4 Video) [creates 1-second clip at timestamp]
2. Result MP4 > Skill #202 (Extract Thumbnail Images of MP4 Video) [extracts clear frame]
3. Result Image > Skill #191 (Resize Image) [standardizes to 1024x1024]
4. Result Image > Skill #176 (Analyze An Image With GPT Vision & Return Text) [validates image shows correct step]
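
To make the per-screenshot loop concrete, here is a minimal planning sketch. It does not call Skills #194, #202, #191, or #176; it only computes, for each timestamp, the 1-second clip window, the target filename, and the target size that those skills would be given. The dictionary keys are illustrative names, not platform parameters.

```python
from typing import Dict, List

def plan_screenshots(timestamps: List[float],
                     clip_length: float = 1.0) -> List[Dict[str, object]]:
    """Build one job description per screenshot, in step order."""
    if len(timestamps) != 6:
        raise ValueError("expected exactly six timestamps")
    jobs: List[Dict[str, object]] = []
    for index, start in enumerate(timestamps, start=1):
        jobs.append({
            "filename": f"step{index}.png",    # step1.png through step6.png
            "clip_start": start,               # seconds into the video
            "clip_end": start + clip_length,   # end of the 1-second clip
            "size": (1024, 1024),              # resize target
        })
    return jobs

for job in plan_screenshots([5, 40, 70, 120, 165, 210]):
    print(job)
```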

E) SILOS:
SILO 1 - IMAGE EXTRACTION (Repeated 6 times, once per timestamp):
- Input: MP4 URL + single timestamp
- Process: 
  * Use #194 to extract 1-second clip
  * Use #202 to get clear frame
  * Use #191 to resize to 1024x1024
- Output: Single PNG file

SILO 2 - IMAGE VALIDATION:
- Input: 6 extracted images
- Process:
  * Use #176 to analyze each image
  * Verify each image shows correct tutorial step
- Output: Validation text for each image

SILO 3 - METADATA COMPILATION:
- Input: All validated images + analysis
- Process:
  * Use #223 (Powerful LLM) to generate JSON metadata
- Output: Final JSON file with image data
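
The metadata file itself is simple enough to build without an LLM. The sketch below shows one possible layout; the field names and the base URL are assumptions for illustration, since the plan above does not fix a schema.

```python
import json
from typing import List

def build_metadata(timestamps: List[float], descriptions: List[str],
                   base_url: str) -> str:
    """Return a JSON string with one entry per screenshot."""
    if len(timestamps) != len(descriptions):
        raise ValueError("timestamps and descriptions must have the same length")
    entries = []
    for index, (timestamp, description) in enumerate(
            zip(timestamps, descriptions), start=1):
        filename = f"step{index}.png"
        entries.append({
            "filename": filename,
            "timestamp": timestamp,
            "url": f"{base_url.rstrip('/')}/{filename}",
            "description": description,
        })
    return json.dumps(entries, indent=2)

# Hypothetical base URL, used only for this example.
print(build_metadata([5, 40], ["Home screen", "Settings menu"],
                     "https://example.com/images/"))
```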

This workflow ensures we get exactly six high-quality, appropriately sized screenshots from the tutorial video, with proper naming and metadata, ready for integration into the final article.

Let me break this down for SUBAGENT 4 (TextRefiner):

A) SUBAGENT SUMMARY:
A text refinement system that accepts raw tutorial step data and transcription text, then processes it into polished, professional instructional content with exactly six steps plus introduction and conclusion.

B) FINAL TASK OUTPUT:
A structured text document containing:
- One introduction paragraph (150-200 words)
- Six distinct tutorial steps (each 100-150 words)
- One conclusion paragraph (100-150 words)
All formatted as plain text with clear section demarcations.
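
The word-count targets above are easy to verify after generation. Here is a minimal sketch of such a check, assuming words are counted by splitting on whitespace; the function and label names are illustrative.

```python
from typing import List

# Word-count limits taken from the output specification above.
LIMITS = {"introduction": (150, 200), "step": (100, 150), "conclusion": (100, 150)}

def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())

def check_lengths(introduction: str, steps: List[str], conclusion: str) -> List[str]:
    """Return a list of problems; an empty list means all targets are met."""
    problems: List[str] = []
    if len(steps) != 6:
        problems.append(f"expected 6 steps, got {len(steps)}")
    sections = [("introduction", "introduction", introduction)]
    sections += [(f"step {i}", "step", s) for i, s in enumerate(steps, start=1)]
    sections.append(("conclusion", "conclusion", conclusion))
    for label, kind, text in sections:
        low, high = LIMITS[kind]
        count = word_count(text)
        if not low <= count <= high:
            problems.append(f"{label}: {count} words (expected {low}-{high})")
    return problems
```

A non-empty result could be fed back into the Refinement Phase as a rewrite instruction for the offending section.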

C) SUBAGENT INPUT:
1. Six-step outline (from StepExtractor) containing:
   - Step titles
   - Timestamps
   - Initial descriptions
2. Full video transcription text
3. Optional context about the tutorial topic

E) SUBAGENT TASK SUMMARY:
The workflow chains together as follows:

1. Research Phase:
#216 (Research Topic Deeply) > Research current best practices for the tutorial topic

2. Content Generation Phase:
#223 (Powerful LLM Prompt-to-Text Response) > Generate introduction
#190 (Write or rewrite text based on instructions) > Process each of the six steps
#223 (Powerful LLM Prompt-to-Text Response) > Generate conclusion

3. Refinement Phase:
#190 (Write or rewrite text based on instructions) > Final polish of all content

F) SILOS:
SILO 1: INTRODUCTION GENERATION
- Input: Topic research + transcription
- Skill #216: Research topic deeply
- Skill #223: Generate introduction
- Output: Introduction paragraph

SILO 2: STEP CONTENT (REPEATED 6 TIMES)
- Input: Step outline + relevant transcription section
- Skill #190: Write/rewrite step content
- Output: Polished step text
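
Selecting the "relevant transcription section" for each step can be done by slicing the transcript between consecutive step timestamps. The sketch below assumes the transcript has already been parsed into (time in seconds, text) pairs and that each step's start time is known in seconds; both the data shape and the function name are assumptions.

```python
from typing import List, Tuple

Line = Tuple[int, str]  # (time in seconds, spoken text)

def slice_transcript(lines: List[Line], step_starts: List[int]) -> List[str]:
    """For each step, join the lines from that step's start time up to
    (but not including) the next step's start time."""
    sections: List[str] = []
    for index, start in enumerate(step_starts):
        if index + 1 < len(step_starts):
            end = step_starts[index + 1]
        else:
            end = float("inf")  # the last step runs to the end of the video
        sections.append(" ".join(text for t, text in lines if start <= t < end))
    return sections

lines = [(0, "Intro."), (5, "Open the app."), (9, "Click New."), (20, "Save.")]
print(slice_transcript(lines, [5, 20]))
```

Lines spoken before the first step's timestamp are left out of every section, which suits material that belongs in the introduction.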

SILO 3: CONCLUSION GENERATION
- Input: All previous content
- Skill #223: Generate conclusion
- Output: Conclusion paragraph

SILO 4: FINAL ASSEMBLY
- Input: All generated content
- Skill #190: Final polish and formatting
- Output: Complete structured text document

Handling each component in its own silo keeps the work separated and lets each task use the skill best suited to it, while the final assembly pass keeps the output consistent as a whole.

Let me break down the Article Compiler (ArticleAssembler) subagent in detail:

A) SUBAGENT SUMMARY: 
A specialized compiler that takes refined tutorial text and screenshots to produce a properly formatted HTML/Markdown article with exactly six steps and corresponding images.

B) FINAL TASK OUTPUT:
A single HTML or Markdown file containing:
- Title section
- Introduction paragraph
- Six numbered sections, each containing:
  * Step heading
  * Descriptive paragraph
  * Embedded image (PNG format)
- Conclusion paragraph
All formatted with proper HTML tags or Markdown syntax for web display.

C) SUBAGENT INPUT:
1. Refined tutorial text content (from Subagent 4) containing:
   - Article title
   - Introduction text
   - Six step descriptions
   - Conclusion text
2. Six PNG image URLs (from Subagent 3)

E) SUBAGENT TASK SUMMARY:
The flow works like this:

1. First Pass - Content Preparation:
INPUT (text content) > #223 (LLM to structure content into clear sections) > structured text blocks

2. Second Pass - Format Conversion:
structured text + image URLs > #185 (Write Text) with specific HTML/Markdown formatting instructions > formatted article with image placeholders

3. Third Pass - Image Integration:
formatted article + image URLs > #185 (Write Text) with image embedding instructions > complete article with embedded images

F) SILOS:
The subagent operates in three distinct silos:

SILO 1: CONTENT STRUCTURING
- Purpose: Organize raw content into clear sections
- Input: Raw text content from Subagent 4
- Skill: #223 (Powerful LLM)
- Output: Structured text blocks ready for formatting

SILO 2: FORMAT CONVERSION
- Purpose: Convert structured content into HTML/Markdown
- Input: Structured text from Silo 1
- Skill: #185 (Write Text)
- Output: Formatted article with placeholders

SILO 3: IMAGE INTEGRATION
- Purpose: Embed images into formatted content
- Input: Formatted text + 6 image URLs
- Skill: #185 (Write Text)
- Output: Final article with embedded images
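
For the Markdown variant, the format conversion and image integration can be sketched as a single deterministic function. This is an illustration of the target layout, not what Skill #185 actually produces: the heading levels and the image alt text are assumptions.

```python
from typing import List, Tuple

def assemble_markdown(title: str, introduction: str,
                      steps: List[Tuple[str, str]],
                      image_urls: List[str], conclusion: str) -> str:
    """Build a Markdown article with six steps, each followed by its image.

    steps holds (heading, paragraph) pairs in tutorial order.
    """
    if len(steps) != 6 or len(image_urls) != 6:
        raise ValueError("expected exactly six steps and six image URLs")
    parts = [f"# {title}", introduction]
    for number, ((heading, body), url) in enumerate(zip(steps, image_urls), start=1):
        parts.append(f"## Step {number}: {heading}")
        parts.append(body)
        parts.append(f"![Step {number}: {heading}]({url})")
    parts.append(conclusion)
    return "\n\n".join(parts) + "\n"
```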

This structured approach ensures:
1. Clean separation of concerns
2. Proper formatting at each stage
3. Consistent output regardless of input variations
4. Easy troubleshooting if any stage fails

Each silo's output feeds directly into the next, creating a robust pipeline for article assembly.
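
One way to get the easy troubleshooting mentioned above is to name each silo and report which one failed. The following is a minimal sketch under that assumption; StageError and run_silos are illustrative names, and the stages are ordinary Python callables rather than platform skills.

```python
from typing import Any, Callable, List, Tuple

class StageError(RuntimeError):
    """Raised when a named silo fails, so the failing stage is obvious."""

def run_silos(value: Any, silos: List[Tuple[str, Callable[[Any], Any]]]) -> Any:
    """Run each silo in order, passing its output to the next one."""
    for name, stage in silos:
        try:
            value = stage(value)
        except Exception as exc:
            # Re-raise with the silo name attached, keeping the original cause.
            raise StageError(f"{name} failed: {exc}") from exc
    return value

# Toy stages standing in for the three silos.
article = run_silos("draft", [
    ("content structuring", str.strip),
    ("format conversion", lambda text: f"# {text}"),
    ("image integration", lambda text: text + "\n\n![image](url)"),
])
print(article)
```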