Let me break down the Transcription Processor subagent in detail:

A) SUBAGENT SUMMARY: 
A specialized processor that converts video content into a structured, timestamped JSON transcript by extracting audio and processing it through speech-to-text analysis.

B) FINAL TASK OUTPUT:
A JSON file containing:
- Full transcription text
- Precise timestamps (MM:SS, accurate to the second) for each spoken segment
- Each segment containing approximately 3-8 seconds of speech
- Format: {"timestamp": "00:00", "text": "spoken content"}
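
As a rough illustration of the entry format above, the following Python sketch turns a segment's start offset in seconds into an MM:SS timestamp and builds one entry. The helper names `format_timestamp` and `make_segment` are hypothetical, and truncating fractional seconds is an assumption rather than something the plan specifies.

```python
def format_timestamp(seconds: float) -> str:
    """Render a non-negative offset in seconds as MM:SS (minutes may exceed 59)."""
    if seconds < 0:
        raise ValueError("timestamp cannot be negative")
    total = int(seconds)  # assumption: fractional seconds are dropped, not rounded
    minutes, secs = divmod(total, 60)
    return f"{minutes:02d}:{secs:02d}"


def make_segment(start_seconds: float, text: str) -> dict:
    """Build one transcript entry with a timestamp and its spoken text."""
    return {"timestamp": format_timestamp(start_seconds), "text": text.strip()}
```

For example, `make_segment(75.4, " Open the settings panel. ")` returns `{"timestamp": "01:15", "text": "Open the settings panel."}`.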

C) SUBAGENT INPUT:
- MP4 video file URL

D) SUBAGENT TASK SUMMARY:
Input MP4 URL > 
#196 Extract MP3 Audio From MP4 File >
#207 Get Transcription From MP4 Video URL >
#223 Powerful LLM Prompt-to-Text Response (to format into JSON) >
Output JSON transcript file

E) SILOS:
The subagent operates in three distinct silos:

SILO 1: AUDIO EXTRACTION
Purpose: Extract clean audio from video
Input: MP4 URL
Skill: #196 Extract MP3 Audio From MP4 File
Output: MP3 URL

SILO 2: TRANSCRIPTION GENERATION
Purpose: Generate raw timestamped transcription
Input: MP3 URL from Silo 1
Skill: #207 Get Transcription From MP4 Video URL
Output: Raw timestamped transcription text

SILO 3: JSON FORMATTING
Purpose: Structure transcription into clean JSON
Input: Raw timestamped transcription from Silo 2
Skill: #223 Powerful LLM Prompt-to-Text Response
Prompt instruction: "Convert the following timestamped transcription into JSON format where each entry contains a timestamp and corresponding text segment. Format as {timestamp: 'MM:SS', text: 'spoken content'}"
Output: Formatted JSON file
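
Because Silo 3 relies on an LLM, its output may not always be well-formed. A minimal Python sketch of a check that could run on the returned text follows. It assumes the LLM returns a top-level JSON array of entries; the function name `parse_transcript_json` and the exact rules are illustrative, not part of the plan.

```python
import json
import re

# MM:SS with seconds in 00-59; minutes may run past 59 for long videos.
TIMESTAMP_RE = re.compile(r"^\d{2,}:[0-5]\d$")


def parse_transcript_json(llm_output: str) -> list[dict]:
    """Parse the formatted transcript and reject malformed entries."""
    entries = json.loads(llm_output)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of transcript entries")
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not an object")
        timestamp = entry.get("timestamp")
        text = entry.get("text")
        if not isinstance(timestamp, str) or not TIMESTAMP_RE.match(timestamp):
            raise ValueError(f"entry {i} has a malformed timestamp: {timestamp!r}")
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"entry {i} has empty text")
    return entries
```

A failed check could be used to re-run the formatting prompt instead of passing broken JSON to the next subagent.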

This structure ensures clean separation of concerns while maintaining a clear data flow from video input to final JSON output. Each silo handles a specific transformation of the data, making the process more robust and easier to troubleshoot if needed.
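
One way to picture the silo hand-off, and why a failure is easy to localize, is a small runner in which each silo is a named function and any error is reported with the silo's name. This is only a sketch: the three functions below are hypothetical placeholders and do not call the real skills #196, #207, or #223.

```python
from typing import Callable

Silo = tuple[str, Callable[[str], str]]  # (silo name, transformation)


class SiloError(RuntimeError):
    """Raised when one silo fails; the message names the silo."""


def run_silos(silos: list[Silo], initial_input: str) -> str:
    """Feed each silo's output into the next, reporting which silo failed."""
    value = initial_input
    for name, transform in silos:
        try:
            value = transform(value)
        except Exception as exc:
            raise SiloError(f"{name} failed: {exc}") from exc
    return value


# Placeholder stand-ins for the three skills (hypothetical behaviour only).
def extract_audio(mp4_url: str) -> str:
    return mp4_url.rsplit(".", 1)[0] + ".mp3"


def transcribe(mp3_url: str) -> str:
    return f"[00:00] (transcription of {mp3_url})"


def format_as_json(raw_transcript: str) -> str:
    return '[{"timestamp": "00:00", "text": "..."}]'


TRANSCRIPTION_SILOS: list[Silo] = [
    ("AUDIO EXTRACTION", extract_audio),
    ("TRANSCRIPTION GENERATION", transcribe),
    ("JSON FORMATTING", format_as_json),
]
```

Calling `run_silos(TRANSCRIPTION_SILOS, "https://example.com/video.mp4")` walks the MP4 URL through all three stages; if a stage raises, the resulting `SiloError` identifies which silo to inspect.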

The final JSON format will be particularly useful for the next subagent (Step Identification & Analysis) to process and identify the six key tutorial steps.

Let me break down the Step Identification & Analysis subagent in detail.

A) SUBAGENT SUMMARY: 
A specialized analysis agent that processes a video's transcription to identify, structure, and output exactly six key tutorial steps with their associated timestamps.

B) FINAL TASK OUTPUT:
A JSON file containing:
- Six distinct tutorial steps
- Each step containing:
  * Step number (1-6)
  * Step title/heading
  * Associated timestamp from video
  * Full transcription text relevant to that step
  * Brief description/summary of the step

C) SUBAGENT INPUT:
- Complete timestamped transcription JSON from Subagent 1 (entries in the format {"timestamp": "MM:SS", "text": "spoken content"})
- Video duration in seconds
- Tutorial topic/title

D) SUBAGENT TASK SUMMARY:
1. Initial Analysis:
   - Use Skill #223 (Powerful LLM) to analyze full transcription and identify all potential tutorial steps
   
2. Step Selection & Structuring:
   - Use Skill #223 again to select and structure exactly 6 most important steps from identified steps
   
3. Timestamp Association:
   - Use Skill #190 (Write/rewrite text) to process and clean up the timestamps for each step
   
4. JSON Construction:
   - Use Skill #185 (Write Text) to format everything into proper JSON structure
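
The timestamp association step could also be checked deterministically. The Python sketch below gathers, for each step, the transcript text spoken from that step's timestamp until the next step begins. It assumes the step timestamps are already sorted in ascending order and that a step's text ends where the next step starts; both are assumptions, and the function names are hypothetical.

```python
def to_seconds(timestamp: str) -> int:
    """Convert an MM:SS string to a number of seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)


def transcript_for_steps(segments: list[dict], step_starts: list[str]) -> list[str]:
    """Return, for each step, the text spoken from its start until the next step."""
    starts = [to_seconds(t) for t in step_starts]
    # Each step ends where the next begins; the last runs to the end of the video.
    ends = starts[1:] + [float("inf")]
    texts = []
    for start, end in zip(starts, ends):
        spoken = [
            s["text"] for s in segments
            if start <= to_seconds(s["timestamp"]) < end
        ]
        texts.append(" ".join(spoken))
    return texts
```

Segments spoken before the first step's timestamp (an introduction, for instance) are left out of every step under this rule.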

E) SILOS:
The workflow divides into three distinct silos:

SILO 1: INITIAL CONTENT ANALYSIS
```
Input: Raw transcription
Skill #223: Analyze transcription to identify all tutorial steps
Output: List of all potential steps with timestamps
```

SILO 2: STEP SELECTION & REFINEMENT
```
Input: List of all potential steps
Skill #223: Select and structure exactly 6 key steps
Output: 6 refined steps with descriptions
```

SILO 3: FORMAT & STRUCTURE
```
Input: 6 refined steps
Skill #190: Clean up timestamps and text
Skill #185: Format into JSON
Output: Final JSON file
```

Each silo operates sequentially, with the output of each feeding into the next. The final JSON structure will be formatted as:
```json
{
  "tutorial_title": "string",
  "total_steps": 6,
  "steps": [
    {
      "step_number": 1,
      "title": "string",
      "timestamp": "MM:SS",
      "transcript_text": "string",
      "description": "string"
    },
    // ... repeated for all 6 steps
  ]
}
```
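
Since the next subagent depends on exactly six well-formed steps, a structural check on this JSON may be worthwhile. The following Python sketch validates a document against the structure shown above; the function name `validate_tutorial` and the specific checks are illustrative assumptions, not part of the described skills.

```python
import json
import re

MM_SS = re.compile(r"^\d{2,}:[0-5]\d$")
REQUIRED_KEYS = {"step_number", "title", "timestamp", "transcript_text", "description"}


def validate_tutorial(document: str) -> dict:
    """Parse the step JSON and confirm it holds six correctly numbered steps."""
    data = json.loads(document)
    steps = data.get("steps", [])
    if data.get("total_steps") != 6 or len(steps) != 6:
        raise ValueError("expected exactly 6 steps")
    for expected_number, step in enumerate(steps, start=1):
        missing = REQUIRED_KEYS - step.keys()
        if missing:
            raise ValueError(f"step {expected_number} is missing {sorted(missing)}")
        if step["step_number"] != expected_number:
            raise ValueError(f"step {expected_number} is numbered {step['step_number']}")
        if not MM_SS.match(step["timestamp"]):
            raise ValueError(f"step {expected_number} has a malformed timestamp")
    return data
```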

I'll analyze SUBAGENT 3 (Visual Content Processor) and break it down according to your requirements:

A) SUBAGENT SUMMARY:
A specialized processor that extracts, analyzes, and optimizes six key video frames into standardized PNG screenshots that illustrate the main steps of a tutorial.

B) FINAL TASK OUTPUT:
Six optimized PNG image files, each 1400x1400 pixels, showing clear tutorial steps, saved with sequential naming (step1.png through step6.png)

C) SUBAGENT INPUT:
- MP4 video file URL
- JSON file containing six specific timestamps (from Subagent 2)
- Optional contextual descriptions for each timestamp

D) SUBAGENT TASK SUMMARY:
Input > Extract Frames > Quality Check > Optimize > Output

Detailed flow:
1. For each of the 6 timestamps:
   - Use Skill #194 to cut small section from MP4 video
   - Use Skill #202 to extract thumbnail from that section
2. For each extracted image:
   - Use Skill #176 to analyze image quality with GPT Vision
   - Use Skill #191 to resize to 1400x1400
3. Final quality verification:
   - Use Skill #177 to analyze all 6 images together for consistency

E) SILOS:
The subagent operates in three distinct silos:

SILO 1: FRAME EXTRACTION (Repeated 6x)
- Input: MP4 URL + timestamp
- Skill #194: Cut small section of video (2-second clip around timestamp)
- Skill #202: Extract thumbnail
- Output: Raw frame PNG
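
The "2-second clip around timestamp" can be made concrete with a little arithmetic. This Python sketch computes the clip's start and end in seconds, keeping the window inside the video. It assumes the video duration is available (it is listed as an input to Subagent 2, not Subagent 3) and that the clip is centred on the timestamp; `clip_window` is a hypothetical helper, not a parameter of Skill #194.

```python
def clip_window(timestamp: str, video_duration: float,
                clip_length: float = 2.0) -> tuple[float, float]:
    """Return (start, end) in seconds for a clip centred on an MM:SS timestamp."""
    minutes, seconds = timestamp.split(":")
    centre = int(minutes) * 60 + int(seconds)
    if not 0 <= centre <= video_duration:
        raise ValueError(f"{timestamp} lies outside the video")
    start = max(0.0, centre - clip_length / 2)
    end = min(video_duration, start + clip_length)
    # If the clip ran into the end of the video, shift the start back to keep its length.
    start = max(0.0, end - clip_length)
    return start, end
```

For a 60-second video, `clip_window("00:10", 60)` returns `(9.0, 11.0)`, while `clip_window("00:00", 60)` returns `(0.0, 2.0)` because the window cannot start before the video does.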

SILO 2: INDIVIDUAL IMAGE PROCESSING (Repeated 6x)
- Input: Raw frame PNG
- Skill #176: Analyze image quality ("Check if image clearly shows tutorial step and is visually clear")
- Skill #191: Resize to 1400x1400
- Output: Processed PNG
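
Video frames are rarely square, so the resize to 1400x1400 involves a choice the plan does not spell out. One option, sketched here in Python purely as an assumption, is to scale the frame so its longer edge is 1400 pixels and pad the shorter edge equally on both sides. How Skill #191 actually behaves is not stated, and `fit_to_square` is a hypothetical helper.

```python
def fit_to_square(width: int, height: int, side: int = 1400) -> tuple[int, int, int, int]:
    """Return (new_width, new_height, pad_x, pad_y) for letterboxing into a square."""
    if width <= 0 or height <= 0:
        raise ValueError("frame dimensions must be positive")
    longest = max(width, height)
    # Integer arithmetic keeps the result deterministic; the longer edge becomes `side`.
    new_width = max(1, width * side // longest)
    new_height = max(1, height * side // longest)
    pad_x = (side - new_width) // 2  # padding on the left edge
    pad_y = (side - new_height) // 2  # padding on the top edge
    return new_width, new_height, pad_x, pad_y
```

A 1920x1080 frame gives `(1400, 787, 0, 306)`: the frame is scaled to 1400x787 and placed 306 pixels from the top of the square canvas.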

SILO 3: BATCH QUALITY VERIFICATION
- Input: All 6 processed PNGs
- Skill #177: Analyze all images together ("Verify all 6 images show clear tutorial steps and maintain visual consistency")
- Output: Final verified PNG set

This structure ensures each frame is properly extracted, processed, and verified both individually and as part of the complete set, with clear quality controls at each stage.

Let me break down SUBAGENT 4 (Article Assembly & Formatting):

A) SUBAGENT SUMMARY: 
A content integration agent that combines structured tutorial steps, optimized images, and formatted text into a complete markdown article following a standardized tutorial template.

B) FINAL TASK OUTPUT:
A single markdown (.md) file containing:
- Title section
- Introduction paragraph
- 6 distinct tutorial steps, each with:
  * Numbered heading
  * Descriptive text
  * Embedded image (PNG format)
- Conclusion section
All formatted with proper markdown syntax (## headings, image embeds, etc.)

C) SUBAGENT INPUT:
- JSON file containing 6 structured tutorial steps with their descriptions
- 6 PNG image files (numbered 1-6)
- Original video transcription text (for context/introduction)

D) SUBAGENT TASK SUMMARY:
1. Generate article structure (#223 Powerful LLM)
   - Input: JSON of 6 steps + transcription
   - Output: Basic article structure with intro/conclusion

2. Format step content (#190 Write/rewrite text)
   - Input: Step descriptions from JSON
   - Output: Properly formatted step text

3. Process image embedding (#185 Write Text)
   - Input: 6 PNG files
   - Output: Markdown image embedding syntax

4. Final assembly (#190 Write/rewrite text)
   - Input: All components
   - Output: Complete markdown document

E) SILOS:
SILO 1: INITIAL STRUCTURE
- Skill #223: Generate article template from input JSON
- Skill #190: Format introduction text from transcription
- Skill #190: Format conclusion text

SILO 2: STEP FORMATTING (Repeated 6x)
- Skill #190: Format each step's content
- Skill #185: Generate image embedding markdown for each step

SILO 3: FINAL ASSEMBLY
- Skill #190: Combine all elements into final markdown
- Skill #223: Final formatting check and cleanup

The workflow ensures each component is properly formatted before final assembly, with special attention to maintaining consistent markdown syntax and proper image embedding throughout the document.
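
To show what the assembled document might look like, here is a minimal Python sketch that builds the markdown from the step JSON. It assumes the images are named step1.png through step6.png as in Subagent 3's output; the heading wording ("Step N: title") and the function name `build_article` are illustrative choices, not a prescribed template.

```python
def build_article(title: str, intro: str, steps: list[dict], conclusion: str) -> str:
    """Assemble the article as markdown; each step embeds its stepN.png image."""
    parts = [f"# {title}", intro]
    for step in steps:
        number = step["step_number"]
        parts.append(f"## Step {number}: {step['title']}")
        parts.append(step["description"])
        # Standard markdown image syntax: ![alt text](path)
        parts.append(f"![Step {number}: {step['title']}](step{number}.png)")
    parts.append("## Conclusion")
    parts.append(conclusion)
    # Blank lines between blocks keep headings, text, and images separate.
    return "\n\n".join(parts) + "\n"
```

Generating the image embeds from the step numbers, as above, keeps the markdown paths and the file names from Subagent 3 in sync.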