Let me break down the SCRIPTWRITER AGENT flow based on your guidelines:

A) SUBAGENT SUMMARY: 
A specialized script-writing agent that converts a user's topic/theme into a polished, well-structured 100-300 word voiceover script optimized for talking head videos.

B) FINAL TASK OUTPUT: 
A single text file containing a 100-300 word script, formatted specifically for voiceover recording, with clear paragraph breaks, proper punctuation for natural speech patterns, and timing indicators.

C) SUBAGENT INPUT:
- Primary topic/theme from user
- Any specific requirements (tone, style, duration preferences)
- Target audience information (if provided)

D) SUBAGENT TASK SUMMARY:
Input > #216 (Research Topic Deeply) > #218 (Brainstorm Related Keywords) > #223 (Powerful LLM Prompt-to-Text Response) > #190 (Write/Rewrite Text Based on Instructions) > #171 (Write Voice Over Script Based On Instructions) > Output

Specifically:
1. Research phase: Use #216 to gather comprehensive topic information
2. Keyword phase: Use #218 to identify key points to cover
3. Initial draft: Use #223 to create the first version of the script
4. Refinement: Use #190 to optimize for voice delivery
5. Final formatting: Use #171 to format specifically for voiceover

E) SILOS:
SILO 1 - RESEARCH & PREPARATION
- Input: User topic/requirements
- Skill #216: Deep research on topic
- Skill #218: Extract key talking points
- Output: Research summary + keywords

SILO 2 - INITIAL SCRIPT CREATION
- Input: Research summary from Silo 1
- Skill #223: Generate initial script
- Output: Raw script draft

SILO 3 - OPTIMIZATION & FORMATTING
- Input: Raw script from Silo 2
- Skill #190: Optimize for verbal delivery
- Skill #171: Format for voiceover
- Output: Final formatted script
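Across the three silos, each skill's output becomes the next skill's input, in the order #216, #218, #223, #190, #171. The sketch below shows that hand-off in Python under a simplifying assumption: every skill is treated as a text-in, text-out function. The stubs are hypothetical placeholders, since the platform's actual skill interfaces are not described here.

```python
from typing import Callable, List

# A skill is modeled as a function from text to text. The real platform skills
# (#216, #218, #223, #190, #171) are not described at the code level, so these
# are hypothetical placeholders.
Skill = Callable[[str], str]


def make_stub(skill_id: int, label: str) -> Skill:
    """Return a placeholder skill that appends a tag showing it ran."""
    def run(text: str) -> str:
        return f"{text} -> #{skill_id} {label}"
    return run


def run_chain(user_input: str, steps: List[Skill]) -> str:
    """Pass the input through each step in order; each output feeds the next step."""
    result = user_input
    for step in steps:
        result = step(result)
    return result


# Silo 1: research and keywords; Silo 2: initial draft; Silo 3: refine and format.
SCRIPTWRITER_STEPS: List[Skill] = [
    make_stub(216, "research"),
    make_stub(218, "keywords"),
    make_stub(223, "draft"),
    make_stub(190, "refine"),
    make_stub(171, "format"),
]

print(run_chain("topic", SCRIPTWRITER_STEPS))
# topic -> #216 research -> #218 keywords -> #223 draft -> #190 refine -> #171 format
```

Keeping every step's input and output as plain text is what makes a stub replaceable by a real call without touching the rest of the chain.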

This workflow ensures thorough research, proper structuring, and optimal formatting for voiceover delivery, while maintaining the target length of 100-300 words through progressive refinement in each silo.
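The 100-300 word target can be enforced with a simple check after Silo 3, before the script leaves the agent. A minimal sketch, assuming timing indicators are written in square brackets such as `[PAUSE 1s]` (the document does not specify their notation), so that they are excluded from the count:

```python
import re

MIN_WORDS = 100
MAX_WORDS = 300

# Assumed convention: timing indicators are written in square brackets, e.g. "[PAUSE 1s]".
TIMING_MARKER = re.compile(r"\[[^\]]*\]")


def spoken_word_count(script: str) -> int:
    """Count spoken words after stripping bracketed timing markers."""
    spoken = TIMING_MARKER.sub(" ", script)
    return len(spoken.split())


def within_target_length(script: str) -> bool:
    """True when the spoken word count falls inside the 100-300 word target."""
    return MIN_WORDS <= spoken_word_count(script) <= MAX_WORDS


print(spoken_word_count("Hello there. [PAUSE 1s] Welcome back."))  # 4
```

A script that fails the check could be sent back through #190 with an instruction to shorten or expand it; that retry is not part of the workflow as described.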

[Diagram: SubAgent #1]

I'll analyze the VOICESYNTH AGENT and break it down according to your requirements:

A) SUBAGENT SUMMARY: 
A specialized voice synthesis agent that converts written script text into a high-quality MP3 audio file, optimizing the voiceover for clarity and natural speech patterns.

B) FINAL TASK OUTPUT: 
A single MP3 file URL containing the complete voiceover narration, with audio quality suitable for video production. At a typical narration pace of roughly 130-160 words per minute, a 100-300 word script runs about 40 seconds to 2.5 minutes.
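How long the narration runs depends on the word count and the speaking rate of the voice. A minimal sketch that estimates duration, assuming about 150 words per minute (a common pace for conversational narration; the actual rate of the voice behind skill #170 is not specified here and may differ):

```python
DEFAULT_WPM = 150.0  # assumed narration pace; tune to the actual voice


def estimated_duration_seconds(word_count: int, words_per_minute: float = DEFAULT_WPM) -> float:
    """Estimate narration length in seconds from word count and speaking rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count * 60.0 / words_per_minute


for words in (100, 300):
    print(f"{words} words -> {estimated_duration_seconds(words):.0f} s")
# 100 words -> 40 s
# 300 words -> 120 s
```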

C) SUBAGENT INPUT:
- Primary Input: Formatted text script (100-300 words) from Scriptwriter Agent
- Optional Input: Voice style preferences (if provided in original user requirements)

D) SUBAGENT TASK SUMMARY:
Input > #170 (Turn Script Into Voice Over MP3) > Output MP3 URL

However, because this component is critical and other suitable skills are available, I recommend expanding it into a more robust flow:

Input > #190 (Write/rewrite text based on instructions) [to optimize script for voice synthesis] > #170 (Turn Script Into Voice Over MP3) > #198 (Get Transcription of MP3 With Timings) [for quality verification] > Output MP3 URL

E) SILOS:
SILO 1: SCRIPT OPTIMIZATION
- Purpose: Prepare script for optimal voice synthesis
- Skill: #190 (Write/rewrite text based on instructions)
- Input: Original script
- Output: Voice-optimized script
- Note: Adds natural pauses and improves phonetic clarity

SILO 2: VOICE SYNTHESIS
- Purpose: Generate voice audio
- Skill: #170 (Turn Script Into Voice Over MP3)
- Input: Optimized script
- Output: MP3 URL

SILO 3: QUALITY VERIFICATION
- Purpose: Verify audio quality and timing
- Skill: #198 (Get Transcription of MP3 With Timings)
- Input: Generated MP3
- Output: Transcription with timings
- Note: Comparing this transcription against the script confirms that synthesis succeeded before the MP3 is passed to the next agent
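One way to carry out this verification is to compare the transcription from #198 against the script that was sent to #170. A minimal sketch using word error rate (word-level edit distance divided by the script's word count); the 10% acceptance threshold is an assumption, not a value from the workflow:

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and keep only word characters so punctuation does not count as an error."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + insertions + deletions) / number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference script has no words")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


def synthesis_passes(script: str, transcript: str, max_wer: float = 0.1) -> bool:
    """Accept the MP3 when its transcript is close enough to the script (threshold assumed)."""
    return word_error_rate(script, transcript) <= max_wer
```

Number formatting can inflate the score (a script that says "100" may be transcribed as "one hundred"), so spelling numbers out in the voice-optimized script keeps the comparison fair.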

This enhanced workflow wraps the core synthesis skill with a script-optimization step before it and a quality-verification step after it, so problems in the audio can be caught before the subsequent video generation stage.

[Diagram: SubAgent #2]

Let me break down the AVATARCREATOR AGENT flow in detail:

A) SUBAGENT SUMMARY: 
Creates a themed, professional-looking AI avatar image suitable for a talking head video, ensuring the avatar matches the context of the video content and has appropriate composition for animation.

B) FINAL TASK OUTPUT:
A 1024x1024 PNG with a transparent background, showing a head-and-shoulders portrait with clear facial features (particularly a well-defined mouth area) suitable for lip-sync animation.
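A generated or resized avatar can be checked against this spec before it is handed to the next agent. Every PNG begins with an IHDR chunk that stores the width, height and color type, so the check needs no imaging library. A minimal sketch; `meets_avatar_spec` is a hypothetical helper, and it assumes "transparent" means the image has a per-pixel alpha channel:

```python
import struct
from typing import NamedTuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


class PngInfo(NamedTuple):
    width: int
    height: int
    has_alpha_channel: bool


def read_png_info(data: bytes) -> PngInfo:
    """Read width, height and color type from the IHDR chunk at the start of a PNG."""
    if len(data) < 26 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # Bytes 8-11 hold the chunk length, 12-15 the chunk type.
    if data[12:16] != b"IHDR":
        raise ValueError("IHDR chunk missing")
    width, height = struct.unpack(">II", data[16:24])
    color_type = data[25]
    # Color types 4 (grayscale + alpha) and 6 (RGBA) carry a per-pixel alpha channel.
    # RGB or palette images can still be transparent via a tRNS chunk, which this sketch ignores.
    return PngInfo(width, height, color_type in (4, 6))


def meets_avatar_spec(data: bytes) -> bool:
    """True for a 1024x1024 PNG with an alpha channel."""
    info = read_png_info(data)
    return (info.width, info.height) == (1024, 1024) and info.has_alpha_channel
```

Only the first few dozen bytes of the file are needed; fetching them from the avatar's URL is left out of the sketch.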

C) SUBAGENT INPUT:
1. Video topic/theme description
2. Script context (from Subagent 1)
3. Optional style preferences for the avatar

D) SUBAGENT TASK SUMMARY:
Input > #223 (Generate optimal prompt) > #182 (Create first attempt) > #176 (Analyze image) > #222 (Create refined version) > #191 (Resize if needed) > Output

E) SILOS:
The process breaks into three distinct silos:

SILO 1: PROMPT ENGINEERING
1. Use Skill #223 (Powerful LLM Prompt-to-Text Response)
- Input: Video topic + script context
- Output: Optimized prompt for avatar generation
- Purpose: Ensures the prompt produces an image suitable for a talking head video

SILO 2: INITIAL GENERATION & VALIDATION
1. Use Skill #182 (Create Dalle Image)
- Input: Optimized prompt from Silo 1
- Output: First attempt at avatar (1024x1024 PNG)

2. Use Skill #176 (Analyze Image With GPT Vision)
- Input: Generated PNG from step 1
- Output: Analysis of image suitability
- Purpose: Verify image meets talking head requirements

SILO 3: REFINEMENT & OPTIMIZATION
1. If needed, use Skill #222 (Make Image)
- Input: Original prompt + feedback from analysis
- Output: Refined avatar image
- Purpose: Creates an improved version if the first attempt is inadequate

2. If needed, use Skill #191 (Resize Image)
- Input: Final avatar PNG
- Output: Properly sized 1024x1024 PNG
- Purpose: Ensures the exact size requirement is met

The silos operate sequentially but may loop back if the analysis shows the image needs regeneration.
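The loop-back can be expressed as a bounded retry: analyze the current image with #176 and, if it is unsuitable, regenerate with #222 using the analysis as feedback. A minimal sketch; the callables stand in for the skills, `Review` is a hypothetical shape for the #176 result, and the three-attempt cap is an assumption (the workflow does not say how many retries are allowed):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    suitable: bool   # hypothetical: did the image pass the talking-head checks?
    feedback: str    # hypothetical: what to change on the next attempt


def generate_avatar(
    prompt: str,
    create: Callable[[str], str],        # stands in for #182: prompt -> image URL
    refine: Callable[[str, str], str],   # stands in for #222: (prompt, feedback) -> image URL
    analyze: Callable[[str], Review],    # stands in for #176: image URL -> review
    max_attempts: int = 3,               # assumed cap on generation attempts
) -> str:
    """Generate, analyze and regenerate until an image passes or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    image = create(prompt)
    for attempt in range(1, max_attempts + 1):
        review = analyze(image)
        if review.suitable:
            return image
        if attempt < max_attempts:
            image = refine(prompt, review.feedback)
    raise RuntimeError(f"no suitable avatar after {max_attempts} attempts")
```

Capping the attempts stops a prompt that never yields a usable image from looping indefinitely; the resize step (#191) would then run on whatever image is returned.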

This workflow ensures:
- Proper prompt engineering for optimal avatar generation
- Quality control through image analysis
- Refinement if needed
- Exact size specifications
- Transparent background for video processing
- Suitable composition for lip-sync animation

[Diagram: SubAgent #3]

I'll analyze the VIDEOASSEMBLER AGENT with these guidelines in mind:

A) SUBAGENT SUMMARY: 
A specialized agent that combines an AI avatar PNG image with an MP3 voiceover to create a synchronized talking head video where the avatar's mouth movements match the audio speech patterns.

B) FINAL TASK OUTPUT:
MP4 video file (1024x1024 resolution, duration matching input MP3 length) containing the animated avatar with synchronized lip movements matching the voiceover audio.

C) SUBAGENT INPUT:
- PNG URL of AI avatar (1024x1024, transparent background)
- MP3 URL of voiceover audio
- Original script text with timings (optional but recommended for improved lip-sync accuracy)

D) SUBAGENT TASK SUMMARY:
The core workflow should be:

1. Extract audio timing data:
input: MP3 URL > #198 (Get Transcription of MP3 With Timings) > text transcription with timestamps

2. Generate initial talking head:
input: MP3 URL + PNG URL + timestamped transcription from step 1 > #168 (Generate Talking Head Video From MP3 & transcription) > initial MP4

3. Quality check and refine:
input: initial MP4 > #207 (Get Transcription From MP4 Video URL) > verification of sync
input: initial MP4 > #202 (Extract Thumbnail Images) > verification of key frames
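Comparing the word timings from #198 (source MP3) with those from #207 (the MP4's audio track) shows whether the video's audio has drifted from the original. A minimal sketch, assuming both transcriptions can be reduced to a list of words with start times in seconds; `TimedWord` and the 0.1 s tolerance are assumptions. This checks audio alignment only; mouth movement itself still has to be judged from the key frames extracted by #202.

```python
from typing import List, NamedTuple


class TimedWord(NamedTuple):
    text: str
    start: float  # seconds from the beginning of the media


def _norm(word: str) -> str:
    """Compare words case-insensitively and ignore trailing punctuation."""
    return word.lower().strip(".,!?;:")


def max_timing_drift(audio_words: List[TimedWord], video_words: List[TimedWord]) -> float:
    """Largest start-time difference between words matched in order of appearance."""
    drift = 0.0
    j = 0
    for word in audio_words:
        target = _norm(word.text)
        # Look ahead in the video transcript for the next occurrence of this word.
        k = j
        while k < len(video_words) and _norm(video_words[k].text) != target:
            k += 1
        if k == len(video_words):
            continue  # word missing from the video transcript; skip it
        drift = max(drift, abs(video_words[k].start - word.start))
        j = k + 1
    return drift


def in_sync(audio_words: List[TimedWord], video_words: List[TimedWord],
            tolerance_s: float = 0.1) -> bool:
    """True when no matched word is offset by more than the (assumed) tolerance."""
    return max_timing_drift(audio_words, video_words) <= tolerance_s
```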

E) SILOS:
SILO 1 - AUDIO ANALYSIS
- Purpose: Extract precise timing data for lip sync
- Skills: #198
- Input: MP3 URL
- Output: Timestamped transcription

SILO 2 - VIDEO GENERATION
- Purpose: Create initial talking head animation
- Skills: #168
- Input: MP3 URL, PNG URL, transcription
- Output: MP4 video

SILO 3 - QUALITY VERIFICATION
- Purpose: Verify sync and quality
- Skills: #207, #202
- Input: Generated MP4
- Output: Verification data for quality check

This workflow is designed to keep the audio and visual elements synchronized, with verification steps for quality control. Supplying the extracted timings to the talking head generation step should help produce natural-looking, accurate lip movements.

Note: This subagent relies heavily on skill #168, which appears to have the core lip-sync capability built in. If more granular control is needed, skills for frame-by-frame manipulation would have to be added to the available skill set.