What Shall We Build Next?

Let me analyze the ScriptMaster subagent and break it down according to your guidelines.

A) SUBAGENT SUMMARY: 
ScriptMaster generates a concise, well-structured voice-over script (100-300 words) from a user's topic description, optimized for talking head video presentation.

B) FINAL TASK OUTPUT: 
A text file containing a properly formatted voice-over script, structured with clear paragraphs, natural pauses, and appropriate pacing for video presentation (approximately 1-3 minutes of speaking time when read aloud at natural pace).
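The speaking-time figure follows from the word count and the reading pace. A minimal sketch of that arithmetic, assuming the 120-150 words-per-minute pacing quoted for VoiceForge also applies to this script; the function name is hypothetical.

```python
def speaking_time_seconds(word_count: int, words_per_minute: float) -> float:
    """Return the time in seconds needed to read word_count words aloud."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count * 60.0 / words_per_minute


# Shortest case: 100 words at the fast pace. Longest: 300 words at the slow pace.
shortest = speaking_time_seconds(100, 150)
longest = speaking_time_seconds(300, 120)
print(f"{shortest:.0f}s to {longest:.0f}s")
```

Under that assumption a 100-300 word script runs from about 40 seconds to 2.5 minutes.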

C) SUBAGENT INPUT:
- Primary user topic/description
- Optional style preferences (tone, length, format)
- Optional target audience information

E) SUBAGENT TASK SUMMARY:
The workflow will use three key skills in sequence:

1. #216 (Research Topic Deeply) 
INPUT: User's topic description
OUTPUT: 1000-3000 character research summary

2. #223 (Powerful LLM Prompt-to-Text Response)
INPUT: Research summary + specific instructions for script formatting
OUTPUT: Initial script draft with structure

3. #171 (Write Voice Over Script Based On Instructions)
INPUT: Initial script draft + voice-over specific formatting requirements
OUTPUT: Final voice-over script

Flow: user_input > #216 > #223 > #171 > final_script_output
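The flow above is a plain sequential chain, which could be sketched as follows. The skill interfaces are not described in the plan, so each skill is modeled here as a hypothetical text-to-text function; `run_pipeline` and the three stubs are illustrative names, not part of the platform.

```python
from typing import Callable

# Assumption: every skill takes text and returns text.
Skill = Callable[[str], str]


def run_pipeline(user_input: str, skills: list[tuple[str, Skill]]) -> str:
    """Feed each skill's output into the next skill, in order."""
    data = user_input
    for skill_id, skill in skills:
        data = skill(data)
        if not data:
            raise RuntimeError(f"skill {skill_id} returned no output")
    return data


def research(topic: str) -> str:       # stub standing in for #216
    return f"research({topic})"


def draft(summary: str) -> str:        # stub standing in for #223
    return f"draft({summary})"


def voice_over(script: str) -> str:    # stub standing in for #171
    return f"voice_over({script})"


SCRIPTMASTER = [("#216", research), ("#223", draft), ("#171", voice_over)]

if __name__ == "__main__":
    print(run_pipeline("solar panels", SCRIPTMASTER))
```

Keeping the skill list as data means a step can be swapped or reordered without touching the runner.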

F) SILOS:
SILO 1 - RESEARCH PHASE
- Skill: #216 Research Topic Deeply
- Purpose: Gather comprehensive background information
- Input: User topic
- Output: Research summary

SILO 2 - INITIAL SCRIPT STRUCTURING
- Skill: #223 Powerful LLM Prompt-to-Text Response
- Purpose: Convert research into initial script structure
- Input: Research summary
- Output: Initial script draft

SILO 3 - VOICE-OVER OPTIMIZATION
- Skill: #171 Write Voice Over Script Based On Instructions
- Purpose: Optimize for voice-over presentation
- Input: Initial script draft
- Output: Final voice-over script

This workflow first gathers comprehensive information (#216), then structures it (#223), and finally optimizes it for voice-over presentation (#171). Each silo builds on the previous one, so the script is refined step by step for talking head video presentation.

I'll break down the VoiceForge subagent according to the requested format:

A) SUBAGENT SUMMARY: 
VoiceForge converts a text script into a high-quality voice-over MP3 file, optimizing the audio for use in a talking head video.

B) FINAL TASK OUTPUT: 
A single MP3 file URL containing the voice-over audio, optimized for talking head synchronization, with clear pronunciation and natural pacing (typically 120-150 words per minute).

C) SUBAGENT INPUT:
- Primary Input: Text script (100-300 words) from ScriptMaster subagent
- Optional Input: Voice style preferences (if provided in original user prompt)

E) SUBAGENT TASK SUMMARY:
The workflow follows this sequence:

Input (text script) > 
#170 (Turn Script Into Voice Over MP3) >
#198 (Get Transcription Of MP3 With Timings) [to verify quality] >
#179 (Create Visual Waveform Of 60 second Wav/mp3 File) [to verify audio levels] >
#176 (Analyze An Image With GPT Vision & Return Text) [to analyze waveform] >
If waveform analysis shows issues: Repeat #170 with adjusted parameters
Final Output: MP3 File URL
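The generate, verify, and retry sequence above could be sketched as a bounded loop. This is an illustration under stated assumptions: the plan does not say how many retries are allowed or which parameters are adjusted, so `max_attempts` and the attempt counter are placeholders, and both callables are hypothetical stand-ins for the skills.

```python
from typing import Callable, Optional


def generate_with_quality_check(
    script: str,
    generate: Callable[[str, int], str],
    passes_check: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Regenerate until the quality check passes or attempts run out.

    generate(script, attempt) stands in for #170; the attempt number is a
    placeholder for whatever parameters get adjusted on a retry.
    passes_check(mp3_url) stands in for the #198 > #179 > #176 verification.
    """
    for attempt in range(1, max_attempts + 1):
        mp3_url = generate(script, attempt)
        if passes_check(mp3_url):
            return mp3_url
    return None  # the caller decides what to do when every attempt fails
```

Capping the attempts keeps a persistently failing check from looping forever; returning `None` leaves the fallback decision to the caller.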

F) SILOS:
SILO 1: VOICE GENERATION
- Input: Text script
- Skill: #170 (Turn Script Into Voice Over MP3)
- Output: Initial MP3 URL

SILO 2: QUALITY VERIFICATION
- Input: MP3 from Silo 1
- Skills: 
  * #198 (Get Transcription Of MP3 With Timings)
  * #179 (Create Visual Waveform)
  * #176 (Analyze waveform image)
- Output: Quality verification report

SILO 3: QUALITY ADJUSTMENT (if needed)
- Input: Quality report from Silo 2
- Skill: #170 (with adjusted parameters if initial quality check fails)
- Output: Final MP3 URL

This structured approach ensures we're not just generating voice-over audio, but also verifying its quality and making adjustments if needed, which is crucial for a talking head video where audio quality directly impacts lip-sync accuracy.

Let me break down the AvatarVision subagent following your guidelines:

A) SUBAGENT SUMMARY: 
AvatarVision generates a high-quality, themed AI avatar image that matches the video topic and maintains consistent visual style suitable for talking head animation.

B) FINAL TASK OUTPUT: 
A square (1024x1024) transparent PNG file showing a professional-looking head-and-shoulders avatar with clean edges, the background removed, and styling that fits the video topic.

C) SUBAGENT INPUT:
- User's topic/theme description
- Style preferences for avatar (if any)
- Professional context/setting requirements
- Any specific visual elements needed for topic relevance

E) SUBAGENT TASK SUMMARY:
The flow should be:

Input > #223 (Powerful LLM to craft detailed image prompt) > #222 (Make initial themed avatar) > #176 (Analyze image for quality check) > #221 (Recreate/refine if needed) > #191 (Resize if required) > Final Output

Specifically:
1. Use #223 to convert user requirements into detailed image generation prompt
2. Use #222 to generate initial avatar (optimized for text/details)
3. Use #176 to analyze image quality/suitability
4. If needed, use #221 to recreate/refine based on analysis
5. Use #191 to ensure final 1024x1024 size
6. Return final PNG URL

F) SILOS:
SILO 1: PROMPT ENGINEERING
- Input: Raw user requirements
- Skill #223: Convert to detailed image prompt
- Output: Optimized image generation prompt

SILO 2: IMAGE GENERATION
- Input: Optimized prompt
- Skill #222: Generate initial avatar
- Output: Initial PNG

SILO 3: QUALITY CONTROL
- Input: Initial PNG
- Skill #176: Analyze image
- Skill #221: Recreate if needed (based on analysis)
- Output: Refined PNG

SILO 4: FINAL FORMATTING
- Input: Refined PNG
- Skill #191: Resize to exact specifications
- Output: Final 1024x1024 transparent PNG
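The final output spec (1024x1024 with transparency) can be checked directly from the PNG header without an image library. A minimal sketch using only the standard library; the function names are hypothetical, and the check reads only the IHDR chunk, so it does not detect palette images that gain transparency through a tRNS chunk.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data: bytes) -> tuple[int, int, bool]:
    """Return (width, height, has_alpha_channel) from the start of a PNG file."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    if data[12:16] != b"IHDR":
        raise ValueError("missing IHDR chunk")
    # IHDR payload begins with width, height, bit depth, color type (big-endian).
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    # Color types 4 (greyscale + alpha) and 6 (RGB + alpha) carry an alpha channel.
    return width, height, color_type in (4, 6)


def meets_avatar_spec(data: bytes, size: int = 1024) -> bool:
    """True when the image is size x size and has an alpha channel."""
    width, height, has_alpha = read_png_header(data)
    return width == size and height == size and has_alpha
```

Only the first 26 bytes of the file are needed, so the check can run on a partial download.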

This approach builds a quality check and a refinement step into avatar generation before the final resize, which keeps the output consistent with the 1024x1024 transparent PNG specification.

Let me break down the LipSyncWizard subagent in detail:

A) SUBAGENT SUMMARY: 
LipSyncWizard analyzes an audio file to generate precise phoneme timing data and corresponding viseme mappings for accurate lip synchronization of an AI avatar.

B) FINAL TASK OUTPUT:
A structured data file containing:
- Precise timing markers (in milliseconds)
- Corresponding phoneme identifications
- Mapped viseme positions for the avatar's mouth/face
- Optional head movement timing data

C) SUBAGENT INPUT:
- MP3 voice-over file URL
- Transcription with timing data
- Avatar reference image URL (to understand mouth/face structure)

E) SUBAGENT TASK SUMMARY:
The flow would be:
1. Convert MP3 to WAV for precise audio analysis (#178)
2. Generate detailed transcription with timing data (#198)
3. Create visual waveform for amplitude analysis (#179)
4. Extract beat points for natural head movement (#180)
5. Analyze waveform image with GPT Vision (#176) to identify key audio segments
6. Analyze the avatar reference image with GPT Vision (#176) to understand mouth/face structure
7. Use LLM (#223) to convert the audio and image analysis into structured viseme data

Specific flow:
Input MP3 > #178 Convert to WAV > #198 Get detailed transcription > #179 Generate waveform > #180 Extract beat points > #176 Analyze waveform > #176 Analyze avatar reference > #223 Generate final structured data > Output JSON

F) SILOS:
Silo 1: Audio Processing
- Convert MP3 to WAV (#178)
- Generate transcription (#198)
- Create waveform (#179)
- Extract beat points (#180)

Silo 2: Visual Analysis
- Analyze waveform with Vision (#176)
- Analyze avatar image reference (#176)

Silo 3: Data Structuring
- Process all data through LLM (#223) to create final structured output
- Format timing data, phonemes, visemes, and movement cues into JSON

Note: This subagent appears to need some additional custom skills for optimal performance, particularly in generating precise viseme mappings from phoneme data. The current workflow uses available skills to approximate this functionality, but a dedicated phoneme-to-viseme mapping skill would improve accuracy.
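To make the missing piece concrete, here is a rough sketch of what a phoneme-to-viseme mapping and the resulting JSON could look like. Everything in it is an assumption for illustration: the phonemes are written as ARPAbet-style symbols, and the viseme labels, the table contents, the `VisemeCue` fields, and the "neutral" fallback are hypothetical rather than the output of any listed skill.

```python
import json
from dataclasses import dataclass, asdict

# Illustrative table only; the groupings and labels are assumptions.
PHONEME_TO_VISEME = {
    "P": "closed", "B": "closed", "M": "closed",
    "F": "teeth_on_lip", "V": "teeth_on_lip",
    "AA": "open", "AE": "open",
    "UW": "rounded", "OW": "rounded",
}


@dataclass
class VisemeCue:
    start_ms: int
    end_ms: int
    phoneme: str
    viseme: str


def build_cues(timed_phonemes: list[tuple[int, int, str]]) -> list[VisemeCue]:
    """Map (start_ms, end_ms, phoneme) triples to viseme cues."""
    return [
        # Phonemes missing from the table fall back to a neutral mouth shape.
        VisemeCue(start, end, phoneme, PHONEME_TO_VISEME.get(phoneme, "neutral"))
        for start, end, phoneme in timed_phonemes
    ]


cues = build_cues([(0, 80, "M"), (80, 220, "AA"), (220, 300, "T")])
print(json.dumps([asdict(cue) for cue in cues], indent=2))
```

A lookup table like this is deterministic, which is the main advantage a dedicated mapping skill would have over asking an LLM to produce viseme data.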

Let me break down the VideoAssemblerPro subagent following the requested format:

A) SUBAGENT SUMMARY:
VideoAssemblerPro combines an AI-generated avatar image, voice-over audio, and lip-sync data to create a synchronized talking head video with natural mouth movements and facial expressions.

B) FINAL TASK OUTPUT:
MP4 video file (1920x1080 resolution, 16:9 aspect ratio) featuring the AI avatar with synchronized lip movements matching the voice-over audio, with duration matching the input audio file length.

C) SUBAGENT INPUT:
- PNG URL of the AI-generated avatar image
- MP3 URL of the voice-over audio
- Text transcription with timing data (for lip sync)

E) SUBAGENT TASK SUMMARY:
The workflow requires the following sequence:

1. First process the voice-over audio:
Input MP3 > #198 (Get Transcription Of MP3 With Timings) > Transcription with precise timing data

2. Generate the base talking head:
MP3 + Transcription > #168 (Generate Talking Head Video From MP3 & transcription) > Initial MP4

3. Enhance with visual elements:
Avatar PNG + Initial MP4 > #199 (Add Images & Videos On Top Of Existing MP4) > Final MP4

F) SILOS:
SILO 1: AUDIO PROCESSING
- Purpose: Extract precise timing data for lip sync
- Input: Voice-over MP3
- Skill: #198
- Output: Transcription with timings

SILO 2: BASE VIDEO GENERATION
- Purpose: Create initial talking head animation
- Input: MP3 + Transcription
- Skill: #168
- Output: Base MP4 video

SILO 3: VISUAL ENHANCEMENT
- Purpose: Overlay custom avatar and finalize
- Input: Avatar PNG + Base MP4
- Skill: #199
- Output: Final MP4 video

Note: This workflow utilizes existing skills to approximate lip-sync functionality through the combination of transcription timing data and the talking head generation capability. While not as sophisticated as a dedicated lip-sync engine, this approach should produce acceptable results for most use cases.
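The input lists of the five subagents imply an execution order, which can be derived with a topological sort. This is a sketch under assumptions: only the ScriptMaster-to-VoiceForge link is stated explicitly, and the other dependencies are inferred from the artifact types each subagent consumes (MP3, avatar PNG, lip-sync data).

```python
from graphlib import TopologicalSorter

# Each subagent maps to the subagents whose output it consumes.
# Assumption: links other than ScriptMaster -> VoiceForge are inferred.
DEPENDS_ON = {
    "ScriptMaster": set(),
    "AvatarVision": set(),
    "VoiceForge": {"ScriptMaster"},
    "LipSyncWizard": {"VoiceForge", "AvatarVision"},
    "VideoAssemblerPro": {"AvatarVision", "VoiceForge", "LipSyncWizard"},
}

# static_order() yields every subagent after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

ScriptMaster and AvatarVision have no dependencies on each other, so under these assumptions they could run in parallel.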