What Shall We Build Next?

Let me break down the ScriptMaster subagent in detail:

A) SUBAGENT SUMMARY:
ScriptMaster takes a user's topic description and generates an optimized, well-structured voice-over script specifically designed for talking head videos, incorporating proper pacing, natural speech patterns, and clear narrative flow.

B) FINAL TASK OUTPUT:
A text file containing a 100-300 word voice-over script, formatted with proper punctuation and timing markers, optimized for natural speech delivery and lip-sync animation.

C) SUBAGENT INPUT:
- Primary user topic/description
- Optional style preferences (tone, length, target audience)
- Optional technical requirements (exact word count, specific terminology)

E) SUBAGENT TASK SUMMARY:
The workflow follows this sequence:

1. Research Phase:
#216 (Research Topic Deeply) > outputs comprehensive research data
↓
2. Keyword Enhancement:
#218 (Brainstorm Related Keywords) > outputs relevant terminology/concepts
↓
3. Initial Script Generation:
#171 (Write Voice Over Script Based On Instructions) > outputs draft script
↓
4. Script Refinement:
#190 (Write or rewrite text based on instructions) > outputs polished script

This creates a research-backed, keyword-enhanced, properly formatted script that's optimized for voice-over delivery.

F) SILOS:
The subagent operates in three distinct silos:

SILO 1: RESEARCH & PREPARATION
• Input: User topic description
• Skill: #216 (Research Topic Deeply)
• Skill: #218 (Brainstorm Related Keywords)
• Output: Research data + enhanced keywords

SILO 2: INITIAL SCRIPT CREATION
• Input: Research data + keywords
• Skill: #171 (Write Voice Over Script Based On Instructions)
• Output: Draft script

SILO 3: SCRIPT REFINEMENT
• Input: Draft script
• Skill: #190 (Write or rewrite text based on instructions)
• Output: Final polished script

Each silo's output feeds directly into the next silo's input, creating a smooth workflow that ensures the final script is well-researched, properly structured, and optimized for voice-over delivery.
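
The silo chain above can be sketched as a simple sequential pipeline. This is only an illustration, not the platform's actual API: `run_skill` and its signature are hypothetical stand-ins for however skills #216, #218, #171, and #190 are really invoked, and the stub runner just tags its input so the example runs on its own.

```python
from typing import Callable

# Hypothetical skill runner: (skill_id, payload) -> output text.
SkillRunner = Callable[[int, str], str]


def script_master(topic: str, run_skill: SkillRunner) -> str:
    """Chain the three ScriptMaster silos; each output feeds the next input."""
    # Silo 1: research and keyword brainstorming.
    # Assumption: both skills receive the raw topic description.
    research = run_skill(216, topic)
    keywords = run_skill(218, topic)
    # Silo 2: draft script from research data + keywords.
    draft = run_skill(171, f"{research}\n{keywords}")
    # Silo 3: refinement pass over the draft.
    return run_skill(190, draft)


def stub_runner(skill_id: int, payload: str) -> str:
    """Demo stand-in that tags the payload with the skill id."""
    return f"[#{skill_id}] {payload}"


final_script = script_master("solar panels", stub_runner)
print(final_script)
```

Passing the runner in as a parameter keeps the chain testable without access to the real skills.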

The final output [avatar-voice-script] will be ready to feed directly into the next subagent (VoiceForge) in the main workflow.

Let me break down the VoiceForge subagent in detail:

A) SUBAGENT SUMMARY: 
VoiceForge converts a text script into a high-quality voice-over MP3 file, optimizing the audio for use in a talking head video.

B) FINAL TASK OUTPUT: 
A URL to a single MP3 file containing the voice-over audio, typically about 1-2 minutes long for a 100-300 word script, with clear voice articulation suitable for lip-syncing.

C) SUBAGENT INPUT:
- Primary Input: Text script (100-300 words)
- Optional Input: Voice style preferences (if any)

E) SUBAGENT TASK SUMMARY:
The workflow should follow these steps:

1. Script Length Check:
Use #190 (Write or rewrite text based on instructions) to verify the script length and adjust it if needed.

2. Voice Generation:
Use #170 (Turn Script Into Voice Over MP3) to create the initial voice-over.

3. Audio Quality Check:
Use #179 (Create Visual Waveform Of 60 second Wav/mp3 File) to render the waveform.
Use #176 (Analyze An Image With GPT Vision & Return Text) to review the waveform and confirm proper audio levels.

4. Audio Processing:
Use #178 (Convert 1-20 MP3s to wav) to convert the audio for analysis.
Use #219 (Cut Wav/mp3 Audio into Multiple Pieces/Samples) if any sections need adjustment.
Use #170 again if any sections need to be re-recorded.

Full chain:
Input Script > #190 > #170 > #179 > #176 > #178 > #219 (if needed) > #170 (if needed) > Final MP3 Output
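
The length check in step 1 can be illustrated with a plain word count. This sketch assumes the 100-300 word target from the input spec and a speaking rate of roughly 150 words per minute. That rate is an assumed figure, not something the workflow specifies, and the function names are hypothetical.

```python
MIN_WORDS = 100    # lower bound from the script spec
MAX_WORDS = 300    # upper bound from the script spec
ASSUMED_WPM = 150  # assumed speaking rate; tune for the chosen voice


def check_script_length(script: str) -> tuple[bool, int]:
    """Return (within_limits, word_count) using whitespace-separated words."""
    count = len(script.split())
    return MIN_WORDS <= count <= MAX_WORDS, count


def estimate_duration_seconds(word_count: int, wpm: int = ASSUMED_WPM) -> float:
    """Rough spoken duration for a given word count."""
    return word_count / wpm * 60


ok, count = check_script_length("word " * 150)
print(ok, count, estimate_duration_seconds(count))
```

A duration estimate like this is also a cheap way to flag scripts whose audio would exceed the 60-second limit named in skill #179 before any audio is generated.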

F) SILOS:
SILO 1: SCRIPT PREPARATION
- Input: Raw script
- Skill: #190
- Output: Verified/adjusted script

SILO 2: VOICE GENERATION
- Input: Verified script
- Skill: #170
- Output: Initial MP3

SILO 3: QUALITY CONTROL
- Input: Initial MP3
- Skills: #179 > #176 > #178
- Output: Quality analysis and WAV format

SILO 4: REFINEMENT (if needed)
- Input: WAV file
- Skills: #219 > #170
- Output: Final MP3

The subagent includes quality control measures and refinement options, ensuring the final audio output is optimal for lip-syncing in the talking head video.

Here's my complete analysis and workflow for SUBAGENT 3: "AvatarVision":

A) SUBAGENT SUMMARY:
AvatarVision creates a high-quality, themed AI avatar image for talking head videos, ensuring the avatar matches the video's topic and meets professional presentation standards.

B) FINAL TASK OUTPUT:
A single 1024x1024 transparent PNG file of a professional-looking avatar head/shoulders portrait with clean edges, suitable for animation, featuring clear facial features and appropriate styling/theming for the video topic.

C) SUBAGENT INPUT:
1. Topic/theme of the video (from main agent)
2. Style preferences for avatar (if specified by user)
3. Professional context/setting requirements
4. Specific features or characteristics requested

E) SUBAGENT TASK SUMMARY:
The workflow chains together in this sequence:

Input
> #223 (Powerful LLM Prompt-to-Text) to generate an optimal image generation prompt
> #222 (Make Image With Text) to create the initial avatar
> #176 (Analyze Image With GPT Vision) to verify quality
> #221 (Recreate New Image) if needed
> #191 (Resize Image) to ensure 1024x1024
> Final Output

F) SILOS:
SILO 1: PROMPT ENGINEERING
- Skill #223 (Powerful LLM)
Purpose: Generate highly specific image prompt that will ensure:
- Professional head/shoulders framing
- Clear facial features
- Appropriate themed elements
- Suitable for animation
Output: Optimized image generation prompt

SILO 2: IMAGE GENERATION & VERIFICATION
- Skill #222 (Make Image With Text)
Purpose: Generate initial avatar image
Output: PNG of avatar

- Skill #176 (Analyze Image With GPT Vision)
Purpose: Verify image meets all requirements:
- Clear facial features
- Appropriate theming
- Professional appearance
- Suitable for animation
Output: Analysis text

SILO 3: REFINEMENT (If Needed)
- Skill #221 (Recreate New Image)
Purpose: Improve image based on analysis
Output: Refined PNG

SILO 4: FINAL FORMATTING
- Skill #191 (Resize Image)
Purpose: Ensure exact 1024x1024 dimensions
Output: Final formatted PNG

This workflow ensures quality control through verification and potential refinement, while maintaining strict technical specifications needed for the final talking head video generation.
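
The generate, verify, and refine steps in Silos 2 and 3 can be sketched as a bounded retry loop. All three callables are hypothetical placeholders for skills #222, #176, and #221. The refinement cap is also an assumption, since the workflow does not say how many refinements are allowed.

```python
from typing import Callable


def generate_avatar(
    prompt: str,
    make_image: Callable[[str], str],      # stands in for #222, returns an image URL
    passes_review: Callable[[str], bool],  # stands in for #176 plus a pass/fail decision
    recreate: Callable[[str, str], str],   # stands in for #221: (prompt, old_url) -> new URL
    max_refinements: int = 2,              # assumed cap, not from the workflow
) -> str:
    """Generate an image, then refine it until review passes or the cap is hit.

    Note: the image from the last allowed refinement is returned without
    another review.
    """
    image_url = make_image(prompt)
    for _ in range(max_refinements):
        if passes_review(image_url):
            break
        image_url = recreate(prompt, image_url)
    return image_url


# Demo stubs: each refinement bumps a version number; only "v2" passes review.
result = generate_avatar(
    "professional head-and-shoulders portrait",
    make_image=lambda p: "v0",
    passes_review=lambda url: url == "v2",
    recreate=lambda p, old: f"v{int(old[1:]) + 1}",
)
print(result)
```

Capping the loop keeps a consistently failing prompt from triggering unlimited image generations.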

Let me break down the LipSyncWizard subagent following your guidelines:

A) SUBAGENT SUMMARY:
LipSyncWizard processes an audio file to generate precise phoneme timing data and maps it to corresponding viseme (mouth shape) animations, creating a synchronized data structure for avatar lip movements.

B) FINAL TASK OUTPUT:
A JSON data structure containing:
- Timestamp markers (in milliseconds)
- Corresponding phoneme identifiers
- Mapped viseme positions (mouth shapes)
- Basic head movement instructions
- Duration markers for each phoneme/viseme pair
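
One possible shape for this JSON is sketched below. The field names (`start_ms`, `duration_ms`, `phoneme`, `viseme`, `head_movement`) and the sample values are assumptions for illustration, since the description lists the kinds of data but not a schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class LipSyncFrame:
    start_ms: int                        # timestamp marker in milliseconds
    duration_ms: int                     # how long the phoneme/viseme pair lasts
    phoneme: str                         # phoneme identifier
    viseme: str                          # mapped mouth shape
    head_movement: Optional[str] = None  # basic head movement instruction, if any


# Sample values are made up for the demo.
frames = [
    LipSyncFrame(0, 120, "HH", "open"),
    LipSyncFrame(120, 90, "AY", "wide", head_movement="nod"),
]
payload = json.dumps({"frames": [asdict(f) for f in frames]}, indent=2)
print(payload)
```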

C) SUBAGENT INPUT:
- MP3 voice-over file URL
- Transcription with word timing data
- List of supported visemes for the avatar system

D) SUBAGENT TASK SUMMARY:
1. Convert MP3 to WAV for precise audio analysis (#178 - Convert MP3 to WAV)
2. Generate detailed transcription with timing (#198 - Get Transcription of MP3 With Timings)
3. Create visual waveform for amplitude analysis (#179 - Create Visual Waveform)
4. Analyze waveform image with GPT Vision (#176 - Analyze Image With GPT Vision)
5. Extract key audio characteristics (#180 - Extract Beatpoints & Tempo)
6. Use LLM to process all data (#223 - Powerful LLM Prompt-to-Text Response)

E) SILOS:
SILO 1: AUDIO PREPARATION
- Input: MP3 URL
- Skill #178: Convert MP3 to WAV
- Output: WAV file URL

SILO 2: SPEECH ANALYSIS
- Input: WAV file URL
- Skill #198: Generate detailed transcription
- Skill #179: Create waveform
- Output: Transcription with timings + Waveform image

SILO 3: WAVEFORM ANALYSIS
- Input: Waveform image
- Skill #176: Analyze waveform with GPT Vision
- Skill #180: Extract beatpoints
- Output: Amplitude data + rhythm markers

SILO 4: DATA SYNTHESIS
- Input: All previous outputs
- Skill #223: Use LLM to synthesize data into final JSON
- Output: Final lip-sync data structure

This workflow creates a comprehensive lip-sync data package by analyzing both the audio characteristics and speech patterns, then combining them into a structured format that can be used by the video generation system.

The final output JSON will contain precise timing data that maps each phoneme to its corresponding viseme, allowing for natural-looking lip movements when animated.
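
The phoneme-to-viseme mapping itself can be a simple lookup table. The sketch below uses a small made-up table with ARPAbet-style phoneme labels and invented viseme names. A real avatar system would supply its own supported viseme list, as noted in the inputs.

```python
# Illustrative mapping only; real viseme sets depend on the avatar system.
PHONEME_TO_VISEME = {
    "P": "closed", "B": "closed", "M": "closed",
    "F": "teeth_on_lip", "V": "teeth_on_lip",
    "AA": "open", "AE": "open",
    "OW": "round", "UW": "round",
}
DEFAULT_VISEME = "neutral"  # fallback for phonemes missing from the table


def map_visemes(timed_phonemes: list[tuple[int, str]]) -> list[dict]:
    """Turn (start_ms, phoneme) pairs into timestamped viseme entries."""
    return [
        {
            "start_ms": start,
            "phoneme": phoneme,
            "viseme": PHONEME_TO_VISEME.get(phoneme, DEFAULT_VISEME),
        }
        for start, phoneme in timed_phonemes
    ]


result = map_visemes([(0, "M"), (80, "AA"), (200, "ZH")])
print(result)
```

The fallback viseme means an unrecognized phoneme produces a neutral mouth shape instead of a missing entry.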

Here's my complete analysis and workflow for Subagent 5 (VideoAssemblerPro):

A) SUBAGENT SUMMARY: 
VideoAssemblerPro combines an AI-generated avatar image, voice-over audio, and lip-sync timing data to create a synchronized talking head video where the avatar's mouth and facial movements match the audio speech patterns.

B) FINAL TASK OUTPUT:
MP4 video file (16:9 aspect ratio) featuring the AI avatar speaking with synchronized lip movements and natural head movements, with clear audio and smooth animation, duration matching the input audio file length.

C) SUBAGENT INPUT:
- PNG URL of the AI-generated avatar image
- MP3 URL of the voice-over audio
- Transcription with precise timing data (for lip-sync coordination)

E) SUBAGENT TASK SUMMARY:
1. First extract audio info and create transcription:
Input MP3 > #198 (Get Transcription Of MP3 With Timings) > Transcription with timing data

2. Generate talking head base video:
(Avatar PNG + MP3 + Transcription) > #168 (Generate Talking Head Video From MP3 & transcription) > Base MP4

3. Verify the video:
Base MP4 > #202 (Extract Thumbnail Images) to spot-check lip-sync and movement quality > Final MP4

F) SILOS:

SILO 1: AUDIO PREPARATION
- Input: MP3 URL
- Skill: #198 Get Transcription Of MP3 (With Timings)
- Output: Transcription with precise timing data

SILO 2: VIDEO GENERATION
- Input: Avatar PNG + MP3 + Transcription
- Skill: #168 Generate Talking Head Video
- Output: Base MP4 video

SILO 3: QUALITY VERIFICATION
- Input: Base MP4
- Skill: #202 Extract Thumbnail Images
- Purpose: Verify quality of lip-sync and movement
- Output: Final MP4 video

Note: This workflow relies on existing skills, chiefly #168, which appears to have built-in lip-sync capabilities. The timing data from #198 helps keep mouth movements aligned with the audio, and the final verification step checks the quality of the output.
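
Two checks that the verification step implies can be sketched in a few lines: confirming that the video length matches the audio length, and choosing evenly spaced timestamps for thumbnail spot-checks. Both helpers and the 0.25-second tolerance are assumptions for illustration. How #202 actually selects frames is not described here.

```python
def durations_match(video_s: float, audio_s: float, tolerance_s: float = 0.25) -> bool:
    """True when the video length is within tolerance of the audio length."""
    return abs(video_s - audio_s) <= tolerance_s


def sample_timestamps(duration_s: float, count: int) -> list[float]:
    """Evenly spaced timestamps (midpoints of equal segments) for thumbnail checks."""
    if count <= 0 or duration_s <= 0:
        return []
    step = duration_s / count
    return [round(step * (i + 0.5), 3) for i in range(count)]


print(durations_match(60.1, 60.0))
print(sample_timestamps(60.0, 4))
```

Sampling segment midpoints avoids the first and last frames, which are often silent or static and say little about lip-sync quality.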