Step 1: Topical Avatar Video Creator
1️⃣ Perfect output - scan ALL
2️⃣ Add output numbers, then...
3️⃣ Add Subagent Numbers (work backwards from output number!)
4️⃣ Add ACTUAL Skills to subagent
✅ DONE. Copy x4 to Step 3...
Let me break down the ScriptMaster subagent in detail:

A) SUBAGENT SUMMARY:
ScriptMaster takes a user's topic description and generates an optimized, well-structured voice-over script specifically designed for talking head videos, incorporating proper pacing, natural speech patterns, and clear narrative flow.

B) FINAL TASK OUTPUT:
A text file containing a 100-300 word voice-over script, formatted with proper punctuation and timing markers, optimized for natural speech delivery and lip-sync animation.

C) SUBAGENT INPUT:
- Primary user topic/description
- Optional style preferences (tone, length, target audience)
- Optional technical requirements (exact word count, specific terminology)

E) SUBAGENT TASK SUMMARY:
The workflow follows this sequence:
1. Research Phase: #216 (Research Topic Deeply) > outputs comprehensive research data
↓
2. Keyword Enhancement: #218 (Brainstorm Related Keywords) > outputs relevant terminology/concepts
↓
3. Initial Script Generation: #171 (Write Voice Over Script Based On Instructions) > outputs draft script
↓
4. Script Refinement: #190 (Write or rewrite text based on instructions) > outputs polished script

This creates a research-backed, keyword-enhanced, properly formatted script that's optimized for voice-over delivery.

F) SILOS:
The subagent operates in three distinct silos:

SILO 1: RESEARCH & PREPARATION
• Input: User topic description
• Skill: #216 (Research Topic Deeply)
• Skill: #218 (Brainstorm Related Keywords)
• Output: Research data + enhanced keywords

SILO 2: INITIAL SCRIPT CREATION
• Input: Research data + keywords
• Skill: #171 (Write Voice Over Script Based On Instructions)
• Output: Draft script

SILO 3: SCRIPT REFINEMENT
• Input: Draft script
• Skill: #190 (Write or rewrite text based on instructions)
• Output: Final polished script

Each silo's output feeds directly into the next silo's input, creating a smooth workflow that ensures the final script is well-researched, properly structured, and optimized for voice-over delivery. The final output [avatar-voice-script] will be ready to feed directly into the next subagent (VoiceForge) in the main workflow.
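The silo chaining described for ScriptMaster can be sketched roughly as follows. This is only an illustration: the platform's real skill-invocation mechanism is not shown in the source, so `make_stub_skill`, `run_silos`, and the stub behaviour are hypothetical stand-ins. Only the skill numbers and silo names come from the breakdown.

```python
from typing import Callable, Dict, List, Tuple

# A skill is modelled as a function from text to text (an assumption;
# the real platform skills are not specified in the source).
Skill = Callable[[str], str]


def make_stub_skill(skill_id: int) -> Skill:
    """Return a hypothetical placeholder that just records it was called."""
    def run(payload: str) -> str:
        return f"{payload} > #{skill_id}"
    return run


# Silo names and skill numbers as listed in the ScriptMaster breakdown.
SCRIPTMASTER_SILOS: List[Tuple[str, List[int]]] = [
    ("RESEARCH & PREPARATION", [216, 218]),
    ("INITIAL SCRIPT CREATION", [171]),
    ("SCRIPT REFINEMENT", [190]),
]


def run_silos(user_input: str,
              silos: List[Tuple[str, List[int]]],
              skills: Dict[int, Skill]) -> str:
    """Feed each silo's output into the next silo's input, in order."""
    data = user_input
    for _name, skill_ids in silos:
        for skill_id in skill_ids:
            data = skills[skill_id](data)
    return data


skills = {sid: make_stub_skill(sid)
          for _name, ids in SCRIPTMASTER_SILOS for sid in ids}
trace = run_silos("topic", SCRIPTMASTER_SILOS, skills)
print(trace)
```

With the stubs above, the printed trace simply lists the skills in the order they ran, which makes it easy to confirm the chain matches the intended sequence before real skills are wired in.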
SubAgent #1 - Diagram
Let me break down the VoiceForge subagent in detail:

A) SUBAGENT SUMMARY:
VoiceForge converts a text script into a high-quality voice-over MP3 file, optimizing the audio for use in a talking head video.

B) FINAL TASK OUTPUT:
A single MP3 file URL containing the voice-over audio, typically 1-5 minutes in length, with clear voice articulation suitable for lip-syncing.

C) SUBAGENT INPUT:
- Primary Input: Text script (100-300 words)
- Optional Input: Voice style preferences (if any)

E) SUBAGENT TASK SUMMARY:
The workflow should follow these steps:
1. Script Length Check:
   - Using #190 (Write or rewrite text based on instructions) to verify and potentially adjust script length if needed.
2. Voice Generation:
   - Using #170 (Turn Script Into Voice Over MP3) to create the initial voice-over.
3. Audio Quality Check:
   - Using #179 (Create Visual Waveform Of 60 second Wav/mp3 File) to analyze the audio quality.
   - Using #176 (Analyze An Image With GPT Vision & Return Text) to review the waveform and ensure proper audio levels.
4. Audio Processing:
   - Using #178 (Convert 1-20 MP3s to wav) to convert for analysis
   - Using #219 (Cut Wav/mp3 Audio into Multiple Pieces/Samples) if any sections need adjustment
   - Using #170 again if any sections need to be re-recorded

Full chain: Input Script > #190 > #170 > #179 > #176 > #178 > #219 (if needed) > #170 (if needed) > Final MP3 Output

F) SILOS:

SILO 1: SCRIPT PREPARATION
- Input: Raw script
- Skill: #190
- Output: Verified/adjusted script

SILO 2: VOICE GENERATION
- Input: Verified script
- Skill: #170
- Output: Initial MP3

SILO 3: QUALITY CONTROL
- Input: Initial MP3
- Skills: #179 > #176 > #178
- Output: Quality analysis and WAV format

SILO 4: REFINEMENT (if needed)
- Input: WAV file
- Skills: #219 > #170
- Output: Final MP3

The subagent includes quality control measures and refinement options, ensuring the final audio output is optimal for lip-syncing in the talking head video.
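The script length check in step 1 could look something like the sketch below. The 100-300 word range comes from the breakdown; the function name `check_script_length` and the speaking rate of 150 words per minute are assumptions made for illustration, and the actual voice skill may speak faster or slower.

```python
# Assumed average speaking rate; NOT taken from the source.
WORDS_PER_MINUTE = 150


def check_script_length(script: str,
                        min_words: int = 100,
                        max_words: int = 300) -> dict:
    """Count words, check the 100-300 word range, and estimate duration."""
    word_count = len(script.split())
    return {
        "word_count": word_count,
        "in_range": min_words <= word_count <= max_words,
        # Rough estimate only, based on the assumed speaking rate.
        "estimated_seconds": word_count / WORDS_PER_MINUTE * 60,
    }


report = check_script_length("word " * 150)
print(report)
```

At the assumed rate, a 100-word script comes to about 40 seconds and a 300-word script to about two minutes, so the estimate is a quick way to check a script against the target audio length before calling the voice skill.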
SubAgent #2 - Diagram
Here's my complete analysis and workflow for SUBAGENT 3: "AvatarVision":

A) SUBAGENT SUMMARY:
A specialized image generation system that creates a high-quality, themed AI avatar image appropriate for talking head videos, ensuring the avatar matches the video's topic and maintains professional presentation standards.

B) FINAL TASK OUTPUT:
A single 1024x1024 transparent PNG file of a professional-looking avatar head/shoulders portrait with clean edges, suitable for animation, featuring clear facial features and appropriate styling/theming for the video topic.

C) SUBAGENT INPUT:
1. Topic/theme of the video (from main agent)
2. Style preferences for avatar (if specified by user)
3. Professional context/setting requirements
4. Specific features or characteristics requested

E) SUBAGENT TASK SUMMARY:
The workflow chains together in this sequence:
Input > #223 (Powerful LLM Prompt-to-Text) to generate optimal image generation prompt > #222 (Make Image With Text) to create initial avatar > #176 (Analyze Image With GPT Vision) to verify quality > #221 (Recreate New Image) if needed > #191 (Resize Image) to ensure 1024x1024 > Final Output

F) SILOS:

SILO 1: PROMPT ENGINEERING
- Skill #223 (Powerful LLM)
  Purpose: Generate highly specific image prompt that will ensure:
  - Professional head/shoulders framing
  - Clear facial features
  - Appropriate themed elements
  - Suitable for animation
  Output: Optimized image generation prompt

SILO 2: IMAGE GENERATION & VERIFICATION
- Skill #222 (Make Image With Text)
  Purpose: Generate initial avatar image
  Output: PNG of avatar
- Skill #176 (Analyze Image With GPT Vision)
  Purpose: Verify image meets all requirements:
  - Clear facial features
  - Appropriate theming
  - Professional appearance
  - Suitable for animation
  Output: Analysis text

SILO 3: REFINEMENT (If Needed)
- Skill #221 (Recreate New Image)
  Purpose: Improve image based on analysis
  Output: Refined PNG

SILO 4: FINAL FORMATTING
- Skill #191 (Resize Image)
  Purpose: Ensure exact 1024x1024 dimensions
  Output: Final formatted PNG

This workflow ensures quality control through verification and potential refinement, while maintaining strict technical specifications needed for the final talking head video generation.
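The technical part of the output spec (1024x1024, transparent PNG) can be checked without any image library by reading the PNG header. The sketch below is an assumption about how such a check might be done, not something the listed skills are known to perform; `read_png_header` and `meets_avatar_spec` are hypothetical helper names. It treats only color types with an alpha channel as "transparent" and does not cover palette images that use a transparency chunk.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data: bytes) -> dict:
    """Parse width, height, bit depth and color type from the IHDR chunk."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {"width": width, "height": height,
            "bit_depth": bit_depth, "color_type": color_type}


def meets_avatar_spec(data: bytes, size: int = 1024) -> bool:
    """True if the PNG is size x size and has an alpha channel.

    Color type 4 is grayscale with alpha, 6 is RGB with alpha.
    """
    header = read_png_header(data)
    return (header["width"] == size
            and header["height"] == size
            and header["color_type"] in (4, 6))


def fake_png_header(width: int, height: int, color_type: int) -> bytes:
    """Build just the signature and IHDR fields, for demonstration only."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, color_type, 0, 0, 0)
    return PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + ihdr


print(meets_avatar_spec(fake_png_header(1024, 1024, 6)))
```

A check like this only covers dimensions and alpha support; whether the face is clear or the theming fits still needs the visual review described in SILO 2.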
SubAgent #3 - Diagram
Let me break down the LipSyncWizard subagent following your guidelines:

A) SUBAGENT SUMMARY:
LipSyncWizard processes an audio file to generate precise phoneme timing data and maps it to corresponding viseme (mouth shape) animations, creating a synchronized data structure for avatar lip movements.

B) FINAL TASK OUTPUT:
A JSON data structure containing:
- Timestamp markers (in milliseconds)
- Corresponding phoneme identifiers
- Mapped viseme positions (mouth shapes)
- Basic head movement instructions
- Duration markers for each phoneme/viseme pair

C) SUBAGENT INPUT:
- MP3 voice-over file URL
- Transcription with word timing data
- List of supported visemes for the avatar system

D) SUBAGENT TASK SUMMARY:
1. Convert MP3 to WAV for precise audio analysis (#178 - Convert MP3 to WAV)
2. Generate detailed transcription with timing (#198 - Get Transcription of MP3 With Timings)
3. Create visual waveform for amplitude analysis (#179 - Create Visual Waveform)
4. Analyze waveform image with GPT Vision (#176 - Analyze Image With GPT Vision)
5. Extract key audio characteristics (#180 - Extract Beatpoints & Tempo)
6. Use LLM to process all data (#223 - Powerful LLM Prompt-to-Text Response)

E) SILOS:

SILO 1: AUDIO PREPARATION
- Input: MP3 URL
- Skill #178: Convert MP3 to WAV
- Output: WAV file URL

SILO 2: SPEECH ANALYSIS
- Input: WAV file URL
- Skill #198: Generate detailed transcription
- Skill #179: Create waveform
- Output: Transcription with timings + Waveform image

SILO 3: WAVEFORM ANALYSIS
- Input: Waveform image
- Skill #176: Analyze waveform with GPT Vision
- Skill #180: Extract beatpoints
- Output: Amplitude data + rhythm markers

SILO 4: DATA SYNTHESIS
- Input: All previous outputs
- Skill #223: Use LLM to synthesize data into final JSON
- Output: Final lip-sync data structure

This workflow creates a comprehensive lip-sync data package by analyzing both the audio characteristics and speech patterns, then combining them into a structured format that can be used by the video generation system. The final output JSON will contain precise timing data that maps each phoneme to its corresponding viseme, allowing for natural-looking lip movements when animated.
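To make the final JSON output more concrete, here is one possible shape for it. The field names, the phoneme labels, and the `PHONEME_TO_VISEME` table are all hypothetical: the source lists what the structure should contain but not its exact schema, and the real viseme set depends on the avatar system. Head movement instructions are left out of this sketch.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical phoneme-to-viseme table; a real avatar system defines its own.
PHONEME_TO_VISEME: Dict[str, str] = {
    "HH": "open",
    "AH": "open",
    "L": "tongue_up",
    "OW": "round",
    "sil": "rest",
}


def build_lipsync_frames(
        phonemes: List[Tuple[str, int, int]]) -> List[dict]:
    """Map (phoneme, start_ms, end_ms) tuples to timed viseme entries."""
    frames = []
    for phoneme, start_ms, end_ms in phonemes:
        frames.append({
            "timestamp_ms": start_ms,
            "phoneme": phoneme,
            # Unknown phonemes fall back to a neutral mouth shape.
            "viseme": PHONEME_TO_VISEME.get(phoneme, "rest"),
            "duration_ms": end_ms - start_ms,
        })
    return frames


# Made-up timings for the word "hello", for illustration only.
frames = build_lipsync_frames([
    ("HH", 0, 80),
    ("AH", 80, 200),
    ("L", 200, 290),
    ("OW", 290, 480),
])
print(json.dumps(frames, indent=2))
```

Keeping one entry per phoneme with an explicit start time and duration means a downstream animator can look up the active mouth shape for any timestamp without recomputing offsets.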
Here's my complete analysis and workflow for Subagent 5 (VideoAssemblerPro):

A) SUBAGENT SUMMARY:
VideoAssemblerPro combines an AI-generated avatar image, voice-over audio, and lip-sync timing data to create a synchronized talking head video where the avatar's mouth and facial movements match the audio speech patterns.

B) FINAL TASK OUTPUT:
MP4 video file (16:9 aspect ratio) featuring the AI avatar speaking with synchronized lip movements and natural head movements, with clear audio and smooth animation, duration matching the input audio file length.

C) SUBAGENT INPUT:
- PNG URL of the AI-generated avatar image
- MP3 URL of the voice-over audio
- Transcription with precise timing data (for lip-sync coordination)

E) SUBAGENT TASK SUMMARY:
1. First extract audio info and create transcription:
   Input MP3 > #198 (Get Transcription Of MP3 With Timings) > Transcription with timing data
2. Generate talking head base video:
   (Avatar PNG + MP3 + Transcription) > #168 (Generate Talking Head Video From MP3 & transcription) > Base MP4
3. Analyze and enhance the video:
   Base MP4 > #202 (Extract Thumbnail Images) to verify quality > Final MP4

F) SILOS:

SILO 1: AUDIO PREPARATION
- Input: MP3 URL
- Skill: #198 Get Transcription Of MP3 (With Timings)
- Output: Transcription with precise timing data

SILO 2: VIDEO GENERATION
- Input: Avatar PNG + MP3 + Transcription
- Skill: #168 Generate Talking Head Video
- Output: Base MP4 video

SILO 3: QUALITY VERIFICATION
- Input: Base MP4
- Skill: #202 Extract Thumbnail Images
- Purpose: Verify quality of lip-sync and movement
- Output: Final MP4 video

Note: This workflow utilizes existing skills optimally, particularly leveraging #168 which appears to have built-in lip-sync capabilities. The transcription data from #198 helps ensure precise timing for mouth movements. The final verification step helps ensure quality of the output.
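Two of the output requirements, the 16:9 aspect ratio and the duration matching the audio, are simple to verify numerically once the video's dimensions and duration are known. The sketch below assumes those values have already been obtained by some other means; `is_16_9`, `durations_match`, and the 0.1 second tolerance are illustrative choices, not part of the described skills.

```python
from fractions import Fraction


def is_16_9(width: int, height: int) -> bool:
    """Exact aspect-ratio check using rational arithmetic."""
    return Fraction(width, height) == Fraction(16, 9)


def durations_match(video_seconds: float,
                    audio_seconds: float,
                    tolerance_seconds: float = 0.1) -> bool:
    """True if video and audio lengths differ by at most the tolerance."""
    return abs(video_seconds - audio_seconds) <= tolerance_seconds


print(is_16_9(1920, 1080), durations_match(62.03, 62.0))
```

Using `Fraction` avoids floating-point rounding when comparing ratios, so 1280x720 and 1920x1080 both pass while a square 1024x1024 frame does not.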
I notice that you've asked me to analyze "subagent 6" but in the original list of subagents, there were only 5 subagents defined (ScriptMaster, VoiceForge, AvatarVision, LipSyncWizard, and VideoAssemblerPro). There is no sixth subagent to analyze.

Would you like me to:
1. Analyze one of the existing five subagents (if so, which one?)
2. Help design a new sixth subagent for a specific purpose?
3. Review the entire workflow to see if we're missing any critical components that might require a sixth subagent?

Please let me know which direction you'd like to pursue and I'll provide a detailed analysis following the format you've outlined.