Menu
NEW AGENT
MY AGENTS
ASSISTANTS
Step 1:
Topical Avatar Video Creator
1️⃣
Perfect output
- scan ALL
2️⃣ Add
output numbers
, then...
3️⃣ Add
Subagent Numbers
(work backwards
from output number!
)
4️⃣ Add
ACTUAL Skills
to subagent
✅ DONE..Copy x4 to Step 3...
SETTINGS
LOGOUT
What Shall We Build Next?
1
Describe
Describe your task
2
Refine
Refine the plan
3
SubAgents
Review all agents
4
Deploy
Deploy your agent
Sub Agent 1
Sub Agent 2
Sub Agent 3
Sub Agent 4
Sub Agent 5
Sub Agent 6
Sub Agent 7
Sub Agent 8
Let me analyze the ScriptMaster subagent and break it down according to your guidelines. A) SUBAGENT SUMMARY: ScriptMaster generates a concise, well-structured voice-over script (100-300 words) from a user's topic description, optimized for talking head video presentation. B) FINAL TASK OUTPUT: A text file containing a properly formatted voice-over script, structured with clear paragraphs, natural pauses, and appropriate pacing for video presentation (approximately 1-3 minutes of speaking time when read aloud at natural pace). C) SUBAGENT INPUT: - Primary user topic/description - Optional style preferences (tone, length, format) - Optional target audience information E) SUBAGENT TASK SUMMARY: The workflow will use three key skills in sequence: 1. #216 (Research Topic Deeply) INPUT: User's topic description OUTPUT: 1000-3000 character research summary 2. #223 (Powerful LLM Prompt-to-Text Response) INPUT: Research summary + specific instructions for script formatting OUTPUT: Initial script draft with structure 3. #171 (Write Voice Over Script Based On Instructions) INPUT: Initial script draft + voice-over specific formatting requirements OUTPUT: Final voice-over script Flow: user_input > #216 > #223 > #171 > final_script_output F) SILOS: SILO 1 - RESEARCH PHASE - Skill: #216 Research Topic Deeply - Purpose: Gather comprehensive background information - Input: User topic - Output: Research summary SILO 2 - INITIAL SCRIPT STRUCTURING - Skill: #223 Powerful LLM Prompt-to-Text Response - Purpose: Convert research into initial script structure - Input: Research summary - Output: Initial script draft SILO 3 - VOICE-OVER OPTIMIZATION - Skill: #171 Write Voice Over Script Based On Instructions - Purpose: Optimize for voice-over presentation - Input: Initial script draft - Output: Final voice-over script This workflow ensures we first gather comprehensive information (#216), then structure it appropriately (#223), and finally optimize it specifically for voice-over presentation (#171). Each silo builds upon the previous one, creating a progressively refined script that's perfectly suited for talking head video presentation.
SubAgent #1 - Diagram
Expand Diagram
I'll break down the VoiceForge subagent according to the requested format: A) SUBAGENT SUMMARY: VoiceForge converts a text script into a high-quality voice-over MP3 file, optimizing the audio for use in a talking head video. B) FINAL TASK OUTPUT: A single MP3 file URL containing the voice-over audio, optimized for talking head synchronization, with clear pronunciation and natural pacing (typically 120-150 words per minute). C) SUBAGENT INPUT: - Primary Input: Text script (100-300 words) from ScriptMaster subagent - Optional Input: Voice style preferences (if provided in original user prompt) E) SUBAGENT TASK SUMMARY: The workflow follows this sequence: Input (text script) > #170 (Turn Script Into Voice Over MP3) > #198 (Get Transcription Of MP3 With Timings) [to verify quality] > #179 (Create Visual Waveform Of 60 second Wav/mp3 File) [to verify audio levels] > #176 (Analyze An Image With GPT Vision & Return Text) [to analyze waveform] > If waveform analysis shows issues: Repeat #170 with adjusted parameters Final Output: MP3 File URL F) SILOS: SILO 1: VOICE GENERATION - Input: Text script - Skill: #170 (Turn Script Into Voice Over MP3) - Output: Initial MP3 URL SILO 2: QUALITY VERIFICATION - Input: MP3 from Silo 1 - Skills: * #198 (Get Transcription Of MP3 With Timings) * #179 (Create Visual Waveform) * #176 (Analyze waveform image) - Output: Quality verification report SILO 3: QUALITY ADJUSTMENT (if needed) - Input: Quality report from Silo 2 - Skill: #170 (with adjusted parameters if initial quality check fails) - Output: Final MP3 URL This structured approach ensures we're not just generating voice-over audio, but also verifying its quality and making adjustments if needed, which is crucial for a talking head video where audio quality directly impacts lip-sync accuracy.
SubAgent #2 - Diagram
Expand Diagram
Let me break down the AvatarVision subagent following your guidelines: A) SUBAGENT SUMMARY: AvatarVision generates a high-quality, themed AI avatar image that matches the video topic and maintains consistent visual style suitable for talking head animation. B) FINAL TASK OUTPUT: A square (1024x1024) transparent PNG file of a professional-looking avatar head/shoulders shot with clean edges, suitable background removal, and appropriate styling/theming for the video topic. C) SUBAGENT INPUT: - User's topic/theme description - Style preferences for avatar (if any) - Professional context/setting requirements - Any specific visual elements needed for topic relevance E) SUBAGENT TASK SUMMARY: The flow should be: Input > #223 (Powerful LLM to craft detailed image prompt) > #222 (Make initial themed avatar) > #176 (Analyze image for quality check) > #221 (Recreate/refine if needed) > #191 (Resize if required) > Final Output Specifically: 1. Use #223 to convert user requirements into detailed image generation prompt 2. Use #222 to generate initial avatar (optimized for text/details) 3. Use #176 to analyze image quality/suitability 4. If needed, use #221 to recreate/refine based on analysis 5. Use #191 to ensure final 1024x1024 size 6. Return final PNG URL F) SILOS: SILO 1: PROMPT ENGINEERING - Input: Raw user requirements - Skill #223: Convert to detailed image prompt - Output: Optimized image generation prompt SILO 2: IMAGE GENERATION - Input: Optimized prompt - Skill #222: Generate initial avatar - Output: Initial PNG SILO 3: QUALITY CONTROL - Input: Initial PNG - Skill #176: Analyze image - Skill #221: Recreate if needed (based on analysis) - Output: Refined PNG SILO 4: FINAL FORMATTING - Input: Refined PNG - Skill #191: Resize to exact specifications - Output: Final 1024x1024 transparent PNG This approach ensures high-quality, consistent avatar generation with appropriate quality checks and refinement steps built in.
SubAgent #3 - Diagram
Expand Flow
Let me break down the LipSyncWizard subagent in detail: A) SUBAGENT SUMMARY: LipSyncWizard analyzes an audio file to generate precise phoneme timing data and corresponding viseme mappings for accurate lip synchronization of an AI avatar. B) FINAL TASK OUTPUT: A structured data file containing: - Precise timing markers (in milliseconds) - Corresponding phoneme identifications - Mapped viseme positions for the avatar's mouth/face - Optional head movement timing data C) SUBAGENT INPUT: - MP3 voice-over file URL - Transcription with timing data - Avatar reference image URL (to understand mouth/face structure) E) SUBAGENT TASK SUMMARY: The flow would be: 1. Convert MP3 to WAV for precise audio analysis (#178) 2. Generate detailed transcription with timing data (#198) 3. Create visual waveform for amplitude analysis (#179) 4. Extract beat points for natural head movement (#180) 5. Analyze waveform image with GPT Vision (#176) to identify key audio segments 6. Use LLM (#223) to convert audio analysis into structured viseme data Specific flow: Input MP3 > #178 Convert to WAV > #198 Get detailed transcription > #179 Generate waveform > #180 Extract beatpoints > #176 Analyze waveform > #223 Generate final structured data > Output JSON F) SILOS: Silo 1: Audio Processing - Convert MP3 to WAV (#178) - Generate transcription (#198) - Create waveform (#179) - Extract beat points (#180) Silo 2: Visual Analysis - Analyze waveform with Vision (#176) - Analyze avatar image reference (#176) Silo 3: Data Structuring - Process all data through LLM (#223) to create final structured output - Format timing data, phonemes, visemes, and movement cues into JSON Note: This subagent appears to need some additional custom skills for optimal performance, particularly in generating precise viseme mappings from phoneme data. The current workflow uses available skills to approximate this functionality, but a dedicated phoneme-to-viseme mapping skill would improve accuracy.
4 Template & Links
Expand Flow
Let me break down the VideoAssemblerPro subagent following the requested format: A) SUBAGENT SUMMARY: VideoAssemblerPro combines an AI-generated avatar image, voice-over audio, and lip-sync data to create a synchronized talking head video with natural mouth movements and facial expressions. B) FINAL TASK OUTPUT: MP4 video file (1920x1080 resolution, 16:9 aspect ratio) featuring the AI avatar with synchronized lip movements matching the voice-over audio, with duration matching the input audio file length. C) SUBAGENT INPUT: - PNG URL of the AI-generated avatar image - MP3 URL of the voice-over audio - Text transcription with timing data (for lip sync) E) SUBAGENT TASK SUMMARY: The workflow requires the following sequence: 1. First process the voice-over audio: Input MP3 > #198 (Get Transcription Of MP3 With Timings) > Transcription with precise timing data 2. Generate the base talking head: MP3 + Transcription > #168 (Generate Talking Head Video From MP3 & transcription) > Initial MP4 3. Enhance with visual elements: Avatar PNG + Initial MP4 > #199 (Add Images & Videos On Top Of Existing MP4) > Final MP4 F) SILOS: SILO 1: AUDIO PROCESSING - Purpose: Extract precise timing data for lip sync - Input: Voice-over MP3 - Skill: #198 - Output: Transcription with timings SILO 2: BASE VIDEO GENERATION - Purpose: Create initial talking head animation - Input: MP3 + Transcription - Skill: #168 - Output: Base MP4 video SILO 3: VISUAL ENHANCEMENT - Purpose: Overlay custom avatar and finalize - Input: Avatar PNG + Base MP4 - Skill: #199 - Output: Final MP4 video Note: This workflow utilizes existing skills to approximate lip-sync functionality through the combination of transcription timing data and the talking head generation capability. While not as sophisticated as a dedicated lip-sync engine, this approach should produce acceptable results for most use cases.
5 Template & Links
Expand Flow
I notice that you've asked me to analyze "subagent 6" but in the provided list of subagents, there were only 5 subagents (ScriptMaster, VoiceForge, AvatarVision, LipSyncWizard, and VideoAssemblerPro). There is no subagent 6 defined in the original workflow. Would you like me to: 1. Analyze one of the existing 5 subagents in detail? 2. Propose a new subagent 6 that might enhance the workflow? 3. Clarify which specific subagent you'd like me to focus on? Please let me know which option you prefer, and I'll provide a detailed analysis following the format you've outlined.
6 Template & Links
Expand Flow
Templates & Links Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
7 Template & Links
Expand Flow
Questions & Research Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
8 Template & Links
Expand Flow
Templates & Links Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
9 Template & Links
Expand Flow
Templates & Links Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
10 Template & Links
Expand Flow
Questions & Research Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
11 Template & Links
Expand Flow
Templates & Links Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
12 Template & Links
Expand Flow
Need To Start Afresh?
BACK TO REFINE
Tweaked & Good To Go?
PROCEED TO DEPLOY