Step 1: Topical Avatar Video Creator
1️⃣ Perfect the output: scan ALL
2️⃣ Add output numbers, then...
3️⃣ Add subagent numbers (work backwards from the output number!)
4️⃣ Add ACTUAL skills to each subagent
✅ DONE. Copy x4 to Step 3...
What Shall We Build Next?
1. Describe: Describe your task
2. Refine: Refine the plan
3. SubAgents: Review all agents
4. Deploy: Deploy your agent
Let me break down the ScriptMaster subagent flow according to the guidelines:

A) SUBAGENT SUMMARY:
ScriptMaster takes a user's topic/description and generates an optimized, well-structured voice-over script designed specifically for talking head avatar videos.

B) FINAL TASK OUTPUT:
A text file containing a 100-300 word script that is:
- Formatted for voice-over delivery (including pauses and emphasis)
- Structured with a clear introduction, body, and conclusion
- Written in a conversational tone suitable for avatar delivery
- Annotated with timing estimates for each section
- Supplied with pronunciation guides for complex terms

C) SUBAGENT INPUT:
1. User's topic/description
2. Optional style preferences (tone, formality level)
3. Target duration (if specified)
4. Any specific keywords or points to include

D) SUBAGENT TASK SUMMARY:
Input > #216 Research Topic Deeply > #223 Powerful LLM Prompt-to-Text Response > #171 Write Voice Over Script Based On Instructions > Output

Specifically:
1. #216 gathers comprehensive topic research.
2. #223 structures the research into a preliminary script outline.
3. #171 converts the outline into the final voice-over script.

E) SILOS:
SILO 1: RESEARCH PHASE
- Input: User's topic/description
- Skill: #216 Research Topic Deeply
- Output: Research summary (1000-3000 characters)
SILO 2: SCRIPT STRUCTURING
- Input: Research summary from Silo 1
- Skill: #223 Powerful LLM Prompt-to-Text Response
- Output: Structured script outline with sections marked
SILO 3: FINAL SCRIPT GENERATION
- Input: Structured outline from Silo 2
- Skill: #171 Write Voice Over Script Based On Instructions
- Output: Final voice-over script file

The silos are designed to ensure that:
1. Comprehensive research forms the foundation.
2. A proper structure is established.
3. The final script is optimized for voice-over delivery.

Each silo's output feeds directly into the next silo's input, so the workflow builds step by step toward the final optimized script.
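ScriptMaster's output spec calls for a 100-300 word script with per-section timing estimates. A minimal sketch of how those estimates could be computed, assuming a narration pace of about 150 words per minute (an assumed figure; tune it to the chosen voice) and a script already split into named sections. The bracketed-cue convention such as `[pause]` and all function names are hypothetical, not platform skills:

```python
WORDS_PER_MINUTE = 150  # assumed conversational narration pace; tune per voice


def word_count(text: str) -> int:
    """Count spoken words, skipping bracketed delivery cues like [pause] (hypothetical markup)."""
    return sum(1 for w in text.split() if not (w.startswith("[") and w.endswith("]")))


def section_timings(sections: dict[str, str], wpm: int = WORDS_PER_MINUTE) -> dict[str, float]:
    """Estimate narration seconds for each named section from its word count."""
    return {name: round(word_count(body) * 60 / wpm, 1) for name, body in sections.items()}


def within_target(sections: dict[str, str], lo: int = 100, hi: int = 300) -> bool:
    """True if the whole script falls inside the 100-300 word target."""
    total = sum(word_count(body) for body in sections.values())
    return lo <= total <= hi


print(section_timings({"intro": "word " * 30, "body": "word " * 90, "conclusion": "word " * 30}))
# {'intro': 12.0, 'body': 36.0, 'conclusion': 12.0}
```

Running the length check before handing the script to VoiceForge keeps an over-long script from ever reaching the voice-generation step.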
Let me break down the VoiceForge subagent according to the specified format:

A) SUBAGENT SUMMARY:
VoiceForge converts a text script into a high-quality voice-over MP3 file with appropriate pacing and natural speech patterns, suitable for a talking head video.

B) FINAL TASK OUTPUT:
A single MP3 file URL containing the voice-over narration, with clear audio suitable for lip-syncing (44.1 kHz sample rate, 16-bit depth, mono channel). For a 100-300 word script, expect roughly 40 seconds to 2.5 minutes of audio at a typical narration pace of 120-150 words per minute.

C) SUBAGENT INPUT:
- Primary input: Text script (100-300 words) from ScriptMaster
- Optional input: Voice style preferences (if any were specified in the original user prompt)

D) SUBAGENT TASK SUMMARY:
The flow works like this:
1. Input text script > #170 (Turn Script Into Voice Over MP3) > first-pass MP3
2. First-pass MP3 > #198 (Get Transcription Of MP3 With Timings) > timing verification
3. Timing data > #223 (Powerful LLM Prompt-to-Text Response) > analysis of pacing/quality
4. If the timing analysis shows issues > return to step 1 with an adjusted script
5. Final output: verified MP3 URL

E) SILOS:
Silo 1: VOICE GENERATION
- Input: Raw script text
- Skill: #170 Turn Script Into Voice Over MP3
- Output: Initial MP3 URL
Silo 2: QUALITY VERIFICATION
- Input: Initial MP3 URL
- Skill: #198 Get Transcription Of MP3 (With Timings)
- Skill: #223 Powerful LLM Prompt-to-Text Response
- Output: Quality analysis with timing verification
Silo 3: FINAL DELIVERY
- Input: Verified MP3 or request for regeneration
- Action: Either approve the MP3 as the final output or loop back to Silo 1
- Output: Final approved MP3 URL

This workflow doesn't just generate a voice-over; it also verifies that its quality and timing are suitable for lip-sync, which is crucial for the final talking head video generation.
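The generate > transcribe > analyze > regenerate loop in steps 1-5 can be sketched as a bounded retry. The three callables stand in for Skills #170, #198, and #223; their signatures are hypothetical, and the attempt cap is an assumption added so a script that never passes review cannot loop forever:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    ok: bool             # True if pacing/timing is acceptable for lip-sync
    revised_script: str  # adjusted script to retry with when ok is False


def forge_voice(
    script: str,
    generate_mp3: Callable[[str], str],      # stand-in for #170: script -> MP3 URL
    transcribe: Callable[[str], list],       # stand-in for #198: MP3 URL -> timed words
    review: Callable[[str, list], Review],   # stand-in for #223: pacing/quality analysis
    max_attempts: int = 3,                   # assumed cap; the flow above does not specify one
) -> str:
    """Return the first MP3 URL whose timing review passes (Silos 1-3)."""
    for _ in range(max_attempts):
        mp3_url = generate_mp3(script)       # Silo 1: voice generation
        timings = transcribe(mp3_url)        # Silo 2: transcription with timings
        verdict = review(script, timings)    # Silo 2: LLM pacing check
        if verdict.ok:
            return mp3_url                   # Silo 3: approve as final output
        script = verdict.revised_script      # Silo 3: loop back with adjusted script
    raise RuntimeError(f"no acceptable voice-over after {max_attempts} attempts")
```

Passing the skills in as callables lets the loop be exercised with stubs before it is wired to the real platform skills.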
Let me break down the AvatarVision subagent in detail:

A) SUBAGENT SUMMARY:
AvatarVision generates a high-quality, themed AI avatar image that matches the video's topic and style, with the avatar front-facing and its facial features clear enough for lip-sync animation.

B) FINAL TASK OUTPUT:
A single 1024x1024 PNG file of a front-facing avatar with clear facial features (particularly a well-defined mouth area), saved with a transparent background to allow for animation flexibility.

C) SUBAGENT INPUT:
1. User's topic/theme description
2. Style preferences for the avatar (professional, casual, specific profession, etc.)
3. Any specific facial feature requirements

D) SUBAGENT TASK SUMMARY:
Input > #223 (Generate optimal avatar prompt) > #182 (Create initial avatar) > #176 (Analyze avatar for facial features) > #221 (Recreate if needed) > #191 (Resize if needed) > Final PNG output

Detailed flow:
1. Use #223 (Powerful LLM) to convert user requirements into an optimized image generation prompt that emphasizes:
   - Front-facing position
   - Clear facial features
   - Well-defined mouth area
   - Theme-appropriate styling
2. Use #182 (Create Dalle Image) to generate the initial transparent avatar.
3. Use #176 (Analyze Image) to verify that:
   - The face is properly front-facing
   - The mouth area is well-defined
   - Features are clear enough for animation
4. If the analysis shows issues: use #221 (Recreate Image) with a refined prompt based on the analysis.
5. If a size adjustment is needed: use #191 (Resize Image) to ensure 1024x1024 dimensions.

E) SILOS:
SILO 1: PROMPT ENGINEERING
- Input: User requirements
- Skill: #223 Powerful LLM
- Output: Optimized avatar generation prompt
SILO 2: IMAGE GENERATION
- Input: Optimized prompt
- Skill: #182 Create Dalle Image
- Output: Initial avatar PNG
SILO 3: QUALITY CONTROL
- Input: Initial avatar PNG
- Skill: #176 Analyze Image
- Output: Analysis report
- Conditional: If issues are found, trigger SILO 4
SILO 4: REFINEMENT (if needed)
- Input: Analysis report + original prompt
- Skill: #221 Recreate Image
- Output: Refined avatar PNG
SILO 5: FORMATTING (if needed)
- Input: Final/refined avatar PNG
- Skill: #191 Resize Image
- Output: Properly sized avatar PNG

Each silo is designed to ensure the final avatar meets the specific requirements of a talking head video, with particular emphasis on facial features that work well with lip-sync animation.
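Silo 3 checks facial features with #176, but the hard requirements in the output spec (1024x1024, transparent background) can be checked deterministically before spending a vision call. A stdlib-only sketch that reads the PNG header: width and height come from the IHDR chunk, and transparency support is inferred from an alpha color type (4 or 6) or a tRNS chunk, as the PNG specification defines them. The function names are hypothetical, and a header that permits alpha does not prove the background pixels are actually transparent:

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_info(data: bytes) -> dict:
    """Return width, height, and whether the PNG can carry transparency."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, width, height, color_type, has_trns = len(PNG_SIGNATURE), None, None, None, False
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"IHDR":
            width, height, _bit_depth, color_type = struct.unpack(">IIBB", body[:10])
        elif ctype == b"tRNS":
            has_trns = True   # palette/greyscale/RGB transparency chunk
        elif ctype == b"IEND":
            break
        pos += 12 + length    # 4-byte length + 4-byte type + body + 4-byte CRC
    if width is None:
        raise ValueError("missing IHDR chunk")
    return {"width": width, "height": height, "has_alpha": color_type in (4, 6) or has_trns}


def meets_avatar_spec(data: bytes, size: int = 1024) -> bool:
    """True if the PNG is size x size and declares an alpha channel or tRNS chunk."""
    info = png_info(data)
    return info["width"] == size and info["height"] == size and info["has_alpha"]


def minimal_png(width: int, height: int, color_type: int) -> bytes:
    """Test helper: header-only PNG (IHDR + IEND, no pixel data, not displayable)."""
    def chunk(ctype: bytes, body: bytes) -> bytes:
        crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
        return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)
    ihdr = struct.pack(">IIBBBBB", width, height, 8, color_type, 0, 0, 0)
    return PNG_SIGNATURE + chunk(b"IHDR", ihdr) + chunk(b"IEND", b"")


print(meets_avatar_spec(minimal_png(1024, 1024, 6)))  # True (RGBA)
print(meets_avatar_spec(minimal_png(1024, 1024, 2)))  # False (RGB, no alpha)
```

A failed check could route straight to SILO 4 or SILO 5 without waiting for the vision analysis.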
Let me break down the LipSyncWizard subagent specifically:

A) SUBAGENT SUMMARY:
LipSyncWizard processes an audio file to generate precise phoneme timing data and the corresponding viseme mappings, producing a structured animation data file that determines when and how the avatar's mouth and facial features should move in sync with the speech.

B) FINAL TASK OUTPUT:
A JSON data file containing timestamped phoneme-to-viseme mappings, including:
- Precise timing markers (in milliseconds)
- Corresponding mouth shapes (visemes)
- Optional head movement markers
- Basic facial expression triggers

C) SUBAGENT INPUT:
- MP3 voice-over file URL
- Transcription with word-level timing
- Avatar image reference (for understanding mouth position boundaries)

D) SUBAGENT TASK SUMMARY:
1. Extract detailed audio analysis:
   - Input MP3 > #198 Get Transcription Of MP3 (With Timings) > detailed transcript
   - Input MP3 > #178 Convert 1-20 MP3s to wav > WAV file
   - WAV file > #179 Create Visual Waveform Of 60 second Wav/mp3 File > waveform data
2. Generate phoneme timing data:
   - Input WAV > #219 Cut Wav/mp3 Audio into Multiple Pieces/Samples > individual phoneme segments
   - Waveform + segments > #176 Analyze An Image With GPT Vision & Return Text > amplitude analysis
3. Create animation mapping:
   - Transcription + timing data > #223 Powerful LLM Prompt-to-Text Response > initial viseme mapping
   - Avatar image > #176 Analyze An Image With GPT Vision & Return Text > mouth position constraints
   - Combined data > #185 Write Text (Or Copy) From Inputted Text > final JSON animation data

E) SILOS:
SILO 1: AUDIO ANALYSIS
- Input: MP3 file
- Skills: #198, #178, #179
- Output: WAV file, transcription, waveform
SILO 2: PHONEME EXTRACTION
- Input: WAV file, waveform
- Skills: #219, #176
- Output: Phoneme timing data
SILO 3: ANIMATION MAPPING
- Input: Timing data, avatar image
- Skills: #223, #176, #185
- Output: Final JSON animation data

This workflow creates a precise phoneme-to-viseme mapping that the video generation component can use to produce naturally synchronized lip movements.
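Silo 3's mapping step turns phoneme timings into viseme cues for the JSON output. A minimal sketch, assuming phoneme-level timings are already available as (ARPAbet symbol, start_ms, end_ms) tuples; #198 is described as returning word-level timing, so an alignment step would still be needed to get there. The viseme names, the phoneme grouping, and the `version` key are illustrative assumptions; real viseme sets vary by animation engine:

```python
import json

# Illustrative, simplified phoneme-to-viseme table (ARPAbet symbols, stress digits removed).
VISEME_TABLE = {
    "P": "closed", "B": "closed", "M": "closed",
    "F": "lip_teeth", "V": "lip_teeth",
    "AA": "open", "AE": "open", "AH": "open",
    "IY": "wide", "EH": "wide", "EY": "wide",
    "UW": "round", "OW": "round", "W": "round",
}


def to_visemes(phonemes: list[tuple[str, int, int]]) -> list[dict]:
    """Map (symbol, start_ms, end_ms) tuples to viseme cues, merging adjacent duplicates."""
    cues: list[dict] = []
    for symbol, start, end in phonemes:
        viseme = VISEME_TABLE.get(symbol.rstrip("012"), "rest")  # "AA1" -> "AA"
        if cues and cues[-1]["viseme"] == viseme and cues[-1]["end_ms"] == start:
            cues[-1]["end_ms"] = end  # extend the previous cue instead of repeating it
        else:
            cues.append({"start_ms": start, "end_ms": end, "viseme": viseme})
    return cues


def animation_json(phonemes: list[tuple[str, int, int]]) -> str:
    """Serialize the cues as the animation data file (hypothetical schema)."""
    return json.dumps({"version": 1, "cues": to_visemes(phonemes)}, indent=2)


print(animation_json([("M", 0, 80), ("AA1", 80, 200), ("AH0", 200, 260), ("P", 260, 300)]))
```

Merging consecutive identical visemes keeps the cue list short, which is usually easier for a renderer to interpolate than one cue per phoneme.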
Let me break down Subagent 5 (VideoAssemblerPro) according to the guidelines:

A) SUBAGENT SUMMARY:
VideoAssemblerPro combines an AI avatar image, voice-over audio, and lip-sync data to create a synchronized talking head video in which the avatar's mouth and facial movements match the audio.

B) FINAL TASK OUTPUT:
An MP4 video file (1920x1080) whose duration matches the input audio, featuring the AI avatar with synchronized lip movements and natural head movements, and clear audio at 48 kHz.

C) SUBAGENT INPUT:
- PNG file URL of the AI-generated avatar image
- MP3 file URL of the voice-over audio
- JSON/text file containing lip-sync timing data
- Original script text (for backup reference)

D) SUBAGENT TASK SUMMARY:
This subagent runs parallel processes that then converge:
1. Audio processing silo: Input MP3 > #196 (Extract audio) > #198 (Get transcription with timings) > timing data
2. Avatar processing silo: Input avatar PNG > #191 (Resize image to 1920x1080) > processed avatar
3. Final assembly: Processed avatar + MP3 + timing data > #168 (Generate Talking Head Video) > final MP4

E) SILOS:
SILO 1: AUDIO PROCESSING
- Purpose: Prepare audio and extract precise timing data
- Input: MP3 URL
- Steps:
  1. #196 - Extract MP3 Audio From MP4 File (to ensure clean audio)
  2. #198 - Get Transcription Of MP3 (With Timings)
- Output: Clean MP3 + timing data
SILO 2: AVATAR PREPARATION
- Purpose: Ensure the avatar is properly sized/formatted
- Input: PNG URL
- Steps:
  1. #191 - Resize Image (to 1920x1080)
- Output: Properly formatted avatar PNG
SILO 3: FINAL ASSEMBLY
- Purpose: Combine all elements into the final video
- Input: Processed avatar PNG + clean MP3 + timing data
- Steps:
  1. #168 - Generate Talking Head Video From MP3 & transcription
- Output: Final MP4 video file

The key insight here is that, although the original subagent description suggested new skills were needed for lip-sync, the existing Skill #168 already has this capability built in when it is given proper timing data from Skill #198. This makes the workflow much more efficient than originally conceived.
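Silo 2 resizes the square 1024x1024 avatar to 1920x1080. Scaling the two axes independently would stretch the face horizontally, which could hurt lip-sync quality. One option, assuming #191 (or a follow-up step) can place the image on a padded canvas, is to scale uniformly and letterbox. The helper below is hypothetical and only computes the geometry:

```python
def fit_within(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple[int, int, int, int]:
    """Uniformly scale (src_w, src_h) to fit inside (dst_w, dst_h).

    Returns (new_w, new_h, pad_x, pad_y): the scaled size plus the left/top
    padding that centers it on the destination canvas.
    """
    scale = min(dst_w / src_w, dst_h / src_h)  # largest scale that fits both axes
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = (dst_w - new_w) // 2, (dst_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y


print(fit_within(1024, 1024, 1920, 1080))  # (1080, 1080, 420, 0)
```

For the 1024x1024 avatar this gives a 1080x1080 image centered with 420 px of padding on each side; filling that padding with a solid color or a blurred copy of the avatar is a design choice.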
I notice that you've asked me to analyze "subagent 6", but the original workflow lists only five subagents (ScriptMaster, VoiceForge, AvatarVision, LipSyncWizard, and VideoAssemblerPro); no subagent 6 is defined. Would you like me to:
1. Analyze one of the existing five subagents in detail,
2. Propose a new subagent 6 that might enhance the workflow, or
3. Analyze a different subagent that you specify?
Please let me know which option you prefer, and I'll provide a detailed analysis using the format you've outlined.