Step 1: Topical Avatar Video Creator
1️⃣ Perfect output - scan ALL
2️⃣ Add output numbers, then...
3️⃣ Add Subagent Numbers (work backwards from output number!)
4️⃣ Add ACTUAL Skills to subagent
✅ DONE. Copy x4 to Step 3...
I'll analyze ScriptMaster and break it down according to your requirements:

A) SUBAGENT SUMMARY:
ScriptMaster generates an optimized, engaging voice-over script from a user's topic description, incorporating research and best practices for spoken-word content.

B) FINAL TASK OUTPUT:
A text file containing a 100-300 word voice-over script, formatted specifically for audio narration, with natural speech patterns, appropriate pacing breaks, and clear pronunciation guides where needed.

C) SUBAGENT INPUT:
- Primary user topic/description
- Optional style preferences (tone, length, target audience)
- Optional technical requirements (max duration, specific terminology)

E) SUBAGENT TASK SUMMARY:
The optimal flow would be:
user input > #216 Research Topic Deeply > #223 Powerful LLM Prompt-to-Text Response > #171 Write Voice Over Script Based On Instructions > final script output

This chain ensures:
1. Deep research gathers comprehensive topic information
2. LLM processes research into initial script structure
3. Specialized voice-over writing skill finalizes the script

F) SILOS:

SILO 1: RESEARCH & PREPARATION
- Input: User topic description
- Skill: #216 Research Topic Deeply
- Output: Comprehensive research summary

SILO 2: INITIAL SCRIPT STRUCTURING
- Input: Research summary
- Skill: #223 Powerful LLM Prompt-to-Text Response
- Output: Initial script draft with structure and key points

SILO 3: VOICE-OVER OPTIMIZATION
- Input: Initial script draft
- Skill: #171 Write Voice Over Script Based On Instructions
- Output: Final voice-over ready script

This arrangement ensures each stage builds upon the previous one, with clear handoffs between silos. The research phase ensures accuracy, the LLM structures the content appropriately, and the final voice-over optimization makes it suitable for spoken delivery. The final output from Silo 3 will be ready to pass to the next subagent in the larger workflow (VoiceForge).
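The three-silo handoff described for ScriptMaster can be sketched as a simple sequential chain, where each silo's output becomes the next silo's input. This is only an illustration: the function names (`run_chain`, `research_topic`, `llm_draft`, `write_voice_over`) are hypothetical placeholders, and the real skills #216, #223 and #171 are platform skills whose interfaces are not shown here.

```python
from typing import Callable, List, Tuple

Skill = Callable[[str], str]


def run_chain(user_input: str, silos: List[Tuple[str, Skill]]) -> Tuple[str, List[str]]:
    """Feed each silo's output into the next; return the final output and the skill trace."""
    data = user_input
    trace = []
    for skill_id, skill in silos:
        data = skill(data)
        trace.append(skill_id)
    return data, trace


# Hypothetical stand-ins for the platform skills (assumptions, not the real skills).
def research_topic(topic: str) -> str:
    return f"research({topic})"


def llm_draft(research: str) -> str:
    return f"draft({research})"


def write_voice_over(draft: str) -> str:
    return f"script({draft})"


SCRIPTMASTER_SILOS = [
    ("#216", research_topic),    # Silo 1: research & preparation
    ("#223", llm_draft),         # Silo 2: initial script structuring
    ("#171", write_voice_over),  # Silo 3: voice-over optimization
]

final_script, trace = run_chain("solar panels", SCRIPTMASTER_SILOS)
```

Keeping the silos in an ordered list makes the handoff explicit: reordering or swapping a skill changes one entry rather than the control flow.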
I'll analyze SUBAGENT 2: "VoiceForge" and break it down according to your requirements.

A) SUBAGENT SUMMARY:
VoiceForge converts a text script into a high-quality voice-over MP3 file, ensuring proper audio formatting and quality for subsequent lip-sync processing.

B) FINAL TASK OUTPUT:
An MP3 file URL containing clear voice-over audio, with duration between 30 seconds and 5 minutes (based on a 100-300 word script), sampled at 44.1kHz, stereo format, with consistent volume levels suitable for lip-sync processing.

C) SUBAGENT INPUT:
- Primary input: Text script (100-300 words) from ScriptMaster
- Optional input: Voice style preferences (if any specified in original user request)

E) SUBAGENT TASK SUMMARY:
The workflow needs multiple steps to ensure optimal audio quality:
1. Initial Voice Generation:
   input: text script > #170 (Turn Script Into Voice Over MP3) > MP3 URL
2. Audio Quality Check & Processing:
   MP3 URL > #178 (Convert MP3 to WAV) > WAV URL > #179 (Create Visual Waveform) > Waveform JPEG URL > #176 (Analyze Image With GPT Vision) > Audio quality analysis text
3. Final Audio Preparation:
   If quality check passes > original MP3 becomes final output
   If quality check fails > repeat #170 with adjusted parameters

F) SILOS:

SILO 1: VOICE GENERATION
- Purpose: Create initial voice-over
- Skill: #170
- Input: Text script
- Output: MP3 URL

SILO 2: QUALITY VERIFICATION
- Purpose: Verify audio quality
- Skills: #178 > #179 > #176
- Input: Initial MP3
- Output: Quality analysis text

SILO 3: FINALIZATION
- Purpose: Ensure final output meets requirements
- Input: Quality analysis + MP3
- Output: Final verified MP3 URL

This structure ensures we not only generate the voice-over but also verify its quality for optimal lip-sync processing in later stages. The quality check step is crucial, as poor audio quality could negatively impact the final talking head animation.
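The pass/fail branch in VoiceForge's final step amounts to a bounded retry loop: generate, check, and regenerate with adjusted parameters if the check fails. A minimal sketch follows, assuming hypothetical callables `generate` (standing in for skill #170) and `passes_check` (standing in for the #178 > #179 > #176 verification chain). The `max_attempts` cap is an added assumption, since the breakdown does not say how many retries are allowed.

```python
from typing import Callable, Optional


def forge_voice(
    script: str,
    generate: Callable[[str, int], str],
    passes_check: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first MP3 URL that passes the quality check, or None if all attempts fail."""
    for attempt in range(1, max_attempts + 1):
        # The attempt number stands in for "adjusted parameters" on a retry.
        mp3_url = generate(script, attempt)
        if passes_check(mp3_url):
            return mp3_url
    return None


# Hypothetical stubs for demonstration only: the second take passes the check.
def fake_generate(script: str, attempt: int) -> str:
    return f"https://example.invalid/take{attempt}.mp3"


def fake_check(mp3_url: str) -> bool:
    return mp3_url.endswith("take2.mp3")


verified_url = forge_voice("Hello world", fake_generate, fake_check)
never_passes = forge_voice("Hello world", fake_generate, lambda url: False)
```

Returning `None` after the cap is reached gives the caller a clear signal to stop rather than looping indefinitely on audio that never passes.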
Let me break down the AvatarVision subagent following the requested format:

A) SUBAGENT SUMMARY:
AvatarVision generates a high-quality, thematically appropriate AI avatar image that will serve as the base for the talking head video, ensuring the avatar matches the content topic and maintains professional visual quality.

B) FINAL TASK OUTPUT:
A single 1024x1024 transparent PNG file of a professional-looking AI avatar head/shoulders shot, with clear facial features suitable for lip-sync animation, saved with transparent background.

C) SUBAGENT INPUT:
- User's topic/theme description
- Style preferences for avatar (if any)
- Content context from ScriptMaster's script (to ensure avatar matches content theme)

E) SUBAGENT TASK SUMMARY:
1. Input > Skill #223 (Powerful LLM) to convert user requirements into specific image generation prompts
2. Generated prompt > Skill #182 (Create Dalle Image Transparent Square) to create initial avatar
3. Initial avatar > Skill #176 (Analyze Image with GPT Vision) to verify avatar quality/suitability
4. If needed based on analysis > Skill #221 (Recreate New Image From Image URL) to refine/improve
5. Final avatar image > Skill #191 (Resize Image) to ensure exact 1024x1024 dimensions
Output: Final transparent PNG avatar image URL

F) SILOS:

SILO 1 - PROMPT ENGINEERING
• Input: User requirements + script context
• Skill #223: Generate optimal image prompt
• Output: Refined image generation prompt

SILO 2 - IMAGE GENERATION
• Input: Refined prompt from Silo 1
• Skill #182: Generate transparent avatar
• Output: Initial avatar PNG

SILO 3 - QUALITY CONTROL
• Input: Initial avatar from Silo 2
• Skill #176: Analyze avatar quality
• If needed, Skill #221: Recreate/refine avatar
• Output: Quality-verified avatar

SILO 4 - FORMAT OPTIMIZATION
• Input: Quality-verified avatar
• Skill #191: Ensure correct dimensions
• Output: Final 1024x1024 transparent PNG avatar

This workflow ensures we get a high-quality, appropriate avatar that's technically suitable for the subsequent video generation steps while maintaining strict quality control at each stage.
Let me break down the LipSyncWizard subagent in detail:

A) SUBAGENT SUMMARY:
LipSyncWizard analyzes an audio file to create precise phoneme timing data and maps it to corresponding viseme (mouth shape) positions, generating animation data that will drive the avatar's lip movements and facial expressions.

B) FINAL TASK OUTPUT:
A JSON file containing timestamped phoneme-to-viseme mapping data, including:
- Timestamp markers (in milliseconds)
- Corresponding phoneme identifiers
- Mouth shape/viseme positions (x,y coordinates)
- Optional head movement/facial expression markers
- Audio amplitude data for emphasis

C) SUBAGENT INPUT:
- MP3 voice-over file URL
- Transcription with word timing data
- Basic configuration parameters for animation style

E) SUBAGENT TASK SUMMARY:
1. Convert MP3 to WAV for analysis > Skill #178 (Convert MP3 to WAV)
2. Get precise audio transcription with timing > Skill #198 (Get Transcription with Timings)
3. Generate audio waveform for amplitude analysis > Skill #179 (Create Visual Waveform) > Skill #176 (Analyze Image with GPT Vision) to extract amplitude data
4. Extract beat/tempo data for natural head movements > Skill #180 (Extract Beatpoints & Tempo)
5. Use LLM to convert transcription+timing into phoneme sequence > Skill #223 (Powerful LLM) to map words to phoneme sequences
6. Generate final JSON mapping using collected data > Skill #223 (Powerful LLM) to compile all data into structured format

F) SILOS:

SILO 1: AUDIO PREPARATION
- Input: MP3 URL
- Skill #178 > WAV file
- Skill #198 > Timestamped transcription
- Output: WAV file + transcription

SILO 2: MOVEMENT ANALYSIS
- Input: WAV file
- Skill #179 > Waveform image
- Skill #176 > Amplitude data
- Skill #180 > Beat/tempo data
- Output: Movement timing data

SILO 3: PHONEME MAPPING
- Input: Transcription
- Skill #223 > Phoneme sequence
- Output: Timestamped phonemes

SILO 4: FINAL COMPILATION
- Input: All previous silo outputs
- Skill #223 > Final JSON compilation
- Output: Complete lip-sync animation data file

Each silo operates somewhat independently but feeds into the final compilation stage, where everything is merged into the final animation data format required by the video renderer.
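To make the shape of the timestamped phoneme-to-viseme output more concrete, here is a simplified sketch that builds such a JSON document for a single word. Everything specific in it is an assumption for illustration: the `PHONEME_TO_VISEME` table, the viseme names (used in place of x,y coordinates), the `Keyframe` fields, and the choice to spread phonemes evenly across the word's time span. The actual format expected by the renderer is not specified in the breakdown.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

# Hypothetical phoneme -> viseme table; a real mapping would be far larger.
PHONEME_TO_VISEME = {
    "HH": "open_small",
    "AH": "open_wide",
    "L": "tongue_up",
    "OW": "rounded",
}


@dataclass
class Keyframe:
    time_ms: int
    phoneme: str
    viseme: str


def keyframes_for_word(start_ms: int, end_ms: int, phonemes: List[str]) -> List[Keyframe]:
    """Spread a word's phonemes evenly across its time span (a simplifying assumption)."""
    if not phonemes:
        return []
    step = (end_ms - start_ms) / len(phonemes)
    return [
        Keyframe(round(start_ms + i * step), p, PHONEME_TO_VISEME.get(p, "neutral"))
        for i, p in enumerate(phonemes)
    ]


# One word ("hello") spoken from 0 ms to 400 ms, with an assumed phoneme sequence.
frames = keyframes_for_word(0, 400, ["HH", "AH", "L", "OW"])
animation_json = json.dumps({"keyframes": [asdict(f) for f in frames]}, indent=2)
```

Unknown phonemes fall back to a `"neutral"` viseme here so that a gap in the table produces a closed-mouth frame rather than an error.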
Let me break down the VideoAssemblerPro subagent following your guidelines:

A) SUBAGENT SUMMARY:
VideoAssemblerPro combines an AI-generated avatar image, voice-over audio, and lip-sync data to create a synchronized talking head video where the avatar's mouth and facial movements match the audio precisely.

B) FINAL TASK OUTPUT:
MP4 video file (1080x1080 square format) featuring the AI avatar with synchronized lip movements matching the voice-over audio, duration matching the input audio file length (typically 30-180 seconds), with high-quality 48kHz audio.

C) SUBAGENT INPUT:
- PNG URL of the AI-generated avatar image (transparent background)
- MP3 URL of the voice-over audio
- Text transcription with precise timing data for lip-sync
- (Optional) Animation parameters for head movements

E) SUBAGENT TASK SUMMARY:
1. Input avatar PNG > #191 Resize Image (to ensure 1080x1080) > Resized avatar
2. Input MP3 > #198 Get Transcription of MP3 (With Timings) > Detailed phoneme timing data
3. Input MP3 + transcription + avatar > #168 Generate Talking Head Video From MP3 & transcription > Initial MP4
4. Initial MP4 > #194 Cut Small Section From MP4 Video (trim any excess) > Final MP4

F) SILOS:

SILO 1: AVATAR PREPARATION
- Input: Original avatar PNG
- Skill: #191 (Resize Image)
- Output: Properly sized avatar PNG (1080x1080)

SILO 2: AUDIO ANALYSIS
- Input: Voice-over MP3
- Skill: #198 (Get Transcription of MP3)
- Output: Precise timing data for lip-sync

SILO 3: VIDEO GENERATION
- Input:
  * Sized avatar from Silo 1
  * MP3 and timing data from Silo 2
- Skill: #168 (Generate Talking Head Video)
- Output: Raw talking head MP4

SILO 4: VIDEO FINALIZATION
- Input: Raw MP4 from Silo 3
- Skill: #194 (Cut Small Section)
- Output: Final trimmed MP4 with precise timing

Note: This workflow assumes skill #168 (Generate Talking Head Video) has built-in capability to handle lip-sync animation. If not, additional skills for lip-sync generation would need to be added, but currently there isn't a specific skill in the list for this functionality.
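Unlike the earlier linear chains, the VideoAssemblerPro silos form a small dependency graph: Silo 3 needs the outputs of both Silo 1 and Silo 2, and Silo 4 needs Silo 3. One way to derive a valid execution order is a topological sort, sketched below with Python's standard `graphlib` module (Python 3.9+). The silo key names are hypothetical labels chosen for this example.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of silos whose outputs it depends on.
SILO_DEPENDENCIES = {
    "silo1_avatar_preparation": set(),
    "silo2_audio_analysis": set(),
    "silo3_video_generation": {"silo1_avatar_preparation", "silo2_audio_analysis"},
    "silo4_video_finalization": {"silo3_video_generation"},
}

# static_order() yields every silo after all of its dependencies.
execution_order = list(TopologicalSorter(SILO_DEPENDENCIES).static_order())
```

Because Silos 1 and 2 have no dependency on each other, they could also run in parallel; the sort only guarantees that both finish before Silo 3 starts.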
I notice that you've asked me to analyze "subagent 6" but no sixth subagent was defined in the original workflow. The original workflow only contained 5 subagents:
1. ScriptMaster
2. VoiceForge
3. AvatarVision
4. LipSyncWizard
5. VideoAssemblerPro

Would you like me to:
1. Analyze one of these existing subagents in detail
2. Propose a new sixth subagent that might enhance the workflow
3. Clarify which specific subagent you'd like me to focus on

Please let me know which option you'd prefer, and I'll provide a detailed analysis following the format you've specified.