Menu
NEW AGENT
MY AGENTS
ASSISTANTS
Step 1:
Topical Avatar Video Creator
1️⃣
Perfect output
- scan ALL
2️⃣ Add
output numbers
, then...
3️⃣ Add
Subagent Numbers
(work backwards
from output number!
)
4️⃣ Add
ACTUAL Skills
to subagent
✅ DONE..Copy x4 to Step 3...
SETTINGS
LOGOUT
What Shall We Build Next?
1
Describe
Describe your task
2
Refine
Refine the plan
3
SubAgents
Review all agents
4
Deploy
Deploy your agent
Sub Agent 1
Sub Agent 2
Sub Agent 3
Sub Agent 4
Sub Agent 5
Sub Agent 6
Sub Agent 7
Sub Agent 8
I'll analyze SubAgent 1 "ScriptMaster" in detail: A) SUBAGENT SUMMARY: ScriptMaster generates a well-structured, conversational voice-over script (100-300 words) by researching the topic thoroughly and crafting content specifically optimized for spoken delivery. B) FINAL TASK OUTPUT: A single text file containing a polished, ready-to-record voice-over script of 100-300 words, formatted with proper punctuation and natural speech patterns, saved as plain text. C) SUBAGENT INPUT: - User's topic/description for the video - Optional style preferences (tone, length, target audience) E) SUBAGENT TASK SUMMARY: The subagent will follow this sequence: 1. Research Phase: user_input > #216 (Research Topic Deeply) > research_data 2. Keyword Enhancement: research_data > #218 (Brainstorm Related Keywords) > enhanced_keywords 3. Script Generation: [research_data + enhanced_keywords] > #171 (Write Voice Over Script Based On Instructions) > draft_script 4. Script Refinement: draft_script > #190 (Write or rewrite text based on instructions) with specific prompt for voice-over optimization > final_script F) SILOS: SILO 1: RESEARCH & PREPARATION - Input: User's topic/description - Skill #216: Research topic deeply - Skill #218: Brainstorm related keywords - Output: Research data + enhanced keywords SILO 2: SCRIPT CREATION - Input: Research data + keywords - Skill #171: Generate initial voice-over script - Output: Draft script SILO 3: OPTIMIZATION - Input: Draft script - Skill #190: Refine for voice-over delivery - Output: Final polished script This flow ensures thorough research, proper keyword integration, and voice-over-specific optimization, resulting in a script that's both informative and natural-sounding when spoken. The multiple refinement steps help create content that's specifically crafted for audio delivery rather than just written text.
SubAgent #1 - Diagram
Expand Diagram
Let me break down the VoiceForge subagent in detail: A) SUBAGENT SUMMARY: VoiceForge converts a text script into a high-quality voice-over MP3 file, with proper pacing and natural speech patterns suitable for an avatar talking head video. B) FINAL TASK OUTPUT: A single MP3 file (URL) containing the voice-over narration of the script, with duration typically between 30 seconds to 5 minutes (based on the 100-300 word input script limitation), saved in standard web audio format (44.1kHz, 128-320kbps). C) SUBAGENT INPUT: - Primary Input: Text script (100-300 words) from ScriptMaster subagent - Optional Input: Voice style preferences or specifications (if provided in original user prompt) E) SUBAGENT TASK SUMMARY: The workflow follows this sequence: 1. Input text script > #190 (Write or rewrite text based on instructions) to optimize the script for voice-over delivery (adding pauses, adjusting punctuation) > optimized script 2. Optimized script > #170 (Turn Script Into Voice Over MP3) > initial MP3 URL 3. Initial MP3 URL > #198 (Get Transcription Of MP3 With Timings) > transcription with timing verification 4. Transcription > #223 (Powerful LLM Prompt-to-Text Response) to verify timing and pacing matches requirements > verification report 5. If verification passes > final MP3 URL is outputted If verification fails > return to step 1 with adjusted parameters F) SILOS: SILO 1: SCRIPT OPTIMIZATION - Input: Raw script - Process: Skill #190 - Output: Voice-optimized script SILO 2: AUDIO GENERATION - Input: Optimized script - Process: Skill #170 - Output: MP3 file SILO 3: QUALITY VERIFICATION - Input: Generated MP3 - Process: Skills #198 and #223 - Output: Verification and timing check The subagent includes verification steps to ensure the voice-over meets quality standards before passing to the next subagent in the main workflow. This is crucial since the audio forms the foundation for the lip-sync animation in later steps.
SubAgent #2 - Diagram
Expand Diagram
Here's my complete analysis and workflow for the AvatarVision subagent: A) SUBAGENT SUMMARY: A specialized image generation subagent that creates a high-quality, contextually appropriate AI avatar image that matches the topic and style of the video content, with particular attention to facial features and head/shoulder framing suitable for a talking head video. B) FINAL TASK OUTPUT: A single 1024x1024 transparent PNG file of a professional-looking AI avatar with clear facial features, proper head/shoulders framing, and thematic styling relevant to the video content. C) SUBAGENT INPUT: 1. Topic/theme of the video content 2. Style preferences for the avatar (if any) 3. Any specific requirements about profession/expertise the avatar should convey D) SUBAGENT TASK SUMMARY: The workflow follows these steps: 1. Generate optimal prompt using LLM Input: Raw topic/theme/style preferences Skill: #223 (Powerful LLM Prompt-to-Text Response) Output: Optimized image generation prompt 2. Create initial avatar image Input: Optimized prompt from step 1 Skill: #182 (Create Dalle Image (Transparent Square)) Output: Initial 1024x1024 transparent PNG 3. Analyze generated image Input: PNG from step 2 Skill: #176 (Analyze An Image With GPT Vision & Return Text) Output: Analysis text of image quality/suitability 4. If needed, regenerate with refined prompt Input: Analysis feedback Skill: #223 (Powerful LLM Prompt-to-Text Response) Output: Refined prompt Then repeat skill #182 with refined prompt E) SILOS: SILO 1: PROMPT ENGINEERING - Input: Raw requirements - Skill #223: Generate optimal prompt - Output: Refined prompt SILO 2: IMAGE GENERATION - Input: Optimized prompt - Skill #182: Generate transparent avatar - Output: PNG file SILO 3: QUALITY CONTROL - Input: Generated PNG - Skill #176: Analyze image - Output: Quality assessment - If needed, return to SILO 1 for refinement The workflow includes a feedback loop where if the generated image doesn't meet quality standards (assessed by #176), it can return to the prompt engineering stage for refinement. This ensures the final avatar is optimally suited for talking head video use. Key considerations built into this workflow: - Ensures transparent background for video compatibility - Focuses on head/shoulders framing - Maintains professional appearance while incorporating thematic elements - Provides quality control check - Allows for refinement if needed
SubAgent #3 - Diagram
Expand Flow
Let me break down the LipSyncWizard subagent in detail: A) SUBAGENT SUMMARY: LipSyncWizard analyzes an audio file to generate precise phoneme timing data and maps it to corresponding viseme (mouth shape) positions, creating a structured animation data file that synchronizes speech with avatar mouth movements. B) FINAL TASK OUTPUT: A JSON data structure containing: - Timestamp markers (in milliseconds) - Corresponding phoneme identifiers - Mapped viseme positions (mouth shapes) - Optional head movement coordinates - Audio amplitude data for expression intensity C) SUBAGENT INPUT: - MP3 voice-over file URL - Transcription of the audio (with timing markers) E) SUBAGENT TASK SUMMARY: 1. Convert MP3 to WAV for precise audio analysis > #178 (Convert MP3 to WAV) 2. Generate visual waveform for amplitude analysis > #179 (Create Visual Waveform) > #176 (Analyze Image with GPT Vision) to extract amplitude patterns 3. Get precise transcription with timestamps > #198 (Get Transcription with Timings) 4. Extract audio characteristics: > #180 (Extract Beatpoints & Tempo) for natural head movement timing > #188 (Extract audio stems) to isolate clear voice track F) SILOS: SILO 1: AUDIO PREPROCESSING - Input: MP3 URL - Skill #178: Convert to WAV - Skill #188: Extract vocal stem Output: Clean WAV file for analysis SILO 2: AMPLITUDE MAPPING - Input: WAV from Silo 1 - Skill #179: Generate waveform - Skill #176: Analyze waveform Output: Amplitude data points SILO 3: SPEECH TIMING - Input: Clean vocal WAV - Skill #198: Get precise transcription - Skill #180: Extract rhythm points Output: Timestamped phoneme data SILO 4: DATA COMPILATION - Input: All silo outputs - Skill #223: Use LLM to structure final JSON Output: Final lip-sync data file Note: While this workflow creates the best possible preparation for lip-sync data, it's worth noting that the actual final step of converting this data into viseme mappings would require a custom skill. The current skill set can prepare all the necessary timing and audio analysis, but the final conversion to mouth position data would need additional capability.
4 Template & Links
Expand Flow
Let me break down the VideoAssemblerPro subagent according to your format: A) SUBAGENT SUMMARY: VideoAssemblerPro combines an AI-generated avatar image with voice-over audio and lip-sync data to create a synchronized talking head video where the avatar's mouth and facial movements match the spoken audio. B) FINAL TASK OUTPUT: An MP4 video file (16:9 aspect ratio) featuring the AI avatar speaking with synchronized lip movements, approximately matching the duration of the input audio file (typically 1-5 minutes), with the avatar centered in frame against a clean background. C) SUBAGENT INPUT: - PNG URL of the AI-generated avatar image - MP3 URL of the voice-over audio - Text transcription with timing data (for lip-sync mapping) E) SUBAGENT TASK SUMMARY: 1. First Silo - Audio Processing: - Input MP3 > #198 (Get Transcription of MP3 With Timings) > Detailed phoneme timing data - Input MP3 > #196 (Extract MP3 Audio From MP4 File) > Clean audio file for sync 2. Second Silo - Image Processing: - Input Avatar PNG > #191 (Resize Image) [to 1024x1024] > Properly sized avatar - Resized Avatar > #176 (Analyze Image With GPT Vision) > Facial feature mapping data 3. Third Silo - Video Generation: - Input all processed components > #168 (Generate Talking Head Video From MP3 & transcription) > Raw MP4 - Raw MP4 > #194 (Cut Small Section From MP4 Video) > Clean final MP4 F) SILOS: SILO 1: AUDIO PREPARATION Purpose: Process audio for precise timing Input: Original MP3 Output: Clean audio file + timing data SILO 2: IMAGE PREPARATION Purpose: Optimize avatar image for animation Input: Original PNG Output: Processed avatar image with mapping SILO 3: VIDEO ASSEMBLY Purpose: Generate and clean final video Input: All processed components Output: Final talking head MP4 Note: While this workflow uses existing skills where possible, it's worth noting that the actual lip-sync generation might require additional specialized skills not currently in the list. The current workflow uses skill #168 as the primary video generator, but this might need enhancement for more sophisticated lip-sync capabilities.
5 Template & Links
Expand Flow
I notice that you've asked me to analyze "subagent 6" but in the original list of subagents, there were only 5 subagents defined (ScriptMaster, VoiceForge, AvatarVision, LipSyncWizard, and VideoAssemblerPro). There is no "subagent 6" defined in the original workflow. Would you like me to: 1. Analyze one of the existing 5 subagents in detail? 2. Propose a new 6th subagent that might enhance the workflow? 3. Or would you like to specify a different subagent to analyze? Please let me know which option you'd prefer, and I'll provide a detailed analysis using the format you've outlined.
6 Template & Links
Expand Flow
Templates & Links Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
7 Template & Links
Expand Flow
Questions & Research Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
8 Template & Links
Expand Flow
Templates & Links Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
9 Template & Links
Expand Flow
Templates & Links Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
10 Template & Links
Expand Flow
Questions & Research Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
11 Template & Links
Expand Flow
Templates & Links Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
12 Template & Links
Expand Flow
Need To Start Afresh?
BACK TO REFINE
Tweaked & Good To Go?
PROCEED TO DEPLOY