Step 1: Topical Avatar Video Creator
1️⃣ Perfect output - scan ALL
2️⃣ Add output numbers, then...
3️⃣ Add Subagent Numbers (work backwards from output number!)
4️⃣ Add ACTUAL Skills to subagent
✅ DONE. Copy x4 to Step 3...
What Shall We Build Next?
1. Describe: Describe your task
2. Refine: Refine the plan
3. SubAgents: Review all agents
4. Deploy: Deploy your agent
Let me break down the ScriptMaster subagent in detail:

A) SUBAGENT SUMMARY:
ScriptMaster takes a user's topic description and generates an optimized, well-structured voice-over script specifically designed for talking head videos, incorporating natural speech patterns and appropriate pacing.

B) FINAL TASK OUTPUT:
A text file containing a 100-300 word script, formatted with proper punctuation and natural speech breaks, optimized for voice-over recording, including clear paragraph breaks and timing markers.

C) SUBAGENT INPUT:
1. User's topic/theme description
2. Target length preference (defaulting to 100-300 words if not specified)
3. Any specific tone/style requirements (defaulting to conversational if not specified)

E) SUBAGENT TASK SUMMARY:
The workflow follows this sequence:
Initial Input > #216 Research Topic Deeply > #223 Powerful LLM Prompt-to-Text Response > #171 Write Voice Over Script Based On Instructions > Final Script Output

Specifically:
1. First uses #216 to gather comprehensive topic research
2. Then uses #223 to structure this research into a preliminary script outline
3. Finally uses #171 to convert this into a properly formatted voice-over script

F) SILOS:
This subagent operates in three distinct silos:

SILO 1: RESEARCH
- Purpose: Gather comprehensive topic information
- Skill: #216 (Research Topic Deeply)
- Input: User's topic description
- Output: 1000-3000 character research summary

SILO 2: SCRIPT STRUCTURING
- Purpose: Convert research into structured narrative
- Skill: #223 (Powerful LLM Prompt-to-Text Response)
- Input: Research summary from Silo 1
- Output: Structured script outline

SILO 3: VOICE-OVER OPTIMIZATION
- Purpose: Format for natural speech and timing
- Skill: #171 (Write Voice Over Script Based On Instructions)
- Input: Structured script from Silo 2
- Output: Final voice-over ready script

Each silo builds upon the previous one, producing a refined script suited to the talking head video format. The process ensures the script is both content-rich and naturally speakable.
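The three-silo chain can be sketched as a simple sequential pipeline. This is only an illustration: the functions `research_topic`, `structure_outline`, and `write_voice_over` are hypothetical stand-ins for skills #216, #223, and #171 (the platform's real skill-calling interface is not shown here), and `word_count_ok` is an assumed helper for checking the 100-300 word target.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for the platform skills; they only tag the text
# so the data flow between silos is visible.
def research_topic(topic: str) -> str:  # stands in for skill #216
    return f"Research notes on {topic}."


def structure_outline(research: str) -> str:  # stands in for skill #223
    return f"Outline based on: {research}"


def write_voice_over(outline: str) -> str:  # stands in for skill #171
    return f"Script drafted from: {outline}"


# Silo order matters: each silo consumes the previous silo's output.
SILOS: List[Tuple[str, Callable[[str], str]]] = [
    ("RESEARCH", research_topic),
    ("SCRIPT STRUCTURING", structure_outline),
    ("VOICE-OVER OPTIMIZATION", write_voice_over),
]


def run_script_master(topic: str) -> Dict[str, str]:
    """Run the silos in order and keep every intermediate output."""
    outputs: Dict[str, str] = {}
    current = topic
    for name, skill in SILOS:
        current = skill(current)
        outputs[name] = current
    return outputs


def word_count_ok(script: str, low: int = 100, high: int = 300) -> bool:
    """Check a script against the 100-300 word target length."""
    return low <= len(script.split()) <= high


if __name__ == "__main__":
    for silo, text in run_script_master("solar power").items():
        print(f"{silo}: {text}")
```

Keeping every intermediate output makes it easier to see which silo produced a weak result before rerunning the whole chain.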
Here's my complete analysis and workflow for SUBAGENT 2 "VoiceForge":

A) SUBAGENT SUMMARY:
Converts a written script into high-quality voice-over audio, with additional quality checks and optimizations for the talking head video use case.

B) FINAL TASK OUTPUT:
A single MP3 file URL containing clear voice-over audio, optimized for lip-syncing, with normalized volume levels, typically about 40 seconds to 2 minutes long for the 100-300 word input script at a normal speaking pace.

C) SUBAGENT INPUT:
- Primary Input: Text script (from ScriptMaster subagent)
- Secondary Input: Any voice preference parameters (gender, accent, style, etc.)

E) SUBAGENT TASK SUMMARY:
The workflow requires several chained steps to ensure optimal audio quality for lip-syncing:

1. Initial Voice Generation:
text script > #170 (Turn Script Into Voice Over MP3) > initial MP3 URL

2. Audio Quality Check & Optimization:
MP3 URL > #178 (Convert MP3 to WAV) > WAV URL
WAV URL > #179 (Create Visual Waveform) > waveform JPEG
waveform JPEG > #176 (Analyze Image With GPT Vision) > audio quality analysis text

3. Volume Normalization (if needed based on analysis):
If quality check indicates issues:
WAV URL > #219 (Cut WAV into pieces) > multiple WAV segments
multiple WAV segments > #178 (Convert MP3 to WAV) > final normalized MP3 URL

F) SILOS:

SILO 1: Voice Generation
- Purpose: Create initial voice-over
- Input: Text script
- Skills: #170
- Output: Initial MP3

SILO 2: Quality Verification
- Purpose: Analyze audio quality
- Input: MP3 from Silo 1
- Skills: #178, #179, #176
- Output: Quality analysis and WAV file

SILO 3: Audio Optimization (conditional)
- Purpose: Normalize and optimize audio if needed
- Input: WAV from Silo 2
- Skills: #219, #178
- Output: Final MP3 URL

This workflow aims for consistently high-quality audio suitable for lip-syncing, with proper volume levels and clear pronunciation, and stays efficient by running the optimization steps only when the quality analysis calls for them.
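A rough way to sanity-check the expected audio length, and to decide whether the conditional Silo 3 should run, is sketched below. Both helpers are assumptions for illustration: the 150 words-per-minute speaking rate is a common rule of thumb rather than a platform setting, and the keyword list in `needs_optimization` is a hypothetical way to read the free-text quality analysis returned by #176.

```python
from typing import Iterable


def estimate_duration_seconds(word_count: int, words_per_minute: float = 150.0) -> float:
    """Estimate spoken duration; 150 wpm is an assumed conversational pace."""
    if word_count < 0 or words_per_minute <= 0:
        raise ValueError("word_count must be >= 0 and words_per_minute > 0")
    return word_count * 60.0 / words_per_minute


def needs_optimization(
    analysis_text: str,
    issue_keywords: Iterable[str] = ("clipping", "uneven", "too quiet", "distort"),
) -> bool:
    """Return True when the quality analysis mentions any assumed issue keyword."""
    lowered = analysis_text.lower()
    return any(keyword in lowered for keyword in issue_keywords)


if __name__ == "__main__":
    for words in (100, 300):
        print(words, "words ->", estimate_duration_seconds(words), "seconds")
    print(needs_optimization("Waveform shows uneven volume in the second half."))
```

Under the 150 wpm assumption, a 100-word script runs about 40 seconds and a 300-word script about 2 minutes.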
Let me break down the AvatarVision subagent following the requested format:

A) SUBAGENT SUMMARY:
AvatarVision generates a high-quality, thematically appropriate AI avatar image that will serve as the base for the talking head video, ensuring the avatar matches the topic and style requirements while maintaining professional quality suitable for lip-syncing.

B) FINAL TASK OUTPUT:
A 1024x1024 transparent PNG file of a professionally rendered AI avatar head/shoulders shot, with clear facial features (especially the mouth area), uniform lighting, and clean background removal, saved as a URL.

C) SUBAGENT INPUT:
- Primary topic/theme description from user
- Style preferences for avatar (gender, age, profession, etc.)
- Any specific visual requirements (like "wearing a lab coat" for science videos)

E) SUBAGENT TASK SUMMARY:
1. #223 (Powerful LLM Prompt-to-Text Response)
→ Takes user input and generates optimal image generation prompts
→ Creates three variations of prompts optimized for avatar creation
2. #222 (Make Image Especially With Text) x3
→ Generates three different versions of the avatar using the optimized prompts
→ Each with slightly different angles/expressions to ensure optimal output
3. #176 (Analyze An Image With GPT Vision & Return Text) x3
→ Analyzes each generated avatar for quality and suitability
→ Checks for clear facial features, proper framing, and professional appearance
4. #182 (Create Dalle Image Transparent Square)
→ Takes the best prompt (based on analysis) and generates final 1024x1024 transparent PNG
→ Ensures clean background removal and proper head/shoulders framing

F) SILOS:

SILO 1: PROMPT ENGINEERING
- Input: User requirements
- Skill: #223
- Output: 3x optimized prompts

SILO 2: IMAGE GENERATION & TESTING
- Input: 3x prompts from Silo 1
- Skills: #222 (x3 parallel runs)
- Skills: #176 (x3 parallel analyses)
- Output: Best performing prompt + quality assessment

SILO 3: FINAL RENDER
- Input: Best prompt from Silo 2
- Skill: #182
- Output: Final transparent PNG avatar

This workflow allows multiple attempts at the avatar while maintaining quality control through analysis, ultimately delivering a production-ready avatar image suitable for the talking head video.
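The "generate three, analyze three, keep the best prompt" step in Silo 2 can be sketched as a small selection routine. This is a minimal illustration under stated assumptions: the `Candidate` fields mirror the three checks named above (clear facial features, proper framing, professional appearance), but turning the #176 text analysis into booleans is assumed to happen elsewhere, and the equal-weight scoring is a hypothetical choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One generated avatar plus the assumed results of its quality checks."""
    prompt: str
    clear_face: bool
    good_framing: bool
    professional: bool

    def score(self) -> int:
        # Equal weighting of the three checks is an assumption.
        return int(self.clear_face) + int(self.good_framing) + int(self.professional)


def pick_best(candidates: List[Candidate]) -> Candidate:
    """Return the highest-scoring candidate; ties go to the earliest one."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=lambda candidate: candidate.score())


if __name__ == "__main__":
    runs = [
        Candidate("prompt A", clear_face=True, good_framing=False, professional=True),
        Candidate("prompt B", clear_face=True, good_framing=True, professional=True),
        Candidate("prompt C", clear_face=False, good_framing=True, professional=True),
    ]
    print(pick_best(runs).prompt)
```

The winning prompt is what Silo 3 would pass to #182 for the final render.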
Let me break down the LipSyncWizard subagent in detail:

A) SUBAGENT SUMMARY:
A specialized agent that analyzes audio files to generate precise phoneme timing data and maps these to corresponding viseme (mouth shape) positions, creating a structured animation dataset for lip-sync animation.

B) FINAL TASK OUTPUT:
A JSON data structure containing:
- Timestamped phoneme sequences
- Corresponding viseme mappings
- Head position/movement keyframes
- Facial expression keyframes
Each with precise millisecond timing markers

C) SUBAGENT INPUT:
- Voice-over MP3 file URL
- Transcription text with word timing data
- Avatar image reference (for facial reference points)

D) SUBAGENT TASK SUMMARY:
1. Convert MP3 to WAV for precise audio analysis (#178 - Convert 1-20 MP3s to wav)
2. Generate visual waveform for amplitude analysis (#179 - Create Visual Waveform Of 60 second Wav/mp3 File)
3. Extract precise audio timing data (#198 - Get Transcription Of MP3 With Timings)
4. Analyze waveform patterns (#176 - Analyze An Image With GPT Vision & Return Text)
   Used to analyze the waveform for amplitude patterns
5. Generate structured timing data (#223 - Powerful LLM Prompt-to-Text Response)
   Used to convert all collected data into final JSON structure

E) SILOS:

SILO 1: AUDIO PREPROCESSING
- Input: MP3 URL
- Skill #178: Convert to WAV
- Skill #179: Generate waveform
- Output: WAV file + Waveform image

SILO 2: TIMING EXTRACTION
- Input: WAV file
- Skill #198: Get precise transcription/timing
- Skill #176: Analyze waveform patterns
- Output: Timing data + Amplitude patterns

SILO 3: DATA STRUCTURING
- Input: Timing data + Amplitude patterns
- Skill #223: Convert to structured JSON
- Output: Final JSON animation data structure

This workflow addresses the technical challenge of generating precise lip-sync data by breaking the audio analysis into distinct phases, each contributing specific data points needed for the final animation structure. The combination of audio conversion, waveform analysis, and structured data generation creates a comprehensive lip-sync dataset that can be used by the video assembly component.
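One possible shape for the final JSON animation structure is sketched below. Everything specific in it is an assumption for illustration: the key names (`phonemes`, `head_keyframes`, `expression_keyframes`), the small `PHONEME_TO_VISEME` table, and the `neutral` fallback are hypothetical and not taken from the platform's actual output.

```python
import json
from typing import Dict, List, Tuple

# Illustrative, deliberately incomplete phoneme-to-viseme table (assumed values).
PHONEME_TO_VISEME: Dict[str, str] = {
    "AA": "open",
    "IY": "wide",
    "UW": "round",
    "M": "closed",
    "B": "closed",
    "P": "closed",
    "F": "teeth_on_lip",
    "V": "teeth_on_lip",
}


def build_lipsync_data(phonemes: List[Tuple[str, int, int]]) -> Dict[str, list]:
    """Map (phoneme, start_ms, end_ms) tuples to timestamped viseme entries."""
    entries = []
    for symbol, start_ms, end_ms in phonemes:
        if end_ms < start_ms:
            raise ValueError(f"{symbol}: end_ms precedes start_ms")
        entries.append(
            {
                "phoneme": symbol,
                # Unknown phonemes fall back to an assumed neutral mouth shape.
                "viseme": PHONEME_TO_VISEME.get(symbol, "neutral"),
                "start_ms": start_ms,
                "end_ms": end_ms,
            }
        )
    # Head and expression keyframes are left empty in this sketch.
    return {"phonemes": entries, "head_keyframes": [], "expression_keyframes": []}


def to_json(data: Dict[str, list]) -> str:
    """Serialize the animation data as indented JSON."""
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    print(to_json(build_lipsync_data([("M", 0, 80), ("AA", 80, 240)])))
```

Storing start and end times in integer milliseconds keeps the timing markers exact when the data is round-tripped through JSON.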
Let me analyze and break down the VideoAssemblerPro subagent in detail:

A) SUBAGENT SUMMARY:
VideoAssemblerPro combines an AI-generated avatar image, voice-over audio, and lip-sync data to create a synchronized talking head video where the avatar's mouth and facial movements match the audio speech patterns.

B) FINAL TASK OUTPUT:
MP4 video file (16:9 aspect ratio) featuring the avatar speaking with synchronized lip movements, with duration matching the input audio file length, at 30fps, with the avatar centered in frame against a clean background.

C) SUBAGENT INPUT:
- PNG file URL of the AI-generated avatar image
- MP3 file URL of the voice-over audio
- Text transcription with timing data (for lip sync matching)

E) SUBAGENT TASK SUMMARY:
1. Input validation and preparation:
- #198 (Get Transcription Of MP3) to generate precise timing data from the voice-over
- #178 (Convert MP3 to WAV) to prepare audio for processing
2. Avatar processing:
- #191 (Resize Image) to ensure avatar is properly sized for video frame
- #221 (Recreate New Image) to optimize avatar for animation
3. Final video assembly:
- #168 (Generate Talking Head Video From MP3 & transcription) to create the final synchronized video

F) SILOS:

SILO 1: AUDIO PREPARATION
Input: Original MP3 voice-over
> #198 Get Transcription Of MP3
> #178 Convert MP3 to WAV
Output: WAV file + detailed transcription with timing data

SILO 2: AVATAR PREPARATION
Input: Original avatar PNG
> #191 Resize Image (to video frame specs)
> #221 Recreate New Image (optimization)
Output: Processed avatar image ready for animation

SILO 3: VIDEO GENERATION
Input: Processed avatar + WAV + transcription
> #168 Generate Talking Head Video
Output: Final MP4 video with synchronized lip movements

The workflow prepares the audio and visual elements separately before combining them in the final video generation step, which helps with both quality and synchronization.
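The output spec above (16:9, 30fps, avatar centered, duration matching the audio) implies some simple arithmetic, sketched here. The 1920x1080 frame size is an assumption, since only the 16:9 ratio is stated, and both helper names are hypothetical.

```python
import math
from typing import Tuple


def frame_count(duration_seconds: float, fps: int = 30) -> int:
    """Number of frames needed to cover the audio, rounding up."""
    if duration_seconds < 0 or fps <= 0:
        raise ValueError("duration must be >= 0 and fps > 0")
    return math.ceil(duration_seconds * fps)


def center_fit(
    src_w: int, src_h: int, frame_w: int = 1920, frame_h: int = 1080
) -> Tuple[int, int, int, int]:
    """Scale an image to fit inside the frame and center it.

    Returns (x_offset, y_offset, scaled_width, scaled_height).
    The default 1920x1080 frame is an assumed 16:9 resolution.
    """
    scale = min(frame_w / src_w, frame_h / src_h)
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)
    return ((frame_w - scaled_w) // 2, (frame_h - scaled_h) // 2, scaled_w, scaled_h)


if __name__ == "__main__":
    print(frame_count(40.0))       # frames for a 40-second voice-over
    print(center_fit(1024, 1024))  # placement of the square avatar
```

With these assumptions, the 1024x1024 avatar scales to 1080x1080 and sits 420 pixels from the left edge of the frame.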
I notice that you've asked me to analyze "subagent 6", but the original workflow lists only 5 subagents (ScriptMaster, VoiceForge, AvatarVision, LipSyncWizard, and VideoAssemblerPro). There is no subagent 6 defined in the workflow.

Would you like me to:
1. Analyze one of the existing 5 subagents (if so, which one?)
2. Propose a new subagent 6 that could enhance the workflow?
3. Or would you like to clarify which specific subagent you'd like me to focus on?

Please let me know which option you prefer and I'll provide a detailed analysis following the format you've outlined.