Step 1: Topical Avatar Video Creator
1️⃣ Perfect output - scan ALL
2️⃣ Add output numbers, then...
3️⃣ Add Subagent Numbers (work backwards from output number!)
4️⃣ Add ACTUAL Skills to subagent
✅ DONE..Copy x4 to Step 3...
Let me break down the ScriptMaster subagent based on your guidelines:

A) SUBAGENT SUMMARY:
ScriptMaster takes a user's topic description and generates a professionally structured voice-over script optimized for talking head videos, incorporating both research-based content and natural speaking patterns.

B) FINAL TASK OUTPUT:
A text file containing a 100-300 word voice-over script, formatted with proper punctuation and natural speech patterns, optimized for AI voice synthesis, and structured with clear introduction, body, and conclusion sections.

C) SUBAGENT INPUT:
- Primary user topic/description (text)
- Optional style preferences (casual/formal/educational)
- Optional target duration (in seconds/minutes)

E) SUBAGENT TASK SUMMARY:
The workflow requires three chained steps:
1. Research Phase: #216 (Research Topic Deeply)
   INPUT: user's topic description
   OUTPUT: comprehensive research summary
2. Keyword Enhancement: #218 (Brainstorm Related Keywords)
   INPUT: research summary from step 1
   OUTPUT: relevant keywords and phrases to incorporate
3. Final Script Generation: #171 (Write Voice Over Script Based On Instructions)
   INPUT: combined research summary + keywords + specific instructions for voice-over formatting
   OUTPUT: final voice-over script

F) SILOS:
SILO 1 - RESEARCH & PREPARATION
- Skill #216: Deep research of topic
- Skill #218: Keyword enhancement

SILO 2 - SCRIPT CREATION
- Skill #171: Final script generation with specific instructions:
  * Include natural speech patterns
  * Format for lip-sync compatibility
  * Structure with clear intro/body/conclusion
  * Maintain 100-300 word limit
  * Optimize for AI voice synthesis

The workflow ensures comprehensive research is done first, then enhanced with relevant keywords, before being transformed into a natural-sounding script that's optimized for AI voice synthesis and talking head video generation.

This approach guarantees that the final script is both well-researched and properly formatted for the subsequent stages in the larger workflow (voice synthesis and video generation).
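The three chained steps above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual API: the callables standing in for skills #216, #218, and #171, and the `fake_*` placeholders, are all hypothetical. Only the chaining order and the 100-300 word limit come from the breakdown.

```python
from typing import Callable

MIN_WORDS, MAX_WORDS = 100, 300  # word limit stated in the breakdown


def run_scriptmaster(
    topic: str,
    style: str,
    research: Callable[[str], str],                # stand-in for skill #216
    keywords: Callable[[str], list[str]],          # stand-in for skill #218
    write: Callable[[str, list[str], str], str],   # stand-in for skill #171
) -> str:
    """Chain research -> keywords -> script, then enforce the word limit."""
    summary = research(topic)
    related = keywords(summary)
    script = write(summary, related, style)
    count = len(script.split())
    if not MIN_WORDS <= count <= MAX_WORDS:
        raise ValueError(
            f"script has {count} words, expected {MIN_WORDS}-{MAX_WORDS}"
        )
    return script


# Placeholder skills so the sketch runs offline (hypothetical behaviour).
def fake_research(topic: str) -> str:
    return f"Key facts about {topic}."


def fake_keywords(summary: str) -> list[str]:
    return [w.strip(".").lower() for w in summary.split() if len(w) > 4]


def fake_write(summary: str, related: list[str], style: str) -> str:
    body = " ".join(["word"] * 120)
    return f"Intro ({style}): {summary} {body} Keywords: {', '.join(related)}."
```

Calling `run_scriptmaster("solar panels", "casual", fake_research, fake_keywords, fake_write)` returns a script inside the limit; a writer that returns too few words raises `ValueError` instead of passing a short script downstream.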
SubAgent #1 - Diagram
Let me break down the VoiceForge subagent in detail:

A) SUBAGENT SUMMARY:
VoiceForge converts a text script into high-quality voice-over audio, ensuring proper formatting and optimal audio quality for lip-sync compatibility.

B) FINAL TASK OUTPUT:
An MP3 file URL containing clear, well-paced voice-over audio with consistent volume levels, sampled at 44.1kHz with a bitrate of 192kbps or higher (required for clean lip-sync). A 100-300 word script typically runs about 40 seconds to 2 minutes at roughly 150 words per minute.

C) SUBAGENT INPUT:
- Primary Input: Text script (100-300 words) from ScriptMaster
- Additional Input: Voice style preferences (if any)

E) SUBAGENT TASK SUMMARY:
The flow requires multiple steps to ensure optimal audio quality:
1. Initial Voice Generation:
   Input(text script) > #170 (Turn Script Into Voice Over MP3) > MP3_URL
2. Audio Quality Check:
   MP3_URL > #179 (Create Visual Waveform Of 60 second Wav/mp3 File) > Waveform_Image
   Waveform_Image > #176 (Analyze An Image With GPT Vision & Return Text) > Audio_Quality_Analysis
3. If Audio Quality Analysis indicates issues:
   - Convert format: MP3_URL > #178 (Convert 1-20 MP3s to wav) > WAV_URL
   - Extract crucial timing: WAV_URL > #198 (Get Transcription Of MP3 With Timings) > Timing_Data
   - Regenerate if needed: Return to step 1 with adjusted pacing based on timing data

F) SILOS:
SILO 1: VOICE GENERATION
- Purpose: Initial voice creation
- Skill: #170
- Input: Text script
- Output: Initial MP3

SILO 2: QUALITY VERIFICATION
- Purpose: Ensure audio meets lip-sync requirements
- Skills: #179, #176, #178
- Input: Initial MP3
- Output: Quality analysis and WAV format

SILO 3: TIMING OPTIMIZATION
- Purpose: Verify proper pacing and timing
- Skill: #198
- Input: WAV file
- Output: Timing verification and final MP3 URL

This structure ensures we get consistently high-quality voice output suitable for lip-syncing, with built-in quality checks and optimization steps.
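The generate, check, and regenerate loop in step 3 could be sketched like this. Everything here is an assumption layered on the breakdown: the three callables are hypothetical stand-ins for the skills, and the 150 words-per-minute target and the cap of three attempts are illustrative values the breakdown does not specify.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceResult:
    mp3_url: str
    attempts: int
    passed: bool


def run_voiceforge(
    script: str,
    generate: Callable[[str, float], str],  # stand-in for #170: (script, pace) -> MP3 URL
    passes_check: Callable[[str], bool],    # stand-in for #179 + #176 combined
    measure_wpm: Callable[[str], float],    # stand-in for #178 + #198 combined
    target_wpm: float = 150.0,              # assumed target speaking rate
    max_attempts: int = 3,                  # assumed cap; the breakdown gives none
) -> VoiceResult:
    """Generate audio, check it, and regenerate with adjusted pacing."""
    pace = 1.0
    mp3_url = ""
    for attempt in range(1, max_attempts + 1):
        mp3_url = generate(script, pace)
        if passes_check(mp3_url):
            return VoiceResult(mp3_url, attempt, True)
        measured = measure_wpm(mp3_url)
        if measured > 0:
            # Slow down if the audio was too fast, speed up if too slow.
            pace *= target_wpm / measured
    return VoiceResult(mp3_url, max_attempts, False)
```

Capping the attempts keeps a persistently failing quality check from looping forever; the caller can inspect `passed` and decide whether to accept the last MP3 or stop the workflow.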
SubAgent #2 - Diagram
Let me break down the AvatarVision subagent following the requested format:

A) SUBAGENT SUMMARY:
AvatarVision generates a high-quality, themed AI avatar image that matches the video's topic and style, optimized for talking head animation purposes.

B) FINAL TASK OUTPUT:
A square (1024x1024) PNG file with a transparent background, showing a professional-looking avatar head and shoulders with clean edges, a neutral expression, and clear facial features suitable for animation.

C) SUBAGENT INPUT:
- Topic/theme of the video
- Style preferences for avatar (professional, casual, specific profession, etc.)
- Any specific visual requirements (age, gender, ethnicity, clothing, etc.)

E) SUBAGENT TASK SUMMARY:
The workflow chains together as follows:
1. Input description > #223 (Powerful LLM) to enhance/optimize the avatar prompt
2. Enhanced prompt > #222 (Make Image With Text) to generate initial avatar attempt
3. Initial avatar URL > #176 (Analyze Image with GPT Vision) to verify suitability
4. If needed based on analysis > #221 (Recreate Image) to refine/improve
5. Final image > #191 (Resize Image) to ensure 1024x1024 dimension
6. Output: Final transparent PNG avatar image URL

F) SILOS:
SILO 1 - PROMPT OPTIMIZATION
- Input: Raw topic/style description
- Skill: #223 (Powerful LLM)
- Output: Optimized image generation prompt

SILO 2 - IMAGE GENERATION
- Input: Optimized prompt
- Skill: #222 (Make Image With Text)
- Output: Initial avatar image URL

SILO 3 - QUALITY CONTROL
- Input: Initial avatar URL
- Skill: #176 (Analyze Image with GPT Vision)
- Output: Analysis of avatar suitability
- Conditional: If analysis indicates issues, use #221 (Recreate Image)

SILO 4 - FINAL FORMATTING
- Input: Best avatar image
- Skill: #191 (Resize Image) to 1024x1024
- Output: Final formatted transparent PNG

This structured approach ensures we get a high-quality, appropriately-themed avatar that's properly formatted for the subsequent animation process, with quality control measures built in.
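One way to verify the final output spec (1024x1024, with an alpha channel) without relying on the vision skill is to read the PNG header directly. The sketch below is an illustrative add-on, not part of the listed skills: it parses the IHDR chunk defined by the PNG format, and the function names are hypothetical. It treats color types 4 and 6 as transparent-capable and does not detect palette images that carry transparency in a separate tRNS chunk.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
ALPHA_COLOR_TYPES = (4, 6)  # grayscale+alpha and RGBA


def read_png_header(data: bytes) -> tuple[int, int, int]:
    """Return (width, height, color_type) from the leading IHDR chunk."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, color_type


def meets_avatar_spec(data: bytes, size: int = 1024) -> bool:
    """Check that the image is size x size and has an alpha channel."""
    width, height, color_type = read_png_header(data)
    return width == size and height == size and color_type in ALPHA_COLOR_TYPES
```

Only the first 26 bytes of the file are needed, so the check can run on a partial download before the image is handed to the animation stage.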
SubAgent #3 - Diagram
Let me break down the LipSyncWizard subagent in detail:

A) SUBAGENT SUMMARY:
LipSyncWizard processes an audio file to generate precise phoneme timing data and maps it to viseme (visual phoneme) positions for accurate lip-sync animation of an AI avatar.

B) FINAL TASK OUTPUT:
A structured JSON data file containing:
- Timestamped phoneme sequences
- Corresponding viseme mappings
- Head movement timing markers
- Facial expression change points
- Audio amplitude data for emphasis

C) SUBAGENT INPUT:
- Voice-over MP3 file URL
- Transcription with word-level timing
- Avatar image reference (for motion constraints)

E) SUBAGENT TASK SUMMARY:
1. Convert MP3 to WAV for analysis
   #178 - Convert 1-20 MP3s to wav
2. Generate detailed audio analysis
   #198 - Get Transcription Of MP3 (With Timings)
   #179 - Create Visual Waveform Of 60 second Wav/mp3 File
   #180 - Extract Beatpoints & Tempo of MP3
3. Process transcription for phoneme mapping
   #185 - Write Text (Or Copy) From Inputted Text [to convert transcription into phoneme sequences]
4. Generate amplitude visualization for emphasis points
   #176 - Analyze An Image With GPT Vision & Return Text [to analyze waveform]
5. Compile final data structure
   #223 - Powerful LLM Prompt-to-Text Response [to format all data into required JSON structure]

F) SILOS:
SILO 1 - AUDIO PREPROCESSING
- Input: MP3 URL
- Skill: #178 (Convert to WAV)
- Output: WAV file URL

SILO 2 - AUDIO ANALYSIS
- Input: WAV URL
- Skills: #198, #179, #180
- Outputs:
  * Detailed transcription with timing
  * Waveform visualization
  * Beat/tempo data

SILO 3 - PHONEME MAPPING
- Input: Transcription
- Skill: #185
- Output: Phoneme sequence data

SILO 4 - EMPHASIS DETECTION
- Input: Waveform image
- Skill: #176
- Output: Amplitude analysis

SILO 5 - DATA COMPILATION
- Inputs: All previous outputs
- Skill: #223
- Output: Final JSON structure

Each silo operates sequentially, with the final silo combining all data into the structured format required for avatar animation.
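The breakdown lists what the final JSON contains but not its schema, so the following is only one possible shape. All field names, the tiny phoneme-to-viseme table, and the `build_lipsync_payload` helper are hypothetical; a real viseme set would be much larger and would depend on the animation system.

```python
import json
from dataclasses import asdict, dataclass

# Illustrative subset only; a real mapping would cover a full phoneme set.
PHONEME_TO_VISEME = {
    "M": "closed", "B": "closed", "P": "closed",
    "F": "teeth_on_lip", "V": "teeth_on_lip",
    "AA": "open", "OW": "round",
}


@dataclass
class PhonemeEvent:
    phoneme: str
    start: float  # seconds from the start of the audio
    end: float
    viseme: str


def build_lipsync_payload(phonemes, head_markers, expression_points, amplitude):
    """Combine the five data items from the breakdown into one dict."""
    events = [
        PhonemeEvent(p, start, end, PHONEME_TO_VISEME.get(p, "neutral"))
        for p, start, end in phonemes
    ]
    return {
        "phonemes": [asdict(e) for e in events],
        "head_movement_markers": list(head_markers),
        "expression_change_points": list(expression_points),
        "amplitude": list(amplitude),
    }


payload = build_lipsync_payload(
    phonemes=[("M", 0.00, 0.08), ("AA", 0.08, 0.25)],
    head_markers=[0.5, 1.2],
    expression_points=[0.9],
    amplitude=[0.1, 0.7, 0.4],
)
print(json.dumps(payload, indent=2))
```

Building the structure in code rather than asking the LLM skill (#223) to format it free-hand would make the output easier to validate, though that is a design suggestion and not something the breakdown prescribes.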
4 Template & Links
Let me break down the VideoAssemblerPro subagent following your guidelines:

A) SUBAGENT SUMMARY:
A specialized video assembly system that combines an AI avatar image, voice-over audio, and lip-sync data to create a synchronized talking head video with natural-looking mouth movements.

B) FINAL TASK OUTPUT:
MP4 video file (1920x1080, 16:9 aspect ratio) featuring the AI avatar with synchronized lip movements matching the voice-over audio, duration matching the input audio file length.

C) SUBAGENT INPUT:
- PNG URL of the AI-generated avatar image
- MP3 URL of the voice-over audio
- Text transcription with timing data (for lip-sync matching)

E) SUBAGENT TASK SUMMARY:
Input > #198 Get Transcription Of MP3 (With Timings) > #178 Convert MP3 to WAV > #168 Generate Talking Head Video From MP3 & transcription > #191 Resize Image (if needed to match 16:9) > Output

The flow works like this:
1. First, get precise timing data from the MP3 using transcription
2. Convert audio to WAV format for better processing
3. Generate the base talking head video using the core animation system
4. Resize if needed to ensure proper video dimensions
5. Return final MP4

F) SILOS:
SILO 1 - AUDIO PREPARATION
- Input: MP3 URL
- Skill #198: Generate precise transcription with timings
- Skill #178: Convert to WAV format
- Output: WAV file + timing data

SILO 2 - VIDEO GENERATION
- Input: Avatar PNG, WAV file, timing data
- Skill #168: Generate base talking head video
- Skill #191: Resize to final dimensions if needed
- Output: Final MP4 video

NOTE: While this workflow utilizes existing skills, there may be limitations in the current skill set for creating truly naturalistic lip-sync. The #168 skill (Generate Talking Head Video) is doing much of the heavy lifting here, and its capabilities for lip-sync quality should be carefully evaluated. If enhanced lip-sync quality is required, additional specialized skills might need to be developed.
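The resize step has to reconcile a square 1024x1024 avatar with a 1920x1080 frame. The breakdown does not say whether skill #191 stretches, crops, or pads, so the sketch below assumes one common choice: scale to fit while preserving aspect ratio, then center with padding. The `fit_into_frame` helper is hypothetical.

```python
def fit_into_frame(
    src_w: int, src_h: int, frame_w: int = 1920, frame_h: int = 1080
) -> tuple[int, int, int, int]:
    """Return (new_w, new_h, pad_x, pad_y) for an aspect-preserving fit.

    pad_x and pad_y are the left and top offsets that center the scaled
    image inside the frame.
    """
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source dimensions must be positive")
    scale = min(frame_w / src_w, frame_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    return new_w, new_h, (frame_w - new_w) // 2, (frame_h - new_h) // 2


print(fit_into_frame(1024, 1024))  # (1080, 1080, 420, 0)
```

For the square avatar this gives a 1080x1080 image with 420 pixels of padding on each side, which avoids the distortion that stretching a square source to 16:9 would introduce.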
5 Template & Links
I notice that you've asked me to analyze "subagent 6" but in the original subagent list, there were only 5 subagents defined (ScriptMaster, VoiceForge, AvatarVision, LipSyncWizard, and VideoAssemblerPro). There is no subagent 6 to analyze.

Would you like me to:
1. Analyze one of the existing 5 subagents in detail
2. Propose a new subagent 6 that might enhance the workflow
3. Focus on a different aspect of the workflow

Please let me know which direction you'd prefer to take, and I'll provide a detailed analysis following the format you've specified.
6 Template & Links