User Input: Avatar & Audio Files

Image Processing Silo
    Input Avatar PNG
    Resize to 1024x1024
    GPT Vision Analysis
    Feature Map Output

Audio Processing Silo
    Input MP3
    Get Transcription with Timings
    Extract Clean Audio
    Phoneme Data Output

Video Assembly Silo
    Combine Components
    Generate Raw Video
    Cut and Clean Video
    Final MP4 Output

Final Talking Head Video
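
The stage labels above can be read as a dependency graph. The sketch below is one possible reading, not the actual implementation: the stage names are taken from the diagram, but every edge between them is an assumption inferred from the label order and the silo grouping, since the connecting arrows are not recorded here. In particular, it assumes the image and audio silos run independently and both feed "Combine Components". The function `execution_order` is a hypothetical helper that uses the standard-library `graphlib` module (Python 3.9+) to derive a valid run order.

```python
from graphlib import TopologicalSorter

USER_INPUT = "User Input: Avatar & Audio Files"

# Maps each stage to the set of stages it depends on.
# Stage names come from the diagram; the edges are ASSUMED, not confirmed.
PIPELINE = {
    # Image Processing Silo
    "Input Avatar PNG": {USER_INPUT},
    "Resize to 1024x1024": {"Input Avatar PNG"},
    "GPT Vision Analysis": {"Resize to 1024x1024"},
    "Feature Map Output": {"GPT Vision Analysis"},
    # Audio Processing Silo
    "Input MP3": {USER_INPUT},
    "Get Transcription with Timings": {"Input MP3"},
    "Extract Clean Audio": {"Get Transcription with Timings"},
    "Phoneme Data Output": {"Extract Clean Audio"},
    # Video Assembly Silo (assumed to join the two silos above)
    "Combine Components": {"Feature Map Output", "Phoneme Data Output"},
    "Generate Raw Video": {"Combine Components"},
    "Cut and Clean Video": {"Generate Raw Video"},
    "Final MP4 Output": {"Cut and Clean Video"},
    "Final Talking Head Video": {"Final MP4 Output"},
}


def execution_order(graph):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, stage in enumerate(execution_order(PIPELINE), start=1):
        print(f"{step:2d}. {stage}")
```

Under these assumed edges, the two processing silos share no dependencies, so a scheduler could run them in parallel and only synchronize at "Combine Components". The relative order of image and audio stages in the printed list is therefore not significant.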