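graph TD
%% Talking-head video pipeline: an avatar PNG and a voice-over MP3 go in, a trimmed MP4 comes out.
%% Each subgraph below is one silo (stage) of the pipeline; skill numbers identify the step that runs there.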
A[Input] --> B[Avatar PNG]
A --> C[Voice-over MP3]
subgraph SILO1[Avatar Preparation]
B --> D[Skill 191: Resize Image]
D --> E[Resized Avatar 1080x1080]
end
subgraph SILO2[Audio Analysis]
C --> F[Skill 198: Get MP3 Transcription]
F --> G[Lip-sync Timing Data]
end
subgraph SILO3[Video Generation]
E --> H[Skill 168: Generate Video]
G --> H
C --> H
H --> I[Raw MP4]
end
subgraph SILO4[Video Finalization]
I --> J[Skill 194: Trim Video]
J --> K[Final MP4]
end
K --> L[Output: Talking Head Video]
%% Color-code the four silos so the stages are easy to tell apart.
style SILO1 fill:#f9f,stroke:#333,stroke-width:2px
style SILO2 fill:#bbf,stroke:#333,stroke-width:2px
style SILO3 fill:#bfb,stroke:#333,stroke-width:2px
style SILO4 fill:#fbf,stroke:#333,stroke-width:2px