graph TD
    A[Input] --> B[Avatar PNG]
    A --> C[Voice-over MP3]

    subgraph SILO1[Avatar Preparation]
        B --> D[Skill 191: Resize Image]
        D --> E[Resized Avatar 1080x1080]
    end

    subgraph SILO2[Audio Analysis]
        C --> F[Skill 198: Get MP3 Transcription]
        F --> G[Lip-sync Timing Data]
    end

    subgraph SILO3[Video Generation]
        E --> H[Skill 168: Generate Video]
        G --> H
        C --> H
        H --> I[Raw MP4]
    end

    subgraph SILO4[Video Finalization]
        I --> J[Skill 194: Trim Video]
        J --> K[Final MP4]
    end

    K --> L[Output: Talking Head Video]

    style SILO1 fill:#f9f,stroke:#333,stroke-width:2px
    style SILO2 fill:#bbf,stroke:#333,stroke-width:2px
    style SILO3 fill:#bfb,stroke:#333,stroke-width:2px
    style SILO4 fill:#fbf,stroke:#333,stroke-width:2px