graph TD
    Start[Start VideoAssembler] --> Input1[Receive MP3 URL]
    Start --> Input2[Receive PNG URL]
    Start --> Input3[Receive Script Text]

    subgraph SILO1[Audio Analysis]
        Input1 --> A1[Extract Audio Timing]
        A1 --> A2[Generate Timestamped Transcription]
    end

    subgraph SILO2[Video Generation]
        Input2 --> V1[Process Avatar Image]
        A2 --> V2[Map Timings to Lip Movements]
        V1 --> V3[Generate Initial MP4]
        V2 --> V3
        Input3 --> V2
    end

    subgraph SILO3[Quality Verification]
        V3 --> Q1[Extract Key Frames]
        V3 --> Q2[Verify Audio Sync]
        Q1 --> Q3[Quality Check]
        Q2 --> Q3
    end

    Q3 --> Decision{Meets Quality Standards?}
    Decision -->|Yes| Output[Final MP4 Video]
    Decision -->|No| V2
    Output --> End[End VideoAssembler]