graph TD
Start[Start VideoAssembler] --> Input1[Receive MP3 URL]
Start --> Input2[Receive PNG URL]
Start --> Input3[Receive Script Text]
%% Stage 1: derive timing data and a timestamped transcription from the MP3 input
subgraph SILO1[Audio Analysis]
Input1 --> A1[Extract Audio Timing]
A1 --> A2[Generate Timestamped Transcription]
end
%% Stage 2: combine the avatar image, transcription timings, and script text into an initial MP4
subgraph SILO2[Video Generation]
Input2 --> V1[Process Avatar Image]
A2 --> V2[Map Timings to Lip Movements]
V1 --> V3[Generate Initial MP4]
V2 --> V3
Input3 --> V2
end
%% Stage 3: inspect key frames and audio sync of the generated MP4
subgraph SILO3[Quality Verification]
V3 --> Q1[Extract Key Frames]
V3 --> Q2[Verify Audio Sync]
Q1 --> Q3[Quality Check]
Q2 --> Q3
end
Q3 --> Decision{Meets Quality Standards?}
Decision -->|Yes| Output[Final MP4 Video]
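%% Retry loop: a failed check returns to lip-movement mapping, so the MP4 is regenerated and verified again
Decision -->|No| V2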
Output --> End[End VideoAssembler]