graph TD
A[MP3 URL Input] --> B[Convert to WAV]
B --> C[Generate Waveform]
C --> D[Process Audio]
subgraph SILO1[Audio Preparation]
B
C
end
D --> E[Get Transcription]
D --> F[Extract Timing Data]
subgraph SILO2[Timing Analysis]
E
F
end
E --> G[Combine Audio Data]
F --> G
H[Avatar Reference Image] --> I[Generate Viseme Map]
G --> I
subgraph SILO3[Viseme Mapping]
I
end
I --> J[Create JSON Animation Data]
J --> K[Final Lipsync Output]
style SILO1 fill:#e6f3ff,stroke:#333,stroke-width:2px
style SILO2 fill:#f0fff0,stroke:#333,stroke-width:2px
style SILO3 fill:#fff0f0,stroke:#333,stroke-width:2px