graph TD
A[Input MP3] --> B[Convert to WAV]
A --> C[Get Transcription]
B --> D[Create Waveform]
subgraph SILO1[Audio Analysis]
    B
    C
    D
end
D --> E[Cut Audio into Phonemes]
D --> F[Analyze Amplitude]
subgraph SILO2[Phoneme Extraction]
    E
    F
end
C --> G[Generate Viseme Map]
H[Input Avatar Image] --> I[Analyze Mouth Constraints]
G --> J[Create Final JSON]
I --> J
F --> J
E --> J
subgraph SILO3[Animation Mapping]
    G
    I
    J
end
J --> K[Animation Data Output]
style SILO1 fill:#f9f,stroke:#333,stroke-width:2px
style SILO2 fill:#bbf,stroke:#333,stroke-width:2px
style SILO3 fill:#bfb,stroke:#333,stroke-width:2px