graph TD
    A[Input MP3] --> B[Convert to WAV]
    A --> C[Get Transcription]
    B --> D[Create Waveform]

    subgraph SILO1[Audio Analysis]
        B
        C
        D
    end

    D --> E[Cut Audio to Phonemes]
    D --> F[Analyze Amplitude]

    subgraph SILO2[Phoneme Extraction]
        E
        F
    end

    C --> G[Generate Viseme Map]
    H[Input Avatar Image] --> I[Analyze Mouth Constraints]
    G --> J[Create Final JSON]
    I --> J
    F --> J
    E --> J

    subgraph SILO3[Animation Mapping]
        G
        I
        J
    end

    J --> K[Animation Data Output]

    style SILO1 fill:#f9f,stroke:#333,stroke-width:2px
    style SILO2 fill:#bbf,stroke:#333,stroke-width:2px
    style SILO3 fill:#bfb,stroke:#333,stroke-width:2px