graph TD
    A[Input: MP3 URL + Avatar Image] --> B[Convert MP3 to WAV]
    B --> C[Generate Visual Waveform]
    B --> D[Extract Audio Timing]
    C --> E[Analyze Waveform Patterns]
    D --> F[Timing Data Collection]
    E --> F
    F --> G[Generate JSON Structure]
    G --> H[Final Lip-Sync Dataset]

    subgraph SILO1[Audio Preprocessing]
        B
        C
    end

    subgraph SILO2[Timing Extraction]
        D
        E
        F
    end

    subgraph SILO3[Data Structuring]
        G
    end

    style A fill:#f9f,stroke:#333
    style H fill:#9ff,stroke:#333
    style SILO1 fill:#ffe6e6
    style SILO2 fill:#e6ffe6
    style SILO3 fill:#e6e6ff