graph TD
A["Input: MP3 URL + Avatar Image"] --> B[Convert MP3 to WAV]
B --> C[Generate Visual Waveform]
B --> D[Extract Audio Timing]
C --> E[Analyze Waveform Patterns]
D --> F[Timing Data Collection]
E --> F
F --> G[Generate JSON Structure]
G --> H[Final Lip-Sync Dataset]
subgraph SILO1[Audio Preprocessing]
    B
    C
end
subgraph SILO2[Timing Extraction]
    D
    E
    F
end
subgraph SILO3[Data Structuring]
    G
end
style A fill:#f9f,stroke:#333
style H fill:#9ff,stroke:#333
style SILO1 fill:#ffe6e6
style SILO2 fill:#e6ffe6
style SILO3 fill:#e6e6ff