graph TD
START([Start]) --> IN["Input: MP3 & Avatar Image"]
IN --> CONV[Convert MP3 to WAV]
IN --> IMG[Analyze Avatar Reference Image]
subgraph AudioProcessing[Audio Processing Silo]
CONV --> TRANS[Generate Detailed Transcription]
TRANS --> WAVE[Create Audio Waveform]
WAVE --> BEAT[Extract Beat Points]
end
subgraph VisualAnalysis[Visual Analysis Silo]
IMG --> VREF[Vision Analysis of Avatar]
WANA[Vision Analysis of Waveform]
end
WAVE --> WANA
subgraph DataStructuring[Data Structuring Silo]
PROC[Process Data via LLM] --> JSON[Format to JSON]
end
BEAT --> PROC
WANA --> PROC
VREF --> PROC
JSON --> OUT([Output: Structured Lipsync Data])
style AudioProcessing fill:#f9f,stroke:#333
style VisualAnalysis fill:#bbf,stroke:#333
style DataStructuring fill:#bfb,stroke:#333