graph TD
A[Start LipSyncWizard] --> B[SILO 1: Audio Preparation]
B --> C[Convert MP3 to WAV]
B --> D[Generate Waveform Analysis]
C --> E[SILO 2: Speech Analysis]
D --> E
E --> F[Extract Beat Points]
E --> G[Generate Transcription]
E --> H[Extract Audio Stems]
F --> I[SILO 3: Phoneme Mapping]
G --> I
H --> I
I --> J[Process Transcription to Phonemes]
I --> K[Create Phoneme Segments]
J --> L[Generate Final JSON]
K --> L
L --> M[Output Lip-Sync Data]
subgraph inputs["Input Files"]
    N[Voice-over MP3]
    O[Avatar Reference]
    P[Transcription]
end
N --> B
O --> I
P --> E