graph TD
A["Input MP3 & Script"] --> B[SILO 1: Audio Preprocessing]
subgraph Preprocessing
    B --> B1[Convert MP3 to WAV]
    B1 --> B2[Extract Vocal Stem]
end
B2 --> C[SILO 2: Amplitude Mapping]
B2 --> D[SILO 3: Speech Timing]
subgraph Amplitude_Analysis ["Amplitude Analysis"]
    C --> C1[Generate Waveform]
    C1 --> C2[Analyze Visual Pattern]
    C2 --> C3[Extract Amplitude Data]
end
subgraph Timing_Analysis ["Timing Analysis"]
    D --> D1[Get Timestamped Transcription]
    D1 --> D2[Extract Rhythm Points]
    D2 --> D3[Map Phoneme Timings]
end
C3 --> E[SILO 4: Data Compilation]
D3 --> E
subgraph Final_Processing ["Final Processing"]
    E --> E1[Structure JSON Output]
    E1 --> E2[Map Viseme Positions]
end
E2 --> F[Output: Lip-Sync Data File]