graph TD
    A["Input MP3 & Script"] --> B[SILO 1: Audio Preprocessing]

    subgraph Preprocessing
        B --> B1[Convert MP3 to WAV]
        B1 --> B2[Extract Vocal Stem]
    end

    B2 --> C[SILO 2: Amplitude Mapping]
    B2 --> D[SILO 3: Speech Timing]

    subgraph Amplitude Analysis
        C --> C1[Generate Waveform]
        C1 --> C2[Analyze Visual Pattern]
        C2 --> C3[Extract Amplitude Data]
    end

    subgraph Timing Analysis
        D --> D1[Get Timestamped Transcription]
        D1 --> D2[Extract Rhythm Points]
        D2 --> D3[Map Phoneme Timings]
    end

    C3 --> E[SILO 4: Data Compilation]
    D3 --> E

    subgraph Final Processing
        E --> E1[Structure JSON Output]
        E1 --> E2[Map Viseme Positions]
    end

    E2 --> F[Output: Lip-Sync Data File]