graph TD
    A[Start LipSyncWizard] --> B[SILO 1: Audio Preparation]
    B --> C[Convert MP3 to WAV]
    B --> D[Generate Waveform Analysis]
    C --> E[SILO 2: Speech Analysis]
    D --> E
    E --> F[Extract Beat Points]
    E --> G[Generate Transcription]
    E --> H[Extract Audio Stems]
    F --> I[SILO 3: Phoneme Mapping]
    G --> I
    H --> I
    I --> J[Process Transcription to Phonemes]
    I --> K[Create Phoneme Segments]
    J --> L[Generate Final JSON]
    K --> L
    L --> M[Output Lip-Sync Data]
    subgraph Input Files
        N[Voice-over MP3]
        O[Avatar Reference]
        P[Transcription]
    end
    N --> B
    O --> I
    P --> E