graph TD
A[Input: MP3 URL] --> B[SILO 1: AUDIO PREPARATION]
B --> C[Convert MP3 to WAV]
C --> D[WAV File URL]
D --> E[SILO 2: SPEECH ANALYSIS]
E --> F[Generate Transcription]
E --> G[Create Waveform]
F --> H[Transcription with Timings]
G --> I[Waveform Image]
H --> J[SILO 3: WAVEFORM ANALYSIS]
I --> J
J --> K[Analyze with GPT Vision]
J --> L[Extract Beat Points]
K --> M[Amplitude Data]
L --> N[Rhythm Markers]
M --> O[SILO 4: DATA SYNTHESIS]
N --> O
H --> O
O --> P[LLM Processing]
P --> Q[Final JSON Output]
style B fill:#f9f,stroke:#333
style E fill:#f9f,stroke:#333
style J fill:#f9f,stroke:#333
style O fill:#f9f,stroke:#333