graph TD
    A[Receive Video Input]
    B[Extract MP3 from Video]
    C[Initialize Speech-to-Text]
    D[Process Audio Stream]
    E[Generate Initial Transcript]
    F[Add Timestamp Markers]
    G[Format as JSON]
    H[Validate JSON Structure]
    I[Output Final Transcript]

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
    H -->|Valid| I
    H -->|Invalid| G

    subgraph Transcription Processor
        B
        C
        D
        E
        F
        G
        H
    end