graph TD
    subgraph VideoProcessingAgent
        A[Video Input File] --> B[Extract Audio Track]
        B --> C[Convert to MP3]
        C --> D[Process through Speech-to-Text API]
        D --> E[Generate Initial Transcript]
        E --> F[Add Timestamp Markers]
        F --> G[Format to JSON Structure]
        G --> H[Validate JSON Format]
        H --> I[Clean Transcript Text]
        I --> J[Output Timestamped Transcript]
    end

    subgraph InputChecks
        K[Check Video Format]
        L[Verify File Size]
        M[Check Audio Quality]
    end

    subgraph ErrorHandling
        N[Log Processing Errors]
        O[Retry Failed Steps]
        P[Alert System]
    end

    K --> A
    L --> A
    M --> B
    D --> N
    N --> O
    O --> D
    N --> P