graph TD
    A[Start] --> B[Check Video Input]
    B --> C[Extract Audio Track]
    C --> D[Convert Audio to MP3]
    D --> E[Initialize Speech-to-Text]
    E --> F[Process Audio Chunks]
    F --> G[Generate Raw Transcript]
    G --> H[Add Timestamps]
    H --> I[Format to JSON]
    I --> J[Validate JSON Structure]
    J --> K[Save Timestamped Transcript]
    K --> L[End]

    subgraph Input Validation
        B
    end
    subgraph Audio Processing
        C
        D
    end
    subgraph Speech Recognition
        E
        F
        G
    end
    subgraph Timestamp Addition
        H
    end
    subgraph JSON Formatting
        I
        J
        K
    end