Ok so I want to build a React app that lets me edit HTML pages using my voice.
This should be a progressive web app (PWA), so we need a manifest.json (even though service worker offline support will be limited, as the Whisper API needs to be online).
TECH STACK #
We will need the Whisper API (for transcription) and OpenAI (via OpenRouter, for the code edits).
We will want to handle voice recording with RecordRTC (to ensure voice recording compatibility with iOS).
This initial version will be single-user. We will save sessions with Firebase, but with no login/multi-user required.
OVERVIEW OF APP #
Here's how the app will work: I will import a block of HTML, a script will segment it into chunks, then I load up the HTML in an iframe. I select a chunk, talk out the changes, Whisper API transcribes my voice, then the transcription is wrapped inside a prompt (along with the selected code chunk and other instructions). The LLM then returns the code - edited as per my transcribed instructions - and the new code replaces the old code within the iframe. I can then save and export the edited page.
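The "wrap the transcription inside a prompt" and "new code replaces the old code" steps could be sketched roughly like this. This is only a minimal sketch: the function names `buildEditPrompt` and `replaceChunk`, the instruction wording, and the role/content message shape (the one used by chat-style LLM APIs) are all assumptions for illustration, not part of the spec.

```javascript
// Hypothetical helper: wraps the Whisper transcription and the selected
// code chunk into chat messages. The instruction wording is an assumption.
function buildEditPrompt(selectedChunk, transcription) {
  const system = [
    "You edit HTML. Apply the user's spoken instructions to the HTML chunk.",
    "Return only the edited HTML for that chunk, with no commentary.",
  ].join(" ");

  const user = [
    "INSTRUCTIONS (transcribed from voice):",
    transcription.trim(),
    "",
    "HTML CHUNK:",
    selectedChunk,
  ].join("\n");

  return [
    { role: "system", content: system },
    { role: "user", content: user },
  ];
}

// Hypothetical helper: swaps the old chunk for the edited chunk in the page source.
// Only the first occurrence of oldChunk is replaced.
function replaceChunk(pageHtml, oldChunk, newChunk) {
  // A replacer function is used so "$" sequences in newChunk are inserted literally.
  return pageHtml.replace(oldChunk, () => newChunk);
}
```

The messages array would be sent to the LLM, and the returned code passed to `replaceChunk` before the iframe is refreshed.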
This is v1 so we are not adding all the features here at once (although they may appear in the UI frontend as placeholders).
OVERVIEW OF PAGES #
There will be 3 pages - import / current / sessions
IMPORT - user pastes in HTML to edit, it is imported with a script and saved to Firebase as a "new session"
CURRENT - displays the current (most recent) session. The HTML appears in an iframe; the user selects an area to edit, talks out the changes, and the edit is applied.
SESSIONS - table which displays all sessions. The user can download the final HTML as a zip or continue editing.
NAVBAR #
We will need a navbar with `LOGO.png` on the left and import / current / sessions on the right.
Here is a detailed overview of each page:
FRONTEND - IMPORT PAGE #
The import page lets users import the HTML by pasting it into a big text area and clicking import.
As well as the `html to import` text area we will want a few extra fields:
- Title: (text field)
- Type: (radio buttons to select between `Agent X`, `Raw text` and `all HTML`, with `Agent X` being the default and the only option to start. Later on we will add different rules, and use different components at each stage, when we bring on new types, but to start they must select `Agent X` as the only checkable option)
- Folder URL (text field - although this is just a placeholder for now and isn't used)
- Tag to divide: (text field - although this is just a placeholder for now and isn't used)
Then there will be a big import button.
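As a rough sketch of the form state behind these fields: the names `IMPORT_TYPES`, `defaultImportForm` and `validateImportForm`, the field keys, and the rule that the HTML must not be empty are all assumptions for illustration.

```javascript
// Type options from the Type field. Only "Agent X" is checkable in v1.
const IMPORT_TYPES = [
  { label: "Agent X", enabled: true },
  { label: "Raw text", enabled: false },
  { label: "all HTML", enabled: false },
];

// Hypothetical default form state; the key names are illustrative.
function defaultImportForm() {
  return {
    html: "",
    title: "",
    type: "Agent X", // default and only selectable type for now
    folderUrl: "",   // placeholder, not used yet
    tagToDivide: "", // placeholder, not used yet
  };
}

// Returns a list of problems; an empty list means the import can run.
function validateImportForm(form) {
  const errors = [];
  if (!form.html.trim()) {
    errors.push("HTML to import is empty"); // assumed rule
  }
  const selected = IMPORT_TYPES.find((t) => t.label === form.type);
  if (!selected || !selected.enabled) {
    errors.push("Type must be Agent X for now");
  }
  return errors;
}
```

The import button would only run the import script when `validateImportForm` returns an empty list.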
BACKEND - IMPORT SCRIPT #
Clicking `import` will run the `import-agentx.js` script, as follows (later we will add other scripts to import other types):
Firstly, we should create a new session - with an identifier (eg 12345), the title, and all the other fields the user entered. SVGs, HTML saves and more will be added to this session, as described next...
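The session record could look something like this. It is a sketch only: `createSession`, the key names, the timestamp-based default id, and the `svgs` / `htmlSaves` containers are assumptions, and the actual write to Firebase is left out.

```javascript
// Hypothetical builder for a new session record.
// id defaults to a timestamp string; any unique identifier would do.
function createSession(form, id = String(Date.now()), now = new Date()) {
  return {
    id,
    title: form.title,
    type: form.type,
    folderUrl: form.folderUrl,
    tagToDivide: form.tagToDivide,
    createdAt: now.toISOString(),
    svgs: {},      // svg1, svg2, ... are added by the SVG step
    htmlSaves: [], // later HTML saves are appended here
  };
}
```

For example, `createSession(form, "12345")` gives a record that can be saved under the session id and added to by the later steps.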
Secondly, we search the HTML document for any SVGs and, for each one, save it as svg1, svg2, etc., and replace everything inside with [SVG1], [SVG2], etc. (so the first SVG in the HTML will now be replaced with [SVG1]).
Then we need to save each of these SVGs to Firebase, associated with this session (we will want to re-add the SVGs later).
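The SVG step could be sketched as below. The names `extractSvgs` and `restoreSvgs` are hypothetical, and this sketch assumes the whole inline `<svg>...</svg>` element (tags included) is swapped for the placeholder. It uses a simple regex, so nested `<svg>` elements would not be handled correctly.

```javascript
// Pulls every inline <svg>...</svg> out of the HTML and leaves [SVG1], [SVG2], ...
// Returns the stripped HTML plus a map { svg1: "<svg ...>", svg2: ... } to save.
function extractSvgs(html) {
  const svgs = {};
  let count = 0;
  const stripped = html.replace(/<svg\b[\s\S]*?<\/svg>/gi, (match) => {
    count += 1;
    svgs[`svg${count}`] = match;
    return `[SVG${count}]`;
  });
  return { html: stripped, svgs };
}

// Puts the saved SVGs back in place of their placeholders.
// Unknown placeholders are left untouched.
function restoreSvgs(html, svgs) {
  return html.replace(/\[SVG(\d+)\]/g, (placeholder, n) => {
    const saved = svgs[`svg${n}`];
    return saved !== undefined ? saved : placeholder;
  });
}
```

The `svgs` map is what would be saved to Firebase against the session, and `restoreSvgs` is the "re-add later" step.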
Thirdly, we will want to insert separators, so that before each