What it does
Upload an audio file — MP3, WAV, M4A, FLAC — or hit record in the browser. You get structured notes back: speakers labeled, key points pulled out, timestamps throughout. Not a raw transcript.
ChatGPT doesn’t accept audio. Gemini takes audio uploads up to 100MB but doesn’t identify speakers and forces you to split anything over 30 minutes. This tool does transcription, speaker diarization, and note organization in one pass.
Runs in the browser — nothing to install, no bot joining a call. For video files or YouTube links, use the video to notes converter.
What’s in the notes:
- Speaker-labeled sections (up to 10 voices)
- Key points and action items, not a raw transcript
- Timestamps linking back to the audio
- 99 languages, auto-detected
Accuracy sits around 95% on clear audio. A 60-minute recording finishes in a few minutes. Files are encrypted and never used to train models, and the service is SOC 2 Type II compliant.
How it works
- Upload or record — MP3, WAV, M4A, FLAC, OGG, or tap record in the browser.
- AI transcribes and labels — Speaker diarization runs automatically. Key points and action items get extracted.
- Review and export — PDF, Word, plain text, or Markdown. Timestamps stay clickable.
Context carries through long recordings, so a 2-hour interview stays coherent instead of losing track of who’s talking.
Setting up audio notes by meeting platform
Each major meeting platform stores recordings in a slightly different place and format. The walkthroughs below reflect how each platform behaves as of May 2026.
Zoom
After the call ends, Zoom takes a few minutes to process the recording, then saves an .m4a audio file (plus the .mp4 video) to your Zoom cloud or to the local Documents/Zoom/<meeting>/ folder. Upload the M4A directly. Speaker diarization works best on local recordings made with “Record a separate audio file for each participant” enabled, because Zoom then captures each participant on their own audio track. Output is structured notes with per-attendee action items.
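If you record locally and often, finding the right file can be scripted. The following is a minimal sketch, assuming the default local layout described above (a Documents/Zoom folder with one subfolder per meeting). The function name `newest_zoom_audio` is illustrative and not part of Zoom or ScreenApp.

```python
from pathlib import Path
from typing import Optional


def newest_zoom_audio(zoom_dir: Path) -> Optional[Path]:
    """Return the most recently modified .m4a file under zoom_dir, or None.

    Assumes Zoom's local layout of one subfolder per meeting, so the
    search is recursive.
    """
    candidates = [p for p in zoom_dir.rglob("*.m4a") if p.is_file()]
    # max() with default=None avoids an error when no recording exists yet.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Calling `newest_zoom_audio(Path.home() / "Documents" / "Zoom")` would return the latest recording's audio file, ready to upload. Adjust the path if you changed the recording location in Zoom's settings.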
Google Meet
Google Meet recordings save to the host’s Google Drive as MP4. The audio is muxed into the video, so either upload the MP4 directly or run it through the audio extractor first. The free Meet tier does not include cloud recording, so capturing the call with a phone or laptop mic is the fallback. Output is a topic-grouped note with timestamps that link back to the matching moment in the recording.
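If you would rather pull the audio out locally before uploading, one common alternative is ffmpeg. The sketch below assumes ffmpeg is installed and on your PATH, and that the MP4 carries an AAC audio track (typical for MP4, but worth checking). The function names are illustrative.

```python
import subprocess
from pathlib import Path


def build_extract_command(video: Path, audio_out: Path) -> list[str]:
    """Build an ffmpeg command that copies the audio track out of a video.

    -vn drops the video stream; -c:a copy keeps the audio as-is, so
    nothing is re-encoded and no quality is lost.
    """
    return ["ffmpeg", "-i", str(video), "-vn", "-c:a", "copy", str(audio_out)]


def extract_audio(video: Path, audio_out: Path) -> None:
    """Run the extraction; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_extract_command(video, audio_out), check=True)
```

For example, `extract_audio(Path("meet.mp4"), Path("meet.m4a"))` would write an audio-only M4A next to the video. If the audio track is not AAC, removing `-c:a copy` lets ffmpeg re-encode to a codec that fits the output container.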
Microsoft Teams
Teams recordings land as MP4 in OneDrive (1:1 calls and other non-channel meetings) or in the channel’s SharePoint site (channel meetings). Live transcripts from Teams can also be downloaded as VTT, which you can paste in for a faster pass that skips re-transcription. When a VTT transcript is provided alongside the audio, the output keeps the Teams speaker labels.
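To see what a VTT transcript carries, here is a rough sketch of a parser that reads cue timings and speaker names. It assumes speakers appear in WebVTT voice tags (`<v Name>...</v>`), which is how Teams transcripts are commonly formatted. This is not ScreenApp's parser, and the `Cue` type and `parse_vtt` name are made up for illustration.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    start: str
    end: str
    speaker: str
    text: str


# WebVTT timestamps: hours are optional, milliseconds are required.
_TS = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
TIMING = re.compile(rf"({_TS})\s+-->\s+({_TS})")
# Voice tag: <v Speaker Name>spoken text</v> (closing tag may be omitted).
VOICE = re.compile(r"<v\s+([^>]+)>(.*?)(?:</v>|$)", re.DOTALL)


def parse_vtt(vtt_text: str) -> list[Cue]:
    """Extract (start, end, speaker, text) for each cue in a VTT string."""
    cues = []
    # Cues are separated by blank lines; the WEBVTT header has no timing line.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            timing = TIMING.search(line)
            if not timing:
                continue  # skip optional cue identifiers and the header
            payload = "\n".join(lines[i + 1:])
            voice = VOICE.search(payload)
            speaker = voice.group(1).strip() if voice else "Unknown"
            text = voice.group(2) if voice else payload
            cues.append(Cue(timing.group(1), timing.group(2),
                            speaker, " ".join(text.split())))
            break
    return cues
```

Each returned `Cue` pairs a speaker name with a time range, which is the information needed to carry existing speaker labels through without re-transcribing the audio.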
In-person recording
Phone voice memos (iPhone records M4A, most Android phones record M4A or AAC), USB lapel mics, or a dedicated handheld recorder all work. Drop the file in. For multi-person in-person recordings, a 360-degree mic (like the Logitech BCC950 or a Jabra Speak) noticeably improves diarization because each voice arrives with a different room signature. Output is the same structured note format used for the remote platforms.
Built-in voice recorder
The browser recorder captures audio without a separate app. Hit record on your phone during a walk, on a laptop during a lecture, or on a desktop for a podcast interview. When you stop, the AI processes it automatically.
- Record in-browser on any device
- Multi-speaker detection
- Searchable timestamps
- Mobile recording with cloud sync
- Handles background noise and overlapping speech
See real notes from a 32-minute audio recording
Below is a real example of what ScreenApp’s notes look like when the input is audio. The source was a 32-minute podcast recording. The note-taker detected nine distinct topics, broke each into 2-3 bulleted key points, and produced a 3-page document that reads like notes a smart assistant would write. No transcript wall, no need to hunt through 15 pages of verbatim text for the moments that matter.
Notes export to the destinations note-takers actually use: Markdown for Notion or Obsidian, plain text for Slack or HubSpot, DOCX for Word if your workflow runs through Office, or PDF for archival. Voice memos, conference recordings, lecture audio, and podcast files all produce notes in this format.
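For a sense of how timestamped notes map onto Markdown, here is a small sketch. It is not ScreenApp's export format: the `KeyPoint` structure and function names are hypothetical, and the `#t=` suffix follows the W3C Media Fragments convention, which a given player may or may not honor.

```python
from dataclasses import dataclass


@dataclass
class KeyPoint:
    seconds: int   # offset into the recording
    speaker: str
    text: str


def format_timestamp(seconds: int) -> str:
    """Render seconds as MM:SS, or H:MM:SS once past the hour."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def to_markdown(title: str, audio_url: str, points: list[KeyPoint]) -> str:
    """Build a Markdown note whose timestamps link back into the audio."""
    lines = [f"# {title}", ""]
    for p in points:
        link = f"[{format_timestamp(p.seconds)}]({audio_url}#t={p.seconds})"
        lines.append(f"- {link} **{p.speaker}:** {p.text}")
    return "\n".join(lines)
```

Pasted into Notion or Obsidian, each bullet would show a clickable time next to the speaker and key point.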
Audio to notes vs other apps
| Feature | ScreenApp | Otter.ai | NoteGPT | meetergo |
|---|---|---|---|---|
| Free tier | 300 min/month | 15 AI actions/month free | 150 min/month | |
| Paid (annual) | Custom | $8.33/mo | $9/mo | $11/mo |
| Max length (free) | Unlimited | 30 min/session | Unlimited | Unlimited |
| File imports (free) | Unlimited | 3 lifetime | Unlimited | Unlimited |
| No download | Yes | No | Yes | No |
| No meeting bot | Yes | No | Yes | Yes |
| Structured notes | Yes | Limited | No | No |
| Speaker ID | Yes | Yes (basic) | Yes (basic) | Yes (basic) |
| Browser recorder | Yes | Yes | No | No |
| 99 languages | Yes | Yes | Yes | Limited |
- Otter.ai has a bigger free tier but requires a bot in your meetings and caps free file imports at 3 lifetime.
- NoteGPT gives raw transcription — no topic grouping or extracted action items.
- meetergo needs a desktop install and has the smallest free tier.
Who uses it
Students record lectures on their phone and get study-ready notes after class, grouped by topic with timestamps.
Business professionals upload recorded calls and voice memos. For live Zoom, Teams, or Meet calls, use the AI meeting note taker — no bot required.
Researchers run interview recordings through it. Speaker labels and citable timestamps make quote retrieval fast.
Content creators and podcasters repurpose episodes into show notes, blog posts, and pull quotes. It is a good fit for voice memos and field recordings too. If you only need a recap rather than full notes, the audio summarizer produces trimmed episode digests, and the audio summary API handles programmatic summarization across a back catalogue.
Journalists document interviews and press conferences, then search across recordings by keyword.
FAQ
Is it free?
Yes. Free accounts get the full feature set: speaker labels, structured notes, timestamps, exports.
What file formats are supported?
MP3, WAV, M4A, FLAC, OGG, AAC, and most common audio formats. You can also extract audio from a video file — or use the video to notes converter directly.
How accurate is it?
Around 95% on clear recordings. Speaker diarization handles multiple voices, accents, and technical vocabulary. Low-confidence sections get flagged.
How long does it take?
A few minutes for a 60-minute recording. Clearer audio finishes faster.
Does it identify different speakers?
Yes. Up to 10 distinct speakers, labeled automatically — useful for interviews, podcasts, and multi-person meetings.
Can ChatGPT or Gemini do this?
ChatGPT doesn’t accept audio files. Gemini takes uploads up to 100MB but doesn’t identify speakers and requires you to split recordings over 30 minutes. Neither produces structured notes — only raw transcripts. This tool handles all of it in one step.
What languages does it support?
99 languages, including Spanish, French, German, Mandarin, Japanese, Portuguese, and Arabic. Language is auto-detected, or you can set it manually.
Is it safe?
SOC 2 Type II compliant with AES-256 encryption. Files are never used to train AI models. Automatic deletion after 30 days, or delete manually any time. No meeting bots — you choose what to upload.
Can I export the notes?
PDF, Word, plain text, or Markdown. Copy-to-clipboard also works. Timestamps stay clickable in formats that support links.
Does it work with podcasts?
Yes. Upload a podcast audio file you’ve downloaded, or drop in the episode URL. Speaker labels make host/guest tracking automatic.
Can I record voice memos and convert them?
Yes. Upload voice memo files from your phone, or use the in-browser recorder to capture voice memos directly. Either way, you get structured notes back.