Methodology
Accuracy, speed, and trust signals (with receipts)
This page is the source of truth for every accuracy, speed, and language claim on ScreenApp.io. Numbers come from our internal test corpus, the Groq engineering case study, OpenAI's Whisper benchmarks, and xAI's published Grok Speech-to-Text benchmarks. Last refreshed: May 2026.
The model stack
ScreenApp uses three specialized models, each picked because it leads its category for the job it does:
- Transcription: OpenAI Whisper Large-v3, hosted on Groq's inference infrastructure. Whisper has the broadest language coverage on the market (99 languages). Groq makes it fast enough to feel real-time.
- Speaker diarization: xAI Grok Speech-to-Text API, launched April 18, 2026. Provides word-level speaker IDs across 25 languages. Reports the lowest published error rate (5.0%) on phone-call entity recognition in its category, outperforming ElevenLabs, Deepgram, and AssemblyAI.
- Summarization, chat, and AI analysis: Google Gemini. Gemini handles the layer that turns a raw transcript into a structured summary, chapter markers, action items, Q&A answers, and the chat interface. We're NOT powered by GPT-4, ChatGPT, or Claude. The LLM layer is Gemini end-to-end.
Customer audio is never used to train any of these models. Audio is processed and then deleted according to your account's retention settings. Full data-handling details are on the Trust Center.
Speed: the Groq case study
In 2025, ScreenApp moved from a self-hosted Whisper deployment on AWS to Groq's inference infrastructure. Groq published the case study; the numbers below are from their engineering team's measurements.
| Metric | Before Groq | After Groq | Change |
|---|---|---|---|
| 20-minute transcription job | ~20 minutes | ~15 seconds | ~80x faster |
| Per-minute transcription cost | baseline | 1/15th | 15x cheaper |
| Free-to-paid conversion | baseline | +30% | uplift |
| Annual recurring revenue (year-over-year) | baseline | +405% | growth attributed to the speed and cost gains |
Source: ScreenApp + Groq case study (groq.com).
What this means in practice: a 60-minute meeting completes in roughly 3 minutes end-to-end, including transcription, diarization, summarization, and chapter generation; a 2-hour video processes in about 6 minutes. These figures cover the full pipeline, not just raw transcription.
Accuracy: word error rate benchmarks
Word error rate (WER) counts substitutions + deletions + insertions per 100 reference words. Lower is better. Baseline figures below come from the published benchmarks for each underlying model; the per-condition rows are from our own April 2026 retest on 18 hours of public-domain audio per language across three conditions: studio (single speaker, treated room), conference (multi-speaker, room mic), and field (handheld phone mic, ambient noise).
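To make the definition concrete, here is a minimal from-scratch WER in Python using word-level edit distance. This is an illustration of the metric only, not our scoring pipeline (which uses jiwer, as described under Test methodology); the function name is ours.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,  # match / substitution
                           dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1)        # insertion
    return dp[len(ref)][len(hyp)] / len(ref)
```

One substituted word in a six-word reference ("the" heard as "a") yields a WER of 1/6, or about 16.7%, matching the "per 100 reference words" framing above.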
Published baselines
- Whisper Large-v3: 2.7% WER on LibriSpeech test-clean (audiobook-quality audio); 8 to 12% WER on real-world English meetings, podcasts, and call recordings. Source: openai/whisper-large-v3 model card.
- xAI Grok Speech-to-Text: 5.0% error rate on phone-call entity recognition, compared to ElevenLabs at 12.0%, Deepgram at 13.5%, and AssemblyAI at 21.3%. Source: xAI Grok STT launch announcement.
Per-language WER (April 2026 retest)
| Language | Locale | Studio WER | Conference WER | Field WER | Speakers tested |
|---|---|---|---|---|---|
| English (US) | en-US | 4.2% | 7.8% | 12.4% | 4 |
| Spanish (Latin Am.) | es-419 | 5.1% | 9.2% | 14.6% | 3 |
| Spanish (Spain) | es-ES | 5.4% | 9.8% | 15.1% | 3 |
| Portuguese (BR) | pt-BR | 5.8% | 10.1% | 15.8% | 3 |
| Portuguese (PT) | pt-PT | 6.4% | 11.2% | 17.0% | 2 |
| French | fr-FR | 5.9% | 10.4% | 16.2% | 3 |
| German | de-DE | 6.1% | 10.8% | 16.5% | 3 |
| Italian | it-IT | 6.3% | 11.0% | 17.1% | 3 |
| Japanese | ja-JP | 7.8% | 13.5% | 19.8% | 2 |
| Korean | ko-KR | 7.5% | 13.1% | 19.2% | 2 |
| Mandarin (Simplified) | zh-CN | 7.9% | 14.0% | 20.4% | 3 |
| Hindi | hi-IN | 9.2% | 15.8% | 23.1% | 3 |
| Arabic (MSA) | ar | 9.6% | 16.2% | 24.0% | 2 |
| Russian | ru-RU | 6.8% | 11.5% | 17.4% | 3 |
| Indonesian | id-ID | 7.1% | 12.4% | 18.5% | 2 |
Test methodology
- Corpus: 18 hours per language, drawn from Common Voice contributions, public lecture archives, and journalist transcript releases. No customer audio is ever included.
- Scoring: Hypotheses are aligned against the reference transcript using jiwer, the same library AssemblyAI references. Punctuation and capitalization are not penalized; speaker labels are scored separately.
- Cadence: Retested quarterly. Last full run: April 22, 2026. Next: July 2026.
- Conditions defined: Studio = single speaker, lavalier or shotgun mic, treated room. Conference = multi-speaker, room mic, occasional overlap. Field = handheld phone mic, ambient crowd or traffic noise.
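Two details from the scoring rules above can be sketched in a few lines: the pre-alignment normalization (so punctuation and capitalization are not penalized) and corpus-level pooling, where errors and reference words are summed across recordings so each hour of audio is weighted by words spoken rather than by file count. Function names here are illustrative, not our internal tooling.

```python
import string

def normalize(text: str) -> list[str]:
    """Lowercase and strip punctuation before alignment, so neither
    capitalization nor punctuation differences count as errors."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()

def corpus_wer(per_file: list[tuple[int, int]]) -> float:
    """per_file: (word_errors, reference_word_count) per recording.
    Corpus WER pools totals rather than averaging per-file WERs, so
    long recordings weigh proportionally more than short clips."""
    errors = sum(e for e, _ in per_file)
    words = sum(n for _, n in per_file)
    return errors / words
```

Pooling matters: a file with 10 errors over 100 words and a clean file of 100 words give a corpus WER of 5%, not the 10% a naive mean of one bad file's WER would suggest.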
Speaker diarization (powered by xAI Grok STT)
Diarization is the process of attaching speaker IDs to each word or segment. ScreenApp routes multi-speaker audio through xAI's Grok Speech-to-Text API, which launched April 18, 2026, and provides word-level speaker IDs in 25 languages.
- Granularity: Word-level speaker IDs (not paragraph-level), so a one-sentence interjection in a multi-speaker meeting gets attributed correctly.
- Modes: Real-time streaming (for live captures) and batch (for uploads).
- Benchmark: 5.0% error rate on phone-call entity recognition (xAI published, April 2026).
- Strongest use cases per xAI's documentation: medical, legal, and financial multi-speaker audio where speaker attribution must be precise.
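To show what word-level granularity buys, here is a sketch that collapses per-word speaker IDs into readable speaker turns. The `(speaker_id, word)` input shape is a hypothetical simplification for illustration, not the actual Grok STT response format.

```python
from itertools import groupby

def words_to_turns(words: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Collapse consecutive words with the same speaker ID into turns.
    Because IDs are word-level, a one-word interjection by a second
    speaker becomes its own turn instead of being absorbed into the
    surrounding speaker's paragraph."""
    return [
        (speaker, " ".join(word for _, word in group))
        for speaker, group in groupby(words, key=lambda pair: pair[0])
    ]
```

With paragraph-level diarization, the interjection "wait" below would typically be folded into speaker 1's turn; word-level IDs preserve it as its own attributed turn.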
Supported languages
ScreenApp supports 99 languages for transcription via Whisper Large-v3. A subset of 25 of those also supports speaker diarization via xAI Grok STT (marked with †).
Full language list (Whisper Large-v3)
Afrikaans, Albanian, Amharic, Arabic †, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Cantonese, Catalan, Chinese (Mandarin) †, Croatian, Czech, Danish, Dutch †, English †, Estonian, Faroese, Finnish, French †, Galician, Georgian, German †, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi †, Hungarian, Icelandic, Indonesian †, Italian †, Japanese †, Javanese, Kannada, Kazakh, Khmer, Korean †, Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Nynorsk, Occitan, Pashto, Persian, Polish, Portuguese †, Punjabi, Romanian, Russian †, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish †, Sundanese, Swahili, Swedish, Tagalog †, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish †, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese †, Welsh, Yiddish, Yoruba.
† indicates languages with word-level speaker diarization via xAI Grok STT. Other languages are transcribed (text-only) without per-speaker attribution.
Whisper's language list source: github.com/openai/whisper. Grok STT language coverage source: xAI Voice docs.
User base
2,163,740 accounts registered as of May 2026. The figure reflects unique verified-email accounts in our production database and is refreshed quarterly on this page.
We do not publish round-number marketing claims like "2 million users" without the verifiable underlying count, on this page or elsewhere. If you ever see an inflated or undated user-count claim on a ScreenApp page, that's a content quality issue and we'd like to know: contact us via the Trust Center.
Free access and pricing
Two ways to use ScreenApp without paying upfront:
- Free signup (Free Forever): Process one recording for free. No credit card. No expiry. After your first recording, you'll need a paid plan or the 7-day trial to keep going. This is what we mean when we say "Free Forever": the first recording is free indefinitely, not unlimited recordings free.
- 7-day Growth trial: Full access to the Growth plan for 7 days. Credit card required (we don't charge until day 8, and you can cancel anytime during the 7 days at no cost). After day 7, the card is billed $228/year.
Paid plans
- Growth: $19/month billed annually ($228/year). Unlimited recordings during the active subscription. The 7-day trial above is for this plan.
- Business: $34/month billed annually. Adds higher file-size caps, team workspaces, and SSO for enterprise plans.
- Monthly billing: Available at higher per-month rates without the 7-day trial.
Current pricing and feature breakdown on the pricing page. What we do NOT offer: a recurring monthly free tier with X minutes/month, a "no credit card" trial of the paid plan, or unlimited free recordings. If you see those claims anywhere on this site, that's a content quality issue — please flag it via the Trust Center and we'll fix the source page.
Security and compliance
SOC 2 Type 2 audited annually. 22 internal policies covering access control, data classification, secure development, and incident response. Continuous control monitoring.
Full live security posture, downloadable SOC 2 Type 2 report, and pre-filled security questionnaire at our Trust Center (trust.inc/screenapp).
Sources and external benchmarks
- ScreenApp + Groq case study: speed and cost numbers for the inference layer.
- xAI Grok Speech-to-Text launch: diarization benchmarks and language list.
- OpenAI Whisper Large-v3 model card: published WER baselines.
- openai/whisper on GitHub: full language list and model details.
- Artificial Analysis Whisper WER index: independent third-party benchmarks.
- jiwer: the WER scoring library used in our per-language retests.
- ScreenApp Trust Center: SOC 2, policies, sub-processors, security questionnaire.
Errata and corrections
Numbers on ScreenApp pages should match this page. If you find a feature page that contradicts these figures, that's a content quality bug we want to fix. Report it via the Trust Center contact form and we'll update the source page within 7 days.