Methodology

Accuracy, speed, and trust signals (with receipts)

This page is the source of truth for every accuracy, speed, and language claim on ScreenApp.io. Numbers come from our internal test corpus, the Groq engineering case study, OpenAI's Whisper benchmarks, and xAI's published Grok Speech-to-Text benchmarks. Last refreshed: May 2026.

The model stack

Transcription does not bet on a single vendor. ScreenApp routes each job to the provider best suited to the audio: source platform, length, channel layout, language. As of May 2026, the primary providers are:

OpenAI Whisper Large-v3 on Groq's inference infrastructure: broadest language coverage (99 languages) and the fastest path for long-form audio. Large-v3 reduced word error rates 10 to 20% over Large-v2. Groq runs Whisper at a 164x real-time speed factor on the independent Artificial Analysis benchmark. A 2-hour video transcribes in about 3 minutes end-to-end.
Google Gemini 3.1 Flash Lite: purpose-built for short audio (under 5 minutes) and downstream AI summarization. The same model powers the summary and chat layers, so transcription and analysis share context.
xAI Grok Speech-to-Text: highest published accuracy on phone-call and multi-channel audio (5.0% entity error rate vs ElevenLabs 12.0%, Deepgram 13.5%, AssemblyAI 21.3%). Native word-level speaker diarization in 25 languages. Built on the same production stack that powers Grok Voice inside Tesla vehicles and Starlink customer support. API priced at $0.10/hour batch, $0.20/hour streaming (roughly 60% below ElevenLabs and Deepgram per Dapta's comparison). Launched April 18, 2026.

Fallback providers (used when a primary provider is rate-limited or unavailable): Cloudflare Workers AI, Fireworks AI, Mistral, Baseten. A transcription job never fails because one vendor had an outage.

LLM layer for summarization, chat, and AI analysis: Google Gemini end-to-end. ScreenApp is NOT powered by GPT-4, ChatGPT, or Claude. The LLM that turns a raw transcript into structured summaries, chapter markers, action items, Q&A answers, and the chat interface is Gemini from start to finish.

Why multi-provider instead of single-vendor: each provider has a sweet spot. Whisper has the most languages, Groq makes it fast. Gemini is best at short audio and downstream LLM analysis. xAI Grok STT has the lowest published error rate on phone calls. Routing each job to its best fit beats picking one provider for everything.

Customer audio is never used to train any of these models. Audio is processed and deleted per your account's retention settings. Full data handling on the Trust Center.

Speed: the Groq case study

In 2025, ScreenApp moved from a self-hosted Whisper deployment on AWS to Groq's inference infrastructure. Groq published the case study; the numbers below are from their engineering team's measurements.

Metric	Before Groq	After Groq	Change
20-minute transcription job	~20 minutes	~15 seconds	20x faster
Per-minute transcription cost	baseline	1/15th	15x cheaper
Free-to-paid conversion	baseline	+30%	uplift
Annual recurring revenue (year-over-year)	baseline	+405%	growth attributed to the speed and cost gains

Source: ScreenApp + Groq case study (groq.com).

What this means in practice: a 60-minute meeting completes in roughly 3 minutes end-to-end (transcription, diarization, summary generation). A 2-hour video processes in about 6 minutes. These are end-to-end times that include summarization and chaptering, not just raw transcription.

Accuracy: word error rate benchmarks

Word error rate (WER) counts substitutions + deletions + insertions per 100 reference words. Lower is better. Baseline figures below come from the published benchmarks for each underlying model; the per-condition rows are from our own April 2026 retest on 18 hours of public-domain audio per language across three conditions: studio (single speaker, treated room), conference (multi-speaker, room mic), and field (handheld phone mic, ambient noise).

Published baselines

Whisper Large-v3: 2.7% WER on LibriSpeech test-clean (audiobook-quality audio); 8 to 12% WER on real-world English meetings, podcasts, and call recordings. Source: openai/whisper-large-v3 model card.
xAI Grok Speech-to-Text: 5.0% error rate on phone-call entity recognition, compared to ElevenLabs at 12.0%, Deepgram at 13.5%, and AssemblyAI at 21.3%. Source: xAI Grok STT launch announcement.

Per-language WER (April 2026 retest)

Language	Locale	Studio WER	Conference WER	Field WER	iPhone mic WER †	Speakers tested
English (US)	en-US	4.2%	7.8%	12.4%	14.9%	4
Spanish (Latin Am.)	es-419	5.1%	9.2%	14.6%	17.5%	3
Spanish (Spain)	es-ES	5.4%	9.8%	15.1%	18.1%	3
Portuguese (BR)	pt-BR	5.8%	10.1%	15.8%	19.0%	3
Portuguese (PT)	pt-PT	6.4%	11.2%	17.0%	20.4%	2
French	fr-FR	5.9%	10.4%	16.2%	19.4%	3
German	de-DE	6.1%	10.8%	16.5%	19.8%	3
Italian	it-IT	6.3%	11.0%	17.1%	20.5%	3
Japanese	ja-JP	7.8%	13.5%	19.8%	23.8%	2
Korean	ko-KR	7.5%	13.1%	19.2%	23.0%	2
Mandarin (Simplified)	zh-CN	7.9%	14.0%	20.4%	24.5%	3
Hindi	hi-IN	9.2%	15.8%	23.1%	27.7%	3
Arabic (MSA)	ar	9.6%	16.2%	24.0%	28.8%	2
Russian	ru-RU	6.8%	11.5%	17.4%	20.9%	3
Indonesian	id-ID	7.1%	12.4%	18.5%	22.2%	2

† iPhone mic WER is a projection, not a measurement. Computed as Field WER × 1.2 to account for the noise-cancellation, beam-forming, and codec losses that iPhone built-in microphones introduce on top of a handheld phone mic. The July 2026 retest will replace this column with measured numbers from an iPhone 14, iPhone 15, and iPhone 16 across the same 18 hours per language.

Test methodology

Corpus: 18 hours per language, drawn from Common Voice contributions, public lecture archives, and journalist transcript releases. No customer audio is ever included.
Scoring: Aligned with the original transcript using jiwer, the same library AssemblyAI references. Punctuation and capitalization are not penalized. Speaker labels scored separately.
Cadence: Retested quarterly. Last full run: April 22, 2026. Next: July 2026.
Conditions defined: Studio = single speaker, lavalier or shotgun mic, treated room. Conference = multi-speaker, room mic, occasional overlap. Field = handheld phone mic, ambient crowd or traffic noise.

Speaker diarization

Diarization (attaching a speaker ID to each word) runs on whichever path fits the audio. Routing is automatic; you do not pick.

xAI Grok Speech-to-Text for phone calls, sales calls, depositions, and multi-channel audio. Word-level speaker IDs across 25 languages. 5.0% entity error rate on the published phone-call benchmark vs ElevenLabs 12.0%, Deepgram 13.5%, AssemblyAI 21.3%. On the video and podcast benchmark, Grok and ElevenLabs tie at 2.4% (Deepgram 3.0%, AssemblyAI 3.2%). Launched April 18, 2026.
Google Gemini for general meetings, uploaded video files, and conversational content where turn-taking is not channel-separated. Runs on the same Vertex AI pipeline as transcription, so diarized output is one response.

The pipeline reads the audio's channel layout, source platform, and content type, then picks the better path. Word-level granularity in both cases (not paragraph-level), so a one-sentence interjection in a multi-speaker meeting gets attributed correctly.

Modes: Real-time streaming (for live captures) and batch (for uploads).
Strongest use cases: medical consultations, legal depositions, sales calls, and panel discussions where speaker attribution must be precise.

Sources: xAI Grok STT and TTS API announcement, Google Gemini.

Supported languages

ScreenApp supports 99 languages for transcription via Whisper Large-v3. A subset of 25 of those also supports speaker diarization via xAI Grok STT (marked with †).

Full language list (Whisper Large-v3)

Afrikaans, Albanian, Amharic, Arabic †, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Cantonese, Catalan, Chinese (Mandarin) †, Croatian, Czech, Danish, Dutch †, English †, Estonian, Faroese, Finnish, French †, Galician, Georgian, German †, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi †, Hungarian, Icelandic, Indonesian †, Italian †, Japanese †, Javanese, Kannada, Kazakh, Khmer, Korean †, Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Nynorsk, Occitan, Pashto, Persian, Polish, Portuguese †, Punjabi, Romanian, Russian †, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish †, Sundanese, Swahili, Swedish, Tagalog †, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish †, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese †, Welsh, Yiddish, Yoruba.

† indicates languages with word-level speaker diarization via xAI Grok STT. Other languages are transcribed (text-only) without per-speaker attribution.

Whisper's language list source: github.com/openai/whisper. Grok STT language coverage source: xAI Voice docs.

Platform availability

ScreenApp ships native apps on iOS, macOS, and Android alongside the web app. The numbers below are pulled from each platform's canonical store listing or download endpoint and verified on May 15, 2026. This section is refreshed every release cycle.

iOS app (iPhone, iPad, Apple Silicon Mac via Catalyst)

Listed name: ScreenApp: AI Voice Recorder
App Store rating: 4.0 stars across 94 ratings (Apple iTunes Lookup, May 2026).
Current version: 1.4.25, released March 23, 2026.
First public release: March 18, 2025.
Minimum iOS: 15.0.
Installer size: 103.6 MB.
Supported devices: universal binary covering iPhone 5s through iPhone 17 Pro Max, every iPad from iPad Air onward (including the M4 and M5 iPad Pro lines), iPod touch 6th and 7th generation, and Apple Silicon Macs via Mac Catalyst.
Category and content rating: Productivity, age 4 plus, free.
Publisher: ScreenApp Pty Ltd.
App Store URL: apps.apple.com/us/app/screenapp-ai-voice-recorder/id6741723588.

iOS App Privacy nutrition label

This is the privacy declaration ScreenApp submits to Apple, rendered exactly as it appears on the App Store. Apple's listing is authoritative; if the table below ever diverges from the live App Store page, the App Store page wins and we will update this table within 7 days. ScreenApp declares no tracking data.

Group	Category	Data types
Data Used to Track You	None. ScreenApp does not declare any tracking data.
Data Linked to You	User Content	Photos or Videos, Audio Data
	Identifiers	User ID
	Diagnostics	Performance Data
Data Not Linked to You	Identifiers	Device ID
	Contact Info	Email Address, Name
	Diagnostics	Crash Data, Other Diagnostic Data

Canonical source: the App Privacy section on the ScreenApp App Store page. Full data handling policies on the Trust Center.

macOS (native desktop app)

Distribution: direct DMG download from screenapp.io/desktop. The Mac app is not currently on the Mac App Store; the DMG is the canonical install path.
Installer size: 67.9 MB (ScreenApp-latest.dmg).
Last build pushed: March 22, 2026.
Versioning: rolling release. The installer URL is ScreenApp-latest.dmg and always serves the current production build, so there is no separate version string to copy here.
Architectures: universal binary for Apple Silicon and Intel Macs.
iOS app on Mac: Apple Silicon Macs can also run the iOS app above via Mac Catalyst from the App Store. The native DMG is recommended for desktop workflows that need system audio capture, larger uploads, or background recording.

Android

Listed name: ScreenApp - AI Voice Recorder
Google Play rating: 3.9 stars across 678 ratings (Google Play structured data, May 2026).
Installs: 50,000 plus.
Last update: May 8, 2026.
Category and content rating: Productivity, age 3 plus, free.
Package name: io.screenapp.screenapp_mobile.
Play Store URL: play.google.com/store/apps/details?id=io.screenapp.screenapp_mobile.

Rating and review counts move daily on the App Store and Google Play. The numbers on this page are point-in-time snapshots, dated above. The live store listings are always the authoritative source; if the divergence ever exceeds 0.2 stars or 10 percent of reviews, please flag it via the Trust Center contact form and we will refresh sooner.

Selected customer reviews

A small sample of named, named-role customer reviews drawn from the public reviews page. These are samples, not the full corpus. Aggregate ratings on each app remain whatever the App Store and Google Play report (4.0 stars across 94 ratings on iOS, 3.9 stars across 678 ratings on Android), not the average of the three reviews below.

Mobile and Desktop Excellence

The desktop and mobile apps are fantastic. Recording meetings while I'm mobile has never been easier, and the dictation feature is a huge time-saver. It helps me develop faster and eliminates the guesswork of recalling meeting details.

Kelvin, Software Engineer

A terrific and pleasant recording system

Our overall experience with ScreenApp has been nothing but pleasant. Their support is terrific, and ScreenApp is a great recording system.

Aaron, Verified Capterra User

Game-Changer for Client Calls

Our team was drowning in client feedback until we found ScreenApp. Now we record every presentation and client call, and the AI summaries are spot-on. My team actually looks forward to review sessions now because everything is searchable and actionable.

Millie, Director

Production corpus

The numbers below are real production counts, pulled at build time from the same MongoDB cluster the dashboard reads from. They are not marketing rollups, not rounded, and not estimated. Refresh cadence: every deploy. Last pulled: May 25, 2026.

485,000

recordings processed

transcribed and analysed in production

1,680,000

speakers diarized

unique speaker turns identified across the corpus

275,000

AI Q&A sessions

questions asked against transcribed media

114,000

voice dictations

captured via browser, iOS, and Android

56,000

meeting-bot sessions

across Google Meet, Microsoft Teams, and direct integrations

299,000

analysed video metadata sets

meeting type, speakers, companies extracted

Recent activity (indexed proxy via videometainfo.createdAt): 1,500 recordings analysed in the last 24 hours, 9,500 in the last 7 days, 40,000 in the last 30 days. Daily rate of roughly 1,333 analyses per day.

Why videometainfo and not recordings directly: recordings._id is a UUID, so we cannot do indexed time-range queries on it. Each videometainfo doc maps 1:1 to a recording via the unique recordingId index, so the time-windowed counts above are a faithful proxy. Methodology and the open query module: below.

User base

7,370,623 accounts registered as of May 25, 2026. Number reflects unique verified-email accounts in our production database, pulled at build time from the same source the dashboard reads from. The figure on this page refreshes every deploy, not on a quarterly cadence.

We do not publish round-number marketing claims like "2 million users" without the verifiable underlying count, on this page or elsewhere. If you ever see an inflated or undated user-count claim on a ScreenApp page, that's a content quality issue and we'd like to know: contact us via the Trust Center.

How we count

Every numeric claim on this page and across screenapp.io that depends on production data follows the same pipeline. Numbers are not curated, edited, or rounded for marketing.

Source of truth: the ScreenApp production MongoDB cluster — the same database the dashboard, mobile apps, and backend services read from. No marketing database, no cached marketing CMS.
Build-time pull: the marketing site has no direct database access. At each deploy, the static-site build calls a read-only backend endpoint (GET /v2/site-data) that runs a small set of indexed aggregations and returns scalar counts. The query module is open inside the same repo at scripts/site-data-queries.ts.
Frozen for the build: the returned numbers are written to a local SQLite file (data/stats.db) and read synchronously by every page at static-generation time. Within a single deploy the numbers do not drift; between deploys they refresh.
Soft fail: if the endpoint is unavailable or returns an unexpected shape, the previous deploy's figures are reused and the build proceeds. The site never ships placeholder text in place of a missing number.
Last refresh: the data on this page was pulled on May 25, 2026.

If you ever spot a number on the site that disagrees with a figure on this page, please flag it via the Trust Center. A divergence is a bug.

Free access and pricing

Two ways to use ScreenApp without paying upfront:

Free signup (Free Forever): Process one recording for free. No credit card. No expiry. After your first recording, you'll need a paid plan or the 7-day trial to keep going. This is what we mean when we say "Free Forever": the first recording is free indefinitely, not unlimited recordings free.
7-day Growth trial: Full access to the Growth plan for 7 days. Credit card required (we don't charge until day 8, and you can cancel anytime during the 7 days at no cost). After day 7, the card is billed $228/year.

Paid plans

Growth: $19/month billed annually ($228/year). Unlimited recordings during the active subscription. The 7-day trial above is for this plan.
Business: $34/month billed annually. Adds higher file-size caps, team workspaces, and SSO for enterprise plans.
Monthly billing: Available at higher per-month rates without the 7-day trial.

Current pricing and feature breakdown on the pricing page. What we do NOT offer: a recurring monthly free tier with X minutes/month, a "no credit card" trial of the paid plan, or unlimited free recordings. If you see those claims anywhere on this site, that's a content quality issue, please flag it via the Trust Center and we'll fix the source page.

Security and compliance

SOC 2 Type 2 audited annually. 22 internal policies covering access control, data classification, secure development, and incident response. Continuous control monitoring.

Full live security posture, downloadable SOC 2 Type 2 report, and pre-filled security questionnaire at our Trust Center (trust.inc/screenapp).

Sources and external benchmarks

ScreenApp + Groq case study: speed and cost numbers for the inference layer.
Groq: Whisper at 164x real-time: independent Artificial Analysis benchmark of Whisper Large-v3 on Groq inference.
xAI Grok Speech-to-Text launch: diarization benchmarks, language list, Tesla and Starlink production references.
Dapta: Grok Voice API pricing comparison: independent confirmation that Grok STT undercuts Deepgram and ElevenLabs by roughly 60%.
MarkTechPost: xAI launches Grok STT and TTS APIs: third-party coverage of the launch.
OpenAI Whisper Large-v3 model card: published WER baselines and the 10 to 20% improvement over Large-v2.
openai/whisper on GitHub: full 99-language list and model details.
Artificial Analysis Speech-to-Text leaderboard: independent third-party WER and speed benchmarks across providers.
jiwer: the WER scoring library used in our per-language retests.
ScreenApp Trust Center: SOC 2 Type 2 report, 22 internal policies, sub-processors, security questionnaire.

Errata and corrections

Numbers on ScreenApp pages should match this page. If you find a feature page that contradicts these figures, that's a content quality bug we want to fix. Report it via the Trust Center contact form and we'll update the source page within 7 days.