Your Mac can transcribe speech with near-human accuracy — for free, without an internet connection, and without sending a single byte to the cloud. No subscription, no API key, no account. Just your Mac's own processor doing the work.
The technology behind it is Whisper (often marketed as "Whisper AI"), and since its release it has quietly become the backbone of many speech-to-text apps on macOS. Here's how it works, why on-device matters, and how to start using it today.
Whisper is an open-source automatic speech recognition model created by OpenAI. The first version, released in September 2022, was trained on 680,000 hours of multilingual audio collected from the web. The later large-v3 model was trained on roughly 5 million hours of weakly labeled and pseudo-labeled audio.
The result is a model that supports 99+ languages with automatic language detection. You don't need to tell Whisper what language someone is speaking — it figures that out on its own.
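For readers who want to see automatic language detection outside of an app, here is a minimal sketch using the open-source `openai-whisper` Python package. It assumes the package and `ffmpeg` are installed; the function name `transcribe_locally` and the file name in the usage note are illustrative, not part of any app described here. Note that the model weights are downloaded once on first use and cached, after which transcription runs offline.

```python
VALID_SIZES = {"tiny", "base", "small", "medium", "large"}


def transcribe_locally(audio_path: str, model_size: str = "base") -> tuple:
    """Return (detected_language, text) for an audio file, processed on this machine.

    Hypothetical helper for illustration; only whisper.load_model and
    model.transcribe come from the openai-whisper package.
    """
    if model_size not in VALID_SIZES:
        raise ValueError(f"unknown model size: {model_size!r}")

    # Imported lazily so the size check above works even without the package.
    import whisper

    model = whisper.load_model(model_size)  # weights are fetched once, then cached
    # No language argument is passed, so Whisper detects the language itself.
    result = model.transcribe(audio_path)
    return result["language"], result["text"].strip()
```

Calling `transcribe_locally("interview.mp3")` would return a language code such as `"en"` or `"de"` along with the transcript text, without you ever specifying the language.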
In terms of accuracy, Whisper achieves an average word error rate (WER) of roughly 8%, which means about 8 errors (substituted, dropped, or inserted words) per 100 words spoken, or roughly 92% word accuracy. With clean audio and a single speaker, accuracy can reach 95% or higher. That's competitive with professional human transcription services.
Under the hood, Whisper uses an encoder-decoder Transformer architecture. Audio is split into 30-second chunks and converted into a log-Mel spectrogram (a time-frequency picture of the sound), which the encoder processes into a sequence of features. The decoder then generates text token by token, much like how a large language model writes text, except the input is sound instead of a text prompt.
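To make the word error rate figure concrete, here is a small sketch of how WER is commonly computed: the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The function name and the simple lowercase-and-split normalization are assumptions for illustration; published benchmarks usually apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words (classic Levenshtein row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of five gives a WER of 20%.
print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))  # 0.2
```

Because insertions count as errors too, WER can exceed 100% on very poor output, which is why "92% of words correct" is only an approximation of an 8% WER.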
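The sizes involved in that pipeline can be worked out with simple arithmetic. The sketch below uses the audio front-end constants from the open-source Whisper code (16 kHz audio, 30-second chunks, a 10 ms hop, 80 Mel bands for most models); the function names are illustrative only.

```python
SAMPLE_RATE = 16_000   # Whisper resamples all audio to 16 kHz
CHUNK_SECONDS = 30     # audio is processed in 30-second windows
HOP_LENGTH = 160       # 160 samples = 10 ms between spectrogram frames
N_MELS = 80            # Mel frequency bands (large-v3 uses 128)


def spectrogram_shape(n_mels: int = N_MELS) -> tuple:
    """Shape (mel_bands, frames) of the spectrogram fed to the encoder."""
    n_samples = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per chunk
    n_frames = n_samples // HOP_LENGTH       # one frame every 10 ms
    return (n_mels, n_frames)


def encoder_positions(n_frames: int) -> int:
    """The encoder's convolution uses stride 2, halving the time axis."""
    return n_frames // 2


mel_bands, frames = spectrogram_shape()
print(mel_bands, frames, encoder_positions(frames))  # 80 3000 1500
```

So each 30-second chunk becomes an 80 x 3000 grid of numbers, which the encoder compresses to 1,500 feature vectors for the decoder to attend to while it writes out text.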
Whisper comes in multiple model sizes: tiny, base, small, medium, and large. Smaller models run faster but sacrifice accuracy. Larger models are more accurate but demand more CPU and memory. This range of sizes is what makes Whisper practical on a Mac — you can pick the model that fits your hardware.
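As a rough illustration of picking a model that fits your hardware, the sketch below chooses the largest model whose approximate memory requirement fits a budget. The parameter counts and memory figures are the approximate values from OpenAI's published model table (stated there for GPU memory); real usage on a Mac depends on the runtime and quantization an app uses, so treat the numbers and the helper name as assumptions.

```python
# (name, parameters in millions, approximate memory needed in GB),
# ordered from smallest to largest.
WHISPER_MODELS = [
    ("tiny", 39, 1),
    ("base", 74, 1),
    ("small", 244, 2),
    ("medium", 769, 5),
    ("large", 1550, 10),
]


def largest_model_that_fits(memory_budget_gb: float):
    """Return the name of the biggest model within the budget, or None."""
    best = None
    for name, _params_millions, required_gb in WHISPER_MODELS:
        if required_gb <= memory_budget_gb:
            best = name  # later entries are larger, so keep overwriting
    return best


print(largest_model_that_fits(4))   # small
print(largest_model_that_fits(16))  # large
```

In practice you would leave headroom for macOS and your other apps, so the budget passed in should be well below the machine's total memory.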
Most speech-to-text services — Google Speech-to-Text, Amazon Transcribe, Deepgram — work by streaming your audio to a remote server. That server does the processing and sends text back. It's fast and accurate, but it comes with trade-offs.
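On-device transcription with Whisper eliminates all of them: no internet connection is needed, there is no API key or subscription to pay for, and no audio is uploaded anywhere.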
The privacy difference is binary. With cloud transcription, you're trusting a company's promise not to misuse your data. With on-device Whisper, there's nothing to trust — the audio physically never leaves your computer.
Most Mac apps that use Whisper let you choose between model sizes. Here's what each one offers:
| Model | Size | Speed | Best For |
|---|---|---|---|
| Light (small) | ~40 MB | Fastest | Quick captions, low-power Macs |
| Standard (medium) | ~140 MB | Balanced | Daily use, meetings |
| Pro (large-v3) | ~460 MB | Slower | Maximum accuracy, non-English audio |
The trade-off is straightforward: bigger model = better accuracy but more CPU and memory usage. If you have an M1 or newer MacBook, even the Pro model runs comfortably. On older Intel Macs or base-model MacBook Airs, the Light model is the safer choice.
A good rule of thumb: start with the Light model (Small, the default). If you notice accents, technical jargon, or non-English speech being missed, step up to Standard (Medium) or Large Turbo. Reserve Pro for when accuracy is non-negotiable: interviews you're publishing, multilingual meetings, or audio with heavy background noise. You can hot-swap models while captioning is running.
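The guidance above can be written down as a tiny decision function. This is only a sketch of the rule of thumb in this article, using the tier names from the table; the function and its parameters are hypothetical and do not correspond to any setting in a real app.

```python
def recommend_tier(apple_silicon: bool,
                   difficult_audio: bool = False,
                   accuracy_critical: bool = False) -> str:
    """Suggest a model tier following the article's rule of thumb.

    difficult_audio: accents, technical jargon, or non-English speech.
    accuracy_critical: published interviews, multilingual meetings, noisy audio.
    """
    if not apple_silicon:
        return "Light"     # safer choice on older Intel Macs
    if accuracy_critical:
        return "Pro"       # maximum accuracy, slower
    if difficult_audio:
        return "Standard"  # step up when words are being missed
    return "Light"         # sensible starting point


print(recommend_tier(apple_silicon=True))                          # Light
print(recommend_tier(apple_silicon=True, difficult_audio=True))    # Standard
print(recommend_tier(apple_silicon=True, accuracy_critical=True))  # Pro
```

Since models can be swapped while captioning is running, a wrong first guess costs little: start small and move up only if the transcript shows misses.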
Several macOS apps have integrated Whisper for local speech-to-text. Here are the main options available for free:
Real-time live captions in the MacBook notch. NotchLive captures system audio and microphone input simultaneously and transcribes everything live using Whisper. Captions appear directly in the notch area, keeping your screen clean. The Pro tier adds real-time translation to 20 languages, but captioning is free. Download at notchlive.app.
Transcribe audio files by drag-and-drop. Import a recording, podcast episode, or voice memo and MacWhisper generates a timestamped transcript. The free tier supports smaller Whisper models. It's not real-time — you give it a file and wait for the transcript.
Similar to MacWhisper. A file-based transcription app available on the Mac App Store. Drop in audio or video files and get text back. Clean interface, but again — not real-time and not designed for live captions.
Voice dictation that types into any app. SuperWhisper listens to your microphone and types what you say into whichever text field is active. Great for hands-free writing, but it's a dictation tool — it doesn't caption audio playing on your Mac or from other people in a call.
Key distinction: NotchLive is the only free Whisper-based app that provides real-time live captions of system audio. The others are designed for file transcription or voice dictation — different use cases entirely.
Here's the fastest way to go from zero to live captions using NotchLive: download the app from notchlive.app, choose the Light model, and press ⌥⌘C to start captioning.
That's it. Everything runs on your Mac. No account to create, no cloud service to configure, no subscription to manage. The entire setup takes under two minutes.
macOS has its own speech recognition, most notably Apple Live Captions (introduced in macOS Ventura). How does it compare to Whisper?
| Feature | Whisper AI | Apple Live Captions |
|---|---|---|
| Source | Open-source (OpenAI) | Proprietary (SpeechAnalyzer) |
| Languages | 99+ | Expanding (English-focused) |
| Intel Mac support | Yes | Apple Silicon only |
| On-device privacy | Yes | Yes |
| Model improvements | Open-source community | Apple updates only |
| Real-time translation | Via apps like NotchLive | No |
| Model size choice | Tiny to large | Fixed |
Both Whisper and Apple's speech recognition run entirely on-device, so privacy is equal. The meaningful differences are language coverage and flexibility. Whisper supports 99+ languages out of the box and runs on Intel Macs too. Apple's system is tightly integrated with macOS but limited in language support and locked to Apple Silicon.
They're not mutually exclusive. NotchLive uses Whisper under the hood but can complement Apple's built-in captions. Some users keep Apple Live Captions as a system-level fallback while using NotchLive for meetings, lectures, and anything they want to translate or record.
Bottom line: Whisper AI gives your Mac free, private, and accurate speech-to-text in 99+ languages — no internet required. The fastest way to try it is to download NotchLive, grab the Light model, and press ⌥⌘C. You'll have live captions running in under two minutes.
NotchLive uses on-device Whisper AI to caption any audio in real time.
100% private. No account. No subscription. Free forever — Pro $14.99 one-time.