Open Mic
Real-time speech analysis built on the PAD framework
What this is
Open Mic captures your speech, analyzes it at the frame and word level, and generates a per-word stability score. It was designed for people who stutter — but the scoring engine operates on any speaker. Every session produces a structured record you can review, filter, and compare over time.
🎯 Calibration & Baseline — start here
Open Mic learns how you typically speak before it starts grading you. The first time you use the app, head to the Baseline tab and complete the calibration sessions there. Once your baseline is established, every future session is scored against your own typical production — not against a generic "average speaker" that may not represent you.
Three phases:
- Calibration. Record N sessions reading the same short passage (default 5; configurable 2–10). Speak naturally. The passage is locked across all calibration sessions for consistency.
- Baseline locks. Open Mic aggregates your calibration sessions into a 6-axis baseline shape — your typical Mean PAD, Fluency, WPM, Voiced ratio, Stability, and Block-free rate. This becomes your personal reference.
- Free use. Use the full suite freely. Every session shows a Progress score relative to baseline (positive = more fluent than typical, negative = less). The 3D PAD profile overlays your current session against the baseline shape so you can see exactly which dimensions improved or regressed.
Re-baseline anytime conditions change — after a therapy block, voice changes, mic changes. Old sessions stay in your history; only the reference shape resets.
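A minimal sketch of the aggregation and Progress score described above. It assumes the baseline shape is a per-axis mean over the calibration sessions, and that Progress is the average relative change across axes where "higher" means "more fluent". The axis keys mirror the six listed axes. `buildBaseline`, `progressScore`, the `DIRECTION` table, and the choice to leave WPM out of Progress are all hypothetical and are not Open Mic's documented formula.

```typescript
// Hypothetical per-session metrics; keys mirror the six baseline axes.
type Axis = "meanPad" | "fluency" | "wpm" | "voicedRatio" | "stability" | "blockFreeRate";
type Metrics = Record<Axis, number>;

const AXES: Axis[] = ["meanPad", "fluency", "wpm", "voicedRatio", "stability", "blockFreeRate"];

// Assumed direction per axis: 1 = higher is more fluent, 0 = excluded from Progress.
// WPM is excluded here because a faster pace is not necessarily a more fluent one.
const DIRECTION: Record<Axis, number> = {
  meanPad: 1, fluency: 1, wpm: 0, voicedRatio: 1, stability: 1, blockFreeRate: 1,
};

// Baseline shape = per-axis mean across calibration sessions.
function buildBaseline(sessions: Metrics[]): Metrics {
  if (sessions.length === 0) throw new Error("need at least one calibration session");
  const out = {} as Metrics;
  for (const axis of AXES) {
    let sum = 0;
    for (const s of sessions) sum += s[axis];
    out[axis] = sum / sessions.length;
  }
  return out;
}

// Progress = mean relative change over the included axes, as a percentage.
// Positive = more fluent than baseline, negative = less.
function progressScore(session: Metrics, baseline: Metrics): number {
  let total = 0;
  let count = 0;
  for (const axis of AXES) {
    const dir = DIRECTION[axis];
    if (dir === 0 || baseline[axis] === 0) continue; // skip excluded or zero-baseline axes
    total += (dir * (session[axis] - baseline[axis])) / Math.abs(baseline[axis]);
    count++;
  }
  return count === 0 ? 0 : (100 * total) / count;
}
```

Because this sketch uses relative change, each axis counts equally regardless of its units. A session that matches the baseline exactly scores 0.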
⚠ Before you start: browser permissions
The first time you click Start Session on the Live tab, your browser will display a permission prompt asking to access your microphone and camera. The prompt usually appears at the top-left of the browser window, near the address bar.
You must click "Allow" for both microphone and camera, or the session will fail to start. If you miss the prompt or click it away, click the camera/lock icon in the address bar to re-grant access, then click Start Session again.
If you're in a session with a client and don't see audio levels moving on the waveform, the most common cause is that the permission prompt was dismissed without granting access. Reload the page and watch for the prompt at the top of the browser window when you click Start Session.
How to use it
Configure your Rubric
Open the Rubric tab. Enter any challenge words you want tracked (feared words, trigger vocabulary), and select any places of articulation where you experience motor difficulty. Pick a scoring mode (Relaxed, Standard, Strict). Save.
This step is optional but significantly improves scoring fidelity. Skip it to use defaults.
Record a session
Open the Live tab. Click Start Session. Watch the top of your browser window for the microphone + camera permission prompt and click Allow. Speak freely: a conversation, reading aloud, a presentation rehearsal, anything. The first ~20 words calibrate the session's rolling baseline (the status shows "calibrating"); the count is set by Calibration Length in the Rubric tab. After that, scoring activates.
You'll see the waveform, spectrogram, DFS state bar, and PAD timeline all updating live. Hover the ? button on any graph to see what it means. If the waveform stays flat after Start Session, the permission prompt was likely missed — reload and try again.
Stop when done
Click Stop Session. You'll be auto-switched to the Analysis tab with your completed session loaded.
Review your data
In Analysis, play back the audio with synced transcript. Click any word for a full PAD breakdown. Click any stat on the right (Stutters, Disfluencies, Blocks, etc.) to filter the transcript and highlight matching words.
Revisit earlier sessions
The Sessions tab holds every session recorded since the page was loaded. Sessions are kept in memory only, so refreshing the page clears them. Click any card to reload it in Analysis.
What the scoring means
Every word you speak gets a PAD score from 0 to 100. The score starts at 100 and loses points for deviation from your rolling baseline: longer gaps, held syllables, repeated words, low recognition confidence, or flags on challenge words. The color tells you the category: green is stable, red is unstable.
A word is classified as a stutter when multiple signals converge: severe single events, multiple flags on one word, block confirmed by acoustic analysis, or any flag on a challenge-tagged word.
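The scoring rule and stutter criteria above can be sketched as follows. The source describes only the rule's shape: start at 100, subtract for deviations, stay within 0 to 100, and call a stutter when signals converge. Every weight in `W`, the `SEVERE_SCORE` cutoff, and the `WordEvidence` fields are hypothetical.

```typescript
type Flag = "block" | "prolongation" | "repetition" | "filler";

interface WordEvidence {
  gapRatio: number;       // pre-word gap / rolling-baseline gap (1 = typical)
  durationRatio: number;  // word duration / rolling-baseline duration (1 = typical)
  repeated: boolean;      // same word immediately repeated
  confidence: number;     // recognizer confidence, 0..1
  flags: Flag[];          // event flags already raised for this word
  acousticBlock: boolean; // DFS saw a sustained Building state on this word
  challengeWord: boolean; // matched the rubric's challenge list
}

// Hypothetical weights: points lost per unit of excess deviation.
const W = { gap: 20, duration: 25, repetition: 15, lowConfidence: 20, challengeFlag: 15 };
const SEVERE_SCORE = 40; // hypothetical cutoff for a "severe single event"

function padScore(w: WordEvidence): number {
  let score = 100;
  score -= W.gap * Math.max(0, w.gapRatio - 1);           // longer gaps
  score -= W.duration * Math.max(0, w.durationRatio - 1); // held syllables
  if (w.repeated) score -= W.repetition;                  // repeated words
  score -= (W.lowConfidence * Math.max(0, 0.8 - w.confidence)) / 0.8; // low confidence
  if (w.challengeWord && w.flags.length > 0) score -= W.challengeFlag; // flagged challenge word
  return Math.max(0, Math.min(100, score));
}

// Stutter when signals converge, following the four conditions in the text.
function isStutter(w: WordEvidence, score: number): boolean {
  const severe = score < SEVERE_SCORE;
  const multipleFlags = w.flags.length >= 2;
  const confirmedBlock = w.flags.indexOf("block") >= 0 && w.acousticBlock;
  const flaggedChallenge = w.challengeWord && w.flags.length > 0;
  return severe || multipleFlags || confirmedBlock || flaggedChallenge;
}
```

In this sketch a single unconfirmed block flag on an ordinary word lowers nothing by itself and is not a stutter. Only convergence (DFS confirmation, a second flag, a challenge word, or a severe score drop) tips it over.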
Tap the ? icon in the top-right corner of any screen for a deep reference on the PAD framework, DFS (Disfluency Feature Stream), stutter detection rules, and phoneme-level analysis.
Waveform — Amplitude Envelope
Shows the loudness of your voice over the last 10 seconds, scrolling right to left.
The filled cyan area shows peak amplitude (loudest point in each frame). The solid line shows RMS amplitude (average energy).
What to look for: Steady, rhythmic peaks indicate fluent speech flow. Flat stretches during attempted speech suggest a block. Sharp transients without pattern suggest inconsistent voicing. The gap between peak and RMS lines indicates vocal dynamics — narrow gap = monotone, wide gap = expressive.
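The two Waveform lines are standard per-frame measures: peak is the largest absolute sample in a frame, and RMS is the square root of the mean squared sample. A sketch, assuming raw samples as floats in [-1, 1] and a fixed frame size. The frame size Open Mic uses is not stated, and `frameEnvelope` and `crestFactor` are hypothetical names.

```typescript
// Per-frame envelope: peak = max |sample|, RMS = sqrt(mean(sample^2)).
// A trailing partial frame is dropped.
function frameEnvelope(samples: Float32Array, frameSize: number): { peak: number[]; rms: number[] } {
  const peak: number[] = [];
  const rms: number[] = [];
  for (let start = 0; start + frameSize <= samples.length; start += frameSize) {
    let max = 0;
    let sumSq = 0;
    for (let i = start; i < start + frameSize; i++) {
      const s = samples[i];
      const a = Math.abs(s);
      if (a > max) max = a;
      sumSq += s * s;
    }
    peak.push(max);
    rms.push(Math.sqrt(sumSq / frameSize));
  }
  return { peak, rms };
}

// The "gap" between the two lines expressed as a ratio (crest factor): 1 when
// peak equals RMS, larger when short peaks stand above the average energy.
function crestFactor(peak: number, rms: number): number {
  return rms > 0 ? peak / rms : 0;
}
```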
Spectrogram — Frequency × Time
Shows which frequencies are present in your voice at each moment. The X axis is time (scrolling right to left), Y axis is frequency (0 Hz bottom, 4 kHz top), color is intensity.
Color scale: black = silent, purple/red = moderate energy, yellow/white = high energy.
What to look for: Horizontal bands (formants) indicate vowel production — F1/F2 formant positions distinguish vowels. Vertical streaks indicate consonants. Smooth, continuous formant lines indicate stable articulation; broken or shifting formants often correlate with struggle or block behavior.
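Showing only 0 Hz to 4 kHz means drawing just the lower FFT bins of each frame. A sketch of that mapping, assuming a standard FFT where bin k is centered at k × sampleRate / fftSize (as with a Web Audio `AnalyserNode`, which exposes `fftSize / 2` bins). The dB range below reuses `AnalyserNode`'s default `minDecibels`/`maxDecibels` (-100 and -30 dB). Open Mic's actual sample rate, FFT size, and color range are not stated, and the function names are hypothetical.

```typescript
// Center frequency (Hz) of FFT bin k.
function binFrequency(k: number, sampleRate: number, fftSize: number): number {
  return (k * sampleRate) / fftSize;
}

// How many bins cover 0..maxHz, capped at the fftSize / 2 bins below Nyquist.
function binsUpTo(maxHz: number, sampleRate: number, fftSize: number): number {
  const n = Math.floor((maxHz * fftSize) / sampleRate) + 1;
  return Math.min(n, fftSize / 2);
}

// Map a bin magnitude in dB to 0..1 for the color scale
// (0 = black/silent, 1 = yellow-white/high energy).
function intensity(db: number, minDb = -100, maxDb = -30): number {
  const t = (db - minDb) / (maxDb - minDb);
  return Math.max(0, Math.min(1, t));
}
```

For example, at 48 kHz with a 2048-point FFT each bin spans 23.4375 Hz, so the 4 kHz ceiling corresponds to the first 171 bins.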
DFS — Disfluency Feature Stream
Frame-level acoustic classification running ~60× per second directly in your browser. Each vertical bar is one audio frame, classified into one of three states:
Silent — No voice energy. Normal pauses, breath.
Building — Energy rising but below voicing threshold. Articulatory preparation, airflow onset, or a blocked attempt to voice.
Voiced — Active vocalization. Clear speech signal.
Counters:
- RMS Peak: the loudest moment so far.
- Onsets: transitions into voicing (articulatory launches).
- Voiced: total ms of active speech.
- Blocks: a Building state sustained for more than 400 ms without reaching voice, an acoustic fingerprint of a motor block.
DFS auto-calibrates to your mic's noise floor in the first 1.5 seconds.
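A minimal sketch of the three-state DFS classifier, its counters, and the 400 ms block rule. It assumes per-frame RMS as the energy measure and a noise floor equal to the mean RMS over the first 1.5 s. The Building and Voiced thresholds (2× and 4× the floor) are hypothetical. Frame duration and elapsed time are passed in explicitly, so the logic does not depend on the ~60 Hz render loop. `DfsClassifier` is a hypothetical name.

```typescript
type DfsState = "silent" | "building" | "voiced";

class DfsClassifier {
  private floorSum = 0;
  private floorFrames = 0;
  private noiseFloor = -1;      // -1 until calibration finishes
  private buildingMs = 0;       // length of the current Building run
  private blockCounted = false; // count one long Building run only once
  private prev: DfsState = "silent";
  rmsPeak = 0;
  onsets = 0;
  voicedMs = 0;
  blocks = 0;

  constructor(
    private calibrationMs = 1500,
    private buildFactor = 2, // hypothetical: Building at >= 2x the noise floor
    private voiceFactor = 4, // hypothetical: Voiced at >= 4x the noise floor
    private blockMs = 400,
  ) {}

  push(rms: number, frameMs: number, elapsedMs: number): DfsState {
    if (rms > this.rmsPeak) this.rmsPeak = rms;
    if (elapsedMs < this.calibrationMs) {
      // Calibration window: accumulate the noise floor, report silence.
      this.floorSum += rms;
      this.floorFrames++;
      return "silent";
    }
    if (this.noiseFloor < 0) {
      this.noiseFloor = this.floorFrames > 0 ? this.floorSum / this.floorFrames : 0;
    }
    const floor = Math.max(this.noiseFloor, 1e-4); // guard against a digital-silence floor of 0
    let state: DfsState = "silent";
    if (rms >= floor * this.voiceFactor) state = "voiced";
    else if (rms >= floor * this.buildFactor) state = "building";

    if (state === "voiced") {
      if (this.prev !== "voiced") this.onsets++; // transition into voicing
      this.voicedMs += frameMs;
      this.buildingMs = 0;
      this.blockCounted = false;
    } else if (state === "building") {
      this.buildingMs += frameMs;
      if (this.buildingMs > this.blockMs && !this.blockCounted) {
        this.blocks++; // sustained Building without reaching voice
        this.blockCounted = true;
      }
    } else {
      this.buildingMs = 0;
      this.blockCounted = false;
    }
    this.prev = state;
    return state;
  }
}
```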
PAD Timeline — Per-Word Stability Score
Each dot is one word. Y axis is the PAD score for that word, 0 (bottom, unstable) to 100 (top, stable). X axis is time (last 30 seconds).
Score color reflects stability: green is stable, red is unstable.
Flag dots above the main line (at the top of the graph) indicate event types: red = block, orange = prolongation, yellow = repetition, gray = filler.
Each word starts at 100 and loses points for deviation from your rolling baseline — longer gaps, held syllables, repeated words, or low recognition confidence. The line shows your motor planning stability evolving over time. Clusters of drops often indicate fatigue, topic difficulty, or contextual stress.
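One way to keep a rolling baseline and spot clusters of drops: take a median over the last K values (for example, inter-word gaps) as the baseline, and a trailing mean of PAD scores to find clusters. K, the window length, and the cluster threshold are hypothetical, and the source does not document Open Mic's windowing. `RollingMedian` and `dropClusters` are illustrative names.

```typescript
// Rolling median of the last `size` values; robust to a single very long gap.
class RollingMedian {
  private values: number[] = [];
  constructor(private size: number) {}

  push(v: number): void {
    this.values.push(v);
    if (this.values.length > this.size) this.values.shift(); // drop the oldest
  }

  median(): number {
    if (this.values.length === 0) return NaN;
    const sorted = this.values.slice().sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  }
}

// Word indices where the mean PAD over the trailing `window` words drops below `threshold`.
function dropClusters(scores: number[], window = 5, threshold = 60): number[] {
  const hits: number[] = [];
  let sum = 0;
  for (let i = 0; i < scores.length; i++) {
    sum += scores[i];
    if (i >= window) sum -= scores[i - window]; // slide the window forward
    if (i >= window - 1 && sum / window < threshold) hits.push(i);
  }
  return hits;
}
```

A median is used here instead of a mean so that one long pause does not inflate the baseline and mask the next deviation.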
Establish Your Speaker Baseline
Open Mic measures how you typically speak across a graded difficulty ladder. Each calibration session corresponds to one level — Level 1 is easy, the highest level is hard. Your baseline is built from all the levels combined, capturing your full operating envelope, not just relaxed-state production.
Each level escalates two dimensions: linguistic complexity and time pressure. The ladder below is what you'll record. Levels with a ✓ are complete; the next session will run the level marked NEXT.
Click to start your next calibration session. The level's passage and target WPM load into Live automatically. Read at the target pace.
How it works
- Record N sessions at escalating difficulty. Each one is a single ladder rung: Level 1 is easy, conversational reading; Level N is at the limit of what most speakers can produce cleanly.
- Baseline becomes a vector of metrics — one point per level. Your typical Mean PAD at Level 1 vs Level 5 is captured separately, so progress can be measured at the difficulty you're working at.
- Use the suite freely after baseline locks. Every future session shows a Progress score relative to baseline. Positive = more fluent than typical, negative = less.
- Re-baseline anytime if conditions change (after a therapy block, voice changes, mic changes). Old sessions stay in history; only the reference resets.
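A sketch of the per-level baseline vector. It assumes each ladder level stores its own reference metrics, and that a session is compared only against the baseline for the level it was recorded at. Using Mean PAD alone and a plain difference are illustrative assumptions, and `LevelBaseline` and `levelProgress` are hypothetical names.

```typescript
// One baseline entry per ladder level (1 = easiest).
interface LevelBaseline {
  level: number;
  meanPad: number;
}

function baselineFor(levels: LevelBaseline[], level: number): LevelBaseline | undefined {
  for (const l of levels) if (l.level === level) return l;
  return undefined;
}

// Progress at a given difficulty: session Mean PAD minus that level's baseline Mean PAD.
// Positive = more fluent than typical at this level, negative = less; null = no baseline.
function levelProgress(levels: LevelBaseline[], level: number, sessionMeanPad: number): number | null {
  const ref = baselineFor(levels, level);
  return ref ? sessionMeanPad - ref.meanPad : null;
}
```

This shows why the baseline is kept per level. With a Level 1 baseline of 85 and a Level 5 baseline of 60, a session Mean PAD of 70 is a regression at Level 1 (-15) but an improvement at Level 5 (+10).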
Before launching the Live tab
Browser permissions: When you click Start Session for the first time, the browser will display a microphone + camera permission prompt at the top of the window. You must click Allow for the session to record. If the prompt is dismissed, click the camera/lock icon in the address bar to re-grant access.
Two-speaker sessions: If you'll be in a back-and-forth conversation with a client, enable Two-speaker mode on the Live screen and tap the User 1 / User 2 buttons to mark whose turn it is. This prevents speech-to-text from blending both voices into one transcript.
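Two-speaker mode depends on manual turn markers. A sketch of how timestamped words could be attributed to whoever's button was tapped most recently. The `TurnMark` and `Word` shapes, the default first speaker, and `attributeSpeakers` are hypothetical.

```typescript
type Speaker = "User 1" | "User 2";

interface TurnMark { atMs: number; speaker: Speaker; } // one tap on User 1 / User 2
interface Word { text: string; startMs: number; }

// Assign each word to the latest turn mark at or before its start time.
// `marks` must be sorted by time; words before the first mark go to `initial`.
function attributeSpeakers(
  words: Word[],
  marks: TurnMark[],
  initial: Speaker = "User 1",
): Array<{ word: Word; speaker: Speaker }> {
  const out: Array<{ word: Word; speaker: Speaker }> = [];
  const sorted = words.slice().sort((a, b) => a.startMs - b.startMs);
  let m = 0;
  let current = initial;
  for (const w of sorted) {
    while (m < marks.length && marks[m].atMs <= w.startMs) {
      current = marks[m].speaker; // advance through taps that happened before this word
      m++;
    }
    out.push({ word: w, speaker: current });
  }
  return out;
}
```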
PAD Scoring Rubric
Configure how Open Mic scores your speech. Changes apply to the next session.
Challenge Words & Feared Sounds
Words or phrases you have difficulty producing — feared words, trigger words, or known-challenge vocabulary. When these appear in a session, the scoring engine applies heightened sensitivity: any instability on a challenge word is weighted as a significant event, and stable production is flagged as a win.
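A sketch of challenge-word matching. It assumes case-insensitive comparison after stripping punctuation, and it matches multi-word phrases as consecutive transcript tokens. The normalization keeps only ASCII letters, digits, and apostrophes, which is an assumption that would need widening for non-English text. `markChallengeTokens` is a hypothetical name.

```typescript
// Lowercase and strip punctuation so "Peter," matches "peter".
function normalize(token: string): string {
  return token.toLowerCase().replace(/[^a-z0-9']/g, "");
}

// One boolean per transcript token: true if it belongs to any challenge word or phrase.
function markChallengeTokens(tokens: string[], challenges: string[]): boolean[] {
  const norm = tokens.map(normalize);
  const marked = tokens.map(() => false);
  for (const phrase of challenges) {
    const parts = phrase.split(/\s+/).map(normalize).filter((p) => p.length > 0);
    if (parts.length === 0) continue;
    for (let i = 0; i + parts.length <= norm.length; i++) {
      let match = true;
      for (let j = 0; j < parts.length; j++) {
        if (norm[i + j] !== parts[j]) { match = false; break; }
      }
      if (match) for (let j = 0; j < parts.length; j++) marked[i + j] = true;
    }
  }
  return marked;
}
```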
Challenging Places of Articulation
Select the places of articulation where you commonly experience motor difficulty. Any word whose onset phoneme uses one of these places will be scored with elevated sensitivity. This data also forms the substrate for future EEG integration: pre-supplementary motor area (pre-SMA) signals can be correlated with predicted challenge phonemes.
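Checking a word's onset against the selected places requires a phoneme-to-place table. A sketch using ARPAbet consonant symbols as found in the CMU Pronouncing Dictionary. Where the phoneme transcription comes from, and the grouping Open Mic actually uses, are assumptions; in particular, W (labial-velar) and R are folded into the nearest listed place. `onsetPlace` and `isChallengeOnset` are hypothetical names.

```typescript
type Place =
  | "bilabial" | "labiodental" | "dental" | "alveolar"
  | "postalveolar" | "palatal" | "velar" | "glottal";

// ARPAbet consonants -> place of articulation (grouping is an assumption).
const PLACE_OF: Record<string, Place> = {
  P: "bilabial", B: "bilabial", M: "bilabial", W: "bilabial", // W is labial-velar; grouped here
  F: "labiodental", V: "labiodental",
  TH: "dental", DH: "dental",
  T: "alveolar", D: "alveolar", S: "alveolar", Z: "alveolar", N: "alveolar", L: "alveolar",
  SH: "postalveolar", ZH: "postalveolar", CH: "postalveolar", JH: "postalveolar", R: "postalveolar",
  Y: "palatal",
  K: "velar", G: "velar", NG: "velar",
  HH: "glottal",
};

// Place of the word-initial phoneme, or null for vowel onsets and unknown symbols.
function onsetPlace(phonemes: string[]): Place | null {
  if (phonemes.length === 0) return null;
  const symbol = phonemes[0].replace(/[0-9]/g, ""); // drop CMUdict stress digits (AE1 -> AE)
  return Object.prototype.hasOwnProperty.call(PLACE_OF, symbol) ? PLACE_OF[symbol] : null;
}

function isChallengeOnset(phonemes: string[], selected: Place[]): boolean {
  const place = onsetPlace(phonemes);
  return place !== null && selected.indexOf(place) >= 0;
}
```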
Target Pace
Your desired speaking rate. PAD adjusts duration baselines to your target — speaking deliberately at 80 WPM won't be penalized for slowness if that's your goal. Set to Auto to let calibration determine your natural pace.
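The pace adjustment comes down to simple arithmetic. At a target of W words per minute, the expected time per word is 60 000 / W ms, so 80 WPM allows 750 ms per word versus 400 ms at 150 WPM. A sketch, assuming duration deviations are measured against this pace-derived expectation instead of a fixed one. `msPerWord` and `paceRatio` are hypothetical names.

```typescript
// Expected time per word (ms) at a words-per-minute target.
function msPerWord(wpm: number): number {
  if (wpm <= 0) throw new Error("wpm must be positive");
  return 60000 / wpm;
}

// Observed word time relative to the target pace; 1 = exactly on pace.
// A deliberate 80 WPM speaker taking 750 ms per word gets 1 here, not "slow".
function paceRatio(observedMs: number, targetWpm: number): number {
  return observedMs / msPerWord(targetWpm);
}
```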
Scoring Mode
Controls how aggressively PAD flags disfluencies. Relaxed is forgiving — good for early sessions or high-anxiety contexts. Strict catches subtle instabilities — useful for targeted practice on known-safe material.
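Scoring modes can be modeled as one sensitivity multiplier applied to a deviation's penalty before it is compared with a fixed flag threshold. The multiplier values and the threshold are hypothetical. The source fixes only their ordering: Relaxed is the most forgiving and Strict the most sensitive.

```typescript
type ScoringMode = "relaxed" | "standard" | "strict";

// Hypothetical sensitivity multipliers; only their ordering comes from the docs.
const SENSITIVITY: Record<ScoringMode, number> = { relaxed: 0.6, standard: 1.0, strict: 1.5 };

// A deviation becomes a flag when its scaled penalty reaches the threshold.
function isFlagged(rawPenalty: number, mode: ScoringMode, threshold = 10): boolean {
  return rawPenalty * SENSITIVITY[mode] >= threshold;
}
```

With these values, a deviation worth 8 points is flagged only in Strict, and one worth 12 points is flagged in Standard and Strict but not in Relaxed.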
Calibration Length
How many words before PAD scoring activates. More words = more stable baseline but longer delay before scores appear.
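A sketch of the Calibration Length tradeoff. It assumes the setting is a word count: for the first N words the scorer only collects inter-word gaps and returns null (shown as "calibrating"), and after that each gap is scored as a ratio to their mean. The gap-ratio output and `makeGatedScorer` are hypothetical.

```typescript
// Returns a scorer that yields null for the first `requiredWords` gaps (calibrating),
// then the ratio of each new gap to the mean calibration gap (1 = typical).
function makeGatedScorer(requiredWords: number): (gapMs: number) => number | null {
  const calib: number[] = [];
  return (gapMs: number): number | null => {
    if (calib.length < requiredWords) {
      calib.push(gapMs);
      return null; // still calibrating
    }
    const base = calib.reduce((a, b) => a + b, 0) / calib.length;
    return base > 0 ? gapMs / base : 1;
  };
}
```

A larger `requiredWords` averages over more gaps, which gives a steadier reference, but it also delays the first score by that many words.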
Fear-Free Scoring
When enabled, the scoring engine applies NO amplification for challenge words or challenging places of articulation. The PAD score reflects only the acoustic and timing signal — the motor planning reality — independent of anticipatory context or rubric-declared fear. Useful for measuring improvement in the underlying production system, separate from context-driven anxiety.
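A sketch of how rubric amplification and Fear-Free Scoring could combine. It assumes a challenge word and a challenging place each multiply a word's penalty, and that Fear-Free Scoring skips both multipliers so only the acoustic and timing penalty remains. The gain values and `amplifiedPenalty` are hypothetical.

```typescript
interface RubricContext {
  challengeWord: boolean;  // word is on the challenge list
  challengePlace: boolean; // onset uses a selected place of articulation
  fearFree: boolean;       // Fear-Free Scoring toggle
}

// Hypothetical amplification factors for rubric-declared difficulty.
const CHALLENGE_WORD_GAIN = 1.5;
const CHALLENGE_PLACE_GAIN = 1.25;

function amplifiedPenalty(penalty: number, ctx: RubricContext): number {
  if (ctx.fearFree) return penalty; // acoustic/timing signal only, no context amplification
  let gain = 1;
  if (ctx.challengeWord) gain *= CHALLENGE_WORD_GAIN;
  if (ctx.challengePlace) gain *= CHALLENGE_PLACE_GAIN;
  return penalty * gain;
}
```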
Rhythm Pacer
Optional on-screen rhythm wave that pulses at a chosen cadence. Use it to pace your speech to a consistent tempo — a fluency shaping technique that can reduce block frequency by anchoring motor timing to an external beat. Appears on the Live screen when enabled.
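A sketch of a pacer pulse. It assumes the cadence is given in beats per minute and that the on-screen wave is a raised cosine peaking exactly on each beat. Both the BPM parameterization and the waveform shape are assumptions, and `pacerPulse` and `beatIndex` are hypothetical names.

```typescript
// Pulse brightness in [0, 1] at time tMs for a pacer running at `bpm`.
// Peaks (1) land on beats at t = 0, 60000/bpm, 2*60000/bpm, ...; troughs (0) fall halfway between.
function pacerPulse(tMs: number, bpm: number): number {
  const beatMs = 60000 / bpm;
  const phase = (tMs % beatMs) / beatMs; // position within the current beat, 0..1
  return 0.5 * (1 + Math.cos(2 * Math.PI * phase));
}

// Index of the beat in progress at time tMs (0-based).
function beatIndex(tMs: number, bpm: number): number {
  return Math.floor(tMs / (60000 / bpm));
}
```

The same pulse value could also drive the viewport-edge glow described under Ambient Rhythm Glow, which keeps both visuals locked to one clock.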
Session Mode
Tells Open Mic how to evaluate this session. Reading mode uses the script below as a reference for Azure pronunciation assessment — unlocking Insertion, Omission, and Completeness signals. Conversational mode (Open Mic / free talk) evaluates spontaneous speech with no reference text.
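When a reference text is supplied, Azure's pronunciation assessment reports these signals itself. To show what Insertion, Omission, and Completeness mean, the sketch below uses a standard word-level edit-distance alignment between the script and the recognized words. This is a swapped-in illustration, not Azure's algorithm. Substitutions are counted separately, and Completeness here is simply matched script words divided by script words. `alignToScript` is a hypothetical name, and tokens are assumed to be lowercased already.

```typescript
interface AlignmentResult {
  matches: number;
  substitutions: number;
  insertions: string[]; // recognized words not in the script
  omissions: string[];  // script words never spoken
  completeness: number; // matches / script length, 0..1
}

function alignToScript(reference: string[], recognized: string[]): AlignmentResult {
  const n = reference.length;
  const m = recognized.length;
  // dp[i][j] = minimum edits aligning reference[0..i) with recognized[0..j).
  const dp: number[][] = [];
  for (let i = 0; i <= n; i++) {
    dp.push([]);
    for (let j = 0; j <= m; j++) {
      if (i === 0) dp[i].push(j);
      else if (j === 0) dp[i].push(i);
      else {
        const sub = dp[i - 1][j - 1] + (reference[i - 1] === recognized[j - 1] ? 0 : 1);
        dp[i].push(Math.min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1));
      }
    }
  }
  // Walk back from the bottom-right corner, preferring diagonal steps.
  const result: AlignmentResult = { matches: 0, substitutions: 0, insertions: [], omissions: [], completeness: 0 };
  let i = n;
  let j = m;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && dp[i][j] === dp[i - 1][j - 1] + (reference[i - 1] === recognized[j - 1] ? 0 : 1)) {
      if (reference[i - 1] === recognized[j - 1]) result.matches++;
      else result.substitutions++;
      i--;
      j--;
    } else if (i > 0 && dp[i][j] === dp[i - 1][j] + 1) {
      result.omissions.unshift(reference[i - 1]); // script word skipped
      i--;
    } else {
      result.insertions.unshift(recognized[j - 1]); // extra spoken word
      j--;
    }
  }
  result.completeness = n === 0 ? 1 : result.matches / n;
  return result;
}
```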
Teleprompter Script
Pre-load text to be displayed on the Live screen during your session. Words advance automatically at your target WPM, synchronized with the rhythm pacer.
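Advancing the teleprompter at the target WPM uses the same per-word interval as pacing. The highlighted word at elapsed time t is floor(t × WPM / 60 000), clamped to the script. A sketch, assuming one word per pacer beat when both are enabled. `teleprompterIndex`, `visibleWords`, and the 3-before/8-after window are hypothetical.

```typescript
// Index of the word to highlight after `elapsedMs`, clamped to the script length (-1 if empty).
function teleprompterIndex(elapsedMs: number, wpm: number, wordCount: number): number {
  if (wordCount === 0) return -1;
  const idx = Math.floor((elapsedMs * wpm) / 60000);
  return Math.max(0, Math.min(wordCount - 1, idx));
}

// Slice of words to display around the current one.
function visibleWords(words: string[], current: number, before = 3, after = 8): string[] {
  return words.slice(Math.max(0, current - before), current + after + 1);
}
```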
Default passage: the Grandfather Passage, a standard clinical reading passage widely used in motor speech disorders assessment (Darley, Aronson & Brown, 1975). It covers nearly all English phonemes in natural prose, which is why SLPs and researchers use it as a reference reading. It gives you a consistent baseline you can re-read across sessions to track real progress on identical material. You can replace it with any text you want: your own script, a favorite passage, conversation prompts, a presentation draft.
Ambient Rhythm Glow
When enabled alongside the rhythm pacer, the full viewport edges pulse with the beat. Designed for peripheral perception — you can feel the rhythm without looking at the pacer, which keeps your eyes free for the teleprompter or another focus.
Live Screen Graphs
Show or hide individual graphs on the Live screen. Disable any visualization you don't want to see during recording.
Speaker Baseline
The first N sessions are aggregated into your speaker baseline, the reference shape for all future sessions. Once it is established, every new session shows a Progress score (positive = more fluent than baseline, negative = less). Re-baseline whenever conditions change (after a therapy block, a voice change, a mic change).