Open Mic FluentPlay · PAD v2


Open Mic

Real-time speech analysis built on the PAD framework

What this is

Open Mic captures your speech, analyzes it at the frame and word level, and generates a per-word stability score. It was designed for people who stutter — but the scoring engine operates on any speaker. Every session produces a structured record you can review, filter, and compare over time.

🎯 Calibration & Baseline — start here

Open Mic learns how you typically speak before it starts grading you. The first time you use the app, head to the Baseline tab and complete the calibration sessions there. Once your baseline is established, every future session is scored against your own typical production — not against a generic "average speaker" that may not represent you.

Three phases:

  1. Calibration. Record N sessions reading the same short passage (default 5; configurable 2–10). Speak naturally. The passage is locked across all calibration sessions for consistency.
  2. Baseline locks. Open Mic aggregates your calibration sessions into a 6-axis baseline shape — your typical Mean PAD, Fluency, WPM, Voiced ratio, Stability, and Block-free rate. This becomes your personal reference.
  3. Free use. Use the full suite freely. Every session shows a Progress score relative to baseline (positive = more fluent than typical, negative = less). The 3D PAD profile overlays your current session against the baseline shape so you can see exactly which dimensions improved or regressed.

Re-baseline anytime conditions change — after a therapy block, voice changes, mic changes. Old sessions stay in your history; only the reference shape resets.
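
As a minimal sketch, the aggregation and Progress computation might look like the following. The six axis names come from the list above; the per-axis mean, the unweighted signed-deviation Progress formula, and the higher-is-better treatment of every axis are illustrative assumptions, not the engine's actual math.

```typescript
type Axes = {
  meanPad: number;       // 0–100
  fluency: number;       // 0–100
  wpm: number;
  voicedRatio: number;   // 0–1
  stability: number;     // 0–100, higher = more consistent
  blockFreeRate: number; // 0–1
};

const AXIS_KEYS: (keyof Axes)[] = [
  "meanPad", "fluency", "wpm", "voicedRatio", "stability", "blockFreeRate",
];

// Aggregate calibration sessions into the baseline shape (per-axis mean).
function buildBaseline(sessions: Axes[]): Axes {
  const out = {} as Axes;
  for (const k of AXIS_KEYS) {
    out[k] = sessions.reduce((sum, s) => sum + s[k], 0) / sessions.length;
  }
  return out;
}

// Progress: mean signed deviation from baseline, in percent.
// Positive = more fluent than typical, negative = less.
// Assumes nonzero baseline values on every axis.
function progress(session: Axes, baseline: Axes): number {
  let sum = 0;
  for (const k of AXIS_KEYS) {
    sum += (session[k] - baseline[k]) / baseline[k];
  }
  return (sum / AXIS_KEYS.length) * 100;
}
```

A session identical to the baseline scores 0; improvement on any axis pushes Progress positive.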

⚠ Before you start: browser permissions

The first time you click Start Session on the Live tab, your browser will display a permission prompt asking to access your microphone and camera. The prompt usually appears at the top-left of the browser window, near the address bar.

You must click "Allow" for both microphone and camera, or the session will fail to start. If you miss the prompt or click it away, click the camera/lock icon in the address bar to re-grant access, then click Start Session again.

If you're in a session with a client and don't see audio levels moving on the waveform, the most common cause is that the permission prompt was dismissed without granting access. Reload the page and watch for the prompt at the top of the browser window when you click Start Session.

How to use it

1. Configure your Rubric

Open the Rubric tab. Enter any challenge words you want tracked (feared words, trigger vocabulary), and select any places of articulation where you experience motor difficulty. Pick a scoring mode (Relaxed, Standard, Strict). Save.

This step is optional but significantly improves scoring fidelity. Skip it to use defaults.

2. Record a session

Open the Live tab. Click Start Session. Watch the top of your browser window for the microphone + camera permission prompt and click Allow. Speak freely — a conversation, reading aloud, a presentation rehearsal, anything. The first ~20 words calibrate your baseline (the status shows "calibrating"). After that, scoring activates.

You'll see the waveform, spectrogram, DFS state bar, and PAD timeline all updating live. Hover over the ? button on any graph to see what it means. If the waveform stays flat after Start Session, the permission prompt was likely missed — reload and try again.

3. Stop when done

Click Stop Session. You'll be auto-switched to the Analysis tab with your completed session loaded.

4. Review your data

In Analysis, play back the audio with synced transcript. Click any word for a full PAD breakdown. Click any stat on the right (Stutters, Disfluencies, Blocks, etc.) to filter the transcript and highlight matching words.

5. Revisit earlier sessions

The Sessions tab holds every session from this browser session (in-memory; refresh clears them). Click any card to reload it in Analysis.

What the scoring means

Every word you speak gets a PAD score from 0 to 100. The score starts at 100 and loses points for deviation from your rolling baseline — longer gaps, held syllables, repeated words, low recognition confidence, or flags on challenge words. The color tells you the category:

80–100: Stable — smooth motor planning
60–79: Mid — minor deviation
40–59: Low — notable instability
0–39: Unstable — significant disruption

A word is classified as a stutter when multiple signals converge: severe single events, multiple flags on one word, block confirmed by acoustic analysis, or any flag on a challenge-tagged word.

Tap the ? icon in the top-right corner of any screen for a deep reference on the PAD framework, DFS (Disfluency Feature Stream), stutter detection rules, and phoneme-level analysis.

Rhythm Pacer · 60 BPM · smooth wave
Waveform · amplitude envelope (10s rolling window)

Waveform — Amplitude Envelope

Shows the loudness of your voice over the last 10 seconds, scrolling right to left.

The filled cyan area shows peak amplitude (loudest point in each frame). The solid line shows RMS amplitude (average energy).

What to look for: Steady, rhythmic peaks indicate fluent speech flow. Flat stretches during attempted speech suggest a block. Sharp transients without pattern suggest inconsistent voicing. The gap between peak and RMS lines indicates vocal dynamics — narrow gap = monotone, wide gap = expressive.
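
The two quantities the panel draws can be computed per frame as in this minimal sketch; the app's actual frame size and any smoothing are not shown here.

```typescript
// Peak vs RMS amplitude for one analysis frame, as plotted on the waveform.
function frameLevels(samples: Float32Array): { peak: number; rms: number } {
  let peak = 0;
  let sumSq = 0;
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i];
    const a = Math.abs(s);
    if (a > peak) peak = a; // loudest sample in the frame
    sumSq += s * s;         // accumulate energy for RMS
  }
  return { peak, rms: Math.sqrt(sumSq / samples.length) };
}
```

For a steady signal the two track closely (narrow gap, monotone); sharp transients push the peak well above the RMS (wide gap, expressive).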

Spectrogram · frequency × time (0–4 kHz, hot = dB high)

Spectrogram — Frequency × Time

Shows which frequencies are present in your voice at each moment. The X axis is time (scrolling right to left), Y axis is frequency (0 Hz bottom, 4 kHz top), color is intensity.

Color scale: black = silent, purple/red = moderate energy, yellow/white = high energy.

What to look for: Horizontal bands (formants) indicate vowel production — F1/F2 formant positions distinguish vowels. Vertical streaks indicate consonants. Smooth, continuous formant lines indicate stable articulation; broken or shifting formants often correlate with struggle or block behavior.

Disfluency Feature Stream (DFS) · frame-level acoustic analysis

DFS — Disfluency Feature Stream

Frame-level acoustic classification running ~60× per second directly in your browser. Each vertical bar is one audio frame, classified into one of three states:

Silent — No voice energy. Normal pauses, breath.

Building — Energy rising but below voicing threshold. Articulatory preparation, airflow onset, or a blocked attempt to voice.

Voiced — Active vocalization. Clear speech signal.

Counters: RMS Peak = loudest moment so far. Onsets = transitions into voicing (articulatory launches). Voiced = total ms of active speech. Blocks = sustained building state over 400ms without reaching voice — an acoustic fingerprint of a motor block.

DFS auto-calibrates to your mic's noise floor in the first 1.5 seconds.
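
In the spirit of DFS, the three-state classifier and the block counter can be sketched as follows. The two RMS thresholds passed in are illustrative, since the real panel calibrates them from your mic; at ~60 frames/sec, the 400 ms block threshold is roughly 24 consecutive frames.

```typescript
type DfsState = "silent" | "building" | "voiced";

// Classify one frame against a noise floor and a voicing threshold.
function classifyFrame(rms: number, noiseFloor: number, voiceThresh: number): DfsState {
  if (rms <= noiseFloor) return "silent";
  return rms < voiceThresh ? "building" : "voiced";
}

// A block = a "building" run longer than 400 ms that ends without voicing.
// Whether a run that finally resolves into voicing also counts is an
// assumption here (it does not); the real detector may differ.
function countBlocks(states: DfsState[], frameMs = 1000 / 60): number {
  let blocks = 0;
  let buildMs = 0;
  for (let i = 0; i <= states.length; i++) {
    const st: DfsState = i < states.length ? states[i] : "silent"; // flush final run
    if (st === "building") {
      buildMs += frameMs;
      continue;
    }
    if (buildMs > 400 && st !== "voiced") blocks++; // run ended unvoiced
    buildMs = 0;
  }
  return blocks;
}
```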

PAD timeline (per word · adaptive baseline)

PAD Timeline — Per-Word Stability Score

Each dot is one word. Y axis is the PAD score for that word, 0 (bottom, unstable) to 100 (top, stable). X axis is time (last 30 seconds).

Score color reflects stability: green is stable, red is unstable.

Flag dots above the main line (at the top of the graph) indicate event types: red = block, orange = prolongation, yellow = repetition, gray = filler.

Each word starts at 100 and loses points for deviation from your rolling baseline — longer gaps, held syllables, repeated words, or low recognition confidence. The line shows your motor planning stability evolving over time. Clusters of drops often indicate fatigue, topic difficulty, or contextual stress.

3D PAD profile · live

Establish Your Speaker Baseline

Open Mic measures how you typically speak across a graded difficulty ladder. Each calibration session corresponds to one level — Level 1 is easy, the highest level is hard. Your baseline is built from all the levels combined, capturing your full operating envelope, not just relaxed-state production.

Calibration levels — 5 by default · 2 minimum · 10 max · each session = one difficulty level

Difficulty Ladder

Each level escalates two dimensions: linguistic complexity and time pressure. The ladder below is what you'll record. Levels with a ✓ are complete; the next session will run the level marked NEXT.

Click to start your next calibration session. The level's passage and target WPM load into Live automatically. Read at the target pace.

How it works

  1. Record N sessions at escalating difficulty. Each one is a single ladder rung — Level 1 is conversational easy reading, Level N is at the limit of what most speakers can produce cleanly.
  2. Baseline becomes a vector of metrics — one point per level. Your typical Mean PAD at Level 1 vs Level 5 is captured separately, so progress can be measured at the difficulty you're working at.
  3. Use the suite freely after baseline locks. Every future session shows a Progress score relative to baseline. Positive = more fluent than typical, negative = less.
  4. Re-baseline anytime if conditions change (after a therapy block, voice changes, mic changes). Old sessions stay in history; only the reference resets.
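
The time-pressure dimension of the ladder could be generated like this hypothetical sketch — the start/end paces and the linear ramp are invented for illustration, and linguistic complexity would come from the passage bound to each level, not from this code.

```typescript
// Hypothetical difficulty-ladder generator (time pressure only).
interface LadderLevel {
  level: number;
  targetWpm: number;
}

function buildLadder(levels: number, startWpm = 100, endWpm = 160): LadderLevel[] {
  const out: LadderLevel[] = [];
  for (let i = 0; i < levels; i++) {
    const t = levels === 1 ? 0 : i / (levels - 1); // 0 → 1 across the ladder
    out.push({ level: i + 1, targetWpm: Math.round(startWpm + t * (endWpm - startWpm)) });
  }
  return out;
}
```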

Session Stats

Select a session in the Review tab or Sessions list to see articulation detail.

Before launching the Live tab

Browser permissions: When you click Start Session for the first time, the browser will display a microphone + camera permission prompt at the top of the window. You must click Allow for the session to record. If the prompt is dismissed, click the camera/lock icon in the address bar to re-grant access.

Two-speaker sessions: If you'll be in a back-and-forth conversation with a client, enable Two-speaker mode on the Live screen and tap the User 1 / User 2 buttons to mark whose turn it is. This prevents speech-to-text from blending both voices into one transcript.

PAD Scoring Rubric

Configure how Open Mic scores your speech. Changes apply to the next session.

Challenge Words & Feared Sounds

Words or phrases you have difficulty producing — feared words, trigger words, or known-challenge vocabulary. When these appear in a session, the scoring engine applies heightened sensitivity: any instability on a challenge word is weighted as a significant event, and stable production is flagged as a win.


Challenging Places of Articulation

Select the places of articulation where you commonly experience motor difficulty. Any word whose onset phoneme uses one of these places will be scored with elevated sensitivity. This data also forms the substrate for future EEG integration — pre-SMA signals can be correlated with predicted challenge phonemes.


Target Pace

Your desired speaking rate. PAD adjusts duration baselines to your target — speaking deliberately at 80 WPM won't be penalized for slowness if that's your goal. Set to Auto to let calibration determine your natural pace.

120 WPM

Scoring Mode

Controls how aggressively PAD flags disfluencies. Relaxed is forgiving — good for early sessions or high-anxiety contexts. Strict catches subtle instabilities — useful for targeted practice on known-safe material.

Block: 3× median gap · Prolongation: 2× median duration · Standard sensitivity

Calibration Length

How many words before PAD scoring activates. More words = more stable baseline but longer delay before scores appear.

20 words

Fear-Free Scoring

When enabled, the scoring engine applies NO amplification for challenge words or challenging places of articulation. The PAD score reflects only the acoustic and timing signal — the motor planning reality — independent of anticipatory context or rubric-declared fear. Useful for measuring improvement in the underlying production system, separate from context-driven anxiety.

Rhythm Pacer

Optional on-screen rhythm wave that pulses at a chosen cadence. Use it to pace your speech to a consistent tempo — a fluency shaping technique that can reduce block frequency by anchoring motor timing to an external beat. Appears on the Live screen when enabled.

60 BPM

Session Mode

Tells Open Mic how to evaluate this session. Reading mode uses the script below as a reference for Azure pronunciation assessment — unlocking Insertion, Omission, and Completeness signals. Conversational mode (Open Mic / free talk) evaluates spontaneous speech with no reference text.


Teleprompter Script

Pre-load text to be displayed on the Live screen during your session. Words advance automatically at your target WPM, synchronized with the rhythm pacer.

Default passage: the Grandfather Passage — a standard clinical speech-assessment text developed for motor speech disorders research (Darley, Aronson & Brown, 1975). It contains all English phonemes in natural prose, which is why SLPs and researchers use it as a reference reading. It gives you a consistent baseline you can re-read across sessions to track real progress on identical material. You can replace it with any text you want — your own script, a favorite passage, conversation prompts, a presentation draft.
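
The pacing arithmetic behind the word counter and the auto-advance is simple; this sketch shows only the arithmetic, and the function names are illustrative.

```typescript
// At W words per minute, each word is on screen for 60000 / W ms,
// and an N-word script runs about N * 60 / W seconds.
function msPerWord(wpm: number): number {
  return 60000 / wpm;
}

function scriptSeconds(wordCount: number, wpm: number): number {
  return Math.round((wordCount * 60) / wpm);
}
```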


Ambient Rhythm Glow

When enabled alongside the rhythm pacer, the full viewport edges pulse with the beat. Designed for peripheral perception — you can feel the rhythm without looking at the pacer, which keeps your eyes free for the teleprompter or another focus.

Live Screen Graphs

Show or hide individual graphs on the Live screen. Disable any visualization you don't want to see during recording.

Speaker Baseline

First N sessions are aggregated into your speaker baseline — the reference shape for all future sessions. Once established, every new session shows a Progress score (positive = more fluent than baseline, negative = less). Re-baseline whenever conditions change (after a therapy block, voice change, mic change).

PAD Scoring — How It Works

The PAD Framework

PAD (Predictive Adaptive Detection) models stuttering as predictable instability in pre-articulatory motor planning — the neural process that assembles a speech plan before articulation begins. The framework treats disfluency not as a binary event but as a continuous signal: every syllable carries a measurable planning-stability signature.

The core insight: stuttering reflects instability in the pre-SMA (supplementary motor area), not at articulation onset. PAD quantifies this instability as a computable, treatable signal.

Disfluency Feature Stream (DFS)

The DFS panel runs a frame-level acoustic analysis (~60 frames/second) directly in the browser. Every frame is classified as one of three states:

Silent — No voice energy detected. Normal pauses, breath.

Building — Energy rising but below voicing threshold. Articulatory preparation, airflow onset, or a blocked attempt to voice.

Voiced — Active vocalization. Clear speech signal.

From these frame classifications, DFS derives four features: RMS Peak (maximum signal intensity), Onsets (transitions into voicing — each onset is one articulatory launch), Voiced ms (total time in active speech), and Blocks (sustained building state >400ms without reaching voicing — an acoustic fingerprint of a motor block).

DFS calibrates to your microphone's noise floor in the first 1.5 seconds. Thresholds adapt to your environment automatically.

Dual-Source Scoring

PAD scores combine two independent data streams: Azure word-level timing (gaps, durations, confidence) and DFS acoustic features (frame-level voicing state). A block can be detected by Azure (long inter-word gap) or by DFS (sustained building state), or both. This dual-source approach catches events that either system alone would miss.

Stutter Detection

Not every disfluency is a stutter. A long pause, a single prolonged syllable, or a filler word can occur in fluent speech. Open Mic classifies a word as a stutter only when multiple signals converge — making the classification clinically meaningful rather than over-sensitive.

A word is flagged as a stutter when any of the following conditions are met:

• Final PAD score below 45 with at least one disfluency flag

• Two or more disfluency flags on the same word (e.g., block AND prolongation)

• Azure-detected block CONFIRMED by DFS acoustic block (both signals agree)

• Any disfluency flag on a challenge word or challenge place-of-articulation word (challenge-tagged words are scored with heightened sensitivity)

Detected stutters are highlighted with a red border, bold white text, a ⚠ badge, and an aggressive shake-and-flash animation when they first appear live. They are counted separately from disfluencies in the Analysis view.
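
The four convergence rules above can be read as a single predicate. A sketch under the assumption of this word-record shape — the field names are illustrative, not the engine's actual schema:

```typescript
interface WordSignals {
  score: number;        // final PAD score, 0–100
  flags: string[];      // e.g. ["block", "prolongation"]
  azureBlock: boolean;  // long inter-word gap per Azure timing
  dfsBlock: boolean;    // sustained building state per DFS
  isChallenge: boolean; // challenge word or challenge-POA onset
}

function isStutter(w: WordSignals): boolean {
  if (w.flags.length >= 1 && w.score < 45) return true; // severe + flagged
  if (w.flags.length >= 2) return true;                 // converging flags
  if (w.azureBlock && w.dfsBlock) return true;          // dual-source block
  if (w.isChallenge && w.flags.length >= 1) return true; // challenge word
  return false;
}
```

Note that a single flag on a high-scoring, non-challenge word does not classify as a stutter — that is the over-sensitivity guard described above.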

Challenge Words & Places of Articulation

The Rubric tab lets you configure challenge words (feared words, trigger vocabulary, known-difficulty phrases) and challenging places of articulation (bilabial, alveolar, velar, etc.). When a word in a session matches either, the scoring engine:

• Tightens the block threshold by 25% (easier to trigger)

• Tightens the prolongation threshold by 20% (easier to trigger)

• Amplifies all penalty values by 35%

• Any disfluency flag on a challenge word counts as a stutter

• A clean production (score ≥85, no flags) on a challenge word is marked as a challenge win with a ★ badge — measurable evidence of successful production on known-difficult material.
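
The threshold and penalty adjustments above reduce to simple arithmetic, including the Fear-Free toggle from the Rubric (which disables all amplification). The structure here is illustrative; only the percentages come from the text.

```typescript
interface Thresholds {
  blockGapMult: number;   // gap multiple that triggers a block
  prolongDurMult: number; // duration multiple that triggers a prolongation
  penaltyScale: number;   // multiplier on all penalty values
}

function applyChallenge(base: Thresholds, isChallenge: boolean, fearFree: boolean): Thresholds {
  if (!isChallenge || fearFree) return { ...base };
  return {
    blockGapMult: base.blockGapMult * 0.75,    // 25% tighter → easier to trigger
    prolongDurMult: base.prolongDurMult * 0.8, // 20% tighter
    penaltyScale: base.penaltyScale * 1.35,    // 35% heavier penalties
  };
}
```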

Place-of-articulation detection maps each word's onset phoneme to one of nine POA categories from its orthography. This is an English-orthography approximation, not IPA-derived — it's intended as a practical first pass that will be refined when Azure phoneme data is integrated more deeply.
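
A hypothetical first pass in the spirit of that orthographic approximation might look like this. The table is deliberately partial and the digraph handling crude; the app's own nine-way table is not shown here.

```typescript
// Hypothetical onset-letter → place-of-articulation map (partial).
const POA_BY_ONSET: Record<string, string> = {
  b: "bilabial", p: "bilabial", m: "bilabial",
  f: "labiodental", v: "labiodental",
  t: "alveolar", d: "alveolar", n: "alveolar", s: "alveolar", z: "alveolar", l: "alveolar",
  k: "velar", g: "velar",
  h: "glottal",
};

function onsetPoa(word: string): string | undefined {
  const w = word.toLowerCase();
  if (w.startsWith("ph")) return "labiodental"; // "phone" begins with /f/
  if (w.startsWith("th")) return "dental";      // "think"
  if (w.startsWith("kn")) return "alveolar";    // silent k: "knee" begins with /n/
  return POA_BY_ONSET[w[0]];
}
```

The digraph cases are exactly why this is an approximation rather than an IPA-derived mapping.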

This data structure is also the substrate for future EEG integration: pre-SMA activation signals can be time-locked to predicted challenge phonemes to study planning-instability patterns at the neural level.

Adaptive Baseline

Open Mic calibrates to YOUR speech during the first 20 words. All scoring is relative to your own patterns — not a fixed standard. A fast speaker and a slow speaker can both score 100 if their timing is internally consistent. The baseline updates on a rolling 30-word window, so it adapts as you warm up or change pace.
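
The rolling reference can be sketched as a windowed median. The 30-word window comes from the text, and the median matches the "median inter-word gap" used by the score breakdown; everything else here is illustrative.

```typescript
// Median of the last `window` values (e.g. inter-word gaps in ms).
// Median rather than mean keeps one long pause from dragging the baseline.
function rollingMedian(values: number[], window = 30): number {
  const recent = values.slice(-window).sort((a, b) => a - b);
  const n = recent.length;
  if (n === 0) return 0;
  return n % 2 ? recent[(n - 1) / 2] : (recent[n / 2 - 1] + recent[n / 2]) / 2;
}
```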

Score Breakdown (per word, 0–100)

Every word starts at 100. Penalties are deducted based on deviation from your baseline:

Block — Gap before a word exceeds 2.5× your median inter-word gap. Indicates a planning interruption. Chunk boundaries (natural sentence breaks) are excluded. Max penalty: 35.

Prolongation — Per-syllable duration exceeds 1.8× your median. Indicates articulatory hold. Max penalty: 30.

Repetition — Same word appears back-to-back rapidly (<300ms gap for 2×, any speed for 3×+). Flat penalty: 20.

Filler — Common hesitation markers (um, uh, er, ah, hmm). Low penalty: 8. Tracked separately from disfluencies.

Confidence — Azure recognition confidence below 1.0 adds a small penalty (up to 20), as low confidence often correlates with atypical production.
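
Put together, the per-word deduction might look like this sketch. The thresholds, caps, and flat values come from the breakdown above; the linear ramp used to approach each cap is an assumption, and the chunk-boundary exclusion for blocks is omitted for brevity.

```typescript
interface WordTiming {
  gapMs: number;         // silence before this word
  perSyllableMs: number; // word duration / syllable count
  isRepetition: boolean;
  isFiller: boolean;
  confidence: number;    // recognizer confidence, 0–1
}

// Start at 100, deduct capped penalties for deviation from the baseline.
function scoreWord(w: WordTiming, medianGapMs: number, medianSylMs: number): number {
  let score = 100;
  const gapRatio = w.gapMs / (medianGapMs * 2.5);         // block threshold
  if (gapRatio > 1) score -= Math.min(35, 35 * (gapRatio - 1));
  const durRatio = w.perSyllableMs / (medianSylMs * 1.8); // prolongation threshold
  if (durRatio > 1) score -= Math.min(30, 30 * (durRatio - 1));
  if (w.isRepetition) score -= 20;                        // flat penalty
  if (w.isFiller) score -= 8;                             // flat, tracked separately
  score -= Math.min(20, (1 - w.confidence) * 20);         // confidence penalty
  return Math.max(0, Math.round(score));
}
```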

Score Colors

80–100: Stable — smooth motor planning

60–79: Mid — minor timing deviation

40–59: Low — notable instability

0–39: Unstable — significant planning disruption

Session Metrics

Words/min — Speaking rate. Not scored, just context.

Session PAD — Headline session score. Computed as stability floor − event cost, both syllable-weighted. Rewards clean production across the session while pricing in the severity and duration of stutter events. Does not dilute with word count.

Stability Floor — Syllable-weighted mean of per-word PAD scores. Shows how stable the motor signal was overall, independent of discrete events.

Event Cost — Points deducted from the floor, scaled to the severity and syllable weight of each stutter-flagged word. Zero if no stutters fired.

Disfluencies — Count of blocks (B), prolongations (P), and repetitions (R).

Fillers — Count of filler words. Tracked separately because fillers indicate planning load but are not the same as motor disruptions.

Fluency Index — Stability floor weighted by disfluency rate. Legacy metric, preserved for continuity with older sessions.

Stability (σ) — Standard deviation of PAD scores. Low σ = consistent performance. High σ = variable — some words smooth, some disrupted.
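
The relationships among these metrics can be sketched as follows. The syllable-weighted floor, the σ definition, and the "Session PAD = stability floor − event cost" identity come from the definitions above; the severity scaling inside the event cost (each stutter-flagged word's shortfall below 100, syllable-weighted and averaged over the session) is an assumption.

```typescript
interface ScoredWord {
  score: number;     // per-word PAD, 0–100
  syllables: number; // syllable weight
  isStutter: boolean;
}

// Syllable-weighted mean of per-word PAD scores.
function stabilityFloor(words: ScoredWord[]): number {
  const wSum = words.reduce((s, w) => s + w.syllables, 0);
  return words.reduce((s, w) => s + w.score * w.syllables, 0) / wSum;
}

// Assumed scaling: shortfall below 100 on stutter-flagged words,
// syllable-weighted, normalized by total syllables. Zero if no stutters.
function eventCost(words: ScoredWord[]): number {
  const wSum = words.reduce((s, w) => s + w.syllables, 0);
  const cost = words
    .filter((w) => w.isStutter)
    .reduce((s, w) => s + (100 - w.score) * w.syllables, 0);
  return cost / wSum;
}

function sessionPad(words: ScoredWord[]): number {
  return Math.max(0, stabilityFloor(words) - eventCost(words));
}

// Stability (σ): population standard deviation of per-word scores.
function stddev(scores: number[]): number {
  const m = scores.reduce((s, x) => s + x, 0) / scores.length;
  return Math.sqrt(scores.reduce((s, x) => s + (x - m) ** 2, 0) / scores.length);
}
```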

How to Use This Data

Hover any word in the transcript to see its full PAD breakdown. Click any word in the playback transcript to jump to that moment in the recording. Use the PAD timeline to spot patterns — clusters of drops may indicate fatigue, topic difficulty, or emotional load.

What Others Hear (Phoneme View)

When you hover a word, the tooltip shows the individual phonemes (speech sounds) that Azure detected, displayed in IPA notation with accuracy scores. This reveals how your production was perceived at the acoustic level — which sounds came through clearly and which were ambiguous.

Phoneme colors follow the same scale: 80+ clear, 60–79 acceptable, 40–59 unclear, below 40 significantly distorted.

"Also heard as" shows Azure's alternative interpretations of what you said — if multiple alternatives appear, your production was acoustically ambiguous, which is useful signal regardless of whether you stuttered.