Open Mic FluentPlay · PAD v2


Open Mic

Real-time speech analysis built on the PAD framework

What this is

Open Mic captures your speech, analyzes it at the frame and word level, and generates a per-word stability score. It was designed for people who stutter — but the scoring engine operates on any speaker. Every session produces a structured record you can review, filter, and compare over time.

🎯 Calibration & Baseline — start here

Open Mic learns how you typically speak before it starts grading you. The first time you use the app, head to the Baseline tab and complete the calibration sessions there. Once your baseline is established, every future session is scored against your own typical production — not against a generic "average speaker" that may not represent you.

Three phases:

  1. Calibration. Record N sessions reading the same short passage (default 5; configurable 2–10). Speak naturally. The passage is locked across all calibration sessions for consistency.
  2. Baseline locks. Open Mic aggregates your calibration sessions into a 6-axis baseline shape — your typical Mean PAD, Fluency, WPM, Voiced ratio, Stability, and Block-free rate. This becomes your personal reference.
  3. Free use. Use the full suite freely. Every session shows a Progress score relative to baseline (positive = more fluent than typical, negative = less). The 3D PAD profile overlays your current session against the baseline shape so you can see exactly which dimensions improved or regressed.

Re-baseline anytime conditions change — after a therapy block, voice changes, mic changes. Old sessions stay in your history; only the reference shape resets.
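
As a minimal sketch, the aggregation and Progress computation might look like the following. The six axis names come from the list above; the per-axis mean, the unweighted signed-deviation Progress formula, and the higher-is-better treatment of every axis are illustrative assumptions, not the engine's actual math.

```typescript
type Axes = {
  meanPad: number;       // 0–100
  fluency: number;       // 0–100
  wpm: number;
  voicedRatio: number;   // 0–1
  stability: number;     // 0–100, higher = more consistent
  blockFreeRate: number; // 0–1
};

const AXIS_KEYS: (keyof Axes)[] = [
  "meanPad", "fluency", "wpm", "voicedRatio", "stability", "blockFreeRate",
];

// Aggregate calibration sessions into the baseline shape (per-axis mean).
function buildBaseline(sessions: Axes[]): Axes {
  const out = {} as Axes;
  for (const k of AXIS_KEYS) {
    out[k] = sessions.reduce((sum, s) => sum + s[k], 0) / sessions.length;
  }
  return out;
}

// Progress: mean signed deviation from baseline, in percent.
// Positive = more fluent than typical, negative = less.
// Assumes nonzero baseline values on every axis.
function progress(session: Axes, baseline: Axes): number {
  let sum = 0;
  for (const k of AXIS_KEYS) {
    sum += (session[k] - baseline[k]) / baseline[k];
  }
  return (sum / AXIS_KEYS.length) * 100;
}
```

A session identical to the baseline scores 0; improvement on any axis pushes Progress positive.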

⚠ Before you start: browser permissions

The first time you click Start Session on the Live tab, your browser will display a permission prompt asking to access your microphone and camera. The prompt usually appears at the top-left of the browser window, near the address bar.

You must click "Allow" for both microphone and camera, or the session will fail to start. If you miss the prompt or click it away, click the camera/lock icon in the address bar to re-grant access, then click Start Session again.

If you're in a session with a client and don't see audio levels moving on the waveform, the most common cause is that the permission prompt was dismissed without granting access. Reload the page and watch for the prompt at the top of the browser window when you click Start Session.

How to use it

1. Configure your Rubric

Open the Rubric tab. Enter any challenge words you want tracked (feared words, trigger vocabulary), and select any places of articulation where you experience motor difficulty. Pick a scoring mode (Relaxed, Standard, Strict). Save.

This step is optional but significantly improves scoring fidelity. Skip it to use defaults.

2. Record a session

Open the Live tab. Click Start Session. Watch the top of your browser window for the microphone + camera permission prompt and click Allow. Speak freely — a conversation, reading aloud, a presentation rehearsal, anything. The first ~20 words calibrate your baseline (the status shows "calibrating"). After that, scoring activates.

You'll see the waveform, spectrogram, DFS state bar, and PAD timeline all updating live. Hover over the ? button on any graph to see what it means. If the waveform stays flat after Start Session, the permission prompt was likely missed — reload and try again.

3. Stop when done

Click Stop Session. You'll be auto-switched to the Analysis tab with your completed session loaded.

4. Review your data

In Analysis, play back the audio with synced transcript. Click any word for a full PAD breakdown. Click any stat on the right (Stutters, Disfluencies, Blocks, etc.) to filter the transcript and highlight matching words.

5. Revisit earlier sessions

The Sessions tab holds every session from this browser session (in-memory; refresh clears them). Click any card to reload it in Analysis.

What the scoring means

Every word you speak gets a PAD score from 0 to 100. The score starts at 100 and loses points for deviation from your rolling baseline — longer gaps, held syllables, repeated words, low recognition confidence, or flags on challenge words. The color tells you the category:

80–100: Stable — smooth motor planning
60–79: Mid — minor deviation
40–59: Low — notable instability
0–39: Unstable — significant disruption

A word is classified as a stutter when multiple signals converge: severe single events, multiple flags on one word, block confirmed by acoustic analysis, or any flag on a challenge-tagged word.

Tap the ? icon in the top-right corner of any screen for a deep reference on the PAD framework, DFS (Disfluency Feature Stream), stutter detection rules, and phoneme-level analysis.

Rhythm Pacer · 60 BPM · smooth wave
Waveform · amplitude envelope (10s rolling window)

Waveform — Amplitude Envelope

Shows the loudness of your voice over the last 10 seconds, scrolling right to left.

The filled cyan area shows peak amplitude (loudest point in each frame). The solid line shows RMS amplitude (average energy).

What to look for: Steady, rhythmic peaks indicate fluent speech flow. Flat stretches during attempted speech suggest a block. Sharp transients without pattern suggest inconsistent voicing. The gap between peak and RMS lines indicates vocal dynamics — narrow gap = monotone, wide gap = expressive.
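
The two quantities the panel draws can be computed per frame as in this minimal sketch; the app's actual frame size and any smoothing are not shown here.

```typescript
// Peak vs RMS amplitude for one analysis frame, as plotted on the waveform.
function frameLevels(samples: Float32Array): { peak: number; rms: number } {
  let peak = 0;
  let sumSq = 0;
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i];
    const a = Math.abs(s);
    if (a > peak) peak = a; // loudest sample in the frame
    sumSq += s * s;         // accumulate energy for RMS
  }
  return { peak, rms: Math.sqrt(sumSq / samples.length) };
}
```

For a steady signal the two track closely (narrow gap, monotone); sharp transients push the peak well above the RMS (wide gap, expressive).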

Spectrogram · frequency × time (0–4 kHz, hot = dB high)

Spectrogram — Frequency × Time

Shows which frequencies are present in your voice at each moment. The X axis is time (scrolling right to left), Y axis is frequency (0 Hz bottom, 4 kHz top), color is intensity.

Color scale: black = silent, purple/red = moderate energy, yellow/white = high energy.

What to look for: Horizontal bands (formants) indicate vowel production — F1/F2 formant positions distinguish vowels. Vertical streaks indicate consonants. Smooth, continuous formant lines indicate stable articulation; broken or shifting formants often correlate with struggle or block behavior.

Disfluency Feature Stream (DFS) · frame-level acoustic analysis

DFS — Disfluency Feature Stream

Frame-level acoustic classification running ~60× per second directly in your browser. Each vertical bar is one audio frame, classified into one of three states:

Silent — No voice energy. Normal pauses, breath.

Building — Energy rising but below voicing threshold. Articulatory preparation, airflow onset, or a blocked attempt to voice.

Voiced — Active vocalization. Clear speech signal.

Counters: RMS Peak = loudest moment so far. Onsets = transitions into voicing (articulatory launches). Voiced = total ms of active speech. Blocks = sustained building state over 400ms without reaching voice — an acoustic fingerprint of a motor block.

DFS auto-calibrates to your mic's noise floor in the first 1.5 seconds.
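
In the spirit of DFS, the three-state classifier and the block counter can be sketched as follows. The two RMS thresholds passed in are illustrative, since the real panel calibrates them from your mic; at ~60 frames/sec, the 400 ms block threshold is roughly 24 consecutive frames.

```typescript
type DfsState = "silent" | "building" | "voiced";

// Classify one frame against a noise floor and a voicing threshold.
function classifyFrame(rms: number, noiseFloor: number, voiceThresh: number): DfsState {
  if (rms <= noiseFloor) return "silent";
  return rms < voiceThresh ? "building" : "voiced";
}

// A block = a "building" run longer than 400 ms that ends without voicing.
// Whether a run that finally resolves into voicing also counts is an
// assumption here (it does not); the real detector may differ.
function countBlocks(states: DfsState[], frameMs = 1000 / 60): number {
  let blocks = 0;
  let buildMs = 0;
  for (let i = 0; i <= states.length; i++) {
    const st: DfsState = i < states.length ? states[i] : "silent"; // flush final run
    if (st === "building") {
      buildMs += frameMs;
      continue;
    }
    if (buildMs > 400 && st !== "voiced") blocks++; // run ended unvoiced
    buildMs = 0;
  }
  return blocks;
}
```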

PAD timeline (per word · adaptive baseline)

PAD Timeline — Per-Word Stability Score

Each dot is one word. Y axis is the PAD score for that word, 0 (bottom, unstable) to 100 (top, stable). X axis is time (last 30 seconds).

Score color reflects stability: green is stable, red is unstable.

Flag dots above the main line (at the top of the graph) indicate event types: red = block, orange = prolongation, yellow = repetition, gray = filler.

Each word starts at 100 and loses points for deviation from your rolling baseline — longer gaps, held syllables, repeated words, or low recognition confidence. The line shows your motor planning stability evolving over time. Clusters of drops often indicate fatigue, topic difficulty, or contextual stress.

3D PAD profile · live

Establish Your Speaker Baseline

Open Mic measures how you typically speak across a graded difficulty ladder. Each calibration session corresponds to one level — Level 1 is easy, the highest level is hard. Your baseline is built from all the levels combined, capturing your full operating envelope, not just relaxed-state production.

Calibration levels — 5 by default · 2 minimum · 10 max · each session = one difficulty level

Difficulty Ladder

Each level escalates two dimensions: linguistic complexity and time pressure. The ladder below is what you'll record. Levels with a ✓ are complete; the next session will run the level marked NEXT.

Click to start your next calibration session. The level's passage and target WPM load into Live automatically. Read at the target pace.

How it works

  1. Record N sessions at escalating difficulty. Each one is a single ladder rung — Level 1 is conversational easy reading, Level N is at the limit of what most speakers can produce cleanly.
  2. Baseline becomes a vector of metrics — one point per level. Your typical Mean PAD at Level 1 vs Level 5 is captured separately, so progress can be measured at the difficulty you're working at.
  3. Use the suite freely after baseline locks. Every future session shows a Progress score relative to baseline. Positive = more fluent than typical, negative = less.
  4. Re-baseline anytime if conditions change (after a therapy block, voice changes, mic changes). Old sessions stay in history; only the reference resets.
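
The time-pressure dimension of the ladder could be generated like this hypothetical sketch — the start/end paces and the linear ramp are invented for illustration, and linguistic complexity would come from the passage bound to each level, not from this code.

```typescript
// Hypothetical difficulty-ladder generator (time pressure only).
interface LadderLevel {
  level: number;
  targetWpm: number;
}

function buildLadder(levels: number, startWpm = 100, endWpm = 160): LadderLevel[] {
  const out: LadderLevel[] = [];
  for (let i = 0; i < levels; i++) {
    const t = levels === 1 ? 0 : i / (levels - 1); // 0 → 1 across the ladder
    out.push({ level: i + 1, targetWpm: Math.round(startWpm + t * (endWpm - startWpm)) });
  }
  return out;
}
```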

Session Stats

Select a session in the Review tab or Sessions list to see articulation detail.

Before launching the Live tab

Browser permissions: When you click Start Session for the first time, the browser will display a microphone + camera permission prompt at the top of the window. You must click Allow for the session to record. If the prompt is dismissed, click the camera/lock icon in the address bar to re-grant access.

Two-speaker sessions: If you'll be in a back-and-forth conversation with a client, enable Two-speaker mode on the Live screen and tap the User 1 / User 2 buttons to mark whose turn it is. This prevents speech-to-text from blending both voices into one transcript.

PAD Scoring Rubric

Configure how Open Mic scores your speech. Changes apply to the next session.

Challenge Words & Feared Sounds

Words or phrases you have difficulty producing — feared words, trigger words, or known-challenge vocabulary. When these appear in a session, the scoring engine applies heightened sensitivity: any instability on a challenge word is weighted as a significant event, and stable production is flagged as a win.


Challenging Places of Articulation

Select the places of articulation where you commonly experience motor difficulty. Any word whose onset phoneme uses one of these places will be scored with elevated sensitivity. This data also forms the substrate for future EEG integration — pre-SMA signals can be correlated with predicted challenge phonemes.


Target Pace

Your desired speaking rate. PAD adjusts duration baselines to your target — speaking deliberately at 80 WPM won't be penalized for slowness if that's your goal. Set to Auto to let calibration determine your natural pace.

120 WPM

Scoring Mode

Controls how aggressively PAD flags disfluencies. Relaxed is forgiving — good for early sessions or high-anxiety contexts. Strict catches subtle instabilities — useful for targeted practice on known-safe material.

Block: 3× median gap · Prolongation: 2× median duration · Standard sensitivity

Calibration Length

How many words before PAD scoring activates. More words = more stable baseline but longer delay before scores appear.

20 words

Fear-Free Scoring

When enabled, the scoring engine applies NO amplification for challenge words or challenging places of articulation. The PAD score reflects only the acoustic and timing signal — the motor planning reality — independent of anticipatory context or rubric-declared fear. Useful for measuring improvement in the underlying production system, separate from context-driven anxiety.

Rhythm Pacer

Optional on-screen rhythm wave that pulses at a chosen cadence. Use it to pace your speech to a consistent tempo — a fluency shaping technique that can reduce block frequency by anchoring motor timing to an external beat. Appears on the Live screen when enabled.

60 BPM

Session Mode

Tells Open Mic how to evaluate this session. Reading mode uses the script below as a reference for Azure pronunciation assessment — unlocking Insertion, Omission, and Completeness signals. Conversational mode (Open Mic / free talk) evaluates spontaneous speech with no reference text.


Teleprompter Script

Pre-load text to be displayed on the Live screen during your session. Words advance automatically at your target WPM, synchronized with the rhythm pacer.

Default passage: the Grandfather Passage — a standard clinical speech-assessment text developed for motor speech disorders research (Darley, Aronson & Brown, 1975). It contains all English phonemes in natural prose, which is why SLPs and researchers use it as a reference reading. It gives you a consistent baseline you can re-read across sessions to track real progress on identical material. You can replace it with any text you want — your own script, a favorite passage, conversation prompts, a presentation draft.
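
The pacing arithmetic behind the word counter and the auto-advance is simple; this sketch shows only the arithmetic, and the function names are illustrative.

```typescript
// At W words per minute, each word is on screen for 60000 / W ms,
// and an N-word script runs about N * 60 / W seconds.
function msPerWord(wpm: number): number {
  return 60000 / wpm;
}

function scriptSeconds(wordCount: number, wpm: number): number {
  return Math.round((wordCount * 60) / wpm);
}
```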


Ambient Rhythm Glow

When enabled alongside the rhythm pacer, the full viewport edges pulse with the beat. Designed for peripheral perception — you can feel the rhythm without looking at the pacer, which keeps your eyes free for the teleprompter or another focus.

Live Screen Graphs

Show or hide individual graphs on the Live screen. Disable any visualization you don't want to see during recording.

Speaker Baseline

First N sessions are aggregated into your speaker baseline — the reference shape for all future sessions. Once established, every new session shows a Progress score (positive = more fluent than baseline, negative = less). Re-baseline whenever conditions change (after a therapy block, voice change, mic change).

PAD Scoring — How It Works

The PAD Framework

PAD (Predictive Adaptive Detection) models stuttering as predictable instability in pre-articulatory motor planning — the neural process that assembles a speech plan before articulation begins. The framework treats disfluency not as a binary event but as a continuous signal: every syllable carries a measurable planning-stability signature.

The core insight: stuttering reflects instability in the pre-SMA (supplementary motor area), not at articulation onset. PAD quantifies this instability as a computable, treatable signal.

Disfluency Feature Stream (DFS)

The DFS panel runs a frame-level acoustic analysis (~60 frames/second) directly in the browser. Every frame is classified as one of three states:

Silent — No voice energy detected. Normal pauses, breath.

Building — Energy rising but below voicing threshold. Articulatory preparation, airflow onset, or a blocked attempt to voice.

Voiced — Active vocalization. Clear speech signal.

From these frame classifications, DFS derives four features: RMS Peak (maximum signal intensity), Onsets (transitions into voicing — each onset is one articulatory launch), Voiced ms (total time in active speech), and Blocks (sustained building state >400ms without reaching voicing — an acoustic fingerprint of a motor block).

DFS calibrates to your microphone's noise floor in the first 1.5 seconds. Thresholds adapt to your environment automatically.

Dual-Source Scoring

PAD scores combine two independent data streams: Azure word-level timing (gaps, durations, confidence) and DFS acoustic features (frame-level voicing state). A block can be detected by Azure (long inter-word gap) or by DFS (sustained building state), or both. This dual-source approach catches events that either system alone would miss.

Stutter Detection

Not every disfluency is a stutter. A long pause, a single prolonged syllable, or a filler word can occur in fluent speech. Open Mic classifies a word as a stutter only when multiple signals converge — making the classification clinically meaningful rather than over-sensitive.

A word is flagged as a stutter when any of the following conditions are met:

• Final PAD score below 45 with at least one disfluency flag

• Two or more disfluency flags on the same word (e.g., block AND prolongation)

• Azure-detected block CONFIRMED by DFS acoustic block (both signals agree)

• Any disfluency flag on a challenge word or challenge place-of-articulation word (challenge-tagged words are scored with heightened sensitivity)

Detected stutters are highlighted with a red border, bold white text, a ⚠ badge, and an aggressive shake-and-flash animation when they first appear live. They are counted separately from disfluencies in the Analysis view.
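
The four convergence rules above can be read as a single predicate. A sketch under the assumption of this word-record shape — the field names are illustrative, not the engine's actual schema:

```typescript
interface WordSignals {
  score: number;        // final PAD score, 0–100
  flags: string[];      // e.g. ["block", "prolongation"]
  azureBlock: boolean;  // long inter-word gap per Azure timing
  dfsBlock: boolean;    // sustained building state per DFS
  isChallenge: boolean; // challenge word or challenge-POA onset
}

function isStutter(w: WordSignals): boolean {
  if (w.flags.length >= 1 && w.score < 45) return true; // severe + flagged
  if (w.flags.length >= 2) return true;                 // converging flags
  if (w.azureBlock && w.dfsBlock) return true;          // dual-source block
  if (w.isChallenge && w.flags.length >= 1) return true; // challenge word
  return false;
}
```

Note that a single flag on a high-scoring, non-challenge word does not classify as a stutter — that is the over-sensitivity guard described above.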

Challenge Words & Places of Articulation

The Rubric tab lets you configure challenge words (feared words, trigger vocabulary, known-difficulty phrases) and challenging places of articulation (bilabial, alveolar, velar, etc.). When a word in a session matches either, the scoring engine:

• Tightens the block threshold by 25% (easier to trigger)

• Tightens the prolongation threshold by 20% (easier to trigger)

• Amplifies all penalty values by 35%

• Any disfluency flag on a challenge word counts as a stutter

• A clean production (score ≥85, no flags) on a challenge word is marked as a challenge win with a ★ badge — measurable evidence of successful production on known-difficult material.
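
The threshold and penalty adjustments above reduce to simple arithmetic, including the Fear-Free toggle from the Rubric (which disables all amplification). The structure here is illustrative; only the percentages come from the text.

```typescript
interface Thresholds {
  blockGapMult: number;   // gap multiple that triggers a block
  prolongDurMult: number; // duration multiple that triggers a prolongation
  penaltyScale: number;   // multiplier on all penalty values
}

function applyChallenge(base: Thresholds, isChallenge: boolean, fearFree: boolean): Thresholds {
  if (!isChallenge || fearFree) return { ...base };
  return {
    blockGapMult: base.blockGapMult * 0.75,    // 25% tighter → easier to trigger
    prolongDurMult: base.prolongDurMult * 0.8, // 20% tighter
    penaltyScale: base.penaltyScale * 1.35,    // 35% heavier penalties
  };
}
```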

Place-of-articulation detection maps each word's onset phoneme to one of nine POA categories from its orthography. This is an English-orthography approximation, not IPA-derived — it's intended as a practical first pass that will be refined when Azure phoneme data is integrated more deeply.
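
A hypothetical first pass in the spirit of that orthographic approximation might look like this. The table is deliberately partial and the digraph handling crude; the app's own nine-way table is not shown here.

```typescript
// Hypothetical onset-letter → place-of-articulation map (partial).
const POA_BY_ONSET: Record<string, string> = {
  b: "bilabial", p: "bilabial", m: "bilabial",
  f: "labiodental", v: "labiodental",
  t: "alveolar", d: "alveolar", n: "alveolar", s: "alveolar", z: "alveolar", l: "alveolar",
  k: "velar", g: "velar",
  h: "glottal",
};

function onsetPoa(word: string): string | undefined {
  const w = word.toLowerCase();
  if (w.startsWith("ph")) return "labiodental"; // "phone" begins with /f/
  if (w.startsWith("th")) return "dental";      // "think"
  if (w.startsWith("kn")) return "alveolar";    // silent k: "knee" begins with /n/
  return POA_BY_ONSET[w[0]];
}
```

The digraph cases are exactly why this is an approximation rather than an IPA-derived mapping.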

This data structure is also the substrate for future EEG integration: pre-SMA activation signals can be time-locked to predicted challenge phonemes to study planning-instability patterns at the neural level.

Adaptive Baseline

Open Mic calibrates to YOUR speech during the first 20 words. All scoring is relative to your own patterns — not a fixed standard. A fast speaker and a slow speaker can both score 100 if their timing is internally consistent. The baseline updates on a rolling 30-word window, so it adapts as you warm up or change pace.
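
The rolling reference can be sketched as a windowed median. The 30-word window comes from the text, and the median matches the "median inter-word gap" used by the score breakdown; everything else here is illustrative.

```typescript
// Median of the last `window` values (e.g. inter-word gaps in ms).
// Median rather than mean keeps one long pause from dragging the baseline.
function rollingMedian(values: number[], window = 30): number {
  const recent = values.slice(-window).sort((a, b) => a - b);
  const n = recent.length;
  if (n === 0) return 0;
  return n % 2 ? recent[(n - 1) / 2] : (recent[n / 2 - 1] + recent[n / 2]) / 2;
}
```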

Score Breakdown (per word, 0–100)

Every word starts at 100. Penalties are deducted based on deviation from your baseline:

Block — Gap before a word exceeds 2.5× your median inter-word gap. Indicates a planning interruption. Chunk boundaries (natural sentence breaks) are excluded. Max penalty: 35.

Prolongation — Per-syllable duration exceeds 1.8× your median. Indicates articulatory hold. Max penalty: 30.

Repetition — Same word appears back-to-back rapidly (<300ms gap for 2×, any speed for 3×+). Flat penalty: 20.

Filler — Common hesitation markers (um, uh, er, ah, hmm). Low penalty: 8. Tracked separately from disfluencies.

Confidence — Azure recognition confidence below 1.0 adds a small penalty (up to 20), as low confidence often correlates with atypical production.
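
Put together, the per-word deduction might look like this sketch. The thresholds, caps, and flat values come from the breakdown above; the linear ramp used to approach each cap is an assumption, and the chunk-boundary exclusion for blocks is omitted for brevity.

```typescript
interface WordTiming {
  gapMs: number;         // silence before this word
  perSyllableMs: number; // word duration / syllable count
  isRepetition: boolean;
  isFiller: boolean;
  confidence: number;    // recognizer confidence, 0–1
}

// Start at 100, deduct capped penalties for deviation from the baseline.
function scoreWord(w: WordTiming, medianGapMs: number, medianSylMs: number): number {
  let score = 100;
  const gapRatio = w.gapMs / (medianGapMs * 2.5);         // block threshold
  if (gapRatio > 1) score -= Math.min(35, 35 * (gapRatio - 1));
  const durRatio = w.perSyllableMs / (medianSylMs * 1.8); // prolongation threshold
  if (durRatio > 1) score -= Math.min(30, 30 * (durRatio - 1));
  if (w.isRepetition) score -= 20;                        // flat penalty
  if (w.isFiller) score -= 8;                             // flat, tracked separately
  score -= Math.min(20, (1 - w.confidence) * 20);         // confidence penalty
  return Math.max(0, Math.round(score));
}
```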

Score Colors

80–100: Stable — smooth motor planning

60–79: Mid — minor timing deviation

40–59: Low — notable instability

0–39: Unstable — significant planning disruption

Session Metrics

Words/min — Speaking rate. Not scored, just context.

Session PAD — Headline session score. Computed as stability floor − event cost, both syllable-weighted. Rewards clean production across the session while pricing in the severity and duration of stutter events. Does not dilute with word count.

Stability Floor — Syllable-weighted mean of per-word PAD scores. Shows how stable the motor signal was overall, independent of discrete events.

Event Cost — Points deducted from the floor, scaled to the severity and syllable weight of each stutter-flagged word. Zero if no stutters fired.

Disfluencies — Count of blocks (B), prolongations (P), and repetitions (R).

Fillers — Count of filler words. Tracked separately because fillers indicate planning load but are not the same as motor disruptions.

Fluency Index — Stability floor weighted by disfluency rate. Legacy metric, preserved for continuity with older sessions.

Stability (σ) — Standard deviation of PAD scores. Low σ = consistent performance. High σ = variable — some words smooth, some disrupted.
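
The relationships among these metrics can be sketched as follows. The syllable-weighted floor, the σ definition, and the "Session PAD = stability floor − event cost" identity come from the definitions above; the severity scaling inside the event cost (each stutter-flagged word's shortfall below 100, syllable-weighted and averaged over the session) is an assumption.

```typescript
interface ScoredWord {
  score: number;     // per-word PAD, 0–100
  syllables: number; // syllable weight
  isStutter: boolean;
}

// Syllable-weighted mean of per-word PAD scores.
function stabilityFloor(words: ScoredWord[]): number {
  const wSum = words.reduce((s, w) => s + w.syllables, 0);
  return words.reduce((s, w) => s + w.score * w.syllables, 0) / wSum;
}

// Assumed scaling: shortfall below 100 on stutter-flagged words,
// syllable-weighted, normalized by total syllables. Zero if no stutters.
function eventCost(words: ScoredWord[]): number {
  const wSum = words.reduce((s, w) => s + w.syllables, 0);
  const cost = words
    .filter((w) => w.isStutter)
    .reduce((s, w) => s + (100 - w.score) * w.syllables, 0);
  return cost / wSum;
}

function sessionPad(words: ScoredWord[]): number {
  return Math.max(0, stabilityFloor(words) - eventCost(words));
}

// Stability (σ): population standard deviation of per-word scores.
function stddev(scores: number[]): number {
  const m = scores.reduce((s, x) => s + x, 0) / scores.length;
  return Math.sqrt(scores.reduce((s, x) => s + (x - m) ** 2, 0) / scores.length);
}
```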

How to Use This Data

Hover any word in the transcript to see its full PAD breakdown. Click any word in the playback transcript to jump to that moment in the recording. Use the PAD timeline to spot patterns — clusters of drops may indicate fatigue, topic difficulty, or emotional load.

What Others Hear (Phoneme View)

When you hover a word, the tooltip shows the individual phonemes (speech sounds) that Azure detected, displayed in IPA notation with accuracy scores. This reveals how your production was perceived at the acoustic level — which sounds came through clearly and which were ambiguous.

Phoneme colors follow the same scale: 80+ clear, 60–79 acceptable, 40–59 unclear, below 40 significantly distorted.

"Also heard as" shows Azure's alternative interpretations of what you said — if multiple alternatives appear, your production was acoustically ambiguous, which is useful signal regardless of whether you stuttered.