Speech analytics in contact centres
Manual QA reviews 1–3% of calls. Speech analytics reviews 100%. The technology converts unstructured audio into structured operational data: which phrases were used, how long holds lasted, whether a disclosure was made. Applied to the right problems, the ROI is strong. Applied to the emotion detection use cases that vendors promise, it is not.
Note on legal jurisdiction
This guide describes UK GDPR and data protection obligations as they apply to contact centres operating in Great Britain. Data protection law varies by jurisdiction. Always verify the requirements applicable to your operation with your Data Protection Officer or legal counsel before changing data handling practices. This guide is for operational context, not legal advice.
Phonetic search vs. transcription-based analysis
Phonetic search
How it works: Searches the audio recording directly by sound pattern. The system does not first convert speech to text — it looks for phoneme sequences that match the search term.
Accuracy: 85–92% for common English phrases in clear audio. Drops significantly with accents, background noise, or technical vocabulary.
Speed: Fast. Can search 1,000 hours of audio in minutes because there is no transcription step.
Best for: High-volume keyword and phrase search (specific word detection, prohibited phrase monitoring, competitor name mentions).
Limitation: Cannot handle context. 'Not happy' and 'very happy' both contain the phoneme sequence for 'happy', so a search for that word matches both. There is no sentence-level understanding.
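The context limitation can be illustrated with a toy sketch. It assumes the audio has already been decoded into a phoneme stream and uses a small hand-written ARPAbet-style lexicon; the names `LEXICON`, `to_phonemes`, and `find_phoneme_sequence` are hypothetical. Real phonetic engines match probabilistically against acoustic data rather than exact symbol lists, so treat this as an illustration of the idea only.

```python
# Hypothetical mini-lexicon (ARPAbet-style symbols), for illustration only.
LEXICON = {
    "i": ["AY"],
    "am": ["AE", "M"],
    "not": ["N", "AA", "T"],
    "very": ["V", "EH", "R", "IY"],
    "happy": ["HH", "AE", "P", "IY"],
}

def to_phonemes(words):
    """Flatten a list of words into one phoneme stream."""
    phonemes = []
    for word in words:
        phonemes.extend(LEXICON[word])
    return phonemes

def find_phoneme_sequence(stream, query):
    """Return every start index where query occurs contiguously in stream."""
    n = len(query)
    return [i for i in range(len(stream) - n + 1) if stream[i:i + n] == query]

query = to_phonemes(["happy"])
hits_negative = find_phoneme_sequence(to_phonemes(["i", "am", "not", "happy"]), query)
hits_positive = find_phoneme_sequence(to_phonemes(["i", "am", "very", "happy"]), query)

# Both calls produce exactly one hit: the search cannot tell them apart.
print(hits_negative, hits_positive)
```

Both utterances return one match for 'happy', which is why phonetic search suits keyword spotting but not sentiment.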
Transcription-based analysis
How it works: Converts speech to text first (using ASR — Automatic Speech Recognition), then analyses the text using NLP. Produces a full text transcript that can be searched, categorised, and fed to LLM-based models.
Accuracy: 92–98% for clear UK English audio on modern ASR models (Google, AWS Transcribe, Azure Cognitive Services). Lower with strong accents, cross-talk, or poor audio quality.
Speed: Slower than phonetic — transcription takes processing time. Batch or near-real-time rather than instant.
Best for: Auto-QA scoring, sentiment analysis, topic categorisation, emerging theme detection, integration with generative AI for call summarisation.
Limitation: Accuracy drops significantly with poor audio quality or strong regional accents. Transcription errors compound in downstream analysis.
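A rough way to see how transcription errors compound: if the quoted accuracy is read as per-word accuracy, and word errors are assumed to be independent (a simplification, since real ASR errors cluster), the chance that an n-word phrase is transcribed with every word correct is the per-word accuracy raised to the power n. The function name and the 8-word phrase length below are illustrative assumptions.

```python
def phrase_intact_probability(word_accuracy, phrase_length):
    """Probability that every word in a phrase is transcribed correctly,
    assuming independent per-word errors (a simplifying assumption)."""
    return word_accuracy ** phrase_length

# An 8-word disclosure phrase at the top and bottom of the quoted range.
best_case = phrase_intact_probability(0.98, 8)
worst_case = phrase_intact_probability(0.92, 8)

print(f"98% per word: {best_case:.1%} of 8-word phrases intact")
print(f"92% per word: {worst_case:.1%} of 8-word phrases intact")
```

Under these assumptions roughly 85% of 8-word phrases survive intact at 98% word accuracy, and only about half at 92%, which is why exact-phrase matching on transcripts usually needs some tolerance for near misses.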
Use case matrix: what speech analytics can and cannot do
| Use case | Technical approach | Realistic accuracy | Operational value |
|---|---|---|---|
| Mandatory disclosure detection (FCA, GDPR, TCF) | Phonetic or transcription keyword search | 90–96% on standard phrases | High — compliance monitoring at 100% coverage vs. 1-3% manual sample; regulatory evidence on demand |
| Prohibited phrase detection ('guaranteed', 'risk-free') | Phonetic search | 85–92% | High — risk phrase alerts for coaching and compliance before they become FCA findings |
| Silence and hold detection (AHT analysis) | Audio signal analysis (no transcription needed) | 95–99% | Very high — silence patterns reveal hold abuse, system navigation delays, knowledge gaps. Directly actionable for AHT reduction |
| Call categorisation by topic | Transcription + NLP topic modelling | 80–90% for top 10 topics | High — replaces manual wrap code entry; reduces ACW; more accurate categorisation of contact types |
| Auto-QA scoring (objective criteria) | Transcription + checklist matching | 85–95% | High — covers 100% of contacts for objective items (disclosure, script adherence, resolution code) |
| Emotion/sentiment detection (customer distress, frustration) | Transcription + acoustic analysis | 60–80% — tone and text combined | Medium — useful as a flag for supervisory review; not reliable enough for standalone performance assessment |
| Agent empathy and tone quality scoring | Transcription + LLM evaluation | 60–75% alignment with manual QA | Medium — directional signal only; manual QA still required for nuanced quality judgements |
| Competitor name and churn intent detection | Phonetic or transcription keyword search | 85–95% | Medium-high — feeds save team routing and competitive intelligence |
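
The coverage argument in the matrix can be made concrete with a small calculation. The sketch below treats the "realistic accuracy" figure as a detection rate, assumes a manual reviewer catches every issue in the calls they actually listen to, and uses a made-up volume of 500 non-compliant calls; all three are assumptions for illustration, not figures from this guide.

```python
def expected_detections(non_compliant_calls, coverage, detection_rate):
    """Expected number of non-compliant calls found, given the share of
    calls reviewed (coverage) and the share of issues caught (detection_rate)."""
    return non_compliant_calls * coverage * detection_rate

non_compliant = 500  # hypothetical monthly volume

# Manual QA: 3% sample, reviewer assumed to catch everything they hear.
manual = expected_detections(non_compliant, coverage=0.03, detection_rate=1.0)

# Speech analytics: 100% coverage at the low end of the 90-96% range.
automated = expected_detections(non_compliant, coverage=1.0, detection_rate=0.90)

print(f"manual: {manual:.0f} found, automated: {automated:.0f} found")
```

Even with an imperfect detector, full coverage finds far more issues than a perfect reviewer working from a small sample.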
Silence detection as an AHT diagnostic tool
Silence analysis breaks AHT into components, each of which points to a specific cause.
Hold silence
Agent placed the call on hold deliberately. If average hold duration >2 minutes in a specific contact type, investigate: is the knowledge base inadequate? Is an approval needed? Is the system slow?
Action: Process redesign, knowledge improvement, system performance review.
Dead air / mutual silence
Neither agent nor customer is speaking. Common while the agent navigates between systems, especially on legacy multi-application desktops. Silences longer than 30 seconds indicate system friction.
Action: Desktop simplification, application consolidation, faster navigation training.
Agent monologue (no customer response)
Agent speaking for >2 minutes without customer interruption. May indicate agent is reading from a script without checking comprehension, or customer is disengaged.
Action: Coaching — pacing, comprehension checks, dialogue structure.
Long ACW silence (post-call)
The recording has ended but no ACW code has been entered, so the agent is still working through post-call admin. A gap of more than 90 seconds after call end points to system or process friction.
Action: ACW process simplification; ACW system access review.
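The thresholds above can be applied mechanically once silence segments have been labelled. The following is a minimal sketch, assuming an upstream audio-analysis step already outputs segments tagged by type; the `Segment` class, its field names, and the type labels are hypothetical. It also applies the hold threshold per segment, whereas the text above applies it to the average hold duration for a contact type.

```python
from dataclasses import dataclass

# Thresholds in seconds, copied from the silence types described above.
THRESHOLD_S = {
    "hold": 120,
    "dead_air": 30,
    "agent_monologue": 120,
    "acw": 90,
}

# Suggested follow-up per silence type, also from the text above.
ACTION = {
    "hold": "Process redesign, knowledge improvement, system performance review",
    "dead_air": "Desktop simplification, application consolidation, navigation training",
    "agent_monologue": "Coaching: pacing, comprehension checks, dialogue structure",
    "acw": "ACW process simplification; ACW system access review",
}

@dataclass
class Segment:
    kind: str          # one of the THRESHOLD_S keys
    duration_s: float  # segment length in seconds

def flag_segments(segments):
    """Return (kind, duration, suggested action) for segments over threshold."""
    return [
        (seg.kind, seg.duration_s, ACTION[seg.kind])
        for seg in segments
        if seg.duration_s > THRESHOLD_S[seg.kind]
    ]

call = [Segment("hold", 150), Segment("dead_air", 12),
        Segment("dead_air", 45), Segment("acw", 60)]
flags = flag_segments(call)
for kind, duration, action in flags:
    print(f"{kind} ({duration:.0f}s): {action}")
```

In this example the 150-second hold and the 45-second dead air are flagged; the 12-second pause and the 60-second ACW gap fall under their thresholds.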
Realistic ROI: where speech analytics pays and where it does not
High-ROI applications
- ✓ Compliance monitoring at 100% coverage: avoids the regulatory fines and remediation costs that arise from issues manual sampling misses
- ✓ AHT reduction via silence analysis: identifying specific silence types and addressing root causes typically delivers 5–15% AHT reduction
- ✓ Auto-categorisation replacing wrap codes: reduces ACW and improves data quality for forecasting
- ✓ Churn intent detection feeding save team routing: incremental save revenue vs. cost of analytics licence
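To size the AHT item, the 5–15% range can be converted into handling hours. This is a back-of-envelope sketch: the call volume and baseline AHT below are hypothetical inputs, not figures from this guide, and the function name is illustrative.

```python
def annual_hours_saved(calls_per_year, baseline_aht_s, reduction):
    """Handling hours saved per year for a given fractional AHT reduction."""
    return calls_per_year * baseline_aht_s * reduction / 3600

calls_per_year = 1_000_000  # hypothetical volume
baseline_aht_s = 360        # hypothetical baseline AHT of 6 minutes

low = annual_hours_saved(calls_per_year, baseline_aht_s, 0.05)
high = annual_hours_saved(calls_per_year, baseline_aht_s, 0.15)

print(f"Hours saved per year: {low:,.0f} to {high:,.0f}")
```

With these inputs the range is 5,000 to 15,000 handling hours a year, which can then be compared against the licence cost and the effort needed to fix the root causes.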
Overstated in vendor pitches
- ✗ Emotion detection as a performance management tool: accuracy too low; legal challenge risk (Equality Act); GDPR special category data concerns
- ✗ Auto-QA replacing manual QA entirely: 60–75% alignment on subjective criteria means 25–40% of scores disagree with manual QA. Keep manual QA for nuanced quality dimensions
- ✗ 100% of contacts auto-scored for agent performance review: auto-QA at this scope requires significant calibration effort and creates industrial relations risk if not validated by human review
- ✗ Speech analytics as a cost-reduction tool in isolation: it finds problems; fixing them requires operational change effort that is separate from the analytics licence cost
Speech analytics questions
What is speech analytics in a contact centre?
Technology that analyses call recordings to extract structured data: words and phrases used, silence duration, regulatory disclosures made, tone and emotion signals, topic categorisation. There are two approaches: phonetic search (searches audio directly by sound pattern; fast but less accurate) and transcription-based analysis (converts speech to text first, then analyses it; more accurate but slower). Both enable analysis of 100% of calls vs. 1–3% with manual QA sampling.
What is auto-QA and how accurate is it?
Auto-QA uses speech analytics to score call recordings against a quality framework without human review. Accuracy for objective items (script adherence, disclosure compliance): 85–95%. Alignment with manual QA on subjective quality (empathy, tone, customer experience quality): 60–75%. Best practice: auto-QA on 100% of calls for objective criteria; manual QA on 8–10 calls per agent per month for subjective dimensions. Do not use auto-QA as a standalone performance assessment tool.
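For the objective criteria, checklist matching on a transcript can be as simple as pattern search. The sketch below is a minimal illustration using Python's standard `re` module; the checklist items, the regular expressions, and the function name `score_transcript` are hypothetical and would need to be replaced with an operation's own approved wording. Only the prohibited phrases 'guaranteed' and 'risk-free' come from this guide.

```python
import re

# Hypothetical required items: each maps to a pattern that must appear.
CHECKLIST = {
    "recording_disclosure": r"\bcalls? (?:may be|are|is) recorded\b",
    "identity_check": r"\bconfirm your (?:date of birth|postcode)\b",
}

# Prohibited phrases named in the use case matrix.
PROHIBITED = {
    "guaranteed": r"\bguaranteed\b",
    "risk-free": r"\brisk[- ]free\b",
}

def score_transcript(transcript):
    """Return (checklist results, prohibited phrases found) for one call."""
    text = transcript.lower()
    passed = {item: bool(re.search(pattern, text))
              for item, pattern in CHECKLIST.items()}
    prohibited_hits = [name for name, pattern in PROHIBITED.items()
                       if re.search(pattern, text)]
    return passed, prohibited_hits

passed, prohibited_hits = score_transcript(
    "This call is recorded for training. Returns are guaranteed."
)
print(passed)
print(prohibited_hits)
```

Here the disclosure item passes, the identity check fails, and 'guaranteed' is flagged. Items like empathy or tone have no comparable pattern to match, which is why they stay with manual QA.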