Contact centre quality management
Quality management is the process of measuring whether customer interactions meet your standard, and systematically improving them when they don't. It sits at the intersection of operations, compliance, and workforce management (WFM): a quality problem is also a volume problem (low first-contact resolution, or FCR, means more repeat calls) and a capacity problem (agents rushing to hit average handle time, or AHT, targets at the expense of resolution quality).
QA scorecard framework
A QA scorecard typically covers 4–6 dimensions, each weighted to reflect its importance to your operation. Compliance items are usually auto-fail: any breach sets the overall evaluation score to zero, regardless of performance on the other dimensions.
Opening and identification
Agent introduces themselves correctly, verifies customer identity per policy, and sets the appropriate tone.
Common failure mode: Missing verification, incorrect greeting, no name given
Understanding and empathy
Agent acknowledges the customer's situation, avoids scripted phrases that feel hollow, and demonstrates active listening.
Common failure mode: Interrupting the customer, scripted empathy ('I understand how you feel'), ignoring emotional cues
Resolution accuracy
The information given was correct, the action taken was appropriate, and the customer's primary need was addressed.
Common failure mode: Wrong information given, incorrect account action, promise not logged in CRM
Compliance and disclosure
Mandatory disclosures were made (FCA, GDPR, recording notice), prohibited phrases avoided, script adherence met for regulated topics.
Common failure mode: Missing FCA disclosure, prohibited promise, mis-statement of terms — auto-fail on any breach
Closing and FCR
Call was closed with confirmation of resolution, next steps communicated, and customer not likely to call back for the same reason.
Common failure mode: Abrupt close, unresolved query without explanation, follow-up not booked when required
Tone and professionalism
Agent maintained appropriate professional tone throughout, avoided jargon or condescension, and did not display exasperation.
Common failure mode: Sighing audibly, talking over the customer, inappropriate informality
Weights should reflect your operation's priorities. FCA-regulated operations typically weight compliance at 25–30%. Pure-CX operations with lower regulatory burden may weight resolution accuracy at 35–40%.
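The weighting and auto-fail rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the dimension keys, the specific weights (chosen to sum to 100%), and the 0–100 scale per dimension are all assumptions made for the example.

```python
# Illustrative weights only (assumed for this sketch); they must sum to 1.0.
WEIGHTS = {
    "opening": 0.10,
    "empathy": 0.15,
    "resolution": 0.30,
    "compliance": 0.25,
    "closing": 0.10,
    "tone": 0.10,
}


def score_evaluation(dimension_scores, compliance_breach=False):
    """Return an overall 0-100 score for one evaluated call.

    dimension_scores maps each dimension name to a 0-100 score.
    Any compliance breach is an auto-fail: the overall score is zero.
    """
    if compliance_breach:
        return 0.0
    missing = set(WEIGHTS) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= dimension_scores[name] <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    total = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return round(total, 1)


example = {
    "opening": 100, "empathy": 80, "resolution": 90,
    "compliance": 100, "closing": 70, "tone": 90,
}
print(score_evaluation(example))
print(score_evaluation(example, compliance_breach=True))
```

Keeping the breach flag separate from the compliance dimension score makes the auto-fail explicit: a call can score well on every weighted dimension and still be recorded as zero.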
Calibration: making QA scores mean something
Without calibration, QA scores measure the analyst's interpretation of the scorecard as much as the agent's actual performance. Agents subjected to two analysts applying the same framework differently experience the QA process as arbitrary — which harms engagement and makes quality feedback harder to act on.
Calibration session structure
Pre-calibration
Each analyst independently scores the same 2–3 selected calls using the current scorecard. No discussion until all scores are submitted.
Score comparison
A facilitator (QA manager) collects scores and reveals the distribution for each dimension. Significant divergences (more than 10 percentage points) are flagged for discussion.
Dimension-by-dimension discussion
For each dimension with divergence, analysts explain their scoring rationale. The group agrees on what constitutes each score level for this dimension.
Scoring guidance update
Calibration outputs are documented as 'scoring exemplars' — real examples from calibration calls illustrating what each score level looks like.
Inter-rater reliability tracking
Track the Pearson or Spearman correlation between analyst scores across calibration sessions. Target r > 0.85 across the team. Declining reliability signals scorecard ambiguity.
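The score comparison and reliability tracking steps can be sketched as follows. This is a hedged illustration: the data shapes, function names, and analyst names are hypothetical, scores are assumed to be expressed as a percentage of each dimension's maximum, and Pearson correlation is computed by hand to keep the example dependency-free. Applying the same calculation to ranked scores would give the Spearman variant.

```python
from itertools import combinations
from math import sqrt


def divergent_dimensions(scores, threshold_pp=10.0):
    """Return dimensions where analyst scores spread by more than threshold_pp.

    scores: {dimension: {analyst: score as % of the dimension maximum}}
    """
    return [
        dimension
        for dimension, by_analyst in scores.items()
        if max(by_analyst.values()) - min(by_analyst.values()) > threshold_pp
    ]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / sqrt(var_x * var_y)


def pairwise_reliability(overall_scores):
    """Correlation for every analyst pair.

    overall_scores: {analyst: [overall score per calibration call]},
    with calls listed in the same order for every analyst.
    """
    return {
        (a, b): pearson(overall_scores[a], overall_scores[b])
        for a, b in combinations(sorted(overall_scores), 2)
    }


session = {
    "empathy": {"ana": 60, "ben": 80, "cat": 75},
    "tone": {"ana": 90, "ben": 85, "cat": 88},
}
print(divergent_dimensions(session))
```

Any pair falling below the r > 0.85 target is a candidate for a closer look at the dimensions where those two analysts diverge. With only 2–3 calls per session, correlations are only meaningful when accumulated across several sessions.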
Calibration frequency: monthly minimum, weekly during scorecard changes
Monthly calibration sessions maintain inter-rater reliability for stable scorecards. When a new scorecard is introduced or a dimension is modified, hold weekly calibration sessions for the first 4–6 weeks until analyst scores converge.
Sampling strategy
Which calls to evaluate, and how many, determines whether QA data is statistically meaningful or noise dressed up as a performance metric.
Random sampling (baseline)
Best for: Standard ongoing quality monitoring
Select calls randomly from the agent's total volume. Provides a representative picture of typical performance. The minimum meaningful sample is 4–6 calls per agent per month. Fewer than 2 calls produces results too noisy to act on.
Stratified sampling
Best for: When contact type distribution is uneven and each type needs quality coverage
Sample proportionally from contact types (e.g. complaints, sales, billing). If complaints are 20% of volume, 20% of QA evaluations should be complaints. Pure random sampling under-represents low-volume contact types.
Triggered sampling (speech analytics)
Best for: Identifying known quality risk patterns; supplementing random sampling
Use automated call tagging to flag calls meeting specific criteria (long hold, negative sentiment keywords, certain products), then evaluate only the flagged calls. Efficient, but it creates selection bias: the QA picture reflects problems, not typical performance.
Performance-weighted sampling
Best for: Resource-constrained QA teams; targeted development programmes
Evaluate more calls for agents in ramp, on performance plans, or with recent quality flags. Established high performers may receive fewer evaluations. Reduces QA analyst time while focusing resource where it has the most impact.
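As an illustration of the stratified approach described above, the sketch below allocates a fixed number of evaluations across contact types in proportion to volume, then draws calls at random within each type. The function name, the `(call_id, contact_type)` input shape, and the largest-remainder rounding rule are assumptions for the example, not a prescribed method.

```python
import random
from collections import defaultdict


def stratified_sample(calls, total_evaluations, seed=None):
    """Pick call IDs so each contact type is sampled in proportion to volume.

    calls: list of (call_id, contact_type) tuples.
    """
    if not 0 < total_evaluations <= len(calls):
        raise ValueError("total_evaluations must be between 1 and len(calls)")
    rng = random.Random(seed)

    by_type = defaultdict(list)
    for call_id, contact_type in calls:
        by_type[contact_type].append(call_id)

    # Proportional share per type, rounded down to whole evaluations.
    exact = {t: total_evaluations * len(ids) / len(calls) for t, ids in by_type.items()}
    quota = {t: int(share) for t, share in exact.items()}

    # Hand any leftover evaluations to the types with the largest remainders.
    leftover = total_evaluations - sum(quota.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - quota[t], reverse=True)
    for contact_type in by_remainder[:leftover]:
        quota[contact_type] += 1

    sample = []
    for contact_type, ids in by_type.items():
        sample.extend(rng.sample(ids, min(quota[contact_type], len(ids))))
    return sample


calls = (
    [(f"complaint-{i}", "complaint") for i in range(20)]
    + [(f"billing-{i}", "billing") for i in range(50)]
    + [(f"sales-{i}", "sales") for i in range(30)]
)
picked = stratified_sample(calls, total_evaluations=10, seed=42)
print(sorted(picked))
```

With complaints at 20% of volume, 2 of the 10 evaluations come from complaints, matching the proportional rule. Passing a seed makes the draw repeatable, which is useful when the sample needs to be audited later.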
Quality management and WFM — the connections
Quality decisions directly affect WFM capacity. A QA programme that drives the wrong behaviours creates staffing problems that are invisible in the quality scorecard.
FCR and volume
High FCR is a quality indicator and a volume reduction lever. Every 1% improvement in FCR removes ~1–1.5% of total inbound volume. Quality programmes that improve agent resolution quality directly reduce the headcount needed to serve the same customer base.
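To put rough numbers on that rule of thumb, here is a small sketch. It assumes the "1%" refers to one percentage point of FCR and that the effect scales linearly over small improvements; both are assumptions of this example, and the function and parameter names are hypothetical.

```python
def volume_reduction_range(monthly_contacts, fcr_gain_points,
                           low_rate=0.01, high_rate=0.015):
    """Estimate (low, high) contacts removed per month for an FCR gain.

    Assumes each FCR point removes 1-1.5% of inbound volume (linear scaling).
    """
    if monthly_contacts < 0 or fcr_gain_points < 0:
        raise ValueError("inputs must be non-negative")
    low = round(monthly_contacts * fcr_gain_points * low_rate)
    high = round(monthly_contacts * fcr_gain_points * high_rate)
    return low, high


low, high = volume_reduction_range(monthly_contacts=100_000, fcr_gain_points=3)
print(f"Estimated contacts removed per month: {low} to {high}")
```

For a centre handling 100,000 contacts a month, a 3-point FCR gain would remove an estimated 3,000 to 4,500 contacts under these assumptions.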
AHT and resolution quality
Quality pressure and AHT targets interact dangerously. Agents told to keep calls short often reduce AHT by cutting resolution corners — producing lower FCR and higher repeat contact volume. The right metric to optimise is not AHT alone but AHT × (1 + repeat contact rate).
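A short worked comparison shows why the combined metric matters. The two agent profiles and their figures are invented for illustration, and the function name is hypothetical; the formula itself is the one given above.

```python
def effective_handle_time(aht_seconds, repeat_contact_rate):
    """AHT x (1 + repeat contact rate).

    Approximates handling time per original contact once repeats are included.
    repeat_contact_rate is a fraction, e.g. 0.20 for 20%.
    """
    if aht_seconds < 0 or repeat_contact_rate < 0:
        raise ValueError("inputs must be non-negative")
    return aht_seconds * (1 + repeat_contact_rate)


# Hypothetical profiles: a rushed agent and a more thorough one.
rushed = effective_handle_time(aht_seconds=300, repeat_contact_rate=0.20)
thorough = effective_handle_time(aht_seconds=330, repeat_contact_rate=0.08)

print(f"rushed:   {rushed:.1f} s per original contact")
print(f"thorough: {thorough:.1f} s per original contact")
```

In this example the rushed profile works out to 360 seconds and the thorough one to about 356 seconds, so the agent with the 30-second longer AHT consumes less total handling time once repeat contacts are counted.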
Attrition and QA culture
QA programmes perceived as punitive rather than developmental are a driver of agent attrition. High attrition means more agents always in ramp — costing effective FTE and throughput. A QA culture where feedback leads to coaching and development retains agents and protects WFM capacity.
Schedule adherence and monitoring
Agents who know their calls are monitored and evaluated tend to have better schedule adherence — the correlation between quality engagement and adherence is consistently observed. A strong QA culture that agents buy into also improves the operational discipline that schedule adherence measures.
Quality management questions
What should a contact centre QA scorecard include?
Typically 4–6 dimensions: opening/identification (10–15%), understanding and empathy (15–20%), resolution accuracy (25–35%), compliance/disclosure (15–25%, auto-fail on breach), closing/FCR (10–15%), tone and professionalism (5–10%). Weights depend on your operation's priorities. Regulated operations weight compliance higher. Pure-CX operations weight resolution accuracy higher.
What is QA calibration in a contact centre?
Calibration is the process of ensuring all QA analysts apply the same scorecard consistently. All analysts independently score the same calls, then compare and discuss divergences. The output is a shared understanding of what each score level means per dimension. Without calibration, QA scores measure analyst interpretation as much as agent performance. Hold sessions at least monthly, and weekly during scorecard changes.
How many calls should you QA per agent per month?
4–6 calls per agent per month is the minimum meaningful sample for assessing typical performance. Agents in ramp or on performance plans benefit from 8–12 calls per month. Fewer than 2 calls per month produces results too noisy to be meaningful. Speech analytics tools allow a smaller number of targeted manual reviews to cover more quality risk efficiently.
How does quality management connect to WFM metrics?
Quality decisions affect WFM through FCR (high FCR reduces repeat contacts and volume), AHT (rushing to hit AHT targets reduces FCR and increases repeat contacts), attrition (punitive QA culture increases attrition which harms effective FTE), and adherence (quality-engaged agents tend to have better operational discipline). Quality that optimises AHT at the expense of FCR typically creates a net-negative WFM impact.
Model the WFM impact of quality improvement
FCR improvement and self-service deflection both reduce inbound contact volume — and both can be modelled in the FCR impact calculator.