
Contact centre quality management

Quality management is the process of measuring whether customer interactions meet your standard, and systematically improving them when they don't. It sits at the intersection of operations, compliance, and workforce management (WFM): a quality problem is also a volume problem (low first-contact resolution, or FCR, means more repeat calls) and a capacity problem (agents rushing to hit average handle time, or AHT, targets at the expense of resolution quality).

QA scorecard framework

A QA scorecard typically covers 4–6 dimensions, each weighted to reflect its importance to your operation. Compliance items are typically auto-fail: any breach results in a zero score regardless of other performance.

Opening and identification

Typical weight: 10–15%. Auto-fail: no.

Agent introduces themselves correctly, verifies customer identity per policy, and sets the appropriate tone.

Common failure modes: missing verification, incorrect greeting, no name given.

Understanding and empathy

Typical weight: 15–20%. Auto-fail: no.

Agent acknowledges the customer's situation, avoids scripted phrases that feel hollow, and demonstrates active listening.

Common failure modes: interrupting the customer, scripted empathy ('I understand how you feel'), ignoring emotional cues.

Resolution accuracy

Typical weight: 25–35%. Auto-fail: no.

The information given was correct, the action taken was appropriate, and the customer's primary need was addressed.

Common failure modes: wrong information given, incorrect account action, promise not logged in CRM.

Compliance and disclosure

Typical weight: 15–25%. Auto-fail: yes, on any breach.

Mandatory disclosures were made (FCA, GDPR, recording notice), prohibited phrases avoided, script adherence met for regulated topics.

Common failure modes: missing FCA disclosure, prohibited promise, mis-statement of terms.

Closing and FCR

Typical weight: 10–15%. Auto-fail: no.

Call was closed with confirmation of resolution, next steps communicated, and customer not likely to call back for the same reason.

Common failure modes: abrupt close, unresolved query without explanation, follow-up not booked when required.

Tone and professionalism

Typical weight: 5–10%. Auto-fail: no.

Agent maintained appropriate professional tone throughout, avoided jargon or condescension, and did not display exasperation.

Common failure modes: sighing audibly, talking over the customer, inappropriate informality.

Weights should reflect your operation's priorities. FCA-regulated operations typically weight compliance at 25–30%. Pure-CX operations with lower regulatory burden may weight resolution accuracy at 35–40%.
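As a rough illustration of how weighted dimensions and an auto-fail rule combine into a single score, here is a minimal Python sketch. The dimension keys, the specific weights (picked from inside the typical ranges above so that they sum to 100%), and the 0.0 to 1.0 rating scale are assumptions made for the example, not a prescribed scheme.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str
    weight: float            # share of the total score; all weights sum to 1.0
    auto_fail: bool = False   # True if any breach zeroes the whole evaluation


# Hypothetical weights chosen from within the typical ranges.
SCORECARD = [
    Dimension("opening", 0.10),
    Dimension("empathy", 0.15),
    Dimension("resolution", 0.30),
    Dimension("compliance", 0.25, auto_fail=True),
    Dimension("closing_fcr", 0.10),
    Dimension("tone", 0.10),
]


def score_call(ratings: dict[str, float], breaches: Iterable[str] = ()) -> float:
    """Return a 0-100 score for one evaluated call.

    ratings:  dimension name -> rating between 0.0 and 1.0
    breaches: names of dimensions in which a breach was observed
    """
    if abs(sum(d.weight for d in SCORECARD) - 1.0) > 1e-9:
        raise ValueError("scorecard weights must sum to 1.0")
    breached = set(breaches)
    # Auto-fail rule: a breach in an auto-fail dimension overrides everything else.
    if any(d.auto_fail and d.name in breached for d in SCORECARD):
        return 0.0
    return round(100 * sum(d.weight * ratings[d.name] for d in SCORECARD), 1)


perfect = {d.name: 1.0 for d in SCORECARD}
half_resolution = {**perfect, "resolution": 0.5}
print(score_call(perfect))                          # 100.0
print(score_call(half_resolution))                  # 85.0
print(score_call(perfect, breaches=["compliance"])) # 0.0
```

Keeping the auto-fail check separate from the weighted sum makes it explicit that a compliance breach is not simply a heavily weighted deduction.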

Calibration: making QA scores mean something

Without calibration, QA scores measure the analyst's interpretation of the scorecard as much as the agent's actual performance. Agents subjected to two analysts applying the same framework differently experience the QA process as arbitrary — which harms engagement and makes quality feedback harder to act on.

Calibration session structure

1. Pre-calibration

Each analyst independently scores the same 2–3 selected calls using the current scorecard. No discussion until all scores are submitted.

2. Score comparison

A facilitator (QA manager) collects scores and reveals the distribution for each dimension. Significant divergences (>10pp) are flagged for discussion.

3. Dimension-by-dimension discussion

For each dimension with divergence, analysts explain their scoring rationale. The group agrees on what constitutes each score level for this dimension.

4. Scoring guidance update

Calibration outputs are documented as 'scoring exemplars': real examples from calibration calls illustrating what each score level looks like.

5. Inter-rater reliability tracking

Track the Pearson or Spearman correlation between analyst scores across calibration sessions. Target r > 0.85 across the team. Declining reliability signals scorecard ambiguity.
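The score-comparison and reliability-tracking steps can be sketched in a few lines of Python. This is an illustrative sketch under two assumptions that the steps above do not spell out: divergence on a dimension is taken as the spread between the highest and lowest analyst score, and team reliability is taken as the mean Pearson correlation over all analyst pairs. The analyst names and scores are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_divergent(scores: dict[str, dict[str, float]],
                   threshold_pp: float = 10.0) -> dict[str, float]:
    """scores: analyst -> {dimension: score in %} for ONE calibration call.

    Returns the dimensions whose max-min spread exceeds the threshold.
    """
    dimensions = next(iter(scores.values())).keys()
    flagged = {}
    for dim in dimensions:
        values = [per_analyst[dim] for per_analyst in scores.values()]
        spread = max(values) - min(values)
        if spread > threshold_pp:
            flagged[dim] = spread
    return flagged


def mean_pairwise_r(overall: dict[str, list[float]]) -> float:
    """overall: analyst -> overall scores for the same calls, in the same order."""
    pairs = list(combinations(overall, 2))
    return sum(pearson(overall[a], overall[b]) for a, b in pairs) / len(pairs)


one_call = {
    "analyst_a": {"empathy": 80, "resolution": 90},
    "analyst_b": {"empathy": 60, "resolution": 85},
}
print(flag_divergent(one_call))  # {'empathy': 20}

history = {
    "analyst_a": [70, 82, 91, 64],
    "analyst_b": [68, 85, 88, 60],
    "analyst_c": [75, 80, 93, 70],
}
print(mean_pairwise_r(history) > 0.85)
```

Correlation measures whether analysts rank calls consistently; it does not reveal one analyst scoring uniformly higher than another, which is why the per-dimension spread check is kept alongside it.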

Calibration frequency: monthly minimum, weekly during scorecard changes

Monthly calibration sessions maintain inter-rater reliability for stable scorecards. When a new scorecard is introduced or a dimension is modified, hold weekly calibration sessions for the first 4–6 weeks until analyst scores converge.

Sampling strategy

Which calls to evaluate, and how many, determines whether QA data is statistically meaningful or noise dressed up as a performance metric.

Random sampling (baseline)

Best for: Standard ongoing quality monitoring

Select calls randomly from the agent's total volume. This provides a representative picture of typical performance. The minimum meaningful sample is 4–6 calls per agent per month; with fewer than 2 calls, results are too noisy to act on.

Stratified sampling

Best for: When contact type distribution is uneven and each type needs quality coverage

Sample proportionally from contact types (e.g. complaints, sales, billing). If complaints are 20% of volume, 20% of QA evaluations should be complaints. With small samples, pure random sampling can miss low-volume contact types entirely.

Triggered sampling (speech analytics)

Best for: Identifying known quality risk patterns; supplementing random sampling

Use automated call tagging to flag calls meeting specific criteria (long hold, negative sentiment keywords, certain products), then evaluate the flagged calls. This is efficient but introduces selection bias: the QA picture reflects problems, not typical performance, so use it alongside random sampling rather than instead of it.

Performance-weighted sampling

Best for: Resource-constrained QA teams; targeted development programmes

Evaluate more calls for agents in ramp, on performance plans, or with recent quality flags. Established high performers may receive fewer evaluations. Reduces QA analyst time while focusing resource where it has the most impact.

Quality management and WFM — the connections

Quality decisions directly affect WFM capacity. A QA programme that drives the wrong behaviours creates staffing problems that are invisible in the quality scorecard.

FCR and volume

High FCR is a quality indicator and a volume reduction lever. Every 1% improvement in FCR removes ~1–1.5% of total inbound volume. Quality programmes that improve agent resolution quality directly reduce the headcount needed to serve the same customer base.
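To make the rule of thumb concrete, the sketch below applies it to a hypothetical operation. It assumes that "1% improvement" means one percentage point of FCR and that the 1–1.5% range scales linearly over small improvements; the monthly volume and the size of the FCR gain are invented for the example.

```python
def volume_reduction_range(monthly_contacts: int,
                           fcr_gain_pp: float) -> tuple[float, float]:
    """Estimated (low, high) contacts removed per month.

    Applies the rule of thumb that each percentage point of FCR gained
    removes roughly 1% to 1.5% of total inbound volume.
    """
    low = monthly_contacts * fcr_gain_pp * 1.0 / 100
    high = monthly_contacts * fcr_gain_pp * 1.5 / 100
    return low, high


# Hypothetical operation: 50,000 contacts a month, FCR up by 3 points.
low, high = volume_reduction_range(50_000, 3)
print(low, high)  # 1500.0 2250.0
```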


AHT and resolution quality

Quality pressure and AHT targets interact dangerously. Agents told to keep calls short often reduce AHT by cutting resolution corners — producing lower FCR and higher repeat contact volume. The right metric to optimise is not AHT alone but AHT × (1 + repeat contact rate).
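A short worked example shows why the combined metric can reverse the conclusion that AHT alone suggests. The two scenarios and their figures are hypothetical; the formula is the one given above, read here as handle time per customer issue once repeat contacts are counted.

```python
def effective_aht(aht_seconds: float, repeat_contact_rate: float) -> float:
    """AHT x (1 + repeat contact rate): handle time per issue, repeats included."""
    if repeat_contact_rate < 0:
        raise ValueError("repeat_contact_rate must be non-negative")
    return aht_seconds * (1.0 + repeat_contact_rate)


# Hypothetical scenarios: a careful team versus a team pushed to shorten calls.
careful = effective_aht(aht_seconds=360, repeat_contact_rate=0.10)
rushed = effective_aht(aht_seconds=300, repeat_contact_rate=0.35)

print(round(careful, 1))  # 396.0
print(round(rushed, 1))   # 405.0
```

In this example the rushed team's AHT is a full minute shorter, yet it spends more handle time per issue once repeat contacts are included.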


Attrition and QA culture

QA programmes perceived as punitive rather than developmental are a driver of agent attrition. High attrition means more agents always in ramp — costing effective FTE and throughput. A QA culture where feedback leads to coaching and development retains agents and protects WFM capacity.


Schedule adherence and monitoring

Agents who know their calls are monitored and evaluated tend to have better schedule adherence — the correlation between quality engagement and adherence is consistently observed. A strong QA culture that agents buy into also improves the operational discipline that schedule adherence measures.


Quality management questions

What should a contact centre QA scorecard include?

Typically 4–6 dimensions: opening/identification (10–15%), understanding and empathy (15–20%), resolution accuracy (25–35%), compliance/disclosure (15–25%, auto-fail on breach), closing/FCR (10–15%), tone and professionalism (5–10%). Weights depend on your operation's priorities. Regulated operations weight compliance higher. Pure-CX operations weight resolution accuracy higher.

What is QA calibration in a contact centre?

Calibration is the process of ensuring all QA analysts apply the same scorecard consistently. All analysts independently score the same calls, then compare and discuss divergences. The output is a shared understanding of what each score level means per dimension. Without calibration, QA scores measure analyst interpretation rather than agent performance. Hold calibration sessions at least monthly, and weekly during scorecard changes.

How many calls should you QA per agent per month?

4–6 calls per agent per month is the minimum for a meaningful assessment of typical performance. Agents in ramp or on performance plans benefit from 8–12 calls per month. Fewer than 2 calls per month gives results too noisy to be meaningful. Speech analytics tools allow a smaller number of targeted manual reviews to cover more quality risk efficiently.

How does quality management connect to WFM metrics?

Quality decisions affect WFM through FCR (high FCR reduces repeat contacts and volume), AHT (rushing to hit AHT targets reduces FCR and increases repeat contacts), attrition (punitive QA culture increases attrition which harms effective FTE), and adherence (quality-engaged agents tend to have better operational discipline). Quality that optimises AHT at the expense of FCR typically creates a net-negative WFM impact.

