Contact centre quality scorecard design
A QA scorecard that measures what is easy to observe rather than what matters most to the customer will produce high scores from agents who tick boxes and low satisfaction from customers whose problem wasn't solved. Scorecard design determines agent behaviour more directly than coaching.
Scorecard structure: four to six sections
A quality scorecard with more than six sections typically measures process compliance rather than quality. Each section should reflect a genuine quality dimension — not a compliance checklist in disguise.
| Section | What it measures | Typical weight | Example criteria |
|---|---|---|---|
| Regulatory and compliance | Mandatory items required by law, regulator, or contract. Failure here may have legal or regulatory consequences. These are typically auto-fail or heavily weighted. | 15–25% (or auto-fail for critical items) | ID&V completed correctly; GDPR consent captured; FCA/FOS required language used; recording disclosure given; vulnerable customer protocol followed |
| Resolution quality | Whether the agent actually resolved the customer's issue correctly and completely on the first contact. The most operationally important section — directly linked to FCR and repeat contacts. | 25–35% | Correct information provided; issue fully resolved or escalated appropriately; customer would not need to call again; next steps clearly communicated |
| Communication and listening | How the agent communicated: active listening, clarity, empathy, tone, language. These behaviours are the softest to assess objectively and require the clearest criteria definitions. | 20–30% | Listened without interrupting; confirmed understanding; matched language to customer; appropriate empathy for contact type; professional tone throughout |
| Process adherence | Whether the agent followed the correct process steps — system navigation, hold protocols, transfer process, escalation criteria, documentation. Distinguished from compliance: these are internal process requirements. | 15–20% | Correct system entries made; hold used appropriately (not for avoidance); wrap code accurate; CRM notes complete and accurate |
| Customer experience | The overall customer experience, typically assessed holistically at the end of the form rather than through binary criteria. Often the section where assessor calibration requires the most work. | 10–15% | Call opened professionally; rapport appropriate for contact type; call closed clearly; customer appeared satisfied with the outcome |
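One way to turn the section weights into a single contact score is a weighted average. The sketch below is illustrative only: the weights are one hypothetical choice from within the ranges in the table (they have to sum to 100), and the function name `weighted_score` and the section keys are assumptions, not part of any QA platform.

```python
# Hypothetical section weights (percent), picked from within the ranges in the table.
SECTION_WEIGHTS = {
    "compliance": 20,
    "resolution": 30,
    "communication": 20,
    "process": 15,
    "experience": 15,
}


def weighted_score(section_scores, weights=SECTION_WEIGHTS):
    """Return the contact score (0-100) from per-section scores given as fractions 0.0-1.0."""
    if sum(weights.values()) != 100:
        raise ValueError("section weights must sum to 100")
    if set(section_scores) != set(weights):
        raise ValueError("every section needs exactly one score")
    for name, fraction in section_scores.items():
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"score for {name!r} must be between 0 and 1")
    return sum(weights[name] * section_scores[name] for name in weights)


# Hypothetical contact: everything achieved except half of the resolution criteria.
example = {
    "compliance": 1.0,
    "resolution": 0.5,
    "communication": 1.0,
    "process": 1.0,
    "experience": 1.0,
}
print(weighted_score(example))  # 85.0
```

With these weights, missing half of the resolution criteria costs 15 points, the same as failing the whole process adherence section, which is the kind of trade-off the weighting is meant to make visible.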
Critical items: auto-fail criteria
Critical items are scorecard criteria where a single failure results in the entire contact being scored zero — regardless of how well the agent performed on every other criterion. They exist for behaviours where partial success is not acceptable: a contact where ID&V was not completed correctly is a compliance failure whether or not the agent scored 90% on everything else.
Legitimate auto-fail criteria
- ✕ ID&V not completed or incorrectly completed
- ✕ Misleading or factually incorrect information that could cause financial harm
- ✕ Regulatory disclosure not given (FCA, insurance, financial advice)
- ✕ GDPR breach: sharing personal data with an unauthorised party
- ✕ Vulnerable customer protocol not followed when triggered
- ✕ Fraud indicator not reported per procedure
Inappropriate auto-fail criteria
These should be weighted criteria, not auto-fail — failure is significant but not catastrophic.
- ⚠ Script not followed word-for-word (unless regulatory language)
- ⚠ Call not opened with the exact greeting phrase
- ⚠ ACW completed after the system-set target time
- ⚠ Offer not made to every customer regardless of contact reason
- ⚠ CSAT survey not offered at the end of every call
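The auto-fail rule can be sketched as an override applied after the weighted criteria are scored. This is a minimal illustration, not a definitive implementation: the identifiers in `CRITICAL_ITEMS` and the function name `final_score` are hypothetical, and a real scorecard would define its own list of critical items.

```python
# Hypothetical identifiers for critical items; a real scorecard defines its own.
CRITICAL_ITEMS = {"idv_completed", "regulatory_disclosure_given", "no_data_breach"}


def final_score(weighted_total, item_results):
    """Apply the auto-fail rule.

    weighted_total: score (0-100) from the weighted, non-critical criteria.
    item_results: mapping of critical item id -> True (passed) or False (failed).
    Returns (score, failed_items); the score is zero if any critical item failed.
    """
    missing = CRITICAL_ITEMS - set(item_results)
    if missing:
        raise ValueError(f"critical items not assessed: {sorted(missing)}")
    failed = sorted(item for item in CRITICAL_ITEMS if not item_results[item])
    if failed:
        return 0.0, failed
    return float(weighted_total), failed


all_passed = {item: True for item in CRITICAL_ITEMS}
print(final_score(90, all_passed))  # (90.0, [])

idv_failed = dict(all_passed, idv_completed=False)
print(final_score(90, idv_failed))  # (0.0, ['idv_completed'])
```

Returning the list of failed items alongside the score keeps the reason for a zero visible to the agent and the coach, rather than hiding it behind the number.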
Calibration: making assessors consistent
Calibration is the process by which the assessor population reaches consistent application of the same scorecard criteria. Without calibration, a QA score reflects assessor preference more than agent performance. An agent consistently assessed by a lenient assessor can score 5–10 percentage points higher than an identically performing agent assessed by a strict assessor, which makes scores unusable for performance comparison.
Monthly calibration sessions
Four to eight assessors independently score the same call (typically 2–3 calls per session) without seeing each other's scores. Scores are then revealed and compared. Discrepancies of more than 5 percentage points on the total score are discussed until the group agrees how the criterion definition should be applied.
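A calibration session's blind scores can be checked for outliers with a few lines of code. The sketch below makes an assumption the text does not state: each assessor's total is compared with the session median, and anything more than 5 points away is flagged for discussion. The assessor names and scores are hypothetical.

```python
from statistics import median

THRESHOLD_POINTS = 5  # discussion threshold, read here as percentage points


def flag_discrepancies(scores_by_assessor, threshold=THRESHOLD_POINTS):
    """Return {assessor: deviation} for totals more than `threshold` points from the median."""
    reference = median(scores_by_assessor.values())
    return {
        assessor: score - reference
        for assessor, score in scores_by_assessor.items()
        if abs(score - reference) > threshold
    }


# Hypothetical blind scores for one call from four assessors.
session = {"assessor_a": 82, "assessor_b": 85, "assessor_c": 74, "assessor_d": 84}
print(flag_discrepancies(session))  # {'assessor_c': -9.0}
```

Comparing against the median rather than the mean keeps one very strict or very lenient assessor from shifting the reference point; comparing against an agreed "master" score would be an equally reasonable choice.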
Calibrate by criterion, not just by total
Two assessors can reach the same total score by different criteria scores. If assessor A gives resolution quality 8/10 and empathy 6/10, and assessor B gives resolution quality 6/10 and empathy 8/10, the total matches but the criteria application is inconsistent. Calibration must review criterion-level scores, not just totals.
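The example above can be reproduced directly: the totals agree while the criterion-level scores diverge. The function name `criterion_gaps` and the dictionary keys are hypothetical, chosen only for this sketch.

```python
def criterion_gaps(scores_a, scores_b):
    """Return each assessor's total and the per-criterion difference (A minus B)."""
    if set(scores_a) != set(scores_b):
        raise ValueError("both assessors must score the same criteria")
    gaps = {name: scores_a[name] - scores_b[name] for name in scores_a}
    return sum(scores_a.values()), sum(scores_b.values()), gaps


# Scores from the example in the text, each out of 10.
assessor_a = {"resolution_quality": 8, "empathy": 6}
assessor_b = {"resolution_quality": 6, "empathy": 8}

total_a, total_b, gaps = criterion_gaps(assessor_a, assessor_b)
print(total_a, total_b)  # 14 14
print(gaps)  # {'resolution_quality': 2, 'empathy': -2}
```

A check on totals alone would pass this pair; the per-criterion gaps show that the two assessors are applying both criteria differently.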
Calibrate with the management population
Team leaders and operations managers who give feedback based on QA scores must calibrate alongside assessors. If a team leader disagrees with a score, the disagreement should be resolved through calibration, not by informally overriding the assessor.
Track inter-rater reliability over time
Calculate the average absolute score difference between assessors across a calibration period. Target: an average absolute difference of no more than 5 percentage points. An operation running at an average difference of 12 percentage points has an assessor consistency problem that makes scores unreliable.
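One way to compute this figure is to take the absolute difference in total score for every pair of assessors on every calibrated call and average the results. The pairwise approach is an assumption (the text does not say how assessors are paired), and the data below is hypothetical.

```python
from itertools import combinations


def mean_absolute_difference(calls):
    """Average absolute difference in total score over every assessor pair on every call.

    calls: list of dicts, one per calibrated call, mapping assessor -> total score (0-100).
    """
    diffs = [
        abs(a - b)
        for scores in calls
        for a, b in combinations(scores.values(), 2)
    ]
    if not diffs:
        raise ValueError("need at least one call scored by two or more assessors")
    return sum(diffs) / len(diffs)


# Hypothetical calibration period: two calls, three assessors each.
period = [
    {"a": 80, "b": 84, "c": 78},
    {"a": 90, "b": 88, "c": 95},
]
print(round(mean_absolute_difference(period), 2))  # 4.33
```

Tracking this number month by month shows whether calibration sessions are narrowing the gap between assessors or only discussing it.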
Five common scorecard design errors
Measuring compliance instead of quality
Consequence
Agents learn to tick the compliance boxes while delivering a poor customer experience. QA scores rise while CSAT falls. The scorecard has measured the wrong thing.
Fix
Weight resolution quality and customer outcome sections most heavily. Include at least one holistic 'would this customer need to call again?' criterion that forces the assessor to make a judgment about the actual outcome.
Treating every criterion as equally important
Consequence
A minor communication issue (used 'can' instead of 'may') carries the same weight as a resolution failure (gave incorrect information). The score does not reflect what actually matters.
Fix
Weight criteria explicitly and differently. Resolution quality items should score out of 25; tone items might score out of 5. Agents will optimise for high-weight items.
Criteria that are ambiguous at the margin
Consequence
Assessors disagree on borderline cases — 'did the agent show empathy?' has a dozen defensible interpretations. Inter-rater reliability deteriorates. Scores are contested by agents and managers.
Fix
Define criteria with explicit observable behaviours: "empathy demonstrated by acknowledging emotional content of the customer's situation using language that validates their experience (not just 'I understand')". Include worked examples of pass/fail for marginal cases.
Linking the QA score directly to a bonus without moderation
Consequence
Agents challenge every low score because their bonus depends on it. Assessors experience pressure to inflate scores. The QA function becomes a source of conflict rather than a development tool.
Fix
Use QA scores as one input to performance management, not as the direct determinant of financial outcomes. Require at least 5 scored contacts per month before using the average as a performance measure. Include a moderation step for challenged scores.
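The minimum-sample rule is simple to enforce in whatever reporting feeds performance reviews. A minimal sketch, assuming scores are held as a plain list per agent per month; the function name `monthly_average` is hypothetical.

```python
MIN_SCORED_CONTACTS = 5  # minimum sample from the text


def monthly_average(scores, minimum=MIN_SCORED_CONTACTS):
    """Return the mean QA score, or None when there are too few scored contacts to use it."""
    if len(scores) < minimum:
        return None
    return sum(scores) / len(scores)


print(monthly_average([80, 90, 85]))  # None
print(monthly_average([80, 90, 85, 75, 95]))  # 85.0
```

Returning `None` instead of a number for small samples makes it harder for an average based on two or three contacts to slip into a performance report.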
The scorecard is not updated when products, processes, or regulations change
Consequence
Assessors cannot reliably assess criteria that reference old processes. Agents are scored against outdated standards. The QA score becomes a measure of how well agents follow superseded procedures.
Fix
The QA scorecard must be in scope of the change management process. When a process change is approved, the scorecard update is part of the implementation plan, not an afterthought.
Quality scorecard questions
How do you design a contact centre quality scorecard?
Five steps: (1) Define 4–6 sections: regulatory/compliance, resolution quality, communication, process adherence, customer experience; (2) Classify critical (auto-fail) items: limit to 5–8 genuinely critical behaviours (regulatory breach, serious customer detriment, legal risk); (3) Weight sections by importance: resolution quality 25–35%, communication 20–30%, compliance 15–25%, process adherence 15–20%, customer experience 10–15%; (4) Set the pass mark, typically 80–85%, and validate it against CSAT data (high QA should correlate with high CSAT); (5) Calibrate: monthly sessions, a target average inter-rater difference of 5 percentage points or less, and calibration by criterion, not just by total score.
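The CSAT validation in step 4 can be approximated with a Pearson correlation between average QA score and average CSAT, for example per team or per month. This is a rough sketch under stated assumptions: the figures are invented for illustration, the grouping is hypothetical, and a correlation on a handful of points is only a sanity check, not proof that the scorecard measures the right thing.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical per-team averages: QA score (0-100) and CSAT (1-5 scale).
qa_scores = [78, 82, 85, 90, 93]
csat_scores = [3.9, 4.0, 4.2, 4.4, 4.5]
r = pearson(qa_scores, csat_scores)
print(f"QA vs CSAT correlation: {r:.2f}")
```

A value near zero or negative would suggest the scorecard rewards behaviour that customers do not value, which is the failure described under "Measuring compliance instead of quality".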
What is calibration in contact centre quality management?
Calibration is the process of ensuring assessors apply scorecard criteria consistently, producing reliable scores. In monthly sessions, 4–8 assessors independently score the same 2–3 calls, then compare and discuss discrepancies of more than 5 percentage points. Calibration must be done by criterion (not just total score) and must include team leaders and managers who act on scores. An operation with an average inter-rater difference of 12 percentage points has a consistency problem that makes scores unreliable for performance management.