
Contact centre quality scorecard design

A QA scorecard that measures what is easy to observe rather than what matters most to the customer will produce high scores from agents who tick boxes and low satisfaction from customers whose problem wasn't solved. Scorecard design determines agent behaviour more directly than coaching.

Scorecard structure: four to six sections

A quality scorecard with more than six sections typically measures process compliance rather than quality. Each section should reflect a genuine quality dimension — not a compliance checklist in disguise.

| Section | What it measures | Typical weight | Example criteria |
|---|---|---|---|
| Regulatory and compliance | Mandatory items required by law, regulator, or contract. Failure here may have legal or regulatory consequences. These are typically auto-fail or heavily weighted. | 15–25% (or auto-fail for critical items) | ID&V completed correctly; GDPR consent captured; FCA/FOS required language used; recording disclosure given; vulnerable customer protocol followed |
| Resolution quality | Whether the agent actually resolved the customer's issue correctly and completely on the first contact. The most operationally important section, directly linked to first contact resolution (FCR) and repeat contacts. | 25–35% | Correct information provided; issue fully resolved or escalated appropriately; customer would not need to call again; next steps clearly communicated |
| Communication and listening | How the agent communicated: active listening, clarity, empathy, tone, language. These behaviours are the hardest to assess objectively and require the clearest criteria definitions. | 20–30% | Listened without interrupting; confirmed understanding; matched language to customer; appropriate empathy for contact type; professional tone throughout |
| Process adherence | Whether the agent followed the correct process steps: system navigation, hold protocols, transfer process, escalation criteria, documentation. Distinguished from compliance because these are internal process requirements. | 15–20% | Correct system entries made; hold used appropriately (not for avoidance); wrap code accurate; CRM notes complete and accurate |
| Customer experience | The overall customer experience, typically assessed holistically at the end of the form rather than through binary criteria. Often the section where assessor calibration requires the most work. | 10–15% | Call opened professionally; rapport appropriate for contact type; call closed clearly; customer appeared satisfied with the outcome |
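
As a rough illustration of how the typical weights combine into one score, the sketch below assumes each section is recorded as the fraction of its points the agent achieved (0.0 to 1.0). The section keys and the exact weights are hypothetical: one value picked from inside each typical range so that the five add up to 100. Auto-fail items are left out here.

```python
# Hypothetical weights (percent of the total), one value chosen from each typical range.
SECTION_WEIGHTS = {
    "regulatory_compliance": 20,
    "resolution_quality": 30,
    "communication_listening": 25,
    "process_adherence": 15,
    "customer_experience": 10,
}


def weighted_score(section_results: dict[str, float]) -> float:
    """Combine per-section results (fraction of section points achieved, 0.0-1.0)
    into a total score out of 100."""
    if sum(SECTION_WEIGHTS.values()) != 100:
        raise ValueError("section weights must sum to 100")
    missing = SECTION_WEIGHTS.keys() - section_results.keys()
    if missing:
        raise ValueError(f"missing sections: {sorted(missing)}")
    return sum(weight * section_results[name] for name, weight in SECTION_WEIGHTS.items())


example = {
    "regulatory_compliance": 1.0,
    "resolution_quality": 0.5,   # issue only half resolved
    "communication_listening": 0.8,
    "process_adherence": 1.0,
    "customer_experience": 0.9,
}
print(weighted_score(example))  # 79.0
```

In this example, half marks on resolution quality cost 15 points, more than the entire customer experience section is worth, which is the intended effect of weighting resolution most heavily.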

Critical items: auto-fail criteria

Critical items are scorecard criteria where a single failure results in the entire contact being scored zero — regardless of how well the agent performed on every other criterion. They exist for behaviours where partial success is not acceptable: a contact where ID&V was not completed correctly is a compliance failure whether or not the agent scored 90% on everything else.
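
A minimal sketch of the auto-fail rule, assuming critical items are recorded as a list of failed item names and every other criterion as a (points awarded, points available) pair. The `Assessment` type and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Assessment:
    # (points awarded, points available) for each weighted criterion
    weighted: list[tuple[float, float]]
    # names of critical (auto-fail) items the assessor marked as failed
    critical_failures: list[str] = field(default_factory=list)


def contact_score(a: Assessment) -> float:
    """Percentage score for one contact; any critical failure scores the whole contact zero."""
    if a.critical_failures:
        return 0.0
    awarded = sum(points for points, _ in a.weighted)
    available = sum(maximum for _, maximum in a.weighted)
    if available <= 0:
        raise ValueError("no weighted criteria to score")
    return 100.0 * awarded / available


print(contact_score(Assessment(weighted=[(45, 50)])))  # 90.0
print(contact_score(Assessment(weighted=[(45, 50)],
                               critical_failures=["ID&V not completed"])))  # 0.0
```

The same 90% performance on weighted criteria drops to zero once a single critical item fails, which is exactly the behaviour described above.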

Legitimate auto-fail criteria

  • ID&V not completed or incorrectly completed
  • Misleading or factually incorrect information that could cause financial harm
  • Regulatory disclosure not given (FCA, insurance, financial advice)
  • GDPR breach — sharing personal data with an unauthorised party
  • Vulnerable customer protocol not followed when triggered
  • Fraud indicator not reported per procedure

Inappropriate auto-fail criteria

These should be weighted criteria, not auto-fail — failure is significant but not catastrophic.

  • Script not followed word-for-word (unless regulatory language)
  • Call not opened with the exact greeting phrase
  • ACW completed after the system-set target time
  • Offer not made to every customer regardless of contact reason
  • CSAT survey not offered at the end of every call

Design rule: If everything is auto-fail, nothing is. Contact centres with 20+ auto-fail criteria produce an assessor population that grades auto-fails inconsistently, agents who are too stressed to perform naturally, and scores that do not differentiate agent capability. Limit auto-fail criteria to 5–8 genuinely critical items.

Calibration: making assessors consistent

Calibration is the process by which the assessor population comes to apply the same scorecard criteria consistently. Without calibration, a QA score reflects assessor preference more than agent performance. An agent who is always assessed by a lenient assessor can score 5–10 percentage points higher than an identically performing agent who is always assessed by a strict one, which makes the scores unusable for comparing performance.

1. Monthly calibration sessions

Four to eight assessors independently score the same 2–3 calls per session without seeing each other's scores. Scores are then revealed and compared, and any discrepancy of more than 5 percentage points on the total score is discussed to reach consensus on how the criteria definitions apply.

2. Calibrate by criterion, not just by total

Two assessors can reach the same total score through different criterion-level scores. If assessor A gives resolution quality 8/10 and empathy 6/10, and assessor B gives resolution quality 6/10 and empathy 8/10, the totals match but the criteria have been applied inconsistently. Calibration must review criterion-level scores, not just totals.

3. Calibrate with the management population

Team leaders (TLs) and operations managers who give feedback based on QA scores must calibrate alongside assessors. If a TL disagrees with a score, the disagreement is resolved through calibration, not by the TL informally overriding the assessor.

4. Track inter-rater reliability over time

Calculate the average absolute difference in total score between assessors across a calibration period. Target: an average absolute difference of no more than 5 percentage points. An operation running at a 12-point average difference has an assessor consistency problem that makes its scores unreliable.
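
The calibration practices above reduce to a few simple calculations. This sketch assumes each call's results are held as a mapping of assessor name to total score out of 100, and reads the 5% threshold as 5 percentage points. The text does not say what a session discrepancy is measured against, so comparing each assessor with the median score for the call is an assumption; a pairwise comparison would be an equally reasonable reading. All function and assessor names are hypothetical.

```python
from itertools import combinations
from statistics import median

# call id -> {assessor name -> total score out of 100}
Session = dict[str, dict[str, float]]


def flag_for_discussion(session: Session, threshold: float = 5.0) -> dict[str, list[str]]:
    """Per call, list assessors whose total differs from the call's median by more than threshold points."""
    flagged: dict[str, list[str]] = {}
    for call_id, scores in session.items():
        mid = median(scores.values())
        outliers = sorted(a for a, s in scores.items() if abs(s - mid) > threshold)
        if outliers:
            flagged[call_id] = outliers
    return flagged


def criterion_gaps(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Criterion-level differences (B minus A); totals can match while these do not."""
    if a.keys() != b.keys():
        raise ValueError("both assessors must score the same criteria")
    return {c: b[c] - a[c] for c in a if b[c] != a[c]}


def mean_absolute_difference(session: Session) -> float:
    """Average absolute total-score difference over every assessor pair on every call."""
    diffs = [
        abs(scores[x] - scores[y])
        for scores in session.values()
        for x, y in combinations(sorted(scores), 2)
    ]
    if not diffs:
        raise ValueError("need at least one call scored by two or more assessors")
    return sum(diffs) / len(diffs)


session = {
    "call-1": {"A": 82, "B": 85, "C": 84, "D": 92},
    "call-2": {"A": 70, "B": 72, "C": 71, "D": 73},
}
print(flag_for_discussion(session))                 # {'call-1': ['D']}
print(round(mean_absolute_difference(session), 2))  # 3.42
print(criterion_gaps({"resolution_quality": 8, "empathy": 6},
                     {"resolution_quality": 6, "empathy": 8}))
# {'resolution_quality': -2, 'empathy': 2}
```

Here assessor D is flagged on call-1 even though the period's average absolute difference (about 3.4 points) is inside the 5-point target, and the criterion check exposes the A/B disagreement that identical totals of 14 would hide.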

Five common scorecard design errors

Measuring compliance instead of quality

Consequence

Agents learn to tick the compliance boxes while delivering a poor customer experience. QA scores rise while CSAT falls. The scorecard has measured the wrong thing.

Fix

Weight resolution quality and customer outcome sections most heavily. Include at least one holistic 'would this customer call again?' criterion that forces the assessor to make a judgment about the actual outcome.

Treating every criterion as equally important

Consequence

A minor communication issue (used 'can' instead of 'may') carries the same weight as a resolution failure (gave incorrect information). The score does not reflect what actually matters.

Fix

Weight criteria explicitly and differently. Resolution quality items should score out of 25; tone items might score out of 5. Agents will optimise for high-weight items.
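
To show why explicit weights matter, this sketch compares equal and explicit weighting when an agent fails exactly one criterion. Only the 25-versus-5 split comes from the text; the criterion names and the 10-point notes item are hypothetical.

```python
# Maximum points per criterion under two hypothetical scorecards.
EQUAL = {"resolution_correct": 10, "tone_appropriate": 10, "notes_complete": 10}
EXPLICIT = {"resolution_correct": 25, "tone_appropriate": 5, "notes_complete": 10}


def percent_score(max_points: dict[str, int], failed: set[str]) -> float:
    """Percentage score when failed criteria earn zero and all others earn full points."""
    total = sum(max_points.values())
    earned = sum(p for c, p in max_points.items() if c not in failed)
    return round(100 * earned / total, 1)


print(percent_score(EQUAL, {"resolution_correct"}))     # 66.7
print(percent_score(EQUAL, {"tone_appropriate"}))       # 66.7
print(percent_score(EXPLICIT, {"resolution_correct"}))  # 37.5
print(percent_score(EXPLICIT, {"tone_appropriate"}))    # 87.5
```

With equal weighting a resolution failure and a tone slip look identical; with explicit weighting the score separates them clearly.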

Criteria that are ambiguous at the margin

Consequence

Assessors disagree on borderline cases — 'did the agent show empathy?' has a dozen defensible interpretations. Inter-rater reliability deteriorates. Scores are contested by agents and managers.

Fix

Define criteria in terms of explicit observable behaviours, for example: 'empathy is demonstrated by acknowledging the emotional content of the customer's situation, using language that validates their experience (not just saying "I understand")'. Include worked examples of pass and fail for marginal cases.

Linking the QA score directly to a bonus without moderation

Consequence

Agents challenge every low score because their bonus depends on it. Assessors experience pressure to inflate scores. The QA function becomes a source of conflict rather than a development tool.

Fix

Use QA scores as one input to performance management, not as the direct determinant of financial outcomes. Require at least 5 scored contacts per month before using the average as a performance measure. Include a moderation step for challenged scores.
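
A minimal sketch of the minimum-sample rule, assuming an agent's scored contacts for the month are available as a list of percentage scores. The function name is hypothetical, and returning `None` is just one way to signal "not enough data" instead of reporting an average built on one or two contacts.

```python
from statistics import mean
from typing import Optional

MIN_SCORED_CONTACTS = 5  # minimum suggested in the fix above


def monthly_qa_average(scores: list[float], min_contacts: int = MIN_SCORED_CONTACTS) -> Optional[float]:
    """Average QA score for the month, or None when too few contacts were scored to rely on it."""
    if len(scores) < min_contacts:
        return None
    return mean(scores)


print(monthly_qa_average([88, 92, 75]))          # None
print(monthly_qa_average([88, 92, 75, 81, 90]))  # 85.2
```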

The scorecard is not updated when products, processes, or regulations change

Consequence

Assessors cannot reliably assess criteria that reference old processes. Agents are scored against outdated standards. The QA score becomes a measure of how well agents follow superseded procedures.

Fix

The QA scorecard must be within the scope of the change management process. When a process change is approved, updating the QA scorecard is part of the implementation plan, not an afterthought.

Quality scorecard questions

How do you design a contact centre quality scorecard?

Five steps: (1) define 4–6 sections, such as regulatory/compliance, resolution quality, communication, process adherence and customer experience; (2) classify critical (auto-fail) items, limiting them to 5–8 genuinely critical behaviours (regulatory breach, serious customer detriment, legal risk); (3) weight sections by importance: resolution quality 25–35%, customer experience 10–15%, compliance 15–25%, communication 20–30%, process adherence 15–20%; (4) set the pass mark, typically 80–85%, and validate it against CSAT data (high QA scores should correlate with high CSAT); (5) calibrate through monthly sessions with a target average inter-rater difference of no more than 5 percentage points, reviewing criterion-level scores and not just the total.
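
Step (4) asks for the pass mark to be validated against CSAT. One simple check, sketched here under the assumption that you have one QA average and one CSAT average per agent for the same period, is the Pearson correlation between the two series: a value near zero or below suggests the scorecard is not measuring what customers experience. The text gives no target correlation, so none is assumed. On Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between paired series, e.g. per-agent QA average and CSAT."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two pairs")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / sqrt(sxx * syy)


qa_by_agent = [70, 80, 90]        # hypothetical per-agent QA averages
csat_by_agent = [3.5, 4.0, 4.5]   # hypothetical per-agent CSAT averages
print(pearson(qa_by_agent, csat_by_agent))  # 1.0
```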

What is calibration in contact centre quality management?

Calibration is the process of ensuring assessors apply scorecard criteria consistently, so that scores are reliable. In monthly sessions, 4–8 assessors independently score the same 2–3 calls, then compare results and discuss any discrepancy of more than 5 percentage points. Calibration must be done by criterion (not just by total score) and must include the team leaders and managers who act on scores. An operation with a 12-point average inter-rater difference has a consistency problem that makes its scores unreliable for performance management.
