
Contact centre quality scorecard design

Q: How do you design a contact centre quality scorecard?

A contact centre QA scorecard should be designed in five steps: (1) Define the categories: typically 4-6 sections covering regulatory/compliance, resolution quality, communication skills, process adherence, and customer experience. Each section should reflect a genuine quality dimension, not an internal control checklist; (2) Classify critical items: identify any items where failure is a regulatory breach, a serious customer detriment, or a significant legal risk. These should be 'auto-fail' items, where a single failure scores the entire call zero regardless of other scores. Critical items must be limited to genuinely critical behaviours, because if everything is critical, nothing is; (3) Weight sections by importance: resolution quality and customer outcome sections should carry the highest weight (typically 30-40% combined). Compliance items that are required but not differentiating should carry lower weight than customer experience items that drive satisfaction; (4) Set the pass mark: typically 80-85% for most contact centre operations. Scores below 70% should be treated as failing. The pass mark should be validated by comparing QA scores with customer satisfaction data: agents with high QA scores should also have higher CSAT scores; (5) Calibrate: assessors must regularly score the same call independently and compare results. Calibration sessions should target assessor agreement within ±5% on the overall score.

Q: What is calibration in contact centre quality management?

Calibration in contact centre quality management is the process of ensuring that different quality assessors apply the same standards to the same contacts, producing consistent scores. Without calibration, QA scores reflect assessor preference rather than agent performance — an agent assessed by a lenient assessor receives higher scores than an agent with identical performance assessed by a strict assessor. Calibration sessions involve multiple assessors (typically 4-8) independently scoring the same call, then comparing scores and discussing discrepancies. The target is ±5% on overall score between any two assessors. Calibration should occur monthly at minimum. The most common calibration failure is assessing agreement on the final score rather than on individual scorecard criteria — two assessors can reach the same total through different criteria scores, which means the criteria definitions are ambiguous and will produce divergent results across the broader assessor population.

A QA scorecard that measures what is easy to observe rather than what matters most to the customer will produce high scores from agents who tick boxes and low satisfaction from customers whose problem wasn't solved. Scorecard design determines agent behaviour more directly than coaching.

Scorecard structure: four to six sections

A quality scorecard with more than six sections typically measures process compliance rather than quality. Each section should reflect a genuine quality dimension — not a compliance checklist in disguise.

Section	What it measures	Typical weight	Example criteria
Regulatory and compliance	Mandatory items required by law, regulator, or contract. Failure here may have legal or regulatory consequences. These are typically auto-fail or heavily weighted.	15–25% (or auto-fail for critical items)	ID&V completed correctly; GDPR consent captured; FCA/FOS required language used; recording disclosure given; vulnerable customer protocol followed
Resolution quality	Whether the agent actually resolved the customer's issue correctly and completely on the first contact. The most operationally important section — directly linked to FCR and repeat contacts.	25–35%	Correct information provided; issue fully resolved or escalated appropriately; customer would not need to call again; next steps clearly communicated
Communication and listening	How the agent communicated: active listening, clarity, empathy, tone, language. These behaviours are the softest to assess objectively and require the clearest criteria definitions.	20–30%	Listened without interrupting; confirmed understanding; matched language to customer; appropriate empathy for contact type; professional tone throughout
Process adherence	Whether the agent followed the correct process steps — system navigation, hold protocols, transfer process, escalation criteria, documentation. Distinguished from compliance: these are internal process requirements.	15–20%	Correct system entries made; hold used appropriately (not for avoidance); wrap code accurate; CRM notes complete and accurate
Customer experience	The overall customer experience, typically assessed holistically at the end of the form rather than through binary criteria. Often the section where assessor calibration requires the most work.	10–15%	Call opened professionally; rapport appropriate for contact type; call closed clearly; customer appeared satisfied with the outcome
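The section weights in the table can be combined into an overall score along the following lines. This is a sketch under stated assumptions: the specific weights are hypothetical values picked from within the table's ranges so that they sum to 100%, and the section names and `overall_score` function are illustrative.

```python
# Hypothetical section weights (fractions of the total), chosen from within
# the ranges in the table so that they sum to 1.0.
SECTION_WEIGHTS = {
    "regulatory_compliance": 0.20,
    "resolution_quality": 0.30,
    "communication_listening": 0.25,
    "process_adherence": 0.15,
    "customer_experience": 0.10,
}


def overall_score(section_scores: dict[str, float]) -> float:
    """Return the weighted overall score as a percentage (0-100).

    section_scores maps each section name to the percentage (0-100)
    the agent achieved within that section.
    """
    if abs(sum(SECTION_WEIGHTS.values()) - 1.0) > 1e-9:
        raise ValueError("section weights must sum to 1.0")
    if set(section_scores) != set(SECTION_WEIGHTS):
        raise ValueError("scores must cover exactly the weighted sections")
    return sum(SECTION_WEIGHTS[name] * score for name, score in section_scores.items())


example = {
    "regulatory_compliance": 100.0,
    "resolution_quality": 80.0,
    "communication_listening": 90.0,
    "process_adherence": 100.0,
    "customer_experience": 70.0,
}
print(round(overall_score(example), 1))  # 88.5
```

In this example the agent loses most points on resolution quality (6 of the 11.5 points dropped), which is the intended effect of weighting that section most heavily.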

Critical items: auto-fail criteria

Critical items are scorecard criteria where a single failure results in the entire contact being scored zero — regardless of how well the agent performed on every other criterion. They exist for behaviours where partial success is not acceptable: a contact where ID&V was not completed correctly is a compliance failure whether or not the agent scored 90% on everything else.
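The auto-fail rule can be expressed as an override applied after the weighted score is calculated. The sketch below is illustrative only; the function name and the failure label are hypothetical.

```python
def final_score(weighted_score: float, critical_failures: list[str]) -> float:
    """Apply the auto-fail rule: any critical failure scores the contact zero."""
    return 0.0 if critical_failures else weighted_score


print(final_score(90.0, []))                          # 90.0
print(final_score(90.0, ["id_and_v_not_completed"]))  # 0.0
```

Keeping the list of failed critical items, rather than a single flag, lets the feedback to the agent state exactly which critical behaviour was missed.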

Legitimate auto-fail criteria

✕ ID&V not completed or incorrectly completed
✕ Misleading or factually incorrect information that could cause financial harm
✕ Regulatory disclosure not given (FCA, insurance, financial advice)
✕ GDPR breach — sharing personal data with an unauthorised party
✕ Vulnerable customer protocol not followed when triggered
✕ Fraud indicator not reported per procedure

Inappropriate auto-fail criteria

These should be weighted criteria, not auto-fail — failure is significant but not catastrophic.

⚠ Script not followed word-for-word (unless regulatory language)
⚠ Call not opened with the exact greeting phrase
⚠ ACW completed after the system-set target time
⚠ Offer not made to every customer regardless of contact reason
⚠ CSAT survey not offered at the end of every call

Design rule: If everything is auto-fail, nothing is. Contact centres with 20+ auto-fail criteria produce an assessor population that grades auto-fails inconsistently, agents who are too stressed to perform naturally, and scores that do not differentiate agent capability. Limit auto-fail criteria to 5–8 genuinely critical items.

Calibration: making assessors consistent

Calibration is the process by which the assessor population reaches consistent application of the same scorecard criteria. Without calibration, a QA score reflects assessor preference more than agent performance. An agent assessed consistently by a lenient assessor will score 5–10pp higher than an identically performing agent assessed by a strict assessor — making scores unusable for performance comparison.

Monthly calibration sessions

Four to eight assessors independently score the same call (typically 2–3 calls per session) without seeing each other's scores. Scores are then revealed and compared. Discrepancies of more than 5 percentage points on the total score (outside the ±5% target) are discussed until the assessors reach consensus on how the criteria definitions should be applied.
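The comparison step can be sketched as a pairwise check on the revealed scores. This is a minimal illustration: the assessor names and scores are made up, and the 5-point tolerance is read from the ±5% target as a gap between any two assessors.

```python
from itertools import combinations

TOLERANCE_PP = 5.0  # assumed reading of the ±5% target, in percentage points


def pairs_outside_tolerance(scores: dict[str, float]) -> list[tuple[str, str, float]]:
    """Return assessor pairs whose total scores differ by more than the tolerance."""
    flagged = []
    for (a, score_a), (b, score_b) in combinations(sorted(scores.items()), 2):
        gap = abs(score_a - score_b)
        if gap > TOLERANCE_PP:
            flagged.append((a, b, gap))
    return flagged


# Made-up total scores for one calibration call.
session = {"assessor_a": 82.0, "assessor_b": 85.0, "assessor_c": 74.0, "assessor_d": 80.0}
for a, b, gap in pairs_outside_tolerance(session):
    print(f"{a} vs {b}: {gap:.0f} points apart")
```

Here assessor_c appears in every flagged pair, which suggests the discussion should start with how that assessor interpreted the criteria.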

Calibrate by criterion, not just by total

Two assessors can reach the same total score by different criteria scores. If assessor A gives resolution quality 8/10 and empathy 6/10, and assessor B gives resolution quality 6/10 and empathy 8/10, the total matches but the criteria application is inconsistent. Calibration must review criterion-level scores, not just totals.
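The example above can be checked mechanically. The sketch below uses the same two assessors and scores; the function name is hypothetical and it assumes both assessors scored the same set of criteria.

```python
def criterion_gaps(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Return the absolute score gap per criterion between two assessors."""
    return {name: abs(a[name] - b[name]) for name in a}


assessor_a = {"resolution_quality": 8, "empathy": 6}
assessor_b = {"resolution_quality": 6, "empathy": 8}

totals_match = sum(assessor_a.values()) == sum(assessor_b.values())
gaps = criterion_gaps(assessor_a, assessor_b)

print(totals_match)  # True
print(gaps)          # {'resolution_quality': 2, 'empathy': 2}
```

A total-only comparison would pass this pair as fully calibrated, while the criterion view shows a 2-point disagreement on both criteria.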

Calibrate with the management population

Team leaders and operations managers who give feedback based on QA scores must calibrate alongside assessors. If a TL disagrees with the score, they must be calibrated — not allowed to override the assessor informally.

Track inter-rater reliability over time

Calculate the average absolute score difference between assessors across a calibration period. Target: an average absolute difference of no more than 5 percentage points (the ±5% target). An operation running at an average difference of 12 percentage points has an assessor consistency problem that makes scores unreliable.
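One way to compute this figure is to average the absolute gap over every pair of assessors on every calibration call in the period. This is an assumption about the calculation, since other definitions (for example, distance from a consensus score) are also possible; the scores are made up.

```python
from itertools import combinations
from statistics import mean


def mean_absolute_difference(calls: list[dict[str, float]]) -> float:
    """Average absolute pairwise gap between assessor total scores across calls."""
    gaps = [
        abs(x - y)
        for scores in calls
        for x, y in combinations(scores.values(), 2)
    ]
    return mean(gaps)


# Made-up total scores from two calibration calls.
period = [
    {"a": 80.0, "b": 84.0, "c": 78.0},
    {"a": 90.0, "b": 70.0, "c": 85.0},
]
print(f"{mean_absolute_difference(period):.1f} points")  # 8.7 points
```

Tracking this number per period shows whether calibration sessions are actually narrowing the gap between assessors.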

Five common scorecard design errors

Measuring compliance instead of quality

Consequence

Agents learn to tick the compliance boxes while delivering a poor customer experience. QA scores rise while CSAT falls. The scorecard has measured the wrong thing.

Fix

Weight resolution quality and customer outcome sections most heavily. Include at least one holistic 'would this customer call again?' criterion that forces the assessor to make a judgment about the actual outcome.

Treating every criterion as equally important

Consequence

A minor communication issue (used 'can' instead of 'may') carries the same weight as a resolution failure (gave incorrect information). The score does not reflect what actually matters.

Fix

Weight criteria explicitly and differently. Resolution quality items should score out of 25; tone items might score out of 5. Agents will optimise for high-weight items.

Criteria that are ambiguous at the margin

Consequence

Assessors disagree on borderline cases — 'did the agent show empathy?' has a dozen defensible interpretations. Inter-rater reliability deteriorates. Scores are contested by agents and managers.

Fix

Define criteria with explicit observable behaviours: 'empathy demonstrated by acknowledging emotional content of the customer's situation using language that validates their experience (not just "I understand")'. Include worked examples of pass/fail for marginal cases.

Linking the QA score directly to a bonus without moderation

Consequence

Agents challenge every low score because their bonus depends on it. Assessors experience pressure to inflate scores. The QA function becomes a source of conflict rather than a development tool.

Fix

Use QA scores as one input to performance management, not as the direct determinant of financial outcomes. Require at least 5 scored contacts per month before using the average as a performance measure. Include a moderation step for challenged scores.
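The minimum-sample rule can be sketched as a guard around the monthly average. The function name is hypothetical, and returning `None` for an insufficient sample is one possible design choice among several.

```python
from statistics import mean
from typing import Optional

MIN_SCORED_CONTACTS = 5  # minimum from the text before the average is used


def monthly_qa_average(scores: list[float]) -> Optional[float]:
    """Return the monthly average, or None if too few contacts were scored."""
    if len(scores) < MIN_SCORED_CONTACTS:
        return None
    return mean(scores)


print(monthly_qa_average([80.0, 90.0, 85.0]))              # None
print(monthly_qa_average([80.0, 90.0, 85.0, 75.0, 95.0]))  # 85.0
```

Returning `None` instead of a number makes it harder for a small, unrepresentative sample to flow into a performance report by accident.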

The scorecard is not updated when products, processes, or regulations change

Consequence

Assessors cannot reliably assess criteria that reference old processes. Agents are scored against outdated standards. The QA score becomes a measure of how well agents follow superseded procedures.

Fix

The QA scorecard must be in scope for the change management process. When a process change is approved, the scorecard update should be part of the implementation plan, not an afterthought.

Quality scorecard questions

How do you design a contact centre quality scorecard?

Five steps: (1) Define 4–6 sections — regulatory/compliance, resolution quality, communication, process adherence, customer experience; (2) Classify critical (auto-fail) items — limit to 5–8 genuinely critical behaviours (regulatory breach, serious customer detriment, legal risk); (3) Weight sections by importance — resolution quality and customer outcome 30–40% combined, compliance 15–25%, communication 20–30%; (4) Set the pass mark — typically 80–85%; validate against CSAT data (high QA should correlate with high CSAT); (5) Calibrate — monthly sessions, ±5% inter-rater reliability target, calibrate by criterion not just by total score.

What is calibration in contact centre quality management?

Calibration is the process of ensuring assessors apply scorecard criteria consistently, producing reliable scores. In monthly sessions, 4–8 assessors independently score the same 2–3 calls, then compare and discuss discrepancies over ±5%. Calibration must be done by criterion (not just total score) and must include team leaders and managers who act on scores. An operation with ±12% average inter-rater difference has a consistency problem that makes scores unreliable for performance management.