Forecast data cleansing
A forecasting model cannot tell real demand from a one-off anomaly — it learns from whatever you feed it. An outage day, a one-off media spike, or a data gap, left in the history, teaches the model a pattern that is not real. Cleansing the history is the unglamorous step that decides whether the forecast can be trusted.
Garbage in, confidently-wrong out
The danger of dirty data is not that the model breaks. It is that the model produces a forecast that looks perfectly reasonable but is built on patterns that are not real. A sophisticated forecasting method trained on uncleansed history will confidently reproduce an outage day as a recurring quiet period, or a one-off spike as a seasonal peak. The error is invisible in the output and only shows up as an unexplained miss when reality doesn't match. Cleansing is upstream insurance: the most valuable forecasting work often happens before the model runs, in deciding which history is genuine signal and which is noise to remove.
Five anomaly types and how to treat them
System outage / downtime
What it looks like
An interval or day where volume drops to near zero because the phone system, website, or a dependent service was down, often followed by a spike once service recovers and the customers who couldn't get through re-contact.
How to treat it
Replace the suppressed period with a representative value from a comparable normal day/week. Also treat the post-recovery spike (pent-up re-contacts) as anomalous — it is not a recurring pattern. Flag the date so it is excluded from future baselines.
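The replacement step can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes daily volumes held in a dict keyed by date, and it assumes "comparable" means the same weekday in the surrounding weeks. The function name `replace_with_comparable`, the two-week window, and all figures are hypothetical.

```python
from datetime import date, timedelta
from statistics import median


def replace_with_comparable(volumes, anomalous, weeks_each_side=2):
    """Return a copy of `volumes` in which each anomalous day is replaced by the
    median of the same weekday in nearby weeks that are not themselves flagged."""
    cleaned = dict(volumes)
    for day in anomalous:
        candidates = []
        for k in range(1, weeks_each_side + 1):
            for neighbour in (day - timedelta(weeks=k), day + timedelta(weeks=k)):
                if neighbour in volumes and neighbour not in anomalous:
                    candidates.append(volumes[neighbour])
        if candidates:  # if nothing comparable exists, leave the value as it was
            cleaned[day] = median(candidates)
    return cleaned


# Hypothetical history: five Tuesdays and five Wednesdays of daily volume.
tuesdays = [date(2024, 3, 5) + timedelta(weeks=i) for i in range(5)]
wednesdays = [date(2024, 3, 6) + timedelta(weeks=i) for i in range(5)]
history = dict(zip(tuesdays, [1000, 1040, 35, 980, 1020]))   # 35 = outage day
history.update(zip(wednesdays, [950, 970, 1900, 960, 940]))  # 1900 = re-contact spike

# Both the suppressed day and the post-recovery spike are flagged.
flagged = {date(2024, 3, 19), date(2024, 3, 20)}
cleaned = replace_with_comparable(history, flagged)
```

Keeping the `flagged` set after cleansing is one way to make sure the same dates are skipped when later baselines are built.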
One-off event spike
What it looks like
A sharp, isolated volume spike from a non-recurring cause — a viral news story, a product recall, a billing error affecting many customers, a marketing send that won't repeat.
How to treat it
Cap or replace the spike with a normal value if the event will not recur. If similar events recur predictably (e.g. an annual sale), keep them but model them as a separate event multiplier rather than as part of the baseline trend.
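The two treatments differ in what they keep. A rough sketch, assuming a baseline value for each affected day is already available from comparable normal days; the function names and figures are hypothetical, and averaging the ratios is only one simple way to estimate a multiplier.

```python
from statistics import mean


def cap_to_baseline(actual, baseline):
    """One-off event: discard the spike by capping the day at its baseline."""
    return min(actual, baseline)


def event_multiplier(event_actuals, event_baselines):
    """Recurring event: average ratio of actual to baseline volume
    across past occurrences of the same event."""
    ratios = [a / b for a, b in zip(event_actuals, event_baselines)]
    return mean(ratios)


# Hypothetical product-recall day that will not repeat: cap it.
recall_day = cap_to_baseline(actual=2600, baseline=1000)

# Hypothetical annual sale seen twice before: keep it as a multiplier.
sale_multiplier = event_multiplier([1500, 1650], [1000, 1100])

# Apply the multiplier to next year's baseline forecast for the sale day.
forecast_sale_day = 1200 * sale_multiplier
```

The baseline series stays free of the event in both cases; the recurring event is simply reapplied on top of it at forecast time.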
Data gap / missing intervals
What it looks like
Intervals or days with no data at all — a feed failure, a logging gap, a system migration window. Easily mistaken for zero demand by a model that reads absence as zero.
How to treat it
Do not let missing data be read as zero — that teaches the model a false low. Either exclude the gap from training, or interpolate using the same interval from comparable normal periods. Document which periods were interpolated.
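A minimal sketch of the distinction between "missing" and "zero", assuming interval volumes are stored as a list with `None` wherever the feed produced nothing. The function name `fill_gaps` and the sample numbers are illustrative only.

```python
from statistics import mean


def fill_gaps(day_intervals, comparable_days):
    """Fill missing intervals (None) with the mean of the same interval on
    comparable normal days. Returns the filled list plus the indices that
    were interpolated, so they can be documented."""
    filled, interpolated = [], []
    for i, value in enumerate(day_intervals):
        if value is None:
            refs = [d[i] for d in comparable_days if d[i] is not None]
            if refs:
                value = mean(refs)
                interpolated.append(i)
            # with no reference data the interval stays None (exclude it from training)
        filled.append(value)
    return filled, interpolated


# Hypothetical day with a feed failure in the second and third intervals.
gap_day = [40, None, None, 55]
normal_days = [[42, 60, 58, 50], [38, 64, 62, 54]]

filled, interpolated = fill_gaps(gap_day, normal_days)
```

Because the check is `value is None` rather than a truthiness test, a genuine zero-volume interval is preserved as zero and only true gaps are filled.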
Double-counting / data duplication
What it looks like
Volume that is inflated because the same contacts were counted twice — a reporting glitch, overlapping queues being summed, or a system migration where both old and new systems logged the same period.
How to treat it
Identify and de-duplicate at source. Double-counting is insidious because the inflated figures look plausible — they just quietly raise the baseline, leading to systematic over-staffing. Reconcile total volume against an independent source (e.g. billing or CRM contact counts) to catch it.
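As a sketch of both steps: the code below assumes each contact carries an ID that is shared across the systems that logged it, which will not be true of every platform. The 5% tolerance is an arbitrary illustrative threshold, and the function names are hypothetical.

```python
def deduplicate(contacts):
    """Keep the first record seen for each contact ID.
    `contacts` is a list of (contact_id, source_system) tuples."""
    seen = set()
    unique = []
    for contact_id, system in contacts:
        if contact_id not in seen:
            seen.add(contact_id)
            unique.append((contact_id, system))
    return unique


def reconciles(reported_total, independent_total, tolerance=0.05):
    """True if the reported total is within `tolerance` (as a fraction)
    of the total from an independent source such as CRM contact counts."""
    return abs(reported_total - independent_total) <= tolerance * independent_total


# Hypothetical migration window where old and new systems both logged contacts.
records = [("c1", "old"), ("c2", "old"), ("c2", "new"), ("c3", "new"), ("c1", "new")]
unique_records = deduplicate(records)

crm_count = 3  # hypothetical independent count for the same period
before = reconciles(len(records), crm_count)
after = reconciles(len(unique_records), crm_count)
```

Running the reconciliation before de-duplicating is what surfaces the problem in the first place: the raw total fails the check, the de-duplicated total passes.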
Known calendar effects mistaken for trend
What it looks like
Bank holidays, seasonal closures, or a leap-year/payday effect appearing as if they were part of the underlying trend, because they fall in the training window.
How to treat it
Don't cleanse these away — they are real and recurring — but model them explicitly (as holiday flags or event multipliers) rather than letting them blur into the trend. Cleansing is for non-recurring noise; recurring calendar effects should be captured, not deleted.
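One simple way to keep a calendar effect out of the trend without deleting it is to estimate the baseline from normal days only and express the holiday as a multiplier on that baseline. The sketch below assumes a single weekday's history and a known set of holiday dates; the dates, volumes, and function name are hypothetical, and a production model would more likely use holiday flags as regressors.

```python
from datetime import date
from statistics import mean


def split_calendar_effect(volumes, holidays):
    """Return (baseline, holiday_multiplier): the baseline uses only
    non-holiday days, and the multiplier scales it for holiday days."""
    normal = [v for d, v in volumes.items() if d not in holidays]
    holiday = [v for d, v in volumes.items() if d in holidays]
    baseline = mean(normal)
    multiplier = mean(holiday) / baseline if holiday else None
    return baseline, multiplier


# Hypothetical Monday volumes, two of which fall on bank holidays.
mondays = {
    date(2024, 5, 6): 400,
    date(2024, 5, 13): 1000,
    date(2024, 5, 20): 1000,
    date(2024, 5, 27): 400,
    date(2024, 6, 3): 1000,
}
bank_holidays = {date(2024, 5, 6), date(2024, 5, 27)}

baseline, holiday_multiplier = split_calendar_effect(mondays, bank_holidays)
naive_average = mean(mondays.values())  # what the trend sees if holidays blur in
```

Here the naive average across all five Mondays understates a normal Monday, while the separated baseline and multiplier describe both kinds of day correctly.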
Cleansing principles
Remove non-recurring noise; keep recurring signal
The test for whether to cleanse something: will it happen again on a predictable basis? A one-off outage or viral spike won't — cleanse it. A bank holiday or annual sale will — keep it, but model it explicitly as an event rather than letting it blur into the trend.
Replace, don't just delete
Deleting an anomalous interval leaves a gap the model may read as zero. Better to replace the anomaly with a representative value — the same interval from a comparable normal week — so the series stays continuous and the model trains on a plausible value.
Document every adjustment
Keep a log of what was cleansed, when, and why. An undocumented adjustment is indistinguishable from a data error later, and makes the forecast impossible to audit. The log is also how you build institutional knowledge of recurring anomaly sources.
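A log entry needs only a handful of fields to be auditable. The record layout below is an assumption about what is worth capturing, not a standard; the field names and the sample entry are hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date


@dataclass(frozen=True)
class Adjustment:
    period: date              # the day (or interval start) that was changed
    anomaly_type: str         # e.g. "outage", "one-off spike", "data gap"
    original_value: float
    replacement_value: float
    reason: str               # what happened, in plain words
    adjusted_on: date         # when the cleansing was done


def log_to_csv(adjustments):
    """Serialise the log to CSV text so it can be stored alongside the forecast."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Adjustment)])
    writer.writeheader()
    for adjustment in adjustments:
        writer.writerow(asdict(adjustment))
    return buffer.getvalue()


adjustment_log = [
    Adjustment(date(2024, 3, 19), "outage", 35, 1010,
               "Telephony platform down most of the day", date(2024, 4, 2)),
]
csv_text = log_to_csv(adjustment_log)
```

Storing the original value next to the replacement means any adjustment can be reversed later if the reason turns out to be wrong.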
Cleanse, but don't over-smooth
Real demand is genuinely variable — not every spike is an anomaly. Over-aggressive cleansing removes real signal and produces a forecast that is too smooth to capture genuine peaks. Cleanse identifiable, explainable anomalies; leave unexplained-but-plausible variation alone.
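One way to put this principle into practice is to let a statistical screen propose candidates, but cleanse only those with a documented cause. The sketch below uses a modified z-score based on the median and median absolute deviation, with the commonly used 3.5 cut-off. That choice of screen, the function names, and the data are assumptions for illustration.

```python
from statistics import median


def flag_candidates(values, threshold=3.5):
    """Indices whose modified z-score (median/MAD based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def to_cleanse(candidates, explained):
    """Keep only candidates that have a documented, non-recurring cause."""
    return [i for i in candidates if i in explained]


# Hypothetical daily volumes: index 4 is an outage, index 6 is a busy day
# with no known cause.
daily = [1000, 1040, 980, 1020, 35, 1010, 1400, 990]
explained = {4: "telephony outage"}

candidates = flag_candidates(daily)
cleanse = to_cleanse(candidates, explained)
```

Both unusual days are flagged for review, but only the outage is cleansed; the unexplained busy day stays in the history as plausible variation.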
Data cleansing questions
Why do you need to cleanse historical data before forecasting contact volume?
Because a forecasting model can't tell genuine demand from a one-off anomaly — it learns a pattern from whatever it is given. An untreated outage day teaches the model that period is naturally quiet (under-forecasting it next time); a one-off spike from a news story or billing error teaches the model to expect a recurring peak (over-forecasting); data gaps can be read as zero demand; double-counted intervals quietly inflate the baseline and cause systematic over-staffing. None are real repeatable patterns, but the model treats them as signal unless cleansed. Cleansing means identifying each anomaly and removing or replacing it (e.g. with the same interval from a comparable normal week) so the model trains only on genuine demand. It is the least glamorous forecasting step and the one that most determines whether the output can be trusted — a sophisticated model trained on dirty data produces confidently wrong forecasts.