
Maintenance Backlog Case: How 250+ Projects Moved Overdue Work Into Risk Decisions

A case-study article for EHS, operations, and maintenance leaders who need to turn overdue work orders into risk decisions, not only backlog counts.

By Andreza Araújo | 7 min read

Key takeaways

  1. Maintenance backlog becomes a safety indicator only when overdue work is connected to exposure and control strength.
  2. The strongest triage separates nuisance backlog from backlog that weakens critical controls.
  3. Aging alone is not enough, because a short delay can matter more than a long delay when the affected control protects high-consequence work.
  4. Leaders need a shared decision rule between maintenance, operations, and EHS; otherwise, backlog ownership stays fragmented.
  5. The practical output is a backlog review that changes work release, funding, shutdown scope, or control verification frequency.

Maintenance backlog is the accumulated work that has been identified, planned, or requested but not yet completed. In safety governance, the critical question is not how many work orders are overdue. The critical question is which overdue items weaken controls that protect people from serious exposure.

Many plants treat maintenance backlog as an efficiency problem. The dashboard shows open work orders, aging buckets, planner load, and completion rate. Those numbers matter, but they can hide the safety question that should disturb leaders most: which overdue items are quietly weakening lockout reliability, machine guarding, ventilation, emergency response, lifting equipment, fixed access, pressure relief, or traffic separation?

The thesis of this case is direct. A backlog does not become a safety indicator because EHS adds a red color to a maintenance chart. It becomes a safety indicator when leaders can see the control that is degrading, the exposure that is becoming more credible, the owner who can release resources, and the field evidence that proves whether work can continue. Across Andreza Araújo's 25+ years in executive EHS roles and the more than 250 cultural transformation projects she has supported, this shift has often been what separates organizations that manage paperwork from organizations that manage risk.


Initial scenario: the backlog looked operational, not dangerous

The initial pattern appears in many industrial sites. Maintenance owns the backlog report, operations owns the production pressure, and EHS sees the consequences only when a failure has already entered an incident report. The dashboard may be technically correct while the risk meaning remains invisible.

One common example is a site with hundreds of overdue work orders split by area, craft, and age. The oldest items receive attention because they make the chart look weak. Yet the highest-risk items are not always the oldest. A missing guard fastener on a frequently accessed machine, a damaged dock light, a bypassed interlock awaiting repair, or a deteriorated ventilation component may carry more safety significance than a cosmetic repair that has been open for months.

The Illusion of Compliance, the English rendering of Andreza Araújo's A Ilusão da Conformidade, warns that formal evidence can comfort leaders while conditions on the worksite have already changed. Maintenance backlog creates the same trap. A count proves that the system records defects, but it does not prove that leaders understand the risk accumulating behind delayed repairs.

Decision: classify backlog by control impact

The first decision in the transformation was to stop asking only how old the work order was. Age stayed visible, but it became the second question. The first question became whether the overdue item affected a control that prevented serious injury, fatal exposure, environmental release, emergency-response failure, or uncontrolled restart.

This changed the conversation because the backlog was no longer a maintenance queue alone. A delayed repair to a fixed ladder, a traffic mirror, a gas detector docking station, a machine guard, a hoist limit switch, or a confined-space ventilation point had to be read as a control question. The maintenance planner could explain feasibility, operations could explain work pressure, and EHS could explain exposure significance.

The decision rule had four labels:

  • Critical-control affected: the overdue work weakened a control relied on for high-consequence exposure.
  • Exposure-increasing: the overdue work did not defeat a named critical control but made people work closer to energy, traffic, height, chemical, or ergonomic stress.
  • Reliability-only: operational continuity was affected without a current safety control concern.
  • Cosmetic or improvement: the item should not consume safety escalation attention.
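Because the four labels are checked in order of severity, they behave like an ordered decision rule. The sketch below is an illustration under stated assumptions, not a standard or a system the case describes: `OverdueItem`, its fields, and `classify` are hypothetical names, and it assumes the three yes/no judgments have already been made jointly by maintenance, operations, and EHS. Age is deliberately absent, because the case treats age as the second question.

```python
from dataclasses import dataclass
from enum import Enum


class BacklogLabel(Enum):
    """The four labels from the case, in descending order of safety significance."""
    CRITICAL_CONTROL_AFFECTED = "critical-control affected"
    EXPOSURE_INCREASING = "exposure-increasing"
    RELIABILITY_ONLY = "reliability-only"
    COSMETIC_OR_IMPROVEMENT = "cosmetic or improvement"


@dataclass
class OverdueItem:
    # Hypothetical fields; each flag is a judgment made jointly by
    # maintenance, operations, and EHS, not a value read from a system.
    work_order: str
    weakens_critical_control: bool  # control relied on for high-consequence exposure
    increases_exposure: bool        # closer to energy, traffic, height, chemicals, ergonomic stress
    affects_reliability: bool       # operational continuity only


def classify(item: OverdueItem) -> BacklogLabel:
    """Return the first label that applies, checking the most severe first."""
    if item.weakens_critical_control:
        return BacklogLabel.CRITICAL_CONTROL_AFFECTED
    if item.increases_exposure:
        return BacklogLabel.EXPOSURE_INCREASING
    if item.affects_reliability:
        return BacklogLabel.RELIABILITY_ONLY
    return BacklogLabel.COSMETIC_OR_IMPROVEMENT


if __name__ == "__main__":
    guard = OverdueItem("WO-1001", weakens_critical_control=True,
                        increases_exposure=True, affects_reliability=False)
    print(classify(guard).value)  # critical-control affected
```

Checking the most severe label first means an item that both weakens a critical control and increases exposure is never downgraded to the milder label.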

Execution: connect work orders to field evidence

The execution step was deliberately practical. Leaders did not rewrite the entire maintenance system first. They selected the highest-risk areas, usually maintenance shutdown tasks, energy isolation points, vehicle-pedestrian interfaces, machine guarding, chemical handling, emergency equipment, and fixed access. Then they sampled overdue work orders against field conditions.

The sample had to answer three questions. What control is affected? What exposure becomes more credible if the work stays open? What evidence proves the current condition? A work order that said "repair guard" was too thin. The useful version named the machine, the guard function, the access frequency, the temporary restriction, the last field verification, and the person who could decide whether production could continue.
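The difference between a thin work order and a useful one can be expressed as a record that must carry each element named above. In this sketch, `WorkOrderEvidence` and `missing_elements` are hypothetical names, the sample values are illustrative, and treating every empty field as a gap is an assumption. A site might decide, for example, that some items genuinely need no temporary restriction.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class WorkOrderEvidence:
    # Hypothetical record; one field per element a useful work order should name.
    equipment: str                           # the machine or asset
    control_function: str                    # what the control (e.g. the guard) prevents
    access_frequency: str                    # how often people are exposed
    temporary_restriction: Optional[str]     # interim control while the work stays open
    last_field_verification: Optional[date]  # when the condition was last seen in the field
    decision_owner: Optional[str]            # who decides whether production continues


def missing_elements(record: WorkOrderEvidence) -> list[str]:
    """Names of empty elements: the reasons a work order is 'too thin'."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


if __name__ == "__main__":
    thin = WorkOrderEvidence("Press 4", "", "", None, None, None)
    print(missing_elements(thin))
```

A non-empty result is a prompt for a field visit rather than a verdict on the risk itself.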

The Headline guide on running an FMEA for high-risk maintenance is useful here because it pushes the team to name failure modes rather than accept a generic defect label. The same discipline belongs in backlog triage. Leaders should know whether the delayed work could create access to moving parts, stored energy release, loss of ventilation, uncontrolled traffic conflict, or emergency delay.

Measured result: the dashboard started changing decisions

The measured result was not a prettier backlog chart. The real signal was that the dashboard began changing decisions. Some overdue work orders moved into shutdown scope. Some triggered temporary operating limits. Some required field verification before the next shift. Some lost safety escalation because the evidence showed no current exposure increase.

That distinction matters. A safety indicator has value only when it changes the next decision. If the backlog report keeps the same colors every week and no one changes work release, budget, supervision, temporary controls, or verification frequency, the metric is reporting anxiety rather than managing risk.

This is why backlog age, exposure severity, and control evidence should be shown separately. A ninety-day cosmetic work order should not compete with a six-day defect affecting a critical isolation point. A thirty-day item with strong temporary control should not be treated the same as a thirty-day item whose temporary control has never been tested in the field.
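One way to keep age, exposure severity, and control evidence separate is to sort on them as distinct keys instead of blending them into a single score. The sketch assumes hypothetical field names and reuses the four labels of the decision rule as plain strings; the rank numbers only encode their order. The sample items mirror the comparisons in the paragraph above.

```python
from dataclasses import dataclass

# Order of the case's four labels; the numbers only encode that order.
IMPACT_RANK = {
    "critical-control affected": 0,
    "exposure-increasing": 1,
    "reliability-only": 2,
    "cosmetic or improvement": 3,
}


@dataclass
class BacklogItem:
    work_order: str                  # hypothetical identifier
    label: str                       # one of the IMPACT_RANK keys
    age_days: int
    temp_control_field_tested: bool  # has the temporary control been verified in the field?


def priority_key(item: BacklogItem) -> tuple[int, int, int]:
    """Sort key: control impact first, untested temporary controls next, age last."""
    return (
        IMPACT_RANK[item.label],
        int(item.temp_control_field_tested),  # False (untested) sorts before True
        -item.age_days,                       # among otherwise equal items, older first
    )


backlog = [
    BacklogItem("WO-A", "cosmetic or improvement", 90, False),
    BacklogItem("WO-B", "critical-control affected", 6, False),
    BacklogItem("WO-C", "exposure-increasing", 30, True),
    BacklogItem("WO-D", "exposure-increasing", 30, False),
]

for item in sorted(backlog, key=priority_key):
    print(item.work_order, item.label, item.age_days)
```

The review order comes out as WO-B, WO-D, WO-C, WO-A. Sorting on a tuple means a later key can never outweigh an earlier one: no amount of age lifts a cosmetic item above a critical-control item, and age only decides between items that are otherwise equal.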

Before and after: what changed in the backlog review

| Dimension | Before the shift | After the shift |
|---|---|---|
| Main question | How many work orders are overdue? | Which overdue items weaken controls that protect people? |
| Priority logic | Age, craft availability, production pressure | Control impact, exposure severity, evidence age, and owner authority |
| EHS role | Reviews incidents after failures become visible | Helps classify exposure before failures reach people |
| Operations role | Requests uptime and negotiates access windows | Owns work release decisions when controls are degraded |
| Useful output | Aged-work-order report | Decision list for shutdown scope, restrictions, funding, and verification |

Generalizable lesson 1: age is a weak proxy for risk

The first lesson is that age is a weak proxy for risk. It is easy to measure, but it often rewards the wrong attention. The oldest backlog item may be harmless, while a recent defect can weaken the exact control that prevents a severe event.

Use age as a pressure signal, not as the whole risk rule. A useful dashboard shows aging beside consequence potential, affected control, temporary-control status, and last verification date. The Headline article on safety dashboard latency makes the same point from another angle because late information can delay executive action even when the chart looks disciplined.

Generalizable lesson 2: backlog ownership must match decision rights

The second lesson is that ownership must match decision rights. If maintenance owns every overdue item, operations may keep releasing work under degraded conditions. If EHS owns every safety-tagged item, maintenance may treat the issue as an audit concern rather than a control restoration concern.

A stronger model names three owners for high-risk backlog. Maintenance owns technical restoration. Operations owns work release under the degraded condition. EHS owns exposure challenge and verification discipline. James Reason's work on latent failures is useful here because the defect rarely sits in one department. It sits in the spaces between planning, supervision, production pressure, maintenance resources, and leadership attention.

Generalizable lesson 3: temporary controls need expiry decisions

The third lesson is that temporary controls can become invisible. A spotter, barricade, temporary guard, manual check, alternate route, portable fan, or extra supervisor may be valid for a short period. It becomes weak when nobody names the condition that ends it.

The related Headline article on screening temporary field changes before work continues fits this case because backlog often creates temporary work conditions. Every temporary control attached to an overdue work order should have an expiry date, a verification method, and a stop-work threshold if the temporary control cannot be maintained.
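The three attributes a temporary control should carry can be checked mechanically. This is a sketch under assumptions: `TemporaryControl`, `review_status`, the status names, and the `max_days_between_checks` interval are hypothetical, and the sketch treats a control that has not been field-verified within that interval as having crossed its stop-work threshold. A site would define its own threshold and wording.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TemporaryControl:
    # Hypothetical record for a temporary control attached to an overdue work order.
    description: str               # e.g. spotter, barricade, portable fan
    expires_on: date               # date by which the control must be ended or renewed
    verification_method: str       # how the control is checked in the field
    last_verified: Optional[date]  # None if it has never been checked
    max_days_between_checks: int   # assumed interval; a site would set its own


def review_status(ctrl: TemporaryControl, today: date) -> str:
    """Return 'invalid', 'expired', 'stop-work', or 'active' (assumed categories)."""
    if not ctrl.verification_method:
        return "invalid"    # no defined way to prove the control is in place
    if today > ctrl.expires_on:
        return "expired"    # a restore, renew, or stop decision is due
    if (ctrl.last_verified is None
            or (today - ctrl.last_verified).days > ctrl.max_days_between_checks):
        return "stop-work"  # control cannot be shown to be maintained
    return "active"


if __name__ == "__main__":
    barricade = TemporaryControl("barricade at dock 3", date(2024, 6, 30),
                                 "shift walkdown", date(2024, 6, 10), 7)
    print(review_status(barricade, date(2024, 6, 12)))  # active
    print(review_status(barricade, date(2024, 6, 20)))  # stop-work
```

Expiry is checked before verification age so that a well-verified control still forces a decision once its agreed period ends.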

What to apply in your operation this month

Start with a small slice rather than the full backlog. Select twenty overdue work orders from high-risk areas, then classify each one by affected control, exposure potential, evidence age, temporary-control status, and decision owner. If the team cannot classify an item without visiting the field, that is useful information: it shows the system has been making decisions with weak evidence.
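The twenty-item sample can be drawn and screened with a few lines. In this sketch the field names, `draw_sample`, and `needs_field_visit` are hypothetical; `None` stands for an element the team could not answer without going to the field, which is the gap the paragraph above describes. Random selection within high-risk areas is an assumption chosen to avoid picking only the items the team already knows well.

```python
import random
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SampledItem:
    # Hypothetical fields; None means the team could not answer from the desk.
    work_order: str
    area: str
    affected_control: Optional[str] = None
    exposure_potential: Optional[str] = None
    evidence_age_days: Optional[int] = None
    temp_control_status: Optional[str] = None
    decision_owner: Optional[str] = None


def draw_sample(items: list[SampledItem], high_risk_areas: set[str],
                n: int = 20, seed: Optional[int] = None) -> list[SampledItem]:
    """Randomly pick up to n overdue items, drawn from the high-risk areas only."""
    pool = [item for item in items if item.area in high_risk_areas]
    return random.Random(seed).sample(pool, min(n, len(pool)))


def needs_field_visit(item: SampledItem) -> bool:
    """True if any classification element is unknown, a sign of weak evidence."""
    return any(getattr(item, f.name) is None for f in fields(item)
               if f.name not in ("work_order", "area"))
```

The share of sampled items for which `needs_field_visit` is true gives a rough first measure of how blind the backlog system is.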

Then build a weekly backlog-risk review that is shorter than the normal maintenance meeting. The agenda should ask which overdue items require work restriction, which need field verification, which should enter shutdown scope, which need funding escalation, and which can return to routine planning because field evidence shows no exposure increase.

Finally, connect the output to existing indicators. Backlog risk should not live as another spreadsheet. It should feed critical-control verification, action closure quality, shutdown readiness, and management review. The Headline comparison of action closure rate, recurrence rate, and verification pass rate helps leaders avoid treating closure as proof when the control has not been tested.

FAQ

Is maintenance backlog a safety indicator?

Maintenance backlog becomes a safety indicator when overdue work is connected to exposure, control strength, field evidence, and decision ownership. A simple count of overdue work orders is an operational indicator, but it is not enough for safety governance.

Which backlog items should EHS review first?

EHS should review overdue items that affect critical controls, high-energy work, machine guarding, vehicle-pedestrian separation, emergency equipment, ventilation, fixed access, lifting equipment, energy isolation, or hazardous chemical controls.

Why is backlog age not enough for safety prioritization?

Backlog age is not enough because risk depends on what control is affected and what exposure becomes credible. A recent defect on a critical control can matter more than an old cosmetic work order.

Who owns safety-significant maintenance backlog?

Maintenance owns technical restoration, operations owns work release under degraded conditions, and EHS owns exposure challenge and verification discipline. High-risk backlog usually needs all three roles because the risk sits across the operating system.

What is the fastest way to improve backlog triage?

Sample twenty overdue work orders from high-risk areas and classify each by affected control, exposure potential, temporary-control status, last field evidence, and decision owner. The gaps in that sample usually show where the backlog system is blind.



About the author

Andreza Araújo

Safety Culture Expert | Senior EHS Executive

Andreza Araújo is a safety culture expert and senior EHS executive with more than 25 years of experience in environment, health and safety. She is a Civil Engineer and Occupational Safety Engineer from Unicamp, holds a Master's degree in Environmental Diplomacy from the University of Geneva, and completed sustainability studies at IMD Switzerland. Andreza has served in Global Head of EHS roles in Fortune 500 environments, leading cultural transformation programs across multinational operations. She has represented Brazil as a speaker at the United Nations in Paris and has spoken at the International Labour Organization in Turin. She is the author of more than 16 books on safety culture in Portuguese, Spanish, English and German. Her work has earned more than 10 EHS awards, including two recognitions from Indra Nooyi, former PepsiCo CEO.

  • Civil & Safety Engineer (Unicamp)
  • M.A. Environmental Diplomacy (University of Geneva)
  • Sustainability Cert (IMD Switzerland)
  • People Management & Coaching (Ohio University)
  • UN Paris speaker representative for Brazil
  • ILO Turin speaker
  • LinkedIn Top Voice
  • Indra Nooyi PepsiCo CEO recognition (2x)

