Eskom Is Chasing Its Own Tail As Routine Maintenance And Emergency Repair Compete For Attention

I learned of the passing of Marthinus Bezuidenhout on August 14, 2021, due to Covid-19-related complications. He was a corporate consultant in my team at Eskom, specialising in physical metallurgy, and was part of the A-Team that stopped load shedding in 2015. I write this article in his honour.
Critical components in a power station have an implicit end-of-life that dictates when they can no longer be regarded as fit for purpose. Eskom, like many utilities in the world, operates power stations beyond their design lives. This engineering fact raised important questions for Marthinus Bezuidenhout regarding energy availability and plant safety; questions that have been dramatically underscored by the recent significant events at Eskom.
I learned from Marthinus Bezuidenhout that the traditional bathtub curve is a reasonable, qualitative illustration of the failure modes that are prevalent in a power station. He introduced me to the work of Dennis J Wilkins, who had just published a two-part series on the traditional bathtub curve and product failure behaviour. The series was published in December 2002 by Reliability HotWire, the e-magazine for reliability professionals.
The life of a component in a power station consists of three operating periods: an infant mortality period with a decreasing failure rate, followed by a normal useful life with a low and relatively constant failure rate, and concluding with a wear-out period that exhibits an increasing failure rate.
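The three operating periods can be sketched numerically. One common way to draw a bathtub curve is to add three Weibull hazard terms: one with a shape parameter below 1 (decreasing failure rate), one with a shape parameter of exactly 1 (constant failure rate) and one with a shape parameter above 1 (increasing failure rate). The sketch below is illustrative only; every shape and scale value in it is an assumption chosen to produce the familiar curve, not Eskom data.

```python
# Illustrative bathtub curve built from three Weibull hazard terms.
# All shape and scale values are assumptions for illustration, not Eskom data.


def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Sum of infant-mortality, random and wear-out failure rates at age t (years)."""
    infant = weibull_hazard(t, shape=0.5, scale=200.0)   # decreasing with age
    random_ = weibull_hazard(t, shape=1.0, scale=50.0)   # constant with age
    wear_out = weibull_hazard(t, shape=5.0, scale=45.0)  # increasing with age
    return infant + random_ + wear_out


if __name__ == "__main__":
    for years in (0.5, 5, 10, 20, 30, 40, 45):
        print(f"{years:>5} years: {bathtub_hazard(years):.4f} failures per year")
```

With these assumed parameters the failure rate falls during the first few years, stays low through the middle of the life and climbs steeply as the component approaches 40 years.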
The normal life period is characterised by a low and relatively constant failure rate with failures that are random. It has a reliability baseline of 90:7:3, meaning 90% availability, 7% planned outages and 3% unplanned outages. This baseline was achieved at Eskom in 1998 under the late executive director for generation, Bruce Crookes. It equated to the performance of the best quartile of power generating units in the world.
In power plant operations, critical components are very reliable until a certain period has passed. Wear-out failure mechanisms are triggered only after a useful life of up to 40 years, during which these mechanisms will not cause failure. This useful life is followed by a period during which the failure rates from such mechanisms rise. The reliability baseline at a fleet level for the wear-out period is 80:10:10, meaning 80% availability, 10% planned outages and 10% unplanned outages. We achieved this baseline in 2017 and 2018. Performance has been going downhill since then, with 2020 being the worst year in history.
Considering the years of operating history of electric power generating stations, almost every wear-out failure mechanism that has occurred since 2018, or that will occur in the future, is likely to have been experienced before, either by the affected power station or elsewhere within Eskom or outside the utility. When wear-out failure events are evaluated for cause, it is usually discovered that information was generally available that, had it been known and used, could have helped to minimise the consequences of these events or possibly avoid them altogether.
It is common cause in reliability engineering that when the preventive maintenance plans are performed, many of the wear-out contributions are greatly attenuated. The only failures which still occur are those which “leak through” the preventive maintenance defences because of the failure mechanisms which are simply not addressed by the individual preventive maintenance actions and also because of the maintenance errors of commission which cause a failure that would not otherwise occur.
Preventive maintenance tasks exist for all the critical components in a power station. They are either time- or condition-driven tasks that are meant to restore critical components to their design basis. Eskom should have no difficulty in achieving its reliability objective of 80:10:10 for as long as it implements the existing preventive maintenance tasks diligently.
If Eskom is really trying to implement the existing preventive maintenance tasks that are meant to restore critical components to their design basis, then that effort is not good enough. Alternatively, chronic maintenance errors of commission are causing the current failures, which would not otherwise occur.
On August 19, 2021, Eskom announced that it may be forced to implement power cuts at short notice because unplanned maintenance due to plant breakdowns totalled 13 557MW while planned maintenance was 4 074MW. The available generating capacity was 31 093MW against a total electricity demand of 30 751MW, a margin of only 342MW. Adding the unplanned, planned and available figures, we can deduce that the installed generating capacity is 48 724MW.
The ratio of unplanned maintenance work to planned maintenance work is more than 3:1 (13 557MW against 4 074MW). Objectively, this can only mean that corrective maintenance comprises most of the maintenance work at the power stations. I estimate that preventive maintenance compliance runs between 30% and 50%, depending on the area of the plant. The preventive maintenance tasks are sacrificed to the emergent work caused by plant breakdowns, which are embarrassingly higher than the norm. The power stations need to approach 100% preventive maintenance compliance for critical equipment.
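The arithmetic behind the figures in the two preceding paragraphs can be checked with a short script. It uses only the four numbers quoted from Eskom's announcement; the variable names and the `share` helper are my own. The percentage split it prints is a single-day capacity snapshot, so comparing it with the 80:10:10 baseline is indicative only, not a like-for-like measure of annual fleet performance.

```python
# Figures quoted from Eskom's announcement of August 19, 2021, in MW.
unplanned_mw = 13_557   # capacity lost to plant breakdowns
planned_mw = 4_074      # capacity out on planned maintenance
available_mw = 31_093   # capacity available to meet demand
demand_mw = 30_751      # total electricity demand

# Installed capacity is deduced as the sum of the three capacity figures.
installed_mw = unplanned_mw + planned_mw + available_mw
margin_mw = available_mw - demand_mw
unplanned_to_planned = unplanned_mw / planned_mw


def share(part_mw):
    """Percentage of the deduced installed capacity (hypothetical helper)."""
    return 100.0 * part_mw / installed_mw


print(f"Installed capacity:         {installed_mw} MW")
print(f"Margin over demand:         {margin_mw} MW")
print(f"Unplanned to planned ratio: {unplanned_to_planned:.2f}:1")
print(
    "Available : planned : unplanned = "
    f"{share(available_mw):.1f} : {share(planned_mw):.1f} : {share(unplanned_mw):.1f}"
)
```

On that day roughly 64% of the deduced installed capacity was available, about 8% was out on planned maintenance and about 28% was out on breakdowns.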
Another important effect is the occurrence of maintenance errors. Only planned maintenance work is scheduled, and once a schedule is set it should not be modified. In other words, if it gets on the schedule, it must be done. However, breakdown maintenance is not scheduled work but emergent work. Preventive maintenance tasks are often sacrificed due to emergent work. When this happens, it generally harms plant reliability by introducing unnecessary maintenance errors. Maintenance errors are a significant contributor to plant breakdowns.
Marthinus Bezuidenhout was extremely pained by the fact that significant failures “leak through” the preventive maintenance defences. To him, the age of the plant was being used as an excuse. The preventive maintenance tasks to deal with wear-out failure mechanisms were in place. The new Eskom culture is the biggest hindrance.
Marthinus Bezuidenhout would always say this to me: “Matshela, you just don’t get it. Things are not the same anymore. A lot has changed.”