Since the industrial revolution, equipment and manufacturing processes have grown dramatically in complexity. This led to a major evolution in maintenance engineering, which went from basic lubrication to the traditional approach of replacing worn-out parts before they fail. Ultimately, these practices evolved into Reliability Centered Maintenance (RCM).
RCM changed the paradigm and the concept of maintenance. Thorough studies revealed that the parts that actually wear out are only a small percentage of the total and, moreover, that their failures are usually randomly distributed over time. This means that you cannot predict when they are most likely to wear out in order to replace them before they fail. In fact, in some cases replacing them early only increases the probability of failure.
This article reviews some of the core concepts that are key to understanding the RCM methodology and to studying it in depth.
1. Function
Every time someone buys a piece of equipment it is because they want that equipment for a specific purpose. For example, manufacturers purchase a packaging machine to wrap up their products. In the same way, every component in the equipment has its specific purpose.
Equipment or components may have more than one function:
- The main one is called the primary function (e.g. wrap up the products).
- The others are secondary functions (e.g. keep the products free from contamination).
It is paramount to state the function clearly. Not only do you need to specify the purpose of the asset (e.g. again, to wrap the products), but you also need to quantify what you expect it to do (e.g. number of packages per minute). This is crucial because the quantified statement is later used to determine whether the asset is actually doing what you want it to do.
Although this may not sound complex, defining the function in the correct way is not an easy task. It requires experience and must be carried out carefully.
The function is always dependent on the operating context, especially at the time of defining and quantifying the expected performance.
Consider two identical pumps. One is installed in a nuclear reactor to pump cooling water to the core, and the other is used to transfer rainwater in a large facility. Clearly, the requirements and operating context are totally different in each case, and so is the maintenance approach.
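As a rough illustration, a function statement can be captured as a small record that pairs the purpose with a measurable performance standard and the operating context. The class name, its fields and the throughput figures below are hypothetical; they only show how the same machine can carry different requirements in different plants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionStatement:
    """Purpose of an asset plus the performance expected in a given context."""
    asset: str
    purpose: str
    metric: str
    required_minimum: float
    operating_context: str

    def describe(self) -> str:
        # Combine the purpose and the quantified standard into one sentence.
        return (
            f"{self.asset} ({self.operating_context}): {self.purpose} "
            f"at no less than {self.required_minimum:g} {self.metric}"
        )


# Hypothetical figures: the same machine model in two operating contexts.
small_plant = FunctionStatement(
    "Packaging machine", "to wrap products", "packages per minute", 40.0, "small plant"
)
large_plant = FunctionStatement(
    "Packaging machine", "to wrap products", "packages per minute", 120.0, "large plant"
)

print(small_plant.describe())
print(large_plant.describe())
```

Keeping the operating context inside the record makes it harder to reuse a performance standard in a plant where it does not apply.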
2. Functional Failure
When the asset doesn’t fulfill its function, it undergoes a functional failure. This functional failure is total if the asset does not perform its function at all, and partial if the asset is still working but not at the expected performance.
In the packaging machine example, if the machine is not working at all, this is a total functional failure. However, if it is working but cannot reach the required speed, this is a partial functional failure.
Notice that the operating context also plays a key role here, especially for partial functional failures. A wrapping machine may still deliver what a small plant needs, whereas the same output would not be enough for a bigger plant with a higher speed requirement.
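The distinction between total and partial functional failure can be sketched in a few lines. This is a simplified illustration that assumes the function is judged on a single throughput figure; the names and numbers are hypothetical.

```python
from enum import Enum


class FunctionalState(Enum):
    FUNCTIONING = "functioning"
    PARTIAL_FAILURE = "partial functional failure"
    TOTAL_FAILURE = "total functional failure"


def classify_state(actual_rate: float, required_rate: float) -> FunctionalState:
    """Compare delivered output with the standard set by the operating context."""
    if actual_rate <= 0:
        # The asset delivers nothing at all.
        return FunctionalState.TOTAL_FAILURE
    if actual_rate < required_rate:
        # The asset runs, but below the required performance.
        return FunctionalState.PARTIAL_FAILURE
    return FunctionalState.FUNCTIONING


# Hypothetical example: one machine running at 60 packages per minute.
print(classify_state(60, required_rate=40).value)   # small plant requirement
print(classify_state(60, required_rate=120).value)  # large plant requirement
print(classify_state(0, required_rate=40).value)    # machine stopped
```

The same measured output is acceptable against one requirement and a partial functional failure against another, which is why the required rate is passed in rather than fixed.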
3. Failure Mode
Assets can fail in many ways. Traditionally, when there were problems with an asset, people would simply say that it had failed. RCM, by contrast, clearly differentiates between the functional failure (the machine is not delivering what it should) and the failure modes, which are the events that actually cause the failure.
When failure modes are analyzed, these three situations or categories need to be considered:
1) When the asset’s performance drops below the desired value and the asset is no longer fulfilling its function. The most common reasons for this to happen are:
- deterioration or wear of parts
- lack of lubrication
- components coming apart (due to problems in rivets, bolts, welds, etc.)
- human errors that have an impact on the asset's capabilities.
2) When the operating context starts demanding more from the asset and the desired performance increases: In this case, either the asset is unable to provide the results, or in order to deliver them it starts wearing out due to the increased stress.
3) When the asset was not capable of delivering the required performance in the first place: In this case, there might be a deficient design or a failure in determining the requirements at the time of acquiring the asset.
All in all, you can expect between one and 30 failure modes for every functional failure, so you need to choose the level of detail of the analysis carefully.
You only need to tackle the probable failure modes. These include the ones in the failure history, the failure modes that are already managed by the current maintenance plans, and any failures that haven’t happened yet but are likely to happen in the future.
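The three sources of probable failure modes listed above can be expressed as a simple screening rule. The sketch below is only an illustration: the class, the flags and the example failure modes are hypothetical, and in practice the judgment of what is "likely" comes from the analysis team rather than from a boolean.

```python
from dataclasses import dataclass


@dataclass
class CandidateFailureMode:
    description: str
    in_failure_history: bool = False
    managed_by_current_plan: bool = False
    judged_likely_in_future: bool = False


def should_analyze(mode: CandidateFailureMode) -> bool:
    """Keep a failure mode if it meets any of the three criteria."""
    return (
        mode.in_failure_history
        or mode.managed_by_current_plan
        or mode.judged_likely_in_future
    )


# Hypothetical candidates for a packaging machine.
candidates = [
    CandidateFailureMode("Sealing jaw heater burns out", in_failure_history=True),
    CandidateFailureMode("Drive chain runs dry", managed_by_current_plan=True),
    CandidateFailureMode("Machine struck by meteorite"),
]

selected = [m.description for m in candidates if should_analyze(m)]
print(selected)
```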
4. Failure Effects
A failure mode is never an isolated event. Failure effects describe what happens to the asset and to its operating context when a failure mode occurs. They provide the information that you will later use to analyze the consequences of the failure.
You need to include any relevant information related to:
- Evidence that the failure has occurred. This includes any physical sign or clue that can be used to determine that the failure has occurred (a liquid stopped flowing, activated warning lights or alarms, etc.).
- Threats to the environment or to people’s safety, and the extent to which they may be affected.
- Impact on operations, such as delays, defective products, rework, etc.
- Damage resulting from the failure. This might be damage to the asset itself or to facilities, products or other equipment.
- Steps necessary to repair the asset and put it back into operation, including resources, spares and costs.
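The five items above work well as a checklist. One possible way to keep a failure effects description complete is a record with one field per item and a helper that reports what has not been filled in yet. The field names and the example text are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class FailureEffects:
    evidence: str = ""                # how you know the failure occurred
    safety_and_environment: str = ""  # threats to people or the environment
    operational_impact: str = ""      # delays, defective products, rework
    damage: str = ""                  # to the asset, facilities, products
    repair: str = ""                  # steps, resources, spares, costs

    def missing(self) -> List[str]:
        """Return the names of the checklist items that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical, partially completed description for a film jam.
effects = FailureEffects(
    evidence="Film stops feeding and the jam alarm sounds",
    operational_impact="Line stops until the jam is cleared",
    repair="Clear the jam and re-thread the film",
)
print(effects.missing())
```

When an item genuinely does not apply, it is safer to record that explicitly (for example "none") than to leave the field empty.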
5. Failure Consequences
Failure effects state the facts associated with a failure mode. Failure consequences, by contrast, come from a qualitative evaluation of how important that failure mode is for operations. This analysis reveals how much the failure matters.
The way to proceed depends on the type of function being analyzed, since functions can be evident or hidden.
In the case of evident functions, the asset’s functional failure can be detected by the operator and its consequences can have different degrees of importance.
Put people and environmental safety first. Then look at the operational issues with higher costs related to productivity and/or quality. Lastly, examine the non-operational consequences with the cost that is related only to the asset repair.
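The order of evaluation for evident failures described above can be sketched as a short decision function. The category names follow the text; the function name and its two yes/no inputs are a hypothetical simplification of what is, in practice, a qualitative judgment.

```python
from enum import IntEnum


class Consequence(IntEnum):
    # Lower value means it is examined first.
    SAFETY_OR_ENVIRONMENTAL = 1
    OPERATIONAL = 2
    NON_OPERATIONAL = 3


def classify_evident_failure(
    threatens_safety_or_environment: bool, affects_operations: bool
) -> Consequence:
    """Check safety and environment first, then operations, then repair only."""
    if threatens_safety_or_environment:
        return Consequence.SAFETY_OR_ENVIRONMENTAL
    if affects_operations:
        return Consequence.OPERATIONAL
    # Only the direct cost of repairing the asset is involved.
    return Consequence.NON_OPERATIONAL


print(classify_evident_failure(True, True).name)
print(classify_evident_failure(False, True).name)
print(classify_evident_failure(False, False).name)
```

A failure that threatens safety is classified as such even if it also affects production, which mirrors the "safety first" ordering.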
Conversely, hidden functional failures cannot be detected by the operators. This category is closely related to protective and safety devices. Due to the increase in complexity, assets usually include protective devices in order to minimize the consequences of the different failures.
However, their inclusion adds complexity to the RCM analysis and the maintenance strategy in general.
When a failed state of the protective device can be detected, you can consider it a fail-safe device, and the analysis becomes less critical. In the opposite case, you trust that the device will act when something goes wrong, but it might have been in a failed state for a long time, and you won’t notice until you need it.
This leads to a complex analysis that involves using the probability of failure of the protective and protected devices and other information to determine how you can make the system safer.
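To give a feel for the kind of calculation involved, the sketch below uses a common simplification that the article itself does not spell out. It assumes that the two failures are independent and that the protective device is checked at a fixed interval that is short compared with its mean time between failures, so that its average unavailability is roughly half the check interval divided by that mean time. All function names and figures are hypothetical.

```python
def average_unavailability(check_interval_years: float, device_mtbf_years: float) -> float:
    """Approximate fraction of time a hidden-failure device is in a failed state.

    Assumption: check interval much shorter than the device MTBF.
    """
    return check_interval_years / (2.0 * device_mtbf_years)


def multiple_failure_rate(
    protected_failures_per_year: float, device_unavailability: float
) -> float:
    """Rate at which the protected function fails while the device is failed.

    Assumption: the two failures are independent.
    """
    return protected_failures_per_year * device_unavailability


# Hypothetical figures: device checked yearly, MTBF of 10 years,
# protected function failing on average once every 4 years.
unavailability = average_unavailability(check_interval_years=1.0, device_mtbf_years=10.0)
rate = multiple_failure_rate(
    protected_failures_per_year=0.25, device_unavailability=unavailability
)

print(f"Protective device unavailable about {unavailability:.1%} of the time")
print(f"Expected multiple failures: about 1 in {1 / rate:.0f} years")
```

Under these assumptions, checking the device more often or choosing a more reliable device both lower the unavailability, and with it the chance that a failure of the protected function goes unprotected.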
The RCM perspective on maintenance strategy
The process of analyzing all these concepts in detail and using them to design the maintenance strategy is called Failure Mode and Effect Analysis (FMEA), a term that is frequently mentioned but not always fully understood.
As mentioned before, it is important to choose the right level of analysis. An FMEA can include between 3,000 and 10,000 failure modes with their corresponding effects and consequences.
This is an enormous amount of information and work, so it is important to select the right level of detail as well as the assets that are important enough to justify this effort.
In RCM, the focus is on avoiding the failure consequences, not the failure modes. That is why you may decide to leave a failure mode ‘unattended’ and repair the asset when it breaks down, while in other cases you may conclude that the only option is to redesign the system to prevent serious consequences.
The process is not easy and requires much more than simply knowing what the concepts mean. However, being aware of these concepts helps you understand the difference between the RCM approach and the classical approach. With this awareness, you can start viewing the maintenance strategy from a different perspective.