Maintenance management: Don’t let small problems combine into a failure event
When investigating why a machine fails repeatedly and seemingly at random, it’s good to remember that more than one deviation (or failure cause) may be involved. Letting deviations combine is risky, and if acceptance of deviations becomes the new normal, so to speak, failure risks can easily escalate to the point of danger.
In the great majority of machines operating in modern process plants, three principles pertain and should be kept in mind:
- When deviations combine, serious failure events can approach rapidly.
- All failures have causes. Unless causes are found and addressed, more failures will occur.
- Most root causes of machine component failure can be uncovered and remedied by a properly led and trained workforce (Ref. 1).
To illustrate the point, let’s review a case history involving a small process machine. In this instance, several deviations proved troublesome in a 200 hp, vertically oriented, integrally geared high-speed low-flow (HSLF) single-stage centrifugal compressor. The machine had experienced at least five costly failures in the span of two years before an experienced reliability professional was asked to investigate. As could be expected, more than one deviation was found to be responsible for the problems encountered with this fluid machine.
To get started, the reliability professional – whom we will refer to as the troubleshooter/failure analyst (“TFA”) – examined parts, pieces, and data that the equipment owner-operator had collected. There was considerable evidence that this HSLF had suffered from multiple issues. Moreover, it was soon evident that each failure incident had been treated along the lines of “part failed, part must be at fault.” While defective fabrication may point to part replacement as a suitable action, a faulty design might require re-engineering the functionality of other components. In that case, a comprehensive reassessment of many interacting factors would be needed.
Early in the chain of separate failure events, the equipment owners had decided that vendor quality control (QC) was at fault. QC events were grouped together as the first of three problem categories. Next, examination of data collected by the plant’s process computer established that the small compressor had often been operated outside its specified design flow range. This prompted the TFA to explain the vulnerabilities of low-flow operation. Operator or automatic control-related deviations from the original design intent constituted the second category, and at least one subsequent event was placed in this group.
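The kind of screening that reveals operation outside the design flow range can be sketched in a few lines. This is only an illustration: the sample values, the time tags, and the 400 to 600 flow limits are hypothetical placeholders, not data from the case history, and the real limits would come from the compressor’s data sheet.

```python
from dataclasses import dataclass


@dataclass
class FlowSample:
    timestamp: str  # time tag from the process computer (format assumed)
    flow: float     # measured flow, in the same units as the limits


def out_of_range_samples(samples, min_flow, max_flow):
    """Return the samples whose flow lies outside [min_flow, max_flow]."""
    return [s for s in samples if not (min_flow <= s.flow <= max_flow)]


def fraction_out_of_range(samples, min_flow, max_flow):
    """Return the share of samples recorded outside the design flow range."""
    if not samples:
        return 0.0
    return len(out_of_range_samples(samples, min_flow, max_flow)) / len(samples)


# Hypothetical readings and limits, for illustration only.
samples = [
    FlowSample("08:00", 520.0),
    FlowSample("08:10", 310.0),
    FlowSample("08:20", 295.0),
    FlowSample("08:30", 505.0),
]
MIN_FLOW, MAX_FLOW = 400.0, 600.0

flagged = out_of_range_samples(samples, MIN_FLOW, MAX_FLOW)
share = fraction_out_of_range(samples, MIN_FLOW, MAX_FLOW)

for s in flagged:
    print(f"{s.timestamp}: flow {s.flow} is outside {MIN_FLOW} to {MAX_FLOW}")
print(f"Share of samples out of range: {share:.0%}")
```

A report of this kind does not explain why the machine was run at low flow, but it does show how often it happened and when, which is the information needed to correlate operation with later failure events.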
Low-flow operation (operation in surge) means that the gas flowrate is so low that the gas volume pulsates back and forth as it travels from the compressor’s inlet (at suction pressure) to its outlet (at discharge pressure). Pulsating flow would also explain failures of the thrust bearing, although thrust bearing distress could equally have resulted from lube oil foaming, an insufficient lube oil supply, or an oil temperature that was out of range. A thorough review of lube oil properties was among the failure investigator’s recommendations; such reviews could be considered maintenance-related.
Maintenance-related problems were categorized as the third set of issues by the TFA. Wherever the maintenance-related category was involved and wherever the equipment owner/user recognized this, mechanical team members had carried out repairs. Failure of a coil in the seal oil cooling circuit was among the maintenance issues, but no effort had been made to find the underlying failure cause. As is generally the case, unless the root cause is known and addressed, a repeat failure is likely. It stands to reason that equipment owners would be well-advised to establish the root causes of a failure and to implement long-term corrective action whenever possible.
Mechanical and assembly flaws. In this instance, the plant’s failure records also pointed to the possibility that, on at least one repair occasion, the compressor impeller had not been fully inserted; consequently, the impeller hub probably did not make full contact with the shaft shoulder. The resulting impeller-diffuser contact damaged the impeller and may also have been responsible for a slight bend in the final output shaft.
Giving due consideration to repair procedures is of interest here. Before final assembly, a diligent maintenance technician will make it a practice to install a dial indicator. The indicator readings must show that shaft runout is within acceptable limits. There should also be verification that the impeller nut has been retorqued to the prescribed value. Because the axial length of the impeller hub will shrink as the previously heated impeller cools, retorquing should be attempted only after the impeller has returned to near room temperature. At that time, another indicator reading taken at the vane tips should confirm that impeller runout is not excessive.
Whenever machines have been designed and fabricated by reputable manufacturers, important data can usually be found in the manufacturer’s operating and maintenance manual. Indeed, pertinent data was available in the manual in this instance, but there was an attitude of, “Who has time to read an entire manual?” Therefore, the TFA recommended that pertinent manufacturer’s instructions be condensed and issued in single-sheet checklist format. Locations and items to check, dimensional listings, and tolerances often fit on a single sheet of paper that can be laminated in plastic. Single-sheet checklists contribute greatly to high rebuild quality and low failure rates at best-in-class companies.
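A single-sheet checklist of this kind is essentially a short table of locations, items, and limits, and it can be mirrored in a small data structure. The sketch below is a hypothetical illustration: the item names follow the checks described above, but the limits and readings are placeholders and must be replaced with the values in the manufacturer’s manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckItem:
    location: str       # where the reading is taken
    description: str    # what is being checked
    max_allowed: float  # upper limit (placeholder, not a manufacturer value)
    units: str


def evaluate(checklist, readings):
    """Compare readings with limits and return (item, reading, status) rows."""
    rows = []
    for item in checklist:
        reading = readings.get(item.location)
        if reading is None:
            status = "NOT MEASURED"
        elif reading <= item.max_allowed:
            status = "OK"
        else:
            status = "OUT OF TOLERANCE"
        rows.append((item, reading, status))
    return rows


# Placeholder limits for illustration only; use the manual's figures.
CHECKLIST = [
    CheckItem("shaft", "Shaft runout, dial indicator", 0.001, "in"),
    CheckItem("vane tips", "Impeller runout after retorque", 0.002, "in"),
    CheckItem("impeller hub", "Gap between hub and shaft shoulder", 0.0, "in"),
]

# Hypothetical readings from one rebuild; the hub gap was not recorded.
readings = {"shaft": 0.0005, "vane tips": 0.003}

results = evaluate(CHECKLIST, readings)
for item, reading, status in results:
    print(f"{item.description}: {reading} {item.units} -> {status}")
```

Treating a missing reading as its own status, rather than as a pass, reflects the point of the checklist: a step that was skipped should be as visible as a step that failed.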
Mechanical seal updating. This owner-user company’s maintenance technicians had shop-tested certain mechanical seals and found them to be marginal at best. The experienced TFA then questioned the appropriateness of using Teflon® wedge secondaries and explained that more-pliable Viton O-rings would be preferred as secondary sealing elements. In fact, as a duty-bound and results-oriented investigator, the TFA proceeded to reverify his recollection by telephoning a competent mechanical seal expert. The expert confirmed that his company had frequently retrofitted this style and model of HSLF compressor with seal upgrade kits – some at very moderate cost. The user-purchaser was asked to consider an upgrade kit and to communicate with two other mechanical seal manufacturers. The TFA knew that HSLF compressors designed in the 1960s or 1970s will often benefit from recent advances in sealing technology. Such advances can involve alternative material compositions, seemingly minor configuration changes, improved flush plans and more-effective water management systems.
Because the failure analyst was unable to rule out the (slight) probability of casing deflection due to pipe stress, it was decided that the machine’s inlet and discharge flange bolts would be removed during the next scheduled shutdown. Temporary dial indicators could be mounted and casing movement monitored while the compressor pipes were being disconnected and reconnected.
When compressors chirp. After an operator alerted the TFA to a “chirping noise” originating from the compressor, the TFA decided to hear it for himself. A noise did indeed come from this compressor. Although the noise was judged not to present an immediate concern, it was nevertheless conceded that the sound was a bit unusual. The TFA then offered the opinion that the sound occurred at a tooth passing frequency and was associated with a slightly off-spec gear tooth. When in doubt, this TFA attributed noises to aerodynamically induced phenomena and admitted that he didn’t always know how compressor-internal parts interacted with the gas flow.
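For readers who want to test a tooth-passing hypothesis against vibration or sound data, the tooth passing (gear mesh) frequency of a gear pair is the number of teeth on a gear multiplied by that gear’s rotational speed in revolutions per second. The sketch below works one example. The tooth counts and input speed are hypothetical and are not taken from the compressor in this case history.

```python
def shaft_frequency_hz(rpm):
    """Rotational frequency of a shaft in Hz (revolutions per second)."""
    return rpm / 60.0


def gear_mesh_frequency_hz(teeth, rpm):
    """Tooth passing frequency in Hz: tooth count times shaft revolutions per second."""
    return teeth * rpm / 60.0


def driven_speed_rpm(driver_rpm, driver_teeth, driven_teeth):
    """Speed of the driven gear in a simple two-gear mesh."""
    return driver_rpm * driver_teeth / driven_teeth


# Hypothetical single-mesh gearbox, for illustration only.
BULL_GEAR_TEETH = 120
PINION_TEETH = 20
INPUT_RPM = 3600.0

pinion_rpm = driven_speed_rpm(INPUT_RPM, BULL_GEAR_TEETH, PINION_TEETH)
mesh_hz = gear_mesh_frequency_hz(BULL_GEAR_TEETH, INPUT_RPM)

print(f"Input shaft frequency:  {shaft_frequency_hz(INPUT_RPM):.0f} Hz")
print(f"Pinion speed:           {pinion_rpm:.0f} rpm")
print(f"Pinion shaft frequency: {shaft_frequency_hz(pinion_rpm):.0f} Hz")
print(f"Gear mesh frequency:    {mesh_hz:.0f} Hz")
```

Both gears in a mesh share the same tooth passing frequency, so the result is the same whichever gear is used in the calculation. A single damaged tooth would be expected to make contact once per revolution of its own gear, so an analyst would typically look at that gear’s shaft frequency as well as the mesh frequency.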
Still, a gear-tooth-related problem may have been causing the noise. It could possibly be managed and the progress of gear tooth distress slowed by using a synthetic oil blend made from polyalphaolefins and diester-based stocks. In HSLF compressor gears, modern nonfoaming synthetic gear oils are superior to the automatic transmission fluids typically used decades ago. But regardless of circumstances and perceived sounds, we have again confirmed that compressor failure analysts or equipment troubleshooters will often have to deal with more than one deviation. Whenever these deviations combine, a distress or downtime event may need to be addressed with great urgency. This was the case here; it serves as a fitting reminder to use structured and repeatable approaches to failure analysis and to take action early.
Finally, an automobile-related analogy and plausible scenario deserve to be kept in mind: We will occasionally be able to drive with unsafe tires. Suppose we’re fond of our ability to run on sets of tires that are worn and underinflated. Our fondness probably lasts until the day when the pavement happens to be unusually hot and we decide to place six bags of cement in the car’s trunk. That’s when one of the rear tires blows out. For just a few seconds, the other rear tire manages to carry an extreme share of the already excessive load before it, too, fails.
We can be certain of how this chain of events will end, and it’s just not worth taking the chance. The risk of far-more-costly catastrophic events is even greater on process machinery if we allow deviations to exist, accumulate, and become the “new normal.”
By Heinz P. Bloch, P.E.