Personal lessons from managing a critical facility.
A very common problem for facilities with critical loads is that the power generator doesn’t start when it is needed. Fortunately this can be remedied in the vast majority of cases.
When I say critical loads I am talking about computer data centers, hospitals, schools, stadiums, police stations, 911 centers, office buildings, or whatever you’ve deemed important enough to attach a backup power generator to.
You or your organization have a critical load and have gone to the trouble and expense of investing in a backup generator for it.
The goal is simple enough: keep the power on.
A local newspaper reported on a sewage spill in the county I live in:
About 20,000 gallons of sewage spilled from the California Men’s Colony prison at 4:10 p.m. Sunday when power was lost and an emergency generator did not start. The sewage flowed into Chorro Creek, which flows into Morro Bay.
The fault was apparently that the generator did not start after a utility power failure:
“The power failed and then our backup generator failed, so it was kind of like a double power failure,” said Mike Minty, chief engineer at the prison’s waste water treatment plant. “It’s all fixed now.”
If you operate a data center or critical facility that has a power generator, there are some very easy proactive steps you can take to mitigate the most common problem I observe: the generator fails when the power goes out. For most, that’s not the hoped-for outcome of the capital they’ve invested in making their facility more resilient to utility power outages.
While there’s always a possibility that this can happen even if various preventative actions are taken, the chances are far lower if a handful of items are paid attention to. I am not privy to the maintenance procedures at the California Men’s Colony waste water treatment plant, so I’m just using their outage as the thought provoker and not judging them.
When a generator “simply does not start”, rarely is that the entire story. Rather than being the root cause of the outage, it’s the manifestation of the maintenance and monitoring practices.
In my experience it’s usually a symptom of the lack of a proactive culture surrounding the backup power system. Sadly, some organizations that invest large sums of capital in their backup power systems (and, presumably, in whatever critical load they are protecting) don’t factor in proper operating costs and fail to implement procedures that ensure the appropriate preventative work is performed. This diminishes the return on the capital invested in the entire system.
The failures then flow through to two areas:
The usual failure scenarios are one or more of:
- Generator fails to start (common)
- Generator starts, but fails under load (common)
- Generator starts, but no power reaches the critical load (less common)
The end result is the same: the critical load loses power.
In the case of a generator, here’s the practice I’ve learned to follow:
Weekly No-Load Automatic Tests (usually this can be programmed into your automatic transfer switch, or ATS)
- What this verifies:
- basic generation functionality
- control functionality from the ATS to the generator (a simple cabling problem between the ATS and genset, even if both are completely operational and test out fine, can ruin your entire day)
- Labor involved should be to verify:
- genset actually starts on its own (checklist item)
- inspection of the gauges for anything unusual (temperature, voltage output, battery voltage, fuel levels, etc.)
- physical inspection of generator, looking for unusual sounds or animals that have crawled inside of it (I’ve had cats inside..)
- Junior technician or facility maintenance person, approximately 15-30 minutes one day per week
- Risks resulting from implementing this procedure:
- Nil. Won’t have a real load on it. Nothing will be disconnected during the test.
- The real risk is that the procedure isn’t completed every week. I suggest requiring a small checklist report to be filed each week with a colleague or supervisor, and making sure they know to expect it so that if it doesn’t arrive they go looking for the reason. That way nothing is missed just because somebody “got busy”, was out sick, went on vacation, etc. To verify the test was really done and the paperwork wasn’t just filled out, listen for the generator tests (duh!) and make the generator’s run hour meter reading one of the required fields (it should go up every week).
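The report check described above is easy to automate. Here is a minimal sketch, assuming the weekly checklists are recorded somewhere you can read them back; the `WeeklyReport` fields, the `audit_reports` function, and the 8-day gap allowance are all illustrative assumptions of mine, not part of any particular product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WeeklyReport:
    """One filed checklist (field names are hypothetical)."""
    test_date: date
    started_on_its_own: bool
    run_hours: float  # reading from the generator's run hour meter


def audit_reports(reports, max_gap_days=8):
    """Return a list of human-readable problems found in the filed reports.

    max_gap_days is an assumed allowance: 7 days plus one day of slack.
    """
    problems = []
    ordered = sorted(reports, key=lambda r: r.test_date)
    for report in ordered:
        if not report.started_on_its_own:
            problems.append(f"{report.test_date}: genset did not start on its own")
    # Compare each report with the one filed before it.
    for prev, cur in zip(ordered, ordered[1:]):
        gap = (cur.test_date - prev.test_date).days
        if gap > max_gap_days:
            problems.append(f"{cur.test_date}: {gap} days since previous report")
        if cur.run_hours <= prev.run_hours:
            problems.append(f"{cur.test_date}: run hour meter did not increase")
    return problems


# Made-up sample data: the third report skips a week and the meter is flat.
reports = [
    WeeklyReport(date(2024, 1, 1), True, 100.0),
    WeeklyReport(date(2024, 1, 8), True, 100.5),
    WeeklyReport(date(2024, 1, 22), True, 100.5),
]
for problem in audit_reports(reports):
    print(problem)
```

Note that this only compares reports against each other. It will not notice that the most recent report is overdue, so the colleague or supervisor who expects the report still matters.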
Monthly (or Bi-weekly) Manually Triggered Actual Load Tests
- Takes the place of the weekly no-load test every fourth week
- What this verifies:
- Mechanical functionality of the ATS
- Electrical functionality of the ATS
- Real functionality of the generator (many problems do not manifest themselves when the generator is running without a load or with only a very low load)
- That the facility is not exceeding the capacity of the generator (doh!)
- That there aren’t some weird characteristics of the load or the backup power system that are interacting in a way that will lead to power loss
- If you do not have UPSes (which are in turn fed by the generator) in front of computer systems and other equipment that can’t lose power for even a few seconds, plan carefully how you will run this type of test. For a data center with UPSes, a monitored test can be performed. If something fails, the worst that should happen is a very brief loss of air conditioning while you switch back to main power and isolate what went wrong. Computers should run from the UPS batteries in the meantime.
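One of the things the real load test tells you is how close the facility is to the generator’s capacity. A minimal sketch of that comparison follows. The `load_headroom` function, the 80% warning threshold, and the kW figures are assumptions I made up for illustration; use your generator’s actual rating and whatever margin your vendor or engineer recommends.

```python
def load_headroom(rated_kw, measured_peak_kw, max_fraction=0.8):
    """Compare a measured peak load against a generator's rated capacity.

    max_fraction is an assumed planning threshold, not a standard.
    Returns (utilization, status).
    """
    if rated_kw <= 0:
        raise ValueError("rated_kw must be positive")
    utilization = measured_peak_kw / rated_kw
    if utilization > 1.0:
        status = "OVER CAPACITY"
    elif utilization > max_fraction:
        status = "LOW HEADROOM"
    else:
        status = "OK"
    return utilization, status


# Hypothetical numbers: a 500 kW genset and a 430 kW peak seen during the test.
utilization, status = load_headroom(rated_kw=500, measured_peak_kw=430)
print(f"{utilization:.0%} of rated capacity: {status}")
```

Recording this figure after every load test also shows you the trend, so a growing facility does not quietly outgrow its generator.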
Yearly (or Quarterly) Dummy Load Tests
- What this verifies:
- Often the generator will not reach 100% utilization, even during the monthly actual load tests. Some generator problems will not manifest themselves except under heavy load.
- How to do this:
- Have a testing and maintenance contract with a company that specializes in this. They can also assist you with other maintenance activities. Look in the yellow pages or speak with other folks who have backup generators (ignoring the folks who say they don’t do anything special to make sure their generator works when they need it, of course). 🙂
Things This Solution Should Catch
Some of the causes of the inability to start, that would have been picked up under the above system, that I’ve observed are:
- Generator battery fails
- Causes: age, no smart charger installed, cabling disconnected during maintenance, or a loose connector that is shaken off during generator operation
- Basic care and feeding of the generator overlooked (it’s like a car: think oil, spark plugs, etc.)
- Empty fuel tank
- Bad fuel
- There are different types of diesel, depending on season and location
- Animal trapped inside
- Not good for you or the animal
- Power cabling from the generator to the load broken or poorly connected
- The power consumption of the load exceeds the capacity of the generator
- The generator is flaky under load
- This is why you must test the generator under load when it is not actually needed, both with a dummy load and with a real load (see dummy load testing above, for example)
- Transfer switch control wiring to the generator fails
- Loose connections, low quality cabling, too much water in conduits, construction smashes underground conduit (unknowingly even sometimes)
- Programming modified on the auto-transfer switch
- No longer activates generator properly
- No longer cuts over to generator under conditions desired
- Turned off. 🙂
- May have been an accident or the little battery in the ATS running low (it should be checked quarterly with a voltmeter and replaced at least every year)
It’s good to be aware of these so that the staff implementing the system can understand the specifics of the problems they’re looking for. They do require a checklist system: something for a junior tech or maintenance person to perform and verify on a regular schedule. These items are easy and even “cheap” both in absolute and ROI terms. I’m reminded of a couple of quotes I recently wrote in my journal (credit to Robert Rosenthal for these two sayings):
- “If the cheap solution fails, it may be the most expensive option of all.”
- “No one ever went to the board of directors and said, ‘The project failed but I’m proud of the fact that we paid next to nothing to implement it.’”
Footnote: or unfortunately, depending on how you look at it.