What happens when an air conditioner water leak meets a server rack? The water leak wins, by a landslide.
It started with a very early morning call. One campus had a server down, but when we checked remotely, nothing seemed to be out of the ordinary.
Unfortunately, that was just the tip of the iceberg. A wall-mounted air conditioner had sprung a leak, and the water had dripped down at just the right spot to hit the edge of a loop of cable beneath it, a cable that was connected to the monitor on the rack. The water ran down that cable into the first server, filled it, and then overflowed down to the next server, a wireless controller, and so on down the rack.
The good news… some really expensive switches were on the top of the rack, and they weren’t affected. But the router, wireless controllers, servers and the UPS were toward the bottom, and they all were exposed to the water leak. This is how most racks are organized, with the heavier stuff on the bottom, because it’s a pain to lift those things up over your head to mount them.
The first step was to immediately disconnect everything on the rack from power, and then move it all to a safe, dry location for inspection. Then we told school administration what had happened, that we were working on an estimate of the time to recovery, and that recovery wasn’t likely to happen in under 24 hours. This allowed them to make some informed, executive decisions about events that were planned for the next day.
The equipment was then opened up and laid out to dry to the best of our ability with large fans. We had some backup equipment we were able to repurpose, and were able to get Internet access back up by the end of the following day.
The complex question was what to do about the equipment that, once fully dried, we were able to boot back up. It held up through the weekend, and this started some heavy internal debate. We considered water remediation repair service providers, but in our own experience that kind of repair is rarely as successful as you’d hope. We consulted with other tech directors (super helpful to get a second opinion) and other tech consultants, and the interesting thing we learned is how often this kind of thing happens.
In the end, we decided that it wasn’t a responsible path to have our core network rely on components that had been exposed to water damage.
The scenario that clinched it: Imagine hundreds of students poised to start taking an online standardized test. They’ve spent weeks preparing, and their anxiety levels (despite our teachers’ best efforts to the contrary) are high. The students start taking the test, and the network drops. Repeatedly. Imagine their frustration, tension, and likelihood of success in this endeavor.
In explaining this to the business office, I estimated the costs at around $100k. That estimate ended up being higher than the final cost, as we were able to combine some of the items. For example, we had slowly added multiple smaller UPS units over the years, and were replacing those on a schedule. Instead, we purchased one larger UPS unit for less than the smaller units would have cost.
Lessons Learned
The thing I wish I’d known earlier, aside from the obvious, was to take photos and document every step of the way. The first thing I was told was that we didn’t have flood insurance, but this was actually classified as an equipment failure (on the part of our air conditioning unit), and so it was covered by our policy.
The thing I am extremely grateful for is a dedicated and tireless tech team who worked late nights and weekends to get that campus back in action, and that we have a thorough backup system in place so all of our data was restored in no time.
Yes, the leak was repaired, and there’s now a chute add-on to prevent any water leaking from that AC unit from reaching the server rack again. (Fingers crossed.)
Recommended!
Take a look around your server closets and other tech-related spaces. Look for any physical dangers that you may have stopped noticing over time. What’s in the ceiling overhead? What can you do to better protect the hardware that you’ve spent countless hours researching, selecting, installing and configuring?