When "the condom breaks"

Every so often the things designed to protect us are the very things that kill us.

I was reminded of that this week when an article about Tabcorp's data center fire popped up in my inbox.

So the thing designed to protect their equipment, by putting fires out, instead destroyed the equipment.

That's gotta hurt.

You think you've done everything right and then something unexpected and weird takes you down.

Except that Tabcorp weren't the first to experience this...

Aside from these risks, these types of fire suppression systems are designed to remove oxygen from the room so the fire starves...which isn't great if you are a human in that room when it goes off.

The lesson here is that perhaps the data center should have looked into the system they had installed (or planned to install). Maybe someone could have noticed the other incidents in the industry news or media and asked some questions about whether something similar may apply to them. Some remediation could have been done or a different system chosen.

Often I am asked about software systems for businesses and one of the first things I tell people is:

  1. Find someone who uses the system similarly to how you plan to.

  2. Ask them some questions about it - do they like it, what would they do differently etc.

  3. Ask them what problems they had and analyse how likely you are to experience the same issues.

It's smart to learn from your mistakes but it is cheaper to learn from other people's mistakes.

In the 80s, when I worked for a major bank, we relocated our data center.

In those days data centers were full of very large mainframe computers.

Our new data center had a diesel backup generator to cater for mains power failures.

Mainframes don't like losing power at all, let alone suddenly. They really need to run 24x7 in constant, controlled conditions.

The people involved with this system had to test it one weekend.

They switched off the external power to the building and the big diesel kicked in as expected.

Good news!

...except that the system detected the power being restored...by the generator...and thought it was the mains back on...

...so it turned off the generator while there was still no mains...

...it then detected the lack of mains power and restarted the generator...

...it then detected power being restored (I am sure you know where I am going with this by now)...

Yep, these mainframes that didn't like losing power at all were going up and down like yoyos.

The lesson here is that maybe a better test should have been devised.

It is possible to test these sorts of things without the mainframes being put at risk. After all, it was a new building. At the very least the backup power generator could have been tested BEFORE the mainframes were even on site.
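To show what I mean by testing without the mainframes, here is a rough Python sketch of the loop described above. I never saw how the real controller was wired, so everything here is an assumption for illustration: the `Site` class, both controller functions and the tick-based `simulate` helper are made-up names, and the "flawed" logic is simply my guess that the controller sensed power on the building side, where generator output looks the same as mains.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical model of the building's power sources."""
    mains_on: bool = True
    generator_on: bool = False

    @property
    def bus_live(self) -> bool:
        # The building is powered if either source is feeding it.
        return self.mains_on or self.generator_on


def flawed_controller(site: Site) -> None:
    # Assumed flaw: power is sensed on the building side, so the
    # generator's own output is mistaken for mains coming back.
    if site.generator_on and site.bus_live:
        site.generator_on = False  # "mains is back", so stop the generator
    elif not site.bus_live:
        site.generator_on = True   # no power anywhere, so start the generator


def fixed_controller(site: Site) -> None:
    # Senses the mains feed itself, so generator output cannot fool it.
    site.generator_on = not site.mains_on


def simulate(controller, ticks: int = 6) -> list[bool]:
    """Cut the mains, run the controller, record building power per tick."""
    site = Site(mains_on=False)
    history = []
    for _ in range(ticks):
        controller(site)
        history.append(site.bus_live)
    return history


def count_power_drops(history: list[bool]) -> int:
    """Count how many times power went from on to off."""
    return sum(1 for before, after in zip(history, history[1:])
               if before and not after)


if __name__ == "__main__":
    for controller in (flawed_controller, fixed_controller):
        history = simulate(controller)
        print(controller.__name__, history, count_power_drops(history))
```

With the mains cut, the flawed version flips the building power on and off every tick, while the fixed version holds it on. The point is not the code itself but that a desk check like this, or a dry run against a dummy load, costs nothing compared to bouncing live mainframes.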

I suppose the overarching theme of this post is "remain skeptical": just because you "have protection", don't assume that it will work, let alone work as well as you expected.

Many of the systems we use to protect ourselves require maintenance, vigilance and occasional tweaking.

If your IT person says something is "set and forget", then I suggest you set and forget them.

You always need someone in the loop who cares and IS actively caring.

