Availability Management has two main aspects to consider, two worlds if you like: (a) Proactive and (b) Reactive.
Most organisations focus (in their production environments) on just (b) Reactive, which usually means constant firefighting or trying to minimise the time taken to restore normal service in production (say, after a major incident).
Good practices for (b) include:
- Put SLAs in place for all IT functions to ensure that response and fix times meet business needs. In most organisations they don't.
- Empower teams to "get service up", thereby protecting the level of availability.
- "Mop up" after incidents by carrying out "post mortems" that look at what went wrong, why, the underlying root cause, the programme or initiative to eliminate that root cause, and progress tracked through to a successful conclusion.
- Establish a firm "post mortem" process that's inclusive and does not blame anyone or point the finger (people are put off and don't come again!).
- Make eliminating root cause(s) part of everyone's goals. Availability figures for uptime will soon increase!
- Publish weekly and monthly availability figures to everyone. Everyone in the IT function plays some part in preserving or improving the availability of systems and services.
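If you want to publish a weekly figure, the arithmetic is simply (time in the period minus downtime) divided by time in the period. Here is a minimal Python sketch of that calculation. The function name `availability_pct` and the outage records are made up for illustration, and it assumes you can export outage start and end times from your incident tool.

```python
from datetime import datetime, timedelta


def availability_pct(period_start, period_end, outages):
    """Percentage availability for a reporting period.

    outages is a list of (start, end) datetime pairs. Outages are clipped
    to the period, and overlapping outages are merged so that shared
    downtime is only counted once.
    """
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("period_end must be after period_start")

    # Clip each outage to the reporting period and drop those outside it.
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in outages
        if start < period_end and end > period_start
    )

    # Walk the sorted outages, merging any that overlap.
    down = 0.0
    current_start = current_end = None
    for start, end in clipped:
        if current_end is None or start > current_end:
            if current_end is not None:
                down += (current_end - current_start).total_seconds()
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        down += (current_end - current_start).total_seconds()

    return 100.0 * (total - down) / total


# Illustrative data only: one week with three recorded outages.
week_start = datetime(2024, 1, 1)
week_end = week_start + timedelta(days=7)
outages = [
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 10, 30)),
    (datetime(2024, 1, 2, 10, 0), datetime(2024, 1, 2, 11, 0)),  # overlaps the first
    (datetime(2024, 1, 5, 22, 0), datetime(2024, 1, 5, 22, 12)),
]
weekly = availability_pct(week_start, week_end, outages)
print(f"Weekly availability: {weekly:.3f}%")
```

Whether you measure against the full calendar week or only against agreed service hours is a choice your SLA should settle. This sketch uses the full week.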
As for (a) Proactive Availability Management: if the techniques outlined above free up some of your time, you can spend it on (a).
*Remember "prevention is better than cure", and it's the same for availability.*
Some ideas for (a):
- Build availability in from scratch, right from the inception of the project or new system (or whatever you call it in your organisation).
- Gather true end-user requirements and vigorously protect them throughout the project lifecycle.
- Ensure that the user community knows what it costs to provide such (typically) high levels of availability. Ensure that the project sponsor knows too, and is prepared to fund (and continue to fund) this level of expenditure.
- Design for high service availability, not just high systems availability. In many ways this is harder! Anyone can buy "two of every box" and "fault resilient this and that", but have you ever bought a fault-tolerant human?! We all are to some extent, but seriously: to protect systems availability you need highly available underpinning services. These can be services your business uses every day, or internal IT services that support the production environment. Availability is a constant battle. It costs money to do properly, and it takes time to see the results (so measure and report on them frequently!).
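To see why "two of every box" is not enough on its own, here is a rough Python sketch of how availabilities combine. It assumes failures are independent, which is optimistic in real life, and every figure in it is invented for illustration. A chain of things that must all be up multiplies together, while a redundant pair only fails when both copies fail.

```python
from math import prod


def serial(*availabilities):
    """Everything in the chain must be up: multiply the availabilities."""
    return prod(availabilities)


def redundant(availability, copies=2):
    """At least one of N identical, independent copies must be up."""
    return 1 - (1 - availability) ** copies


# Hypothetical figures, expressed as fractions rather than percentages.
server_pair = redundant(0.99)   # two boxes, each 99% available
network = 0.995                 # underpinning network service
support = 0.98                  # the humans who restore service

service = serial(server_pair, network, support)
print(f"Server pair: {server_pair:.4%}")
print(f"End-to-end service: {service:.4%}")
```

With these made-up numbers the doubled-up hardware is the strongest link, and the end-to-end figure is dragged below the weakest underpinning service.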
Final thought: how much does a 0.01% loss of availability cost your business area? If you don't know, you should find out. Sometimes the business doesn't need 99.999% and will be comfortable with 98.5% once it sees the incremental costs involved in "five nines" availability.
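As a starting point for that conversation, here is a small Python sketch that turns an availability target into hours of downtime per year and a rough cost. The cost per hour is a placeholder I have invented, so substitute your own business figure. It also assumes a 24x7 service and ignores leap years.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a 24x7 service


def downtime_hours_per_year(availability_pct):
    """Hours of downtime per year implied by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def annual_downtime_cost(availability_pct, cost_per_hour):
    """Rough yearly cost of the downtime implied by an availability target."""
    return downtime_hours_per_year(availability_pct) * cost_per_hour


COST_PER_HOUR = 10_000  # hypothetical figure, replace with your own

for target in (98.5, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    cost = annual_downtime_cost(target, COST_PER_HOUR)
    print(f"{target}% -> {hours:.2f} hours down per year, roughly {cost:,.0f} in lost business")

# A 0.01% loss of availability is the downtime implied by a 99.99% target.
cost_of_point_01 = annual_downtime_cost(99.99, COST_PER_HOUR)
```

For scale, 0.01% of a year is a little under 53 minutes, while 98.5% allows about 131 hours. Set against the price of the extra infrastructure and support, that gap is what the business is really being asked to pay for.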