The Seven Sources Of Problems #3 - Upgrades/Patches
In this third instalment of “The Seven Sources Of Problems” we take a closer look at a specific category of change – namely, “Upgrades and Patches”.
The first question that needs answering is, “Why have you separated this out from the last source (New Changes)?”, writes Robin Yearsley.
The answer lies in the fact that, in our experience, it deserves a special category all of its own because “Upgrades and Patches” are:-
- A key part of maintaining the health and supportability of Infrastructure (which drives Service)
- Frequent, technical and often challenging to apply
- Driven by Vendor Technology roadmaps
- In need of special consideration (e.g. When to apply? How to test? How to back out? Etc.)
When you think about the entire Infrastructure that enables your IT Services – think through the sheer number of components. Then, multiply this by the number of bug fixes, OS patches, PTFs, Hot Fixes and any other essential upgrades that they need – just to maintain their “maintainability” within the letter of your support contracts with Vendors.
Falling behind can lead to you being ‘forced’ to perform an upgrade just to remain within the terms of your support contract. Moving too close to the latest release can lead to you suffering from “first fault found” syndrome. There is a delicate balance to be maintained between “leading edge” and “bleeding edge”. For each IT Service you should use your configuration maps to determine what strategy to apply to upgrades and patches for your underlying Infrastructure.
Do you have such a process in place today? If you do – great! If not, how can you really assure the ongoing availability of your Service, given that it may or may not be error prone or out of contract?
Let’s consider some of the lower level details with this potential source of problems. There are several factors to consider:-
1. Being “aware” of a patch/upgrade’s actual existence
- This sounds a bit simplistic, but part of the challenge in maintaining the correct balance and level of upgrades/patches is actually knowing what’s out there and available to install in the first place. Different vendors provide different levels of service, and your contract will also differ depending on who signed the maintenance agreement, when the system was installed and what tools you have in place to search/validate/upload/test/install your upgrades/patches.
- Most Technology Vendors use the “post it – and they will come if they need it” approach. This does three things. It enables the Vendor to say, “Go and download this patch and try it in your test environment” (so the buck’s with you). It also means that you have to keep up with what’s out there and perform the validation and risk assessment yourself. Thirdly, it provides the Vendor with another revenue opportunity, in that they can offer to do the ongoing risk assessment and validation for you – for a fee – on top of the maintenance contract that you already have with them (nice money if you can get it!).
- You need to develop an excellent “radar” on a per component, per Vendor basis. For each of your vendors and the components they provide, map out exactly how, and how often, new patches are released. Ensure that you have an automated alerting system to advise you of any new critical patches.
2. Finding the right balance between ‘unmaintained’, ‘leading edge’ and ‘bleeding edge’
- As mentioned above, the priority and agreed availability levels (within your SLA) should assist you here. For those higher rated components you will need to be more “bleeding edge”. Where installing a patch carries more operational risk than you would normally expect (say, you’re the first company to use it), go ahead only once everybody understands and agrees to these risks.
- Try to determine the inherent “stability” level of the component in question. If the component is complex, new or highly dynamic, then more frequent and closer inspection of its patch status should be undertaken to protect your environment.
3. Maintaining your organization’s overall adherence to recommended upgrade/patch schedules
- Create a chart (from your CMDB and Vendor Schedules) that drives validation activity.
- Carry out spot checks from time to time. This is good from an audit perspective and often reveals interesting nuances in the Vendors’ release policies.
4. Having the physical capability to actually test the next upgrade/patch (install and back-out!)
- Have you got a physical environment where you can test the new patch without breaking Service?
- Does the test environment enable you to ‘load’ the system, like production, to test for how the component works in reality?
- How easy is it to back out the patch, once installed? “Bleeding Edge” patches are sometimes recalled or superseded.
- Have you a formal interface in place between “Patch Management”, “Incident Management”, “Problem Management” and “Change Management”?
5. Managing Change through your Change Management Process
- It goes without saying that installing new patches still falls within the control of Change Management. Time and time again, though, because patches are complex, technical support folks seem to have created their own ‘fast path’ for these types of changes, where certain steps in the procedure are ignored or not subject to controls. This is dangerous and should be reviewed immediately.
6. What to do if you are ‘forced’ to accept a new upgrade/release in a critical (service down) situation
- It happens. You suffer from a problem and, in order to eliminate the root cause, the Vendor recommends that you install their latest patch. Trouble is – you’re the first in the world to receive it. They cannot guarantee, because everyone’s environment is different, that it will eliminate the problem – but it’s the best you’re going to get for now. You have been handed a ‘fait accompli’ (sometimes known as a ‘hospital pass’!). In situations like this you need buy-in from Senior Management, and you need to form an overall opinion based on the current service pain versus the risk of something else going wrong! Sometimes it’s tough in Service.
7. The resources and skills you absolutely require to pro-actively protect your environment
- One recommendation that I like to make is to ensure that you have the right skills, capability and attitude leading the technical upgrade/patch function.
- It’s often a monotonous chore validating the availability of new patches against the Infrastructure estate, but one that can significantly protect your environment from new problems. It needs to be correctly resourced.
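To make factors 1 to 3 a little more concrete, here is a minimal Python sketch of the kind of chart that could drive validation activity. Everything in it is an assumption for illustration: the `Component` fields, the idea of counting “releases behind”, the contract limit and the “soak” period are all hypothetical, and in practice the data would come from your own CMDB and Vendor schedules.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One infrastructure component (hypothetical fields, for illustration only)."""
    name: str
    releases_behind: int      # vendor releases newer than the one installed
    latest_age_days: int      # days since the vendor published its newest release
    max_releases_behind: int  # assumed limit allowed by the support contract
    soak_days: int            # assumed time you let a new release mature


def patch_posture(c: Component) -> str:
    """Classify a component as unmaintained, bleeding edge or leading edge."""
    if c.releases_behind > c.max_releases_behind:
        # Too far behind: at risk of falling outside the support contract.
        return "unmaintained"
    if c.releases_behind == 0 and c.latest_age_days < c.soak_days:
        # Running the newest release before it has had time to mature.
        return "bleeding edge"
    return "leading edge"


# Sort order for the chart: the most urgent posture comes first.
_URGENCY = {"unmaintained": 0, "bleeding edge": 1, "leading edge": 2}


def build_chart(estate: list[Component]) -> list[tuple[str, str]]:
    """Return (name, posture) rows, most urgent first."""
    rows = [(c.name, patch_posture(c)) for c in estate]
    return sorted(rows, key=lambda row: _URGENCY[row[1]])


# Made-up example estate; the names and numbers are not real data.
EXAMPLE_ESTATE = [
    Component("app-os", 1, 60, 2, 30),
    Component("web-proxy", 0, 5, 2, 30),
    Component("db-server", 4, 90, 2, 30),
]

if __name__ == "__main__":
    for name, posture in build_chart(EXAMPLE_ESTATE):
        print(f"{name:12} {posture}")
```

A chart like this is only a starting point for the spot checks and alerting described above. The thresholds should be set per component, based on its priority and stability, rather than applied across the whole estate.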
To wrap up Source #3 “Upgrades/Patches” let me finish off by saying this… as in most things in life… it’s a bit of an art, rather than a science. No two environments are ever the same – so Vendors are incapable of testing all eventualities on their test systems. So, prioritise, automate alerts, resource correctly, interface to other key areas, perform spot checks and maintain the “expect the unexpected” mindset.
Tomorrow, we’ll move onto our next source of Problems #4 – Your Vendors. Interestingly, ALL Seven sources of Problems can apply to your Vendors, but more about that tomorrow.