Calling all you ITIL theorists, philosophers, pontificators and pundits. Marty is back: our follower from the real world, trying to make sense of ITIL on its home ground, the operations of big iron batch computing. Marty asks: what happens after a service is restored? What does ITIL call the function of undoing the damage done while a service was unavailable? I have a view, of course, but I'm going to stay quiet for a while and hear what everyone else thinks. So have at it.
When a train rolls by, the guys on shovels and brooms, track gangs, crews on the ground, crews on other trains, clerks, stationmasters, everyone stops, watches the train and waves to the crew on board. Lazy? Hell no.
Complex systems are by definition broken. They will always break, and sometimes they will break even when everybody did what they were supposed to do. Fixing the problem won't necessarily reduce the risk of another incident.
This BOKKE (body of knowledge known error) has been posted for a day or so and has had hundreds of views. I was sure someone would say "No, you idiot, service impact analysis is right here," but not one person has. It seems to be true.
Problems suffer from the important/urgent dilemma. They are very important, but in less mature shops they struggle to get attention over the constant bombardment of incoming incidents.
Vendors are making a fuss about the Root Cause Analysis (RCA) features in their tools. People Process Things once again: who says the root cause is in the technology?