ITSM incident and problem: two names for three things

Debate around the definitions of Incident and Problem never seems to end.
Here's my take on the fundamental issue that fuels the endless arguments: we have two entities trying to do three jobs.

When a service breaks, we have to deal with three things in support:
1) Look after the users, as a group and as individuals. Help them, and apply any workaround to get them productive again.
2) Restore the service to normal operation. This may only need a reboot or rebuild or restore.
3) Remove the cause(s). Fix the fault, bug, missing patch, mis-configuration, mis-information, lack of training...

ITIL tries to cover all three of these activities as an Incident and, under vaguely defined conditions, sometimes also as a Problem. We have two entities, Incident and Problem, trying to do three jobs: Contact (1), Interruption (2) and Fault (3).
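To make the three jobs concrete, here is a minimal sketch of them as three separate record types. This is my own illustration, not anything ITIL defines: the class names, fields and the memory-leak example are all assumptions. The point it shows is that each job has its own "done" state, and they can close at different times.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fault:
    """Job 3: the underlying cause that has to be removed."""
    description: str
    removed: bool = False


@dataclass
class Interruption:
    """Job 2: a service is down or degraded and has to be restored."""
    service: str
    restored: bool = False
    fault: Optional[Fault] = None  # the cause may be unknown at first


@dataclass
class Contact:
    """Job 1: one user who needs looking after."""
    user: str
    productive_again: bool = False
    interruption: Optional[Interruption] = None


# Hypothetical outage: one fault, one interruption, three users calling in.
fault = Fault("memory leak in the mail server")
outage = Interruption("email", fault=fault)
contacts = [Contact(user, interruption=outage) for user in ("ana", "ben", "cy")]

# Job 1 can finish first: every user gets a workaround.
for contact in contacts:
    contact.productive_again = True

# Job 2 finishes next: a reboot restores the service.
outage.restored = True

# Job 3 is still open: the leak has not been fixed.
status = (
    all(contact.productive_again for contact in contacts),
    outage.restored,
    fault.removed,
)
```

With only two record types, one of these three states has to be squeezed into a record that was designed to track something else.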


[update from comments below:]
To take a hospital analogy, the three roles are nurse (call), doctor (interruption), and specialist (fault).
In hospital, patients hardly ever see the doctors. The nurses get you back to health.
The doctors deal with an outbreak of patients by looking at them all.
If it is nothing they have ever seen before, they send samples to a medical research centre.
So which one of those is the incident?
And what is a problem? An unknown disease? How about when a surgeon operates? Is that problem management or incident management? An oncologist using chemotherapy? What about an Ebola outbreak?

Perhaps the analogy isn't a good one, because we often get incidents affecting many users with a single cause external to them. To drag our analogy along, it's like lots of people reporting with radiation sickness. They can't be cured until the single external source of radiation is removed.

BTW, I guess GPs are field support :)

[Even more update, from a comment I posted elsewhere:]
You are still thinking of Incident and Problem as functions, not processes, i.e. as teams. You are assuming that a problem record is only required if we want to functionally (horizontally) escalate to a "problem team".

The problem record represents the thing we are dealing with, not the people dealing with it. It’s the same thing whether the service desk or Level Two resolves it: if there was a fault causing an incident, and that cause was repaired, there should be a problem record regardless of who did it. The records don't exist just to track current work; they also serve as historical data, which is crap if we aren't opening a record for every problem.
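As a rough sketch of that rule (the function, field names and example cause are hypothetical, not taken from any ITSM tool): a problem record gets logged whenever a cause was repaired, and who repaired it is just a field on the record, not a condition for creating it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProblemRecord:
    cause: str
    repaired_by: str  # informational only; never decides whether we log


def resolve_incident(
    cause_repaired: Optional[str],
    resolver: str,
    problem_log: List[ProblemRecord],
) -> None:
    """Log a problem record whenever a cause was repaired, whoever did it."""
    if cause_repaired is not None:
        problem_log.append(ProblemRecord(cause_repaired, resolver))


problem_log: List[ProblemRecord] = []
resolve_incident("expired certificate", "service desk", problem_log)
resolve_incident("expired certificate", "level two", problem_log)
resolve_incident(None, "service desk", problem_log)  # workaround only, nothing repaired

# The historical view only works because the service desk fix was logged too.
causes = Counter(record.cause for record in problem_log)
```

If the first call had skipped the record because nobody escalated, the history would show the certificate problem once instead of twice.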
