

Great paper on failure of complex systems

It is not often you read something that completely changes the way you look at IT. This paper, "How Complex Systems Fail", rocked me. Reading it made me completely rethink ITSM, especially Root Cause Analysis, Major Incident Reviews, and Change Management.

It dates from 1998! Richard Cook is a doctor, an MD. He seemingly knocked this paper off on his own; it is a whole four pages long, and he wrote it with medical systems in mind. But that doesn't matter: it is deeply profound in its insight into any complex system and it applies head-on to our delivery and support of IT services.

"complex systems run as broken systems"

"Change introduces new forms of failure"

"Views of ‘cause’ limit the effectiveness of defenses against future events... likelihood of an identical accident is already extraordinarily low because the pattern of latent failures changes constantly."

"Failure free operations require experience with failure."

Read this paper. And READ it: none of this 21st Century 10-second-attention-span scanning. READ IT HARD. Blow your service management mind.

Does this change any of your ideas of ITSM? Should any of these ideas be in ITIL?

(My apologies to whoever sent the link to me. This old brain has forgotten and LinkedIn makes it almost impossible to find your message again! Thanks. Remind me and I'll credit you)

[For more on this see this later blog post]

Comments


Normal Accidents

People wanting to read more on complex systems and their failures will find the book "Normal Accidents: Living with High-Risk Technologies" by Charles Perrow of interest. Written in layman's terms, it is very interesting. ISBN 0-691-00412-9.

More Deming than Maslow

"safety cannot be purchased or manufactured..."
Replace 'safety' with 'quality' and you have a quote from Deming.
This was my first exposure to "hindsight bias" and I appreciated having some dialog on the role of people in accidents to support that term.

Cook's Book

Resilience Engineering
ISBN: 978-0-7546-4641-9
This book appears to contain a paper by Cook entitled "Resilience engineering: chronicling the emergence of confused consensus"

suggested books

Agreed - an excellent document

Often when we have a major incident, there is a single-point-of-failure elimination campaign. Which is kind of humorous. During the almost two dozen years I have been involved in such things, very rarely has there been such a thing - the elusive SPOF; the one thing we can fix that will prevent *all* future failures. The silver bullet. With very few exceptions, major incidents involve at least five failures - in my experience. Trivial failures only cause trivial incidents.
I think these things, in the context of complex systems, are a bit like Black Swans (see the book by Nassim Nicholas Taleb). A Black Swan is by definition unpredictable. So, in theory, most service disruptions in complex systems are NOT black swans. They would be predictable if we had all of the data. But large corporate IT systems are too dynamic for us to have all of it and so predict reliably.
Arguably, specific human behaviour is not predictable: will a given operator react in a predictable manner in the timeframe required? We may know what the documentation says, but we cannot account for all possible human errors.

CMDB is like building giant temples to the gods

It seems to me CMDB/CMS is part of the denial of IT's imperfection and unpredictability; it is a desperate attempt to get control over the uncontrollable. CMDB is like building giant temples to the gods to make the crops reliable.

This is a great paper

Way back in the late 70s, when I was a young technician and thought I knew it all, I read a book that changed my thinking forever: SYSTEMANTICS: How Systems Really Work and How They Fail, by John Gall. (now titled The Systems Bible) http://en.wikipedia.org/wiki/Systemantics

This paper reinforces and expands on the lessons in that book. What a wonderful reminder that complexity itself creates unique problems.

Excellent Paper

Very thought provoking and well worth a read - the section "Hindsight biases post-accident assessments of human performance" made me think immediately of the blame seeking postmortems that happen after failures in the child protection systems in the UK. A baby was the victim of unspeakable cruelty recently in Haringey, London and afterwards the resulting enquiry castigated the professionals involved in the child's care. "It seems that practitioners “should have known” that the factors would “inevitably” lead to an accident." could be a quote from the inquiry report. I wonder how well informed the authors of such reports are in the difficulties of rigorous and effective analyses of system failures. I am doubtful.

Alex Jones

Ackoff

Russell L. Ackoff has passed away at 90 years young. 4 p.m. Oct. 29, 2009.

He was one of the greatest organizational thinkers of the last 100 years. Condolences to the systems thinking community, his students, readers and colleagues.

Nice paper - thanks for

Nice paper - thanks for posting! Yes, it's very apposite to link this to IT. It also reminds me very much of the work of James Reason (1997) and his 'swiss cheese' model of accidents and failures, i.e. there are multiple points of failure (holes) that all have to line up for failure to occur.

Another recommendation

Bignell & Fortune's "Understanding Systems Failures", Open University.

Dated case studies but very readable. I presume out of print now.

second-hand

A few available second hand: Understanding Systems Failures

another recommendation

Here's another one. Check out Phil Simon's first book: Why New Systems Fail: Theory and Practice Collide. It addresses many of the same topics.
http://www.amazon.com/Why-New-Systems-Fail-Practice/dp/1438944241/ref=sr_1_1?ie=UTF8&s=books&qid=1257517726&sr=1-1

