SLAs: measuring an ITSM service as a black box is essential

Measuring a service as a black box is essential. Because it is impossible (in a practical sense) to discover and measure every link in a service chain, it is impossible to build a complete composite view of a service's performance bottom-up from the component CIs.

It is the boundary problem I have spoken of before. Something is always outside your range of view: telco links, proprietary hardware, "magic glue" [EDI, EAI, SOAP, UDDI, LDAP, CORBA ...], the internet, service providers, outsourcers. Web Services, SOA and grid computing are not making it any easier.

The only way to accurately measure performance and availability SLAs is to monitor the user experience, i.e. black box the service. All the discovery, analysis and component-monitoring tools are there to alert us and to sometimes tell us why a service is not meeting SLAs, but they never(?) give us the 100% picture by building a service view bottom-up.

(In the same way, the meta-view of the service can only be completely built top-down. The top layer links are all conceptual, perceived, existing only in wetware [people's heads]. Further down the stack there are some links that exist physically but lie outside the boundary. Both these types of links in the service-view have to be manually built. This is why I say CMDB cannot be fully automated.)

We are lucky that such black box service monitoring is not too hard. Measuring every single end user is impractical, but a good view can be achieved by selecting and measuring a representative subset of users by deploying an agent to their desktops. This gives us SLA reporting, and SLA monitoring and alerting.
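As a rough illustration of the reporting side, here is a minimal sketch of turning desktop-agent samples into SLA figures. The `Sample` record, the field names and the choice of a 95th percentile are assumptions for the example, not the format of any particular monitoring product.

```python
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class Sample:
    user: str        # hypothetical identifier of the monitored desktop
    seconds: float   # response time measured at the desktop
    ok: bool         # True if the transaction completed

def sla_report(samples, target_seconds, target_availability):
    """Summarise sampled user experience against assumed SLA targets."""
    if not samples:
        raise ValueError("no samples to report on")
    completed = [s.seconds for s in samples if s.ok]
    availability = len(completed) / len(samples)
    if len(completed) < 2:
        # Too few completed transactions to estimate a percentile
        return {"availability": availability, "p95_seconds": None, "met": False}
    # n=20 yields 19 cut points; the last one is the 95th percentile
    p95 = quantiles(completed, n=20, method="inclusive")[-1]
    met = availability >= target_availability and p95 <= target_seconds
    return {"availability": availability, "p95_seconds": p95, "met": met}
```

The same function can feed both a monthly report and an alert: run it over a month of samples for the first, and over the last few minutes for the second.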

It also overcomes the old problem of users complaining and IT saying "well, we can't see any problem" using their bottom-up composite view.

Finally, lots of performance issues are user perception. Users never remember how it performed six or twelve months ago. Their frame of reference is always the last week or month. If IT says it has improved over the year, they will insist that IT is not measuring something. End-point monitoring at the desktop gives objective stats of the measured experience to show the trend over time.
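To make "the trend over time" concrete, here is a small sketch that reduces desktop measurements to a monthly median and compares the first month with the last. The `("YYYY-MM", seconds)` input shape and the function names are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

def monthly_medians(measurements):
    """measurements: iterable of ("YYYY-MM", seconds) pairs."""
    by_month = defaultdict(list)
    for month, seconds in measurements:
        by_month[month].append(seconds)
    # "YYYY-MM" strings sort chronologically
    return {month: median(values) for month, values in sorted(by_month.items())}

def change_over_period(medians):
    """Fractional change in median response time, first month to last."""
    months = list(medians)
    first, last = medians[months[0]], medians[months[-1]]
    return (last - first) / first
```

A negative result means the measured experience got faster over the period, whatever the users remember.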

You see, there are some technologies the IT Skeptic actually likes :-D

Comments

I agree, there are some useful technologies out there

When I was working on an Internet Banking application, we looked at a product called NetQoS. It was a client-side monitor, but you did not have to deploy a client-side agent everywhere. We implemented it off a SPAN port in the central network core, where it looked at the packets and their send/receive tags. For those of us (like me) who don't speak networkese, that is the time the packet originated from the source and the time it was acknowledged (responded to) by the dest.

The product kept stats for 13 months so you could see yesterday, last week, last month, last year to determine if the client experience was degraded - you could then target the deployment of client-side agents as necessary (and quite quickly) to respond to these flags. In fact, with the appropriate thresholds you would be alerted to client experience problems quite possibly before the client even noticed.

(I do know that the product was implemented with some success in a major vehicle manufacturer)

The very nature of an IP network means that you cannot guarantee the paths that packets will take from a source to a dest and back, so no amount of drawings is going to help that one.

As to the drawings, I haven't seen any yet that indicate the routing rules or traffic or its importance to the business (that is the importance of particular sources and dests of packets)

But then network types have always been a little odd to those of us not from their world.

Have a happy Easter everyone, in case I do not post before then.

VaioBoyaus

Looking in her eyes I felt as near to Icarus as to the city before me as to the death of the sun. (The Long Road of the Junkmailer, Patrick Holland, 2006)

The challenge that tools like NetQoS have

The challenge that tools like NetQoS have is to be service-aware. That is, they see packets, so they can tell what the user experience is overall, but can they differentiate between packets for the Payroll application and packets for eMail and packets for the core business transactional system? One user might use all three services but have quite different SLAs for each. I don't know if NetQoS can or can't do this, but I know it is tricky if the tool is not actually out at the desktop tracking what applications and screens are executing.

Your comment on the challenge is too true

It can alert on all of your scenarios, so long as the network address of these services is known (presuming you are in an IP network of course). When we were trialing it, it became quite obvious that the client's network services provider was not doing 100% of the job they were being paid for. Suddenly the shutters came down on the SPAN port and the tool was left high and dry, despite numerous protestations from us (we were charged with managing the services provider) and the shared customer.
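The general idea of telling services apart by network address can be sketched as below. This is not how NetQoS works internally (I have no knowledge of that); the address ranges, service names and per-service targets are all made up for the example.

```python
import ipaddress

# Hypothetical server networks and response-time targets (seconds) per service
SERVICES = {
    "payroll": (ipaddress.ip_network("10.1.0.0/24"), 3.0),
    "email": (ipaddress.ip_network("10.2.0.0/24"), 5.0),
    "core": (ipaddress.ip_network("10.3.0.0/24"), 1.0),
}

def classify(server_ip):
    """Return the service whose network contains server_ip, or None."""
    addr = ipaddress.ip_address(server_ip)
    for name, (network, _target) in SERVICES.items():
        if addr in network:
            return name
    return None

def breaches(observations):
    """observations: iterable of (server_ip, response_seconds) pairs."""
    found = []
    for server_ip, seconds in observations:
        service = classify(server_ip)
        # Each service is judged against its own target, not a global one
        if service is not None and seconds > SERVICES[service][1]:
            found.append((service, server_ip, seconds))
    return found
```

A lookup like this only works while each service has its own addresses. Where several services share a server, the address alone cannot separate them, which is the tricky case raised in the comment above.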

I do agree with you though, Skeptic: everyone has to enter the scene with the desire to improve the client's and their end users' experience - not just meet SLAs.


The danger of being precisely wrong

I am in 100% agreement that top down is the way, as it factors in the user/business.

I would add that the great danger of going bottom up is losing context and falling into the trap of being precisely wrong. Someone (within another silo) then sees how much time and effort you have put in to produce the "details" and assumes that the outputs have integrity.

RedMule

RedMule Blog
