Dev needs to understand what Ops is for

Submitted by skeptic on Thu, 2011-03-10 01:10

A recent blog post made me angry. This is one cause of the Dev-Ops divide: ignorance of what the other side does. I expect some of that in the trade, down at the coalface amongst Dev and Ops practitioners. I don't expect it from Forrester analysts.

Mike Gualtieri says "I Don't Want DevOps. I Want NoOps." This shows a clear contempt for the value of Ops. This appears to stem from a profound ignorance of what Ops does:

Sure, the ops guys efforts are critical to our applications because they have to run on something. But, developers should look to spend more of their time getting closer to the business, not getting closer to the hardware

I don't doubt that I make some folks just as angry with my comments about Agile or DevOps, but I like to think I have some knowledge of the development world when I make those remarks. I've been a programmer, a software library developer, a data modeller. I've cut COBOL, 4GL, Realizer (a GUI OO), VB, HTML and PHP. I've also sold tools for coding, CASE, database, directory, code analysis, debugging, testing, SDLC, project management, development methodologies... you name it. I built my own LAMP stack, dammit. I spent a decade immersed in the development world until I came over to the dark side.

But Mike thinks Ops is there to provide running servers. This would suggest he's never been in a server room let alone worked in IT Operations. Somebody please introduce the man to the concepts of risk, cost allocation, portfolio and demand, security, identity, access, provisioning, availability, capacity, continuity, service levels, problem, incident, requests, service desk, and most of all what "change management" really means and why we have it. Mike thinking Ops is about servers is like someone thinking Dev is about code cutting.

It's possible Mike is being provocative rather than ignorant; however, I would expect to see a tongue firmly in cheek somewhere in the post if that were the case, and I don't. "NoOps means that application developers will never have to speak with an operations professional again." Oh for god's sake! Only if they are gonna support it. Only if they will be providing their own phone lines and email addresses to do so. Only if they'll be providing the round-the-clock monitoring, archiving, backups, and disaster recovery. Only if they aren't planning to ever hand something over to those entrusted with being its long-term custodians. I'm not sure where Mike is writing from, but on my planet that doesn't happen, not even in Agile development. Successful Agile development involves close collaboration between Dev and Ops to provide Dev with the flexibility they need in dev, test and prod environments, whilst still protecting the organisation's interests through ring-fencing the change to prevent anything else getting broken, managing the risks, defending security, ensuring integrity and so on. And eventually the iterations slow down as the customer loses interest or funds, the Swiss-balls disappear along with the project team, and somebody is left holding the semi-documented mess. Accommodating Agile developers is much like accommodating someone else's three-year-old in my house. I provide defined safe areas, remove sharp objects and lock the cupboards. Once they're gone we return to the grown-up lifestyle. [Come on Mike, if you're going to have contempt do it properly].

It gets worse in Mike's post:

NoOps will achieve this nirvana, by using cloud infrastructure-as-a-service and platform-as-a-service to get the resources they need when they need them.

This tooth-fairy mindset has been addressed often enough elsewhere, including by me, but once again it shows total lack of understanding (except in the use of the word "Nirvana").

Mike seems to be on a roll in this anarchist approach, with a more recent post "Want Better Quality? Fire Your QA Team." It's all very exciting, just as anarchism is to university students, but it is equally stupid, destructive and detached from reality. Businesses didn't build this immense IT Operations infrastructure for no reason. And contrary to some opinion, the reasons are still there. You'd think we would be grown up enough to get past this "but the rules are all different now" bovine excrement.

In New Zealand, kids got their driver's licence - until recently - at 15 years old. The few anecdotes we have about Agile success (and Mike's story of one client getting away with dumping QA) are like a 15-year-old saying "See Dad, I drive safe. I, like, got to town and back without, like, killing anyone". IT Ops is about managing IT risk on behalf of the business, and most Dev folk wouldn't recognise operational risk if it licked them.

Forrester, a word of advice if I may: you have good guys from all over the industry. You might want to learn from each other more before sounding off.

Published in The Skeptical Informer, April 2011, Volume 5, No. 2


Comments

Submitted by skeptic on Fri, 2012-06-08 01:45.

Common sense on NoOps

I enjoyed this post from Mike Loukides on DevOps and NoOps.

I found myself able to agree with much that he had to say about DevOps and Operations in general. Specifically though, I cheered his comments about NoOps:

Operations doesn't go away. Responsibilities can, and do, shift over time, and as they shift, so do job descriptions. But no matter how you slice it, the same jobs need to be done, and one of those jobs is operations... NoOps is a movement for replacing operations with something that looks suspiciously like operations... modern applications, running in the cloud, still need to be resilient and fault tolerant, still need monitoring, still need to adapt to huge swings in load, etc... what about NoOps? Ultimately, it's a bad name, but the name doesn't really matter. A group practicing "NoOps" successfully hasn't banished operations. It's just moved operations elsewhere and called it something else. Whether a poorly chosen name helps or hinders progress remains to be seen, but operations won't go away; it will evolve to meet the challenges of delivering effective, reliable software to customers

Amen

Submitted by Tim Coote (not verified) on Wed, 2011-04-13 11:02.

what a useful debate

Can't language trip us up?

As Frank notes, IT's a long way from delivering value to customers. Owning the technology and buildings is comparable to the situation when the auto industry used to own rubber plantations to ensure that they could get tyres.

IT as a function is certainly both very inefficient and not very effective. My analysis is that the division between building and running is part of the problem. As originally envisioned from the business design stage, the role of architect would consider the lifetime aspects of an application and enterprise issues about patterns of use of technology. I don't see this role much in evidence, and for most organisations it's impossible to get sufficiently current and accurate information on how applications relate to the underlying components to do a good job at managing variation anyway.

More typically, the build phase makes it very difficult for the run phase (including future change and retirement) to work. And the run phase typically does not have robust enough control to ensure that it does not get dumped with applications that have ill-defined non-functional requirements. The leadership of the build phase is incentivised to deliver quickly, within budget and with a parting statement that 'it was alright when it left here'. The run phase leadership is focused on predictable service levels and cost management compared to last year. The run organisation has virtually no real hope of understanding what drives the performance or failure of applications in the estate, since it does not usually have a good view of the design or the implementation. As a result, there is invariably too much of the wrong stuff in the estate with a distorting capex recovery model.

An approach to improving the situation would be to use an operating model where one individual owned the whole lifetime aspects of an application. Some aspects of Agile delivery could make this much more tractable. In particular, TDD ensures that you have at least got a repeatable test for functional and non-functional aspects of the application at its touchpoints with other systems and with underlying components. Continuous Integration and Continuous Delivery (not strictly an Agile discipline) give operational tools that simplify incident management. A more joined up team with a single point of responsibility would improve the operations experience. Whether the economics work is a moot point at the moment, but, given the inordinate and largely invisible waste, particularly in larger run organisations, there are quite a lot of savings to re-use. Agile's almost collectivist approach to development where everyone is interchangeable is not scalable.

If TDD is delivering the right set of tests, and the reality of component configuration changing the performance and error rate of an application is accepted (IBM has some great data on this), then what is currently regarded as Operations' domain actually needs to be managed through the source code -> running system pipeline. In this scenario, what Mike's describing as Operations does not exist. With an application centric operating model, smaller overall teams can make rational decisions about how to meet the non-functional needs of the business with a much shorter communications chain.

I think that this is both NoOps in Mike's sense and how devops is envisioned. It does need an architectural wrapper to get organisational consistency, but at least there is a more tractable problem if Agile's refactoring is part of the mindset.

Tim

Submitted by skeptic on Wed, 2011-04-13 11:23.

tribalism

Tim, you made complete sense for the first three paragraphs then none at all. I agree completely with your definition of the problem and not at all with your solution. In one breath "Agile's almost collectivist approach to development where everyone is interchangeable is not scalable" and in the next you want "smaller overall teams ... with a much shorter communications chain". How's that scale? How does "an application centric operating model" work in an enterprise? The moment you have more than one application, you have duplicate operational teams bumbling all over each other. That's where we were in the early Sixties. That's why we have centralised Ops.

Another reason we have centralised ops is specialisation. Smaller teams means multi-skilling. That's not what drives progress - specialisation is. I gave up fixing my own car and plastering my own house long ago. Good luck finding all those super-skilled staff if Agile ever scales. I wrote about that today.

The third reason we have centralisation is economies of scale. If you think you can run a large enterprise's IT cheaply with many small application-centric teams, you're on drugs.

So it won't work, you can't find the super-humans, and it is inefficient. Nothing you said in the last couple of paragraphs sounded anything like my planet.

What we have is a cultural problem. We have the natural consequence of any large community: tribalism. The dev tribe has fallen out with the ops tribe. Slicing the org in different ways won't help: the website tribe will war with the warehousing-app tribe. Agile and Lean and DevOps are thinly disguised attempts by the dev tribe to overthrow the ops tribe. That's simply destructive. We can address tribalism without entirely rebuilding IT. In fact I think the solution is called service management.

Submitted by Tim Coote (not verified) on Wed, 2011-04-13 14:40.

re: tribalism

Hi Rob

I'm clearly struggling to get my point over very well. Most of the cases I've reviewed are at the scale of the largest enterprises (in IT terms), and part of the problem is too much specialisation, leading to groups that actually deliver negative value to the business, but which are invisible to management. In one case, such a group was about 2,000 people, which was needlessly shadowing another 2,000 people in an outsourcer.

I see many IT functions 'investing' in superfluous technologies, failing to exploit them to support business applications and then overcharging the business (through the rest of IT) for the poor investment decision. This problem becomes a serious cost and a bottleneck: you get scenarios where to start any new piece of work your minimal sized team with sufficient skills can be in the dozens, and the test landscape becomes a critical resource.

I don't mean that small teams scale well, but it's easier to manage smaller teams where it's possible to shrink them, and I'm hypothesising that it's possible to shrink the overall team size by changing the operating model, including breaking down the Dev piece into smaller chunks so that requirements can be better understood before they are implemented.

Service management is the right starting point, but I see that as an architectural issue, not an operational one (I would, wouldn't I?). Every application should have a service management framework and suitable supporting tools in the right places in the business applications.

The challenge that I see at scale is that the structure of IT Operations by technology makes it even harder to get service management disciplines to work:
- there is limited leverage to agree service descriptions or levels with the relevant business owners (who are organised by business application)
- the build project scoped out suitable control, test and monitoring for the applications
- the implemented systems drift from the original design with no one retaining a good enough view of how the application behaves in its environment to diagnose and fix any issues, leading to even more bloat in the deployed technology (eg all of the hardware is scaled up, rather than identifying and removing the constraining resource, which is often in the software stack).
- regression testing of any change to the application or any of the components that it depends on is very difficult and expensive as it's mostly a manual process if it's well defined at all. This makes change very risky: compare this to continuously changing large scale apps like Google's.

I've not recently had direct involvement with big builds, but my ex colleagues tell me that they have been able to get the Agile concepts that I mention to work at scale to great effect. Other Agile concepts (like getting everyone to be equally multi-skilled) do not strike me as being likely to scale.

I think that an application centric organisation can be made to work as that's how larger internet systems are now being built. There will be conflicts between different groups, but at least they can be resolved based on meaningful business metrics, rather than second guessing where the value add is.

Am I making sense yet? Can I get the coherent para count above 3?

:-)

Tim

Submitted by skeptic on Thu, 2011-04-14 19:35.

Hi Tim
All 7 paragraphs make sense to me, but I think you take a dated view of service management. I'm talking about SM as a portfolio management approach, the service lifecycle. To me service-centric is more progressive thinking than application-centric.

Submitted by guerino1 on Tue, 2011-03-22 17:22.

DevOps

Hi Rob,

"Mike Gualtieri says "I Don't Want DevOps. I Want NoOps." This shows a clear contempt for the value of Ops. This appears to stem from a profound ignorance of what Ops does"

If I may, I believe Mr. Gualtieri is pointing to the fact that "Ops" (i.e. Application & Infrastructure Operations) is one of the furthest things from what most enterprises would consider to be their core business activities. For example, if you're an automaker, competitively designing, marketing, manufacturing and selling automobiles is your Core Business, not running and supporting the applications, systems and infrastructure that enable your business. In theory, these latter concepts (apps, systems & infrastructure) are mostly commodity-type enablers that every enterprise has to deal with, in most cases, unwillingly.

My interpretation of Mr. Gualtieri's statement is that in a world of Core vs. Chore, where Core is "Core Competencies" and focuses specifically on value-add business activities and Chore is "Non-Core Competencies" and focuses on those activities that are non-business competitive, enterprises (and their leaders) would prefer to focus on Core activities and eliminate as much Chore as possible. We all understand that every enterprise has a limited amount of funding, resources, time and energy to allocate to the things they want to achieve. In the end, every leader who owns or runs a business wants to focus, as much as possible, on Core activities. This is not to say that the people whose job or career it is to focus on Chore activities don't work hard or deliver solid results. It's just that Chore activities are pure "expense" and drain from activities that generate revenue, profits, and innovation.

In a utopian society, we wouldn't have to allocate time, money, and energy on the things that don't add, directly, to competitive value. However, we all know and acknowledge that we live in a non-utopian world and every business will always have its share of Chore activities to deal with. Mr. Gualtieri's "nirvana" is not achievable for many reasons. One big reason is that, over time, a great deal of what is considered to be competitive IT (a small part of most IT budgets) will always be turned into commodities, ultimately creating new work and need for Ops.

The simple fact is that in mid-sized to large enterprises, the majority of the cost for IT is not tied to the enterprise's innovation and revenue generation activities. Any solid leader would want to tip the scales so that the money that goes into Ops would be reallocated/reinvested into innovative and profit-generating activities.

Again, I don't see any of this as being a personal or professional jab at the people who perform the services in Ops. I see it as more of a jab at Chore (i.e. non-Core) business activities. (However, I don't know Mr. Gualtieri and can't directly speak for him. You may be right and he may just not like Ops people.)

Regarding Mr. Gualtieri's comments on firing your QA team to improve your quality... let's just agree to not go there! ;-)

My Best,

Frank

The International Foundation for Information Technology (IF4IT)
Open IT Standards & Best Practices

Submitted by Mike Gualtieri (not verified) on Thu, 2011-03-10 16:24.

Don't You Want Progress?

Firstly, I sincerely do appreciate the detailed analysis of my DevOps/NoOps post.

Thanks for introducing me to the "concepts of risk, cost allocation, portfolio and demand, security, identity, access, provisioning, availability, capacity, continuity, service levels, problem, incident...". These are all wonderful roles for Operations professionals. The question still stands though: why do application developers need "close collaboration"? How about instead a little bit of infrequent collaboration, so application developers have more time to do their job? Collaboration is a very popular word, but what does it mean in this context? Does it mean saying "Gee, that server needs more memory." or "We should really increase the JVM Heap size" or "The new Web Application Firewall (WAF) is slowing things down." or "Maybe we should add another server"? I am sure you don't advocate using DBAs to help application developers create database schemas. Do you?

I have developed my fair share of applications, from complex enterprise applications to more straightforward eCommerce Web sites. I learned early on that Ops (or anyone else, for that matter) can't save you from yourself. Bottom line: You have to understand your platform and design an architecture that provides high availability, performance, scale, adaptability, and security that is cost effective.

I believe DevOps is the result of:
1. Application developers who don't know their craft enough to design applications that are highly available, scalable, performant, and secure.
2. Operations professionals who don't know enough about software architecture to do their jobs behind the scenes.
3. Collaboration for collaboration's sake.

On-demand infrastructure resources in the cloud now provide the catalyst for making this a reality. Operational risk can be mitigated by software architecture that is monitored, fault tolerant, and elastic. That is progress.

Submitted by Machteld Meijer (not verified) on Thu, 2011-03-10 19:20.

What is Ops?

Dear Mike, Dear Skep,

I would like to hear from the both of you what Ops is, from your point of view. Which activities does it include?
I've got the feeling that you are not talking about the same thing. To me this leads to confusion in this interesting discussion.

Regards, Machteld

Submitted by Mike Gualtieri (not verified) on Thu, 2011-03-10 23:23.

Ops' primary function is keeping production systems running

Ops' primary function is to keep operations systems running. That means monitoring, responding to incidents, and resolving them. In addition, ops is responsible for the physical datacenter or hosting, acquiring hardware, and installing hardware and infrastructure software. For new applications or changes to applications, ops must get the requirements and is then often responsible for figuring out how to get the infrastructure built. If it involves a data center, that could be an entire construction management project or hiring Bruns-Pak. Or it could involve identifying new routers that will boost performance.

Submitted by Charles T. Betz on Fri, 2011-03-11 13:26.

More heat than light

This argument has generated way more heat than light. There are so many different ways that IT functions can be aggregated that arguments like this are essentially religious and "not even wrong."

First, I can point to any number of IT books on my shelf that start by breaking down IT into three major conceptual functions, such as "Plan, Build, Run." Notice the existence of the "Plan" function, to mediate between "Dev" and "Ops." That's where we find things like portfolio management.

COBIT calls for four major conceptual areas, essentially Plan, Build, Run + 1:

Plan and Organize
Acquire and Implement
Deliver and Support
Monitor and Evaluate

Notice under Acquire and Implement:
- AI2 Acquire and Maintain Application Software
- AI3 Acquire and Maintain Technology Infrastructure

And under Deliver and Support:
- DS13 Manage Operations

Is AI3 the same as DS13? That seems to be the confusion. Delivery is different than engineering.

I examine these issues also here: Hosting Zone of Contention. Quote:

There is a tendency to see applications as being about development, while infrastructure is about support. This is because production support teams are often aligned with infrastructure organizations, while application support teams are aligned with development organizations. A more accurate picture is to see application and infrastructure as separate “tracks” in the value chain, with both crossing all the major value chain activities.

The basic terminological confusion lies in confusing "ops" with "infrastructure engineering" and "development" with "application management" (especially in its service delivery aspect). Whatever the layer of the stack, one needs to first engineer it, and secondly to run it. But the affinity of application management with software development and infrastructure engineering with operations should never be taken as a given and in larger more mature operations all four roles become increasingly well delineated:

1. Application development
2. Application service delivery
3. Infrastructure engineering
4. Infrastructure service delivery

And then sometimes, fully independent of 1 through 4, we have:
5. "True" operations in a very focused sense: staffing 24x7 ops centers, actually performing the eyes-on monitoring of a wide variety of event feeds and visualizations, providing level one support, serving as the enterprise IT nerve center and communications nexus, and driving the incident management processes (not problem, and sometimes not even change except as a consumer of change notifications). And no, this particular set of responsibilities never goes away.

Dev2Ops and the desire for NoOps really are more about driving the timeline for infrastructure engineering down (which is a fine thing) and also coming up with new models for code deployment and change management that allow more granular and faster release cycles. Neither of these objectives is the same as doing away with the true operations function. All that one winds up with is perhaps driving "Ops" into the application service delivery paradigm. But someone needs to take those pages, at least until developers start writing bug-free code. And that, by my definition, IS "Ops."

Charles T. Betz
http://www.erp4it.com

Submitted by John Worthingto... on Fri, 2011-03-11 02:18.

Time for IT

interesting thread and enjoyed seeing you guys mix it up.... my interest in DevOps was more about the potential for cultural change in IT than anything else.

as someone who manages the relationship with the Business and service level commitments, what is Ops to me? ...

a point in time... IT's either working the way I need it or IT isn't.

NoOps would mean noValue

John M. Worthington
MyServiceMonitor, LLC

Submitted by skeptic on Thu, 2011-03-10 19:32.

What IT Operations means

IT Operations can indeed mean many things. In the Dev/Ops split model, I believe it refers to everything to do with the operation of information technology, not the old goldfish-bowl view of Ops as the operators who change tapes and watch consoles.

So Operations is service management: risk, change, test, operational readiness and acceptance, release, deployment, IT infrastructure, availability, capacity, access security, threat security, problem, all the support disciplines, service level management...

There is of course a grey area: portfolio, demand, service design, information security, application lifecycle...

Submitted by skeptic on Thu, 2011-03-10 19:15.

specialisation

Several hundred thousand years ago, early humans began making flint tools and spending more time on them than they could afford if they were also going to hunt and gather their own food. Ever since, we have developed specialisation as one of the main engines of human progress. Now Agile posits a methodology dependent on polymaths: people who are expert in enterprise architecture, systems architecture, business requirements analysis, system design, software construction, testing AND now operational readiness, operational management and support as well. Wow. Good luck with that.

What will of course happen is that Agile teams will include Ops specialists. I guess we aren't allowed to call that "close collaboration" because what you really want is TAME Ops people. It is true that Ops has too often become "The Department of No". Wiping it off the face of the earth isn't going to make things better. This is Khmer Rouge thinking.

If Agile projects ever manage to get big enough to matter (and that's a big "if", as I believe they'll collapse under their own unspecialised mass), we will then have islands of technology with inconsistent architectures, multiple redundant technologies, and inconsistent redundant operational infrastructures, processes and staff. It is hard to see this as a good thing. So those Ops people will reach out to each other and begin to standardise, eliminate waste, get central consistent policy and... recreate the Ops team.

I don't call Agile progress. I call it a Luddite attempt to take us back to our primitive unspecialised roots. A small group of hippies can make subsistence living look pretty good only if they exist within a larger advanced community. The basic premise of Agile is a good idea (but not a progressive new one): agility, tight coupling with the users, and iteration. The way it is applied as a political tool by Dev cowboys who can't handle constraint is not progress. It's anarchy.

The Cloud is indeed progress, and like any new technology there is a lot still to be discovered about the business risks. But like any outsourcing, you can't outsource governance, and you shouldn't outsource all management. Thinking that Cloud will automagically make all Ops considerations go away is nonsense, propounded only by those who don't understand what Ops do. I say it again: we manage IT risk on behalf of the business.