As teams, it’s essential that we not only design and develop our software, but also operate our software. In my earlier posts, I discussed some ideas around building teams so that they’re cross-functional and all-inclusive to help facilitate this notion. Something that I didn’t cover in those earlier posts was the “why” behind having teams not only write but also operate their software stack.
Perhaps it seems obvious as to why. After all, those who wrote the software have the most in-depth knowledge of the software they wrote. However, do these folks understand how their software operates? How all those design assumptions and interactions with the other parts of the system impact the overall system? For example, if service A calls service B and two different teams developed the two services, assumptions made by one group may not fit the call patterns expected by the other organization. An example of this that I’ve seen is service B, making an insert into a database using a transaction. If service A is calling this API hundreds of times per minute, an API that service B could expose instead is one that allows batching. This particular use case does not invalidate the use of the single call; however, understanding the overall system and its use cases prevent unpleasant surprises. Systems understanding also leads to designing systems that more usable.
When it comes to developing and operating your software, understanding your stack is as critical as understanding the code that you’ve written. At no time is this more evident than during a significant incident (think a pageable event with customer impact in the middle of the night). When an event like this wakes multiple teams up in the middle of the night, engineers should not be figuring out what dependencies exist and where the failure points may be. This type of event requires teams to be familiar with all aspects of their service, such as deployment activity, database, lambdas, interdependent services, frontend applications, etc. Taking it a step further, the on-call engineers should also understand particularly problematic dependencies, for instance, calls to a legacy system.
A multi-disciplinary and inclusive team helps address such areas. They have varied perspectives and think about different parts of the system. They round out a group’s knowledge and understanding. As explained in earlier posts, DevOps culture is not about creating a team. It’s about building an overall understanding of how your software works and operates.
In my previous post about the Dream of DevOps, I discussed my vision for how we achieve the true vision of DevOps. Namely how having an inclusive culture around software development where not only developers work in a silo, but rather include disciplines such as software testing, infrastructure, and build/deployment.
What DevOps is not…
Let me first go into what DevOps is not meant to be. It’s not meant to be a separate team. Unfortunately, I see many organizations falling into the trap of saying, “We need a DevOps team.” If the organization read The Phoenix Project or any number of other articles on the topic, they would quickly realize the fallacy of building a “DevOps team”. The whole point is that you want integrated project teams that solve customer problems.
How did we get to thinking we need “DevOps teams”? As eluded to in the prior blog post, I feel that this is because many organizations see this is a new engineering function. Since this new engineering function doesn’t fit within the context of the pre-existing functions (such as frontend or backend engineers), there’s a feeling that this needs to be a separate team, operating alongside other software teams.
Instead of treating DevOps as a new team, an organization needs to come to the understanding that this is a function of a team.
Some organizational styles
While many software engineering organizations may start off with the need for a new “DevOps team”, engineering leaders should begin to see the need to have this function be part of software delivery teams.
How does an organization begin to transform with respect to DevOps? An early way to begin making changes is creating a matrix organization. This is an organization where a software delivery team is comprised of engineers from a variety of different teams managed by multiple managers.
The matrix organization
A good example of this is a team delivering something like a web storefront. This application involves a web frontend for users to interact with, some backend for product listings, shopping cart and checkout, some infrastructure that the store will run on, some build and deployment pipelines, and perhaps even a mobile application. In this fictitious organization, let’s start with the frontend engineers having one manager, backend having another manager, mobile having yet another, and let’s say that the build/deployment and infrastructure are managed by someone else. In a true matrix organization, the project scope would be roughly identified and some seed of folks from each of these teams would begin to work together as a software delivery team. They would wholly understand the needs of the customer and work to build a solution meeting those needs across each of their areas of experience. This software delivery team would likely have a team lead, but ultimately not be managed by any of the managers of the individuals working on the team. This type of organization is shown in figure 1.
Figure 1: An example matrix organization, with project leads denoted in bold.
The hierarchical organization
Contrast this to a classic hierarchical organization where software delivery teams do not exist. Rather projects are defined and are worked on in silos. The backend team may be told that they need to deliver a way to list products and provide a shopping cart and checkout. The frontend team would be told that they can query the backend for a list of products. So on and so forth. In this manner, each team only gets a small glimpse of what they need to deliver without understanding the bigger picture. Designs of one team may be incompatible with the other’s leading to rework later. This type of organization is detailed in figure 2.
Figure 2: An example hierarchical organization where managers are the project leads.
The hybrid organization
A third way to organize is a hybrid of the two styles above. Let’s call it the hybrid matrix organization. In this organization, it generally follows a matrix organization. The key difference is that certain functions — typically “DevOps”, infrastructure, or testing — will remain under a manager with either dotted-line reporting into a project or working with an engineer in a project team that has some interest in taking on those skills. The idea of this is to allow this specialized team to remain intact while providing input into software delivery teams and grow interested engineers on those teams to take on the work the specialized team would have done in the past. Some teams may not see a need for input from this specialized team, in which case there would not by any liaison. Additionally, there may be engineers who do not work with any software delivery team and instead focus on building some shared tooling or have some specialized role that otherwise doesn’t lend itself to working within a software delivery team. This organizational model is outlined in figure 3.
Figure 3: An example of a hybrid organization, with engineers reporting to manager 3 working with liaisons from some software delivery teams.
Contrasting these organizations
Some may say that how we organize doesn’t matter since software eventually gets delivered in either organization. Some software will indeed be delivered in any of these organization styles. However, what will be impacted is how teams interact.
In a hierarchical organization, teams may play the blame game when dependencies between managers aren’t delivered in a timely and high-quality manner. Requests of one team on another for features or addressing defects may be delayed as this requires injecting the requested work into a different schedule that has different delivery priorities. Requests to infrastructure or test teams often come late and these teams are generally not consulted on the software implementation.
In a matrix organization, software delivery teams are empowered to work together on delivering a set of functionality on the same schedule. Feature requests and defects are a normal course of business and prioritized according to the priorities for that software delivery team. There are no requests for infrastructure or test teams as they are fully incorporated into the fold of the software delivery team.
A hybrid organization has some benefits of a matrix organization expect when it comes to that specialized team. If there is a liaison established, the partner in the software delivery team may be keen to take on the work of that specialized team, but they may also be overwhelmed with their normal responsibilities or have been selected without having any interest in being the liaison. This poses a new problem unique to this organization type. In the case where no liaison is established upfront, but the need arises later, similar problems to that of the hierarchical organization emerge. Additionally, there become questions about what other work that a specialized team is delivering.
Of the organizational models presented here, a true matrix organization offers the ability for software delivery teams to deliver software holistically.
What’s next?
While this is relevant to the previous post on the Dream of DevOps, this is the first of two parts on organizing for collaboration. This part focused on three common ways to incorporate DevOps, infrastructure, and test engineers into software delivery teams. The next part will look at some of the foundations of this including diving into organizational research and Agile management practices.
A teaser… what if teams were able to self-organize as the needs of the organization evolved?
By now, most folks have heard of the DevOps movement. This isn’t meant to be a post about what DevOps is or is not, however it is meant as an opinionated piece on building inclusive teams. There’s plenty of articles out there, some phenomenal books, and lots of great talks on what DevOps is. I must define how I see DevOps, and that is as the integrated approach to development, where all stakeholders of a software project work together in a coordinated effort to iterate on delivering a project. A full-stack delivery team if you will. Most importantly, what it is not is a bunch of siloed teams interacting through infrequent narrow scopes of interaction in hopes of delivering on a project.
Early in my career, when I was a software design engineer in test, the predominant style of software deliver was one where there would be many weeks of development time, followed by a shorter amount of time for testing and stabilization. This amounted to a milestone. For large software projects like Windows was (where I started my career), there would be 3 or more of these 3-4 month milestones, followed by a stabilization milestone to iron out all the issues of integrating all the moving pieces. From the perspective of software testing, this put a lot of pressure on software test teams to run fast and find bugs in those testing milestones. By default, this created an “us versus them” culture. You’d hear things like the classic “works on my machine” or the blame game of “why didn’t you find so and so bug” or “this isn’t implemented like the specification stated” (when specifications were available). Testing integration points between teams became even harder, as these were typically different organizations with even fewer communication points. If things worked at first glance, the types of bugs that exist were ones leading to performance issues as the interaction models between features were not well understood earlier. Additionally due to this silo creation, integrating code changes between teams was relatively infrequent, perhaps a handful of times per milestone.
One of the things that I, along with many other test leads, worked on was to partner with our development teams early in the development process. This meant that software testers were seen as equals at the table during the software development process. This led to many interesting improvements. For instance, testers were now helping in designing features. Some patent ideas in teams and related organizations that I worked in were generated by testers! Testers had a unique perspective as they had seen issues come in from customers via forums, conferences, and bug reporting systems. This voice was now heard. They also helped mold features so that features become more testable. Many testers also have a knack for breaking or exploiting software. This type of collaboration between development and test teams led to software that was designed better and was overall more reliable and secure.
In effect, this type of change was an early foray into Agile. Designs weren’t created in a fell swoop, they were iterated on within this team of developers and testers.
Jump to today. Largely software testing teams have been absorbed into development teams. In teams that I’ve led recently, every developer on the team is responsible for testing. There may be a few individuals within the team who have more of a knack for software testing than others, so they may pick up the lion share of test framework and coordination. However, these folks are in the same team, either reporting directly to the same manager or matrixed in. (Organization design will be an upcoming blog post.)
With the increase in focus on delivering software more quickly, DevOps became a hot topic. Often this is seen as paired with shifting from shipping “boxed” software to software as a service. When software made this transition to being provided as a service, it was clear that how we shipped had to change. Shipping software once per year wasn’t going to cut it. The software had to continually get better. Now that software was a service, infrastructure (e.g. how the software was run and operated) came to the forefront as well.
“DevOps Teams” with DevOps Engineers were born. These were folks who are responsible for building the ability to deliver software quickly. Cloud Native Engineers and Software Reliability Engineers/Systems Development Engineers were born out of these “DevOps Teams” to run and operate the software. To those paying attention to what the DevOps movement is, this isn’t supposed to be a siloed team. However, in the recent organizational understanding since this was a new function of engineering it had to be a team. Start seeing the parallels to treating software testing as a separate silo?
With DevOps and infrastructure, being treated as a silo, many of the same side effects that existed with test teams being a silo start to become evident. Take infrastructure that modern software runs on as an example. With infrastructure being looked at as an afterthought, just as testing was, how the software is run and operated takes a hit. In some cases, even the security and performance take a hit due to the workarounds needed to be put in place to operate the software per the original design. Looking at delivering software also takes a hit, though these are sometimes harder to spot. Issues here often come at the cost of build or deployment times, for instance not relying on pre-built artifacts or pulling in too many dependencies, or worse, thinking a monolith is a microservice.
How do we get out of these problems? We apply what we’re learned as software test teams have evolved. To build truly great software, that can be built and deployed quickly, run and be operated smoothly, and is keeping up with what the audience is looking for, we need integrated teams. These are teams where there is expertise in each of these areas. Adding in DevOps and Cloud Native engineers into software development teams ensures that building and deploying quickly as well as reliable infrastructure aren’t an afterthought. These require design alongside the software that’s being built. This is the DevOps dream realized.
In coming blog posts, I’ll cover some ideas on how to integrate these ideas into existing software organizations and how to better incorporate these concepts in the case of reorganizations.