Understanding your stack

As teams, it’s essential that we not only design and develop our software, but also operate our software. In my earlier posts, I discussed some ideas around building teams so that they’re cross-functional and all-inclusive to help facilitate this notion. Something that I didn’t cover in those earlier posts was the “why” behind having teams not only write but also operate their software stack.

Perhaps it seems obvious as to why. After all, those who wrote the software have the most in-depth knowledge of it. However, do these folks understand how their software operates? Do they understand how their design assumptions and interactions with the other parts of the system affect the overall system? For example, if service A calls service B and two different teams developed the two services, assumptions made by one team may not fit the call patterns expected by the other. An example of this that I’ve seen is service B exposing an API that makes an insert into a database using a transaction. If service A is calling this API hundreds of times per minute, service B could instead expose an API that allows batching. This particular use case does not invalidate the use of the single call; however, understanding the overall system and its use cases prevents unpleasant surprises. Systems understanding also leads to designing systems that are more usable.
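To make the batching idea concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The table name, column, and function names (`events`, `payload`, `insert_one`, `insert_batch`) are hypothetical and chosen only for illustration; the real service B would have its own schema and database. It shows one possible shape for the two APIs, not a definitive implementation.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical 'events' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def insert_one(conn, payload):
    """Single-item API: every call pays for its own transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))


def insert_batch(conn, payloads):
    """Batch API: many rows share one transaction (all or nothing)."""
    with conn:
        conn.executemany(
            "INSERT INTO events (payload) VALUES (?)",
            [(p,) for p in payloads],
        )


def count_events(conn):
    """Return the number of rows currently stored."""
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


if __name__ == "__main__":
    conn = make_db()
    insert_one(conn, "single call")                # one row, one transaction
    insert_batch(conn, ["a", "b", "c"])            # three rows, one transaction
    print(count_events(conn))
```

A caller such as service A that produces hundreds of inserts per minute could collect them and call the batch API once, while occasional callers keep using the single call. One trade-off to discuss across teams: in this sketch a batch is all or nothing, so one bad row rejects the whole batch.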

When it comes to developing and operating your software, understanding your stack is as critical as understanding the code that you’ve written. At no time is this more evident than during a significant incident (think a pageable event with customer impact in the middle of the night). When an event like this wakes multiple teams up in the middle of the night, engineers should not be figuring out what dependencies exist and where the failure points may be. This type of event requires teams to be familiar with all aspects of their service, such as deployment activity, database, lambdas, interdependent services, frontend applications, etc. Taking it a step further, the on-call engineers should also understand particularly problematic dependencies, for instance, calls to a legacy system.
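One lightweight way to avoid discovering dependencies at 3 a.m. is to write them down in a form that can be queried. The sketch below is an assumption on my part, not a prescribed tool: the service names (`frontend`, `service-a`, `legacy-billing`, and so on) and the `FRAGILE` set are hypothetical. It walks a small dependency map and lists everything a service relies on, directly or indirectly, flagging the dependencies the team already knows to be problematic.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it calls directly.
DEPENDENCIES = {
    "frontend": ["service-a"],
    "service-a": ["service-b"],
    "service-b": ["orders-db", "legacy-billing"],
    "orders-db": [],
    "legacy-billing": [],
}

# Hypothetical set of dependencies the team knows to be problematic.
FRAGILE = {"legacy-billing"}


def transitive_dependencies(service, graph):
    """Return every direct and indirect dependency, in breadth-first order."""
    seen = []
    queue = deque(graph.get(service, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited; also guards against cycles
        seen.append(dep)
        queue.extend(graph.get(dep, []))
    return seen


def fragile_dependencies(service, graph, fragile):
    """Return the subset of dependencies flagged as problematic."""
    return [d for d in transitive_dependencies(service, graph) if d in fragile]


if __name__ == "__main__":
    print(transitive_dependencies("frontend", DEPENDENCIES))
    print(fragile_dependencies("frontend", DEPENDENCIES, FRAGILE))
```

Even a map this small gives an on-call engineer a starting list of places to look, and keeping it next to the code makes it more likely to stay current.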

A multi-disciplinary and inclusive team helps address such areas. They have varied perspectives and think about different parts of the system. They round out a group’s knowledge and understanding. As explained in earlier posts, DevOps culture is not about creating a team. It’s about building an overall understanding of how your software works and operates.


Inclusive Design

In previous posts, I presented a viewpoint about how to organize teams for collaboration. In a way, the previously shown organization methods describe ways to promote inclusive design.

What exactly do I mean when I say “inclusive design”? Inclusive design is the process where a diverse set of folks provide feedback and share in the design process from the start. When I say diverse above, I don’t only mean people who don’t look like you or share your characteristics; I also mean people from different disciplines.

Hearing the perspectives of various people during the design process sounds very obvious. However, in practice, teams often do not seek this sort of inclusive design culture. For the most part, there isn’t a deliberate statement to exclude someone. What ends up happening, though, is that there are several assumptions made about the problem or environment where the end product will be operating. Oh, and add on top of that a deadline that’s coming up quickly or is already past due.

Let’s reflect on an example. A developer must deliver new functionality on a website that allows configuring a unique setting that previously was immutable. This task sounds very simple. An initial approach the developer considers is merely adding a textbox under the application settings, restricting the textbox to the type of input expected (after all, they have heard about input sanitization), and writing the value to the database that already exists. Their immediate team reviews the design, and the developer codes it up and submits the pull request, only to find that a subset of the tests broke and the git automation for secret management pages the whole team. What happened here? This “simple feature” didn’t turn out to be so “simple.” Were the folks specializing in testing consulted on how adding the textbox would affect automation? Did anyone look at the access patterns to the database? Were the designers asked to evaluate the usability of the new configuration option? How about the accessibility of this new setting or handling input in different locales?

Perhaps the example is a bit contrived, with some very cooked-up cases. However, many reading this will relate to a project that suffered because inclusive design principles weren’t applied from the start. (In all fairness, though, the example isn’t all that contrived. I’ve seen each of those happen on separate occasions!)

Here’s a call to action!

The next time that a design task lands with you and you think the design is complete, ask yourself whether you were genuinely inclusive in the design process. Did you leave your desk early in the design phase to talk to others across the team? Did you seek out those with testing, infrastructure, security, or user design experience? Did you try to understand the bigger picture of this task? Let’s make our software easier to use and operate. We owe it to our future selves.