Can game theory help break down DevOps silos in your organization?

High-performance organizations have some key characteristics in common. One, they leverage complex systems. Two, they mitigate the challenges of that complexity with automation. But organizational silos often hinder the collaboration, shared responsibility, and innovation necessary to implement effective automation and instill best practices. In my opinion, game theory tells us how to break down these silos by shifting teams’ motivations and dynamics. Change the culture, and you can change the game from being zero-sum to one with mutually beneficial wins across the board.

DevOps happens to be a perfect example. Organizations have converged on a strategy of directing Dev and Ops teams to each automate whatever they can. However, each team encounters issues and barriers that require mutual communication and collaboration to resolve. Whether the cause is oversized demands on shared resources or on time, these teams get stuck in the pit of the J-curve of transformation and can’t climb out while they remain at odds, defending their own siloed interests.

Siloed groups within an organization exist in a state of contention when the risk of doing otherwise is high. This state reflects the game theory concept called the “prisoner’s dilemma”: a scenario where players acting out of rational self-interest are pushed toward an outcome that is worse for everyone.

The dilemma is also familiar in popular culture: the British game show Golden Balls pitted two players against each other for prize money. Each had two choices: select a “split” ball to potentially share the cash, or a “steal” ball to try to take it all. If both players choose split, they split the reward. If just one player chooses to steal, that player takes all the cash. If both choose to steal, they both walk away with nothing. In this situation, each player has the incentive to steal. The show’s designers counted on this: their motivation was not to pay out the prize every time. In most cases, both players tried to steal, and no one got anything.
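The rules above can be written down as a small payoff table. The following is a minimal sketch, not anything from the show itself: the prize amount and the function names are assumptions made for illustration. It shows why stealing is tempting: against a player who splits, stealing pays more, and against a player who steals, it pays no less.

```python
JACKPOT = 100  # hypothetical prize amount, chosen only for illustration

CHOICES = ("split", "steal")


def payoff(a, b):
    """Return (player A winnings, player B winnings) for one round."""
    if a == "split" and b == "split":
        return JACKPOT // 2, JACKPOT // 2  # both split: share the prize
    if a == "steal" and b == "split":
        return JACKPOT, 0  # A alone steals: A takes everything
    if a == "split" and b == "steal":
        return 0, JACKPOT  # B alone steals: B takes everything
    return 0, 0  # both steal: nobody wins anything


def best_responses(opponent_choice):
    """Choices that maximize player A's winnings against a fixed opponent choice."""
    results = {c: payoff(c, opponent_choice)[0] for c in CHOICES}
    best = max(results.values())
    return [c for c, winnings in results.items() if winnings == best]


if __name__ == "__main__":
    for a in CHOICES:
        for b in CHOICES:
            print(f"A={a:5} B={b:5} -> {payoff(a, b)}")
    for b in CHOICES:
        print(f"Best response(s) to {b}: {best_responses(b)}")
```

In this sketch, stealing is never a worse choice for an individual player, even though both players splitting leaves the pair better off than both stealing.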

Switch the environment from a game show hall to a business, and this dilemma is probably all too familiar for those of us in DevOps. We’re in a daily situation where our needs and demands are often at odds, and where we’re too often butting heads. Collaborating – and getting better results – requires changing the game. A brilliant example of such a shift occurred on the Golden Balls game show, and it effectively broke the game. A player came on who guaranteed they would steal no matter what, but said they would split the money after the show. The other player realized the only chance of winning anything was to choose “split” and go along with the plan. The first player chose “split” as well. Other players repeated this strategy until the show was canceled: it became more expensive and made for bad TV.

Successful organizations have mimicked this shift, applying this theory by putting both development and operations teams on call together and, hence, creating shared responsibility (and reward) for wielding software.

Today, high-performance organizations embrace DevOps by eliminating manual tasks and freeing engineers to work more productively. The DevOps movement has produced a shift towards Site Reliability Engineering (SRE), which focuses on ensuring that deployed software is reliable and highly available.

Along with the rise of SRE comes a focus on observability. Where services have long been a black box, solutions like distributed tracing bring clarity to their inner workings. In reality, organizations seldom trust their staging environments. It’s simply not possible to emulate the wild dynamism of production microservices environments in staging. Observability solutions fill the need to get information out of production systems. They equip engineers with the context, status, and expectations needed to understand application communications. Better yet, observability offers critical visibility into all the various systems at work to make a solution operational. It shifts the focus from narrow to holistic.

Distributed tracing consumes and makes sense of signals from applications. Engineers can know how many times a system is retrying or backing off. They can understand information ranging from request parameters and the query statements affecting a service to the timing of issues and top-level exceptions. Distributed tracing also makes dependency graphs possible, presenting real insight into immensely complex architectures where thousands of microservices are always communicating. Engineers can see when things start going wrong, from service issues to slowdowns. They can also recognize when they’re fixed and move on efficiently. The prisoners of our dilemma thus become co-collaborators in their mindset and their work.
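To make the idea of retry counts and dependency graphs concrete, here is a minimal sketch in plain Python. The span records, their field names, and the service names are all hypothetical and heavily simplified; real tracing systems carry far more data and do not necessarily use this format. The sketch derives a service dependency graph from parent/child span links and counts retried calls per service.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified span records (field names are assumptions for illustration)
SPANS = [
    {"span_id": "a", "parent_id": None, "service": "frontend", "retry": False},
    {"span_id": "b", "parent_id": "a", "service": "checkout", "retry": False},
    {"span_id": "c", "parent_id": "b", "service": "payments", "retry": False},
    {"span_id": "d", "parent_id": "b", "service": "payments", "retry": True},
    {"span_id": "e", "parent_id": "b", "service": "inventory", "retry": False},
]


def dependency_graph(spans):
    """Map each calling service to the sorted list of services it calls."""
    by_id = {span["span_id"]: span for span in spans}
    graph = defaultdict(set)
    for span in spans:
        parent = by_id.get(span["parent_id"])  # None for root spans
        if parent is not None and parent["service"] != span["service"]:
            graph[parent["service"]].add(span["service"])
    return {caller: sorted(callees) for caller, callees in graph.items()}


def retry_counts(spans):
    """Count spans flagged as retries, grouped by the service that handled them."""
    return Counter(span["service"] for span in spans if span["retry"])


if __name__ == "__main__":
    print(dependency_graph(SPANS))
    print(retry_counts(SPANS))
```

Even in this toy form, both teams are looking at the same picture: which service calls which, and where retries are piling up.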

To bring this back to DevOps culture and changing the game to disarm silos: observability promotes empathy through clearer understanding, transparency, and communication about our systems.

The Westrum model describes three types of culture in IT organizations: pathological, bureaucratic, and generative. The generative model – the good one if your goal is fostering collaboration and the best result in any prisoner’s dilemma – is defined by the following attributes (pathological cultures oppose these values; bureaucratic cultures fall in the middle):

Information is actively sought.

There’s cooperation in sharing information within the organization.

Messengers aren’t shot.

They’re rewarded for bringing news forward that helps better the organization.

Responsibilities are shared.

Cross-functional collaboration is celebrated and rewarded.

Bridging is always encouraged.

Retrospectives are constructive.

Failure prompts inquiry within the organization, and that knowledge is shared.

New ideas are welcomed.

Novelty is celebrated and rewarded.

A common theme across all of these points is the celebration, reward, and positive reinforcement of behaviors beneficial to the organization.

As cultural goals, DevOps, SRE, and observability deliver value beyond their immediate utility.

Gene Kim said that “DevOps allows small teams to achieve great things” – I want to add that observability enables those small teams to be productive while doing those things. Observability empowers engineers to find and fix problems. It lets organizations understand their applications’ behavior and increase the shared value of that understanding across silos.

Ultimately, it invites organizational cultures that build more reliable systems and ship software faster.

Kevin Crawley is a Developer Advocate at Containous
