DevOps Confessions
Dominica DeGrandis reads the DevOps Confessions.
Full transcript
Host Intro (Gene Kim)
The next format is called DevOps Confessions.
This came from some amazing advice that we got from Dr. Richard Cook from the safety culture community. He said, "There are certain types of stories that you don't hear on stage. Instead, you hear them after the sessions, most likely at the bar after there have been a few drinks. And that's where real lessons are told, because great practice comes from experience, and experience comes from bad practice."
So in the program committee, we wanted to take these stories that you might hear at the bar and bring them here on stage, so that you don't have to be lucky enough to be in the right place at the right time. And that's what we're calling the DevOps Confessions format.
We've collected these stories, anonymized them, and want to share them with you. So please welcome one of our program committee members, Dominica DeGrandis, who'll be reading one of these anonymized stories. Dominica.
Dominica DeGrandis
Hello.
It is an honor to be here and to be the voice for this confessional. The willingness of this community to embrace the sharing part of DevOps is just amazing. And frankly, it's a real relief to learn that others struggle, too.
And so here we go.
My Fortune 1000 organization had decided to become cloud-first. We had only a handful of people who could spell "cloud," let alone "DevOps." In addition, we had a ton of work to deliver, including a brand-new ERP, cloud integrations, event-driven components, and the removal of decades' worth of middleware and other heirloom systems.
To make matters worse, there were at least ten other high-profile projects in play within the enterprise that had the potential to transform how our supply chain works, how it delivers products, and how it generates revenue.
I was freaking out given the sheer magnitude of cultural, automation, and tooling changes that would be required. As I reflected on my situation, I realized that I was also missing broad executive support for a DevOps transformation. In the week prior, one of our top executives asked me, "What's the root cause of the failure, and who should be held accountable for the interruption in core services?"
I felt like I was in the middle of a chicken-or-the-egg scenario. If DevOps is all about CALMS (culture, automation, lean, metrics, and sharing), how am I going to drive a DevOps transformation if we don't have that culture, a tradition of automation, lean thinking, or a focus on metrics? Is culture an input or an output of DevOps? Can culture be shaped by working differently and devoting daily time to improvement?
Despite all the headwinds, I decided that we could not be successful in the long term without leaning into first principles from Lean, DevOps, and continuous delivery.
Much of the journey has been completely organic, and at times it feels like pushing a big boulder uphill, as I spend much of my time convincing teams of the benefits of loose coupling, a lean mindset, and continuous improvement.
On the best of days, I've been able to inspire teams to write more automated unit tests, increasing test coverage and making deployments safer. And on those days, I'm gaining allies and converts to new ways of working. These teams start to work differently, although they may struggle from time to time.
The bad days, oh my goodness, there have been some bad days. Immediately, the memories come flooding back.
We had written a prototype program that would remove unused resources in a non-production cloud sandbox, and everything worked well for several months. We were saving money and keeping the environment clean. Then one day, the cloud sandbox was removed while the cron schedule remained. We came in the next day to a huge dumpster fire, because the original cleanup job and its schedule had somehow jumped across to the production environment.
Hundreds of resources, such as web apps, functions, Service Bus subscriptions, and API Management registrations, were deleted. At first, we had no idea what was going on, what to do about it, or how to stop the bleeding. After a couple of hours, we figured out what had happened, but our reputations and our relationships with our delivery teams took a hit.
Luckily, the job timed out before it could do more damage, such as deleting the production data lake. By the middle of the afternoon, many of the resources had been redeployed by their delivery teams, although a handful of items were never fully recovered.
This event left an indelible mark on the organization, and it has driven a large body of continuous improvement work, such as separate accounts per environment and auditing pipelines for random blocks of code.
Our bad days have been opportunities to model calmness and focus, and they have taught us how to nudge the culture, the practices, and the mindsets toward the next target condition.
Ultimately, the new ERP system was implemented on time, along with the new event-driven cloud integrations. Was it perfect? Absolutely not. But we have indeed started to work differently, and I do see signs of a different culture, different practices, and different beliefs.
One of my favorite examples is a team that's been applying continuous improvement to its CI/CD pipeline practices for Databricks and machine learning. I believe these outcomes will serve as a foundation for more continuous improvement and, hopefully, more impactful outcomes for the enterprise.
As a DevOps change agent, I seldom had everything I needed to be successful. At any given time, I needed more money, C-suite support, technology, experimentation, time, or open-minded teams.
But looking back, it's important to note that many of our breakthroughs came from thinking about the next best things that we could work on, rather than waiting for the moon and the stars to line up perfectly.
Thank you.