Las Vegas 2020

Chasing the Unicorns at T-Mobile

Twelve-hour outage bridges, worn-out headphones, 90% unplanned work, and 25TB of randomly corrupted file systems were normal business for T-Mobile developer platforms. When the foundation of where software delivery happens is the bottleneck, throughput remains buried under a large pile of debt. Ripe for improvement, T-Mobile has begun to embrace DevOps principles including transparency, telemetry, post-mortems, and continuous experimentation to spark a turnaround of historic proportions.


Listen as Chris Hill, Senior Manager of Developer Platforms, walks through a journey capitalizing on T-Mobile culture and desire to create experiences customers love. The culture, otherwise known as "Team Magenta," led to an appetite for change and now has teams achieving up to 30x throughput gains and decreased deployment pain.

Full transcript

The complete talk, organized by section.

Chris Hill

Chris Hill: Good morning, Las Vegas DOES. My name is Chris Hill, and I'm from T-Mobile, and I run developer platforms.

I'd like to talk to you all today about chasing unicorns. A unicorn, in my book, is an enterprise that has mastered software delivery at scale. Here at T-Mobile, we are actively improving every single day so that we can eventually get to that benchmark of what it means to be a unicorn and deliver value at scale to our customers.

I'm part of a group called Product and Technology, and my space primarily has been in the developer platform arena. I'm highly passionate about this area, and I'm really excited to talk to everyone about creating a developer experience that we can all be proud of within a very large enterprise.

This is what I'm going to talk about today. First, I'm going to talk a little bit about why it makes sense to invest in developer experience. Then I'll talk about transformation fatigue and what it means to have multiple conflicting transformations all done at the same time, and how we may actually be diluting the impact of the underlying spirit of the change we're trying to make. I'll also go into how, in the last two years, we've transitioned out of the chaos domain from a developer platform perspective, and then I'll go over the lessons learned.

Now, I'm going to start us off in an area that I think a lot of us don't really like to talk about, and that's onboarding. This is where I feel most developers first lose their initial spurt of motivation. What I'm referring to is inheriting a software project.

I have inherited many software projects throughout my career, and every one of them feels like I stepped in the middle of an IKEA build cycle, and all the parts were missing, and there are no instructions, and there's no support line, and all the screws are stripped, and I have pressure that I should come out with my first feature next week.

By stumbling into a lot of different software projects that now became my own, I felt firsthand the experience of frustration trying to get up to speed to be effective and contribute value to existing software projects.

I've spent sometimes up to a week, two weeks, just looking for documentation on how to actually get access to the code or get access to the environments, and sometimes I'm left with a series of tickets that I have to raise. As we all know, in a corporation, tickets have SLAs. Tickets usually have an approval workflow. And if I'm really lucky, within that approval workflow, every one of those approvers is actually in the office during that SLA window.

If the first thing in terms of value contribution I make to the business is to go around and disrupt every one of the approvers and say, "Hey, would you mind doing this for me?" I may not be contributing as much value, and I may not have as much motivation in the longer term.

One thing that I've noticed is if you start at a 10 out of 10 in terms of motivation, and your first experience is something similar to what I've just described, before you even look at the code, you may already be at a three. I honestly don't understand why most developers don't run for the hills. I think secretly every time I thought, "No, it's definitely going to get better."

Then by the time I actually got access to the code, I realized that it's just worse than I thought. Not only is it worse because I don't understand any of this code that I didn't write, but I also don't fully understand the value stream and all of the fragmented tools that I have to use just to get the end fulfillment. I hear that you're asking me for a feature. I don't really know how to get a feature into your hands, but let me try and piece this value stream toolset together to figure out how I get you what you want. And if the change muscle wasn't something that had been flexed very often, I've got an uphill battle. This can be extremely frustrating and extremely demotivating.

Honestly, at T-Mobile, customer experience is in our blood. Developer experience has just recently come into our blood. Ultimately, when we think about a T-Mobile customer, we think about a subscriber, and we think about that subscriber telling their friends that they should subscribe, or keeping their subscription longer than they originally intended, driving higher revenues for T-Mobile. Better customer experience equates to a better subscriber experience and long-term growth for T-Mobile.

We also see the same behavior from a developer experience, and the end results are higher throughput, more innovation, more creativity, higher retention. The more investment you make to keep that motivation up from the beginning, the happier your devs will be.

In order to reconcile that, I've asked myself the question: why does this make sense? Why does something like a developer experience equate to results that I can reconcile? There are a couple of ways that I think I've been able to capture this in my head. One is that there's less cognitive load for context switches. It essentially isn't challenging for me to get that value delivery and have that personal fulfillment that I've delivered something that makes my customers happy.

I also feel that there's less wait time within the value stream and fewer people that have to get involved with a change that I need to make. We've done calculations to show that if we take all of our CI/CD jobs that are done on a daily basis and we save one second on average across all of them, it's like we just hired a brand new full-time employee. Just recently, we've been able to shave four or five seconds off every single CI/CD job. This means essentially we just hired a small team.
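The arithmetic behind that claim can be sketched as follows. This is a minimal illustration, not T-Mobile's actual calculation: the talk does not state the daily job count, so the 8-hour workday, the `fte_equivalent` and `jobs_needed_for_one_fte` helpers, and the figure of 28,800 jobs per day are all assumptions chosen so that one second saved per job equals one full workday.

```python
# Assumption: one full-time employee (FTE) works 8 hours per day.
SECONDS_PER_WORKDAY = 8 * 60 * 60  # 28,800 seconds


def fte_equivalent(jobs_per_day: int, seconds_saved_per_job: float) -> float:
    """Daily time saved across all CI/CD jobs, expressed in FTE workdays."""
    return jobs_per_day * seconds_saved_per_job / SECONDS_PER_WORKDAY


def jobs_needed_for_one_fte(seconds_saved_per_job: float) -> float:
    """Daily job count at which the given per-job saving equals one FTE workday."""
    return SECONDS_PER_WORKDAY / seconds_saved_per_job


if __name__ == "__main__":
    # Hypothetical volume: the job count implied by "one second = one employee"
    # under the 8-hour assumption above. Not a figure from the talk.
    jobs_per_day = 28_800

    print(fte_equivalent(jobs_per_day, 1))  # one second saved per job
    print(fte_equivalent(jobs_per_day, 5))  # five seconds saved per job
```

Under these assumptions, one second saved per job works out to one FTE workday per day, and five seconds to five, which matches the "small team" framing. The estimate also assumes that every second a job runs is a second someone is waiting on it.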

What I also think is important here about the experience is, are you empowering rather than impeding? And if you're empowering, is it leading to the results that you care about? That could be more creativity, it could be higher quality, it could be faster.

One big assumption that I make in this reconciliation of experience is that if you're changing the way that you conduct your business or your value stream operates, do you have the confidence of your customers? Ultimately, I think out of the pool of every customer, we've all experienced this idea where change and transformation really equate to this fear of loss. "Well, I've been told this before. I've been told it's going to get better the last time we moved. How is this time going to be different? What is it that you can say to me this time that will change the result? What we already have works. We've already made it work for us." If you're not truly invested in earning the confidence, especially for your shared services, are you doing the best thing for your company? Or is the service actually adequate for where your business needs to go?

I've been with T-Mobile for about two years driving this initiative, and I've always asked myself, if we figured out that the developer experience was valuable a long time ago, would we be further along? I've come to the conclusion that it's a lot more complicated than that. There's no really good answer to that. But what I've discovered is that you can't just take a unicorn's playbook and become the unicorn overnight. This always fascinated me: groups of thousands of people could all know the ideal and right way to do things, and that the big tax to getting to the ideal isn't just knowing what is right and what you're doing is wrong. It's how to actually get there and the journey and the legacy debt and everything holding you back from making that change.

I drew a little picture here. Don't make fun of me; I'm not really good at drawing. But in this picture, I show a life cycle of where I've seen transformations within these organizations end up. I see a new and shiny. Everyone loves this new and shiny on the top. Everyone's excited about using something, and it gets adopted or it fails fast. It doesn't scale. It's the wrong timing. There are too many other things going on. And it ends there.

If it does get adoption, then typically the next step is either centralization or governance. If this is going to be feasible in our working environment, how do we productionize? How do we make this accommodate for all of our use cases? Now, during governance, we may drive the relevancy of our service or our platform or a product or a change completely into this unusable state, and sometimes it never dies. In fact, it just adds on to the pile of debt of existing services and softwares and products that we started last year and the year before that, and it just becomes ongoing debt, almost like a house of cards. Or it becomes too constrained that our customers start to resent us and start to detract and essentially know that, well, we could do better. This is no longer relevant for us. Then we start the new shiny cycle all over again. Ultimately, when we start this new shiny cycle, we probably haven't deprecated the previous thing. I see this all the time from a change perspective.

Change is much more than just following this graph. There are a lot of things that are included with change in an enterprise. You've got people to convince, you've got funding to earn, and that's hard to earn. You have legacy systems to make sure that they're running and satisfying your existing customers. You have to integrate with these legacy systems. You have to take anti-patterns head on and go, "This is an anti-pattern. This behavior needs to be broken." Sometimes you have architectures to rip apart, firewall changes to make, policies to challenge, cultures to evolve. Your culture may not even be in a position where it can immediately move to what you're trying to transform to. And you also have unplanned work to compete with. If the department doing your change or transformation has 90% of their work completely unplanned, then they don't have room to have a thought on how it could be better.

When I first joined T-Mobile, roughly about two years ago, I took on a department that was currently in that chaos sort of realm. In fact, we had 10-hour, 12-hour bridges every single day. I remember them because I went through multiple pairs of headphones, because all of them would start to fray, almost like I had a haircut every day, of those little black pad shards.

Ultimately, I really channeled John Allspaw, the ex-CTO of Etsy, when he said, "Incidents are unplanned investments." I took every 10-hour and 12-hour incident, and I really understood: well, not only can this help us figure out what we're doing wrong with legacy, but this helps us figure out what to do in the future. Usually the communication to our customers could come out in the form of: "I really apologize for this experience. This isn't ideal, but this is what we're doing today to make it better in the future. I understand I don't have a silver bullet for you, but what I do have is how we're incrementally changing so that we can serve you better."

Transformation, I feel, is intended to be fruitful for all, but a lot of times it's painful for some, and it's uncomfortable usually for most. You find comfort in the ways that you were working before because it's known. This is my fourth industry doing digital transformation. I started in semiconductors, then I went to retail, then I went to automotive, and now in telecom. I keep thinking one day, maybe it's going to get easier. Maybe I'm really naive. It never is easier, and who am I really kidding? Every enterprise has their fair share of legacy debt that is holding them back from how to get to that ideal unicorn status, but usually can acknowledge that it could be much better.

It's almost as if you're at the bottom of a hole trying to claw your way out, and the bottom is just this legacy quicksand that continues to pull you down. I can't tell you how many times I've been faced with the decision: do we invest in legacy or do we invest in new? Do we have to invest in legacy or can we invest in new?

Honestly, there's hope for getting out of that hole. Here are some things that worked for us. We turned the unplanned into planned. One of the ways we did that is we took every incident into consideration and got a full return on how we should conduct ourselves as a business so that we don't end up in a series of judgment calls which lead to that incident. That way we can plan and prioritize.

We also made all of our work visible. We've all read the book Making Work Visible by Dominica DeGrandis. If you don't see the activity and the volume of your work in progress, you're harming your ability to make a justifiable analysis. We also had a formal acknowledgment of debt and preemptively set the stage to all of our customers that said, "Look, we have all this debt. We're not going to be able to provide as good of an experience as you would like. Here are some steps we're going to take that are going to be disruptive in the short term that will help us in the long term." It was almost preemptively setting the stage of: come along with us on this journey. We want to make it better.

We also built in the discipline to how we operated. It's really easy when you're operating in chaos not to have a formal runbook, not to bring a buddy with you, not to establish rollbacks, not to estimate how long a change is going to take. As you transition out of chaos, you have to build in this discipline, otherwise you're never going to get out.

I also think we changed the way that we think. Anytime that we showed up to an incident and we knew the change that potentially caused the incident could be backed out, we'd choose back out versus the fail-forward mentality. I can't tell you how many calls I've been on where that is a debate. We could try this patch, and we think that it'll be done in three hours, two hours past our downtime window, or we could just back out and get to a known state in a couple of minutes. Just back out. If you're going to measure the incidents and you're going to use them to learn, you have to be able to praise the flawless execution.

Now you can transition to finding the right questions that will pull you out of this hole. The questions are: do we know what good looks like? Do we actually know, if we were doing this right, what right looks like? How can we measure ourselves for our own success? Where are our bottlenecks? Eliyahu Goldratt says there can only be one bottleneck constraining your system. Well, what is that one bottleneck? Can we not find it because we don't measure it? Well, measuring is now our new bottleneck. Also, consider what standards should be enforced and what standards should remain flexible, but always be open to refinement. And ask yourselves: are we impeding or are we empowering? Do we have a community of support? Do we have customers that believe in us?

I think you transition into: now that we know we're asking the right questions, can we find the right solutions to progress our journey or our transformation? There are a couple of solutions that come to my mind. One is define and refine the best practices that you can control, and then simultaneously challenge the ones that you can't for more refinement. It's important to think ideal and unicorn, but it's also important to think iterations, and here's how we're going to get there, and here's how I will evangelize the support to do that.

Also understand that if you're going to make large-scale directional movements, huge toolset changes, huge pattern changes, organization changes, you only have small windows of opportunity to be able to do that. Take advantage of those small windows of opportunity. Make sure you treat any feedback like gold. I scour through all of our NPS surveys to find the bad feedback. I love the bad feedback because it instantly gets me in a position where I can take the seat and be empathetic with the individual and really harness the frustration on how I can fix that and make it a better experience so that they never end up in that position, or the future developer won't end up in that position.

I think it's really important to also factor in the cognitive load you put on a developer or your customers on an ongoing basis and ensure that they have fewer context switches and that they're using path-to-production toolsets that focus on throughput. One of those examples is the underpinning of our developer platform using GitLab, which happens to combine CI/CD with source control management. If they are so closely involved with the path to production, why not make them the same tool?

Along the same sort of context is this idea of core versus context. What is core to your business happens to be what makes you special. But what is context is what somebody else is good at making special. We took this to heart, and with the underpinning of GitLab, we chose to allow GitLab to host their software in a SaaS for us. T-Mobile shouldn't be able to run GitLab better than they can, and if they can, I think GitLab's in a lot of trouble.

But the implementation of how T-Mobile spins GitLab and how it uses it, and how it maintains its internal network and internal automation sharing and templatization, and maintaining ecosystem and policy and compliance, all of that stuff can remain core to our business. It may potentially move to context, but if we're constantly aware of what feels like context versus core, we have to be able to make future-driven decisions that ensure that we're focusing on what makes us special.

Now, there are some lessons learned, and some of them already came out. One of the questions I really like to challenge almost everyone that brings a problem statement to me is: is this the best thing for your team or is this the best thing for the enterprise? If there's a unique scenario that for some reason a platform doesn't accommodate for, I've always asked the question: if you are able to contribute to the platform and accommodate for this unique scenario, do you think anyone else in the entire enterprise may have a similar scenario that could benefit from that change? The answer is almost always yes. What turned into almost resentment actually turned into empowerment. Well, if you want to change how it works, belong to this community and ensure that the control is democratized throughout your entire business.

One of the things that I mention a lot is that it's not about what I say goes or doesn't go. It's collective ownership through an entire community of what it means to be a software developer in T-Mobile. Then you're empowered to make decisions based off of a much larger scale rather than isolated to one team.

I mentioned transformation fatigue before. There is a passive diluting that can happen to your transformation if you have many simultaneous transformations all happening at the same time. If you're not cognizant of what transformation is most important to your business, just the idea of having more simultaneous ones will dilute the important ones.

I also think it's really important to focus on which constraints cannot move and which ones can. Eliyahu Goldratt talks in his book about subordinating everything else to the constraint; use his framework to really determine how you can turn a constraint to your advantage rather than the disadvantage that is currently in front of you.

I also think it's important to obtain adoption by unlocking the passion. If your transformation is consistently hitting a brick wall, maybe there's a reason for that. If you can't natively or organically create adoption and create a value statement, maybe there's a reason for that. Maybe it is timing. Maybe it's a perfect change, but the timing's wrong. Or there's transformation fatigue. Or maybe it's not the right solution.

Now, a lot of you may know we just merged with Sprint, and we're all really excited to have Sprint as part of the family. What this has generated is a challenge for us to make two companies into one and to operate as one, and to take everything that we've learned from a developer capacity perspective and from delivering enterprise software at scale at two companies and turn it into a community with high economies of scale across both groups. What this means is there's a lot of opportunity within T-Mobile, and I would encourage anyone who wants to be a part of our journey to visit our career site.

That's all I have for today. I really appreciate everyone's time. I appreciate everyone for listening, and I realize that this conference comes at a challenging point, when going virtual creates a lot of strain on what the conference has primarily been about. Now, if you've watched how IT Revolution has been able to gracefully transition into temporary workarounds, it has been a very heroic effort, and I want everyone to acknowledge the fact that putting on this DOES conference in this sort of format, and changing this quickly, is very impressive. So I appreciate the IT Revolution staff. I appreciate everyone who has listened to me, and would love to hear about challenges that you have. Thank you so much.