Product Management Meets DevOps

Las Vegas 2018

Brian Clark

VP Product Management · CSG

Scott Prugh

Chief Architect & VP Software Development · CSG

Evolving product management to embrace DevOps as a differentiator and a means toward continuous improvement across the product life cycle.

Brian Clark is a thought leader with over 20 years' experience in the Product Management space with a passion for driving innovation, growing revenue, developing talent and maximizing profitability. Brian currently leads ACP, which is CSG's market leading North American Cable BSS solution. Prior to joining CSG, Brian held several IT leadership positions at AT&T and EDS.

Scott Prugh, Chief Architect & VP Software Development. Scott supports the North America Development teams that deliver CSG's hosted Billing & Customer Care Platform. Scott has broad experience across development and operations functions from startups to large enterprises. Scott is a Lean enthusiast and his mission is to help others learn and improve their environment to maximize value delivery to customers. Previously, Scott was CTO of Telution and built the core runtime and billing architecture for the COMx product suite. Scott lives in Chicago with his wife and 3 kids. In his spare time, he perfects pizza, enjoys wine and code.

Full transcript

The complete talk, organized by section.

Scott Prugh

Gene, thank you very much, and thank all of you for the excitement and everything that we see around this. We're pretty excited to be back for the fifth year of all this. Pretty hard to follow the Nike presentation; my marketing budget isn't really quite the same, so we'll have to try to do our best.

One of the key themes you do see, as Gene alluded to in our presentation, and you'll see it in the other CSG presentation by Erica Morrison and Joe Wilson, you also see it in Topo Pal and Jamie's presentation from Capital One, and you saw it in Nike, is basically business leaders from across the organization coming together to help deliver better outcomes. Today, we're going to talk about that with product management, engineering, and applying DevOps techniques.

CSG in North America is about a 35-year-old, SaaS-based platform for customer care and billing, and it's the largest in the U.S. We have a pretty good heritage of serving most of the major cable providers. If you get a bill or call a call center, you're most likely using our software, because we have about 60% of the U.S. market: 62 million subscribers, 150,000 call-center seats. The technology stack is everything from the mainframe to Node running on Linux to even components on AWS. We really span everything. The takeaway is that you can apply these techniques to legacy technology. You don't have to have just cloud-based technology to apply these DevOps techniques.

I want to call out to Brian. Brian has been with CSG for 19 years. Brian has been the steward of this incredible platform, and CSG owes you just an amazing thanks for everything you have done. I owe you great thanks for, one, your partnership, and second, your incredible ability to continue to learn and experiment and try new things. We've tried some really great crazy stuff and delivered some incredible value. Thank you, Brian.

Brian Clark

Thank you.

Scott Prugh

Let's give Brian some applause.

I'm going to recap and catch us up to where we are today. We've told pieces of the story over the years, but this brings together a few quick snippets of where we've come. In 2012, we reorganized from having functional organizations - design organization, development, test organizations - into cross-functional product organizations. We applied Lean thinking. We looked at the organization and tried to remove queues in as many places as possible. We applied techniques we call the inverse Taylor and inverse Conway maneuvers to both change the organizational structure and change the technology so that teams could share a lot more and we didn't have very rigid APIs between our systems.

We put in a lot of continuous integration and continuous deployment. Test automation was a huge investment that we made across our legacy systems. Shared telemetry, so all the teams understood what was going on, also known as observability today. Batch size reduction: we cut our batch size in half, halved our releases, and started putting in what we called shared operations teams. Instead of having separate operations teams for dev and QA and also for production, we started having one team that would run both environments.

This delivered us some great results. We reduced what we call our release impact by almost an order of magnitude. When we were putting releases into production, there would be a lot of impact and we'd spend weeks cleaning up. This is the picture we got out of it. It's one of my favorite pictures, and I've shown it before: these are our operations engineers deploying 15.1, which was 2015's first release, and they got to play video games. Prior to this, they'd be running around with their hair on fire, cleaning everything up. We'd have irate customers. This was a great tribute to the things we did with automated testing and repeatable deployments. They got to practice that release 70 times before it actually went into production, so when it went into production, quality was fantastic.

Going into 2016, we still struggled. I had this picture in my head of what was going on. We had the software engineering group on the left wanting more speed and better environments, and we had operations on the right saying, "Why can't we have stability? Every time we do something, we break it." The only thing these groups agreed on was that they both hated CRQs because they took too long to get in, carried large batches of work, and broke everything. Mostly, the road to production was a precarious one. It was borne on the back of a lot of cross-organizational groups like change management, release management, our production operations team, and PMO, and all our customers wanted was to get high-quality features a lot faster.

In 2016, this was the crazy question that I posed, to quote the Nike presentation: why can't these folks be on the same team? Why can't we be on the same team, be fast, stable, and secure, and compete together? What we did was rearrange the teams and get rid of a separate production operations group and bring all those folks together on one team to both build and operate the product together.

We also strove for what I call unimodal IT, which is the opposite of bimodal. We would apply the same techniques to our heritage infrastructure as we would to our new systems. Things like continuous integration, continuous deployment, and automated testing aren't just reserved for new systems. We can do them across all of them. We did things like localized CAB. Instead of having a centralized CAB, we pushed the change to the teams because they're the ones who know best how to implement that change. We implemented things like support swarming, which John Hall will cover, I think, on Wednesday, which is a fantastic way to reduce your mean time to recover.

We inverted the effects of Brents. We had Brents all over the organization, so we surrounded them with teams, shared the work on the team, got things that they did into version control instead of them being a stovepipe for knowledge in these very specific areas. We consolidated backlogs, getting all of the work together in one backlog, added infrastructure automation, and started treating operations as an engineering problem. We basically applied more engineering and less duct tape at the end of the development process.

All this together gave us those great outcomes: reducing the release impact, reducing incidents in the environment by over 70%. But the real thing is then the benefits to the business in growth. We grew our subscriber base over that time 27%, and we also grew transactions on the platform over 400%. Those things would not have been possible without all the work we had done across both the development and operations lifecycle applying these DevOps techniques.

The three themes we'll hit on today are, first, that connecting people to your strategy and improving the product management and DevOps relationship yields better outcomes. The next is that IT cannot be separate from the business. We need to manage product value streams, not projects. And operations is an engineering and a product problem.

We'll look at some other metrics today. One is impact minutes, which is a proxy for time to recover. It's our service-level objective that we manage across the products, and we've shown significant improvements there, over 58%. Releasing on demand and starting to decouple the infrastructure and release features when they're done instead of waiting for big releases: we've improved that over 400%, going from less than 5% to 28% of our features. Finally, delighting our employees, improving our internal Net Promoter Score, the employee NPS, going from 4 to 20, which is a 400% improvement. Brian?
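The percentages quoted above can be checked with a small calculation. This is a minimal sketch, assuming "improvement" means relative change against the starting value; the function name is illustrative, and 5% is used as the baseline for release on demand because the talk only says "less than 5%".

```python
def percent_improvement(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage of `before`."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    # Multiply before dividing so whole-number inputs give exact results.
    return (after - before) * 100.0 / before


# Employee NPS: 4 -> 20 (figures from the talk).
enps_gain = percent_improvement(4, 20)

# Release on demand: "less than 5%" -> 28%. Using 5 as the baseline is an
# assumption; a smaller true baseline would give a larger improvement.
release_on_demand_gain = percent_improvement(5, 28)

print(f"eNPS improvement: {enps_gain:.0f}%")
print(f"Release-on-demand improvement: at least {release_on_demand_gain:.0f}%")
```

Under these assumptions the employee NPS figure comes out at exactly 400%, and the release-on-demand figure is a lower bound that is consistent with "over 400%".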

Brian Clark

Why do I, as a product manager, care about product management meets DevOps? The simple answer is the results. We're in the cable BSS space, and when you look at that market, we own over 60% of that market. It's very mature. Our customers are not growing very much. The CAGR for the industry we're in is less than 2%. When you look at the results we've achieved over the past three to four years, our CAGR is twice that of our competitors. We're doing twice as well as the market. We've added new clients. We've increased our Net Promoter Score. Our technical talent retention is greater than 95%. In a market where there's not a lot of growth, we're positioned to win more subscribers and more market share. To me, the business results are the most important outcome of the two of us coming together.

Scott and I, when we talked about this back in 2015, said we need to establish a practice that we call lean portfolio leadership inside of product management. The goal is simple: make our work visible, connect our people to our strategy, and drive engagement, value, and excitement. There are three parts to lean portfolio leadership. There's the portfolio component, there's the release component, and there's something we call transformational enablement.

I'm going to take you through our story, what we've done over the past three years. All you need is a couple of nuts like Scott and me with some ideas, and then you need to hire some really smart people, and they can get the work done. What I'm going to show you here is what our teams have been able to accomplish.

The first thing we did is we built a service catalog. You're probably wondering, what is that? Think about going to Starbucks and you order a latte and somebody fulfills that for you. That's a service. In our world, and your world too, we have hundreds of technical services. We built out a service catalog so we can get a single pane of glass, see these services, and know what to do with them. Our development teams were telling us, "Hey, we have a lot of this work, but it's not visible. Nobody knows what it is." We built out our catalog with 100 services. Jump forward to today, a little less than three years later, we have over 620 services that we've cataloged. Year to date, we've executed 53,000 service requests, and we're going to end the year at about 70,000. This is all work that was not visible to anybody in the company but the people doing the heavy lifting in the background.

Once we got this visibility, we were able to engage our executive leadership and say, we need to invest in optimizing this work because this is repeatable work that we could optimize. We've been on a path to optimize these services over the past couple of years. Two examples we've achieved: one is we optimized one service that freed up 5,000 hours a year of people's time. That's three people that can now do engineering work. Another service used to take 10 days to get fulfilled. We now fulfill that same day, and we gave it to our customers. Our customers are seeing us as thought leaders, and they're engaged more as well. In our theme of engagement and getting everybody excited, we had a little party at the end. We spent a few hundred dollars. We had some drinks, some food, got outside there in Omaha in a little breezeway, and celebrated this achievement a couple of years ago.

Next, we had large project work that we needed to make visible, so we created something called The Taproom. It stands for technology and product. We do serve beer in there from time to time, but it primarily stands for technology and product. What we did is say, let's make all of our large pieces of work visible on a wall. What you see here is a wall, and the blue is security, red is our client, yellow is cloud, green is technical, and orange is product management. These are the top priorities in each category for our customers. If you want to know what we're working on, you come here to this wall. If it's not on this wall, we don't work on it.

Then we built out the team wall. We took all the 60 teams that deliver for us and lined them up to the epics. Now you can go in and see which teams are working on which epics and which teams are overloaded. We also built an intake column. A simple column. If the work is not on this column or on this board, we don't work on it. Now this has helped us when our executives come and say, "Hey, I have a new number one priority." Instead of saying no, we can say yes, and here's your trade-offs, and what would you like us to do? Because we don't want to say no. We want to say, yes, here are the options, and this board allows us to do that.

In The Taproom, just like the other things, after every PI, we have a little get-together. We have hamburgers or hot dogs. We tailgate in The Taproom. In fact, we have one coming up here in a few weeks. If you guys are in Omaha, you can join us.

The third thing we did is we said, this is really cool that we've started a service catalog and we've got our epics lined up, but how do we get this information out to our organization? We have over 1,500 people that work on ACP. We came up with this concept of tanks. We started with Shark Tank. I run Shark Tank every two weeks with our leadership team, and we go through priorities. Then we take that information and push it to our product owners in something we call Think Tank. Then we take that same information and push it to our scrum masters in Do Tank, because they're the doers of the work. Now we're connecting our priorities all the way down to the people doing the work, and we provide two-way communication back up to the executive team.

We then take the same information and share it with our business people in Biz Tank. We share it with the entire product management team in something we call Team Tank. We have something new that we started called Tiger Tank, where we allow our consulting people and our client-facing people to bring ideas in that we can feed into Shark Tank. Every one of these meetings is every two weeks. We're very big on cadence and synchronization, and it provides us a way to share that information across the organization.

The fourth thing we did is we said we don't want to just tell people to go to SharePoint or send out an email. We want to help people consume information differently and teach them how to fish, teach them how to come get the information. I have a little video here, not quite the quality of Nike, but it's something we put together to show you what kind of stuff we send out that gets our people engaged.

Video

Thank you. Thank you.

Brian Clark

I don't know about you guys, but I've watched that video a lot, and it makes me happy when I see it. It makes me engaged, it gets me excited, and it gets back to one of our themes about engaging our people, doing something differently so people feel like they're connected to our company. By the way, we did this with an iPhone and some free software that we downloaded from the internet. You don't have to have a big budget. You just have to be creative in what you do.

The other thing we've done is started to communicate differently, and that was an example of one. We create these cute little videos or five-minute segments. We sent this out a while back after one of our releases just to outline the value of the release. Within four or five days, we had over 500 people that watched the video from beginning to end. This is people that have already been exposed to the showcase. Now they're listening to the information again, and they know where to go back and find it. We taught them to fish and taught them how to consume our information.

We then moved on to something we call Capacity Insights. As with most organizations, we had this conflict between product and the technical groups and engineering groups. We would say, "Hey, can you guys do more work?" They would say, "No." We'd say, "How come?" They'd say, "We're busy." Then we'd say, "Well, prove it. You can do more work. I just saw somebody go to lunch. He's got free time. He can work on that project." We had this battle back and forth.

In our concept of making work visible, we created something we call Capacity Insights. We took all of our planned work, put it into a tool, and made it visible across the organization. That was so successful that we're taking all of our unplanned work and also putting that inside the tool to make it visible. Now that tension has gone down tremendously because everybody works together as a partner.

A couple of examples of what we've been able to do. First, there's a database and a reporting tool, for which we use BI, and then we use this in planning. This is an example of our release. We're going to do 42,000 hours of work. That's the upper right-hand quadrant. In the bottom left, that's our target allocation, whether it's client work, security work, or technical work. The upper left quadrant shows you where we are as we plan for the PI. Anytime we see something like a big black bar for service advancement that's way above its target, we can have that conversation with Scott and his team and say, "Hey, are we doing the right things?"

In the same exact tool, you can drill down to individual teams and individual skill sets. It's not a way to hold people accountable to what they think their capacity is. It's a tool to facilitate a conversation so we can get on the same page and say, "Why are you spending more time here versus here? Because this is our strategy that we shared with you in all the different tanks." They're able to answer that accordingly. This has been an awesome tool for reducing the noise and making the technical teams feel like their voices are heard when we get to the planning process.
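The comparison of planned work against a target allocation described above can be sketched in a few lines. This is only an illustration, not the Capacity Insights tool itself: the 42,000-hour total and the category names come from the talk, while the target percentages, the planned hours per category, and the 5-point tolerance are made-up values.

```python
# Target share of capacity per category, in percent (hypothetical values).
TARGET_PERCENT = {
    "client": 40,
    "security": 20,
    "technical": 25,
    "service advancement": 15,
}

# Planned hours per category for one PI (hypothetical split of 42,000 hours).
PLANNED_HOURS = {
    "client": 14_700,
    "security": 8_400,
    "technical": 8_400,
    "service advancement": 10_500,
}


def allocation_gaps(planned_hours, target_percent):
    """Return planned share minus target share, in percentage points."""
    total = sum(planned_hours.values())
    return {
        category: round(hours * 100 / total - target_percent[category], 1)
        for category, hours in planned_hours.items()
    }


def over_target(planned_hours, target_percent, tolerance_points=5):
    """List categories planned more than `tolerance_points` above target."""
    gaps = allocation_gaps(planned_hours, target_percent)
    return [c for c, gap in gaps.items() if gap > tolerance_points]


if __name__ == "__main__":
    for category, gap in allocation_gaps(PLANNED_HOURS, TARGET_PERCENT).items():
        print(f"{category:20s} {gap:+.1f} points vs target")
    print("Worth a conversation:", over_target(PLANNED_HOURS, TARGET_PERCENT))
```

The output is meant to prompt the kind of conversation the talk describes, not to score teams; a flagged category is only a starting point for asking whether the plan matches the strategy.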

The last thing we did is something we call work-life balance. We don't want our technical people working weekends. We want them to go to their kids' baseball games, football games, church rehearsals, anything that everybody else wants to go to. We were hearing a lot that our technical people are working 60, 70 hours a week, and we want to keep those people. They're valuable to us. As an organization, we made a commitment to take X percent of our capacity every PI and let the technical teams fix what their problems are. The only requirement was they make their work visible. We created a simple dashboard with acceptance criteria, and we can go look at what they're doing and say, yep, that makes sense, we're investing in the right place.

Some key stats just for this year: we invested over 15,000 hours across 31 teams. We had one team that reduced their pages by 25%. That's pretty work-life changing. We had another team that removed themselves completely from data setup. There were three people full-time doing that, so now those people can do more valuable work. We had another team that was part of a validation process, and they reduced their manual level of effort there by 90%. Just on these few things I picked up here, we freed up probably at least a half-dozen technical resources to go do automation or things that are more valuable to the organization than answering a page.

We also have themes for each of our PIs. There was Olympics, then we had a spring theme, and during summer we had a barbecue theme. We recognize the technical teams in these meetings. It's not just about feature functionality; it's about what the technical teams accomplished. Of course, our fearless leader over here, Scott, also participates. We're not sure exactly what he was doing there, but he was participating. In all seriousness, Scott is a great supporter of what we do. He's visible, he's seen, he participates, and that makes everybody else feel engaged.

What's our secret sauce? You've got Scott and me, and we've got some ideas and passion, but our secret sauce is no different than your secret sauce. It's our people. People willing to jump in and clean a whiteboard for a meeting if need be. People that like to get together and have fun, go out and play volleyball. People that want to engage with our business partners. Are we willing to go talk to the technical teams, go talk to the testers? Are we willing to be vulnerable and share our information and really tell our teams what's going on in the organization? Are we willing to participate? Lastly, are we willing to surround ourselves with a team of people that can make it happen?

I'd like to take a side note for just a second. Three of my team members are here. Can you guys stand up? There's Jill and Tara and Jason. Would you guys stand up? Give them a round of applause.

These guys are awesome. I love these guys. I love coming to work every day. We have fun, but none of this that you saw would happen without them. They're the ones who are creative. They make it happen.

The last thing I would say on a personal note is, why am I passionate about this topic? Obviously, you've seen the business results and the things that happen at work, but I've been at CSG, as Scott said, a long time, and CSG has been good to me. I want to make sure that 10 years from now, 20 years from now, when I'm gone, CSG is around, it's thriving, and the people coming up behind us live in an environment where they feel like transformation is the way it should be: engagement, excitement. That's hopefully the legacy that I leave or that we leave when we move on from this company. Thank you guys for listening to my story, and I'm going to turn it back over to Scott.

Scott Prugh

Thank you. We'll finish up with some examples of why you want to manage products versus projects. Mik Kersten is presenting on this later, and he has a fantastic book coming out on this. This is just the summary of the differences. I'll jump right to the bottom and make the statement, and I made this statement last year, that queues don't learn. Projects don't learn either. Projects are ephemeral. They go away. They're usually a lot of context switching across a matrix organization. Product-based teams stay in place, and they can learn and improve over time.

I want to take you through an example of how we used to manage work after we went through our DevOps transformation. We really had two lists of work. We had our business epics in blue, and then we had our IT projects in orange. What we would do is release our business work, and it would take dependencies on the development teams, which would then in turn take dependencies on the operations teams that would run the products. This cadence-based work would consume those resources. Then we would release project-based work on the other side, and that would collide. What would happen is either business epics would slow down or be delivered with poor quality, or the IT projects would languish and never get finished. We've had some that would languish for really long periods of time and never get completed. This collision of work in the enterprise was driving WIP up on the teams, and things weren't getting done.

Project-based work also tended to result in poorly engineered solutions, since the approach was to get a bunch of folks together, grind something out with a lot of meetings, get it done, and move on. I'll show you an example of that in a minute.

What we did in 2016 with reorganizing the teams, and then 2017 consolidating the work, looked like this: we treat every team as a product value stream. We have one list of work. Now we do cadence-based work release. We do that in two-week, four-week, six-week, and 12-week increments, depending on the size of the work and the complexity. But just because you do cadence-based work release doesn't mean you can't have different cadences, and it doesn't mean you can't release on demand, which is something we've been trying to improve quite a bit.

This allowed us to have this single list. Now we can go through and reprioritize. We can look at the capacity, as Brian said, with Capacity Insights, and then we release that work into the enterprise. We might decide to defer something like an AD migration or some large business epic because we just can't fit all the work in at that time. That may be important, but we have to look at capacity and say, "What's the most important thing now to the business that we have to do?" We might have to do the OS upgrade and PCI because that's coming up, and it allows us to do those types of things.
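The idea of one prioritized list released against available capacity can be sketched as a simple greedy plan. This is an illustrative sketch, not how CSG's planning actually works: the item names echo the examples in the talk, but the priorities, hour estimates, and capacity figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    name: str
    priority: int  # 1 = most important to the business right now
    hours: int     # estimated effort (hypothetical)


def plan_increment(backlog, capacity_hours):
    """Release items in priority order until capacity runs out; defer the rest."""
    released, deferred = [], []
    remaining = capacity_hours
    for item in sorted(backlog, key=lambda w: w.priority):
        if item.hours <= remaining:
            released.append(item.name)
            remaining -= item.hours
        else:
            deferred.append(item.name)
    return released, deferred


# One list of work: business epics and IT work ranked together.
backlog = [
    WorkItem("AD migration", priority=4, hours=3_000),
    WorkItem("PCI audit work", priority=1, hours=3_000),
    WorkItem("Business epic", priority=3, hours=4_000),
    WorkItem("OS upgrade", priority=2, hours=2_500),
]

released, deferred = plan_increment(backlog, capacity_hours=10_000)
print("Released:", released)
print("Deferred:", deferred)
```

With these made-up numbers the AD migration is deferred because it does not fit in the remaining capacity, which mirrors the trade-off conversation described above. A real planning process would also weigh dependencies and deadlines, which this sketch ignores.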

The other thing we do, and the Thoughtworks Technology Radar had this in Trial, is applying product management to internal platforms. I want to say that we fully adopted this, and I highly suggest that you look at this too for your internal platforms. We have product managers for platform and security. Fred is security, and Jason is platform. They are the stewards for those different products, and we treat those like products with an investment, and we prioritize. It's a great way to think about how you should do things in the enterprise.

The other picture is this, and I call this product value-stream management. Traditionally, when we thought about doing development, we did design and build. It looked like this: design, build, test, continuous integration, and artifact and release management. That's where that value stream ended. What we've done is go further and say, look, the run piece comes into this. Things like configuration management, service request management, telemetry and monitoring for your app, security, and incident response are all part of that product value stream. They're not things to be given to another group. They're part of all of that.

The cool thing you get from this is service improvement. Generally with ITIL or ITSM, service improvement was something often done outside of the service. This now integrates it in, and it's something that's done as part of the product every day when you look at that consolidated backlog.

I'll give you a specific example of how operations is really an engineering and product problem, and it's around our PCI. Once we started looking at these things as products and started looking at the capacity that was consumed, we noticed that our audit consumed a huge amount of time. It was about 20,000 hours a year. We were struggling with the audit and the build-spec process. In a perfect world, we would rebuild all servers from scratch with the CIS hardening specs and have the auditors come back when we're done.

The problem is that PCI is an ongoing process. It's something you really need to do every day, and it's also mandatory to continue processing credit cards. The most important thing here is the auditors are not forgiving, but the real thing is that the bad guys aren't forgiving either. What the bad guys want most is to find an exploit on one of your servers. We realized the requirement: we needed to create something that would process all these audits on a day-to-day basis. Carter McHugh, who is here somewhere, was the architect of this, and he gave a great presentation at ChefConf on what we did around this.

The tool we built is called ACT. It's an Asset Compliance Tracking tool, and it brings together Chef InSpec and a database for your build spec. I'm happy to announce today that we're going to be open sourcing that tool. It's actually available out there now. If you stop by the Chef booth, they've got stickers. They partnered with us. It runs all on AWS. You can deploy the code right from there, put it up on AWS, get your own instance, and run it. That's very exciting for us, and it was a lot of work from the teams to get that through, and also our lawyers.
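The general idea of checking every asset against a build spec on a daily basis can be illustrated with a toy example. This is not ACT's code and does not use the Chef InSpec API; the control names, expected values, and host names below are all hypothetical and exist only to show the shape of the comparison.

```python
def failing_controls(build_spec, observed):
    """Return the controls whose observed value differs from the build spec.

    A control missing from `observed` counts as failing.
    """
    return sorted(
        control
        for control, expected in build_spec.items()
        if observed.get(control) != expected
    )


def compliance_report(build_spec, fleet):
    """Map each host to its list of failing controls (empty list = compliant)."""
    return {host: failing_controls(build_spec, state) for host, state in fleet.items()}


# Hypothetical hardening spec and scan results.
build_spec = {
    "ssh_root_login": "no",
    "password_max_days": 90,
    "firewall_enabled": True,
}

fleet = {
    "web-01": {"ssh_root_login": "no", "password_max_days": 90, "firewall_enabled": True},
    "db-01": {"ssh_root_login": "yes", "password_max_days": 90},
}

report = compliance_report(build_spec, fleet)
for host, failures in report.items():
    status = "compliant" if not failures else "failing: " + ", ".join(failures)
    print(f"{host}: {status}")
```

Running a check like this every day, rather than once before an audit, is what turns compliance into ongoing engineering work instead of a periodic scramble.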

The benefits are we went from compliance theater on the left, some 20,000 hours of work, and on the right, we had an 80% reduction. Our CISO, Doran Steineke, is here. He was a great sponsor in this. Thank you, Doran. This was really great for both organizations, security and development and operations, to come together to deliver.

The next thing is around work management visibility. This is a picture of what our work management systems look like. The different colored work items used to exist in different tracking systems in the enterprise, so we never had a full picture of all the work. One of the things we've done with product management is invest and unify and bring that all together. We brought everything together in Jira, so now we have that total view, and all the teams can look in Jira and get a dashboard. They understand what epics they're working on, the features, the stories. They understand where the test cases, service requests, and changes are. When we do a post-incident review, that work goes in there too, and we can see all of that in one place.

We wrote a great paper on this, "Overcoming Inefficiencies in Multiple Work Management Systems." It's out on the IT Revolution site. I highly recommend downloading it. It was a collaboration. I want to thank Dominica, who wrote the great book Making Work Visible, and Rosalind, Pat, and Keanen, who were collaborators on this. Thank you very much. It was a great piece of work.

The final thing on what's next: this is the picture of where we want to go. Impact Minutes, our service-level objective that we track by product, is right now trending to improve at about 50% a year. We want to continue that. How do we get 50% year-over-year improvements in the service-level objective? We're investing in things like system robustness, but also resilience in our people.
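A sustained 50% year-over-year improvement compounds quickly, which a short projection makes concrete. This is a sketch under stated assumptions: the 50% rate comes from the talk, while the starting value of 1,000 impact minutes is a made-up baseline used only for illustration.

```python
def project_impact_minutes(baseline, yearly_reduction, years):
    """Project a metric that shrinks by a fixed fraction each year.

    Returns a list with the baseline followed by one value per year.
    """
    if not 0 <= yearly_reduction < 1:
        raise ValueError("yearly_reduction must be in [0, 1)")
    return [baseline * (1 - yearly_reduction) ** year for year in range(years + 1)]


# Hypothetical baseline of 1,000 impact minutes, improving 50% per year.
projection = project_impact_minutes(1000, 0.5, years=3)
for year, minutes in enumerate(projection):
    print(f"year {year}: {minutes:.0f} impact minutes")
```

Each year's gain is measured against a smaller base, so holding the same percentage improvement gets harder over time, which is one way to read the emphasis on investing in both system robustness and resilience in people.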

The other thing is release on demand. We used to be at less than 5% over a year ago. We're now at 28%. We'd like to get that up to 50%, but eventually we want to get rid of releases. We want to be out of this release process and just release a feature into production when it's ready. We've been investing in architecture, automation, and decoupling of the products. Finally, the employee NPS, we want that to continue going up. What Brian has done with lean portfolio leadership, connecting the people together, and continuing to invest in work-life balance, that helps our people.

The final thing is I want to thank a lot of people. I want to say I'm standing on the shoulders of giants, but it's more like these people carried me here. It really can't be done without this incredible community and mentors that I've had. The first is my good friend Mauricio, who unfortunately passed away in 2011 and did not get to see the rest of this journey. My boss Ken, my operations partner Steve, who really started on this journey, Erica, who's been fantastic to work with, and all the folks at CSG. I think there are 32 of you here. Thank you all for your incredible support.

And then all the other DevOps mentors that I've had. All of you have taken an interest in what we have done as a business. You text me when I'm having problems. You give me advice. These are things that are indispensable, and I really thank the community for their incredible support. Thank you.