Utilizing Distributed Dojos to Transform Our Workforce


San Francisco 2016


Sr. Manager Card Technology Advanced Engineering · Capital One

Aimee Bechtle of Capital One’s Card Technology Advanced Engineering team will share how they have used Distributed Dojos to build a workforce skilled in DevOpsSec, public cloud, and automation.

Their Distributed Dojo strategy was formed when they needed to meet the challenges of a large cloud migration quickly and efficiently but were limited by local resources. Reaching out to a prominent retail chain, they learned how to draw from their engineering talent to form short-term, highly focused delivery teams. These teams now work cohesively across multiple locations to solve the challenges introduced when migrating such a large-scale, complex infrastructure to the cloud.

They will explain how, within weeks, several Dojo teams were formed and were releasing automation that not only supported Card Technology’s DevOpsSec and cloud mission but also gave associates new skills that could be spread throughout the company.

Full transcript

Aimee Bechtle

Thank you for joining me today. I am so excited to be here, and I'm so excited to share how Capital One is utilizing distributed dojos to upskill our workforce in public cloud and DevOps.

Woo. Woo.

Oh, I love you guys already.

So most of you know Capital One as a credit card company. I am going to repeat a little bit of what Topo said this morning. It's natural to think of Capital One that way, because we are one of the nation's largest credit card companies, with over 70 million accounts.

What many people don't know is that Capital One is a founder-led, 20-year-old technology company, and we are the nation's largest digital bank.

I'm going to quote Rob Alexander from AWS re:Invent. He's our CIO, and in 2015 he said, "If we're going to continue to win at banking, we have to model ourselves on the best technology companies out there."

So we are investing massively to transform our technology platform. And in 2015, our executive leadership made a bold decision to fully migrate to the public cloud and be out of our data centers by 2020.

Capital One is embracing open source. We not only utilize external open source products in our development process, but we also have an internal open source process where we share and reuse code. All of our applications are built using microservices and REST APIs. We are a 100% agile shop, and we are doubling down on our transformation to DevOps.

Within Card Technology, we are working with hundreds of applications, thousands of engineers, and we have a very large, complex, and heavily regulated infrastructure.

So with that in mind, in 2016, our Card Tech leadership (I refer to the organization I work in as Card Tech) challenged us to accelerate this technology transformation. Sort of a tall order. My heart beats a little faster.

And we are to do this seamlessly, without degrading our customer experience. We are going to grow resiliently and maintain our uptime and availability requirements. We're going to be sustainably well-managed, so we are going to contain costs and either improve performance or at least not degrade it, and we will continue to sustain the quality that our customers are used to.

Security is the top priority. Our customer data must be protected, and our perimeter and our network as well.

So with an order like this, we have to have a plan, and we have to manage this risk. How are we going to do that? We're going to do two major things.

First, we are going to invest in our talent. We are insourcing. We are decreasing our contractor presence. We are upping our training opportunities. But most importantly, we are investing in our existing talent to get them skilled in DevOps and the cloud. And we are supporting them by establishing an enabling organization that helps with the transformation and promotes the DevOps and cloud best practices.

And this is where my story begins.

After the decision to migrate to the cloud was made, Card Technology leadership responded by establishing the Card Tech Advanced Engineering organization. We all call it CTAE. I am joined by several of my colleagues today.

Yeah. Yeah.

And we have a beautiful logo. Anyways, I joined Card Tech back in December of 2015. Went on the interview. I have a large family. I showed them my credit card statement, said I would switch when they saw the balance, and I got the job. No, I'm just kidding.

No, but seriously, I started in December. I was one of the founding members, and I immediately started working on the best practices. That was after I traded my blouses in for some hoodies and my slacks in for some jeans, because we do have a startup culture, and we hit the ground running.

We started by defining the DevOps best practices that we wanted to start disseminating and perpetuating in the application teams.

The engineers in CTAE were all very passionate about solving software delivery problems and operations problems through automation. We love automation. Within Capital One, we are on a mission to fully embrace DevOps with self-sufficient teams. We want to enable those self-sufficient teams through the dissemination of best practices for cloud and DevOps, powered by that automation.

We experienced the early and expected challenges when rapidly moving such a large and complex infrastructure to the cloud. In collaboration with our enterprise cloud team, we began our DevOps initiative with early automation aimed at solving these challenges.

First was tagging standards. We needed to enforce compliance with tagging so that we could accurately track costs and manage our resources. We automated the interrogation of those tags, and if a resource violated the standards, we notified its owner and indicated how much time remained before that resource would be terminated.
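The tagging check can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Capital One's implementation: the required tag keys (`Owner`, `CostCenter`, `Application`), the three-day grace period, and the `Resource`/`Violation` types are all hypothetical, and a real version would read tags from the AWS APIs rather than from in-memory objects.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical tag standard: the talk does not list the actual required keys.
REQUIRED_TAGS = {"Owner", "CostCenter", "Application"}

# Hypothetical grace period standing in for the unspecified time before termination.
GRACE_PERIOD = timedelta(days=3)


@dataclass
class Resource:
    resource_id: str
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Violation:
    resource_id: str
    owner: Optional[str]  # None when the Owner tag is missing or blank
    missing_tags: List[str]
    terminate_after: datetime


def find_tag_violations(resources: List[Resource], now: datetime) -> List[Violation]:
    """Return one Violation per resource that is missing any required tag."""
    violations = []
    for res in resources:
        # A tag counts as missing if its key is absent or its value is blank.
        missing = sorted(k for k in REQUIRED_TAGS if not res.tags.get(k, "").strip())
        if missing:
            violations.append(
                Violation(
                    resource_id=res.resource_id,
                    owner=res.tags.get("Owner") or None,
                    missing_tags=missing,
                    terminate_after=now + GRACE_PERIOD,
                )
            )
    return violations
```

Each `Violation` carries what a notification needs: who to contact, what to fix, and the deadline before termination.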

Dev instance utilization. We shut down all of our dev instances at night, from 7:00 PM until 7:00 AM. It was a proactive move to mitigate the exponentially rising cost of underutilized resources. This automation alone is going to save Capital One over $3 million a year.
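The overnight shutdown reduces to a time-window check that a scheduler runs periodically against each dev instance. In this minimal sketch, the `dev_instance_should_run` and `desired_action` helpers are hypothetical names, times are assumed to be in a single local time zone, and weekends are ignored because the talk does not mention them.

```python
from datetime import datetime, time

# Window from the talk: dev instances stop at 7:00 PM and start again at 7:00 AM.
START = time(7, 0)
STOP = time(19, 0)


def dev_instance_should_run(now: datetime) -> bool:
    """True during the 7:00 AM to 7:00 PM window, False overnight."""
    return START <= now.time() < STOP


def desired_action(now: datetime, is_running: bool) -> str:
    """Decide what a periodic scheduler should do to one dev instance."""
    should_run = dev_instance_should_run(now)
    if should_run and not is_running:
        return "start"
    if not should_run and is_running:
        return "stop"
    return "none"


# Hours per day each dev instance spends stopped under this schedule.
OFF_HOURS_PER_DAY = 24 - (STOP.hour - START.hour)
```

Because each dev instance is stopped for 12 of every 24 hours, its running hours are cut in half; the dollar savings depend on instance types and pricing.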

IP address consumption. We started monitoring IP address utilization as a result of potentially over-allocated subnets.
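Monitoring subnet utilization mostly comes down to comparing the addresses in use with the addresses a subnet can actually hand out. AWS reserves five addresses in every VPC subnet (the first four and the last), so a `/24` offers 251 usable addresses, not 256. The 80% alert threshold and the function names below are assumptions for illustration.

```python
import ipaddress
from typing import Dict, List

# AWS reserves the first four and the last address in every VPC subnet.
AWS_RESERVED_PER_SUBNET = 5

# Hypothetical alert threshold; the talk does not state one.
ALERT_THRESHOLD = 0.80


def usable_addresses(cidr: str) -> int:
    """Addresses a subnet can assign after the AWS-reserved ones are removed."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def subnet_utilization(cidr: str, addresses_in_use: int) -> float:
    return addresses_in_use / usable_addresses(cidr)


def subnets_to_flag(subnets: Dict[str, int]) -> List[str]:
    """subnets maps a CIDR block to the number of addresses currently in use."""
    return sorted(
        cidr for cidr, used in subnets.items()
        if subnet_utilization(cidr, used) >= ALERT_THRESHOLD
    )
```

The in-use count should include every network interface in the subnet, including the ones created for VPC-attached Lambda functions.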

Resource encryption. Encrypting S3 buckets, EBS volumes, and their data. Again, like I said, security is so important to us. We have encrypted thousands of instances.

So we were taking big leaps forward in automation and solving these challenges, but we were also taking small steps back: it seemed like with each new problem we solved, we introduced a new one, or a new opportunity.

So while we were creating work for ourselves, as I like to say, and while I was developing the best practices within CTAE, our enterprise cloud governance and InfoSec organizations were also releasing policies and procedures at a rapid pace.

In our CTAE automation world, that was, "Oh, automate, automate." We saw lots of great automation candidates, and we started to develop a backlog. We harnessed those best practices, policies, and procedures, and the new opportunities we were creating for ourselves, and we put them into a database or data store to establish our intent.

Some of those new opportunities went into our intent as well. For example, with dev instance utilization, when the instances came back up at 7:00 AM, they revealed an immature startup and auto-configuration process in the scripts. We were able to react quickly, remediate it, and drive up the maturity of our auto-configuration and provisioning process.

Another was IP address consumption. Our automation used Lambda functions, and a lot of teams were starting to use Lambda functions too. The irony was that the Lambda functions were consuming many of the IP addresses we were trying to control.

So our backlog was growing, and our headcount was not. Like many organizations, we're all competing for DevOps and cloud engineers, and the supply does not equal the demand. So we needed to rethink our approach and do some out-of-the-box problem-solving so that we can continue to support our mission.

Many of us had heard of the Target Dojo model from attending conferences such as this, and our leadership reached out to Target. I'm afraid everybody's going to call Target after this. So sorry.

Our leadership reached out and flew to Minneapolis, and Target showed them how they implemented and operated their dojos. When our leadership returned, they decided to implement our own version, distributed dojos, across our three locations.

So we started by forming three teams, one in Chicago, one in Richmond, where we were founded, and one in McLean, where we are now headquartered. Each team was staffed with a product owner, a scrum master, and two or three core engineers that were already skilled in DevOps and cloud. We call them our masters.

We reached out to our systems teams, who were already doing a lot of the infrastructure work but might not have these skill sets, and asked for volunteers, and we got quite a few. And we made sure we didn't exceed our two-pizza rule limit in forming our teams. We have a general rule that if we have to order more than two pizzas, the team is too large.

We divided up the intent much more fairly than this pie chart shows, by the way. Each team staked out what it was interested in. Then everybody flew in in the middle of April of last year, and we held our first eight-hour program increment (PI) planning session. That was the first step in establishing the eight-week cadence we follow within the dojos. The teams spent all day grooming the epics at a high level and brainstorming how we were going to move forward with this effort.

Some of our ceremonies are shared across the teams. We share PI planning every eight weeks. We share a mid-PI planning session, which is shorter, about a half day or a few hours. And we share sprint demos at the end of each of our two-week sprints. But the teams themselves do their standups and story grooming locally.

We have completed four eight-week program increments utilizing the dojo and are planning on its indefinite continuation. Actually, they're returning to Target, and they're doing a check-in with them. It has been very successful.

People come into the dojo with varying levels of skill. They might come in with no programming experience, with some programming or scripting, or already knowing how to code, but they definitely leave knowing Node.js or Python. They know infrastructure as code: CloudFormation templates (CFT) or Terraform to stand up the cloud environment, or Chef. They're checking their code into version control and then watching the CI/CD process kick off, because the dojo itself maintains a pipeline for delivering its own code.

They're learning cloud technologies like Lambda, CloudTrail, and CloudWatch. They're utilizing Docker to package up the application. And like I said, we are embracing microservices and REST APIs. We walk that talk. All of our development is with microservices and REST APIs.

The best part is that when they're done with the dojo and have these skills, they go back to their home team. There, they pair up on the DevOps intent on their home team's board. They were once apprentices, and now they're masters, helping to spread their skill set into the teams they work with every day.

So we have been successful, both at upskilling and at delivering solutions. We're producing reference implementations that people can reuse and modify.

We have a blue-green reference implementation that eliminates deployment downtime. We have a pipeline-as-code reference implementation that, out of the box, meets our audit requirements and works with our standard change process. I have the luxury today of having the developer who worked on that here, if anybody has questions about it.
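Blue-green deployment avoids downtime by keeping two environments: one serves traffic while the new release is deployed and checked on the other, and the cutover is a single switch. The class below is a hypothetical, in-memory sketch of that idea, not the reference implementation described in the talk; in practice the switch would be a load balancer or DNS change, and the health check would probe the real environment.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Environment:
    name: str
    version: str


class BlueGreenRouter:
    """Routes all traffic to one live environment; the other waits for the next release."""

    def __init__(self, blue_version: str, green_version: str):
        self.envs: Dict[str, Environment] = {
            "blue": Environment("blue", blue_version),
            "green": Environment("green", green_version),
        }
        self.live = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def serving_version(self) -> str:
        return self.envs[self.live].version

    def deploy(self, version: str, health_check: Callable[[Environment], bool]) -> bool:
        """Deploy to the idle side and cut over only if it passes the health check."""
        target = self.envs[self.idle]
        target.version = version  # the live side keeps serving during the deploy
        if not health_check(target):
            return False  # no cutover, so users never see the failed release
        self.live = target.name  # a single pointer switch is the cutover
        return True

    def rollback(self) -> None:
        """Switch back, assuming the other side still runs the last good release."""
        self.live = self.idle
```

Because the previous release stays deployed on the idle side after a cutover, rollback is the same cheap switch in reverse.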

Our security group approval and provisioning process was a huge bottleneck in provisioning our AWS environments. We are automating that.

We're rightsizing instances. We look at CPU and memory utilization for opportunities to downsize boxes and save money.
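One simple rightsizing heuristic is to flag instances whose 95th-percentile CPU and memory utilization both stay well below capacity. The thresholds (40% CPU, 50% memory) and the nearest-rank percentile are assumptions chosen for illustration; the talk says only that CPU and memory are examined.

```python
import math
from typing import Sequence

# Hypothetical thresholds: the talk says only that CPU and memory are examined.
CPU_P95_MAX = 40.0  # percent
MEM_P95_MAX = 50.0  # percent


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def recommend_downsize(cpu_samples: Sequence[float], mem_samples: Sequence[float]) -> bool:
    """Flag an instance whose p95 CPU and p95 memory utilization are both under threshold."""
    return (percentile(cpu_samples, 95) < CPU_P95_MAX
            and percentile(mem_samples, 95) < MEM_P95_MAX)
```

Using a high percentile rather than the mean ties the decision to near-peak load, so a box is flagged only if it is underused even when it is busy.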

Maturity dashboard. I'm hoping everybody by this point has learned about Hygieia? Yeah. So we are extending Hygieia. The best practices we worked on from the beginning are going to become a rules engine. We're hoping to interrogate pipelines and application environments and report on each application's DevOps and cloud maturity, and we hope it will be open-sourced along with Hygieia.
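A rules engine of this kind can be as simple as a list of named predicates evaluated against facts collected about one application. Everything below is hypothetical: the rule names, the fact keys, and the equal-weight scoring are illustrative assumptions, not Hygieia's API or Capital One's actual best practices.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical facts a collector might gather from an application's pipeline
# and environment. None of these keys come from Hygieia.
Facts = Dict[str, object]


@dataclass
class Rule:
    name: str
    check: Callable[[Facts], bool]


# Hypothetical best-practice rules; the talk does not enumerate the real ones.
RULES: List[Rule] = [
    Rule("code in version control", lambda f: bool(f.get("repo_url"))),
    Rule("automated build", lambda f: f.get("ci_builds_last_30d", 0) > 0),
    Rule("unit tests run in pipeline", lambda f: bool(f.get("unit_tests_in_pipeline"))),
    Rule("infrastructure as code", lambda f: bool(f.get("iac_templates"))),
    # Missing data counts as a failure rather than a pass.
    Rule("resources tagged", lambda f: f.get("untagged_resources", 1) == 0),
]


def maturity_report(facts: Facts) -> dict:
    """Evaluate every rule and report an equal-weight maturity score."""
    passed = [r.name for r in RULES if r.check(facts)]
    failed = [r.name for r in RULES if not r.check(facts)]
    return {"score": len(passed) / len(RULES), "passed": passed, "failed": failed}
```

Keeping each practice as data rather than code paths makes it easy to add or retire rules as the best practices evolve.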

We clean up orphan ELBs and other resources.
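Orphan detection for load balancers can be sketched as finding ELBs none of whose registered instances still exist. The seven-day minimum age, the `LoadBalancer` type, and the in-memory inputs are assumptions; a real job would query the ELB and EC2 APIs and might notify owners before deleting anything, as the tagging automation does.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Iterable, List

# Hypothetical minimum age, so brand-new load balancers that have not
# registered instances yet are not reported as orphans.
MIN_AGE = timedelta(days=7)


@dataclass
class LoadBalancer:
    name: str
    created: datetime
    registered_instances: List[str] = field(default_factory=list)


def find_orphans(load_balancers: Iterable[LoadBalancer],
                 live_instance_ids: Iterable[str],
                 now: datetime) -> List[str]:
    """Return names of ELBs old enough to judge that have no live registered instance."""
    live = set(live_instance_ids)
    return [
        lb.name
        for lb in load_balancers
        if now - lb.created >= MIN_AGE
        and not any(i in live for i in lb.registered_instances)
    ]
```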

And last but not least, my favorite: AMI compliance, something I've been passionate about. We look at the expiration dates on our Amazon Machine Images (AMIs), and we send a notification when one is about to expire. If it reaches the expiration date, we terminate the instance. Again, security is very important to us.
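The AMI expiration policy maps to a three-way decision for each instance: fine, notify, or terminate. The 14-day warning window and the `ami_action` function are hypothetical; the talk states only that a notification goes out when an AMI is about to expire and that the instance is terminated once the expiration date is reached.

```python
from datetime import date, timedelta

# Hypothetical warning window; the talk says only "about to expire".
WARN_WINDOW = timedelta(days=14)


def ami_action(expires_on: date, today: date) -> str:
    """Return 'terminate', 'notify', or 'ok' for an instance built from one AMI."""
    if today >= expires_on:
        return "terminate"  # expiration date reached: the instance is terminated
    if expires_on - today <= WARN_WINDOW:
        return "notify"     # inside the warning window: time to rehydrate
    return "ok"
```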

But that AMI process yet again created a new opportunity for us.

As I mentioned earlier, in our DevOps transformation we're moving to self-sufficient teams. So we are bringing down the silos and transitioning to full-stack, autonomous, multi-skilled teams.

CTAE managers like myself, we are engaged with the application teams, disseminating and promoting these best practices. Do you guys remember Topo's talk this morning where he showed the 16 practices? So we're the enablers. We're the ones on the street working with those teams and getting those practices into the application teams and acting on those.

So I might be working with a team and see that they're resolving a problem or working around an issue, or that they haven't found a workaround at all. And there might be another team in the organization doing the same thing in a different way.

So as CTAE managers, we needed to be transparent and forthcoming with that information and start collaborating. And we did.

The AMI rehydration process was one example. Our public cloud team bakes and customizes the AMIs for all of us to use. We were experiencing some incompatibility issues that exposed that we weren't parameterizing our AMIs and that there was a much better way to do this, and it was causing a lot of frustration on the teams.

So we decided to pull together and solve this problem. Instead of a dojo, we started what's called a dojang. We looked across the organization for engineers who already had something in place, pulled them together into a team, and started to piece it all together.

The process we went through was this. We identified the problem. We formed a team of passionate engineers who were willing to think beyond their own team to solve a problem for the organization. We designed a target system. We scoped out our MVP intent and started working in sprints, keeping to two-week sprints. It wasn't a full 100% commitment for them, because they still had other work to do.

Then we released the MVP. At that point, we could have done one of two things. We could have kept iterating until the system was done, or we could tap into our inner-source model: as I mentioned, we embrace open source, so we could put the remaining intent out for other engineers to tackle and raise the quality of what we had already developed.

We ended up going with a hybrid approach. We iterated, and we're on the cusp of releasing our code. We've delivered a couple of different components to support the AMI process, compatibility is increasing, we are seeing a lot of improvement, and this has been successful.

We are also looking at other opportunities with how we manage our AWS accounts, how we go through the process of establishing applications from idea to conception. It could be endless.

So the dojo and the dojang are the yin and yang of our Card Tech transformation.

The dojo is the Japanese "place of the way." It's where people go to train in their martial arts skills. For us, it's a strategy-fulfillment approach focused on transforming skills, and it's a continuously operating model that rolls over from one PI to the next.

The dojang is the Korean form of the dojo. Hopefully, you've learned something today. It is for problem resolution or what now I'm calling "passionearing," where you get a group of engineers, form a team who are passionate about making an improvement and solving a problem. They come in with a baseline skill set. I can't wait till somebody from the dojo is on a dojang.

And now my favorite word since I started practicing DevOps: it's an ephemeral operation. It runs until we feel we're done, and then it stops until we have another dojang candidate.

We are still learning, and we are still leveraging the lessons learned. We've been into this, what, about seven months now? Since about April.

We now know to have a really good onboarding process and to get onboarding associates acclimated before they start, so they can get up to speed and get exposure to the skills and tools they're going to be using. And we keep evolving that. We were also struggling with transition and knowledge sharing as we switched out dojos.

Team building and tooling is really important. Have fun. Go out to happy hours, lunch, play games. We have chosen really fun names for our dojo teams. Chicago is DevOps Runway, McLean is 7721, the longitude and latitude coordinates of our headquarters building, and Richmond is MYOBI: make it or break it.

Tooling took us a while to settle on. We had HipChat lovers and Slack lovers. We had Confluence lovers and internal intranet document management lovers. It took a while, but we finally narrowed it down to VersionOne, Confluence, and Slack.

100% dedication from the core team members is really important, and at least 50% from the apprentices who are coming in to learn, though you want more.

We value learning as greater than or equal to delivering. By that I mean the dojo needs to be a safe place for people to get over a very steep learning curve, to make mistakes, and to fail.

And that leads me to my last point: recognition, praise, and support. Celebrate even the most minor wins, like code that compiled or passed its unit tests. Also have empathy: these people are transforming their skills, which is scary and can leave them feeling vulnerable. We need to understand that and be patient, and we want to support them through that transformation.

The dojo has been great for kinesthetic learners who might have gone to training but never got to come back and use what they learned. In the dojo, they come in and do real, hands-on DevOps work.

The dojo is an outward visible sign of Capital One's and Card Technology's commitment to supporting our talent through this transition and to being successful. I think this is really important, that they see that we're doing this. We're shepherding them through this process.

Most people think of dojo as a building. And I heard a talk about that today, that it's a place. For us, it's not necessarily a place. We do this virtually. We have people all over the country. So for us, it's our place of the way. For us, it's the way.

So the CTAE strategy has been very effective in supporting our technology transformation. By establishing an enabling organization that came in with an automation strategy focused on solving the challenges of moving to the cloud and transforming our platform, and by enabling people with dojos and dojangs, we are building skilled talent. We are simultaneously embracing and migrating to the public cloud while transitioning to a culture of DevOps.

It's been a very powerful thing to watch.

I put a quote down here that I hear all the time, because we can all get overwhelmed. Even just with what you've learned this week, your head's probably spinning. And we always say, and I hear it during PI planning, "The best place to start is to start."

Start small. Just pick a place. You don't have to boil the ocean. That's been a very effective way.

And to close out what I said about Card Technology successfully and simultaneously migrating and transforming, I also want to discuss the nuances between the Target dojo and ours, since I had the privilege today of talking to Ross, one of the pioneers of the dojo model at Target.

We took a slightly different approach. We developed our backlog and our intent, and we are training up associates to help us solve our intent. Correct me if I'm wrong, but in the Target model, teams come in together, and the mentors and coaches inside the Target dojo help each team resolve its DevOps intent.

So we have a little bit of a different way of doing it.

And I would be remiss if I didn't highlight Capital One's open source products. Hygieia, I think everybody knows, is a DevOps dashboard. And Cloud Custodian is a policy and governance tool that governs your AWS account with the policies you establish.

So we don't just consume open source, we also contribute back.

And that's all. Thank you very much.

Q&A

Love to take questions. Hopefully, I can answer them.

Does anybody have any questions?

And I have my army here.

Q: So it's actually more of a repeat, but a little more about the contrast between your dojos and Target's. I didn't quite catch that.

A: The intent is flipped. So we are the ones that own the intent, and we bring our associates in to help us do our DevOps and development that the Capital One associates will benefit from, versus a team might come in to Target's dojo with their own intent, and the Target dojo helps them resolve their intent and upskill.

Q: I might be heard throughout the room without a microphone. So I don't know if I'm being recorded.

A: I can hear you.

Q: I'm William Judd at Genesys. I've talked to a lot of companies that have had traditional project management offices for many decades, and it's obvious that as we transition to a model where we're delivering features every few minutes, there are no projects. So I wanted to understand from you, just briefly if you could share, what's your experience of the project management office in the context of DevOps transformation, and what's the role of the PMO?

A: I can't answer that because we don't have a PMO.

Q: You have never had one?

A: No. So we have product owners, and we are organized around our market segmentation, and the product owners establish intent, and the application teams resolve that intent. So I don't know if that answers your question.

Q: It does, yeah.

A: I don't know. Does that make... I'm trying to... And honestly, I haven't ever worked at a place that has a formal PMO.

Audience Member: So actually what we did several years ago is we moved to the Scaled Agile Framework. And at that point, we started transitioning our PMO from your traditional command and control, and we actually started doing our planning through value streams and through product management. And so that really paved the way to be able to enable this type of interaction.

Q: But do you still have a scrum master and agile process that you administer?

A: How does our scrum master work the agile process?

Q: Yeah.

A: The same way as on any agile team. We have a fixed scrum master for each of our dojos, who applies scrum principles to that dojo. Does that answer your question? Not sure I...

Q: Yeah. So these people are embedded in your teams, in the product teams?

A: Mm-hmm. Yes.

Q: So just in terms of when you run your dojo, do you run individuals through, or do you run entire teams through?

A: Individuals.

Q: Or does it depend on the size of the team?

A: We run individuals.

Q: As a group, though, right?

A: As a group. Yeah, individuals come from different organizations and stay for eight weeks and work through our intent.

Q: Is there a specific reason you chose to go that direction versus doing teams, or was it just that's what ended up happening?

A: Yeah. Volunteer-based, so we would take what we could get.

Audience Member: So please repeat the question if you could. Thank you.

A: Oh, here. That's right, I have a microphone here. He was asking why we chose to do the individual volunteer approach versus a whole team coming in, and our answer was because we took a volunteer approach to it. And by the way, we have thousands of people that we need to be working with, and that's a lot to manage as well.

Q: Do you ever get pushback from people who, they can't do it, their manager or their current activity doesn't let them do the work?

A: Yes, we do, but we try to be very careful to take volunteers who will commit and are available. Of course, we have still had to switch people out when priorities shifted on their teams.

Moderator: And Aimee, we're actually out of time.

A: Okay.

Moderator: So if anybody else has any other questions, feel free to come up and speak with her, but we have to switch out the room now.

A: Thank you.

Thank you.