Getting Business Results Faster With FedEx DevSecOps Fast Lane

Europe 2021

Matthew Pegge

Managing Director IT · FedEx

Ilia Shakitko

DevXOps, Site Reliability Engineering & Innovation Lead · Accenture

In this talk, Matt and Ilia explain their journey to enable DevOps in a traditional program that supports route optimisation services for FedEx package delivery, and how leadership behaviours, enterprise-wide cooperation, engineering practices and a shared quality mindset changed the hearts and minds of people, resulting in supercharged delivery in this program.

In such a large enterprise, it’s not possible for a team to just start “doing DevOps” and leave it at that: there are organisational procedures and functions, established historically and distributed geographically, various delivery commitments to customers, and ongoing operations.

In reality there is limited capacity for such moves. We all know the theory - everyone who is participating in value delivery needs to be “on board” in supporting the change:

Leaders to model behaviours and allocate time for improvement.

Change Management to make processes better so products can move fast.

Security and Compliance, to shift left and let possible violations be caught at early stages.

Infra and Architecture, to enable the ecosystem and let local architectures and solutions emerge.

This story is about such 360-degree cooperation, where leaders, various organisational functions and Agile Release Train teams joined forces and made change real.

Full transcript

Matthew Pegge

Okay. Hello everybody, and welcome to our virtual talk today on our approach to how we created a DevSecOps fast lane. My name is Matthew Pegge, and I'm Managing Director of IT at FedEx Express. One of my main responsibilities is to lead the business agility transformation in Europe, and today I have Ilia with me. Ilia, over to you.

Ilia Shakitko

Thanks, Matt. Hi all. I'm Ilia Shakitko. I'm a DevSecOps, Site Reliability Engineering, and Innovation Lead at Accenture, and I'm working together with Matt on the FedEx business agility transformation in Europe as a technical coach.

Let's go and give the overview of FedEx. Matt?

Matthew Pegge

Thank you. Hopefully everyone has, at some point in their life, received a package from FedEx, or at the very least seen the film "Cast Away," so you've at least heard of FedEx, if only in the movies.

For those of you that haven't, FedEx as a corporation has over half a million employees and an annual revenue of about $70 billion. We're made up of six operating companies: Express, Ground, Freight, Logistics, Office, and Services, and I belong to FedEx Express.

Express was the original and is historically the largest of the operating companies within FedEx, and is significantly the only one that operates outside of the U.S. As you can see, of the total half a million employees, around half of those are employed within the Express operating company, and more or less a third of all shipments that FedEx handles globally are from within the Express operating company.

Next slide, please.

FedEx as a corporation has been experimenting with agile for many years, but we officially launched our business agility transformation for real back in around 2018, when our CEO, Raj Subramaniam, and our CIO, Rob Carter, used the analogy that you can see on the slide of the small, fast fish eating the large, slow fish.

The analogy goes that gone are the days where the big fish, such as FedEx, simply bought out their competitors or ate the small fish. These days, we see small, nimble startup companies, unimpeded by the years of legacy technology and layers of bureaucracy that the big fish can have, attacking small slices of the most profitable bits of our value stream. Left unchecked, this could lead to a death by a thousand cuts.

That's why they explained FedEx needs to lean into business agility and become a more fast, flexible, and focused company. Ilia is hurrying me up there.

In order to support that business agility transformation, we established an enterprise business agility office. This was stood up to build the core business agility framework based around the five key areas that you see on the slide. The model also allows for the operating companies and regions to build their flexible edge while still maintaining enterprise-wide standards and nomenclature, so we don't all need a decoder ring to understand each other.

What do we want to take away from today's session? It's common sense that we want quality user experience, customer and employee satisfaction to be the best, and to continuously improve the value we deliver. However, it's often a hard decision to sacrifice FTE time for non-delivery work. There's always demand for new features and capabilities, internal requests for new requirements, and BAU defects to fix. We don't want to pause the machine or slow down delivery, right? What about ad hoc requests and deadlines? There's a seemingly never-ending cycle of continuously growing product complexity.

On the other hand, when it comes to innovation and improvement, everyone thinks it's a good idea and something that they should support. We encourage teams to adopt new ways of working, apply automation, and improve quality. Yet the reality is there's never time for these moves, or if there is, you stumble upon roadblocks at almost every move. Everyone else is doing their own thing in their own way, and whenever you want something, you get asked to raise a ticket. This doesn't enable change.

As I said, one of my roles is to stand up a European Agile Enablement office in order to lead the transformation in Europe. As part of our broad and balanced transformation plan, one of the key things we decided to do was set up a flagship ART. This is an agile release train that could be recognized as being best in class, and then we would use that to nail it and scale it and spread that across the other release trains.

Again, we focused on a balanced approach to launching the train, taking an existing ART and using external coaches from Accenture SIQ to help coach the ART alongside internal coaches and build a robust ART roadmap. Key to this was maturing our DevSecOps capability, and today we aim to share how we approached this in a bit more detail. Ilia, over to you.

Ilia Shakitko

Thanks, Matt. Before diving into the details, it is important to make a little disclaimer. FedEx is a large enterprise, and this is not a no-regret move in isolation. FedEx enterprise leadership and Accenture SIQ coaches worked together prior to this event, developing an outside perspective on business agility maturity across all operating companies.

Improving technical agility is one of our critical priorities of the transformation. There are more than five change events taking place in our journey, of course, but here we will focus on five main ones. Let's talk about each point now.

There is a selection of DevSecOps health diagnostic frameworks. Together with coaches, we designed a custom approach to give us necessary insight across all areas. That included maturity assessments, surveys, workshops, and interviews.

The product selected for the fast lane was the new ground we entered. Not all processes were established, and plenty of opportunities to improve were found. We started by developing an outside perspective on the DevSecOps maturity and made it our baseline. As you can see, few areas on the top were making their way toward the second level.

The challenge was not the fact that some may think, "Yeah, that's a low maturity area. We got this. Next." The real challenge, and our motivation to have this talk today, is the time and effort it was taking to gain those initial improvements. You can also see some continuous integration ignition, pieces of automation, and tests automated here and there. Hey, what about security?

And about continuous integration, by the way, did anyone see Jez Humble's talk about real continuous integration? I'm curious. Share your thoughts in the chat. Are your teams really doing CI?

Okay, there is plenty of room for improvement. Where to start? What improvements can be carried out by the teams independently, and where is program or enterprise support required?

In a complex product development ecosystem that has ongoing commitments running, our approach was to implement the improvements that make the largest impact by addressing what impedes the flow most. Mapping the delivery pipeline helped here to reveal the main points of rework and delay. We also looked at the areas of control for each improvement opportunity.

The picture and highlighted areas are the common struggle for teams at most of the maturity levels. Is the path to production defined and clear to everyone who is participating in value delivery? Is quality incorporated at all stages of the delivery and collectively owned? Are the teams moving small enough customer-centric value pieces throughout the delivery?

You may have had time to glance over the selected improvement lists. We will see remeasured maturity at the end of this talk, and hopefully you'll join our achievement celebration. How much time and support would you give to your teams to learn, fail, and evolve in such areas?

Now going to psychological safety. Matt, over to you.

Matthew Pegge

Thanks, Ilia. Now we have commitment to drive the change and have improvements embedded as part of our regular delivery. But is your environment safe? I'm talking about psychological safety here. Are your teams and individuals able to take risks without feeling insecure or embarrassed?

The reality is often not what you may think, and creating this safe environment is a key role for an agile servant leader. An unsafe environment causes team members to share fewer ideas and to over-filter them, so they don't share because they don't feel safe. Stephen Smith published a safety check article which gives excellent insight on how to conduct regular exercises to measure and improve team psychological safety.

It's also important to allocate time and space for the improvements to happen. We addressed this on various levels in our case. Firstly, the program management realized the need for the improvement and supported it by incorporating it into the program backlog. Next, the business owners recognized the value added and reflected it in their higher enabler objective business value scores.

Thirdly, the innovation and planning iteration was given more attention, and we tried to limit or reduce the urgent things we needed to finalize. You notice I didn't say completely remove. We're also iteratively improving here. Number four, the product owners were coached on balancing the iteration backlog with user stories and enablers. Finally, one team started to experiment with the 80/20 rule.

Besides the actual hours dedicated, we also considered the natural learning-curve cycle. All improvements, techniques, and tools can't be applied all at once or in a row. You need time for the knowledge to settle. Something may not work for a particular team or technology. It all takes time, and you need room to inspect and adapt. Back to you, Ilia.

Ilia Shakitko

All right. One of the key components in decreasing risk of releases is to reduce batch sizes. When we started to move work batches through the pipeline more often, we started practicing integration, testing, and deployment processes more often, and therefore were able to find and fix problems earlier.

Here I always like to share a little story from my past experience about one team that was undergoing a change together with a program, and they were not really giving full buy-in to shift to smaller batch sizes. However, they were still cooperative, and they decided to give it a chance. They said, "You want the small batch sizes, you will have small batch sizes." But they were a little bit resistant still.

They decided to go really wild with decomposing work into even smaller sizes. Well, that isn't the best move when you're aware of the transaction costs, but that was another learning point. The end of that story is the team was surprised with the amount of completed work and actually fully closed iteration goals.

The next point on the list is work-in-progress limits. It is easy to get this point lost if you stop at just assigning WIP limits in your ALM tool. Embed this into the team working agreements. Develop a set of scenarios on what to do if a column became red. Address this in retrospectives or along the iteration to keep respecting the work-in-progress limit.
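Editorial illustration (not part of the talk): the idea of reacting when a column "became red" can be sketched as a small check over board data. This is a minimal sketch under stated assumptions; the column names, limits, and story IDs are hypothetical, and no particular ALM tool or API is implied.

```python
from collections import Counter

# Hypothetical WIP limits per board column (illustrative values only).
WIP_LIMITS = {"In Progress": 3, "In Review": 2, "Testing": 2}


def columns_over_limit(items, limits=WIP_LIMITS):
    """Return {column: (count, limit)} for every column above its WIP limit.

    `items` is an iterable of (item_id, column) pairs.
    """
    counts = Counter(column for _, column in items)
    return {
        column: (counts[column], limit)
        for column, limit in limits.items()
        if counts[column] > limit
    }


# Hypothetical board snapshot.
board = [
    ("STORY-1", "In Progress"),
    ("STORY-2", "In Progress"),
    ("STORY-3", "In Progress"),
    ("STORY-4", "In Progress"),
    ("STORY-5", "Testing"),
]

for column, (count, limit) in columns_over_limit(board).items():
    print(f"{column}: {count} items, limit {limit}")
```

A check like this only flags the breach. What the team does next (swarm, stop starting new work, raise it in the retrospective) belongs in the working agreement the speaker describes.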

Finally, while technical feedback from integration and tests may not be the first available move, a few techniques that we implemented were to move closer to Scrum events, reducing the overhead on planning and giving acceptance and quality feedback earlier.

Having an enterprise DevSecOps platform and dedicated team who continuously evolve and adapt it to the organization's unique context is one of the key enablers for rapid feedback. We also boosted our environment with enablement and technical coaches who are always there around the product teams to get them on board and support the adoption.

Here we show our continuous delivery pipeline. You may have spotted the common parts such as quality scan, release candidate build, packing, automated tests, and deployments. I believe it is becoming common to integrate these tools with ALM.

We went beyond and extended the capabilities to include critical enterprise functions into the process. That's one of the examples of enterprise cooperation and collaboration. In order to satisfy compliance and future audits, the enterprise platform team works together with compliance teams to understand the requirements and incorporate them into the necessary stages of the pipeline.

Security is developing reusable assets and pipeline references to enable scanning into the delivery and conduct it continuously at various stages. Release and dependency information is gathered from various sources and appended throughout the automated release life cycle. Finally, project and change management services that are historically part of the processes are populated, updated, and closed.

Yes, we still have that box with the purple edge. Did anyone notice? Some of the reviews and approvals have to take place there. The good news is that once a review or approval has taken place, the release continues automatically. But for now, this is our next point of improvement.

Now that we connected all our pieces of the delivery in one automated process, there is a big chance the big bottleneck and rework lie at work acceptance and testing. It is great if your team started greenfield, doing unit tests, component tests, operating in batches, automation, all that.

In our case, we started with a traditional development program where there were shared testing functions and phase gates. Imagine that the first time testers see the features and behaviors that need to be tested is when the work is considered to be done. And what is done, by the way? We'll touch on that in a moment.

Your testers switch from whatever they were testing previously for another part of the program to the new features coming in the queue. It's great if all the specs and behaviors are clear to everyone, but sometimes it requires extra communication to understand what exactly needs to be tested.

As you saw earlier in our story, we mapped the delivery pipeline, and we clearly saw a low percent complete and accurate metric at the testing stage. That means up to 50% of the work is going all the way back to development as rework. It is very expensive to have such a number on the right side of the pipeline.
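Editorial illustration (not part of the talk): percent complete and accurate is commonly computed per stage as the share of incoming work items that the stage could use as received, without sending them back. A minimal sketch of that arithmetic, with made-up numbers that are not FedEx data:

```python
def percent_complete_and_accurate(received, usable_as_is):
    """%C&A for one stage: share of incoming items usable without rework."""
    if received <= 0:
        raise ValueError("stage must have received at least one item")
    return 100.0 * usable_as_is / received


# Illustrative numbers only: (items received, items usable as received).
stages = {"development": (40, 36), "testing": (40, 20)}

for name, (received, usable) in stages.items():
    pct = percent_complete_and_accurate(received, usable)
    print(f"{name}: {pct:.0f}% complete and accurate")
```

With these hypothetical inputs, a testing stage that can accept only 20 of 40 items as received has a %C&A of 50%, which matches the kind of late rework described above.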

We know there is a testing pyramid and quality shift-left technical practices that have to be implemented, also to reduce overhead on heavy and long UI testing, moving closer to unit, API, and component testing. But today's story is to highlight cooperation and collaboration, thus we will focus on points related to that.

One of the first things to do was to get various perspectives together in one room and let them talk. This needs to happen besides the backlog refinement, before user stories are taken into development. It is not about getting three people, a PO, a tester, and a developer, but about getting the three perspectives with whatever number of people is required.

So testers have the chance to ask about happy and sad scenarios, developers understand what are important points to think about, clear some assumptions, or maybe add a question that the PO has to return on.

Defining what ready and done mean was a crucial exercise, as it also incorporated various perspectives and was placed as evolution number one of the agreement. We had to be careful. It was tempting to add all sorts of criteria in the definition of done by PO and quality, as well as definition of ready by engineering.

Our coaches helped to start with the right balance, because if we overload definition of ready, it will become a phased approach again, and we will wait until perfect design requirements and definition to start work. Same with definition of done. If a team would start with too much commitment, things will never be done. It has to be evolutionary and reviewed every few weeks.

Last but not least, vertical slicing. Together with the previously mentioned small batches, moving all pieces that make a minimal value increment to the user made the difference. In particular, it became easier for the end user to understand what's being delivered and provide feedback. Shout out if you recognize the situation where customers or stakeholders are dropping the demo attendance because they don't understand what's being demoed and how to provide feedback to that.

The next thing was to look where the testing is struggling and how efficient it is. We realized there is a continuous stumble at lower environments. We don't yet talk about pre-production. How can we even talk about shifting quality left if there are so many components and dependencies around, so most existing functionality can't be easily tested without having access to real systems?

Service virtualization and stubbing really helped us to speed it up. In particular, we've got no need for dependent services to be up and running when system-level test is needed; reusability of virtualized dependencies for other products; the ability to capture required dependent system behaviors and generate stubs; and the ability to simulate realistic performance. Also, we noticed improved quality feedback time and frequency, eliminated zero-value routine, and improved overall testability.
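Editorial illustration (not part of the talk): the talk does not name the virtualization tooling used, so the following is only a minimal in-process sketch of the stubbing idea. A stand-in replays recorded responses and simulates latency, so the code under test runs without the real dependent system. All class, function, and data names are hypothetical.

```python
import time


class StubService:
    """In-process stand-in for a dependent system (hypothetical names)."""

    def __init__(self, recorded, latency_s=0.0):
        self.recorded = dict(recorded)  # request key -> canned response
        self.latency_s = latency_s      # simulated response time in seconds
        self.calls = []                 # requests seen, useful for assertions

    def get(self, key):
        self.calls.append(key)
        time.sleep(self.latency_s)
        if key not in self.recorded:
            raise KeyError(f"no recorded response for {key!r}")
        return self.recorded[key]


def describe_route(service, depot):
    """Code under test: depends only on the service's get() method."""
    stops = service.get(depot)
    return f"{depot}: {len(stops)} stops"


# Recorded behaviour of the (hypothetical) dependent system.
stub = StubService({"AMS": ["stop-1", "stop-2", "stop-3"]}, latency_s=0.01)
print(describe_route(stub, "AMS"))
```

Because `describe_route` only relies on `get()`, the same stub can be reused by other products that call the same dependency, which is the reusability benefit mentioned above.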

It is hard to imagine a large enterprise having no security and compliance involved in delivery and releases, right? But let's face it, when are these parties usually involved in the process? I clearly remember the time when DevOps was a rising hype and everyone was looking at unifying development and operations: some just merging two teams, others looking more broadly at the walls of confusion and removing them between those two functions.

But then we had security saying, "What about us?" Business intelligence, infra, compliance: "Hey, you've forgotten about us as well." It was a nice rising of DevSecOps, stressing continuous security importance and incorporation across the development stream.

But how can we deliver fast if, in the end, there is a gate that performs verification whether an ongoing release satisfies all necessary criteria, is secure enough, compliant, et cetera, and that can reject the release? And that would be right.

Let's be realistic. How many times have we had all that hard work done, waiting in the queues, approvals, et cetera, and then it returns all the way back to development because of some security issue? Don't get me wrong, it has nothing to do with bad security folks who don't let us go fast. No. This is a very crucial moment. The problem is just that the feedback we receive, we receive at a very late stage. So how can we shift this feedback left as well?

That's what we've done. We worked together with InfoSec and cooperated with them, compliance teams, and the enterprise DevSecOps platform, and incorporated common and routine operations, requirements, and expectations for what is needed to have the complete package within the release and increase the chances to go through from the first round.

We had to think what is possible to test and include in the pipeline at lower levels. Can we start collecting important information about release, approvals, et cetera, and continuously append reports that can be used in the end? In a strict security environment where you can't simply commit and deploy as the same person, how do we incorporate that into our automation process as early as possible?

Last but not least, we observed that the ecosystem responded to these moves, giving back to InfoSec and the enterprise platforms community improved automation and pipeline pieces that benefit existing teams and everyone who has just started their DevSecOps journey. Now over to Matt.

Matthew Pegge

Okay, so now we've connected the delivery process into one automated delivery flow. But we were still constrained with actually rolling out the changes, because once new features hit production, that's it: it's live. External and internal customers get to see it.

In our case, the products undergoing the change had internal customers spread across several regions and countries, and sometimes training or other business requirements are needed in order to be able to start using the features. And what about release planning? What if users in region A aren't ready or able, due to labor relations issues, to start using the new features and processes?

We've automated and accelerated the move-to-production process, but now we are asking the dev teams to hold off on the deployment so we can first ensure everyone is educated on how to use the new functionality, or we have the local workers' council approval, and then we can finally approve the release.

This is a separate journey to change hearts and minds of people to learn that the move to production does not have to mean release to end customer.

The dev teams always want feedback in the production deployment. Yes, we know the change worked and was tested on pre-prod and all that, but there's always a chance that things may not go quite as expected when integrating changes into the complex production environment.

What if we can let the team deploy their changes as fast and frequently as they want, still ensuring there are proper fallback scenarios in place to ensure no business disruption to the customer experience, but at the same time let the business decide when and where they want to open up the new features?

This is what we did in this fast lane exercise. In addition, due to the complexity of our landscape, we extended the feature-toggle capability with not only on and off, but also some basic business rules that specify when the toggle should be on or off based on a location, region, or even what type of user.
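Editorial illustration (not part of the talk): a toggle that goes beyond on and off can be modelled as a master switch plus optional business rules on region and user type. This is a minimal sketch under stated assumptions; the feature name, regions, and user types are hypothetical, and the actual FedEx implementation is not described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToggleRule:
    """A toggle with a master switch plus optional region/user-type rules."""

    enabled: bool = False
    regions: frozenset = frozenset()     # empty means "any region"
    user_types: frozenset = frozenset()  # empty means "any user type"

    def is_on(self, region, user_type):
        if not self.enabled:
            return False
        if self.regions and region not in self.regions:
            return False
        if self.user_types and user_type not in self.user_types:
            return False
        return True


# Hypothetical configuration owned by the business side.
TOGGLES = {
    "new-route-view": ToggleRule(
        enabled=True,
        regions=frozenset({"NL", "BE"}),
        user_types=frozenset({"dispatcher"}),
    ),
}


def feature_on(name, region, user_type):
    """Unknown features default to off, so deployed code stays dark."""
    rule = TOGGLES.get(name)
    return rule is not None and rule.is_on(region, user_type)


print(feature_on("new-route-view", "NL", "dispatcher"))
print(feature_on("new-route-view", "FR", "dispatcher"))
```

Defaulting unknown or disabled toggles to off is what lets teams deploy to production at their own pace while the business decides when and where each feature is opened up.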

This decoupling enabled various benefits such as time to production, fast and frequent feedback, and advanced engineering practices, but still allowed us to maintain control for the business side.

When all of this is done, did we make a difference? What changed? We've covered some of the points on this slide on previous slides. However, some of the best results we got were around team engagement and satisfaction. In retros and inspect-and-adapt events, we started to see real signs of the team becoming a true learning organization and growing employee engagement. Where previously people saw an issue but didn't feel like they owned the solution, they now started to understand that they're empowered to make the change and fix it themselves, and not wait for management to say so.

Finally, you can see that we reassessed our DevSecOps maturity status against the baseline we took and saw improvements across all elements of DevOps. But this is not a one-time exercise. It's a continuous journey that we will carry forward using continuous self-assessment and improvement.

And we're not done yet. As we keep saying, this is an ongoing iterative journey. Some of our next steps include expanding the scope to look more closely at the release and change process and bring them closer to the teams, getting rid of that purple box Ilia showed earlier.

Now we have a flagship, but the purpose of the flagship is to use it to inspire others and hold it up as an example of what can be achieved and what good looks like. So we need to go and replicate that across all of the value stream, across all of the agile release trains and teams in Europe.

Ilia Shakitko

Indeed. It is crucial to enable a collaborative learning environment and to invest time in improvement to achieve greater success.

Summarizing the story, while there are known paths and frameworks to enable DevSecOps and agile in your organization, it doesn't work to just ask your teams to innovate and improve and expect it to happen by itself. Everyone is busy with deadlines, commitments, and catching up with bug fixes.

It is crucial to establish a safe environment to invest time into improvement and drive the change. But if you want to supercharge the change enablement and scale it widely, enable the learning and collaborative environment around those who are undergoing the change, and you will see the difference.

Matthew Pegge

Well, that was our story for today. Hope you enjoyed it. Thank you, everyone. Thank you, Ilia.

Ilia Shakitko

Thanks, Matt, and see you later.