Scaling DevSecOps adoption in a Large IT Services Firm

Europe 2022

Dhruba Chaudhuri

DevSecOps, DevOps, SRE (Site Reliability Engineering), Process Design Specialist, Functional Lead · TCS

Leena Pradhan

Delivery Excellence · TCS

This session highlights the experience of TCS, a large IT services organization, in deploying DevSecOps at scale. We have designed a comprehensive DevSecOps framework and abstracted it into an assessment method for benchmarking the DevSecOps maturity of projects. The assessment provides a quantified maturity score across DevSecOps domains and dimensions, and the assessment report includes project-specific recommendations to elevate DevSecOps maturity. At the project level, the assessment findings are contextualized to produce an improvement roadmap for implementation.

At the organization level, 530+ projects have already been assessed. The projects are on their DevSecOps improvement journey.

The session describes the TCS journey, gives an overview of the assessment framework, and explains the approach to promoting improvements in the organization.

Full transcript

Leena Pradhan

Good morning, good afternoon, and good evening, everyone across the globe. It is a great pleasure for us to be sharing our experience in this great forum, the DevOps Enterprise Summit.

We are from Tata Consultancy Services, a large IT services firm. I am Leena Pradhan, leading service reliability engineering practices within the Corporate Delivery Excellence group. The charter for our group is to enable our engagements to deliver reliability in their services to our end customers. We look at the best-in-class engineering practices that are prevalent in the industry, look at these practice implementations across our different customer engagements, and then scale these practices across the organization.

Dhruba Chaudhuri

Hello. Dhruba here. I have 28 years of IT experience, and I am currently working in the Corporate Delivery Excellence function. My focus areas are DevSecOps and SRE practices. Tata Consultancy Services is an IT services company serving a very large customer base across geographies, countries, and industry verticals.

Each project is unique: a different use case in terms of domain, technology, methodology, culture, maturity, and complexity. Today, we would like to share our experience of driving DevSecOps adoption at scale while embracing that diversity.

Leena Pradhan

Before we go into the details of how we are scaling these DevSecOps practices across the organization, we would like to share what triggered this thought process. In 2017, our leadership team set one vision for the entire organization: to be enterprise agile by 2020. We were not looking at agile only in IT projects, but agile in every function. The entire organization was driven by this one vision set by our leadership team.

We went about the transformation. We looked at process transformation and made processes leaner where required. We did a people transformation to enable teams to adopt agile behavior in all their practices. We also did a workplace transformation, which enabled a seamless and collaborative working experience for our associates.

With a strong deployment focus, rigor, and continuous leadership direction, we achieved our vision, and 2020 was the enterprise aha moment for us. With this solid foundation on agile, we now have more than 80% of engagements being executed in agile. This transformation brought another objective in front of us: to reinforce engineering practices and go deeper into DevSecOps practices.

DevOps complements agile from a technology standpoint, so this transformation brought DevOps into more prominence across the organization. It also paved the path to faster realization of business value. We understood what it takes to deliver faster. We are now able to help our customers achieve their business objectives and stay competitive against other players in the industry. Our customers are able to respond to industry demand with much more agility, and this has also brought a lot of efficiency into their processes.

That was on the customer side. If we look at what is happening in the technology space, in the last few years we have seen a significant rise in the digital footprint in every business, big or small. Industries have gone mobile. There is a lot of web presence in all industries, and organizations are also looking at a multichannel approach for delivering services to end customers. Perhaps the pandemic has accelerated this movement across all industries.

This also brought increased exposure to various cyber threats. It meant that we require a very fine balance between speed, reliability, resilience, and quality. Hence DevSecOps is now becoming a default ask. What was a need in certain pockets, industries, or projects has become a default need for every industry.

Around the same time, our customers were also getting more interested in DevSecOps, and they were asking what TCS capabilities were in implementing DevSecOps. With this agile transformation, we found that accounts that were early adopters of agile had already done multiple cycles of transformation for faster service delivery. They had done transformation in terms of people, process, and different technology or automation enablers.

They had moved from larger, less frequent releases (yearly or half-yearly) to much smaller and much more frequent releases: quarterly, monthly, fortnightly, and some even had release-on-demand capabilities. But when we looked deeper, we found that these were all diverse implementations. To some extent, different teams also differed in their understanding of what DevSecOps is. Experience and learning mostly stayed within teams, so teams were relearning and reinventing the wheel even though the same practice had already been implemented elsewhere in the organization.

Essentially, we were faced with the challenge of scalability and multiplicity in a big organization like ours. For a process group like us, it was imperative that we have a standard framework to scale DevSecOps across the organization.

That is when we came up with the TCS DevSecOps framework. It is an exhaustive process framework that helps implement DevSecOps practices across different engagements. It helps institutionalize these practices across diverse teams, and it addresses the challenge of long incubation periods and long readiness cycle times. It also gave us the opportunity to benchmark different accounts in terms of the maturity of their DevSecOps implementation.

This is an exhaustive and comprehensive process framework that helps us strengthen our core capabilities in the engineering space.

The framework has four layers: five domains, practice areas within them, 40-plus themes, and more than 80 practices. The five domains are continuous planning, continuous development, continuous testing, continuous release and deployment, and continuous monitoring and improvement.
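The four-layer shape described here can be pictured as a nested structure. The sketch below is illustrative only: the five domain names come from the talk, while the practice-area and theme names, and the placement of individual practices under them, are assumptions made for the example and not the actual framework content.

```python
# Illustrative only: domain names are from the talk; the practice-area and
# theme names below, and where each practice sits, are assumed for the example.
FRAMEWORK = {
    "continuous planning": {},
    "continuous development": {
        "security practices": {              # practice area (assumed)
            "secure coding": [               # theme (assumed)
                "static application security testing",
            ],
        },
        "integration": {                     # practice area (assumed)
            "continuous integration": [      # theme (assumed)
                "CI build with check gate",
            ],
        },
    },
    "continuous testing": {},
    "continuous release and deployment": {},
    "continuous monitoring and improvement": {},
}


def count_layers(framework):
    """Return (domains, practice areas, themes, practices) counts."""
    areas = themes = practices = 0
    for practice_areas in framework.values():
        areas += len(practice_areas)
        for area_themes in practice_areas.values():
            themes += len(area_themes)
            for theme_practices in area_themes.values():
                practices += len(theme_practices)
    return len(framework), areas, themes, practices
```

A fully populated structure of this kind would report five domains, 40-plus themes, and more than 80 practices; the toy data here only fills in one domain.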

The continuous planning domain deals with practices related to product conceptualization and product planning. It also looks at team and foundational readiness in terms of skilling and automation capabilities. Continuous development looks at practices related to development, peer reviews, security practices, and continuous integration. Continuous testing looks at practices related to test design, test strategy, test data enablement, and test environment enablement for various functional and non-functional testing.

Continuous release and deployment deals with practices related to product validation, release approval, release process, and different zero-downtime strategies for deployment. Continuous monitoring and improvement looks at practices related to monitoring and observability, and it also looks at what insights can be derived based on data collected throughout the delivery life cycle.

This framework has been designed with best practices from TCS matured implementations and practices that are prevalent and upcoming in the industry. It is a multidimensional framework. It addresses culture, processes, people, quality, security, reliability, and resilience. The framework has a special focus on automation aspects for different practices such as testing, integration, build, deployment, monitoring, and so on.

The framework also covers in depth the various KPIs: pipeline KPIs while the software is being developed, application performance, infrastructure performance and health, and product performance in terms of business KPIs. These dimensions have been carefully and thoughtfully selected as essential for achieving the goals of DevSecOps.

The framework covers the entire breadth and depth of service delivery. Together, the 80 practices cover the breadth of the service delivery life cycle, and for each practice we also go into depth. By depth, we mean first who is practicing it: the product team, or another team helping the product team execute that practice. That tells us about the self-sufficiency or cross-functional ability of the product team.

We look at readiness in terms of required skills and automation capabilities; whether the practice is implemented offline or integrated as part of the pipeline; how early the practice is done, to understand the left-shift culture of the team; and if it is an automated practice, whether it is fully automated and what the extent of automation is.

Take static application security testing (SAST) as an example. We look at whether the product development team is capable enough to implement this practice, or whether an information security team outside the product team is executing SAST. We look at whether the product team is enabled to do setup and tool configuration, and whether they are enabled to understand and infer from the tool output. We look at whether SAST is triggered as part of the CI builds or executed as an offline process.

When we look at the left-shift aspect, we check whether SAST tools are integrated into the developer's IDE, whether they are part of CI builds, or whether they are run just before release. The framework also looks at whether the entire code base is scanned or whether SAST is done only on a sampling basis. The breadth and depth of each practice determine the maturity of the practice, and in aggregate these help us determine the maturity of the team.
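The depth questions for a single practice can be sketched as a small record with a score. This is a minimal sketch under stated assumptions: the field names, the equal weighting, the shift-left credits, and the 0-to-1 scale are all hypothetical and are not taken from the TCS framework.

```python
from dataclasses import dataclass

# Hypothetical credit for how early the practice runs (shift-left).
SHIFT_LEFT_CREDIT = {"pre_release": 0.0, "ci": 0.5, "ide": 1.0}


@dataclass
class PracticeDepth:
    owned_by_product_team: bool        # product team runs it, not an outside team
    team_can_configure_and_read: bool  # setup, tool config, interpreting output
    integrated_in_pipeline: bool       # triggered by CI builds rather than offline
    earliest_stage: str                # "ide", "ci", or "pre_release"
    code_coverage: float               # fraction of the code base scanned, 0.0..1.0


def depth_score(p: PracticeDepth) -> float:
    """Average the five depth dimensions into one 0..1 score (assumed weighting)."""
    parts = [
        1.0 if p.owned_by_product_team else 0.0,
        1.0 if p.team_can_configure_and_read else 0.0,
        1.0 if p.integrated_in_pipeline else 0.0,
        SHIFT_LEFT_CREDIT[p.earliest_stage],
        min(max(p.code_coverage, 0.0), 1.0),  # clamp to the valid range
    ]
    return sum(parts) / len(parts)


# SAST run offline by an outside security team on a sample of the code.
sast_offline = PracticeDepth(False, False, False, "pre_release", 0.2)
# SAST owned by the product team, run in CI on the whole code base.
sast_in_ci = PracticeDepth(True, True, True, "ci", 1.0)
```

Under these assumed weights the CI-integrated setup scores far higher than the offline one, which mirrors the idea that depth, not mere presence of a practice, drives maturity.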

Dhruba Chaudhuri

Just to summarize what we have heard so far: for an IT services company like TCS, the DevSecOps adoption scenario is different because of its huge scale and diversity. Each project is a different instance and not comparable with others. We have come up with this standard framework that integrates DevSecOps core principles, practices, and culture into a software engineering assembly line to deliver reliable, resilient, and secure product increments at speed or on demand. The framework defines the TCS way of approaching DevSecOps projects, aimed at achieving DevSecOps goals and objectives.

After the framework, what next? We wanted to benchmark our projects against this framework baseline to understand their current state of adoption and further opportunities for improvement, because we want all of them to elevate their DevSecOps maturity. We observed that maturity plays a very critical role in attaining and maximizing desired benefits from DevSecOps adoption.

We started doing manual assessments to gauge the current state and opportunities. The method included identification of stakeholders and the assessment team, conducting interviews and pipeline demo sessions, manual benchmarking against the framework elements, preparing findings, conducting draft validation sessions with the team, and sharing a final assessment report with detailed findings.

The sample report shown is an outcome of this manual benchmarking method. It captures the state of adoption of 200-plus practice elements using color-coded cells.

Do you think this was scalable? No. We could complete only 20 such assessments in seven months, whereas our target is to cover several thousand projects. Needless to say, the method was neither scalable nor sustainable because of many challenges: lack of uniformity, stakeholder availability, high effort, a long cycle time of four to six weeks, and so on.

How are we overcoming this? As a next step, we abstracted the framework into a technology-agnostic, self-paced, digitized assessment method with four maturity levels: basic, standard, advanced, and best in class. It has over 500 responses designed to capture each practice element across the five DevSecOps domains and associated dimensions such as people, process, technology, culture, automation, security integration, and so on.
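One way to picture how scored responses could map onto the four maturity levels is a simple threshold lookup. The level names are from the talk; the 0-to-1 scoring scale, the cutoff values, and the averaging per domain are assumptions for illustration, not the actual assessment logic.

```python
from bisect import bisect_right

# Level names are from the talk; the cutoffs are assumed for illustration.
LEVELS = ["basic", "standard", "advanced", "best in class"]
CUTOFFS = [0.25, 0.50, 0.75]


def maturity_level(score):
    """Map a 0..1 score to one of the four maturity levels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return LEVELS[bisect_right(CUTOFFS, score)]


def domain_levels(domain_scores):
    """Average the practice scores in each domain and map them to a level."""
    return {
        domain: maturity_level(sum(scores) / len(scores))
        for domain, scores in domain_scores.items()
    }
```

For example, `domain_levels({"continuous testing": [0.5, 0.7]})` averages to 0.6 and, with the assumed cutoffs, lands in "advanced".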

The maturity is based on the breadth and depth of practice implementation and rigor, as explained earlier, in a T-shaped model. It may also depend on practice gradation. For infrastructure as code (IaC), for example, deployment of the practice can vary widely. Starters may use IaC only for infra configurations, whereas others can provision an infrastructure component. Some are able to provision an entire environment, spinning up or tearing down environments at will, and some are even more mature and can orchestrate the entire production deployment automatically, with auto-rollback facilities.
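The IaC gradation lends itself to an ordered enumeration. The four grades follow the progression described in the talk; the identifier names and the `next_grade` helper are hypothetical, introduced only to show how a gradation can suggest the next improvement step.

```python
from enum import IntEnum
from typing import Optional


class IacGrade(IntEnum):
    """Ordered IaC gradation; names are illustrative, order follows the talk."""
    CONFIGURATION_ONLY = 1                  # IaC used only for infra configuration
    COMPONENT_PROVISIONING = 2              # can provision an infrastructure component
    ENVIRONMENT_PROVISIONING = 3            # spin up or tear down whole environments
    ORCHESTRATED_PRODUCTION_DEPLOYMENT = 4  # automated prod deployment with auto-rollback


def next_grade(current: IacGrade) -> Optional[IacGrade]:
    """Return the next grade to aim for, or None if already at the top."""
    if current is IacGrade.ORCHESTRATED_PRODUCTION_DEPLOYMENT:
        return None
    return IacGrade(current + 1)
```

Because the grades are ordered, two projects can be compared directly, for instance `IacGrade.CONFIGURATION_ONLY < IacGrade.ENVIRONMENT_PROVISIONING`.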

This assessment has an integrated knowledge base that translates the responses into a detailed report, which is generated automatically and sent to the team within a one-day cycle time. A sample report has three parts. The first part is the domain, subdomain, and dimension-wise maturity scores and levels. The second part provides an executive summary of key aspects such as deployment frequency, lead time, automation, security integration in the pipeline, and similar items. The third part is the detailed findings on every strength and opportunity, classified as observations, with related recommendations based on the framework and suggestions for industry-leading practices to explore, such as hypothesis-based testing (often known as chaos engineering), red teaming, and so on.

We show one sample question to demonstrate the rigor of the assessment method and the responses we designed. It is a continuous integration build/check-gate question with possible responses. Across projects we have seen both extremes: some do continuous integration only for compilation and linking of libraries, while others integrate all these steps and also have a robust check gate.

The assessment and reports are technology agnostic. What do we do with these reports, and what is the flow? Report findings are interpreted in the context of the customer to come up with a detailed action plan with priority, action owner, and timeline. The context could differ: technology stack and limitations, customers' existing investments in technology and tools, agile maturity, organizational and cultural rigidity, alignment toward traditional and other technologies, and so on.

It is then shared with the customer to further elevate their DevSecOps maturity. As actions are taken, reassessment is recommended after four to six months or so to see whether maturity has risen for these projects and whether that is reflected in achieving the DevSecOps goals. If maturity has not risen, they need to continue with the actions and look at the rigor of implementation. If it has, and the effect on the goals is visible, they go for the next level of maturity. This is how the journey continues for every project.
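The reassessment loop can be sketched as a small decision function. The two outcomes named in the talk are "continue the actions" and "go for the next level"; the middle case, where maturity has risen but the goals show no improvement, is not covered in the talk and is an assumption added here so the function is total. All names are hypothetical.

```python
from enum import Enum

LEVELS = ["basic", "standard", "advanced", "best in class"]


class NextStep(Enum):
    CONTINUE_ACTIONS = "continue actions and review rigor of implementation"
    # Assumed case: not described in the talk.
    REVIEW_GOALS = "maturity rose but goals did not move: review goal measures"
    TARGET_NEXT_LEVEL = "plan for the next maturity level"


def after_reassessment(previous, current, goals_improved):
    """Decide the follow-up after a reassessment, per the loop described above."""
    elevated = LEVELS.index(current) > LEVELS.index(previous)
    if not elevated:
        return NextStep.CONTINUE_ACTIONS
    if not goals_improved:
        return NextStep.REVIEW_GOALS
    return NextStep.TARGET_NEXT_LEVEL
```

A project that stays at "standard" across two assessments would get `CONTINUE_ACTIONS`, while one that moves from "standard" to "advanced" with visible goal improvement would get `TARGET_NEXT_LEVEL`.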

The next question is: we have huge scale, and the framework and manual assessment could not sustain that. How are we reaching different engagements? We have a strong ecosystem and enablement team in the form of delivery excellence heads, delivery excellence partners, unit agile leaders, and a community of DevSecOps practitioners as a neighborhood to facilitate scaled adoption.

Apart from this, we have been evangelizing DevSecOps adoption through various forums such as webinars, focused weeks, account speak, knowledge-sharing sessions, and so on. We also conduct orientation sessions at regular intervals at business unit and organization levels.

As a sample of the governance mechanisms used by business units: a designated single point of contact, or SPOC, drives rigor in DevSecOps adoption within his or her unit. SPOCs often use means such as a coverage dashboard, daily sync-ups to respond to queries and remove impediments, monthly town halls to answer queries and report to management, and collaboration with us to get any help required to sustain the initiative. The focus is to bring everyone into this improvement journey to maximize both customer and organizational benefits.

Now, talking about the impact. We explained how the framework and assessment method are helping us benchmark and scale DevSecOps adoption at the organization level. As we speak, we have already completed 700-plus benchmarking assessments for 200-plus customers across business units and geographies. Customers are very positive about this proactive engagement, seeing us as growth and transformation partners in improving their DevSecOps maturity and so helping them realize their business benefits. This is reflected in their feedback too.

As a derived benefit at the organization level, we now have a rich dataset of 500-plus responses from each of these 700-plus assessments. From it we are deriving additional organizational insights on each practice element and the corresponding dimensions. This is helping us identify specific focus areas to drive excellence in software engineering practices across the organization, even beyond DevSecOps projects.

To summarize the behavior exhibited by projects at different maturity levels: at a high level, we observe that advanced and best-in-class projects exhibit better ability in continuous delivery, enabled by maturity in test automation, infrastructure qualification, security integration, and so on. They are also more mature in zero-downtime, hands-off production deployments, often one click, followed by real-time observability and proactive event management.

Additionally, best-in-class projects show better adoption of advanced capabilities such as AIOps, testing with focused user groups (for example A/B testing and dark launching), and failure injection to test infrastructure resilience. Projects at standard maturity are capable of continuous integration and are reaching for the next level by improving in several areas: adopting practices such as test-driven development, behavior-driven development, and threat modeling; building cross-functional skills in test automation and infrastructure automation; integrating security tools within the pipeline; improving real-time pipeline visibility; and strengthening integration and proactive monitoring.

We have often seen that cloud adoption helps projects reach higher maturity levels, especially from infrastructure automation and continuous monitoring perspectives.

How is DevSecOps adoption helping TCS and customers? There has been customer delight and appreciation everywhere, as it helps them realize DevSecOps goals toward maximizing business benefits such as faster delivery, high-frequency releases, improved agility and responsiveness toward technology changes, business change events and disaster management, faster recovery from incidents and outages, improved reliability, and so on.

The question could be: how are we refining the framework? Are we refining and enriching this framework and assessment method? Yes, obviously. We are continuously working to upgrade our knowledge on how the industry is trending, what new practices are emerging, what our mature engagements are practicing, how they are practicing, and what additional needs exist. That is how we feed all our experiences back into this framework and assessment method to continuously enrich it.

With this, we come to the end of our presentation. Thank you all for having patience and listening to our experience toward scaled DevSecOps adoption, and how this is different from the perspective of a very large IT services company. We are sure you will be able to replicate this story in your organization as applicable. Thank you.

Leena Pradhan

Thank you, and have a good day.