Endless DevOps (and DataOps) for Data Ecosystems at National Bank of Canada
Sharing takeaways (successes and failures) about:
• leadership and software engineering throughout a 5+ year journey in 2 different ‘data’ departments at NBC (National Bank of Canada).
• the complexity of kickstarting a software engineering community in an existing IT organization (100+ teams, 3000+ employees & consultants) and its coexistence with established DevOps communities.
Also in the mix: the importance and role of organizational culture and diversity, specifically at the department level, in IT management (and technical) teams.
Context and use cases will be based on these departments:
• Corporate Data -- IT only -- 50+ employees & consultants -- prioritization done internally.
• Master Data Management -- Business + IT -- 150+ employees & consultants -- prioritization done by Business PO (Product Owner).
Some insights, reflections, and updates from my previous DOES presentation in 2019 will also be shared.
Full transcript
Maxime Clerk-Lamalice
Hello, bonjour everyone. My name is Maxime Clerk-Lamalice.
Thank you to the organizing committee for the opportunity to speak again. I am a software engineer by training, based in Montreal. I have been helping software teams for many years now: around 10 years in the startup world, then at NBC, National Bank of Canada, since 2016.
Here are some facts about NBC, which was created in 1859 and grew organically and through many acquisitions, as you can see on the timeline. We are now serving 2.7 million clients.
The bank's mission is to have a positive impact, and to do so it operates four lines of business: personal and commercial banking, wealth management, financial markets, and U.S. specialty finance and international.
So let's start with a few words about what is new since we last talked. Back in 2019, I had the opportunity to present in person in Vegas. What a great experience it was to share with and learn from peers.
In the last three years, I have worked with many groups at the bank and also with many providers and third parties. But most of my time was spent with those two teams: the data warehouse team and a data platform team. So basically, a lot of software and data.
My talk will be about both teams, but also shared services or enabling groups at the bank.
Before jumping into my experience report, let's do a quick recap of my 2019 talk. It was about a three-year transformation journey from a classic model, Dev versus Ops, to a full DevOps model. It was about a fun, high-performing team, and how to measure what matters. It was also about converting challenges into opportunities. Not an easy task, but a required one.
In 2021, during COVID, I joined an existing team to help with delivery predictability. You know: a predictable scope, timeline, and budget for all squads. Perfect, I said, I have a plan. So with my colleagues, using my 2019 DevOps transformation roadmap, we planned and compressed the timeline to a quick six months, thinking it would be done in five.
Now let's talk about what we learned.
Before going further, a short disclaimer: this story is not about design patterns, service mesh, data mesh, platform migration, or on-prem versus cloud. It is our reality, and we deal with those topics on a daily basis.
Let's start with some context about my current team.
MCP stands for Master Client Profile. It is a program and a platform integrated by the bank. Here are some facts. It is an MDM, a master data management platform, and it has been in production for more than two years.
For the technology stack, we are running on hybrid infrastructure, a mix of on-prem and cloud, a lot of Java code, some classic platform and application providers like Oracle and IBM, with cloud deployments for the newest components that we are adding.
Team size: around 75 people on the IT side and about the same number on the business side.
Dependencies: this is where it gets interesting. More than 70 projects or lines of business are involved at all times, ranging from integrating new data to adding actual new features. We are very popular.
So basically, we serve only the correct information, the golden record of each customer profile, and make it available through modern patterns. Fun fact: we expose many endpoints, probably the most in the NBC ecosystem.
So why am I speaking again? Well, it is about taking and sharing a snapshot of our culture. And, of course, brainstorming with leaders, with all of you, while still transforming. And we will never stop.
So we are sharing ideas and transformation stories. The topics do not necessarily come from major failures, but from areas where we put in extra effort.
Here are the topics that we want to share with you today: onboarding your employees, categories of work, a governance committee, and experimentation. They are a mix of ideas, observations, use cases, or simply experiences that we find interesting to share with all of you. And especially, we want your feedback. Note that I will navigate between the team, the program, and the global IT organization.
We often say that it is all about the quality of the integration. Since that is true for our systems, our team member onboarding process follows the same rule. So let's review.
For a specific role, typically an individual contributor, we of course do the classic welcome breakfast with the squad, share documentation, wiki links, and recorded knowledge sessions, and assign a mentor to help for the first two weeks. And we keep focusing on the experience, a bit like DevX, developer experience, but for all our roles, not only developers.
The general onboarding process is also supported by team meetings and communities of practice within our program. Beyond our program, all roles have communities of practice at the NBC level, making it easier to exchange ideas and be exposed to new use cases.
So what about metrics? We measure our onboarding with the following: time to first deliverable, which could be a commit, a design, a requirement, or a use case; quality, meaning bugs and the amount of rework, which is very specific to each role; and the fun factor, through surveys and feedback.
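As an illustration of the first metric, here is a minimal sketch in Python, assuming each newcomer's start date and first-deliverable date are recorded somewhere. The `Onboarding` record and the `days_to_first_deliverable` function are hypothetical names, not the team's actual tooling.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import List, Optional


@dataclass
class Onboarding:
    """Hypothetical onboarding record for one newcomer."""
    person: str
    role: str
    start: date
    first_deliverable: Optional[date]  # None if nothing delivered yet


def days_to_first_deliverable(records: List[Onboarding], role: str) -> Optional[float]:
    """Median number of days from start date to first deliverable for one role.

    Newcomers without a deliverable yet are excluded; returns None if no data.
    """
    durations = [
        (r.first_deliverable - r.start).days
        for r in records
        if r.role == role and r.first_deliverable is not None
    ]
    return median(durations) if durations else None


if __name__ == "__main__":
    sample = [
        Onboarding("dev-1", "developer", date(2022, 1, 3), date(2022, 1, 10)),
        Onboarding("dev-2", "developer", date(2022, 2, 1), date(2022, 2, 15)),
        Onboarding("ba-1", "analyst", date(2022, 3, 1), None),
    ]
    print(days_to_first_deliverable(sample, "developer"))  # 10.5
```

Using the median rather than the mean keeps one unusually long ramp-up from skewing the figure, and computing it per role matches the point that what counts as a deliverable differs by role.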
Then we wanted to simplify workflow onboarding, meaning how people actually work together across roles within a squad. To bridge the gap between roles within the same squad, we created a platform-specific automation framework that provides traceability from the business requirement all the way to the integration test. It bridges the gap between business requirements and the software deliverable, and then lets us measure the deliverable once it is in place.
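The traceability idea can be sketched as a mapping from requirement IDs to the integration tests that reference them. This minimal sketch assumes each integration test is tagged with the IDs of the requirements it covers; the `trace` function and the `REQ-1` style IDs are hypothetical and are not the framework described in the talk.

```python
from typing import Dict, Iterable, List, Tuple

# (test name, requirement IDs the test covers, passed?)
TraceResult = Tuple[str, List[str], bool]


def trace(requirements: Iterable[str], results: List[TraceResult]) -> Dict[str, str]:
    """Map each requirement ID to 'passing', 'failing', or 'untested'."""
    # Collect the pass/fail outcome of every test linked to each requirement.
    linked: Dict[str, List[bool]] = {}
    for _name, req_ids, passed in results:
        for req in req_ids:
            linked.setdefault(req, []).append(passed)

    report = {}
    for req in requirements:
        outcomes = linked.get(req, [])
        if not outcomes:
            report[req] = "untested"  # no integration test references it
        elif all(outcomes):
            report[req] = "passing"
        else:
            report[req] = "failing"
    return report


if __name__ == "__main__":
    print(trace(
        ["REQ-1", "REQ-2", "REQ-3"],
        [("test_merge_profiles", ["REQ-1"], True),
         ("test_golden_record", ["REQ-2"], False)],
    ))
```

A report like this gives analysts, developers, and testers one shared view of which business requirements are actually covered, which is the cross-role bridge the framework aims for.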
Feedback naturally increased, and it became a collaborative effort within the squad. So in summary: it started with a tool, which then helped our processes, and ultimately it was about bringing the squad together. That was a great side effect, don't you think?
Before talking about the second topic, it is worth knowing that at the program level, we have long-lasting squads with fixed capacity, evolving towards 'you build it, you run it,' a super classic DevOps evolution.
Until a year ago, we struggled to progress at the right pace in our transformation initiative. We needed a way to be more systematic about evolving our platform and to protect a certain capacity for it, which varied a lot per sprint and per PI, program increment.
The solution was to add a notion of category of work, such as business, integration, app, or SDLC, each measured as a ratio of capacity. Then we spent some time categorizing our initiatives and, of course, explaining and promoting this approach with the full team, taking the time for change management.
Then the fun began. We negotiated the actual ratios by category for each sprint with RPOs, having the right, and sometimes hard, conversations about squad-level capacity and about the expected program-level progress in each of those categories of initiatives.
That helped us drive the conversation with the business about scope, budget, and prioritization. And we now review those metrics at the beginning and the end of each iteration.
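To make the category ratios concrete, here is a small sketch comparing planned and delivered capacity shares per category at the end of an iteration. It assumes capacity is counted in some common unit, such as story points or person-days; the function names and the numbers in the usage example are hypothetical.

```python
from typing import Dict


def capacity_ratios(capacity: Dict[str, float]) -> Dict[str, float]:
    """Share of total capacity planned or spent per category of work."""
    total = sum(capacity.values())
    if total == 0:
        return {cat: 0.0 for cat in capacity}
    return {cat: amount / total for cat, amount in capacity.items()}


def ratio_gaps(planned: Dict[str, float], delivered: Dict[str, float]) -> Dict[str, float]:
    """Delivered share minus planned share for every category in either dict."""
    p, d = capacity_ratios(planned), capacity_ratios(delivered)
    return {cat: d.get(cat, 0.0) - p.get(cat, 0.0) for cat in sorted(set(p) | set(d))}


if __name__ == "__main__":
    # Categories from the talk; the amounts are invented for illustration.
    planned = {"business": 50, "integration": 20, "app": 20, "sdlc": 10}
    delivered = {"business": 60, "integration": 20, "app": 15, "sdlc": 5}
    for cat, gap in ratio_gaps(planned, delivered).items():
        print(f"{cat:12} {gap:+.0%}")
```

Working with shares rather than raw amounts keeps the comparison meaningful even when a sprint's total capacity changes. A positive gap means a category received more than its negotiated share, and a negative one shows capacity diverted elsewhere, which is the kind of signal to bring to the start-of-iteration and end-of-iteration reviews.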
So in summary, these new categories help everyone share the same mental model of our capacity and scope. Plus, they help set expectations at the planning level.
For this third topic, we are talking about a governance committee. Its scope was all of IT at the NBC level.
The word governance is important here because the goal was not to replace the organically existing communities, chapters, or groups, but to align and possibly rationalize tools and processes, having a positive impact on DevX, developer experience.
So here is some context. For many years, many communities of practice existed, created at different levels of the organization: for example, DevOps chapters by delivery tower, communities by role, et cetera. Starting in 2019, I was involved in the first iteration of this governance group.
The typical pushback that we got was: we are not sure why we need that kind of committee.
We tried many variations of tools, content, and scope for this community. For example: invite-only voluntary participation, one-way broadcasts, a Yammer channel, posts and surveys, contests, technical complication, self-assessment, expert on it.
Long story short, after many iterations, the committee is now in place and having an impact, focusing on the mindset, processes, and tooling.
Here are some lessons learned while creating this community, this committee. Go back to the basics: a mission and terminology to ensure common understanding. It needed a dedicated, permanent core group, because ad hoc, best-effort participation is not enough. Use industry standards to fast-track and then personalize, which avoids reinventing the wheel and reduces debate. Executive sponsorship was required, at least at the beginning, to get representation from all departments.
And then lastly and most importantly: focus on common and most popular blocking issues to rally everyone around it. And remember, it takes time to create momentum for that kind of governance community, but it will get easier. Think about the flywheel effect.
The fourth topic is about experimentation as a leader.
Like many of you, I had the opportunity to help different groups within the same organization. The help comes sometimes in the form of past experience: good call, prior success, knowledge of the organization context. But other times it is in the form of, let's try this as a team and adapt based on the results.
When I presented in 2019, I did not realize at the time that I was experimenting a lot, thanks to multiple factors. Here are some of them: the size of the team, around 75; the low coupling of our asset in the bank ecosystem, since it was a backend of a backend; and executive trust.
And in this period, there was a mix of experiments. For example: creating a new pipeline to spin up data environments; new squad structure; merging and redefining roles; new feedback processes; taking the time to talk about failures with the wall of fail. And in retrospect, it was quick to implement, even with the famous corporate taxes, of course.
Team size, employee-consultant ratio, prioritization process, line-of-business impact, and team maturity greatly impact the speed and quantity of those experimentations.
Here are some recent use cases in my current group. And remember, I am not alone in those use cases.
Creating a vision and a roadmap for new tools and then putting it into action with a POC or POV, a proof of concept or proof of value. Reviewing the format for individual and team follow-ups: the frequency, the structure, planned versus ad hoc versus check-in. Diversity of the team through recruitment, important not as a buzzword but as an accelerator to help drive the change and to be challenged. That is very important.
Hackathons or team tournaments to develop the team's mindset and learn to solve unplanned challenges quickly as a team. Reviewing with the team the expected target SLOs, SLAs, KPIs, and flow time, adding or removing some to simplify. Measuring and reviewing data quality the same way as code quality for software components, because code and data are typically coupled.
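Treating data quality like code quality can be sketched as a quality gate over a batch of records. The minimal example below assumes customer records are dictionaries with hypothetical fields `client_id`, `name`, and `postal_code`, and checks completeness plus one validity rule (the Canadian postal code format); the schema and thresholds are placeholders, not values from the talk.

```python
import re
from typing import Dict, List

# Canadian postal code format, e.g. "H2X 1Y4" (space optional).
POSTAL_CODE = re.compile(r"^[A-Z]\d[A-Z] ?\d[A-Z]\d$")
REQUIRED_FIELDS = ("client_id", "name", "postal_code")  # hypothetical schema


def quality_metrics(records: List[Dict[str, str]]) -> Dict[str, float]:
    """Completeness and validity rates for a batch of customer records."""
    n = len(records)
    if n == 0:
        # Assumption: an empty batch is treated as fully compliant.
        return {"completeness": 1.0, "validity": 1.0}
    # A record is complete when every required field is present and non-empty.
    complete = sum(all(r.get(f) for f in REQUIRED_FIELDS) for r in records)
    # A record is valid when its postal code matches the expected format.
    valid = sum(bool(POSTAL_CODE.match(r.get("postal_code") or "")) for r in records)
    return {"completeness": complete / n, "validity": valid / n}


def passes_gate(metrics: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """True if every metric meets its minimum, like a code quality gate."""
    return all(metrics[name] >= minimum for name, minimum in thresholds.items())


if __name__ == "__main__":
    batch = [
        {"client_id": "1", "name": "A", "postal_code": "H2X 1Y4"},
        {"client_id": "2", "name": "B", "postal_code": "12345"},
        {"client_id": "3", "name": "", "postal_code": "H3B4L2"},
    ]
    metrics = quality_metrics(batch)
    print(metrics, passes_gate(metrics, {"completeness": 0.95, "validity": 0.95}))
```

Running such a gate in the same pipeline as unit tests and static analysis is one way to put data and code quality side by side, since a code change can degrade the data it produces.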
Communicating the team's accomplishments and creativity outside of our department. Also in the context of experimentation, being an early adopter of new products, whether out of the box or developed internally, which often requires inner sourcing or entrepreneurship.
It is still hard and still has hidden costs. But it is still a good deal because of the impact and team motivation. Remember to fail fast and learn fast.
So here is my summary. As a leader, we have a responsibility to experiment in those three classic spheres: roles and structure, processes, and technology. Please do so while mentoring your team.
So I am ending with one simple question, very specific to data platform teams but applicable to your context too. Have you ever integrated, or merged, data quality teams and software development teams? If so, why and how? What was the result?
So let's continue the conversation. Thank you again for your time.