Cloud Data Journey: Story Of Adopting Cloud Technology and Modernization of Data Pipeline
We want to share our cloud data journey at Discover, which delivered faster and better business insights. Cloud technology today has a vast scope, and once you start to understand exactly what it is, the basic question, "To move or not to move?", becomes the challenge. Our solution centers on an event-based pipeline, which proved critical to our business when building reporting solutions. Event-based data pipelines built on a data lake helped us know the status of our data in time.
Full transcript
The complete talk, organized by section.
Shivani Anand
Shivani Anand: Hello, everyone. Are you on a journey to migrate data and analytics to cloud, or planning to start one? If yes, start changing your mindsets and get excited in adopting emerging cloud technologies. We are here to share our cloud data journey with you, which is a story of adoption of cloud technology and modernization of data pipeline. Hear our expedition story from ground to cloud, which will help you in building your runway.
Discover Financial Services is a company whose mission is to help people spend smarter, manage debt better, and save more so they achieve a brighter financial future. And the vision is to be the leading digital bank and payments partner. Discover has a presence in more than 200 countries.
Before I get started, let me introduce myself. I'm Shivani Anand, senior principal solution architect at Discover Financial Services, focused on providing technology- and data-driven solutions for data and analytics, also called the D&A organization at Discover. I have almost two decades of diversified experience in D&A, helping companies by providing thought leadership and building strategies and solutions for large-scale data projects. One fact about me: lately, I've developed a huge passion for gardening. If my family does not find me inside the home, they know where I am: in the backyard.
With no further ado, I'm passing the mic over to my colleague, Prajakta, who will be walking you through the agenda of this presentation. Prajakta?
Prajakta Yerpude
Prajakta Yerpude: Thanks, Shivani. Hello, everyone. My name is Prajakta Yerpude, and I'm a senior software engineer at Discover in the cloud data products team. I have been extensively working in cloud technologies for over five years, especially implementing a variety of data ingestion pipelines, helping derive faster analytics. Apart from this, I love to do weight training, and I've been doing it for the last two years with a dedicated personal trainer.
Moving on to today's agenda, on what Shivani and I will be covering in our cloud data journey. We will be going through the very initial basics on how we started to learn to fly from ground to cloud, what challenges we came across in the whole process, and how we overcame them by these four principles that form the base of today's talk: the mighty four Cs. That is cloud technologies, collaboration with each other, changing mindsets, and continuous learning. We will be sharing some of our accomplishments and wins on how Discover is changing the way financial services are working in IT, as well as share our bit on how you can learn from our journey as a lesson learned. Finally, we will also share some of the future scope on how things are moving fast at Discover to achieve efficient results. I'll let Shivani take over from here.
Shivani Anand
Shivani Anand: Thank you, Prajakta. It's quite evident that moving data to cloud is vital for companies to stay relevant in today's competitive business landscape because of multiple factors, the key being increasing speed and quality and decreasing the cost for the business. To be a leading digital bank and payments partner, these were the driving factors that propelled us to move to cloud at Discover.
In the era of data explosion, managing volume, variety, and velocity of data on-premise is challenging and costly. For example, in order to scale your on-prem data warehouse, you have to go through the estimation process and then accordingly pay the licensing, storage, and compute cost upfront. We were in a similar situation that triggered our cloud data journey.
Our D&A flight to cloud was bumpy initially. There was some skidding and sliding in the beginning, as embracing new technologies is always a challenge, and many factors need to be taken into consideration. Cloud technology is not an exception.
The Discover culture of letting engineers fly, in combination with cloud technologies, built an environment of innovation, which contributed to the development of a factory of homegrown products, boosting our cloud data pipeline. The company has also been awarded a 2021 CIO 100 Award for its innovation in Cloud Data Fabric platform, CDF. The CDF platform helps deliver a distinct advantage to our business and improves customer experience by bringing information faster to market with higher quality and reliability.
For us, this journey has just begun, and it has the scope of making a breakthrough in how data is being ingested, managed, and used on cloud. We have reinvented the way we build our solutions and products. Being an early adopter of cloud, our use cases helped vendors in enhancing their products, best practices, and standards.
The power of cloud computing is making possibilities reality. Some of the analytical processes and machine learning models, which were not possible to run on-premise due to volume and complexity, after migrating to cloud, are providing huge value to our business partners. Let's hear the challenges we faced from Prajakta.
Prajakta Yerpude
Prajakta Yerpude: Thanks, Shivani. When we think of any new technology, we always look for challenges or bottlenecks that we have in our current tech stack. And for us, we came across the following challenges, which in my opinion everyone has had in their tech experience while thinking about why they should be moving to cloud.
First and foremost is the large amount of legacy data. Today, as we know, data gives you about everything you need at an enterprise to move things faster for the business, and this data grows in an exorbitant amount daily, which becomes hard to maintain and manage. As the data grows, not just in gigabytes, but terabytes and petabytes in a fast-paced financial company like Discover, we need to find a tech stack that is able to manage this in an efficient way.
The second thing that follows is the high storage cost for storing this humongous data. We require a lot of resources to maintain, manage, and process this data, which ultimately causes an increase in the overall cost. It is hard to keep up with the costs for resources like hardware, infrastructure, and power consumption to keep up with this ever-growing data. On-prem systems usually require a large upfront purchase, which means capital expenditure is often required. On top of that, you need to include maintenance costs to ensure support and functionality upgrades.
This brings me to my next point, and that is capacity and scalability constraints. For example, with the on-prem infrastructure needs, we sometimes need the resources to scale up or down based on the dynamic requirements. To do it manually, you need to take help from actual human resources to spin up additional infrastructure, like spinning up servers. That requires downtime, and that can affect your application performance and its delivery to the business.
We also have to maintain data centers that are quite expensive and manage all the on-prem data that usually requires specialized resources like mechanical engineers and electrical engineers, who have to take care of these data centers when required. Of course, data centers have their own advantage, and at Discover we follow a hybrid approach of data centers and cloud, where we use data centers for critical data and use the cloud for less confidential information. Because the cloud is so easily accessible and scalable, using the cloud for additional capacity might be a good solution for some organizations. You may find that certain workloads are better suited for your data center, while others run more effectively in the cloud. In the end, your flexibility, workload, and security needs will dictate whether a data center or cloud is the best fit for your organization.
Talking about on-prem solutions, due to the above challenging factors, whenever we go for a deployment, it takes quite a good amount of downtime that can impact certain business applications. Also, eventually, since a lot of resources are involved in taking care of the infrastructure and data applications as a whole, it causes a lot of dependencies between teams that can delay your overall timelines. Our journey started with all of these points taken into account, and Shivani will talk about what exactly worked for us in the next slide.
Shivani Anand
Shivani Anand: Thank you, Prajakta. The amalgamation of the four Cs, cloud technologies, collaboration, changing mindsets, and continuous learning, has been the driving factor of our success in moving to cloud. Cloud technology, of course, is fundamental to moving to cloud. But without true collaboration with partners, both on the side of business and technology, you cannot fully adopt and implement cloud solutions. Successful partnership starts with educating your partners and gaining an understanding of their processes.
Because building any cloud solution or moving to cloud is a digital transformation journey that starts with changing mindsets of every person in the enterprise. This is a very critical step, as the implementation and usage of cloud technologies today are different from the traditional way of doing data warehousing and analytics. To be successful in this climate, one has to be a lifelong learner. As we all know, technology is changing at a rapid pace. Continuous learning is the mantra for staying ahead in the game and handling unprecedented challenges. Prajakta now will expand on the first C.
Prajakta Yerpude
Prajakta Yerpude: Thanks, Shivani. Let's begin with our base for today's session on our four Cs, and I'll be talking about how some of the challenges that I explained earlier could be resolved by the cloud technologies that are out there today.
First of all, with the cloud-based subscription model, there's no need to purchase any additional infrastructure or licenses, as explained by Shivani. In exchange for an annual fee, a cloud provider maintains servers, network, and software for you. This gives a big advantage of flexibility to use software, platform, and infrastructure as a service anytime when we need. In our journey, it provided numerous advantages to employees by greatly reducing the time and money spent on tedious tasks such as installing, managing, and upgrading the software. This also helped in rapid development and deployment of applications on cloud and ultimately helped reach our business goals and satisfy customers.
For example, in our case, whenever we required adding a new server to a data center, it took two to three months to completely set it up and get it running, and that had quite a good amount of downtime, causing delays. This prompted us to look for a data solution that provided flexibility in scaling up resources.
Talking about performance, I would like to quote the basics here, and that is: cloud computing is simply the use of large-scale computer networks, and it is the use of network-hosted servers to do several tasks like storage, processing, and management of data. In this cloud computing environment, we have multiple network load balancers that distribute workloads and compute resources. This load balancing allowed our users to manage our application and workload demands by allocating resources among multiple computers, networks, or servers. Guess what? We didn't have to explicitly do it on our own. It was all taken care of by the cloud.
For example, we have ETL pipelines today at Discover that scale up or down, and that saved a lot of cost for us. One more example is the Snowflake compute resources. Today, we spin up additional compute resources to perform ETL operations automatically, saving us time and cost.
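To make the idea of pipelines that scale up or down concrete, here is a minimal sketch of a scaling decision. It is an illustration only, not Discover's implementation: the function name `workers_needed`, the backlog metric, and the default limits are all assumptions chosen for the example. In practice the cloud provider or the warehouse usually makes this decision for you.

```python
import math


def workers_needed(backlog: int, per_worker: int,
                   min_workers: int = 1, max_workers: int = 8) -> int:
    """Pick a worker count for the current backlog (hypothetical policy).

    backlog     -- number of pending files or jobs
    per_worker  -- how many items one worker is assumed to handle per cycle
    """
    if per_worker <= 0:
        raise ValueError("per_worker must be positive")
    # Enough workers to clear the backlog in one cycle, rounded up.
    wanted = math.ceil(backlog / per_worker)
    # Clamp to the allowed range so we never scale to zero or without bound.
    return max(min_workers, min(max_workers, wanted))
```

With these assumed defaults, an empty backlog keeps one worker running, a backlog of 250 items at 100 items per worker asks for three, and a very large backlog is capped at eight.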
With the increasing need for more storage, one big benefit of the cloud is that the transparent infrastructure can be extended when needed. The scalability of the cloud allows your organization to add or reduce capacity as your needs change. This took away a lot of our effort in keeping up with the growing data, which is now in petabytes at Discover.
The best advantage of cloud is that most cloud computing services are pay-as-you-go. This means that if you don't take advantage of what cloud has to offer, then at least you won't have to be dropping money on it. For example, the pay-as-you-go system also applies to the data storage space needed to service your stakeholders and clients, which means that you'll get exactly as much space as you need and not be charged for any space that you don't. Taken together, these factors result in lower costs and higher returns. I'll let Shivani take over to the next C.
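The pay-as-you-go argument above comes down to simple arithmetic, sketched below. All numbers and names are hypothetical and only illustrate the comparison: an upfront purchase has to be sized for peak capacity for the whole period, while pay-as-you-go is billed on what is actually used each month.

```python
def upfront_cost(peak_tb: int, price_per_tb_month: int, months: int) -> int:
    """Cost when capacity for the peak must be provisioned for every month."""
    return peak_tb * price_per_tb_month * months


def pay_as_you_go_cost(usage_tb_by_month: list[int], price_per_tb_month: int) -> int:
    """Cost when each month is billed on the storage actually used."""
    return sum(tb * price_per_tb_month for tb in usage_tb_by_month)


# Hypothetical example: usage grows from 10 TB to 40 TB over three months
# at an assumed price of 20 per TB per month.
usage = [10, 20, 40]
provisioned = upfront_cost(peak_tb=max(usage), price_per_tb_month=20, months=len(usage))
metered = pay_as_you_go_cost(usage, price_per_tb_month=20)
```

In this made-up case the provisioned cost is 2400 and the metered cost is 1400. The gap is the capacity that was paid for but not yet used in the first two months.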
Shivani Anand
Shivani Anand: Thank you, Prajakta. Collaboration, the second mighty C. As we have already mentioned, without true collaboration with partners on both the business and technology end, companies cannot fully adopt and use the cloud platform. Let me share one of our use cases to emphasize the importance of collaboration.
At the very initial stage of our data migration journey, we focused on the technical aspect of moving data faster and securely to the cloud, without much engagement with our business partners. Once the data was available on cloud, some of the important analytics and machine learning processes were not working as expected. Why? Due to the fact that the sensitive data sets were tokenized or masked, and there was no solution available for the business or data science team to securely de-tokenize and use these values, which was very important for their processes. As a lesson learned, we changed our way of working and started engaging our business partners throughout the journey, which has been immensely helpful in driving the cloud data migration initiative.
Similarly, partnering with our cross-functional technology teams helps us overcome blockers and build reliable, secure, and cost-effective enterprise solutions. Most of the blockers are due to the lack of understanding of new technologies. One of the blockers we faced was the inability to load wide tables in cloud, as the cloud data warehouse, which is Snowflake, for better performance, has a soft limit on the maximum number of columns per table. Joint team effort by the cross-functional technology experts not only resolved the issue, but created a solution which helped faster data onboarding for multiple teams.
For true collaboration to work, we built small, autonomous, cross-functional teams working shoulder to shoulder with shared goals and objectives. Each team member has a clear definition and common understanding of terminology, processes, metrics, and goals. In a nutshell, leveraging the best of each technology to build an award-winning product can only be possible with true partnership. The famous Michael Jordan quote, "Talent wins games, but teamwork and intelligence win championships," seems fitting in this context.
Let's move on to the next C, which is changing mindsets. Cloud computing is a disruptive technology, and in order to adopt it, every individual in an enterprise has to go through the exercise of changing mindsets. I reiterate: cloud computing is very different from on-premise data warehousing. For instance, you have all heard it before: cloud computing is based on the model of pay-as-you-go. The allocation and scalability of the resources are on the fly. No need to splash huge investment upfront, and so on.
Our engineers embrace this flexibility of the cloud and work like a startup by quickly prototyping innovative ideas, leading to enterprise-ready products, which are the key components of our cloud data pipeline. Trust is built in the teams. They are encouraged to test innovative and imperfect prototypes. In an agile approach, it's always good to fail fast. This helps in getting the expected outcome faster than spending time on building a perfect solution. Daily problem-solving huddles in the cross-functional autonomous teams fuel moving quickly and efficiently. Engineers are given more authority to make decisions and depend less on hierarchies. Changing mindsets of business, product owners, and so forth is helping change mindsets of engineers, leading to product growth by building reusable and high-quality solutions. Engineers are thinking out of the box and embedding extreme automation in their designs, and being creative by branding these outputs. There is unwillingness from teams to accept the status quo.
On the flip side, lack of understanding and shift in the mindset incurs unexpected cost and impairs the cycle time. For instance, during migration, some processes had SQLs built in and adopted the lift-and-shift approach of migrating data to cloud. These SQLs were designed for on-premise data warehousing and not for cloud computing. When the same processes ran in cloud, guess what happened? The reverse of what was expected. The cycle time increased, which elevated the compute cost and defeated the whole goal of cloud migration.
Clearly, it's not just about getting there, but also about how we get there. Cloud is not just where you run and store your data, but it is how we now work, which is like making extreme automation, reliability, and availability the way we deliver every product or solution. In cloud, it's the joint responsibility of all parties involved to efficiently use these technologies and solutions to achieve the common goal of gaining speed and quality at an optimal cost. Let's hear from Prajakta the last C, which is continuous learning.
Prajakta Yerpude
Prajakta Yerpude: As Shivani talked about collaboration and changing mindsets, this is followed by continuous learning. And yes, with the ever-growing technology stack, it's very important to keep a learning curve in an organization to implement innovative cloud data solutions.
I definitely agree with what William Pollard, a scholar in the 1800s, said: "Learning and innovation go hand in hand. The arrogance of success is to think that what you did yesterday will be sufficient for tomorrow." At Discover, for example, in my team of cloud data products, all the engineers spend a good amount of time in experimentation and educating themselves on the new technologies that can improve our current processes.
We give key priority to learning new things, where innovation is aimed not only at totally new, unique products, but the great bulk of innovation is aimed at simply improving existing products, delivering them more efficiently. That's exactly what we are learning in our cloud journey, where every day we are enabling our business users with faster data analytics solutions. Discover provides multiple platforms to the employees within the company with opportunities to learn about existing and upcoming technologies, which tremendously helps in bringing innovations in our products.
Let's move on to where we are in our cloud data journey. As we discussed how these four Cs were involved in our successful journey, I would like to mention some of our wins at Discover that caused innovation, improving our business experience holistically.
The crux of any cloud data journey is a pipeline whose job is not just to move your data from one source to a destination, but it involves a lot of steps like extraction, transformation, and eventually loading of the data. At Discover, we are leveraging AWS Cloud and have created products around its services, and are using Snowflake as our destination warehouse. For example, we have a solution called Universal Data Loader, which is an event-driven pipeline that automates data ingestion on cloud to move data from an S3 data lake to its corresponding Snowflake table.
Our journey started with using on-prem ETL solutions to move data to cloud, and then we moved on to a complete event-based data ingestion pipeline. To design such a pipeline, we decided to make use of AWS components like Lambda functions to perform operations, DynamoDB to store these events, EC2 for computing complex SQL queries, SNS and SQS to publish and queue messages, and so on. We have dedicated engineering platform teams consisting of software engineers, data engineers, and DevOps engineers that handle the development and deployment of these pipelines at Discover, where collaboration happens at its best, and who decide its features every quarter according to the business requirements.
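The event-driven flow described here, a file landing in the S3 data lake triggering a load into the matching Snowflake table, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Universal Data Loader itself. The key layout `<schema>/<table>/<file>`, the stage name `@lake_stage`, and the names `LoadRequest`, `parse_s3_event`, `build_copy_statement`, and `handle_event` are all hypothetical. The in-memory `seen` set stands in for an event store such as DynamoDB, and the sketch only builds the `COPY INTO` text rather than connecting to Snowflake.

```python
from dataclasses import dataclass
from urllib.parse import unquote_plus


@dataclass(frozen=True)
class LoadRequest:
    bucket: str
    key: str
    table: str


def parse_s3_event(event: dict) -> list[LoadRequest]:
    """Turn an S3 event notification into load requests."""
    requests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        parts = key.split("/")
        if len(parts) < 3:
            continue  # key does not follow the assumed layout; skip it
        schema, table = parts[0], parts[1]
        requests.append(LoadRequest(bucket, key, f"{schema}.{table}".upper()))
    return requests


def build_copy_statement(req: LoadRequest, stage: str = "@lake_stage") -> str:
    """Build the load statement text for one file (stage name is assumed)."""
    return f"COPY INTO {req.table} FROM {stage}/{req.key}"


def handle_event(event: dict, seen: set[str]) -> list[str]:
    """Return one statement per new file, ignoring files already handled."""
    statements = []
    for req in parse_s3_event(event):
        event_id = f"{req.bucket}/{req.key}"
        if event_id in seen:
            continue  # duplicate delivery; the file was already processed
        seen.add(event_id)
        statements.append(build_copy_statement(req))
    return statements
```

Tracking processed files matters because notification systems can deliver the same event more than once. A production version would also validate the table name against an allow-list instead of formatting it straight into SQL.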
With all the above innovations that are being made within the company, Discover is already said to be one of the top financial services companies to be a true enterprise data-centric organization. I'll let Shivani take over from here.
Shivani Anand
Shivani Anand: Thank you, Prajakta. So what have we learned so far from this journey? Definitely, in most cases, it is not lift and shift. As we have mentioned before, cloud technologies are different from on-premise solutions. Take, for example, the lift-and-shift SQL that I mentioned earlier. Instead of reducing the processing cost and cycle time, it had the opposite effect. It is essential to put upfront effort into learning and understanding this technology before adoption.
Engage your partners from business and technology to understand the dependencies and the enablement of all the features, which will not only help in successfully onboarding the data on cloud, but also, which is equally important, ease adoption by business partners. The recommended approach [caption drop: simple and complex business use case / business / and in parallel / critical] and worked on. It's important to train your partners on cloud adoption. The sooner you adopt this technology, the easier it will be to sunset the legacy processes, leading to a significant cost reduction.
Last but not least, the whole process is not set and forget. It's important to keep monitoring and collecting vital statistics for continuous improvement related to cost, security, efficiency, and reliability. The key takeaway for you from our lessons learned is to spend upfront time understanding your use cases and technologies, as it's a continuous journey of how efficiently we fly.
To conclude, storing and managing data on cloud and adoption of cloud data for multiple use cases like next-generation machine learning models, artificial intelligence, and analytics is a journey. We started our journey with the four mighty Cs, and we have continued it by embracing and enhancing cloud technologies, extending collaboration with our business partners, encouraging teams to change mindsets regarding new ways of working, and lastly, by expanding continuous learning culture.
To quote our CIO, Aamir: "This is a journey and requires a culture of continuous learning and discontent with the status quo." Thanks for hearing our story. Hope this helps in defining part of your cloud journey. Last note, Discover is hiring talented engineers across domains, including cybersecurity, data, DevOps, infrastructure, and software. To learn more on how technology is driving business at Discover, please check out the link shared by Prajakta on the Slack chat. Thank you, and stay safe.
Prajakta Yerpude: Thank you.