Architecting the Future: Driving AI/Agentic Transformation at Scale at Salesforce
The Salesforce team explores the critical need for large enterprises to embrace AI/agentic transformation and provides a practical blueprint for success, drawing lessons from Salesforce's internal applications. Shiva Nimmagadda Venkata, VP, Software Engineering, Salesforce, explains how several applications used platforms like Heroku to achieve ease of use and scale, laying the foundation for integrating advanced AI capabilities.
This session covers workflows and tooling around "Vibe Coding" with Cursor, an AI-first code editor, and demonstrates how it helps developers boost productivity and innovation. It also addresses practical aspects of internal deployment, including effective change management strategies, robust guardrails and governance for data security, and key lessons learned. Designed for technology leaders, this talk offers actionable insights on navigating the complexities of AI/agentic transformation to architect a more productive and innovative future for your enterprise.
Full transcript
The complete talk, organized by section.
Shiva Nimmagadda Venkata
Hello. Good afternoon, everyone. Thank you so much for coming right after lunch. Today I am going to talk about a couple of things: how we at Salesforce are using AI for developer productivity and analytics, and how we are approaching the agentic future at Salesforce.
Since this morning, we have all been to multiple sessions. The keynote covered how AI is empowering developers and engineers to be more proactive, and we heard talks from Palo Alto Networks, Booking.com, and others. I appreciate all of you being here. I will share what Salesforce Technology is using AI for across engineering and product.
I am Shiva, Vice President of Engineering Excellence. I run Engineering Excellence in Salesforce Technology, focusing primarily on AI, analytics, and productivity tooling for all of Technology.
There is a forward-looking statement; I am not going to go through it in detail. If you want it explained, you can ask AI too.
As part of today's agenda, I am going to walk you through the scale of Salesforce Technology inside the company. Then I will walk you through how we look at AI, and how AI is driving productivity inside the Salesforce Technology engineering org. One quote I like is: you cannot improve what you cannot measure. We have developed deeper developer productivity metrics using Salesforce-on-Salesforce technologies like Agentforce, Data Cloud, MuleSoft, Heroku, and others. We will talk about how we started looking at developer productivity metrics, which has had a substantial impact in the company. Then I will share some learnings that you can take back, and then we will have Q&A.
First, the Salesforce Technology org. Like many large companies, Salesforce has made a lot of acquisitions. Apart from the main engineering org, we have more than 2,000 scrum teams, and we serve many developer personas. As acquisitions come in, each team brings its own scrum processes, so everything becomes more heterogeneous. Salesforce is a global company, with developers in multiple countries and regions. We do more than 200 releases every year, and code changes ship constantly.
Given how heterogeneous the teams are, we group developers into three major personas. The first is the huge monolith, the Salesforce core product, where developers write code for the platform. The second is microservice developers, where development and deployment are quicker and release cycles are shorter, faster, and cheaper. The third, supporting the entire product, is infrastructure developers, who develop, deploy, and maintain the infrastructure.
Today I am mainly focusing on three key areas. First, how Salesforce is improving developer experience with AI, using both tools we build and tools we adopt from external companies. Second, how we use Agentforce to boost developer productivity with AI across the SDLC. Third, because you cannot improve what you cannot measure, how we measure developer productivity.
The first area is developing with AI. This is the standard software development lifecycle at Salesforce, which we divide into three loops: the inner loop, the outer loop, and the scale loop. The inner loop starts with an idea and a work item, then moves to writing code on your local machine and iterating, building, and testing locally. The outer loop is where multiple developers check into a shared repository and testing and integration testing happen. Once code goes to production, rolling out to various regions and customers, operations, scale, and support make up the scale loop. When we think about developer productivity, we look across all three loops.
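A minimal sketch of the three-loop model, assuming each development activity is tagged with the loop it belongs to so that telemetry can be grouped per loop. The activity names and the `ACTIVITY_TO_LOOP` table are hypothetical illustrations, not Salesforce's internal taxonomy.

```python
from enum import Enum


class Loop(Enum):
    INNER = "inner"  # idea, work item, local code/build/test on one machine
    OUTER = "outer"  # shared repository, CI, integration testing
    SCALE = "scale"  # production: regions, customers, operations, support


# Hypothetical activity names; the talk describes the loops, not this table.
ACTIVITY_TO_LOOP = {
    "create_work_item": Loop.INNER,
    "write_code": Loop.INNER,
    "local_build": Loop.INNER,
    "local_test": Loop.INNER,
    "merge_to_repo": Loop.OUTER,
    "integration_test": Loop.OUTER,
    "deploy_to_region": Loop.SCALE,
    "customer_support": Loop.SCALE,
}


def group_by_loop(activities):
    """Bucket activities by loop; an unknown activity raises KeyError."""
    grouped = {loop: [] for loop in Loop}
    for activity in activities:
        grouped[ACTIVITY_TO_LOOP[activity]].append(activity)
    return grouped
```

Grouping this way lets a team report inner-loop improvements separately from outer-loop or scale-loop bottlenecks, so one does not mask the other.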
Our approach to using AI to improve productivity focuses on the key jobs to be done in the software development lifecycle: plan, code, test, review, and deploy and operate. There are more jobs than these, but these are the key ones. If we bring AI and agent technology into those jobs, developers become more productive and, in turn, we deliver product features to customers more quickly.
Planning starts with work items and backlogs. In coding, developers write, generate, and refactor code, fix bugs, and ask questions about the right patterns and rules. For testing, Salesforce has a huge test infrastructure, with unit tests, functional tests, and more, because quality for customers is very important. For review, the industry now has many AI coding tools for engineers, like Cursor, but the next big pain point we observed is code review: once AI writes a lot of code, the burden shifts to review and then to test. We look at how engineers can get help with pull requests, summaries, and related work. Deploy and operate is another big chunk: when a customer has an issue, where exactly is it, which infrastructure or node has the problem, how do we fix it, and is it change-induced or product-driven? In operations, speed is what matters.
At Salesforce, AI is not new; we have been augmenting development with AI for a while. We start with augmenting, then move to orchestrating, and then to autonomous. The more the AI merely augments, the more coding knowledge it demands from the developer; the more autonomous it becomes, the more the work shifts toward domain expertise. Whether you call it autonomous or ambient, this is how we are bringing the agent revolution into development.
In planning, we have GUS, a work management system that is itself built on Salesforce. It has AI built into work management, so you can create a work item from a prompt. For code, we have a set of tools: we built an in-house AI editor called Code Genie, and we let developers experiment with Cursor, Windsurf, and Claude. For test, we have a test prioritizer. For review, we built PRISM, a pull request summarizer. In operate, we have Agentforce, which brings the data together.
These are high-level examples of how Salesforce looks across the SDLC and brings AI into the equation. With all this AI, what is the impact? Standard metrics include work cycle time improvement, pull request throughput improvement, and the share of code that is AI-generated. Our AI journey has covered coding and work cycle time for the last two years, with a primary focus on coding for the last year. The numbers we discuss cover the last three years.
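The three metrics named above can be computed from basic work-item and pull-request records. This is a minimal sketch under stated assumptions: the record shapes are hypothetical, and the `ai_lines_added` field assumes some way of attributing lines to AI suggestions, which the talk does not describe.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class WorkItem:
    started: datetime
    closed: datetime


@dataclass
class PullRequest:
    merged: datetime
    lines_added: int
    ai_lines_added: int  # hypothetical field: lines attributed to AI suggestions


def median_cycle_time_days(items):
    """Median elapsed time from start to close, in days."""
    return median((i.closed - i.started) / timedelta(days=1) for i in items)


def pr_throughput_per_week(prs, start, end):
    """Merged pull requests per week within the window [start, end)."""
    merged = sum(1 for p in prs if start <= p.merged < end)
    return merged / ((end - start) / timedelta(weeks=1))


def ai_generated_share(prs):
    """Fraction of added lines attributed to AI; 0.0 if nothing was added."""
    total = sum(p.lines_added for p in prs)
    return sum(p.ai_lines_added for p in prs) / total if total else 0.0
```

Using the median rather than the mean for cycle time keeps a few long-running items from dominating the trend line.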
Are we fully mature? No, but we are getting there. As the leaders driving AI for developers in the company, we need maturity models for where we are heading and what we are doing. We categorize each area as emerging, moderate, advanced, and so on, and keep pushing forward across the SDLC to make engineers more productive.
In GUS, we have around 60,000 AI generations: you create a work item, give a prompt, and it generates a summary of the work to be done. In coding, Code Genie is our internal AI editor, but we allow developers to use various tools. Cursor is great for new projects and for Python or TypeScript. Windsurf is good for Java and IDE-based development. Claude Code is good for CLI workflows. With heterogeneous teams in one big company, everyone has their own preferences: some developers like the CLI, and some like IDEs such as Visual Studio Code. As a company, we provide multiple options so that everyone can be more productive.
As we shift from planning to coding, AI keeps generating a lot of code. Suggestions from AI are fast, and we observed that code velocity, the number of PRs, and PR sizes are all increasing. So we built AI-assisted code review, called PRISM. It takes the entire pull request, with all its files, and gives reviewers a summary. It breaks the PR down into functional components, such as UI files in one section and database changes in another, making it easier to review.
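PRISM's internals are not described in the talk, so this is only a minimal sketch of the grouping step: splitting a pull request's changed files into functional sections by path pattern, after which a summarizer could describe each section separately. The category names and glob patterns are hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical rules; the first matching category wins.
CATEGORY_PATTERNS = [
    ("UI", ["*.html", "*.css", "*.tsx", "*.jsx", "*/ui/*"]),
    ("Database", ["*.sql", "*/migrations/*"]),
    ("Tests", ["*_test.py", "*Test.java", "*/tests/*"]),
]


def categorize(path):
    """Return the first category whose patterns match the path, else 'Other'."""
    for category, patterns in CATEGORY_PATTERNS:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return category
    return "Other"


def group_pr_files(paths):
    """Split a pull request's changed files into reviewable sections."""
    groups = defaultdict(list)
    for path in paths:
        groups[categorize(path)].append(path)
    return dict(groups)
```

`fnmatchcase` is used instead of `fnmatch` so that matching behaves the same on every operating system.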
After planning, coding, and review comes test. For a product like Salesforce, used by many customers, test is a key factor: quality is paramount, and trust and quality are number one for Salesforce. We have a test prioritizer, a test inventory, and code analysis, but we are still emerging in this area. Tests should not be an afterthought; we need to shift them left, and AI is helping us do that.
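The talk does not explain how Salesforce's test prioritizer ranks tests. A common approach, sketched here as an assumption, is to run first the tests whose historical coverage touches the changed files, breaking ties by historical failure rate. The `TestRecord` shape is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TestRecord:
    name: str
    covered_files: set[str]  # files this test exercised in past coverage runs
    failure_rate: float      # historical fraction of failed runs


def prioritize(tests, changed_files, top_n=None):
    """Run tests that cover more changed files first; break ties by
    historical failure rate. Returns test names, optionally truncated."""
    changed = set(changed_files)
    ranked = sorted(
        tests,
        key=lambda t: (len(t.covered_files & changed), t.failure_rate),
        reverse=True,
    )
    return [t.name for t in ranked[:top_n]]
```

With `top_n=None` the slice returns every test in priority order; passing a number gives a fast pre-merge subset while the full suite still runs later.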
For operate, we use Warden AIOps. When the product is deployed across many systems and regions, solving issues on the spot matters. Warden AIOps aims to pinpoint which instance an issue is happening on and auto-remediate it.
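Warden AIOps is not described in detail, so this is a minimal sketch of the two steps mentioned: pinpointing which instance has the problem by comparing each instance's error rate to the fleet median, then choosing a remediation. The thresholds and the rollback-versus-page policy are hypothetical, loosely following the change-induced versus product-driven distinction from earlier in the talk.

```python
from statistics import median


def pinpoint_instances(error_rates, factor=3.0, floor=0.01):
    """Flag instances whose error rate exceeds both `factor` times the fleet
    median and an absolute `floor`; return them worst first."""
    threshold = max(median(error_rates.values()) * factor, floor)
    flagged = [name for name, rate in error_rates.items() if rate > threshold]
    return sorted(flagged, key=lambda name: error_rates[name], reverse=True)


def remediation_plan(instances, recently_changed):
    """Hypothetical policy: roll back where a recent change landed
    (change-induced), otherwise page on-call (product-driven)."""
    return {
        name: "rollback" if name in recently_changed else "page_oncall"
        for name in instances
    }
```

The absolute `floor` keeps a quiet fleet with a near-zero median from flagging every instance that logs a single error.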
When we release products, we need to write release notes and do enablement, which is a key factor. We use AI heavily for content creation and enablement, especially with more than 200 releases a year and the need to show customers what we are shipping.
Now I will shift to Agentforce. We use Agentforce especially in operations, where there is a lot of knowledge. When new people join a team, they need time to learn, and much of the knowledge lives with existing team members, in documents, and in team systems. We use Agentforce, grounded in data from Data Cloud, to support that.
If I lay out the SDLC in a line (plan, code, deploy, and operate), we deploy agents in each of those phases, so developers become more productive with AI and agents together. More than 500 Slack channels have Agentforce agents deployed, and roughly 5,000 users use them actively. Users get answers without waiting for a human, and support questions are resolved quickly.
When we think about Agentforce agents, we start with knowledge: we build a lot of knowledge into the agent. But once the knowledge is there, such as my team's documents, data on team members, locations, and so on, I need the agent to act as a team member too. The agent can answer questions; we call that support. Beyond question and answer, the agent can take actions on my behalf or on behalf of a user. We call this ASK: Action, Support, and Knowledge. We put essentially all of our Agentforce agents into these three buckets to help across the SDLC.
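One way to make the ASK buckets concrete is to describe each agent by what it declares (knowledge sources, whether it answers questions, and which actions it can take) and derive its buckets from that. This is a minimal sketch; `AgentSpec` and the example agent names are hypothetical and are not Agentforce's configuration model.

```python
from dataclasses import dataclass, field
from enum import Enum


class Ask(Enum):
    ACTION = "action"        # agent acts on a user's behalf
    SUPPORT = "support"      # agent answers questions like a teammate
    KNOWLEDGE = "knowledge"  # agent is grounded in team documents and data


@dataclass
class AgentSpec:
    name: str
    knowledge_sources: list[str] = field(default_factory=list)
    answers_questions: bool = False
    actions: list[str] = field(default_factory=list)


def ask_buckets(agent: AgentSpec) -> set[Ask]:
    """Derive which ASK buckets an agent covers from what it declares."""
    buckets = set()
    if agent.knowledge_sources:
        buckets.add(Ask.KNOWLEDGE)
    if agent.answers_questions:
        buckets.add(Ask.SUPPORT)
    if agent.actions:
        buckets.add(Ask.ACTION)
    return buckets
```

Reading the buckets off the declared capabilities, rather than labeling agents by hand, keeps an inventory of hundreds of channel agents consistent as agents gain new actions.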
At a high level, the architecture starts with many Slack channels: every team has its own channel, and there are support channels too. We built Engineering Agent, an agent that helps engineers, deployed on Heroku and using Agentforce, Data Cloud, and MuleSoft. Engineering Agent acts as a single multiplexer on Slack across these channels, in front of independent, focused agents. For example, in the Agentforce technical support channel, the agent should answer everything about Agentforce; similarly, a Data Cloud support agent should know everything about Data Cloud support.
Our learning is that a single agent over all the data becomes difficult to manage. The same question, such as "what is my work item?", needs different information depending on context. So we divided the problem into modular agents, each focused on a particular problem. With multiple Slack channels on one side and multiple agents on the other, Engineering Agent routes each request based on the Slack channel context and answers the user.
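The channel-based multiplexing described above can be sketched as a routing table from Slack channel to a focused agent, with a fallback for unmapped channels. This is an illustration under assumptions: the channel names and agent functions are hypothetical stand-ins for Engineering Agent's real routing, and the agents simply echo the question.

```python
from typing import Callable

Agent = Callable[[str], str]


def agentforce_support(question: str) -> str:
    return f"[Agentforce agent] {question}"


def data_cloud_support(question: str) -> str:
    return f"[Data Cloud agent] {question}"


def general_agent(question: str) -> str:
    return f"[General agent] {question}"


# Hypothetical channel names mapped to focused agents.
ROUTES: dict[str, Agent] = {
    "#agentforce-tech-support": agentforce_support,
    "#data-cloud-support": data_cloud_support,
}


def route(channel: str, question: str,
          routes: dict[str, Agent] = ROUTES,
          fallback: Agent = general_agent) -> str:
    """Send the question to the agent registered for this channel."""
    return routes.get(channel, fallback)(question)
```

Because the channel supplies the context, the same question ("what is my work item?") reaches a different, narrowly grounded agent in each channel.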
Agentforce is not just about routing between agents or building agents with RAG and LLMs. Importantly, we have to ground all this knowledge in data. At Salesforce, we ingest data from Slack, Google documents, source code, work management, and almost every other system we have into Data Cloud. Data Cloud brings in structured and unstructured data, performs ETL on top of it, and powers Agentforce.
When I ask a question in the Agentforce technical support Slack channel, Engineering Agent routes it to the Agentforce agent, which draws on the data ingested into Data Cloud to answer it. Knowledge is not the only thing: we also have support and action. For external systems beyond Salesforce, such as HR systems, MuleSoft provides connectors, so you can connect those systems to MuleSoft and take actions through agents without writing new APIs.
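Grounding means retrieving relevant ingested documents and passing them to the model alongside the question. This minimal sketch uses simple word overlap as a stand-in for the vector or hybrid search a production system such as Data Cloud would use; the `Doc` shape, the source labels, and the prompt wording are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    source: str  # hypothetical labels, e.g. "slack", "gdoc", "repo"
    text: str


def tokens(text):
    """Lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, k=2):
    """Rank documents by word overlap with the question and keep the top k
    that share at least one word (a stand-in for vector search)."""
    q = tokens(question)
    scored = [(len(q & tokens(d.text)), d) for d in docs]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda sd: sd[0], reverse=True)
    return [d for _, d in scored[:k]]


def grounded_prompt(question, docs):
    """Build a prompt that asks the model to answer only from retrieved context."""
    context = "\n".join(f"[{d.source}] {d.text}" for d in retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Tagging each snippet with its source lets the answer cite where it came from, which helps users judge whether to trust it.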
With all of this, what are the key learnings? We have been on this journey for the last one to two years. Enablement is one area where we spent a lot of energy and time with engineers. Even after we rolled out Code Genie, Cursor, and other tools, some teams pushed further and some did not. Enabling them, telling success stories, and showing examples were very important.
We started an AI thought leader series across Technology, where every week engineers present what they are doing. When I go to sessions and talk, it is not as effective; success stories coming from engineers themselves make a huge difference in how people adopt AI. We also amplified best practices across Slack channels and let teams contribute. We set rules for every pull request and repository: how to write code, what you should do, and what you should not do. Another learning is that you do not always build internally; you also use external tools like Cursor and Windsurf. That mindset shift helped developers look at every aspect of how they can be more proactive.
I had originally planned to cover metrics first. All the AI and AI agents we build inside the company are not useful without proper measurement. We need to measure constantly, apply what we learn, change strategy, and enable.
These are the top ten questions for our chief engineering officer and for many CIOs and CXOs we speak with. They cover trust, customer success, culture, and performance. We have talked to more than 40 CIOs and CXOs, and most resonate with these questions. They are not easy to answer: which areas create more incidents? Which areas need a deeper, data-driven understanding?
We created something called Engineering 360. Before that, Salesforce had lots of dashboards; everyone had tons of them and had to fish for data, which made it hard to get a clear picture. Engineering 360 provides a 360-degree view covering what every developer needs and every leader wants: developer productivity, agile excellence, security, availability, compliance, and more.
Developers interact with many systems, from work items to code repositories. There is structured data, such as pull request activity, product usage, cost, deployments, and builds. There is also unstructured data, such as Google documents and team-member knowledge. We pump all of it into Data Cloud, which harmonizes the data, performs ETL, and provides data back to Tableau. Tableau helps visualize the metrics and structured data. Heroku systems ingest data from various systems into Data Cloud, helping us visualize many dashboards and metrics.
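Harmonization means mapping records from differently shaped sources into one schema before loading them for analysis. This minimal sketch shows that idea for two hypothetical structured sources, a Git host and a CI system; the field names (`repo_team`, `merged_at`, `owner`, `ts`) are assumptions, not the actual Data Cloud data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    team: str
    kind: str     # "pr_merged", "build", or "deploy"
    at: datetime  # always normalized to UTC


def from_git(record):
    """Hypothetical Git shape: {"repo_team": str, "merged_at": ISO-8601 with offset}."""
    merged = datetime.fromisoformat(record["merged_at"]).astimezone(timezone.utc)
    return Event(record["repo_team"], "pr_merged", merged)


def from_ci(record):
    """Hypothetical CI shape: {"owner": str, "type": "build"|"deploy", "ts": epoch seconds}."""
    return Event(record["owner"], record["type"],
                 datetime.fromtimestamp(record["ts"], tz=timezone.utc))


def harmonize(git_records, ci_records):
    """Map both sources into Event and return one time-ordered stream."""
    events = [from_git(r) for r in git_records] + [from_ci(r) for r in ci_records]
    return sorted(events, key=lambda e: e.at)
```

Normalizing every timestamp to UTC at ingestion time is what makes cross-source ordering and cycle-time math trustworthy for teams spread across regions.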
I will quickly show how we look at developer productivity metrics. This is a live dashboard. It shows a 360-degree view for an engineer or a leader, with security, availability, quality, and more in one place, so there is no more fishing for data across different systems. Leaders can see org views, teams can see team views, and clouds can see cloud views.
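The org, team, and cloud views amount to rolling the same team-level metrics up along different dimensions. This is a minimal sketch with a hypothetical `TeamMetrics` row and two example metrics; Engineering 360's actual schema is not shown in the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TeamMetrics:
    org: str
    cloud: str
    team: str
    merged_prs: int
    incidents: int


def roll_up(rows, level):
    """Sum team-level metrics into 'org', 'cloud', or 'team' views."""
    if level not in ("org", "cloud", "team"):
        raise ValueError(f"unknown level: {level}")
    totals = defaultdict(lambda: {"merged_prs": 0, "incidents": 0})
    for row in rows:
        bucket = totals[getattr(row, level)]
        bucket["merged_prs"] += row.merged_prs
        bucket["incidents"] += row.incidents
    return dict(totals)
```

Computing every view from one set of team-level rows means the org, cloud, and team numbers always agree, which is the point of replacing many separate dashboards.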
We built developer productivity primarily on the SPACE metrics: satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow. They are described well in the paper "The SPACE of Developer Productivity" (ACM Queue, 2021). Every week and every month, our leaders look at the data and analyze it constantly and rigorously.
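The SPACE framework groups developer productivity into five dimensions: satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow. Its authors recommend combining metrics from several dimensions (at least three) rather than relying on one, such as activity counts alone. This minimal sketch, with hypothetical metric names and one metric per dimension for brevity, groups raw values by dimension and counts how many dimensions actually have data. It is not Engineering 360's model.

```python
# Hypothetical metric names, one per dimension for brevity.
SPACE_DIMENSIONS = {
    "Satisfaction and well-being": ["dev_survey_score"],
    "Performance": ["change_failure_rate"],
    "Activity": ["merged_prs_per_dev"],
    "Communication and collaboration": ["review_turnaround_hours"],
    "Efficiency and flow": ["work_cycle_time_days"],
}


def scorecard(metrics):
    """Group raw metric values under SPACE dimensions; None marks a gap."""
    card = {}
    for dimension, names in SPACE_DIMENSIONS.items():
        present = {n: metrics[n] for n in names if n in metrics}
        card[dimension] = present or None
    return card


def dimensions_covered(card):
    """Count dimensions that have at least one metric value."""
    return sum(1 for values in card.values() if values)
```

Surfacing gaps as `None` makes it visible when a team's view leans on a single dimension, which the framework cautions against.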
That is how, at Salesforce, we took AI agents, brought all data together, and brought metrics into one place to help be more productive. Thank you so much for coming. If you have any questions, I will be outside, and we have sessions tomorrow where we can talk the whole day. Thank you so much.