DevOps during COVID-19
COVID-19 is not just a once-in-a-century public health and economic crisis. It is also a sudden, unique experiment in digital transformation. The way we work will be different forever.
What previously appeared to be niche differences in DevOps performance now have significant impact.
Sam reviews three months of research since the quarantine and identifies factors that differentiate teams that survive from those that thrive.
Full transcript
Sam Guckenheimer
Thank you for coming back.
Yesterday in the plenary session, I chatted with Gene about what we've learned so far around DevOps, COVID, and the future of work.
My goal in this session is to dive more deeply into the data.
So I'll spend some time on the research data.
I'll spend more on what we know about stress during the COVID pandemic.
I'll separate the effects of working from home from the effects of COVID by looking at pre-COVID work-from-home research, and then I'll look at recent studies that compare the period before the COVID lockdown with the lockdown itself.
Then I'll talk about what we observe on working effectively, what we observe on applying DevOps more effectively, and what we should expect going forward into the new normal.
So first, let's start with what the data indicate.
And since we're meeting virtually in London, data are plural, and they indicate that developers are doing more.
This is a surprise to many of us.
This is a surprise to me.
I expected to see that when we went into lockdown from COVID, we would see a drop in developer productivity.
We don't find it.
I'm going to share two data sets with you.
They're completely independent, and they're both big.
So this one is from Microsoft internal.
It's data out of the One Engineering System, which is built on Azure DevOps, where we can measure levels of activity across all of Microsoft engineering.
And we find that if we look at pull requests year over year, they're up in 2020 relative to 2019.
And if we look at this black line here, we see that when we closed the offices at the beginning of March and required everyone across the company to work from home, there was no visible change in pull request volume.
We tried to drill further by looking at the data week over week, so in other words, comparing Monday to Monday, Tuesday to Tuesday, and so forth.
And each of these colored bands is a different week, with magenta being March 8th.
By then, everyone was fully working from home.
The offices were completely shut.
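Comparing like-for-like weekdays keeps ordinary weekly rhythms (busy Mondays, quiet Fridays) from masking a real shift. A minimal sketch of that comparison in Python, assuming activity is available as a plain list of event timestamps; the function names are hypothetical and not part of Azure DevOps or the One Engineering System:

```python
from collections import Counter
from datetime import datetime


def counts_by_weekday(timestamps: list[datetime]) -> Counter:
    """Count events per weekday (0 = Monday ... 6 = Sunday)."""
    return Counter(ts.weekday() for ts in timestamps)


def week_over_week(baseline: list[datetime], current: list[datetime]) -> dict[int, float]:
    """Ratio of current-week to baseline-week activity, matched Monday to Monday,
    Tuesday to Tuesday, and so on. Weekdays with no baseline activity are skipped
    to avoid dividing by zero."""
    base = counts_by_weekday(baseline)
    cur = counts_by_weekday(current)
    return {day: cur[day] / base[day] for day in sorted(base) if base[day] > 0}
```

For example, two events on Monday, February 24, 2020 against three on Monday, March 9, 2020 give `{0: 1.5}`, meaning 50% more Monday activity.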
So what did we see?
We saw that people were still working quite a bit.
In fact, they were working longer days.
It appears that engineers were starting their day earlier.
They were finishing their days later.
We didn't see the usual midday dip of folks going to lunch as they do when they're in the office and the cafes are open for certain hours.
We didn't see a strong mid-afternoon break that we typically would see in prior weeks.
So activity held up, but with longer days, which raises some concern about what those longer days imply.
Now let's flip to the GitHub data set.
This is data from github.com.
Here we compared 2020 again to 2019, looking at the amount of time for each contributor between the first push of the day and the last push of the day.
So in other words, if you make your first push of the day at 10:00 in the morning and your last push at 5:00 in the afternoon, that's a seven-hour day.
If you start at 9:00 and go to 6:00, that's a nine-hour day.
And we compared the length of those days from 2020 to 2019.
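The first-push-to-last-push measure is simple to compute. A sketch in Python, assuming push events arrive as hypothetical (contributor, timestamp) pairs already converted to each contributor's local time; this illustrates the idea and is not the Octoverse methodology:

```python
from collections import defaultdict
from datetime import datetime


def workday_spans(pushes: list[tuple[str, datetime]]) -> dict[tuple[str, str], float]:
    """Hours between each contributor's first and last push on each calendar day.

    Keys are (contributor, ISO date) pairs; values are spans in hours.
    """
    by_day: dict[tuple[str, str], list[datetime]] = defaultdict(list)
    for contributor, ts in pushes:
        by_day[(contributor, ts.date().isoformat())].append(ts)
    return {
        key: (max(times) - min(times)).total_seconds() / 3600
        for key, times in by_day.items()
    }
```

For example, pushes at 10:00 and 17:00 on the same day give a span of 7.0 hours. A day with a single push has a span of zero, and pushes after midnight land on the next calendar day; a real analysis would need to decide how to treat both.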
We saw that they're longer in 2020, and that it appears they go up in March, and it appears that they're continuing to be longer as we go further.
These data were put together by my friend and colleague Nicole Forsgren with her associates and published in the Octoverse Pulse Report recently.
We also see a big difference if we look at the volume of code pushes, 2020 versus 2019.
And here, if you look at the beginning of the year, January and February, you see a lot of random noise in how much code is getting pushed.
But then as we go into March and people go into lockdown, we suddenly see a big rise in code pushes: more work is getting delivered during the lockdown period.
Now, if we compare pull requests, the requests for changes, year over year, we see that 2020 is a little higher than 2019.
We don't see any particular difference before or after lockdown.
However, if we then look at the cycle time it takes for those pull requests to get approved and that code to get merged, we suddenly see in March a big drop in the wait time for pull request approval.
In other words, cycle time is improving.
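One common way to express this cycle time is the median time from opening a pull request to merging it; the median is not dragged around by a few very slow reviews. A minimal Python sketch, assuming hypothetical (opened_at, merged_at) pairs rather than any particular GitHub API response:

```python
from datetime import datetime
from statistics import median
from typing import Optional


def median_cycle_time_hours(
    prs: list[tuple[datetime, Optional[datetime]]],
) -> Optional[float]:
    """Median hours from opening a pull request to merging it.

    Each entry is (opened_at, merged_at). Pull requests that were never merged
    (merged_at is None) are left out. Returns None if nothing was merged.
    """
    durations = [
        (merged_at - opened_at).total_seconds() / 3600
        for opened_at, merged_at in prs
        if merged_at is not None
    ]
    return median(durations) if durations else None
```

Comparing this value for, say, February and April of the same year is one way to see a drop like the one described here.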
And we see this in two cohorts.
We see this both for the enterprise cloud accounts, and we see it for the other paid accounts, the smaller team accounts on GitHub.
We also see that open source activity is climbing sharply after lockdown.
The number of new open source projects being created is going way up.
And we also saw, though I didn't include the graph, that activity is going up considerably on a number of open source projects.
So all of these are indicators of people doing more during the lockdown period.
Doing more with less.
How is that possible?
We all know that we're stressed out during this pandemic.
We know that it's the worst health crisis in a century.
We know that we're going into the worst economic crisis in 80 years.
We've seen, in most of our countries, civil protest for social justice and racial justice, for control of immoral policing practices, and for control of systemic problems.
And we've seen the counter-protests under false flags that have frequently turned these into violent confrontations.
And that's happening globally in places like London.
We know that's happening, and we know that even without all of that, just from the pandemic, we would see a lasting mental health crisis.
We saw that in The Lancet review article that looked at quarantines during previous outbreaks, and that mental health crisis will be with us for years afterward.
We also know that we're in an industry where burnout is a second, less talked-about pandemic.
Dr. Christina Maslach, at the last summit, went through all of the ill effects of burnout on our lives and how much of a toll those burnout symptoms take.
Burnout's getting worse now, compounded with the other stress.
Harvard Business Review last month summarized this well, calling it war room fatigue that we're all feeling.
So why aren't we seeing a bigger problem than we are?
Well, I wanted to step outside the pandemic and look at what we know about work from home independent of the pandemic.
And indeed, there's a fantastic study: a large-sample randomized controlled trial of working from home.
This was done by James Liang, who's founder and chairman of Ctrip, a public company with 40,000 employees, China's largest travel agency.
And he did this in conjunction with Professor Nicholas Bloom from Stanford and a few associates as co-authors.
It's a beautifully documented, publicly available paper.
Liang was wondering, to support the company's ongoing growth, should they keep investing in all of this expensive office space?
And he listened to workers who were saying, "We'd really rather be at home.
We don't like our commutes." So he said, "Who wants to volunteer to work from home?" And he got more than 1,000 volunteers, and then used an independent random variable, their birthday, to separate them into two cohorts.
So if your birthday fell on an odd date, the 1st, 3rd, 5th, and so forth, you would be in the control group.
If your birthday fell on an even date, the 2nd, 4th, 6th, 8th, and so forth, you would be in the treatment group.
And the treatment group would get to work from home.
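The birthday rule works as a randomizer because it is fixed in advance and has nothing to do with how well someone does the job. A sketch of the assignment in Python; the volunteer names and dates are made up for illustration:

```python
from datetime import date


def assign_cohort(birthday: date) -> str:
    """Even day of the month -> treatment (works from home);
    odd day of the month -> control (stays in the office)."""
    return "treatment" if birthday.day % 2 == 0 else "control"


# Hypothetical volunteers, not data from the Ctrip study.
volunteers = {
    "volunteer_a": date(1990, 5, 14),
    "volunteer_b": date(1988, 11, 3),
}
cohorts = {name: assign_cohort(bday) for name, bday in volunteers.items()}
```

Because several months have a 31st, odd dates slightly outnumber even ones over a year, so the split is close to, but not exactly, 50/50.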
Now, remember, these are all volunteers who said they wanted to work from home.
And the ones who worked from home had good working conditions.
They had an ergonomic workspace.
They had decent internet.
They had decent equipment.
They had freedom from distraction at home.
What Ctrip found, over the nine months of the trial, was a 13% improvement in output.
Of that 13%, about 3.5 points came from the at-home employees taking more calls, and 9.5 points came from their improved punctuality.
The bus didn't break down on the commute.
They didn't have to deal with something on the way to work or run an errand and get delayed.
So, the overall impact was pretty astonishing.
Before I get to the end of the trial, there was one other interesting thing about working from home.
Attrition dropped by 50%.
Employees who were working from home were half as likely to leave the company.
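Cutting attrition in half is not the same as doubling retention, because retention starts from a much higher base. A quick Python check, using a hypothetical 30% baseline attrition rate (an illustrative number, not one from the study):

```python
def retention_ratio(baseline_attrition: float, attrition_cut: float) -> float:
    """How many times more likely the treated group is to stay,
    given a baseline attrition rate and a relative cut in attrition."""
    treated_attrition = baseline_attrition * (1 - attrition_cut)
    return (1 - treated_attrition) / (1 - baseline_attrition)


# Hypothetical: 30% of the control group leaves, and attrition is cut by 50%.
ratio = retention_ratio(0.30, 0.50)
print(f"{ratio:.2f}")  # prints 1.21 (that is, 0.85 / 0.70)
```

Attrition falls from 30% to 15%, but retention only rises from 70% to 85%, so the treated group is roughly 1.2 times as likely to stay.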
Now, there's also one other effect, which is harder to interpret.
They were less likely to get promoted.
This could be because they declined promotions, saying, "Hey, I like working from home.
And if I take that offer, I need to go back to the office." It could be because they were overlooked.
It's probably a combination of those.
It is a red flag for how you work effectively as an organization in the new hybrid world.
After the successful nine-month trial, Liang opened the option to work from home to all employees.
Now, some interesting things happened.
There are three groups.
The treatment group, the red diamonds here on top, those are the ones who did work from home during the trial period.
Several of them said, "We want to go back to the office.
It feels too isolated." So, they exercised that choice.
Of the control group, these blue pluses, most said, "Yeah, we want to work from home now.
We missed out before." And then of the ones who had not volunteered to work from home, the green ones, a large number of them said, "Yeah, we see what everyone else is doing, and we'd like to do that too." So now, that's an option for everyone at Ctrip.
Profits for the company went up $2,000 per employee by allowing work from home.
Ctrip rolled this out as an option to everyone in the company, and it became a great recruiting tool to get more of the self-motivated employees who wanted to stick with the company.
Now, I realize this is a travel agency.
It's not extraordinarily highly skilled work.
But it is an illustration that the work from home model can serve us well.
So that was, of course, eight years ago, before any of us had heard of SARS-CoV-2 or COVID or anything like that.
Let's talk about a 2020 study that compares the period just before COVID lockdown with the period just after.
So, in other words, February of this year to April of this year.
And in this study, we look at Microsoft employees, individual contributors and managers.
And we are doing primarily a diary study here, where employees are self-reporting every day.
We find that work from home and COVID are pulling in opposite directions.
Working from home, remote work, increases focus time available to the employees, and this was clear pre-lockdown.
COVID introduces stress.
It's decreasing focus time.
We're all stressed out.
Both of these effects, we find, are larger on managers than they are on individual contributors.
Meetings are also pulling in opposite directions.
Working from home decreases the need for scheduled meetings.
COVID, in practice, has increased the scheduled meetings on calendars.
This is visible for ICs.
We don't see a statistically significant effect on managers in this particular study.
Collaboration appears to have become more difficult.
So remote work makes ICs feel collaboratively isolated.
On the one hand, they get increased focus time, more control over the work week, reduced scheduled meeting time, but they do have a harder time getting together to collaborate on the next innovative thing.
For managers, the collaboration with others isn't reduced in the same way, but the cost they pay is that they have noticeably longer work weeks.
Now, in looking at all sorts of qualitative data about working from home, we see that individual experiences vary enormously.
One of the things that stands out is that the basics we know about effective work in an office environment apply to effective work in a home environment as well.
Again, Professor Nick Bloom shared some great pictures of home working conditions.
This is one of his grad students who had to reserve time in the clothes closet and sit on the shoe rack, hunched over a laptop literally on her lap, in order to participate in an online session.
I look at this and I see chronic back pain, I see chronic stress, I see horrible ergonomic conditions, and horrible problems paying attention.
On the other hand, we do know that there are thrivers who have dedicated workspaces at home where they can focus.
They have good internet bandwidth and low latency.
I know some of my colleagues in India will say, "I'm not going to do video from home because the latency here is pretty bad." In the US, 25% of our population does not have broadband, and this is a problem we need to address socially.
The thrivers have ergonomic furniture, good chairs that support their backs, standing desks like I'm using now, multiple high-resolution screens so that you can participate in the video on one screen and in the chat on another.
We set up spaces that are free from interruption.
We observe scheduling rituals so that we know this is family time and this is work time.
Our colleagues know this is our work time.
We take care of ourselves.
We get our exercise, we eat well, we sleep well.
We schedule breaks.
We take time off.
This is another problem.
We've seen a huge drop in vacation reporting since the lockdown.
This is something where we need to model that you still take vacation even if you can't travel.
And the thrivers use one-on-ones for social connection.
They invest in maintaining their social capital.
They invest in their connection.
They stay connected to humans.
Now, over time, this is going to be a problem as our worlds change, but we need to use what the technology gives us in order to keep our human connection.
It's also clear that online meetings are a learned competence.
The strugglers, well, they're multitasking, not really paying attention.
They're in distracting spaces.
The meetings are too long and are taking too much time.
People stress over eye contact.
So if I'm here looking at my screen, you notice I'm no longer looking at you.
I'm reading what's down there, and yet when I'm talking to you, it really is less pleasant than if I'm actually looking at you like this.
The strugglers are typically coping with high-latency networks and poor bandwidth, and there's not enough preparation for those meetings.
In contrast, the thrivers, well, they're very deliberate about turn-taking in meetings.
And well-run online meetings have very clear turn-taking, whether they're moderated by someone explicitly for that or whether the group self-moderates.
People use gestures.
In Microsoft Teams or in Zoom or wherever, they use the raise-hand feature, or they literally raise their hands in front of the camera so that they can join in.
While one person's presenting with video, we use chat for side conversations, just like you'd do side conversations in the meeting room.
And when we're not in the room, we use IM for quick response.
In the meeting room, the virtual one, we do check in explicitly because you can't read the room.
I can't see all of you who are listening to me right now.
But in an interactive meeting, I can check in and I can pause and see if you're with me.
We also are intentional about the breaks.
So we set up our calendars, in Outlook or whatever we use, so that a default meeting is 25 minutes out of 30 with a scheduled break, or a longer meeting is 50 minutes out of 60 with 10 minutes in between, and everyone knows that.
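The 25-of-30 and 50-of-60 convention is a simple rule. A sketch of it in Python, where the policy table and function name are hypothetical; this is not an Outlook API, and in practice the rule is configured in the calendar client itself:

```python
from datetime import datetime, timedelta

# Hypothetical policy: minutes given back at the end of each standard slot length.
BUFFER_MINUTES = {30: 5, 60: 10}


def meeting_end(start: datetime, slot_minutes: int) -> datetime:
    """End time after trimming the buffer; slot lengths not in the policy are unchanged."""
    buffer = BUFFER_MINUTES.get(slot_minutes, 0)
    return start + timedelta(minutes=slot_minutes - buffer)


start = datetime(2020, 6, 23, 9, 0)
print(meeting_end(start, 30).strftime("%H:%M"))  # 09:25
print(meeting_end(start, 60).strftime("%H:%M"))  # 09:50
```

Keeping the rule in one shared table is the point: everyone gets the same break, so no one has to negotiate it meeting by meeting.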
We use conferencing for intentional social connection.
We check in.
How are you doing?
How's the family doing?
Is everyone healthy?
Is everyone feeling okay?
What's going on?
We make sure everyone's got good equipment, typically two or more monitors, decent internet, decent furniture, and decent light.
And we position the camera and mic intentionally so that you can focus on me as the speaker and not be distracted by all the stuff that's going on around.
Now, on the defensive side, we also make sure that we have good security hygiene.
The World Health Organization has reported a five-fold increase in cyberattacks, and that pattern is typical.
The attackers assume that you're distracted.
We are all distracted.
Every public system is under stress.
Phishing attacks have gone through the roof.
We need to train our people and train our machine learning algorithms to recognize phishing better and not get sucked in.
Impersonations are becoming common, including identity theft.
Someone filed an unemployment claim using my name and my Social Security number, which had been stolen some years ago from one unsecured database or another. The claim originated offshore and tried to route the payments to a bank account of theirs. That's part of a common scam.
And then ransomware is on the rise, particularly for small businesses.
And I'm going to let my dog out, just so you can see I'm actually in my office; my 15-year-old dog really does want to go outside.
So make sure your security hygiene is good and that you're working with your cloud provider for that and you're using reasonable threat intelligence to do it.
Now I'm going to spend a few moments on DevOps and how you use DevOps for anti-fragility.
So the first thing is rip off the red tape of all those procedures.
As recently as February, we would hear from every regulated customer, "Hey, the FDA doesn't let us do that.
FFIEC doesn't let us do that.
NIST says no," blah, blah, blah, blah, blah.
Starting in March, it was, "Well, how quickly can we use the cloud to help?" A great example is telemedicine.
I get some of my care at the University of Washington Medical Center, and virtual visits were unheard of in February.
Now they're the norm.
Learn from the open source projects.
This is Visual Studio Code.
It's the most contributed-to open source project on GitHub.
It's now the world's most popular IDE.
And it works by having a very rigid pipeline with continuous automated testing so that green means green and red means red, and only 100% green builds make it to release.
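The "only 100% green builds make it to release" rule can be stated as a small gate function. A minimal sketch in Python; the `TestResult` type and `can_release` function are hypothetical and not taken from the VS Code pipeline:

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    name: str
    passed: bool


def can_release(results: list[TestResult]) -> bool:
    """Promote a build only if it ran tests and every single one passed."""
    return bool(results) and all(r.passed for r in results)


green = [TestResult("unit", True), TestResult("integration", True)]
red = [TestResult("unit", True), TestResult("integration", False)]
print(can_release(green), can_release(red))  # True False
```

Treating an empty result set as not releasable is a deliberate choice: a build that ran no tests has not shown that it is green.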
Successful product teams like VS Code, they'll optimize for asynchronous workflows.
They'll connect with their community continually through issues and discussions.
They'll focus on the outcomes they want to achieve, not spend time on outputs.
Teams running services practice mature incident response; I hope you've seen the talks here from Eric and Scott from CSG.
If you have tech debt, get clean, get rid of it, and then use automation to stay clean.
And use the automation to shift quality left like we saw with VS Code, and shift it right like we see with CSG.
Look at every manual approval in your process and ask whether it's necessary.
How many can we get rid of, all the way from idea to deploy to data?
Ship frequently to learn.
The shorter you make your delivery cycle, the more validated learning you build up, and, just like compound interest, the faster that value compounds.
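To make the compound-interest analogy concrete, here is a toy model in Python. It assumes, purely for illustration, that every release yields the same small validated improvement (1% here), which real releases won't; the point is only how the number of cycles per year drives the compounding:

```python
def compounded_value(gain_per_release: float, releases_per_year: int) -> float:
    """Relative value after one year if each release compounds a fixed gain (toy model)."""
    return (1 + gain_per_release) ** releases_per_year


quarterly = compounded_value(0.01, 4)
weekly = compounded_value(0.01, 52)
print(f"quarterly: {quarterly:.2f}, weekly: {weekly:.2f}")  # quarterly: 1.04, weekly: 1.68
```

Under these assumptions, four releases a year add about 4%, while weekly releases add about 68%, because each cycle builds on everything learned before it.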
And remember, automation doesn't care where you work.
It doesn't care about the pandemics.
Show your users the status of your service, and if you have an incident that affects customers, be clear about it so that you and your customers can build up a relationship of trust.
Finally, let's think about the new normal.
The Bank of England did a study where, to paraphrase William Gibson, "The future appears to be here, just not evenly distributed." And they looked across industries and saw that working from home is indeed becoming a pattern.
And as I cited in the plenary session, the MIT report found that the strongest positive financial effects were in non-tech companies.
Do it.
Make working from home healthy.
Make sure people can arrange family care for children or for elderly parents.
Make sure they've got good broadband access.
Make sure they have ergonomic furniture, and don't burden them with unnecessary ceremonies.
And if you do plan a soft return to a hybrid environment, learn from the people who've done it in a healthy way.
Atul Gawande published a great article in "The New Yorker" about hospitals using masks, hand hygiene, physical distancing, and daily screening so that healthcare workers could work safely, and then letting the data inform their decisions about what to do.
So embrace the new normal.
Automate to get clean and stay clean.
Don't let people do the work your machines should do.
Do let people have days to renew their social capital.
Do allow certain days to be dedicated for focus.
If you go back to a hybrid model, do work as though everyone's still remote so you don't create two classes of citizens.
Do provide employee choice.
Ctrip showed that value.
Measure how it's working continuously, and recognize that in this new cadence, we need to inspect and adapt at least quarterly.
Thank you.