Is Your Gatekeeper Locking Out Quality?

London 2016

Releasing software is scary. As a tester, the fear of breaking things comes with the added pressure that you were responsible for the testing. Testers, and the people working with them, often expect testing alone to determine the quality of a release. When the product doesn't meet expectations, more safety checks get added to the testing and release process. Over time, testers can easily become gatekeepers as they try to protect users from poor-quality products.

Unfortunately quality is about far more than just testing.

In this experience report, Amy Phillips, with your help, will explore what quality is and look at the ways that Continuous Delivery can help deliver it. A look back over the last 6 years of Songkick development processes will show how removing gatekeepers allowed Songkick to unlock their release cycle and start shipping quality features multiple times a day.

Full transcript

Amy Phillips

One of my main dislikes about DevOps, I think, is the name. A lot of people hear it and think, "DevOps, okay, we're talking about developers and ops." Which is great. We should be talking about developers and ops, but there are lots of other people involved. Testing is something I think we just take for granted. So I'm hoping to shake up the way you think about testing a little, and the way you see it fitting into your DevOps approaches.

I'm Amy Phillips. As you've heard, I'm Head of Delivery at Songkick. Songkick has been around for about nine years now. We're there to help you find out about shows. We are the world's largest independent music ticket provider. We're available on mobile and web, and the idea is that we alert you to shows taking place near you. Hopefully we can do that early enough that you find out about tickets, buy them, preferably through Songkick, and then get to go to the shows you want to go to.

Last year we merged with CrowdSurge, which has also been around for a similar length of time. CrowdSurge is a big ticketing company, mostly focused on presales. The great thing about presales is that they're much more about getting tickets to fans. We're targeting the artists' fans and getting tickets to them.

We finished off last year working with Adele, ticketing her UK, European, and US tour. A huge number of tickets. For those of you who know something about buying tickets, the thing with music tickets, or any tickets, is that when they go on sale you get a massive spike in traffic. Everybody arrives at your site to buy tickets within the same couple of minutes.

So Songkick may not be thought of as being as big as some of these enterprise companies, but the work we're doing to get these tickets out to fans is still pretty significant. One of the reasons Adele was working with us is that we want to get these tickets to fans. We're actively trying to stop scalpers from taking these tickets and putting them up on secondary sites at ridiculous prices. So we're not only trying to handle the load and give you the best ticket-buying experience, but also actively identifying scalpers and preventing them from getting tickets.

With our platforms, we just take for granted that we have the technology to do this, but there's a lot of work going on in a relatively small company to make these big things happen.

But this is a story about moving to frequent releases. At Songkick, we started this journey many years ago, and it's ongoing. We'll probably never reach the end. Things get better, but they're never finished. You start a new project, or you have a new product, and you have to put these things in place again. It's not a one-size-fits-all fix where you're then done.

The reason I want to talk to you about this is that we had a number of problems, and it was quite difficult to fix them and get to a good place. But from speaking to other people, these aren't unique problems. Even though our company and our team are unique, our problems aren't. They're probably quite familiar to a lot of you here today.

The problem really is releases. We're trying to get quality products out to users, and usually we have a deadline. Often that deadline isn't that far away, and we might only have a limited amount of time to ship this thing. The reasons vary, but usually we're committed to delivering a product within a certain amount of time, and often we have to say upfront when we'll ship it. Maybe we're working with external companies, or maybe internal teams, like marketing, are relying on us to ship. So it becomes quite high pressure.

Probably the biggest problem is that these releases are often infrequent. If you're only shipping every six months or once a year, you probably have people in your team who have never shipped a product with you. So it's new and it's uncertain. These things make us anxious, and the more anxious we get, the more stressed we get. Everybody starts to really dislike releases. It doesn't have to be that way.

In a typical release, we generally have one or more developers writing some code, and then they hand it over to somebody else, perhaps the build manager, who compiles a release candidate. That gets handed to the test team, and they do a whole load of testing, maybe for quite a long time.

But what they tend to give you back at this point is a load of bug reports. Then someone else needs to make a decision. What do we do with these bug reports? Do we actually fix them before we release, or do we just go ahead?

Now, we don't usually fix all the bugs before we ship. That's quite unusual. So that means somebody else needs to come in. Somebody needs to agree that we're shipping with these known bugs. Usually we make someone sign off. "Okay, well, you said you were fine with these five bugs, so sign this piece of paper, and it's your fault if you change your mind later."

Then finally, we give it to somebody else, and they push the code to production.

Now, it all sounds quite straightforward. It looks quite neat when you write it out like this. But what happens if we don't agree? We look at these bugs and say, "No, we don't want to ship that bug." We have to go back, do more coding, pass it to somebody else, and do another build. It goes on, and it gets increasingly messy.

We've got lots of people involved in this process. So how do we coordinate them? How do they know what to expect and when to expect it? You get lots of different things going on, and we end up building a release pipeline with lots of sign-offs, where people say, "Okay, well, you have to meet these 10 exit criteria before I'm willing to take this build off you." It gets slower and slower.

But the biggest question is: does this actually help us deliver quality? We fool ourselves into thinking that this big, long process, with lots of people involved and lots of sign-offs, is helping. But is it actually making things better?

Hands up, who thinks LinkedIn's website is a quality website? From a technical point of view.

Right. Okay, great. So about half of you. But yeah, it's interesting, right? Who thinks it isn't a quality website?

Okay. So there's definitely some of you who didn't say yes or no, and that's because it depends, right? This is what's so difficult about quality: everybody has their own internal measure of it. If somebody stops you and asks, "What is quality?" it's hard to define. But when you look at something, you have your own way of judging it. When we see a product, interact with a website, or use an app, we're judging whether it's quality based on our personal measure of quality.

I asked the same question out on Twitter. These were the results. Now, I should say, the majority of my network are testers, which gives this a bit of a slant. But it's really interesting because it's really not clear-cut, and this is one of the problems we have.

When we start creating these sign-offs, particularly when we hand things over to a test team and they hand back a list of bugs to, say, a business person, we're using these different measures of quality. So it seems that a lot of people don't think LinkedIn is a quality website, and quite a lot do.

I'm assuming that all of these people who said yes or no have a LinkedIn account; it would be really interesting to know. I didn't come across anybody who said, "No, so I've deleted my account," which I thought was an interesting point.

So I asked people what they thought about it, and here are some of the comments. They were fascinating. I'm going to put a blog post up with more detail, but what was really interesting was how wildly the answers varied. Some people thought it was great, with no glitches. Some people think it's just really buggy.

But what's really interesting to me is we're talking a bit about bugs, but we're also talking about the UX. We're talking about usability. We're talking about how many features are on the website. So we start to see quality is so much broader than the number of bugs in the thing you built. That's one part, certainly. But if LinkedIn was bug-free, there would still be people that don't think it's a quality website.

Now, I should add, the reason I've used LinkedIn as my example is that it's a fascinating website from my point of view, because it's so widely used. The majority of people, I think, have a LinkedIn account. It serves a great purpose in creating professional networks, and yet within the testing industry, it's quite famous for the number of bugs it has. There are lots of discussions amongst testers along the lines of, "Do LinkedIn have any testers in that company? How do they dare put this product out there?"

That's probably a bit unfair, because it serves a purpose for some people. So we start to see that testing is a little bit more complicated. Quality is a little bit more complicated.

Just to give you a definition, because quality is hard to define, this is the one I tend to use. Jerry Weinberg has written some fantastic things, great books; if you haven't read his work, I recommend it. It's a nice definition: quality is value to some person who matters.

Now, lots of people matter, right? End users matter and testers matter, but so do developers. Is this code maintainable? Are they happy working on this code? Are the testers happy testing this product? Are the users able to solve the problems they want solving?

So we've gone quite a long way away from testing here. Elisabeth Hendrickson has a definition of testing that I like. Testing is a retrospective action. We take something that exists, look at it, and decide whether it meets the requirements that were set out for it. We're checking against requirements. We might also be checking against style guides, or standards we have to meet within our company.

We're also exploring it. Are there things in there that we didn't intend or anticipate? Then we can make a judgment. But generally, what we come out with at the end of this, as I say, is bug reports. Maybe we come out and say, "We've built this thing and it's completely unusable." But it's very retrospective. It's quite late in the day to discover that the entire system you've built isn't that usable, and then go back to the beginning and spend another month having another shot. It would be much better if we could do that differently.

So the most important thing to take away from that section is that quality isn't actually that dependent on the number of bugs you have. We don't want the buggiest product in the world; people expect it to work. But it's certainly a little bit more complicated than that.

Now, when we have a gatekeeper-style release process, with different people involved and work handed off to another person in another department, lots of questions come up. You can't just be autonomous in your work. You need to ask people, "Can I commit my code now?" You go to the test team: "Can I deploy this build to the test environment? Is it a good time for you?" Then almost always somebody ends up asking, "Can I release? Is now a good time? Is it going to have an impact on production or not?"

Actually, there are a load more voices than that. It's much more complicated. There are people asking, "Did you break the build? What's this bug I found? This thing doesn't even work. Who's signing off on this?" There are lots of other voices involved, and that's where our process becomes so much more complicated. That's where things slow down, because what if one of those questions goes unanswered? What do we do? Who's signing off, where do we find that person, and how long will it take to resolve this? Delays creep in, and this is how you end up with release cycles that take weeks or months. It comes from little things like this.

Stepping back, what are we trying to achieve with releases? We want to release when we want to. So as a team, we decide, "Okay, I have a product. I want to get it to production." We should be able to get it to production. It should be quality. We want users to actually like the products that we're giving them. We want them to be able to achieve goals that they want to achieve.

Really, I think as people working inside these companies, we just don't want the unexpected stuff: the unexpected downtime on production, the unexpected bugs, the unhappy users contacting support. That's what's hard for us to deal with. There's a lot going on there.

I think this is really common. We have people from different teams, and we're all trying to come together and do this stuff. We tend to ask, "Okay, well, we had these requirements. Does it meet them?" When it doesn't go quite right, how often do we end up with somebody else, often outside our product teams, telling us, "Okay, next time, make sure you never do that again. You're never going to take the site down"? And of course, if you've got any technical limitations, releases become even more difficult.

A few years ago, when Songkick started this transformation, I was our gatekeeper, and it was my job to basically say no. As a tester, you get used to saying no quite a lot. It would be, "Can I commit this code?" "No, because we're still doing testing." "Can I deploy it to the test environment?" "No, because I'm still testing the previous thing." "Can I release it?" "No, because I'm still testing the previous thing." It went on. People got increasingly frustrated, and we started looking for a better way.

Just to give you a bit more context on the Songkick technology team: as I mentioned, we merged with CrowdSurge last year. On the left, you can see our people and our roles in our tech teams. We're a pretty small, cross-functional technology team. We use an agile approach, with four product teams at the moment. We try to use service-oriented architecture where we can. We have legacy code and lots of different programming languages. We're not all based on the same site, and we're not all internal to Songkick.

The Scale Factory, kindly represented here by John, do a great job as the majority of our tech ops team, supporting us where needed. So we've got lots of different people coming together, trying to achieve big things.

So when we had this Songkick gatekeeper process, this is roughly what it looked like. As I say, I was the one saying no. Code was queuing up at the beginning: "Can I integrate this stuff?" "No, you can't." When something did come in, we pushed it into our CI environment, which triggered our automated tests. We had tons of tests, an incredible number of tests. They took an hour, running in parallel over quite a number of servers; run end to end, we were looking at more like 10 or 11 hours.

Then we would manually deploy it to our test environment and do a load of manual testing on both the feature and the release. That's actually a pretty bad idea, because if you find you haven't built the right feature, you're quite late in the process. Then you've got to go back to the beginning and tweak the feature when really you want to be focusing on getting a release out. Trying to do both types of testing at the same time is a pretty bad idea.

We had lots of people involved there. I was testing from a testing point of view, but our designers, our product managers, and other developers were involved too. That took about two hours, best case, if we could get all the different people together, sat down, and focused for two hours.

Now, I realize these times are pretty quick. For a lot of people, this is what you're hoping to get to, and that's fine. For us, we knew we could go faster. That was really the thing here.

Eventually, the release would go out. Over time, as we added more things, and as something went wrong and we added yet more testing, this just slowed down. When you build up your tech team and add more developers writing more code, that queue at the beginning just grows and grows.

It becomes more uncertain, because you've got people saying, "Okay, well, I know the feature I'm waiting for got coded up last week. When's it going to be released?" "Well, once the queue's cleared, so maybe next week, but I can't tell you for certain. It sort of depends on all the other stuff." It becomes quite unpopular.

So we knew we wanted to do something, and when you're in a situation like this and you have a process which isn't really working out for you, isn't really meeting your needs, it can be really hard to know what to do. It's quite easy to identify all the things that aren't working, but it's quite hard to actually make a change.

So what do you do? Well, it's actually not that hard. You just fix the first thing: whatever the single biggest bottleneck is. In our case, it was actually a perceived bottleneck. We just fixed the thing everybody was complaining about the most. Don't worry so much about whether it really is the thing slowing you down. If it's the thing frustrating everybody the most, then fix it.

In our case, it was the queue. So we looked at why we had this queue of code. The reason was that problems were really hard to debug. If we pushed code out and something went wrong, it was really hard to work out what had caused it.

So we fixed that, by making our developer environments much more production-like. That means our developers can code, and our testers can test, before we even commit code. Suddenly we were freeing things up. We got rid of our queue, and a massive frustration went away.

It was only after we'd fixed that problem that it became obvious it wasn't really the problem. The problem was our testing. So step two was to fix the testing.

Now, the problem with our testing was that we had a lot of tests that weren't really serving much purpose. We had thousands of automated integration tests. But we'd never actually considered what we were testing, or why. What would be the impact of not testing this thing? "Okay, well, maybe it'll break." Okay. So what's the impact of that thing breaking? Does it matter? Maybe it's a feature that's actually not that critical to the site. Maybe we can live with that. Maybe it's fine.

What we found was that not only did we have a lot of tests for things we didn't care about, but we also tended to have quite weak monitoring. Because we didn't have monitoring in production, we tried never to break anything. But we found there were many things we could break, as long as we knew we'd broken them. We don't need to test those. It doesn't matter so much to us if we break them, as long as we know we broke them so we can fix them.

So we separate things into different layers. Critical pieces of the site: don't break them. Non-critical pieces of the site: it's fine to break them, as long as you know when you've broken them, find out quickly, and can fix them. That was step two.
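The layering described above can be sketched as a simple triage rule. This is a hypothetical illustration, not Songkick's actual tooling: the `Criticality` levels, the `has_monitoring` flag, and the example feature names are all assumptions. It encodes the condition the talk attaches to dropping a test: you can only afford to break a non-critical feature if you will find out that you broke it.

```python
from dataclasses import dataclass
from enum import Enum


class Criticality(Enum):
    CRITICAL = "critical"          # breaking it is unacceptable
    NON_CRITICAL = "non_critical"  # breaking it is tolerable if detected and fixed fast


@dataclass(frozen=True)
class Feature:
    name: str
    criticality: Criticality
    has_monitoring: bool  # hypothetical flag: would production monitoring tell us it broke?


def safeguard(feature: Feature) -> str:
    """Return the (hypothetical) safeguard a feature needs before release."""
    if feature.criticality is Criticality.CRITICAL:
        return "pre-release test"
    if feature.has_monitoring:
        return "monitor in production"
    # Non-critical but unmonitored: we wouldn't know it broke,
    # so keep a test until monitoring exists.
    return "pre-release test (add monitoring)"


# Hypothetical example features.
features = [
    Feature("ticket checkout", Criticality.CRITICAL, True),
    Feature("artist bio widget", Criticality.NON_CRITICAL, True),
    Feature("share button", Criticality.NON_CRITICAL, False),
]
for f in features:
    print(f"{f.name}: {safeguard(f)}")
```

The point of the third branch is that weak monitoring, not just criticality, is what forces a test to stay.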

Then you just repeat that, and eventually one day you run out of bottlenecks. Well, that's not quite true. You keep having bottlenecks, but you just keep iterating, and it doesn't become this big, overwhelming problem.

It ends up as more of a DevOps approach. Because we fixed our technical side and our tests, we ended up with an approach much more like this, where we're not handing things off to people. We're collaborating.

Now we do most of our coding and testing on developer environments, with people pairing up and fixing problems as early as possible. We do the feature testing right at the beginning. What's the point in committing code if you don't know whether you've built the right thing? Get the product manager involved before you commit. Show them, and give them a chance to make changes.

When we do push the code, it goes through a fully automated pipeline. We don't want to have to stop, find the right person, and drag them out of a meeting to check things. And now we don't make mistakes like forgetting a step: if the build passes, it automatically gets deployed to our test environment, and then we run the automated release testing, which we've split out from the feature testing.

Release testing is quite predictable, right? You're asking: it used to have this behavior; does it still have this behavior? You can automate that. Because we're now doing the right testing in the right way, that's now 20 minutes. That adds weight to the idea that if you do break something and you know about it, you can fix it, because it's only 20 minutes. Sometimes people don't even notice you broke it. So you can release it.
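The automated pipeline described above can be sketched as an ordered list of stages that run with no manual sign-off and stop at the first failure. This is a hypothetical illustration: the stage names loosely mirror the talk, but `run_pipeline` and the stubbed stage callables are assumptions standing in for a real CI server and deploy tooling.

```python
from typing import Callable

# A stage is a name plus a callable that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order with no manual sign-off; stop at the first failure."""
    log: list[str] = []
    for name, step in stages:
        if not step():
            log.append(f"{name} FAILED")
            return False, log  # later stages never run
        log.append(name)
    return True, log


# Hypothetical stubs: real stages would call the CI server, the deploy
# tooling, and the automated release test suite.
stages: list[Stage] = [
    ("build and automated tests", lambda: True),
    ("deploy to test environment", lambda: True),
    ("automated release tests", lambda: True),
    ("deploy to production", lambda: True),
]

ok, log = run_pipeline(stages)
print(ok, log)
```

Stopping at the first failure means a broken build can't reach the test environment or production, and nobody has to remember to check before the next step runs.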

Then if we need to do more testing, we test it. But we've got this monitoring, and that's the really key thing. Now we talk much more about monitoring. What do we need to know when we put this feature in? How can we know if it's broken? How can we know if users are using it?

Because that's the other key thing here. We're not just talking about infrastructure. We're not talking about server metrics. We're talking about user metrics. How will you know that that feature you have shipped is doing what you expected it to do? Are people using it? Are people using it in the way you expected? Are they contacting customer support more frequently? That's also an interesting metric. Then you can make a decision, and then you can go back, and hopefully you can make it even better.
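The user-metric questions above can be turned into a simple post-release check. This is a minimal sketch under assumed inputs: the `FeatureMetrics` fields, the adoption measure (feature users divided by active users), and the thresholds passed to `review` are all hypothetical; a real team would pull these numbers from its own analytics and support tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureMetrics:
    active_users: int             # users who reached the page the feature lives on
    feature_users: int            # users who actually used the new feature
    support_contacts_before: int  # support contacts per day before the release
    support_contacts_after: int   # support contacts per day after the release


def review(m: FeatureMetrics, expected_adoption: float,
           max_support_increase: float) -> list[str]:
    """Return findings where the feature isn't behaving as expected."""
    findings: list[str] = []
    # Are people using it? Compare adoption against what we expected.
    adoption = m.feature_users / m.active_users if m.active_users else 0.0
    if adoption < expected_adoption:
        findings.append(f"adoption {adoption:.1%} below expected {expected_adoption:.1%}")
    # Are people contacting support more often since the release?
    if m.support_contacts_before:
        increase = (m.support_contacts_after - m.support_contacts_before) / m.support_contacts_before
        if increase > max_support_increase:
            findings.append(f"support contacts up {increase:.0%}")
    return findings


print(review(FeatureMetrics(10_000, 400, 40, 60),
             expected_adoption=0.10, max_support_increase=0.20))
# ['adoption 4.0% below expected 10.0%', 'support contacts up 50%']
```

An empty list means nothing surprising was seen; any finding is a prompt to go back and improve the feature, not an automatic rollback.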

Several earlier speakers were talking about the benefit of frequent releases and how they aid quality: if you can get something in front of users, you can see whether they're using it, and then you can improve it.

This was the impact we found once we moved to more of a DevOps approach: a massive spike in releases going through. We're not doing more work, and we don't have a bigger team. We simply improved the way we pushed code. So people started working more iteratively, pushing out a small change and then improving it, rather than building an entire feature for 10 weeks and shipping the whole thing in one go.

So, in summary, when we lock down these release processes and try to make them safer with handovers and sign-offs, we're not necessarily getting the quality we expect. You might find you have fewer bugs, but that's probably not a great measure of whether people want to use the product you're giving them.

If you take a more collaborative, DevOps approach, you can often just get rid of that stuff, and that should be a motivation. Don't try to fit the same test process into a DevOps environment; changing it should be part of your transformation. You should be engaging the testers, getting them involved early and pairing, doing all the things we typically think of when we think of DevOps but perhaps don't normally associate with testing.

Then if this all sounds interesting, I've got a quick slide if you fancy joining us.

Thank you.

Q&A

Host: Thanks so much, Amy.

Amy Phillips: Thanks.

Host: I reckon we've got time for one question. If anyone's got a question? Fantastic, well in.

Q: So you're very much talking about the test and release deployment part.

Amy Phillips: Yes.

Q: Before that, is all the development agile, or have you got some waterfall in there as well? Or do you care?

Amy Phillips: What's your definition of the difference in terms of development? What would make it look like agile development to you?

Q: What would make it look like waterfall is if I defined everything upfront.

Amy Phillips: Okay. Well, this is a small release, so actually in that sense, we define a reasonable amount of it. But if you consider how small the change is...

Q: Right.

Amy Phillips: ...there's definitely room for maneuver. But that's kind of the point of pairing and having people involved, is we do sit down, and we kick off a feature, as we call it, and, "Okay, this is the thing we want to build." Say we want to have a new component on the website.

Q: Mm.

Amy Phillips: "This is what we're going to do. This is how it will look. This is what we want to capture. But ultimately, this is why we're doing it. This is the context." And then as a team, we can agree on the best way to do that and write the code. Everyone has a look: "I don't know, terrible idea. Change that around." So in that sense, yes, we have the flexibility of agile.

But from my point of view, it's agile in that everything's very small, so the whole process is very agile. If you ship something small and it's not working, okay, back to the drawing board. Try again. We haven't invested six months in a rigid process full of sign-offs. It's all about flexibility.

Q: I think that, no. You haven't got waterfall in the way that...

Amy Phillips: Right, yes.

Q: ...HSBC have got waterfall, or even Barclays...

Amy Phillips: Absolutely, yes.

Q: ...have got waterfall. The fact that you didn't really understand the question. But absolutely. You do that stuff, yeah. That's good. Yeah. Thank you.

Host: Thanks, Ian. Great. Thanks very much.

Amy Phillips: Thank you.

Host: So stick around. In about 10 minutes, Gareth's up. Don't go anywhere. It's going to be fantastic. Thanks again, Amy. Cheers.

Amy Phillips: Thank you.