Las Vegas 2018

Lightning Talk: Code + ML - Will Automation Take Our Jobs

Lightning Talk

Full transcript

The complete talk, organized by section.

Dr. Stephen Magill

I'm Stephen Magill. I'm going to be talking about the combination of code and machine learning.

So a bit more technical, but I'm going to keep it high level. I'm going to go fast and stick to five minutes. The question is: how is this combination of code and ML going to change development in the future, and will robots be taking over our development jobs?

The background for this is a bunch of work that's happened over the last five or so years in the academic research community, looking at how you can take machine learning concepts and use those in a development or code analysis context. That's led to a bunch of startups focused exclusively on how you can apply learning to code, as well as other companies like ours looking at ways to combine learning with other techniques to enable more robust, useful code analysis.

So the question here is: are all these technologies and tools just going to form this tidal wave of change that eliminates developers, so that we all have to go find something else to do? Or are they going to enable developers in a new way and allow developers to be more effective, more productive, and have more fun? Look how much fun that guy's having.

The idea is, if these tools can take care of the tedious aspects of development, then we get to focus on the fun stuff, and that's great. I think it's much more the latter than the former, and I'm going to go into why in a little bit. But first, I want to show an example of just some of the things that are possible using machine learning with code.

I'm going to use this example from natural language processing, because that's a great place to look for techniques, because after all, we write code in programming languages, right?

A really cool result from natural language learning is you can do this thing where you take a neural network, you train it by inputting words, English language words on the left, and then you ask it to predict the context in which those words occurred. What other words were nearby them?
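As a rough sketch of the training setup just described, the snippet below builds (word, nearby word) pairs from a sentence. The function name `skipgram_pairs`, the `window` parameter, and the example sentence are illustrative assumptions, not something shown in the talk; a real system would feed pairs like these to a neural network rather than just listing them.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions.

    Hypothetical helper: it only prepares training examples, it does not train anything.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Made-up example sentence; a real corpus would be articles, books, and so on.
sentence = "moscow is the capital of russia".split()
pairs = skipgram_pairs(sentence, window=1)

for center, context in pairs:
    print(center, "->", context)
```

Each pair says "given this word, predict that neighbor", which is the prediction task the network is asked to solve.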

For this, you give it a huge corpus of just English language text: articles, books, et cetera. And what happens is, if you then look at what's happening inside the network, it learns this cool representation of the meaning of these words. So it's just a bunch of numbers floating around in here, but if you look at those numbers and you plot them, here I plotted them in two dimensions, the spatial relationships tell you really cool things.

This is a collection of country names and capital city names. If you draw a line between here, you can see that the items below the line, those countries are entirely in Europe. Those above the line are at least partly in Asia. So already there's some spatial clustering that tells you something about what those words represent and the countries that they represent.

And then there's other cool effects that fall out of this. You have this effect where distance captures similarity of concepts. So Russia is closer to China than to Italy. That's true geographically, it's true geopolitically, and it's true geometrically in this representation of the data that's learned by the network.

Even cooler, you can use math to create analogies. So if you take Russia, you take the point representing Russia, you subtract Moscow, its capital, and you add Paris, the capital of France, you get a point that is very close to where France is in this display. So you can use these vector operations to discover relationships in the data.
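To make the distance and analogy ideas concrete, here is a minimal sketch using hand-picked two-dimensional points. The coordinates are invented for illustration and are not learned embeddings or values from the slides; the helper `nearest` is also hypothetical. Real embeddings typically have hundreds of dimensions and often use cosine similarity rather than the Euclidean distance assumed here.

```python
import math

# Hand-made toy coordinates (assumption): chosen so that each capital sits
# roughly the same offset away from its country.
points = {
    "russia": (4.0, 3.0), "moscow": (4.2, 1.0),
    "china": (5.0, 3.5), "beijing": (5.1, 1.4),
    "france": (1.0, 2.5), "paris": (1.1, 0.6),
    "italy": (0.5, 2.0), "rome": (0.7, 0.1),
}


def nearest(target, exclude=()):
    """Return the word whose point is closest to `target`, skipping `exclude`."""
    candidates = (w for w in points if w not in exclude)
    return min(candidates, key=lambda w: math.dist(points[w], target))


# Distance captures similarity: in this toy layout Russia is closer to China than to Italy.
d_china = math.dist(points["russia"], points["china"])
d_italy = math.dist(points["russia"], points["italy"])
print(d_china < d_italy)

# Analogy by vector arithmetic: russia - moscow + paris lands near france.
r, m, p = points["russia"], points["moscow"], points["paris"]
target = (r[0] - m[0] + p[0], r[1] - m[1] + p[1])
answer = nearest(target, exclude=("russia", "moscow", "paris"))
print(answer)
```

The words used to form the query are excluded from the search, since otherwise one of them is often the trivially closest point.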

So how does this apply to code? Well, you can take these same techniques and apply them to code. Substitute English language words with method names. And you can use similar approaches to discover that the method `count` does something very similar to the method `getCount`, or that if you take `equals` and add `toLower`, then you get something that's like `equalsIgnoreCase`.
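One way to get "sentences" of method names to feed into such a model is to walk a program's syntax tree and collect the methods it calls. The sketch below does this for a small Python snippet using the standard `ast` module; the talk's examples are Java-style method names, so the choice of Python, the sample source, and the helper `called_method_names` are all assumptions for illustration, not the approach used by any particular tool.

```python
import ast

# Made-up sample source to analyze.
SOURCE = '''
def normalize(items):
    cleaned = [s.strip().lower() for s in items]
    cleaned.sort()
    return cleaned.count("")
'''


def called_method_names(source):
    """Return names of methods called as `obj.method(...)`, in approximate source order."""
    tree = ast.parse(source)
    calls = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
    ]
    # ast.walk does not guarantee source order, so sort by where each call ends.
    # In a chain like s.strip().lower(), the inner call ends first.
    calls.sort(key=lambda c: (c.end_lineno, c.end_col_offset))
    return [c.func.attr for c in calls]


names = called_method_names(SOURCE)
print(names)
```

The resulting sequence plays the role of a sentence: method names that appear near each other become each other's context, just as nearby words do in English text.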

And so this sort of technology has a lot of applications: program de-obfuscation, adding code comments, code completion, code similarity.

And then if you go through and you look at other machine learning techniques, a lot of them have very cool applications to code. So the typical classification task: take an image and say, "Is this a cute cat or an ugly cat?" And I know the second one's an ugly cat because I asked Google for a picture of an ugly cat and it gave me that.

So that corresponds to a code smell detection or a vulnerability detection task. Automated translation, the kind that you would do to convert English phrases to German phrases, corresponds to automated porting among programming languages.

And then there's this image completion task where you take a picture and remove part of it, maybe a telephone pole was in the way, and you ask the neural network to fill in the details. Well, that corresponds to a smarter, more context-aware code completion. And so you can get some really cool code completion tools out of this.

And then that's just scratching the surface. There's a bunch of other tasks people have looked at: focusing attention during code review, automatically generating glue code, checking API usage, predicting performance problems, and even taking actual English language descriptions, "Search for this string in this buffer," and generating code from those.

But what you'll find in common among all of those is the tools are focusing on the formulaic parts of development, these local sort of repetitive tasks. And that's actually great, because then developers get to focus on the fun, creative parts: the architecture, the business logic, the security story, and so forth. And so we can reach this point, hopefully, in the future, where we can develop enterprise-grade, scalable applications without a lot of the minor annoyances and roadblocks that go along with that.

So a bunch of these techniques have open source implementations you can go play with. If you look at that last slide, there are pointers. And if you're interested in this, come find me. I'd love to talk about it.

Thank you.