GitHub Copilot and Open Source: A Love Story That Won’t End Well

Sasha Medvedovsky

Originally published at The New Stack

GitHub has been an important part of the software development world, and of open source software in particular. It has provided free hosting for open source projects (the Apache Software Foundation moved its entire operation to GitHub a few years ago), and it played a large part in turning Git, itself an open source project, into the popular source control management (SCM) system it is today.

However, it seems that this cooperation is now coming to an abrupt and ugly end, with the Software Freedom Conservancy (SFC) joining the Free Software Foundation in recommending that projects cut ties with GitHub over the creation of GitHub Copilot.

Copilot, which delivers AI-powered code composition and auto-completion, was free until very recently and is now a commercial offering. It was trained on code from the millions of open source projects hosted on GitHub. Not all open source projects are created equal, however: they use many different licenses, and some of those licenses restrict how the code may be reused, even though the code is publicly visible on GitHub. Copyleft licenses such as the GPL, for example, require that derivative works be released under the same terms.

To many open source developers, this constitutes unauthorized use of their work, and a breach of their trust. Obviously, Copilot wouldn’t work without ingesting millions of code samples from GitHub, so it’s safe to say that the open source code is an integral part of it. Moreover, any code created by Copilot could be considered a derivative of this open source code (in some cases whole snippets of open source code could find their way into a closed-source codebase).

It’s true that using code to train an AI model is somewhat different from simply using the code as is. But shouldn’t the code’s creators at least be asked whether they agree to this use of their work?

If this recent divorce between GitHub and open source organizations seems surprising, it shouldn’t. It really stems from a misalignment of goals and ideals.

GitHub Is a Commercial Organization

From the beginning, GitHub has been a commercial organization that turned open source software, namely Git, into a business. There’s nothing wrong with that; plenty of companies have built thriving businesses on commercial offerings of open source technology. But it’s imperative that we don’t get confused and think of GitHub as an open source company or project. It’s neither. The confusion stems from its business model, in which production-grade hosted Git was sold to commercial organizations and provided for free to open source projects.

As someone once said, “if the product is free, YOU are the product.” Never has this saying been more apt than in the case of GitHub. In 2018, Microsoft acquired GitHub for $7.5 billion. The common understanding was that this high price (for 2018) was paid not for GitHub’s technology (again, GitHub didn’t develop Git, and there were many competitors, such as Bitbucket and GitLab) but rather for its developer community, which at the time was 28 million strong.

If Microsoft paid for the OSS community, it was inevitable that Microsoft would ultimately use that community to make a profit. Microsoft is a commercial entity with shareholders and an obligation to make as much profit as possible, and Copilot is the perfect example of that. Microsoft owns GitHub as well as a large stake in OpenAI, the AI company that trained the Copilot AI model. The corporate logic can be summarized simply: Microsoft hosts the most popular OSS projects in the world, and it has access to amazing AI capabilities. Combining the two into a commercially successful product just makes sense.

There’s just one problem with this line of thought: hosting the code doesn’t mean that Microsoft owns the code. And this is not the first time this company has made this mistaken assumption.

The Marak faker.js Debacle

A recent episode illustrates the potential dangers.

A developer who goes by the handle Marak intentionally broke his open source mock data generator, faker.js, reportedly because he felt his work was thankless. He had complained about the lack of funding for his popular projects, including faker.js, which are used by hundreds of companies.

This opened a Pandora’s box: who really owns open source code? What if companies are using the code in production? Can the developer just break the code… and that’s it?

GitHub got involved, reverted the changes, and denied Marak access to his own projects (around 100 of them).

npm (incidentally, also owned by Microsoft) reverted his package to a previous version as well, effectively taking control of his code.

Imagine the situation: a programmer creates a very useful open source project. They maintain it and provide it for free to hundreds of companies. Then they decide to make a change that the companies don’t like. And then Microsoft (through GitHub and npm) takes over their code repositories and reverts their changes.

Does this look like a company that understands that the developer owns the code, or one that thinks Microsoft owns it?

Conclusion

I don’t think the open source movement should cut all ties with commercial organizations or stop using commercial products. Cooperation is a good thing. It’s not a zero-sum game, and it benefits humanity as a whole.

But the boundaries should be clearly set. If developers don’t want their code to be used in commercial applications, they should have the right to refuse. If they are OK with it, then there’s no problem. But companies (be it Microsoft, Google or Amazon Web Services) shouldn’t simply assume that if they give something away for free, they can take something else in return.

At the company I co-founded, Diversion, we have developed our own SCM. We plan to release it as open source (on our own platform, not on GitHub), and we hope it will become useful to millions of developers.

We will also offer free hosting for open source and indie developers, as our way of thanking and giving back to the amazing people who’ve given their time and effort for the betterment of all humankind without asking for anything in return.

In light of these recent developments, I feel that there’s a need to make a promise: we pledge, right here, to honor the software creators’ license agreements, and to not use their code in ways they do not agree with.

To me it’s something that should go without saying; but apparently, it needs to be said explicitly.

Note: Sharone Zitzman contributed to this post.
