Why should we care about cutting and pasting open source code?

Open Source Software (OSS) has changed how we write software – the developer community shares code and provides useful guidance and continued enhancements to this shared code. Almost all new apps and modern build systems automatically pull in hundreds of these open source components or libraries into products, as well as other third-party components and files.

The problem? Some members of the developer community can also be very casual about copying files, code snippets, images, binaries or entire modules without respecting their open source licences. Even if the developers are strict about reporting licences for their main components, chances are they’re using code that was already casually copied and enhanced.

How do we fix this? Scanning code using source code fingerprinting is the only way to reliably discover what third-party content is in the code. The risk of undetected code is great, from both a licensing and a vulnerability standpoint. “Where did the code come from?”, “What rights do we have to use it?”, and “Have there been security changes to it since we copied it?” are all questions a team should pay attention to.

Source code fingerprinting is the process of comparing the source in an application to a database of open source projects to see if there’s code that’s been cut and pasted or otherwise brought into the application.
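To make the idea concrete, here is a minimal sketch of one way fingerprinting could work. It is an illustration only, not how any particular scanning product is implemented. It assumes a fingerprint is the SHA-256 hash of a few consecutive lines with whitespace removed, and that the "database" is a small in-memory dictionary. The function names and the window size of three lines are hypothetical choices made for this example.

```python
import hashlib


def fingerprints(source: str, window: int = 3) -> set[str]:
    """Hash every run of `window` consecutive non-blank lines, ignoring whitespace."""
    # Remove all whitespace so re-indented copies still produce the same hashes.
    lines = ["".join(line.split()) for line in source.splitlines()]
    lines = [line for line in lines if line]
    if len(lines) < window:
        return set()
    return {
        hashlib.sha256("\n".join(lines[i:i + window]).encode("utf-8")).hexdigest()
        for i in range(len(lines) - window + 1)
    }


def build_index(projects: dict[str, str], window: int = 3) -> dict[str, set[str]]:
    """Map each fingerprint to the names of the known projects that contain it."""
    index: dict[str, set[str]] = {}
    for name, source in projects.items():
        for fp in fingerprints(source, window):
            index.setdefault(fp, set()).add(name)
    return index


def find_matches(app_source: str, index: dict[str, set[str]], window: int = 3) -> dict[str, int]:
    """Count how many fingerprints the application shares with each known project."""
    hits: dict[str, int] = {}
    for fp in fingerprints(app_source, window):
        for name in index.get(fp, ()):
            hits[name] = hits.get(name, 0) + 1
    return hits


# Hypothetical data: one known open source file and an application that copied it.
known = {
    "libfoo": "int clamp(int v, int lo, int hi) {\n"
              "    if (v < lo) return lo;\n"
              "    if (v > hi) return hi;\n"
              "    return v;\n"
              "}\n",
}
app = (
    "void main_loop() {}\n\n"
    "int clamp(int v, int lo, int hi) {\n"
    "  if (v < lo) return lo;\n"
    "  if (v > hi) return hi;\n"
    "  return v;\n"
    "}\n"
)
matches = find_matches(app, build_index(known))
```

Because whitespace is stripped before hashing, the re-indented copy in the application still matches the original. A sketch like this would miss renamed variables or reordered statements, which is part of why real scanning is harder than it looks.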

Why and when do developers cut and paste code?

There are a few common cases where a developer will cut and paste code into their codebase. The first is when a software component needs some setup or teardown code before it can be used. This code often comes from the original project and shows how to properly call into the larger component. This skeleton code is often not clearly labelled with copyrights and licences, and it ends up in the user’s core project code. Open source projects can help with this use case by labelling skeleton and example code more clearly, and by taking into account the potential licence differences between core component code and the code required to call into it.

Another common case is when a core routine is useful but the rest of the component is unneeded. Algorithms, helper functions and static data are common examples of this.

What are the licensing considerations and potential problems?

There’s been a lot of discussion of “How much is too much?” for cut-and-pasted code over the years, with a tension between the “How else am I supposed to do this?” and the “Source code has copyright and licences associated with it” camps. It may be clear that a whole page of source code that originally has a copyright and licence at the top should have that information preserved, but what about a few lines or a method that comes from a file that lacks this information? Your best source of guidance on what’s appropriate is your company’s IP lawyer. If you don’t have one, or they don’t have an opinion on open source licensing policy, there are many outside counsel who specialise in this topic.

The lack of a licence should be a clear warning sign to the developer that they may be causing downstream problems for their project. A little care and effort at this point can save a headache later.

What are the problems with how developers do this now?

It’s common for developers to want to give credit where credit is due. The problem with how this is commonly done is that the original copyright and licence often aren’t brought along with the snippet, and the developer may give credit in a flippant way using language such as “code stolen from xyz” or “shamelessly lifted from the Foo project”. While this language is taken badly by the legal team, it’s often a sign of the developer trying to carve out attribution for the copied code. It’s important to provide clear guidance on how to properly bring in code snippets for licensing and security review purposes. Preserving or adding the proper copyright and licence information is important to remain in compliance. It’s also invaluable for helping future readers of the source code understand who wrote what.

What are typical policies around the cutting and pasting of source code?

While all companies have different considerations or use cases, this type of guidance typically involves proper procedures for recording the owner, contact information and licence for the snippet in question.
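As an illustration of what such a record might look like, here is a minimal sketch. The `SnippetRecord` class, its field names and the comment layout are assumptions made for this example rather than any standard or company policy. The `SPDX-License-Identifier` tag is a widely used convention for naming a licence in a source file. The author, contact address and URL in the usage lines are placeholders.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SnippetRecord:
    """Provenance for one copied snippet (hypothetical structure for illustration)."""
    owner: str    # copyright holder of the original code
    contact: str  # how to reach the owner
    licence: str  # SPDX-style identifier, for example "MIT"
    source: str   # pointer to where the snippet was copied from

    def header(self) -> str:
        """Render a comment block to place directly above the copied snippet."""
        return "\n".join([
            f"# Copyright (c) {self.owner}",
            f"# SPDX-License-Identifier: {self.licence}",
            f"# Source: {self.source}",
            f"# Contact: {self.contact}",
        ])


def missing_fields(record: SnippetRecord) -> list[str]:
    """List the fields left blank, so a review can flag incomplete records."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Placeholder values only; these do not refer to a real project.
record = SnippetRecord(
    owner="Example Author",
    contact="author@example.com",
    licence="MIT",
    source="https://example.com/foo/clamp.c",
)
incomplete = SnippetRecord(owner="", contact="author@example.com", licence="", source="unknown")
```

Calling `record.header()` gives a block that can sit above the snippet in place of a flippant credit line, and `missing_fields(incomplete)` shows which details still need to be requested from the original author.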

If this information isn’t readily available, a request to the original author is sometimes made. This request includes a pointer to the code in question, the use case and a request for a specific licence if one is not already specified. If there is no response, or the licence the author selects is contrary to your policy, it’s common to look for a new source for code that solves your problem.

By educating developers, creating easy-to-follow policies and scanning your source code for potential problems, you’ll be able to create a compliant and secure product without having to fix source-code-related problems at release or M&A time.

Written by Jeff Luszcz, Vice President of Product Management at Flexera
