Since Hyland uses Docker technologies to distribute and run Alfresco, along with a lot of other software, I thought it would be interesting to talk about legal compliance. End users, whether paying customers or open source organisations, are increasingly worried about it.
Let's say you integrate Alfresco into other solutions for your customers, build a Docker image with your custom work and other software, and share it with your customers so they can deploy and run it with Docker and Kubernetes. Are you sure that your image is actually compliant?
In this article, I will provide some basics and best practices that should help you avoid the common traps.
Image: Paul Teysen, unsplash.com, CC0
What is legal compliance, and why is it a tricky topic for developers?
In risk management, there are two main topics: security and legal compliance. While security is usually well known, with easily understood impacts, this is not the case for legal compliance.
Let's explain legal compliance in a practical way:
- It is about studying, and being able to share, the software bill of materials (SBOM) in as much detail as possible (list of dependencies, versions, licenses, and sources in the case of open source), and ensuring that there is no legal issue in distributing the software as a package, firmware, an image, or in any other way.
- With fully proprietary software, the vendor usually owns all the intellectual property, either directly or through partnership contracts.
- With open source software, the intellectual property of each aggregated component is owned by different people or organisations. Each open source project is distributed under a license that we must respect to use and (re)distribute it, for example as a dependency. Developers and organisations choose a license for their project on purpose, and the license tells us what we can and cannot do. Sometimes licenses are not compatible with each other: you can install, use, and modify the software, but not redistribute it together as a single package.
- Most software distributed nowadays is a combination of proprietary and open source software.
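To make the SBOM idea more concrete, here is a minimal Python sketch of what one SBOM entry might record and how you could spot incomplete entries before distributing anything. The `Component` fields and the `incomplete_entries` helper are hypothetical illustrations, not a standard format (SPDX, mentioned later in this article, is the actual standard), and the URLs are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    """One SBOM entry (hypothetical minimal schema, for illustration only)."""
    name: str
    version: Optional[str]
    license: Optional[str]      # e.g. an SPDX identifier such as "MIT"
    source_url: Optional[str]   # where the sources can be obtained
    open_source: bool = True


def incomplete_entries(components: list[Component]) -> dict[str, list[str]]:
    """Return, for each component with gaps, the list of missing fields."""
    problems: dict[str, list[str]] = {}
    for c in components:
        missing = []
        if not c.version:
            missing.append("version")
        if not c.license:
            missing.append("license")
        # Sources are only expected for open source components.
        if c.open_source and not c.source_url:
            missing.append("source_url")
        if missing:
            problems[c.name] = missing
    return problems


if __name__ == "__main__":
    sbom = [
        Component("postgresql", "14.5", "PostgreSQL", "https://example.org/postgresql-src"),
        Component("libfoo", None, "MIT", None),  # hypothetical dependency
        Component("in-house-module", "1.0", "Proprietary", None, open_source=False),
    ]
    print(incomplete_entries(sbom))
```

Here `libfoo` is reported as missing its version and source URL, while the in-house module passes because sources are only expected for open source components.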
I would like to highlight something important: sharing a container image is super cool because it makes software easy to share, deploy, and run. But an image containing software under different licenses might not be legally compliant. It's all about copyright and intellectual property management, which is not really taught in tech schools. And most of the time, it cannot be fixed with code, which may explain why it is a difficult topic for developers.
So, let's take up the challenge of making it interesting for once.
Different use cases
First, let's define what a container is. The consensus is that a container is a running instance of an image. This means that you don't share containers. Instead, you share either "images" ready to deploy, or "image files" that your end users will build themselves before they can run instances.
So, regarding legal compliance, there are two main use cases, which I detail below.
The case of images:
An image (or container image) contains one or more pieces of software, with dependencies, rules, modifications, configuration, etc., already built and ready to run. Example: a Docker image.
Here are the basics:
- In terms of legal compliance, an image is like a zip file (or a firmware) containing many pieces of software, so their licenses must be compatible with each other. Open source license clauses frame the rights of use and distribution, spelling out what you can and cannot do. Some are very permissive (MIT, BSD), and some include constraints (LGPL, GPL).
- The license of an image is usually the strongest (most restrictive) license among all the included software and dependencies. In some cases, software is aggregated in an incompatible way; the best-known example is GPL plus proprietary software. Because there are so many open source licenses, some are not compatible for distribution purposes, even if you can install them separately and run them together.
- It is very difficult to study the SBOM of an image, because a built image is made of several layers, and each layer can bring its own hidden dependencies.
- Many open source licenses, notably copyleft ones like the GPL, require the distributor to make the corresponding source code available. This is becoming a common request from end users, be they free users or paying customers. While open source base images (a single piece of software plus its dependencies, like PostgreSQL) usually meet this requirement, most complex images do not.
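The "strongest license" rule of thumb and the GPL plus proprietary trap can be modelled as a deliberately simplified check. This Python sketch assumes a hypothetical three-tier ranking (permissive, weak copyleft, strong copyleft) plus a "Proprietary" label; real compatibility depends on exact license versions and on how the components are combined, so treat the result as a prompt for review, not a verdict.

```python
# Hypothetical, simplified ranking: a higher number means more obligations.
TIERS = {
    "MIT": 0, "BSD-2-Clause": 0, "BSD-3-Clause": 0,
    "LGPL-2.1-only": 1, "LGPL-3.0-only": 1,
    "GPL-2.0-only": 2, "GPL-3.0-only": 2,
}
STRONG_COPYLEFT = 2


def assess_image(licenses: list[str]) -> tuple[str, list[str]]:
    """Return (strongest known license, warnings) for the licenses found in one image."""
    warnings = []
    unknown = sorted({lic for lic in licenses if lic not in TIERS and lic != "Proprietary"})
    if unknown:
        warnings.append("unknown licenses, review manually: " + ", ".join(unknown))
    known = [lic for lic in licenses if lic in TIERS]
    # max() keeps the first license of the highest tier.
    strongest = max(known, key=lambda lic: TIERS[lic]) if known else "none"
    if "Proprietary" in licenses and any(TIERS[lic] == STRONG_COPYLEFT for lic in known):
        warnings.append("GPL + proprietary in the same image: needs legal review")
    return strongest, warnings


if __name__ == "__main__":
    print(assess_image(["MIT", "LGPL-2.1-only", "GPL-2.0-only", "Proprietary"]))
```

Unknown licenses are reported rather than guessed, which is the safer default when the outcome is a distribution decision.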
Advice for this use case:
- Building and sharing images that are as simple as possible is a good way to ensure they are legally compliant.
- Building complex images for internal or personal use is the safest use case.
- Avoid distributing complex images made of many pieces of software and dependencies if you can't provide an SBOM and check legal compliance.
- Store the sources of every open source component you used to build your image in a public repository, and put a link to it where the end user will see it ("read me" in the sources, "about" in the UI, or even better, a list of third-party software like the one you find in Android settings). This archive is also useful if a version of a piece of software disappears from a public repository: you can be sure to rebuild in the future, or to provide the sources to an end user who requests them.
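A third-party list like the one suggested above can be generated from your SBOM instead of being maintained by hand. This is a minimal Python sketch, assuming each component is a (name, version, license, source URL) tuple; the output layout, the `third_party_notice` helper, and the mirror URLs are hypothetical.

```python
def third_party_notice(
    components: list[tuple[str, str, str, str]], sources_archive_url: str
) -> str:
    """Render a plain-text third-party notice, sorted by component name.

    Each component is a (name, version, license, source_url) tuple.
    """
    lines = [
        "THIRD-PARTY SOFTWARE",
        f"Sources of all open source components: {sources_archive_url}",
        "",
    ]
    for name, version, license_id, source_url in sorted(components):
        lines.append(f"- {name} {version} ({license_id}): {source_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    notice = third_party_notice(
        [
            ("zlib", "1.2.13", "Zlib", "https://example.org/mirror/zlib-1.2.13.tar.gz"),
            ("bash", "5.2", "GPL-3.0-or-later", "https://example.org/mirror/bash-5.2.tar.gz"),
        ],
        "https://example.org/mirror/",
    )
    # Ship this text with the image, e.g. next to the "read me" or behind an "about" page.
    print(notice, end="")
```

Generating the notice from the same data as the SBOM keeps the two from drifting apart when dependencies change.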
Now that we have seen the pros and cons of images: there is actually a very good workaround when you need to share or distribute a complex stack of software.
The case of image files:
An image file describes how to build an image, with all the required dependencies, build rules, modifications, etc., but contains no software at all. Example: a Dockerfile.
Here are the basics:
- The license of an image file can be completely different from the licenses of the software it lists. A good practice seems to be using a permissive license like MIT, which lets your end users fine-tune it, for example for integration purposes. You can also choose a more restrictive license (even a proprietary one) to avoid too many divergent builds, for example if you sell support.
- In terms of compliance, image files are very attractive, as you don't distribute software as such. Instead, end users download a file with no software in it, and they download each required piece of software and its dependencies themselves at build time. It is as if they downloaded and installed each piece of software one after the other, accepting a license each time. The magic of the image file is that it makes it easy to run them together, because it contains all the required rules, modifications, and configurations.
- The main problem with image files compared to images is reproducibility. With an image, every end user runs exactly what you share. With image files, each end user must first build the image. Because there are many image repositories and many software versions available (or no longer available, which is precisely the problem), different builds might produce different outcomes. Yes, the result may run anyway, but you might have to support differences in dependency versions.
Advice for this use case:
- If you want to distribute a complex set of software together and ensure that it is legally compliant, use image files instead of images. This is the easiest way.
- In your image file, don't forget to define each version, the preferred repositories, and any other required information. It should include as much detail as possible so that every build is identical.
- Same as for images: store all the software listed in an image file in a repository (public or private), in case the version of a dependency you use disappears from public repositories (from experience, it will happen). This applies to the author of the image file, but in this case also to the end user. This way, you can be sure to rebuild the image in the future, for example when applying a security patch.
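Part of the "define each version" advice can be checked automatically by linting the `FROM` lines of a Dockerfile. This Python sketch only looks at base images (not package versions installed by `RUN` instructions, and not line continuations), and flags any base image that is not pinned to an immutable digest. The `find_unpinned_bases` helper is hypothetical, and requiring a digest rather than just a version tag is one possible policy, stricter than what some teams need.

```python
import re

# FROM [--platform=<platform>] <image> [AS <name>], case-insensitive like Dockerfile keywords.
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE
)


def find_unpinned_bases(dockerfile_text: str) -> list[str]:
    """Report base images that are not pinned to an immutable digest.

    Simplifications: line continuations and package versions in RUN
    instructions are not analysed.
    """
    problems = []
    stages = set()  # names declared with "AS", which later FROM lines may reuse
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        match = FROM_RE.match(line)
        if not match:
            continue
        image, alias = match.group(1), match.group(2)
        if image.lower() == "scratch" or image.lower() in stages:
            pass  # empty base or earlier build stage: nothing to pin
        elif "$" in image:
            problems.append(f"line {lineno}: {image} uses a build argument, check its default")
        elif "@sha256:" not in image:
            # The tag, if any, follows the ':' in the last path segment
            # (a ':' earlier in the reference may be a registry port).
            last_segment = image.rsplit("/", 1)[-1]
            tag = last_segment.split(":", 1)[1] if ":" in last_segment else None
            if tag is None or tag == "latest":
                problems.append(f"line {lineno}: {image} has no explicit version tag")
            else:
                problems.append(f"line {lineno}: {image} uses tag '{tag}' but no digest")
        if alias:
            stages.add(alias.lower())
    return problems


if __name__ == "__main__":
    example = "FROM python:3.11-slim AS build\nFROM build\nFROM ubuntu\n"
    for problem in find_unpinned_bases(example):
        print(problem)
```

A tag such as `3.11-slim` can be re-pushed with different content, whereas a digest always identifies the same image, which is why the sketch still reports tagged images.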
Image files, when well crafted, are a very good way to share and distribute a software environment. However, they are harder to promote, as most public repositories only allow built images to be shared and listed.
In a way, it reminds me of the good old days of Maven, when we shared a build script or a pom.xml file.
Tooling and solutions
Studying license compatibility is a huge amount of work, impossible to cover in a single article. There are experts in the field, and consulting companies providing services and solutions that will help you build a software bill of materials for your container images and check legal compliance. Of course, when it comes to software distribution, your legal team or advisor should have the final say on what you develop, including which dependencies you can (or, sorry in advance, cannot) add to your project.
That said, there are very good open source solutions and standards for working on SBOMs and legal compliance:
- Fossology, an open source license compliance software system and toolkit. Its license, copyright, and export scanners help with your compliance activities.
- Tern, a software composition analysis tool that generates a software bill of materials for container images and Dockerfiles.
- SPDX (Software Package Data Exchange), an open standard for communicating software bill of materials information, including components, licenses, copyrights, and security references. It also makes your code's license machine-readable for tools like Fossology, Tern, or Reuse.
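To give an idea of what SPDX looks like, this Python sketch writes a minimal SPDX 2.3 document in the tag-value format for a list of packages. It emits only a small subset of the fields the specification defines; the document namespace, the creator tool name (`sbom-sketch`), and the URLs are hypothetical placeholders, and `NOASSERTION` is used wherever information is unknown. Check the output against the SPDX specification or a validator before relying on it.

```python
import re
import uuid
from datetime import datetime, timezone


def spdx_ref(name: str) -> str:
    """Build a package SPDXID; only letters, digits, '.' and '-' are allowed."""
    return "SPDXRef-Package-" + re.sub(r"[^A-Za-z0-9.\-]", "-", name)


def to_spdx_tag_value(doc_name: str, packages: list[dict]) -> str:
    """Render a minimal SPDX 2.3 tag-value document describing `packages`."""
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        "SPDXVersion: SPDX-2.3",
        "DataLicense: CC0-1.0",
        "SPDXID: SPDXRef-DOCUMENT",
        f"DocumentName: {doc_name}",
        # Hypothetical namespace: it must be a unique URI that you control.
        f"DocumentNamespace: https://example.org/spdx/{doc_name}-{uuid.uuid4()}",
        "Creator: Tool: sbom-sketch-0.1",  # hypothetical tool name
        f"Created: {created}",
    ]
    for pkg in packages:
        ref = spdx_ref(pkg["name"])
        lines += ["", f"PackageName: {pkg['name']}", f"SPDXID: {ref}"]
        if "version" in pkg:
            lines.append(f"PackageVersion: {pkg['version']}")
        lines += [
            # NOASSERTION means "not determined", which is honest when data is missing.
            f"PackageDownloadLocation: {pkg.get('source_url', 'NOASSERTION')}",
            "FilesAnalyzed: false",
            "PackageLicenseConcluded: NOASSERTION",
            f"PackageLicenseDeclared: {pkg.get('license', 'NOASSERTION')}",
            "PackageCopyrightText: NOASSERTION",
            f"Relationship: SPDXRef-DOCUMENT DESCRIBES {ref}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_spdx_tag_value("my-image", [
        {"name": "zlib", "version": "1.2.13", "license": "Zlib",
         "source_url": "https://example.org/mirror/zlib-1.2.13.tar.gz"},
        {"name": "libfoo"},  # hypothetical package with unknown details
    ]), end="")
```

Declaring the license while leaving the concluded license as `NOASSERTION` reflects the fact that nobody has reviewed the package yet; that conclusion is precisely the legal compliance work this article is about.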
You can also have a look at Docker SBOM, but be aware that it is still experimental and only works with Docker images. While it helps list all the software in an image, you will still need legal compliance skills. The other tools listed above also work with other container technologies.
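For a rough idea of why listing the software in an image is tedious without such tools, here is a do-it-yourself Python sketch. It opens an archive produced by `docker save -o image.tar <image>` and, for each layer, lists the Debian-style copyright files (`usr/share/doc/<package>/copyright`) that the layer contains. It assumes the archive has a top-level `manifest.json` whose entries list layer paths under a `Layers` key, which is the layout `docker save` produces; it only finds packages that ship such files and ignores deletions made by later layers, so it is far from a complete SBOM and is not how any particular tool works internally.

```python
import json
import re
import sys
import tarfile

# Debian/Ubuntu packages usually install their license text at this path.
COPYRIGHT_RE = re.compile(r"^(?:\./)?usr/share/doc/([^/]+)/copyright$")


def copyright_files_per_layer(image_archive: str) -> dict[str, list[str]]:
    """Map each layer of a `docker save` archive to the packages it documents.

    Whiteout files (deletions in later layers) are ignored, so a package
    listed in one layer may in fact be removed by a later one.
    """
    result: dict[str, list[str]] = {}
    with tarfile.open(image_archive) as outer:
        # manifest.json is a JSON array with one object per saved image.
        manifest = json.load(outer.extractfile("manifest.json"))
        for image in manifest:
            for layer_path in image["Layers"]:
                packages = []
                # Each layer is itself a tar archive, possibly compressed.
                layer_file = outer.extractfile(layer_path)
                with tarfile.open(fileobj=layer_file, mode="r:*") as layer:
                    for member in layer.getmembers():
                        match = COPYRIGHT_RE.match(member.name)
                        if match:
                            packages.append(match.group(1))
                result[layer_path] = sorted(packages)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    for layer_path, packages in copyright_files_per_layer(sys.argv[1]).items():
        print(f"{layer_path}: {len(packages)} packages")
        for name in packages:
            print(f"  {name}")
```

Even on a small image, the per-layer output shows how many components arrive through base layers that you never chose explicitly.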
The aim of this article is to introduce legal compliance with container technologies, including Docker, the trendiest one at the time of writing. I hope you learned something useful and that you can apply some of this advice and these workarounds to avoid the common traps of container image distribution.
Please leave any comments or questions below!
Read more on the topic: