
Reproducible builds, also known as deterministic compilation, is a process of compiling software which ensures that the resulting binary code can be reproduced. Source code compiled using deterministic compilation will always produce the same binary, bit for bit, provided the same source code, build environment, and build instructions are used.
Reproducible builds can act as part of a chain of trust; the source code can be signed, and deterministic compilation can prove that the binary was compiled from trusted source code.
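The verification step implied here can be sketched as a digest comparison between two independently produced artifacts. This is a minimal illustration, not any project's actual tooling: the function names `sha256_of` and `builds_match` and the stand-in file contents are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(artifact_a: Path, artifact_b: Path) -> bool:
    """True if two independently built artifacts are bit-for-bit identical."""
    return sha256_of(artifact_a) == sha256_of(artifact_b)


if __name__ == "__main__":
    # Stand-ins for the outputs of two independent builds (hypothetical data).
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "build_a.bin"
        second = Path(tmp) / "build_b.bin"
        first.write_bytes(b"example artifact")
        second.write_bytes(b"example artifact")
        print(builds_match(first, second))
```

A matching digest shows only that both builders produced the same bytes; trust in the result still rests on the source code and the build instructions being the ones that were reviewed.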
Methods
For the compilation process to be deterministic, the input to the compiler must be the same, regardless of the build environment used. This typically involves normalizing variables that may change, such as order of input files, timestamps, locales, and paths.
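As a rough illustration of this kind of normalization, the sketch below packs a directory into a tar archive while fixing the file order, timestamps, and ownership. It uses Python's standard `tarfile` module; the function name `deterministic_tar` and the coarse permission handling are assumptions made for the example, not a recommendation from any particular project.

```python
import io
import tarfile
from pathlib import Path


def deterministic_tar(src_dir: Path, mtime: int = 0) -> bytes:
    """Pack src_dir into an uncompressed tar with normalized metadata."""
    buf = io.BytesIO()
    # Pin the header format so the layout does not follow library defaults.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # Sort by relative path: directory listing order is not guaranteed.
        entries = sorted(
            src_dir.rglob("*"),
            key=lambda p: p.relative_to(src_dir).as_posix(),
        )
        for path in entries:
            arcname = path.relative_to(src_dir).as_posix()
            info = tar.gettarinfo(str(path), arcname=arcname)
            if info is None:  # unsupported file type, e.g. a socket
                continue
            info.mtime = mtime            # fixed timestamp
            info.uid = info.gid = 0       # drop the builder's identity
            info.uname = info.gname = ""
            # Keep only "directory or executable" versus "plain file",
            # so the umask of the build machine does not leak in.
            executable = info.isdir() or bool(info.mode & 0o100)
            info.mode = 0o755 if executable else 0o644
            if info.isreg():
                with path.open("rb") as fh:
                    tar.addfile(info, fh)
            else:
                tar.addfile(info)
    return buf.getvalue()
```

Two directories with the same contents should then yield identical bytes, regardless of when or in what order the files were created. Compressing the result is a separate concern: the gzip header, for instance, carries its own timestamp field that would also need to be fixed.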
Additionally, the compilers must not introduce non-determinism themselves. This sometimes happens when a compiler uses hash tables with a random hash seed value. It can also happen when the output depends on the addresses of variables, because those addresses vary under address space layout randomization (ASLR).
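The effect of a random hash seed can be observed with Python itself, whose string hashing is randomized per process unless the `PYTHONHASHSEED` environment variable pins it. The sketch below is only an analogy for what may happen inside a compiler; the helper `run_with_seed` and the sample strings are made up for the example.

```python
import os
import subprocess
import sys

# A tiny program whose output order depends on string hash values.
SNIPPET = 'print(",".join({"alpha", "beta", "gamma", "delta", "epsilon"}))'


def run_with_seed(seed: str, code: str = SNIPPET) -> str:
    """Run code in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The same seed reproduces the same order; other seeds may not.
    for seed in ("1", "1", "2", "3"):
        print(seed, run_with_seed(seed))
```

Pinning the seed makes the iteration order repeatable from run to run, which is one way to diagnose this class of problem; making the output independent of hash order removes the dependency entirely.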
Build systems, such as Bazel and Gitian, can be used to automate deterministic build processes.
History
The GNU Project used reproducible builds in the early 1990s. Changelogs from 1992 indicate the ongoing effort.
One of the older projects to promote reproducible builds is the Bitcoin project, with Gitian. Later, in 2013, the Tor project started using Gitian for its reproducible builds.
In July 2013, the Debian project started implementing reproducible builds across its entire package archive.
By July 2017, more than 90% of the packages in the repository had been shown to build reproducibly.
In November 2018, the Reproducible Builds project joined the Software Freedom Conservancy.
F-Droid uses reproducible builds to provide a guarantee that the distributed APKs use the claimed free source code.
The Tails portable operating system uses reproducible builds and explains to others how to verify their distribution.
In June 2021, NixOS claimed to have achieved a 100% reproducible build.
Challenges
According to the Reproducible Builds project, timestamps are "the biggest source of reproducibility issues. Many build tools record the current date and time... and most archive formats will happily record modification times on top of their own timestamps." They recommend that "it is better to use a date that is relevant to the source code instead of the build: old software can always be built later" if it is reproducible. They identify several ways to modify build processes to do this:
* Set the SOURCE_DATE_EPOCH environment variable to a number of seconds since January 1, 1970 (the Unix epoch), derived from something in the source code. Tools that support this environment variable will use its value (when set) instead of the current date and time.
* Post-process output to remove timestamps or normalize them. The tool strip-nondeterminism can often help do this.
* Use a library like libfaketime to intercept requests for the current time of day and provide a controlled response.
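A tool that embeds a build date might honor SOURCE_DATE_EPOCH along the following lines. This is a minimal sketch: the function names `build_timestamp` and `version_banner` are hypothetical, and falling back to the current time when the variable is unset is one common choice rather than a requirement.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp() -> datetime:
    """Use SOURCE_DATE_EPOCH when set, otherwise fall back to the clock."""
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))
    # Always interpret the value as UTC so the local time zone cannot leak in.
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


def version_banner(name: str) -> str:
    """Render a banner of the kind often embedded in build output."""
    return f"{name} built {build_timestamp():%Y-%m-%d %H:%M:%S} UTC"


if __name__ == "__main__":
    print(version_banner("example-tool"))
```

With the variable exported before the build, every run prints the same banner; without it, the banner changes from one run to the next, which is exactly the variation the variable is meant to remove.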
In some cases other changes must be made to make a build process reproducible. For example, some data structures do not guarantee a stable order in each execution. A typical solution is to modify the build process to specify a sorted output from those structures.
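For instance, a build step that writes out a generated manifest could sort everything it emits, so the result does not depend on the iteration order of the underlying containers. The function `render_manifest` and the manifest layout below are assumptions made purely for illustration.

```python
import json
from typing import Iterable, Mapping


def render_manifest(modules: Iterable[str], options: Mapping[str, str]) -> str:
    """Serialize build metadata with a stable, sorted layout."""
    manifest = {
        "modules": sorted(modules),   # sets and directory scans are unordered
        "options": dict(options),
    }
    # sort_keys gives the same key order however the mapping was built.
    return json.dumps(manifest, sort_keys=True, indent=2) + "\n"


if __name__ == "__main__":
    print(render_manifest({"net", "core", "ui"}, {"opt": "2", "debug": "0"}))
```

Sorting at the point of output keeps the fix local: the rest of the build can continue to use unordered structures internally.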
See also
* Bootstrappable builds, a process of compiling software that does not depend on (compiler) binaries that are not built from source by this process. This process can protect against compiler backdoors.
External links
reproducible-builds.org
Debian Reproducible Builds