Software Repository Manager

A software repository, or repo for short, is a storage location for software packages. Often a table of contents is also stored, along with metadata. A software repository is typically managed by source or version control, or by repository managers. Package managers allow automatically installing and updating repositories, sometimes called "packages".


Overview

Many software publishers and other organizations maintain servers on the Internet for this purpose, either free of charge or for a subscription fee. Repositories may be solely for particular programs, such as CPAN for the Perl programming language, or for an entire operating system. Operators of such repositories typically provide a package management system: tools intended to search for, install and otherwise manipulate software packages from the repositories. For example, many Linux distributions use Advanced Package Tool (APT), commonly found in Debian-based distributions, or Yellowdog Updater, Modified (yum), found in Red Hat-based distributions. There are also multiple independent package management systems, such as pacman, used in Arch Linux, and equo, found in Sabayon Linux.

As software repositories are designed to include useful packages, major repositories are designed to be malware free. If a computer is configured to use a digitally signed repository from a reputable vendor, and is coupled with an appropriate permissions system, this significantly reduces the threat of malware to these systems. As a side effect, many systems that have these abilities do not need anti-malware software such as antivirus software.

Most major Linux distributions have many repositories around the world that mirror the main repository. On the client side, a package manager helps with installing from and updating the repositories.
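
The integrity check that a client performs against a trusted repository can be illustrated with a small sketch. It assumes the repository index (a hypothetical mapping from file name to SHA-256 digest) has already been authenticated, for example by verifying a digital signature over it; that signature step is deliberately left out here, so this shows only the checksum comparison, not a complete trust chain. The names `verify_package` and `index` are illustrative and do not belong to any particular package manager.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_package(name: str, payload: bytes, index: dict[str, str]) -> bool:
    """Check a downloaded payload against the digest listed in the index.

    Assumption: `index` comes from a repository whose signature was
    already verified elsewhere; this function does not check signatures.
    """
    expected = index.get(name)
    if expected is None:
        # Packages that the index does not list are rejected.
        return False
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(sha256_hex(payload), expected)


# Hypothetical package and index, for illustration only.
payload = b"example package contents"
index = {"hello_1.0.pkg": sha256_hex(payload)}

ok = verify_package("hello_1.0.pkg", payload, index)
tampered = verify_package("hello_1.0.pkg", payload + b"!", index)
unknown = verify_package("other.pkg", payload, index)
```

A mirror that serves a modified file fails this check as long as the index itself comes from the trusted source, which is one reason mirrors do not have to be individually trusted.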


Package management system vs. package development process

A package management system is different from a package development process.

A typical use of a package management system is to facilitate the integration of code from possibly different sources into a coherent stand-alone operating unit. Thus, a package management system might be used to produce a distribution of Linux, possibly a distribution tailored to a specific restricted application.

A package development process, by contrast, is used to manage the co-development of code and documentation for a collection of functions or routines with a common theme. The result is a package of software functions that typically is not complete and usable by itself. A good package development process helps users conform to good documentation and coding practices, and integrates some level of unit testing.


Selected repositories

The following table lists a few languages with repositories for contributed software. The "Autochecks" column describes the routine checks done.

Very few people have the ability to test their software under multiple operating systems with different versions of the core code and with other contributed packages they may use. For the R programming language, the Comprehensive R Archive Network (CRAN) runs tests routinely.

To understand how this is valuable, imagine a situation with two developers, Sally and John. Sally contributes a package A. Sally only runs the current version of the software under one version of Microsoft Windows, and has only tested it in that environment. At more or less regular intervals, CRAN tests Sally's contribution under a dozen combinations of operating systems and versions of the core R language software. If one of them generates an error, she gets that error message. With luck, the details of that error message provide enough input to enable a fix, even if she cannot replicate the error with her current hardware and software.

Next, suppose John contributes to the repository a package B that uses package A. Package B passes all the tests and is made available to users. Later, Sally submits an improved version of A which, unfortunately, breaks B. The autochecks make it possible to provide information to John so he can fix the problem.

This example exposes both a strength and a weakness in the R contributed-package system: CRAN supports this kind of automated testing of contributed packages, but packages contributed to CRAN need not specify the versions of other contributed packages that they use. Procedures for requesting specific versions of packages exist, but contributors might not use those procedures.

Beyond this, a repository such as CRAN running regular checks of contributed packages actually provides an extensive if ''ad hoc'' test suite for development versions of the core language. If Sally (in the example above) gets an error message she does not understand or thinks is inappropriate, especially from a development version of the language, she can (and often does with R) ask the core development team for the language for help. In this way, the repository can contribute to improving the quality of the core language software.

(Parts of this table were copied from a "List of Top Repositories by Programming Language" on Stack Overflow.)

Many other programming languages, among them C, C++, and Fortran, do not possess a central software repository with universal scope. Notable repositories with limited scope include:

* Netlib, mainly mathematical routines for Fortran and C, historically one of the first open software repositories;
* Boost, a strictly curated collection of high-quality libraries for C++; some code developed in Boost later became part of the C++ standard library.
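
The Sally and John scenario above amounts to a reverse-dependency check: when package A changes, every package that uses A, directly or indirectly, should be tested again. The sketch below shows that idea only. It is not CRAN's actual tooling, and the dependency map and function names are hypothetical.

```python
from collections import defaultdict


def reverse_dependencies(depends_on: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert a 'package -> packages it uses' map into 'package -> users'."""
    used_by: dict[str, set[str]] = defaultdict(set)
    for package, dependencies in depends_on.items():
        for dependency in dependencies:
            used_by[dependency].add(package)
    return dict(used_by)


def packages_to_recheck(updated: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Return every package that directly or indirectly uses `updated`."""
    used_by = reverse_dependencies(depends_on)
    to_check: set[str] = set()
    stack = [updated]
    while stack:
        current = stack.pop()
        for dependent in used_by.get(current, ()):
            if dependent not in to_check:
                to_check.add(dependent)
                stack.append(dependent)
    return to_check


# Hypothetical repository: B uses A (as in the example), C uses B,
# and D is unrelated.
depends_on = {"A": set(), "B": {"A"}, "C": {"B"}, "D": set()}
affected = packages_to_recheck("A", depends_on)
```

With this map, an update to A selects B and C for retesting, so John would learn about the breakage even though he changed nothing himself.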


Package managers

Package managers help manage repositories and their distribution. If a repository is updated, a package manager typically lets the user apply that update through the package manager. They also help manage dependencies between software packages. Package managers mentioned earlier in this article include APT, yum, pacman and equo.
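
One part of dependency management is deciding the order in which packages must be installed so that each package's dependencies are present first. A minimal sketch of that step, using topological sorting from Python's standard `graphlib` module (Python 3.9 or later), is shown here. Real package managers also resolve version constraints and conflicts, which this sketch ignores, and the package names are made up.

```python
from graphlib import CycleError, TopologicalSorter


def install_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every package follows its dependencies.

    `depends_on` maps each package to the set of packages it requires.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical packages: app needs libssl and libz; libssl needs libz.
depends_on = {
    "app": {"libssl", "libz"},
    "libssl": {"libz"},
    "libz": set(),
}
order = install_order(depends_on)

# Circular dependencies cannot be ordered and are reported as an error.
try:
    install_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```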


Repository managers

In an enterprise environment, a software repository is usually used to store artifacts, or to mirror external repositories which may be inaccessible due to security restrictions. Such repositories may provide additional functionality, such as access control, versioning, security checks for uploaded software, and cluster functionality. They typically support a variety of formats in one product, so as to cater for all the needs of an enterprise and provide a single point of truth. One example is Sonatype Nexus Repository.

On the server side, a software repository is typically managed by source control or repository managers. Some repository managers can aggregate several repository locations behind one URL and provide a caching proxy. Continuous builds produce many artifacts, which are often stored centrally, so automatically deleting the ones that are not released is important.
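
The cleanup of unreleased artifacts can be pictured as a retention rule. The sketch below is one possible rule, under the assumption that each stored artifact carries a `released` flag and a creation time; the `Artifact` type, the 14-day default and the function name are hypothetical and are not taken from any specific repository manager.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Artifact:
    name: str
    released: bool
    created: datetime


def select_for_deletion(
    artifacts: list[Artifact],
    now: datetime,
    max_age: timedelta = timedelta(days=14),
) -> list[Artifact]:
    """Pick unreleased artifacts older than max_age; never pick releases."""
    return [a for a in artifacts if not a.released and now - a.created > max_age]


# Hypothetical repository contents.
now = datetime(2024, 6, 30)
artifacts = [
    Artifact("app-1.0.jar", released=True, created=datetime(2024, 1, 10)),
    Artifact("app-1.1-build7.jar", released=False, created=datetime(2024, 5, 1)),
    Artifact("app-1.1-build9.jar", released=False, created=datetime(2024, 6, 25)),
]
doomed = select_for_deletion(artifacts, now)
```

Here only the old unreleased build is selected: the release is kept regardless of age, and the recent build is still within the retention window.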


Relationship to continuous integration

As part of the development lifecycle, source code is continuously being built into binary artifacts using continuous integration (CI). The CI server may interact with a binary repository manager much like a developer would, by getting artifacts from the repositories and pushing builds there. Tight integration with CI servers enables the storage of important metadata such as:

* Which user triggered the build (whether manually or by committing to revision control)
* Which modules were built
* Which sources were used (commit id, revision, branch)
* Dependencies used
* Environment variables
* Packages installed
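
The build metadata listed above could be captured as a small record that a CI job stores alongside the artifact it publishes. The following sketch shows one way to model it. The field names and the JSON layout are assumptions made for illustration and do not follow the schema of any particular CI server or repository manager.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BuildInfo:
    triggered_by: str          # user who started the build
    modules: list[str]         # modules that were built
    commit_id: str             # source revision used
    branch: str
    dependencies: dict[str, str] = field(default_factory=dict)  # name -> version
    environment: dict[str, str] = field(default_factory=dict)   # selected env vars
    packages_installed: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to the artifact."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical build, for illustration only.
info = BuildInfo(
    triggered_by="sally",
    modules=["core", "cli"],
    commit_id="abc123",
    branch="main",
    dependencies={"libz": "1.3"},
    environment={"BUILD_MODE": "release"},
    packages_installed=["gcc"],
)
record = info.to_json()
```

Storing such a record makes it possible to trace a binary back to the exact sources and dependencies that produced it.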


Artifacts and packages

Artifacts and packages inherently mean different things. Artifacts are simply an output or collection of files (e.g. JAR, WAR, DLL, RPM), one of which may contain metadata (e.g. a POM file). Packages, by contrast, are a single archive file in a well-defined format (e.g. NuGet) that contains files appropriate for the package type (e.g. DLL, PDB). Many artifacts result from builds, but other types are crucial as well. Packages are essentially one of two things: a library or an application.

Compared to source files, binary artifacts are often larger by orders of magnitude, they are rarely deleted or overwritten (except for rare cases such as snapshots or nightly builds), and they are usually accompanied by much metadata such as id, package name, version, license and more.


Metadata

Metadata describes a binary artifact, is stored and specified separately from the artifact itself, and can have several additional uses. The following table shows some common metadata types and their uses:


See also

* Package manager
* RPM Package Manager
* Synaptic (software)
* FreeBSD Ports
* Definitive media library
* dpkg
* Simtel

