Nextflow

Nextflow is a scientific workflow system predominantly used for bioinformatic data analysis. It establishes standards for programmatically creating a series of dependent computational steps and facilitates their execution on various local and cloud resources.


Purpose

Many scientific data analyses require a large number of sequential processing steps. Custom scripts may suffice when developing new methods or infrequently running particular analyses, but they scale poorly to complex task successions or many samples. Scientific workflow systems like Nextflow allow formalizing an analysis as a data analysis pipeline. Pipelines, also known as workflows, specify the order and conditions of computing steps. They are run by special-purpose programs, so-called workflow executors, which ensure predictable and reproducible behavior in various computing environments. Workflow systems also provide built-in solutions to common challenges of workflow development, such as the application to multiple samples, the validation of input and intermediate results, conditional execution of steps, error handling, and report generation. Advanced features of workflow systems may also include scheduling capabilities, graphical user interfaces for monitoring workflow executions, and the management of dependencies by containerizing the whole workflow or its components.

Typically, scientific workflow systems initially present a steep learning curve, because all their features and complexities come on top of the actual analysis. However, the standards and abstraction imposed by workflow systems ultimately improve the traceability of analysis steps, which is particularly relevant when collaborating on pipeline development, as is customary in scientific settings.


Characteristics


Specification of workflows

In Nextflow, pipelines are constructed from individual processes that work in parallel to perform computational tasks. Each process is defined with input requirements and output declarations. Instead of running in a fixed sequence, a process starts executing when all its input requirements are fulfilled. By specifying the output of one process as the input of another, a logical and sequential connection between processes is established. This reactive implementation is a key design pattern of Nextflow and is also known as the functional dataflow model.

Processes and entire workflows are programmed in a domain-specific language (DSL) that is provided by Nextflow and based on Apache Groovy. While Nextflow's DSL is used to declare the workflow logic, developers can use their scripting language of choice within a process and mix multiple languages in a workflow. It is also possible to port existing scripts and workflows to Nextflow. Supported scripting languages include bash, csh, ksh, Python, Ruby, and R. Any scripting language that uses the standard Unix shebang declaration (for example, #!/bin/bash) is compatible with Nextflow. A minimal workflow can consist of a single process, such as one named hello_world, that is invoked from a workflow block.

To enable easy collaboration on workflows, Nextflow natively supports source-code management systems and DevOps platforms including GitHub, GitLab, and others.
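The single-process workflow mentioned above, and the way one process's output feeds another, might be sketched as follows. This is only an illustrative sketch in Nextflow's DSL2 syntax: the body of the original hello_world example is not preserved here, so the greeting values, the use of stdout as output, and the second process (named shout, a hypothetical addition that uses a Python shebang to show language mixing) are all assumptions.

```nextflow
// Illustrative sketch; the process bodies and input values are assumptions.

process hello_world {
    input:
    val greeting            // one value received from the input channel

    output:
    stdout                  // whatever the script prints becomes the output

    script:
    """
    echo '${greeting} world!'
    """
}

// Hypothetical second process: its script is written in Python,
// selected through the shebang on the first line of the script block.
process shout {
    input:
    val text

    output:
    stdout

    script:
    """
    #!/usr/bin/env python3
    print("${text.trim()}".upper())
    """
}

workflow {
    // Each greeting triggers one hello_world task; each result in turn
    // triggers one shout task as soon as it becomes available.
    Channel.of('Hello', 'Ciao', 'Hola') | hello_world | shout | view
}
```

Because tasks start whenever their inputs are ready, the three greetings are processed independently and the order in which results are printed is not guaranteed.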


Execution of workflows

Nextflow's DSL allows workflows to be deployed and run across different computing environments without having to modify the pipeline code. Nextflow comes with specific executors for various platforms, including major cloud providers. It supports the following environments for pipeline execution:

* ''Local'': This is the default executor. Nextflow pipelines run on Linux or macOS, and the execution occurs on the computer where the pipeline is launched.
* ''HPC workload managers'': Nextflow supports workload managers such as Slurm, SGE, LSF, Moab, PBS Pro, PBS/Torque, HTCondor, NQSII, and OAR.
* ''Kubernetes'': Nextflow can be used with local or cloud-based Kubernetes implementations (GKE, EKS, or AKS).
* ''Cloud batch services'': Nextflow is compatible with AWS Batch and Azure Batch.
* ''Other environments'': Nextflow can also be used with Apache Ignite, Google Life Sciences, and various container frameworks for portability.
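Switching between these environments is typically a matter of configuration rather than pipeline code. The following is a minimal sketch of a nextflow.config file with configuration profiles, assuming a Slurm cluster and an AWS Batch setup; the profile names, queue names, bucket path, and region are hypothetical placeholders and would have to match the actual infrastructure.

```nextflow
// nextflow.config (sketch; queue names, bucket, and region are hypothetical)

profiles {
    // Default: run tasks on the machine where the pipeline is launched.
    standard {
        process.executor = 'local'
    }

    // HPC cluster managed by Slurm.
    cluster {
        process.executor = 'slurm'
        process.queue    = 'long'                // hypothetical partition name
    }

    // AWS Batch; intermediate files are kept in an S3 work directory.
    cloud {
        process.executor = 'awsbatch'
        process.queue    = 'my-batch-queue'      // hypothetical job queue
        workDir          = 's3://my-bucket/work' // hypothetical bucket
        aws.region       = 'eu-west-1'           // hypothetical region
    }
}
```

With such a file in place, the same pipeline script (here assumed to be called main.nf) could be launched with, for example, `nextflow run main.nf -profile cluster`, leaving the workflow definition itself untouched.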


Containers for portability across computing environments

Nextflow integrates tightly with software containers. Workflows and single processes can use containers for their execution across different computing environments, eliminating the need for complex installation and configuration routines. Nextflow supports container frameworks such as Docker, Singularity, Charliecloud, Podman, and Shifter. Container images can be retrieved automatically from external repositories when the pipeline is executed. Additionally, it was announced at Nextflow Summit 2022 that future versions of Nextflow will support a dedicated container provisioning service for better integration of customized containers into workflows.
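As a rough illustration of this container integration, a configuration along the following lines would run every process inside a Docker container. It is a sketch only: the image name is an arbitrary placeholder, and a real pipeline would normally point to an image that bundles the tools its processes need.

```nextflow
// nextflow.config (sketch; the image name is a placeholder)

// Image used for all processes unless a process declares its own.
process.container = 'ubuntu:22.04'

// Run tasks through Docker; the image is pulled from its registry
// the first time it is needed.
docker.enabled = true

// On systems without Docker (for example many HPC clusters), another
// supported engine can be enabled instead:
// singularity.enabled = true
```

An individual process can also declare its own image with the `container` directive inside the process definition, which allows different steps of one workflow to run in different containers.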


Developmental history

Nextflow was originally developed at the Centre for Genomic Regulation in Spain and released as an open-source project on GitHub in July 2013. In July 2018, Seqera Labs was launched as a spin-off from the Centre for Genomic Regulation. The company employs many of Nextflow's core developers and maintainers and provides commercial services and consulting with a focus on Nextflow. In October 2018, the project license for Nextflow was changed from GPLv3 to Apache 2.0. In July 2020, a major extension and revision of Nextflow's domain-specific language was introduced to allow for sub-workflows and additional improvements. In the same year, monthly downloads of Nextflow reached approximately 55,000.


Adoption and reception


The ''nf-core'' community

Nextflow has been adopted as the preferred scientific workflow system by several sequencing facilities, including the Centre for Genomic Regulation, the Quantitative Biology Center in Tübingen, the Francis Crick Institute, the A*STAR Genome Institute of Singapore, and the Swedish National Genomics Infrastructure. These facilities have collaborated to share, harmonize, and curate bioinformatic pipelines, leading to the creation of the nf-core project. Led by Phil Ewels, who was at the Swedish National Genomics Infrastructure at the time, nf-core focuses on ensuring reproducibility and portability of pipelines across different hardware, operating systems, and software versions. In July 2020, Nextflow and nf-core received a grant from the Chan Zuckerberg Initiative in recognition of their importance as open-source software. As of 2024, the nf-core organization hosts 117 Nextflow pipelines for the biosciences and more than 1382 process modules. With more than 1200 developers and scientists involved, it is the largest collaborative effort and community for developing bioinformatic data analysis pipelines.


By domain and research subject

Nextflow is widely used for processing sequencing data and conducting genomic data analysis. Over the past five years, numerous pipelines have been published for various applications and analyses in the genomics field. One notable use case is pathogen surveillance during the COVID-19 pandemic. Swift and highly automated processing of raw data, variant analysis, and lineage designation were essential for monitoring the emergence of new virus variants and tracing their global spread, and Nextflow-enabled pipelines played a crucial role in this effort. The non-profit plasmid repository Addgene also relies on Nextflow, using it to confirm the integrity of all deposited plasmids.

In addition to genomics, Nextflow is gaining popularity in other domains of biomedical data processing where complex workflows on large amounts of primary data are required. These domains include drug screening, diffusion magnetic resonance imaging (dMRI) in radiology, and mass spectrometry data processing, the latter with a particular focus on proteomics.


See also

* Galaxy
* Snakemake


External links

* Official website: https://nextflow.io/
* nf-core project
* Seqera Labs