Choosing a Bioinformatics Workflow Management System

Bioinformatics workflow management systems (WfMSs) are designed to facilitate large-scale bioinformatics analysis. Typically, these workflows operate on files. A WfMS has two main elements: a language for describing the workflow and an execution engine that runs it. Each WfMS also has its own way of logging messages.
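The split between a workflow language and an execution engine can be illustrated with a minimal Python sketch. It is not how any particular WfMS is implemented: the `workflow` dictionary, the task names, and the `execute` function are all hypothetical, and real workflows pass files rather than in-memory strings.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def execute(workflow):
    """Toy 'engine': run each task once all of its dependencies have run."""
    # Map every task to the set of tasks it depends on.
    graph = {name: set(task["needs"]) for name, task in workflow.items()}
    results = {}
    order = []
    for name in TopologicalSorter(graph).static_order():
        task = workflow[name]
        upstream = [results[dep] for dep in task["needs"]]
        results[name] = task["run"](*upstream)
        order.append(name)
    return order, results

# Toy 'language': a declarative description of tasks and their dependencies.
# The tasks are deliberately listed out of order; the engine sorts them.
workflow = {
    "count": {"needs": ["upper"], "run": lambda seq: len(seq)},
    "upper": {"needs": ["trim"], "run": lambda seq: seq.upper()},
    "trim": {"needs": [], "run": lambda: "  acgt ".strip()},
}

order, results = execute(workflow)
```

The description says only what depends on what. The engine decides the order of execution, which is the property that lets a real engine schedule independent tasks in parallel.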

In the bioinformatics community, two of the most popular WfMSs are Nextflow and the Common Workflow Language (CWL). The two have similar semantics and engine features. However, they are often used in different environments.

Nextflow combines a workflow language with an execution engine. This allows for easy extensibility of pipelines. Additionally, it handles software dependencies and supports data streaming. Because the Nextflow language is built on Groovy, functions (closures) are first-class objects: they can be assigned, passed as arguments, and returned in the same ways as variables.
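Treating functions as first-class objects means that processing steps can be stored in a list and passed around like any other value. The sketch below shows the idea in Python rather than in Nextflow's own language; the names `strip_whitespace`, `to_upper`, and `compose` are illustrative only.

```python
from typing import Callable, List

Step = Callable[[str], str]

def strip_whitespace(seq: str) -> str:
    """Remove leading and trailing whitespace from a sequence string."""
    return seq.strip()

def to_upper(seq: str) -> str:
    """Normalise a sequence string to upper case."""
    return seq.upper()

def compose(steps: List[Step]) -> Step:
    """Return a new function that applies each step in order."""
    def run(seq: str) -> str:
        for step in steps:
            seq = step(seq)
        return seq
    return run

# Functions are handed to compose() exactly as variables would be.
clean = compose([strip_whitespace, to_upper])
result = clean("  acgt\n")
```

Adding a step to the pipeline then means adding one more function to the list, without changing `compose` itself.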

Nextflow runs tasks inside containers through platforms such as Docker and Singularity. Another notable feature is the ability to execute the same workflow under different configurations. Nextflow also caches the results of completed tasks, so a workflow can be restarted from a given task without rerunning the whole pipeline.
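Restarting a workflow from a task generally relies on remembering which tasks have already completed with the same inputs. The following Python sketch shows one way this could work, assuming an in-memory cache keyed on a hash of the task name and its inputs. It is an illustration of the general idea, not Nextflow's actual mechanism, and `task_key`, `run_cached`, and `pipeline` are hypothetical names.

```python
import hashlib
import json

def task_key(name, inputs):
    """Derive a stable key from the task name and its inputs."""
    payload = json.dumps({"task": name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def run_cached(name, inputs, func, cache, executed):
    """Run func only if this task has not already run with these inputs."""
    key = task_key(name, inputs)
    if key not in cache:
        cache[key] = func(**inputs)
        executed.append(name)  # record that real work was done
    return cache[key]

def pipeline(cache):
    """Two dependent tasks; returns the names of tasks that actually ran."""
    executed = []
    trimmed = run_cached("trim", {"seq": "  acgt "},
                         lambda seq: seq.strip(), cache, executed)
    run_cached("upper", {"seq": trimmed},
               lambda seq: seq.upper(), cache, executed)
    return executed

cache = {}
first_run = pipeline(cache)   # both tasks execute
second_run = pipeline(cache)  # both tasks are served from the cache
```

If only the second task's inputs changed between runs, only that task would execute again, which is the behaviour that makes resuming a long pipeline cheap.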

Nextflow offers a straightforward coding experience, making it a viable choice for scientists. It also offers excellent reproducibility. However, because workflow steps execute asynchronously, log messages from different tasks are interleaved, which can make the logs difficult to interpret. Fortunately, a graphical user interface (GUI) is available.
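One way to make logs from asynchronous steps easier to read is to tag every message with the task that produced it and regroup the messages afterwards. The Python sketch below assumes a hypothetical `[task] message` line format; it is not the format Nextflow writes, and `group_by_task` is an illustrative helper.

```python
import re
from collections import defaultdict

# Hypothetical log line format: "[task-name] message"
LOG_PATTERN = re.compile(r"^\[(?P<task>[^\]]+)\]\s*(?P<message>.*)$")

def group_by_task(lines):
    """Collect messages per task, preserving their order within each task."""
    grouped = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:  # lines without a task tag are skipped
            grouped[match.group("task")].append(match.group("message"))
    return dict(grouped)

# Messages from two concurrent tasks arrive interleaved.
interleaved = [
    "[align] started",
    "[qc] started",
    "[qc] finished",
    "[align] finished",
]
by_task = group_by_task(interleaved)
```

After grouping, each task's messages can be read as a contiguous sequence, regardless of how the tasks were scheduled.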

Other features to consider when choosing a workflow system include scalability, portability, and compatibility. These features are particularly important in bioinformatics. Depending on the workflow, it is possible to perform large-scale analyses on heterogeneous computational resources.

A common characteristic of bioinformatics workflows is the movement of large datasets from one location to another. Between tools on the same machine, this is often done using Linux pipes. While the data is moving, it is vital to ensure that security and permissions are properly maintained. For example, it is important to check whether users have permission to access the filesystem.
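A permission check of this kind can be done before a workflow starts moving data. The sketch below uses Python's standard `os.access` to report whether the current user can read and write a path; `check_access` and the file names are hypothetical, and a real workflow would also need to consider permissions on remote or shared storage.

```python
import os
import tempfile

def check_access(path):
    """Report whether the current user can see, read, and write a path."""
    return {
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }

# Demonstrate on a temporary directory so that nothing real is touched.
with tempfile.TemporaryDirectory() as workdir:
    present = os.path.join(workdir, "reads.fastq")
    with open(present, "w") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n")
    present_report = check_access(present)
    missing_report = check_access(os.path.join(workdir, "missing.fastq"))
```

Note that a check like this is only advisory: permissions can change between the check and the actual read, so the code that opens the file should still handle a `PermissionError`.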


