Containerized Workflows
Workflow Management Using Snakemake
In this breakout session you’ll learn about Snakemake, a workflow management system consisting of a text-based workflow specification language and a scalable execution environment. You will be introduced to the Snakemake workflow definition language and learn how to use the execution environment to scale workflows to compute servers and clusters while adapting to hardware-specific constraints.
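At its core, a workflow specification is a set of rules, each declaring the files it consumes (inputs) and the files it produces (outputs). To build a requested target, the engine works backwards: it finds the rule that produces the target, then the rules that produce that rule's inputs, and so on. The engine then runs them in dependency order. The sketch below is a toy Python model of that idea under simplifying assumptions (no wildcards, no shell commands, one producer per file). It is not Snakemake's implementation. The `Rule` class, the `plan` function and the rule and file names (`fastqc`, `align`, `report.html`, ...) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a workflow rule: input files -> output files."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def plan(target: str, rules: list[Rule]) -> list[str]:
    """Return rule names so that every rule comes after the rules producing its inputs."""
    producer = {out: rule for rule in rules for out in rule.outputs}
    order: list[str] = []
    visiting: set[str] = set()

    def visit(path: str) -> None:
        rule = producer.get(path)
        if rule is None:  # no rule makes this file: treat it as raw input data
            return
        if rule.name in order:  # already scheduled (e.g. shared dependency)
            return
        if rule.name in visiting:
            raise ValueError(f"cycle detected at rule {rule.name!r}")
        visiting.add(rule.name)
        for dep in rule.inputs:  # resolve upstream producers first
            visit(dep)
        visiting.discard(rule.name)
        order.append(rule.name)

    visit(target)
    return order


# Hypothetical RNA-seq-like rules, for illustration only.
rules = [
    Rule("fastqc", ("sample.fastq.gz",), ("sample_fastqc.zip",)),
    Rule("align", ("sample.fastq.gz", "genome.index"), ("sample.bam",)),
    Rule("index_genome", ("genome.fa",), ("genome.index",)),
    Rule("multiqc", ("sample_fastqc.zip", "sample.bam"), ("report.html",)),
]

print(plan("report.html", rules))  # ['fastqc', 'index_genome', 'align', 'multiqc']
```

Declaring files rather than steps is what lets the engine decide the execution order itself. The same principle explains why you only ask Snakemake for a final target and it figures out the rest.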
Snakemake is designed specifically for computationally intensive and/or complex data analysis pipelines. Its name refers to the Python programming language, which forms the basis of the Snakemake syntax.
See the Snakemake slides here (also available as a PDF).
Setup
Right-click the button below and log in to the CyVerse Discovery Environment to quickly launch the Snakemake VICE JupyterLab app.
To run Snakemake inside a Docker container instead, run the following on an instance with Docker installed:
docker run -it --entrypoint bash cyversevice/jupyterlab-snakemake
git clone https://github.com/NBISweden/workshop-reproducible-research.git
cd workshop-reproducible-research/docker/
git checkout devel
ls
Dry-run the RNA-seq Snakefile (-n lists the jobs that would be executed, without running them):
snakemake -n
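A dry run answers one question per job: would it have to run? The classic rule inherited from Make is that a job runs when any of its outputs is missing or older than any of its inputs. A job also runs when an upstream job it depends on is going to run. The sketch below is a toy Python model of that modification-time check only; recent Snakemake versions may also consider other rerun triggers, such as changed code or parameters. The function names, the job tuples and the file names are hypothetical. For testability, modification times come from a pluggable `mtime` function (on disk, via `os.path.getmtime`, by default).

```python
from __future__ import annotations

import os
from typing import Callable, Optional, Sequence

MtimeFn = Callable[[str], Optional[float]]


def file_mtime(path: str) -> Optional[float]:
    """Modification time of a file on disk, or None if it does not exist."""
    try:
        return os.path.getmtime(path)
    except FileNotFoundError:
        return None


def needs_run(inputs: Sequence[str], outputs: Sequence[str], mtime: MtimeFn = file_mtime) -> bool:
    """Toy staleness check: an output is missing, or an input is newer than an output."""
    out_times = [mtime(p) for p in outputs]
    if any(t is None for t in out_times):
        return True
    # Missing raw inputs are simply ignored here; a real engine would report an error.
    in_times = [t for t in (mtime(p) for p in inputs) if t is not None]
    return bool(in_times) and max(in_times) > min(out_times)


def dry_run(jobs: Sequence[tuple[str, Sequence[str], Sequence[str]]],
            mtime: MtimeFn = file_mtime) -> list[str]:
    """jobs: (name, inputs, outputs) in execution order. Reports, never executes."""
    scheduled: list[str] = []
    pending_outputs: set[str] = set()  # files that upstream jobs will regenerate
    for name, inputs, outputs in jobs:
        upstream_changed = any(p in pending_outputs for p in inputs)
        if upstream_changed or needs_run(inputs, outputs, mtime):
            scheduled.append(name)
            pending_outputs.update(outputs)
    return scheduled


# Hypothetical two-step pipeline with made-up timestamps (raw data was updated).
jobs = [
    ("trim", ["raw.fastq"], ["trimmed.fastq"]),
    ("align", ["trimmed.fastq"], ["aligned.bam"]),
]
fake_mtimes = {"raw.fastq": 5.0, "trimmed.fastq": 2.0, "aligned.bam": 3.0}
print(dry_run(jobs, mtime=fake_mtimes.get))  # ['trim', 'align']
```

Propagating the "will change" status downstream is why touching a single raw input file can make a dry run list many jobs.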
Run the RNA-seq Snakefile (recent Snakemake versions require an explicit number of cores for a real run):
snakemake --cores 1
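When Snakemake actually runs the workflow, the `--cores` option is one of the hardware constraints it adapts to. Independent jobs can run side by side as long as their combined thread counts fit within the core limit. A job requesting more threads than are available is scaled down to that limit. The sketch below is a toy, greedy Python model of that idea under loudly simplified assumptions: every job takes one time unit, so jobs are grouped into "waves", and cores are the only resource. Snakemake's real scheduler is more sophisticated and also handles other resources such as memory. The function `schedule_waves`, the job names and the thread counts are hypothetical.

```python
from __future__ import annotations


def schedule_waves(jobs: dict[str, tuple[int, set[str]]], cores: int) -> list[list[str]]:
    """Toy scheduler. jobs maps name -> (threads requested, names of jobs it depends on).

    Assumes every job takes one time unit; returns the jobs started in each wave.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    done: set[str] = set()
    waves: list[list[str]] = []
    while len(done) < len(jobs):
        free = cores
        wave: list[str] = []
        for name, (threads, deps) in jobs.items():
            if name in done or not deps <= done:  # finished, or dependencies not ready
                continue
            need = min(threads, cores)  # cap oversized requests at the core limit
            if need <= free:
                wave.append(name)
                free -= need
        if not wave:
            raise ValueError("unsatisfiable dependencies (cycle or unknown job)")
        done.update(wave)
        waves.append(wave)
    return waves


# Hypothetical jobs: (threads, dependencies).
jobs = {
    "index_genome": (4, set()),
    "fastqc": (1, set()),
    "align": (4, {"index_genome"}),
    "multiqc": (1, {"fastqc", "align"}),
}
print(schedule_waves(jobs, cores=8))  # [['index_genome', 'fastqc'], ['align'], ['multiqc']]
```

With `cores=1` the same jobs run strictly one after another, in four waves. This is the sense in which the same workflow definition scales from a laptop to a larger server without changes.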
Why Snakemake
From where and how you obtain the data for your analysis to where and how you handle the outputs, workflow managers can help you achieve better scientific reproducibility and scalability. Once you learn to use Snakemake (or a similar workflow management tool) properly, keeping track of and sharing your work becomes second nature. This saves you time whenever you need to re-run all or part of an analysis. It also reduces the errors that inevitably creep in whenever a step is done by hand (part of the human condition of doing computational science and not being a bot!).