Welcome to Compi

Application framework for portable computational pipelines

What is Compi?


Compi is an extremely simple application framework for portable computational pipelines. A computational pipeline can be seen as a set of processing steps that run one after another (or occasionally in parallel, if they are independent).

There are many fields where computational pipelines constitute the main architecture of applications, such as big data analysis or bioinformatics.

Many pipelines combine third-party tools with custom-made processes to form the final pipeline. Compi is the framework that helps you create the final, portable application.

Features


Language agnostic

Compi pipelines are defined in XML, and each task is run by an external program written in any programming language. If your pipeline is a mere combination of existing tools, you do not have to program at all! Define the steps of the pipeline and its parameters, and that's it!

Portable

Thanks to Docker, pipelines can be packaged in a Docker image along with their dependencies, making them really portable! You only have to complete the Dockerfile we provide with the dependencies your pipeline needs. You can also run your pipeline locally without Docker.

User interface generation

If you define your pipeline with Compi, a command-line interface (CLI) is provided for your users to run it. Compi is thus an application framework in charge of user interaction, multithreaded pipeline execution and logging, which saves you time on these aspects so you can focus on what is really specific to your application.

Parallel execution

Compi pipelines run independent tasks in parallel, and you don't have to worry about managing parallel execution: Compi does it for you! You can also restart your pipeline from any step, without repeating steps that already completed in previous runs.
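The scheduling idea can be illustrated with a minimal Python sketch. It is not Compi's implementation: the run_pipeline helper, the task names, and the done argument used to skip tasks completed in an earlier run are all hypothetical, chosen only to show how independent tasks can run concurrently once their dependencies have finished.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(deps, actions, threads=2, done=()):
    """Run tasks in dependency order, in parallel where possible.

    deps:    {task_id: set of task_ids that must finish first}
    actions: {task_id: zero-argument callable}
    done:    task ids completed in a previous run (they are skipped)
    Returns the task ids in the order they finished in this run.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    finished = []
    running = {}  # future -> task id
    with ThreadPoolExecutor(max_workers=threads) as pool:
        while sorter.is_active():
            # Every task returned here has all its dependencies satisfied.
            for task in sorter.get_ready():
                if task in done:
                    sorter.done(task)  # already completed earlier: skip it
                else:
                    running[pool.submit(actions[task])] = task
            if not running:
                continue  # only skipped tasks so far; look for newly ready ones
            completed, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in completed:
                future.result()  # re-raise any error from the task
                task = running.pop(future)
                finished.append(task)
                sorter.done(task)
    return finished


# Hypothetical pipeline: "b" and "c" both depend on "a"; "d" needs both.
example_deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
example_actions = {t: (lambda t=t: print("running", t)) for t in example_deps}
run_pipeline(example_deps, example_actions, threads=2)
```

In this sketch "b" and "c" may run at the same time and finish in either order, while "a" always runs first and "d" last. Passing done={"a"} simulates restarting the pipeline after "a" has already completed.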

Basics


Pipeline Specification (XML)

Pipelines are specified in an XML file. The main purpose of this file is to define which atomic tasks your pipeline has, their dependencies (those tasks that need to be run before each task), and the parameters the user can specify and that will be used inside the tasks.

This file contains:

  • <task> elements, which define your pipeline steps. Their dependencies are declared with the after attribute, and the code to run when the task starts goes inside the element.
  • <param> elements, which declare and describe the parameters of your pipeline. Inside any task you can reference a parameter as ${parameter_name}.

<?xml version="1.0" encoding="UTF-8"?>
<pipeline xmlns="http://www.esei.uvigo.es/compi-pipeline"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <params>
        <param name="name" shortName="n">Your name</param>
        <param name="output" shortName="o">Output file</param>
    </params>
    <tasks>
        <task id="greetings" params="name output">
            echo "Hi ${name}" > ${output}
        </task>
        <task id="bye" after="greetings" params="name output" 
          interpreter="/usr/bin/perl -e &quot;${task_code}&quot;">
            my $filename = $ENV{'output'};
            open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
            print $fh "bye ".$ENV{'name'}."\n"
        </task>
    </tasks>
</pipeline>
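
To make the structure of the file concrete, the following Python sketch reads a pipeline definition like the one above with the standard library and lists the declared parameters and, for each task, its dependencies and parameters. It is an illustration only and not part of Compi. The describe_pipeline helper is hypothetical, the embedded XML is a shortened copy of the example, and the sketch assumes that the after and params attributes hold lists separated by whitespace or commas (the example above only shows a whitespace-separated params list).

```python
import re
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the example pipeline.
NS = {"c": "http://www.esei.uvigo.es/compi-pipeline"}


def split_list(value):
    """Split an attribute value on commas/whitespace (assumed list format)."""
    return [item for item in re.split(r"[,\s]+", value or "") if item]


def describe_pipeline(xml_text):
    """Return (params, tasks) extracted from a pipeline definition."""
    root = ET.fromstring(xml_text)
    params = {
        p.get("name"): (p.text or "").strip()
        for p in root.findall("c:params/c:param", NS)
    }
    tasks = {
        t.get("id"): {
            "after": split_list(t.get("after")),
            "params": split_list(t.get("params")),
        }
        for t in root.findall("c:tasks/c:task", NS)
    }
    return params, tasks


# Shortened copy of the example pipeline (task bodies simplified).
EXAMPLE = """
<pipeline xmlns="http://www.esei.uvigo.es/compi-pipeline">
    <params>
        <param name="name" shortName="n">Your name</param>
        <param name="output" shortName="o">Output file</param>
    </params>
    <tasks>
        <task id="greetings" params="name output">echo "Hi"</task>
        <task id="bye" after="greetings" params="name output">echo "bye"</task>
    </tasks>
</pipeline>
"""

params, tasks = describe_pipeline(EXAMPLE)
for task_id, info in tasks.items():
    print(task_id, "runs after", info["after"] or "nothing")
```

The resulting tasks mapping is exactly the dependency information a scheduler needs: here "bye" can only start once "greetings" has finished.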
                    

Hello World!

Creating a pipeline is very easy. You need the Compi Development Kit (the compi-dk command). The steps below show how to install compi-dk on your system, create a pipeline project and build the Docker image.

1. Download and install compi-dk

wget http://static.sing-group.org/compi/downloads/compi-dk-1.1-installer.bsx
chmod +x compi-dk-1.1-installer.bsx
sudo ./compi-dk-1.1-installer.bsx

2. Create a new pipeline project

compi-dk new-project -p ~/my-pipeline -n my-pipeline
cd ~/my-pipeline

At this point you can edit your pipeline.xml and the Dockerfile to include dependencies.

3. Build the Docker image

compi-dk build

Now you have a new Docker image named "my-pipeline".

4. Take a look at the pipeline's CLI!

docker run my-pipeline

Since you did not provide any pipeline parameters, this brings up the help, which lists all parameters. This is the CLI of your pipeline!

5. Run it!

docker run -v /tmp:/data my-pipeline -t 10 -- -p1 param-one -p2 param-two -o /data/output.txt -l one,two,three
cat /tmp/output.txt

Note that pipeline parameters are passed after '--'. The -t option sets the number of threads.
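
The '--' convention can be sketched in a few lines of Python. This is not Compi's argument parser: split_arguments is a hypothetical helper that only shows how everything before the first '--' can be treated as options for the runner and everything after it as pipeline parameters.

```python
def split_arguments(argv):
    """Split a command line at the first '--'.

    Returns (runner_options, pipeline_parameters). If there is no '--',
    every argument is treated as a runner option.
    """
    if "--" in argv:
        index = argv.index("--")
        return argv[:index], argv[index + 1:]
    return list(argv), []


# Arguments taken from the docker run command above.
runner, pipeline = split_arguments(
    ["-t", "10", "--", "-p1", "param-one", "-o", "/data/output.txt"]
)
print("runner options:", runner)
print("pipeline parameters:", pipeline)
```

Keeping the two groups apart means a pipeline parameter can never be mistaken for a runner option, even if both happen to use the same short name.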

Downloads


Date | Description | Version | Link
July 3, 2018 | Compi Development Kit - Self-extracted installer (Linux 64-bit) | 1.1 | compi-dk-1.1-installer.bsx