Mass-Up Manual
This manual provides complete help for all the Mass-Up functions and is organized into four sections: (i) the load menu, (ii) the preprocess menu, (iii) the analysis menu, and (iv) the main Mass-Up datatypes.
The load menu contains the following operations (click on each one to navigate to its help):
- Import files.
- Import configuration file.
- Load raw data.
- Load peak list.
- Load matched peak list.
- Load discriminant peak list.
- Load hierarchical clustering results.
- Load biclustering results.
- Save data.
- Interoperability.
The preprocess menu contains the following operations:
- Preprocess data.
- Peak Matching.
The analysis menu contains the following operations:
- Quality Control.
- Intra-label biomarker discovery.
- Inter-label biomarker discovery.
- Principal Component Analysis (PCA).
- Hierarchical Clustering.
- Biclustering analysis.
- Classification Analysis.
Finally, these are the most relevant datatypes that you will manage in Mass-Up, using the previous operations.
Import Files 
This operation loads several files, allowing the user to design the experiment, that is, setting the number of samples, the labels and assigning files to each sample.
This operation follows the workflow shown in the image below.
Import Files Workflow
Usage
You can execute this operation by clicking the button or following the menu File/Import/Import dataset.
First, a dialog will appear allowing you to choose the experiment type (Labeled or Unlabeled).
Import Dataset File Dialog - Step 1
If you choose Labeled, the next dialog will allow you to choose the number of labels.
Import Dataset Dialog - Step 2 (Labeled) - Number of labels
If you choose Unlabeled, the next dialog will allow you to choose the number of samples.
Import Dataset Dialog - Step 2 (Unlabeled) - Number of samples
If you have chosen Labeled, a third dialog will allow you to enter the names of the labels and the number of samples per label.
Import Dataset Dialog - Step 3 (Labeled) - Label names and number of samples
Finally, according to your choices, a dialog will appear allowing you to design your experiment.
In order to design an experiment, you have to:
- Select the data type: raw spectra, peak list or aligned peak list.
- Add files to your experiment by clicking the button.
- Assign files to samples. To do that, you have two alternatives:
  - Select one or more files and drag and drop them into the corresponding sample.
  - Select one or more files and use the button in order to automatically distribute them to the samples shown on the right side of the dialog. See below for a detailed description of this feature.
- Assign names to samples. When the autofill button is used, samples are also named automatically by taking the name of the first file.
When the experiment design is complete, you can load the data by clicking the Load button. This option also allows you to store the experiment design configuration as a Mass-Up Configuration File (.muc) for later use. The format of this .muc file is explained in the Mass-Up Configuration File section.
The autofill button
The button allows you to automatically distribute the selected files to the samples shown on the right side of the dialog. This feature is particularly useful when you need to import a large number of files and distribute them into different classes and samples, since it saves a lot of time.
To illustrate this feature, let's suppose a dataset with:
- Two conditions: A and B.
- Five samples per class.
- Fifty raw data files in .mzML format, shown in the image below.

Ten peak list files added to the dialog.
First, you have to select the twenty-five files corresponding to class A and click the button in order to automatically fill the five samples of this class. The autofill function assumes that each sample must have the same number of files so, in this case, five files are assigned to each sample. As you can see in the image, the names of the samples are automatically taken from the associated files by removing the file extensions.

Five peak list files automatically distributed to class A samples.
Finally, you have to select the twenty-five files corresponding to class B and click the button again in order to automatically fill the five samples of this class.
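The distribution rule described above can be sketched in a few lines of Python. This is only an illustration, not Mass-Up's actual implementation: the function name `autofill` and the file names are hypothetical, and assigning consecutive files (in selection order) to the same sample is an assumption. The even split and the naming rule (first file name without its extension) are taken from the description above.

```python
from pathlib import PurePath


def autofill(files, sample_count):
    """Split `files` evenly, in order, into `sample_count` samples.

    Returns a list of (sample_name, sample_files) pairs. Each sample is named
    after its first file, without the file extension.
    Assumption: consecutive files in the selection belong to the same sample.
    """
    if not files or sample_count <= 0 or len(files) % sample_count != 0:
        raise ValueError("files cannot be split evenly among the samples")
    per_sample = len(files) // sample_count
    samples = []
    for i in range(sample_count):
        chunk = files[i * per_sample:(i + 1) * per_sample]
        samples.append((PurePath(chunk[0]).stem, chunk))
    return samples


# Hypothetical file names: 5 samples x 5 replicates for class A.
class_a_files = [f"A{s}_rep{r}.mzML" for s in range(1, 6) for r in range(1, 6)]
class_a_samples = autofill(class_a_files, 5)
```

With twenty-five selected files and five samples, each sample receives five files, as in the example above.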
Import Configuration File 
This operation loads a Mass-Up Configuration File (.muc) containing a distribution of files into samples, as saved by the Import Files operation.
Usage
You can execute this operation by clicking the button or following the menu File/Import/Import from configuration File.
A dialog will appear allowing you to choose *.muc files.
Import Configuration File Dialog
This operation reads the saved configuration and loads all the data.
Load Raw Data 
This operation loads several raw files (i.e., .mzXML, .mzML or .csv) into one or several Raw Data elements.
Usage
You can execute this operation by clicking the button or following the menu File/Load Data/Raw Data.
A dialog will appear allowing you to choose the experiment type (Labeled or Unlabeled).
Load Raw Data Dialog
Labeled experiment
If you choose Labeled, the next dialog will allow you to choose the folders containing the samples of your experiment.
Load Raw Data Dialog - Labeled
It is important to note that you must add one folder per label in your experiment. Each label folder, in turn, must contain one folder per sample. Finally, the sample folders contain the raw files.
For example, imagine that you have two labels: CONDITION-A and CONDITION-B. For each label you have three samples and for each sample you have three raw spectra. You must have the following organization in your file system:
    /CONDITION-A
        /SAMPLE-A.1
            /SAMPLE-A.1.1.mzXML
            /SAMPLE-A.1.2.mzXML
            /SAMPLE-A.1.3.mzXML
        /SAMPLE-A.2
            /SAMPLE-A.2.1.mzXML
            /SAMPLE-A.2.2.mzXML
            /SAMPLE-A.2.3.mzXML
        /SAMPLE-A.3
            /SAMPLE-A.3.1.mzXML
            /SAMPLE-A.3.2.mzXML
            /SAMPLE-A.3.3.mzXML
    /CONDITION-B
        /SAMPLE-B.1
            /SAMPLE-B.1.1.mzXML
            /SAMPLE-B.1.2.mzXML
            /SAMPLE-B.1.3.mzXML
        /SAMPLE-B.2
            /SAMPLE-B.2.1.mzXML
            /SAMPLE-B.2.2.mzXML
            /SAMPLE-B.2.3.mzXML
        /SAMPLE-B.3
            /SAMPLE-B.3.1.mzXML
            /SAMPLE-B.3.2.mzXML
            /SAMPLE-B.3.3.mzXML
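Before loading a labeled experiment, it can be helpful to check that the folders follow this label/sample/file organization. The following Python sketch is not part of Mass-Up: the function name `describe_labeled_layout` is hypothetical, and the list of accepted extensions simply mirrors the raw formats mentioned above. It lists the samples and files found under each label folder, using a temporary directory as demonstration data.

```python
import tempfile
from pathlib import Path

RAW_EXTENSIONS = (".mzxml", ".mzml", ".csv")


def describe_labeled_layout(label_dirs, extensions=RAW_EXTENSIONS):
    """Return {label: {sample: [file names]}} for the given label folders."""
    layout = {}
    for label_dir in map(Path, label_dirs):
        samples = {}
        # Each sub-folder of a label folder is treated as one sample.
        for sample_dir in sorted(p for p in label_dir.iterdir() if p.is_dir()):
            samples[sample_dir.name] = sorted(
                f.name for f in sample_dir.iterdir()
                if f.is_file() and f.suffix.lower() in extensions
            )
        layout[label_dir.name] = samples
    return layout


# Demonstration with a temporary copy of part of the structure shown above.
with tempfile.TemporaryDirectory() as tmp:
    label_dir = Path(tmp) / "CONDITION-A"
    for s in (1, 2, 3):
        sample_dir = label_dir / f"SAMPLE-A.{s}"
        sample_dir.mkdir(parents=True)
        for r in (1, 2, 3):
            (sample_dir / f"SAMPLE-A.{s}.{r}.mzXML").touch()
    layout = describe_labeled_layout([label_dir])
```

A sample with no files, or a label folder with no sample folders, shows up as an empty entry in the returned dictionary.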
Unlabeled experiment
If you choose Unlabeled, the next dialog will allow you to choose the folder containing the samples of your experiment.
Load Raw Data Dialog - Unlabeled
It is important to note that you must add one folder that contains all the samples in your experiment. The sample folders contain the raw files.
For example, imagine that you have six samples and for each sample you have three raw spectra. You must have the following organization in your file system:
    /SAMPLE-1
        /SAMPLE-1.1.mzXML
        /SAMPLE-1.2.mzXML
        /SAMPLE-1.3.mzXML
    /SAMPLE-2
        /SAMPLE-2.1.mzXML
        /SAMPLE-2.2.mzXML
        /SAMPLE-2.3.mzXML
    /SAMPLE-3
        /SAMPLE-3.1.mzXML
        /SAMPLE-3.2.mzXML
        /SAMPLE-3.3.mzXML
    /SAMPLE-4
        /SAMPLE-4.1.mzXML
        /SAMPLE-4.2.mzXML
        /SAMPLE-4.3.mzXML
    /SAMPLE-5
        /SAMPLE-5.1.mzXML
        /SAMPLE-5.2.mzXML
        /SAMPLE-5.3.mzXML
    /SAMPLE-6
        /SAMPLE-6.1.mzXML
        /SAMPLE-6.2.mzXML
        /SAMPLE-6.3.mzXML
Load Peak List 
This operation loads several peak list files (.csv) into one or several Peak List elements.
Usage
You can execute this operation by clicking the button or following the menu File/Load Data/Peak List.
A dialog will appear allowing you to choose the experiment type (Labeled or Unlabeled).
Load Peak List Dialog
Labeled experiment
If you choose Labeled, the next dialog will allow you to choose the folders containing the samples of your experiment.
Load Peak List Dialog - Labeled
It is important to note that you must add one folder per label in your experiment. Each label folder, in turn, must contain one folder per sample. Finally, the sample folders contain the peak list files.
For example, imagine that you have two labels: CONDITION-A and CONDITION-B. For each label you have three samples and for each sample you have three peak lists. You must have the following organization in your file system:
    /CONDITION-A
        /SAMPLE-A.1
            /SAMPLE-A.1.1.csv
            /SAMPLE-A.1.2.csv
            /SAMPLE-A.1.3.csv
        /SAMPLE-A.2
            /SAMPLE-A.2.1.csv
            /SAMPLE-A.2.2.csv
            /SAMPLE-A.2.3.csv
        /SAMPLE-A.3
            /SAMPLE-A.3.1.csv
            /SAMPLE-A.3.2.csv
            /SAMPLE-A.3.3.csv
    /CONDITION-B
        /SAMPLE-B.1
            /SAMPLE-B.1.1.csv
            /SAMPLE-B.1.2.csv
            /SAMPLE-B.1.3.csv
        /SAMPLE-B.2
            /SAMPLE-B.2.1.csv
            /SAMPLE-B.2.2.csv
            /SAMPLE-B.2.3.csv
        /SAMPLE-B.3
            /SAMPLE-B.3.1.csv
            /SAMPLE-B.3.2.csv
            /SAMPLE-B.3.3.csv
Unlabeled experiment
If you choose Unlabeled, the next dialog will allow you to choose the folder containing the samples of your experiment.
Load Peak List Dialog - Unlabeled
It is important to note that you must add one folder that contains all the samples in your experiment. The sample folders contain the peak list files.
For example, imagine that you have six samples and for each sample you have three peak lists. You must have the following organization in your file system:
    /SAMPLE-1
        /SAMPLE-1.1.csv
        /SAMPLE-1.2.csv
        /SAMPLE-1.3.csv
    /SAMPLE-2
        /SAMPLE-2.1.csv
        /SAMPLE-2.2.csv
        /SAMPLE-2.3.csv
    /SAMPLE-3
        /SAMPLE-3.1.csv
        /SAMPLE-3.2.csv
        /SAMPLE-3.3.csv
    /SAMPLE-4
        /SAMPLE-4.1.csv
        /SAMPLE-4.2.csv
        /SAMPLE-4.3.csv
    /SAMPLE-5
        /SAMPLE-5.1.csv
        /SAMPLE-5.2.csv
        /SAMPLE-5.3.csv
    /SAMPLE-6
        /SAMPLE-6.1.csv
        /SAMPLE-6.2.csv
        /SAMPLE-6.3.csv
Load Matched Peak List 
This operation loads several matched peak list files (.csv) into one or several Unlabeled Matched Peak List elements or into a Labeled Matched Peak List Set, depending on the experiment type.
Usage
You can execute this operation by clicking the button or following the menu File/Load Data/Matched Peak List.
A dialog will appear allowing you to choose the experiment type (Labeled or Unlabeled).
Load Matched Peak List Dialog
Labeled experiment
If you choose Labeled, the next dialog will allow you to choose the folders containing the samples of your experiment.
Load Matched Peak List Dialog - Labeled
The operation loads one Labeled Matched Peak List Set containing one Labeled Peak List element for each label.
It is important to note that you must add one folder per label in your experiment. Each label folder, in turn, must contain one folder per sample. Finally, the sample folders contain the peak list files.
For example, imagine that you have two labels: CONDITION-A and CONDITION-B. For each label you have three samples and for each sample you have three peak lists. You must have the following organization in your file system:
    /CONDITION-A
        /SAMPLE-A.1
            /SAMPLE-A.1.1.csv
            /SAMPLE-A.1.2.csv
            /SAMPLE-A.1.3.csv
        /SAMPLE-A.2
            /SAMPLE-A.2.1.csv
            /SAMPLE-A.2.2.csv
            /SAMPLE-A.2.3.csv
        /SAMPLE-A.3
            /SAMPLE-A.3.1.csv
            /SAMPLE-A.3.2.csv
            /SAMPLE-A.3.3.csv
    /CONDITION-B
        /SAMPLE-B.1
            /SAMPLE-B.1.1.csv
            /SAMPLE-B.1.2.csv
            /SAMPLE-B.1.3.csv
        /SAMPLE-B.2
            /SAMPLE-B.2.1.csv
            /SAMPLE-B.2.2.csv
            /SAMPLE-B.2.3.csv
        /SAMPLE-B.3
            /SAMPLE-B.3.1.csv
            /SAMPLE-B.3.2.csv
            /SAMPLE-B.3.3.csv
Unlabeled experiment
If you choose Unlabeled, the next dialog will allow you to choose the folder containing the samples of your experiment.
Load Matched Peak List Dialog - Unlabeled
It is important to note that you must add one folder that contains all the samples in your experiment. The sample folders contain the peak list files.
For example, imagine that you have six samples and for each sample you have three peak lists. You must have the following organization in your file system:
    /SAMPLE-1
        /SAMPLE-1.1.csv
        /SAMPLE-1.2.csv
        /SAMPLE-1.3.csv
    /SAMPLE-2
        /SAMPLE-2.1.csv
        /SAMPLE-2.2.csv
        /SAMPLE-2.3.csv
    /SAMPLE-3
        /SAMPLE-3.1.csv
        /SAMPLE-3.2.csv
        /SAMPLE-3.3.csv
    /SAMPLE-4
        /SAMPLE-4.1.csv
        /SAMPLE-4.2.csv
        /SAMPLE-4.3.csv
    /SAMPLE-5
        /SAMPLE-5.1.csv
        /SAMPLE-5.2.csv
        /SAMPLE-5.3.csv
    /SAMPLE-6
        /SAMPLE-6.1.csv
        /SAMPLE-6.2.csv
        /SAMPLE-6.3.csv
Load Discriminant Peak List
This operation loads a discriminant peak list file (.csv) into one Discriminant Peak List element.
Usage
You can execute this operation by clicking the button or following the menu File/Load Data/Discriminant Peak List.
A dialog will appear allowing you to choose the file that contains the discriminant peak list.
Load Discriminant Peak List Dialog
Load Hierarchical Clustering 
This operation loads the results of the Hierarchical Clustering operation.
Usage
You can execute this operation by following the menu File/Load Analysis/Clustering.
A dialog will appear allowing you to choose the folder containing the results of the Hierarchical Clustering operation.
Load Clustering Results dialog
Load Biclustering 
This operation loads the results of the Biclustering operation.
Usage
You can execute this operation by following the menu File/Load Analysis/Biclustering.
A dialog will appear allowing you to choose the folder containing the results of the Biclustering operation.
Load Biclustering Results dialog
Save Data 
This operation saves data loaded in Mass-Up into .csv files.
Usage
You can execute this operation by clicking the button or following the menu File/Save Data.
A dialog will appear allowing you to choose the data that you want to save.
Save Data dialog
The first combobox allows you to select the type of data (Labeled Raw Data, Labeled Peak List, Labeled Aligned Peak List, Unlabeled Raw Data, Unlabeled Peak List or Unlabeled Aligned Peak List) that is shown in the second combobox.
Note that you can only save data of a single type in one operation.
Interoperability
Import data saved with Mass-Up in external applications
The Save Data operation allows you to save your data into .csv files. This may be interesting if you have preprocessed a raw dataset and want to store the preprocessed data for further analysis with Mass-Up or other applications, such as R.
R
If you want to load .csv spectra files with R, the easiest way is to use the MALDIquant and MALDIquantForeign packages, which allow you to import spectra from .csv files with the import function.
Import a single .csv spectra file with MALDIquant
Consider that you have a file called spectra.csv with the following content:
    Mass,Intensity
    72.38649,4.7928915
    92.86101,11.554423
    103.110954,23.025375
    115.28742,8.338575
    135.57188,76.37024
    137.58994,57.889793
You can import this spectrum into a list by running the following R commands:
    library("MALDIquantForeign")
    spectra <- import("spectra.csv")
Note that spectra is a list, so if you type spectra[[1]] in the R console, you will see the loaded data:
    > spectra[[1]]
    S4 class type            : MassSpectrum
    Number of m/z values     : 6
    Range of m/z values      : 72.386 - 137.59
    Range of intensity values: 4.793e+00 - 7.637e+01
    Memory usage             : 1.523 KiB
    File                     : /tmp/spectra.csv
Import a dataset with MALDIquant
Consider that you have saved your preprocessed dataset into a directory called dataset, which has three condition sub-directories called:
- HEALTHY: which has two samples (HA and HB) with five replicates each.
- MYELOMA: which has five samples (MA, MB, MC, MD and ME) with five replicates each one.
- LYMPHOMA: which has five samples (LA, LB, LC, LD and LE) with five replicates each one.
If you want to load all the spectra into a list, you just have to run the following R commands:
    library("MALDIquantForeign")
    spectra <- import("dataset")
With this command, all the spectra are loaded into a flat list, so you have to process this list in order to extract the spectra of the sample or condition that you want. Let's suppose that you want to create a separate list for each sample. The first thing you can do is get the sample names by reading the directory names:
    sampleNames <- list.dirs(path="dataset", recursive=TRUE)
    # Keep only the sample directories (skip the root and the condition directories)
    sampleNames <- sampleNames[c(3:4,6:10,12:16)]
    # Remove the leading path so that only the sample name remains
    sampleNames <- basename(sampleNames)
Now, in sampleNames you have a vector with the names of your samples:
    > sampleNames
    [1] "HA" "HB" "LA" "LB" "LC" "LD" "LE" "MA" "MB" "MC" "MD" "ME"
Since all samples have the same number of replicates, it is easy to retrieve them with the following code snippet: if you want to get a list with the spectra of the ith sample, you just have to set the ith variable:
    ith <- 1 # A value between 1 and length(sampleNames)
    # Index of the first of the five replicates of the ith sample
    spectraIndex <- (ith-1)*5 + 1
    sample.name <- sampleNames[ith]
    sample.spectra <- spectra[spectraIndex:(spectraIndex+4)]
Now you have the information of the first sample stored in sample.name and sample.spectra:
    > sample.name
    [1] "HA"
    > sample.spectra
    [[1]]
    S4 class type            : MassSpectrum
    Number of m/z values     : 155
    Range of m/z values      : 656.165 - 3349.394
    Range of intensity values: 3e-03 - 1e+00
    Memory usage             : 3.938 KiB
    File                     : /tmp/dataset/HEALTHY/HA/spectrum1.csv

    [[2]]
    S4 class type            : MassSpectrum
    Number of m/z values     : 144
    Range of m/z values      : 656.152 - 3349.637
    Range of intensity values: 5e-03 - 1e+00
    Memory usage             : 3.766 KiB
    File                     : /tmp/dataset/HEALTHY/HA/spectrum2.csv

    [[3]]
    S4 class type            : MassSpectrum
    Number of m/z values     : 116
    Range of m/z values      : 656.173 - 3348.615
    Range of intensity values: 2e-03 - 1e+00
    Memory usage             : 3.328 KiB
    File                     : /tmp/dataset/HEALTHY/HA/spectrum3.csv

    [[4]]
    S4 class type            : MassSpectrum
    Number of m/z values     : 139
    Range of m/z values      : 656.162 - 3349.348
    Range of intensity values: 4e-03 - 1e+00
    Memory usage             : 3.688 KiB
    File                     : /tmp/dataset/HEALTHY/HA/spectrum4.csv

    [[5]]
    S4 class type            : MassSpectrum
    Number of m/z values     : 118
    Range of m/z values      : 656.177 - 3349.325
    Range of intensity values: 7e-03 - 1e+00
    Memory usage             : 3.359 KiB
    File                     : /tmp/dataset/HEALTHY/HA/spectrum5.csv
Mass-Up Configuration File
Here we describe in detail the Mass-Up Configuration File so that you can create your own .muc files to import datasets into Mass-Up.
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Dataset definition indicating whether it is labeled or not and the data type: RAW Spectra, Peak lists or Aligned peak lists -->
    <massupdatasetloader labeled="true" type="RAW Spectra">
        <!-- Absolute paths to data files -->
        <files>
            <file>D:\Mass-Up-Data\sample1_replicate1.csv</file>
            <file>D:\Mass-Up-Data\sample1_replicate2.csv</file>
            <file>D:\Mass-Up-Data\sample1_replicate3.csv</file>
            <file>D:\Mass-Up-Data\sample2_replicate1.csv</file>
            <file>D:\Mass-Up-Data\sample2_replicate2.csv</file>
            <file>D:\Mass-Up-Data\sample2_replicate3.csv</file>
            <file>D:\Mass-Up-Data\sample3_replicate1.csv</file>
            <file>D:\Mass-Up-Data\sample3_replicate2.csv</file>
            <file>D:\Mass-Up-Data\sample3_replicate3.csv</file>
            <file>D:\Mass-Up-Data\sample4_replicate1.csv</file>
            <file>D:\Mass-Up-Data\sample4_replicate2.csv</file>
            <file>D:\Mass-Up-Data\sample4_replicate3.csv</file>
        </files>
        <!-- Names of samples in dataset -->
        <samplenames>
            <samplename>sample1</samplename>
            <samplename>sample2</samplename>
            <samplename>sample3</samplename>
            <samplename>sample4</samplename>
        </samplenames>
        <!-- Names of classes (labels) in dataset -->
        <classes>
            <class>A</class>
            <class>B</class>
        </classes>
        <!-- Mappings of files to samples -->
        <filesamplemappings>
            <mapping>0</mapping>
            <mapping>0</mapping>
            <mapping>0</mapping>
            <mapping>1</mapping>
            <mapping>1</mapping>
            <mapping>1</mapping>
            <mapping>2</mapping>
            <mapping>2</mapping>
            <mapping>2</mapping>
            <mapping>3</mapping>
            <mapping>3</mapping>
            <mapping>3</mapping>
        </filesamplemappings>
        <!-- Mappings of samples to classes -->
        <sampleclassmappings>
            <mapping>0</mapping>
            <mapping>0</mapping>
            <mapping>1</mapping>
            <mapping>1</mapping>
        </sampleclassmappings>
    </massupdatasetloader>
The file above corresponds to a .muc file with:
- <files>: twelve RAW spectra files in .csv format.
- <samplenames>: four samples (sample1, sample2, sample3 and sample4).
- <classes>: two classes (A, B).
Each file must be mapped to a sample and each sample must be mapped to a class, using <filesamplemappings> and <sampleclassmappings> respectively. The following image illustrates this mapping.
Mass-Up Configuration File
Labeled and unlabeled experiments
The file format defined above corresponds to a labeled experiment. If your experiment is unlabeled and there are no classes, just omit the <classes> and <sampleclassmappings> blocks.
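For large datasets it may be easier to generate the .muc file with a script than to write it by hand. The following Python sketch builds the structure described above with the standard xml.etree.ElementTree module. The function name `build_muc` and its parameters are hypothetical; the element and attribute names are the ones from the example file. Writing labeled="false" for unlabeled experiments is an assumption, since the manual only states that the class blocks are omitted in that case.

```python
import xml.etree.ElementTree as ET


def build_muc(samples, sample_classes=None, data_type="RAW Spectra"):
    """Build a <massupdatasetloader> element.

    samples: dict mapping sample name -> list of absolute file paths.
    sample_classes: dict mapping sample name -> class label, or None for an
    unlabeled experiment (assumption: labeled="false" is written then).
    """
    labeled = sample_classes is not None
    root = ET.Element(
        "massupdatasetloader",
        {"labeled": "true" if labeled else "false", "type": data_type},
    )
    files_el = ET.SubElement(root, "files")
    names_el = ET.SubElement(root, "samplenames")
    file_to_sample = []  # zero-based sample index of each file, in file order
    for sample_index, (name, paths) in enumerate(samples.items()):
        ET.SubElement(names_el, "samplename").text = name
        for path in paths:
            ET.SubElement(files_el, "file").text = path
            file_to_sample.append(sample_index)

    classes = []
    if labeled:
        # Unique class labels in order of first appearance.
        classes = list(dict.fromkeys(sample_classes[name] for name in samples))
        classes_el = ET.SubElement(root, "classes")
        for label in classes:
            ET.SubElement(classes_el, "class").text = label

    mappings_el = ET.SubElement(root, "filesamplemappings")
    for sample_index in file_to_sample:
        ET.SubElement(mappings_el, "mapping").text = str(sample_index)

    if labeled:
        class_map_el = ET.SubElement(root, "sampleclassmappings")
        for name in samples:
            class_index = classes.index(sample_classes[name])
            ET.SubElement(class_map_el, "mapping").text = str(class_index)
    return root


# Rebuild the example above: four samples with three replicates each.
samples = {
    f"sample{s}": [rf"D:\Mass-Up-Data\sample{s}_replicate{r}.csv" for r in (1, 2, 3)]
    for s in (1, 2, 3, 4)
}
sample_classes = {"sample1": "A", "sample2": "A", "sample3": "B", "sample4": "B"}
muc = build_muc(samples, sample_classes)
muc_text = ET.tostring(muc, encoding="unicode")
```

To save the result, `ET.ElementTree(muc).write("dataset.muc", encoding="UTF-8", xml_declaration=True)` writes the element together with the XML declaration.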
Processing large datasets
Some experiments can involve hundreds to thousands of MALDI-TOF MS spectra. Whether Mass-Up is able to handle a dataset all at once depends on both the number of spectra and the size (i.e., number of peaks) of each one.
If you are experiencing problems processing large datasets, here is some general advice to deal with them:
- The memory used by Mass-Up can be increased by editing the run.bat or run.sh file found in the installation directory. In this file you can change the value of the MEMORY parameter. By default, this parameter is set to -Xmx2G, which means that Mass-Up will use up to 2 gigabytes of RAM. To process large datasets, this amount can be increased up to a value close to the computer's available RAM (for example, if you have 8 GB of RAM, you can set this parameter to -Xmx6G or -Xmx8G).
- Do not keep spectra loaded in Mass-Up that you don't need. If you preprocess your dataset with Mass-Up, don't keep the raw spectra in the Mass-Up clipboard unless you need them, because raw spectra use much more memory than processed spectra. The preprocessing operations remove the raw spectra by default so, unless you change this option, you don't have to worry.
- If Mass-Up can't load your raw dataset all at once, you can preprocess your spectra in batches. A raw dataset requires much more memory than a peak list. In addition, the preprocessing operations that precede peak matching can be done independently for each sample. Therefore, you can load only one batch of your samples, preprocess them and then store them. Once you have preprocessed all your dataset batches, you can load the full preprocessed dataset (i.e., peak lists) and perform the peak matching.
Preprocess data 
This operation preprocesses one or more Raw Data elements, applying the selected methods to each spectrum. If you apply peak detection, this operation returns one or more Peak List elements. Otherwise, it returns one or more Raw Data elements.
Usage
You can execute this operation by clicking the button or following the menu Preprocess/Preprocess data.
A dialog will appear allowing you to select the following options:
- Raw Data: previously loaded Raw Data.
- Intensity transformation (scaling) method: None (do not apply), Logarithmic, Logarithmic with base 10, Logarithmic with base 2 or Square root.
- Smoothing method: None (do not apply), moving average window (MALDIquant) or Savitzky-Golay (MALDIquant).
- Baseline correction method: None (do not apply), Top Hat, Snip, Median or Convex Hull (all from MALDIquant).
- Standardization method: None (do not apply), total ion current (TIC), Probabilistic Quotient Normalization (PQN) or median (all from MALDIquant).
- Peak detection method: None (do not apply), MassSpecWavelet or MALDIquant.
  - If you choose "None", no peak detection will be performed. However, you can choose whether or not you want to convert the input data into a peak list (which can be useful if peak detection has already been applied to the input raw data).
  - If you choose "MassSpecWavelet", you also have to set:
    - Signal to noise ratio: SNR threshold used to identify peaks. Default is 6.
    - Peak scale range: the CWT scale range of the peak. Default is 2.
    - Amplitude threshold: the minimum peak amplitude. Default is 0.0001.
  - If you choose "MALDIquant", you also have to set:
    - Signal to noise ratio: SNR threshold used to identify peaks. Default is 3.
    - Half window size: the resulting window reaches from mass[currentIndex-halfWindowSize] to mass[currentIndex+halfWindowSize]. A local maximum has to be the highest one in the given window to be recognized as a peak. Default is 60.
- Minimum peak intensity: a non-negative number indicating the minimum peak intensity used to filter out peaks. Peaks with an intensity lower than this threshold are discarded.
- Keep original data: if you select this option, the original data is kept in the clipboard. Otherwise (by default), the original data is removed.
Preprocess Data dialog
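To make the role of the half window size and the minimum peak intensity more concrete, here is a minimal Python sketch of the local-maximum rule described above. It is not the MALDIquant or Mass-Up implementation: the function name `local_maxima` is hypothetical, the signal-to-noise filter is left out, and truncating the window at the edges of the spectrum is an assumption of this sketch.

```python
def local_maxima(intensity, half_window_size, min_intensity=0.0):
    """Return the indices that are the highest point of their window.

    The window of index i spans from i - half_window_size to
    i + half_window_size (truncated at the spectrum edges, an assumption).
    Candidates with an intensity below min_intensity are discarded.
    """
    peaks = []
    n = len(intensity)
    for i in range(n):
        lo = max(0, i - half_window_size)
        hi = min(n, i + half_window_size + 1)
        window = intensity[lo:hi]
        if intensity[i] == max(window) and intensity[i] >= min_intensity:
            peaks.append(i)
    return peaks


signal = [0, 1, 3, 1, 0, 2, 5, 2, 0]
narrow = local_maxima(signal, half_window_size=2)   # both maxima are kept
wide = local_maxima(signal, half_window_size=4)     # only the highest survives
filtered = local_maxima(signal, half_window_size=2, min_intensity=4)
```

A larger half window size makes the detection more conservative, because a candidate must dominate a wider region of the spectrum to be reported.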
Peak Matching 
This operation matches one or more Peak List elements, following the workflow presented below.
Peak Lists Alignment Workflow
Usage
You can execute this operation by clicking the button or following the menu Preprocess/Align Peak Lists.
A dialog will appear allowing you to select the following options:
- Peak Lists to match: previously loaded Peak Lists. You can choose between Labeled or Unlabeled Peak Lists.
- Intra-sample matching method (optional): None (do not apply), the Forward algorithm or MALDIquant.
  If you choose the Forward algorithm, you also have to set:
  - Tolerance type: PPM (parts per million), Absolute or Relative. Default is PPM.
  - Tolerance value: tolerance used together with the tolerance type to consider two peaks the same. Default is 300.
  - Reference type: First, Median, Last, Average, Average up or Average down. Default is Average (AVG).
  If you choose MALDIquant, you also have to set:
  - Tolerance value: tolerance used to consider two peaks the same. Default is 0.002.
- Inter-sample matching method: Forward algorithm or MALDIquant.
  If you choose the Forward algorithm, you also have to set:
  - Tolerance type: PPM (parts per million), Absolute or Relative. Default is PPM.
  - Tolerance value: tolerance used together with the tolerance type to consider two peaks the same. Default is 300.
  - Reference type: First, Median, Last, Average, Average up or Average down. Default is Average (AVG).
  If you choose MALDIquant, you also have to set:
  - Tolerance value: tolerance used to consider two peaks the same. Default is 0.002.
Align Peak Lists dialog
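The following Python sketch illustrates how a tolerance type and a tolerance value can be combined to decide whether two peaks are considered the same. It is an illustration under stated assumptions, not the Forward algorithm itself: the function name `within_tolerance` is hypothetical, and the exact definitions that Mass-Up applies to each tolerance type (in particular, which m/z value acts as the reference) are assumed here to be the usual ones.

```python
def within_tolerance(reference_mz, mz, tolerance, tolerance_type="PPM"):
    """Return True if `mz` matches `reference_mz` under the given tolerance.

    Assumed definitions:
      Absolute: |mz - reference| <= tolerance
      Relative: |mz - reference| / reference <= tolerance
      PPM:      |mz - reference| / reference * 1e6 <= tolerance
    """
    difference = abs(mz - reference_mz)
    if tolerance_type == "Absolute":
        return difference <= tolerance
    if tolerance_type == "Relative":
        return difference / reference_mz <= tolerance
    if tolerance_type == "PPM":
        return difference / reference_mz * 1e6 <= tolerance
    raise ValueError(f"unknown tolerance type: {tolerance_type}")


# With the default of 300 PPM, a peak at m/z 1000 accepts deviations of up to
# about 0.3 m/z units.
close_peak = within_tolerance(1000.0, 1000.2, 300, "PPM")    # about 200 ppm away
distant_peak = within_tolerance(1000.0, 1000.4, 300, "PPM")  # about 400 ppm away
```

Because a PPM tolerance scales with the m/z value, the same setting is stricter in absolute terms at low masses than at high masses.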
Quality Control 
This operation performs Quality Control on Peak Lists.
Usage
You can execute this operation by clicking the button or following the menu Analysis/Peak List Quality Control.
A dialog will appear allowing you to select the following options:
- Peak Lists: previously loaded Peak Lists. You can choose between Labeled or Unlabeled Peak Lists.
- Intra-sample alignment method: Forward algorithm or MALDIquant.
  If you choose the Forward algorithm, you also have to set:
  - Tolerance type: PPM (parts per million), Absolute or Relative. Default is PPM.
  - Tolerance value: tolerance used together with the tolerance type to consider two peaks the same. Default is 300.
  - Reference type: First, Median, Last, Average, Average up or Average down. Default is Average (AVG).
  If you choose MALDIquant, you also have to set:
  - Tolerance value: tolerance used to consider two peaks the same. Default is 0.002.
Peak List Quality Control dialog
Output and visualization
As a result of this operation, a new Quality Control Result is added to the clipboard.
Every time a new Quality Control Result is accessible, it can be inspected with the Quality Control View.
This view presents two tabs: the samples tab, with information about the samples, and the replicates tab, with information about the replicates.
The samples tab has one data table and two chart tabs: the global analysis and the sample analysis. If the Peak Lists used are Labeled, there will be a third chart tab called Labels analysis.
The data table has as many rows as samples. For each sample row, it presents the following columns:
- Sample: the name of the sample.
- Class: if available, the label of the sample.
- Spectra: number of spectra (replicates) contained in the sample.
- Min. Mass: minimum m/z value present in the sample.
- Max. Mass: maximum m/z value present in the sample.
- Min. Masses: number of peaks of the spectrum with the fewest peaks in the sample.
- Max. Masses: number of peaks of the spectrum with the most peaks in the sample.
- Avg. Masses: average number of peaks of the spectra in the sample.
- Std. Dev.: standard deviation of the number of peaks of the spectra in the sample.
- POPXX: number of peaks with a Percentage of Presence (POP) of XX, where XX is a percentage of the number of spectra.
- Align. Masses: count of masses that have been matched across the spectra in the sample.
- Split >= XX: percentage of masses that have a POP higher than or equal to XX.
- Count >= XX: count of masses that have a POP higher than or equal to XX.
The global analysis chart tab presents a box plot where the Count >= XX columns are the categories.
Global analysis chart
The labels analysis chart tab presents a box plot where the Count >= XX columns are the categories and, in addition, there is one series per label in the data set.
Labels analysis chart
The sample analysis chart tab presents a box plot where each sample is a series. For a given sample, its vector of values is constructed from the POPXX columns. For example, if Sample A has POP20 = 2, POP40 = 1, POP60 = 3, POP80 = 2 and POP100 = 1, its series data will be {20, 20, 40, 60, 60, 60, 80, 80, 100}.
Samples analysis chart
The replicates tab has one data table and two chart tabs: the global analysis and the replicates analysis. If the Peak Lists used are Labeled, there will be a third chart tab called Labels analysis.
The data table has as many rows as replicates in the data set. For each replicate row, it presents the following columns:
- Spectra: the name of the spectrum.
- Sample: sample which the spectrum belongs to.
- Class: if available, the label of the spectrum.
- Masses count: number of peaks of the spectrum.
- Min. Mass: minimum m/z value present in the spectrum.
- Max. Mass: maximum m/z value present in the spectrum.
- Min. Int.: minimum intensity value present in the spectrum.
- Max. Int.: maximum intensity value present in the spectrum.
Global analysis chart
The labels analysis chart tab is the same as the global analysis chart with one addition: it has one series per label in the data set.
Labels analysis chart
The replicates analysis chart tab is the same as the sample analysis chart, with the difference that here there is one series per replicate instead of one series per sample.
Replicates analysis chart
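The construction of the series data used by the sample analysis chart (one value per peak, taken from the POPXX columns) can be reproduced with a few lines of Python. The function name `pop_series` is hypothetical; the input values are the ones from the Sample A example in the description of that chart.

```python
def pop_series(pop_counts):
    """Expand {POP percentage: number of peaks} into the box plot series."""
    series = []
    for pop in sorted(pop_counts):
        # Each peak contributes its percentage of presence once.
        series.extend([pop] * pop_counts[pop])
    return series


# Sample A: POP20 = 2, POP40 = 1, POP60 = 3, POP80 = 2 and POP100 = 1.
sample_a = pop_series({20: 2, 40: 1, 60: 3, 80: 2, 100: 1})
# sample_a == [20, 20, 40, 60, 60, 60, 80, 80, 100]
```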
Export results
You can export the data tables by clicking the right-corner button of the tables and choosing the "Export to CSV" option. You can find more information about this in the Export results section.
You can also select the entire table (by pressing CTRL+A) or a specific range of cells and copy its contents by pressing CTRL+C. Then, you can paste this information into other programs, such as spreadsheets or text editors, either by pressing CTRL+V or by using the software-specific options.
You can export charts by right-clicking on them and then selecting the "Save as.." option. A dialog will appear allowing you to select the file to save the chart.
Biomarker discovery (Intra-class Analysis) 
This operation performs a Biomarker Discovery Analysis on a single Aligned Peak List.
Usage
You can execute this operation by clicking the button or following the menu Analysis/Biomarker discovery (Intra-class Analysis).
A dialog will appear allowing you to select the following options:
- Data: a single Labeled/Unlabeled Aligned Peak List.
Biomarker discovery (Intra-class Analysis) dialog
Output and visualization
As a result of this operation, a new Labeled/Unlabeled Intersection is added to the clipboard.
Every time a new Labeled/Unlabeled Intersection Result is accessible, it can be inspected with the Biomarker Discovery View.
Biomarker Discovery View
This view allows the user to filter out peaks in two modes: individual or group. In the individual mode, there are two types of filter:
- Difference: to find those peaks whose intra-strain presence is abnormally different in one sample (in bold) vs. the others. The user can customize this difference (min) and how many of the remaining samples should be different.
- Thresholds: to find those peaks which are present (or not) in one sample (in bold) vs. the others. The user can customize the meaning of 'presence' and the meaning of 'absence' (as a percentage of replicates).
Export results
You can export the results of this view by clicking the button. A dialog will appear allowing you to select the file to save the results.
You can also select the entire table (by pressing CTRL+A) or a specific range of cells and copy its contents by pressing CTRL+C. You can then paste this information into any other program, such as a spreadsheet or a text editor, either by pressing CTRL+V or by using that program's own paste option.
Biomarker discovery (Inter-class Analysis) 
This operation performs a Biomarker Discovery Analysis on a Labeled Aligned Peak List set.
Usage
You can execute this operation by clicking the button or following the menu Analysis/Biomarker discovery (Inter-class Analysis).
A dialog will appear allowing you to select the following options:
- Labeled Matched Peak List: you have to select two or more Labeled Matched Peak Lists contained in the same Labeled Matched Peak List Set.
Biomarker discovery (Inter-class Analysis) dialog
Output and visualization
As a result of this operation, a new Inter Labeled Intersection is added to the clipboard.
Every time a new Inter Labeled Intersection is accessible, it can be inspected with the Biomarker Discovery View. This view presents three tabs: (i) the analysis tab, (ii) the intra-strain matching tab and (iii) the inter-strain matching tab.
The analysis tab shows a matrix where columns are the samples and rows are the peaks. For each peak (row) there are two extra columns: its p-value and its q-value. Each cell value represents the percentage of presence of the row peak in the column sample.
Using the presence/absence thresholds, the user can modify the meaning of absence (by default it is 0%) and presence (by default it is 100%). Changing these values may affect the discriminant power of the peaks.
Biomarker Discovery View - Analysis tab
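To make the cell values of the analysis tab concrete, the following Python sketch builds a peaks-by-samples matrix of presence percentages. It is only an illustration: the function name and the input layout (each replicate as a set of already matched masses) are assumptions, and the p-value and q-value columns are not computed here.

```python
def presence_matrix(samples):
    """Build a peaks x samples matrix of presence percentages.

    samples maps sample name -> list of replicates, where each replicate
    is a set of matched masses (assumed layout, for illustration only).
    Returns (sorted masses, sample names, rows of percentages).
    """
    names = list(samples)
    masses = sorted({mass
                     for replicates in samples.values()
                     for replicate in replicates
                     for mass in replicate})
    rows = []
    for mass in masses:
        row = []
        for name in names:
            replicates = samples[name]
            hits = sum(1 for replicate in replicates if mass in replicate)
            row.append(100.0 * hits / len(replicates))
        rows.append(row)
    return masses, names, rows


samples = {
    "strain_1": [{1000.5, 2000.0}, {1000.5}, {1000.5, 2000.0}, {1000.5}],
    "strain_2": [{2000.0}, {2000.0, 3000.25}],
}
masses, names, rows = presence_matrix(samples)
for mass, row in zip(masses, rows):
    print(mass, dict(zip(names, row)))
```

With the default thresholds (absence 0%, presence 100%), only the first peak in this toy data separates the two samples cleanly.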
The intra-strain matching tab contains one tab per sample in the Labeled Matched Peak List used. For each sample, a chart and a matching table are generated; the table shows the peaks of each replicate.
Biomarker Discovery View - Intra-strain Matching tab
The inter-strain matching tab shows a matrix where the rows are all the peaks present in the samples of the dataset and the columns are all the samples of the dataset.
Biomarker Discovery View - Inter-strain Matching tab
Export results
You can export the data tables by clicking the button in the right corner of each table and choosing the "Export to CSV" option. You can find more information about this in the Export results section.
You can also select the entire table (by pressing CTRL+A) or a specific range of cells and copy its contents by pressing CTRL+C. You can then paste this information into any other program, such as a spreadsheet or a text editor, either by pressing CTRL+V or by using that program's own paste option.
Principal Component Analysis 
This operation performs a Principal Component Analysis on Labeled Aligned Peak Lists.
Usage
You can execute this operation by clicking the button or following the menu Analysis/Principal Component Analysis.
A dialog will appear allowing you to choose the experiment type (Labeled or Unlabeled).
Principal Component Analysis experiment type selection dialog
After selecting the experiment type, a dialog will appear allowing you to select the following options:
- Data: Labeled Matched Peak List Set or Unlabeled Matched Peak List.
- Max. Components: maximum number of principal components (PCs) to retain.
- Variance Covered: amount of variance that the retained principal components must account for.
Principal Component Analysis dialog
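The two retention options can be read as two stopping rules applied to the eigenvalues of the analysis. The sketch below shows one plausible way they interact; it is an assumption rather than Mass-Up's documented behaviour. In particular, the function name is hypothetical, Variance Covered is expressed as a fraction between 0 and 1, and retention stops as soon as either limit is reached.

```python
def components_to_retain(eigenvalues, max_components, variance_covered):
    """Number of leading principal components to keep.

    Components are taken in order of decreasing eigenvalue until either
    max_components have been kept or the kept components explain at
    least variance_covered (a fraction in (0, 1]) of the total variance.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        return 0
    kept = 0
    cumulative = 0.0
    for value in ordered:
        if kept >= max_components:
            break
        kept += 1
        cumulative += value
        if cumulative / total >= variance_covered:
            break
    return kept


eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
# The variance rule stops first: two components explain 75% of the variance.
print(components_to_retain(eigenvalues, max_components=5, variance_covered=0.75))
# The component limit stops first: only one component is allowed.
print(components_to_retain(eigenvalues, max_components=1, variance_covered=0.75))
```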
Output and visualization
As a result of this operation, a new PCA Data is added to the clipboard.
Every time a new PCA Data is accessible, it can be inspected with the 3D Principal Component Analysis View.
Principal Component Analysis View
The upper part of this view shows the 3D representation of the data, using the principal components selected with the controls on the right.
The bottom part of this view has two tabs:
- Principal components table: a table showing the input samples transformed into their principal components.
- Summary: the analysis summary, showing you the correlation matrix obtained, the eigenvalues and the eigenvectors.
Principal Component Analysis View - Summary
Export results
You can export the graphical representation of the PCA by clicking the button. A dialog will appear allowing you to select the file to save the image.
You can export the principal components table by clicking the button in the right corner of the table and choosing the "Export to CSV" option. You can find more information about this in the Export results section.
You can also select the entire table (by pressing CTRL+A) or a specific range of cells and copy its contents by pressing CTRL+C. You can then paste this information into any other program, such as a spreadsheet or a text editor, either by pressing CTRL+V or by using that program's own paste option.
Finally, you can also export the analysis summary by selecting all the text and copying the selection.
Clustering Analysis 
This operation performs a Clustering Analysis on Aligned Peak Lists.
Usage
You can execute this operation by clicking the button or following the menu Analysis/Clustering Analysis.
A dialog will appear allowing you to select the following options:
- Data: Labeled Aligned Peak List Set or Unlabeled Peak List.
- Minimum variance: peaks with a variance lower than or equal to this value are removed.
- Peak list: if provided, only these peaks will be analyzed.
- Cluster Reference Value: which value to use when comparing two clusters.
- Distance Function: function used to measure the distance between two clusters.
- Conversion Values: presence, percentage of presence or intensity.
- Intra-sample Minimum Presence: when using the percentage of presence, a value between 0 and 100 indicating the minimum percentage of presence of a peak to be considered.
- Deep Clustering: check this option to perform a spectrum-based clustering instead of a sample-based one.
- Output directory: optionally, a directory to store the clustering results. If it is not provided, a temporary directory will be used.
Clustering Analysis dialog
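The Conversion Values, Intra-sample Minimum Presence and Minimum variance options describe how peaks are turned into numbers and filtered before clustering. The following Python sketch illustrates one possible reading of them. The function names, the use of `None` for a missing peak, the mean as the intensity summary and the population variance are all assumptions, not details taken from Mass-Up.

```python
from statistics import pvariance


def convert_peak(replicate_intensities, mode, min_presence=0.0):
    """Summarise one peak in one sample as a single value.

    replicate_intensities holds one intensity per replicate, or None
    where the peak was not detected (assumed layout).
    """
    detected = [v for v in replicate_intensities if v is not None]
    percentage = 100.0 * len(detected) / len(replicate_intensities)
    if mode == "presence":
        return 1.0 if detected else 0.0
    if mode == "percentage":
        # Peaks below the intra-sample minimum presence are not considered.
        return percentage if percentage >= min_presence else 0.0
    if mode == "intensity":
        return sum(detected) / len(detected) if detected else 0.0
    raise ValueError("unknown conversion mode: " + mode)


def drop_low_variance(matrix, min_variance):
    """Keep only peaks whose variance across samples exceeds min_variance.

    matrix maps peak mass -> list of converted values, one per sample.
    """
    return {mass: values for mass, values in matrix.items()
            if pvariance(values) > min_variance}


print(convert_peak([12.0, None, 8.0], "intensity"))
print(convert_peak([12.0, None, None, None], "percentage", min_presence=50.0))

matrix = {1000.5: [100.0, 100.0, 100.0], 2000.0: [0.0, 50.0, 100.0]}
print(sorted(drop_low_variance(matrix, min_variance=0.0)))
```

A peak with the same value in every sample has zero variance and cannot help separate clusters, which is the motivation for removing it.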
Output and visualization
As a result of this operation, a new Clustering is added to the clipboard.
Every time a new Clustering Result is accessible, it can be inspected with the Cluster Explorer View.
Cluster Explorer View
Export results
The files generated by this analysis, which are stored in the chosen Output directory, can be opened using Java TreeView (http://jtreeview.sourceforge.net/).
In addition, you can export the hierarchical clustering using the "Export" menu. This menu offers you the following options:
Cluster Explorer View Export Menu
- Export to Postscript: exports the entire hierarchical clustering into a Postscript file. Note that if you have selected a specific area of the heatmap, this will be the exported area.
- Export to Image: exports the entire hierarchical clustering into a PNG/JPG image file. Note that if you have selected a specific area of the heatmap, this will be the exported area.
- Export ColorBar to Postscript: exports the color bar used in the heatmap into a Postscript file.
- Export ColorBar to Image: exports the color bar used in the heatmap into a PNG/JPG file.
- Save Tree Image: exports the peaks tree into a PNG/JPG file.
- Save Thumbnail Image: exports the heatmap into a PNG/JPG file.
- Save Zoomed Image: exports the zoomed heatmap into a PNG/JPG file.
- Export as CSV: exports the heat map matrix into a CSV file.
Biclustering Analysis 
This operation performs a Biclustering Analysis on Aligned Peak Lists.
Usage
You can execute this operation by clicking the button or following the menu Analysis/Biclustering Analysis.
A dialog will appear allowing you to select the following options:
- Data: Labeled Aligned Peak List Set or Unlabeled Peak List.
- Minimum inter-sample presence: masses must be present in at least this percentage of the total samples.
- Maximum inter-sample presence: masses must be present in at most this percentage of the total samples.
- Use Missing Values.
- Bicluster type: pattern type of target biclusters. It can be a Presence pattern, an Absence pattern or a Presence-Absence pattern.
- Biclustering algorithm: biclustering algorithm to use. It can be BiMax or BiBit.
- Biclustering mode: specify whether samples or masses are passed in rows to the biclustering.
- Min. samples in bicluster: minimum number of samples in a bicluster.
- Min. masses in bicluster: minimum number of masses in a bicluster.
- Class biclusters: Check if you want to retrieve only class-biclusters.
- Output directory: directory to store the biclustering results.
Biclustering Analysis dialog
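The minimum and maximum inter-sample presence options act as a filter on the masses before biclustering. A minimal Python sketch of such a filter follows; the function name and the input layout (one boolean per sample for each mass) are assumptions made for illustration.

```python
def filter_by_inter_sample_presence(presence, min_pct, max_pct):
    """Keep masses present in between min_pct and max_pct of the samples.

    presence maps mass -> list of booleans, one per sample, telling
    whether the mass is present in that sample (assumed layout).
    """
    kept = {}
    for mass, flags in presence.items():
        pct = 100.0 * sum(flags) / len(flags)
        if min_pct <= pct <= max_pct:
            kept[mass] = flags
    return kept


presence = {
    1000.5: [True, True, True, True],      # present in 100% of the samples
    2000.0: [True, True, False, False],    # present in 50% of the samples
    3000.25: [True, False, False, False],  # present in 25% of the samples
}
kept = filter_by_inter_sample_presence(presence, min_pct=30.0, max_pct=90.0)
print(sorted(kept))
```

Masses found in every sample or in almost none carry little information about groups of samples, so bounding the presence from both sides reduces the input of the biclustering algorithm.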
Output and visualization
As a result of this operation, a new Biclustering is added to the clipboard.
Every time a new Biclustering Result is accessible, it can be inspected with the Bicluster Explorer View.
Bicluster Explorer View
Export results
You can export the biclustering results using the "Export" menu.
Bicluster Explorer View Menu
This menu offers you the following options:
- Export table to CSV: allows you to export the biclusters table as a CSV file. This can also be done by clicking the button in the right corner of the table and choosing the "Export to CSV" option. You can find more information about this in the Export results section.
- Export heat map to image: allows you to export the biclusters heatmap as an image. A dialog will appear allowing you to select the file to save the image.
- Export selected bicluster: if you have selected a bicluster in the table, this option allows you to export its information into a file.
You can also export the selected bicluster information by right-clicking in the bicluster information area (at the bottom of the Bicluster Explorer View). A context menu will appear allowing you to:
- Copy to clipboard: copy the selected bicluster information into the clipboard.
- Export to file: export the selected bicluster information into a file.
Create Classification Analysis 
This operation creates a Classification Analysis from a Labeled Aligned Peak List set.
Usage
You can execute this operation by clicking the button or following the menu Analysis/Create Classification Analysis.
A dialog will appear allowing you to select the following options:
- Name: experiment name.
- Data: Labeled Aligned Peak List Set.
Create Classification Analysis dialog
Output and visualization
As a result of this operation, a new Classification Analysis is added to the clipboard.
Every time a new Classification Analysis is accessible, it can be inspected with the Classification Analysis View. This view allows the user to choose and configure a classifier, select an evaluation method (cross-validation or percentage split) and evaluate the performance of the classifier with the input data.
Classification view
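The two evaluation methods differ in how the input samples are divided into training and test data. The following Python sketch shows the general idea of each; it is a generic illustration and does not reproduce the classifier library used by Mass-Up. The function names, the fixed random seed and the round-robin fold assignment are assumptions.

```python
import random


def percentage_split(items, train_percentage, seed=0):
    """Shuffle the items and use the first train_percentage % for training."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_percentage / 100.0))
    return shuffled[:cut], shuffled[cut:]


def k_fold(items, k):
    """Yield (train, test) pairs; every item is tested exactly once."""
    items = list(items)
    folds = [items[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


samples = ["s%d" % i for i in range(10)]

train, test = percentage_split(samples, 70)
print(len(train), len(test))

for train, test in k_fold(samples, 5):
    print(test)
```

Percentage split evaluates the classifier once on a held-out part of the data, whereas cross-validation repeats the evaluation so that every sample is used for testing once, which is usually preferable for small datasets.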
Export results
You can select the classifier output and copy its contents by pressing CTRL+C. You can then paste this information into any other program.
Raw Data 
Represents any set of one or more samples whose replicates are Raw Data files (i.e. .mzXML, .mzML or .csv). You can load these files with the Load Raw Data operation. Every time a new Raw Data is accessible, it can be inspected with the data viewer.
Mass-Up is able to read raw data from standard mass spectrometry storage formats such as mzXML and mzML, as well as from comma-separated values files (.csv), where the first column is the mass and the second is the intensity.
The following lines show an example of a valid .csv file:
Mass,Intensity
72.38649,4.7928915
92.86101,11.554423
103.110954,23.025375
115.28742,8.338575
135.57188,76.37024
137.58994,57.889793
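If you need to inspect or generate such a file outside Mass-Up, a two-column CSV of this kind can be read with a few lines of Python. This is a minimal sketch using only the standard library; the function name is hypothetical, and skipping the header by ignoring non-numeric rows is an assumption about how tolerant a reader should be.

```python
import csv
import io


def read_spectrum(text):
    """Parse 'mass,intensity' rows into a list of (mass, intensity) tuples.

    Rows that are too short or not numeric (such as the header) are skipped.
    """
    peaks = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue
        try:
            peaks.append((float(row[0]), float(row[1])))
        except ValueError:
            continue  # header or malformed row
    return peaks


example = """Mass,Intensity
72.38649,4.7928915
92.86101,11.554423
103.110954,23.025375
"""
peaks = read_spectrum(example)
most_intense = max(peaks, key=lambda peak: peak[1])
print(len(peaks), most_intense)
```

To read from disk instead of a string, pass the contents of the file, for example `read_spectrum(open(path).read())`, where `path` is the location of your own file.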
For further information about formats you can visit the following links:
- mzML: http://www.psidev.info/mzml_1_0_0
- mzXML: http://tools.proteomecenter.org/wiki/index.php?title=Formats:mzXML
Peak List 
Represents any set of one or more samples whose replicates are Peak List files (i.e. .csv); it is the result of applying Peak Detection to Raw Data. You can load these files with the Load Peak List operation. Every time a new Peak List is accessible, it can be inspected with the data viewer.
Mass-Up is able to read peak lists from comma-separated values files (.csv), where the first column is the mass and the second is the intensity.
The following lines show an example of a valid .csv file:
Mass,Intensity
72.38649,4.7928915
92.86101,11.554423
103.110954,23.025375
115.28742,8.338575
135.57188,76.37024
137.58994,57.889793
Matched Peak List 
Represents any set of one or more samples whose replicates are Matched Peak List files (i.e. .csv); it is the result of applying Peak Matching to a Peak List. You can load these files with the Load Matched Peak List operation. Every time a new Matched Peak List is accessible, it can be inspected with the data viewer.
Mass-Up is able to read matched peak lists from comma-separated values files (.csv), where the first column is the mass and the second is the intensity.
The following lines show an example of a valid .csv file:
Mass,Intensity
72.38649,4.7928915
92.86101,11.554423
103.110954,23.025375
115.28742,8.338575
135.57188,76.37024
137.58994,57.889793
A Matched Peak List is a special case of Peak List where all samples and replicates have their peaks matched.
Labeled Matched Peak List Set 
Represents a set of Labeled Matched Peak Lists obtained by applying Peak Matching to several Peak Lists. You can also load these data with the Load Matched Peak List operation using the Labeled mode. Every time a new Labeled Matched Peak List Set is accessible, it can be inspected with the data viewer.
A Labeled Matched Peak List Set groups together several Labeled Peak Lists that have been matched and can therefore be used in subsequent analyses.
Discriminant Peak List 
Represents a Discriminant Peak List file (i.e. .csv). You can load these files with the Load Discriminant Peak List operation.
Mass-Up is able to read discriminant peak lists from comma-separated values files (.csv), where the first column is the mass.
The following lines show an example of a valid .csv file:
discriminant peak list
93.86352
318.46384
325.30566
152.7414
322.36334
305.01138
266.65726
162.83794
291.34518
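A file of this kind can be read outside Mass-Up with the short Python sketch below. It assumes one mass per line in the first column and treats any non-numeric first line (such as the "discriminant peak list" title in the example) as a header to skip; both points are assumptions about the layout, and the function name is hypothetical.

```python
def read_discriminant_masses(text):
    """Return the masses found in the first column, skipping non-numeric lines."""
    masses = []
    for line in text.splitlines():
        first_column = line.split(",")[0].strip()
        try:
            masses.append(float(first_column))
        except ValueError:
            continue  # title line, header or blank line
    return masses


example = """discriminant peak list
93.86352
318.46384
325.30566
"""
masses = read_discriminant_masses(example)
print(masses)
```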