Mass-Up Manual

This manual provides complete help for all Mass-Up functions and is organized into four sections: (i) the load menu, (ii) the preprocess menu, (iii) the analysis menu, and (iv) the Mass-Up main datatypes.

The load menu contains the following operations:

Import files.
Import configuration file.
Load raw data.
Load peak list.
Load matched peak list.
Load discriminant peak list.
Load hierarchical clustering results.
Load biclustering results.
Save data.
Interoperability.

The preprocess menu contains the following operations:

Preprocess.
Peak matching.

The analysis menu contains the following operations:

Quality Control.
Intra-label biomarker discovery.
Inter-label biomarker discovery.
Principal Component Analysis (PCA).
Hierarchical Clustering.
Biclustering analysis.
Classification Analysis.

Finally, these are the main datatypes that you will work with in Mass-Up through the operations above:

Raw data.
Peak list.
Matched peak list.
Matched peak list set.
Discriminant peak list.

Operation

Import Files

This operation loads several files, allowing the user to design the experiment, that is, setting the number of samples, the labels and assigning files to each sample.

This operation follows the workflow shown in the image below.

Import Files Workflow

Usage

You can execute this operation by clicking the button or following the menu File/Import/Import dataset.

First, a dialog will appear allowing you to choose the experiment type (Labeled or Unlabeled).

Import Dataset File Dialog - Step 1

If you choose Labeled, the next dialog will allow you to choose the number of labels.

Import Dataset Dialog - Step 2 (Labeled) - Number of labels

If you choose Unlabeled, the next dialog will allow you to choose the number of samples.

Import Dataset Dialog - Step 2 (Unlabeled) - Number of samples

If you have chosen Labeled, a third dialog will allow you to enter the names of the labels and the number of samples per label.

Import Dataset Dialog - Step 3 (Labeled) - Label names and number of samples

Finally, according to your choices, a dialog will appear allowing you to design your experiment.

Import Dataset Main Dialog

In order to design an experiment, you have to:

Select the data type: raw spectra, peak list or aligned peak list.
Add files to your experiment by clicking the button.
Assign files to samples. To do that, you have two alternatives:
- Select one or more files and drag and drop them into the corresponding sample.
- Select one or more files and use the button in order to automatically distribute them to the samples shown on the right side of the dialog. See below for a detailed description of this feature.
Assign names to samples. When the autofill button is used, samples are also named automatically by taking the name of the first file.

When the experiment design is complete, you can load the data by clicking the Load button. This option also allows you to store the experiment design as a Mass-Up Configuration File (.muc) for later use. The format of this .muc file is explained in the Mass-Up Configuration File section.

The autofill button

The button allows you to automatically distribute the selected files to the samples shown on the right side of the dialog. This feature is particularly useful when you need to import a large number of files and distribute them into different classes and samples, because it saves a lot of time.

To illustrate this feature, let's suppose a dataset with:

Two conditions: A and B.
Five samples per class.
Fifty raw data files in .mzML format, shown in the image below.

Ten peak list files added to the dialog.

First, you have to select the twenty-five files corresponding to class A and click the button in order to automatically fill the five samples of this class. The autofill function assumes that each sample must have the same number of files so, in this case, five files are assigned to each sample. As you can see in the image, the names of the samples are automatically taken from the associated files by removing the file extensions.

Five peak list files automatically distributed to class A samples.

Finally, you have to select the twenty-five files corresponding to class B and click the button again in order to automatically fill the five samples of this class.

Operation

Import Configuration File

This operation loads a Mass-Up Configuration File (.muc) containing a distribution of files into samples, as saved by the Import Files operation.

Usage

You can execute this operation by clicking the button or following the menu File/Import/Import from configuration File.

A dialog will appear allowing you to choose *.muc files.

Import Configuration File Dialog

This operation reads the saved configuration and loads all the data.

Operation

Load Raw Data

This operation loads several raw files (i.e., .mzXML, .mzML or .csv) into one or several Raw Data elements.

Usage

You can execute this operation by clicking the button or following the menu File/Load Data/Raw Data.

A dialog will appear allowing you to choose the experiment type (Labeled or Unlabeled).

Load Raw Data Dialog

Labeled experiment

If you choose Labeled, the next dialog will allow you to choose the folders containing the samples of your experiment.

Load Raw Data Dialog - Labeled

Note that you must add one folder per label in your experiment. Each label folder must contain one folder per sample, and the sample folders contain the raw files.
For example, imagine that you have two labels: CONDITION-A and CONDITION-B. For each label you have three samples and for each sample you have three raw spectra. You must have the following organization in your file system:

/CONDITION-A
	/SAMPLE-A.1
		/SAMPLE-A.1.1.mzXML
		/SAMPLE-A.1.2.mzXML
		/SAMPLE-A.1.3.mzXML
	/SAMPLE-A.2
		/SAMPLE-A.2.1.mzXML
		/SAMPLE-A.2.2.mzXML
		/SAMPLE-A.2.3.mzXML
	/SAMPLE-A.3
		/SAMPLE-A.3.1.mzXML
		/SAMPLE-A.3.2.mzXML
		/SAMPLE-A.3.3.mzXML
/CONDITION-B
	/SAMPLE-B.1
		/SAMPLE-B.1.1.mzXML
		/SAMPLE-B.1.2.mzXML
		/SAMPLE-B.1.3.mzXML
	/SAMPLE-B.2
		/SAMPLE-B.2.1.mzXML
		/SAMPLE-B.2.2.mzXML
		/SAMPLE-B.2.3.mzXML
	/SAMPLE-B.3
		/SAMPLE-B.3.1.mzXML
		/SAMPLE-B.3.2.mzXML
		/SAMPLE-B.3.3.mzXML
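
Before loading, the organization above can be checked with a short script. The following Python sketch is only an aid written for this manual, not part of Mass-Up: the function name `scan_labeled` is hypothetical, and the temporary folders are created only to demonstrate the expected layout.

```python
import os
import tempfile


def scan_labeled(label_dirs):
    """Map each label folder name to {sample folder name: sorted file names}."""
    layout = {}
    for label_dir in label_dirs:
        samples = {}
        for sample in sorted(os.listdir(label_dir)):
            sample_path = os.path.join(label_dir, sample)
            if os.path.isdir(sample_path):
                samples[sample] = sorted(
                    name for name in os.listdir(sample_path)
                    if os.path.isfile(os.path.join(sample_path, name))
                )
        layout[os.path.basename(os.path.normpath(label_dir))] = samples
    return layout


# Build a small throwaway tree that mirrors the example above.
root = tempfile.mkdtemp()
for label in ("CONDITION-A", "CONDITION-B"):
    for s in (1, 2, 3):
        sample = f"SAMPLE-{label[-1]}.{s}"
        os.makedirs(os.path.join(root, label, sample))
        for r in (1, 2, 3):
            open(os.path.join(root, label, sample, f"{sample}.{r}.mzXML"), "w").close()

layout = scan_labeled([os.path.join(root, "CONDITION-A"),
                       os.path.join(root, "CONDITION-B")])
```

A label folder with no sample sub-folders, or a sample folder with no files, shows up as an empty entry in `layout`.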

Unlabeled experiment

If you choose Unlabeled, the next dialog will allow you to choose the folder containing the samples of your experiment.

Load Raw Data Dialog - Unlabeled

Note that you must add one folder that contains all the samples in your experiment. The sample folders contain the raw files.
For example, imagine that you have six samples and for each sample you have three raw spectra. You must have the following organization in your file system:

/SAMPLE-1
	/SAMPLE-1.1.mzXML
	/SAMPLE-1.2.mzXML
	/SAMPLE-1.3.mzXML
/SAMPLE-2
	/SAMPLE-2.1.mzXML
	/SAMPLE-2.2.mzXML
	/SAMPLE-2.3.mzXML
/SAMPLE-3
	/SAMPLE-3.1.mzXML
	/SAMPLE-3.2.mzXML
	/SAMPLE-3.3.mzXML
/SAMPLE-4
	/SAMPLE-4.1.mzXML
	/SAMPLE-4.2.mzXML
	/SAMPLE-4.3.mzXML
/SAMPLE-5
	/SAMPLE-5.1.mzXML
	/SAMPLE-5.2.mzXML
	/SAMPLE-5.3.mzXML
/SAMPLE-6
	/SAMPLE-6.1.mzXML
	/SAMPLE-6.2.mzXML
	/SAMPLE-6.3.mzXML

Operation

Load Peak List

This operation loads several peak list files (.csv) into one or several Peak List elements.

Usage

You can execute this operation by clicking the button or following the menu File/Load Data/Peak List.

A dialog will appear allowing you to choose the experiment type (Labeled or Unlabeled).

Load Peak List Dialog

Labeled experiment

If you choose Labeled, the next dialog will allow you to choose the folders containing the samples of your experiment.

Load Peak List Dialog - Labeled

Note that you must add one folder per label in your experiment. Each label folder must contain one folder per sample, and the sample folders contain the peak list files.
For example, imagine that you have two labels: CONDITION-A and CONDITION-B. For each label you have three samples and for each sample you have three peak lists. You must have the following organization in your file system:

/CONDITION-A
	/SAMPLE-A.1
		/SAMPLE-A.1.1.csv
		/SAMPLE-A.1.2.csv
		/SAMPLE-A.1.3.csv
	/SAMPLE-A.2
		/SAMPLE-A.2.1.csv
		/SAMPLE-A.2.2.csv
		/SAMPLE-A.2.3.csv
	/SAMPLE-A.3
		/SAMPLE-A.3.1.csv
		/SAMPLE-A.3.2.csv
		/SAMPLE-A.3.3.csv
/CONDITION-B
	/SAMPLE-B.1
		/SAMPLE-B.1.1.csv
		/SAMPLE-B.1.2.csv
		/SAMPLE-B.1.3.csv
	/SAMPLE-B.2
		/SAMPLE-B.2.1.csv
		/SAMPLE-B.2.2.csv
		/SAMPLE-B.2.3.csv
	/SAMPLE-B.3
		/SAMPLE-B.3.1.csv
		/SAMPLE-B.3.2.csv
		/SAMPLE-B.3.3.csv

Unlabeled experiment

If you choose Unlabeled, the next dialog will allow you to choose the folder containing the samples of your experiment.

Load Peak List Dialog - Unlabeled

Note that you must add one folder that contains all the samples in your experiment. The sample folders contain the peak list files.
For example, imagine that you have six samples and for each sample you have three peak lists. You must have the following organization in your file system:

/SAMPLE-1
	/SAMPLE-1.1.csv
	/SAMPLE-1.2.csv
	/SAMPLE-1.3.csv
/SAMPLE-2
	/SAMPLE-2.1.csv
	/SAMPLE-2.2.csv
	/SAMPLE-2.3.csv
/SAMPLE-3
	/SAMPLE-3.1.csv
	/SAMPLE-3.2.csv
	/SAMPLE-3.3.csv
/SAMPLE-4
	/SAMPLE-4.1.csv
	/SAMPLE-4.2.csv
	/SAMPLE-4.3.csv
/SAMPLE-5
	/SAMPLE-5.1.csv
	/SAMPLE-5.2.csv
	/SAMPLE-5.3.csv
/SAMPLE-6
	/SAMPLE-6.1.csv
	/SAMPLE-6.2.csv
	/SAMPLE-6.3.csv

Operation

Load Matched Peak List

This operation loads several matched peak list files (.csv) into one or several Unlabeled Matched Peak List elements or into a Labeled Matched Peak List Set, depending on the experiment type.

Usage

You can execute this operation by clicking the button or following the menu File/Load Data/Matched Peak List.

A dialog will appear allowing you to choose the experiment type (Labeled or Unlabeled).

Load Matched Peak List Dialog

Labeled experiment

If you choose Labeled, the next dialog will allow you to choose the folders containing the samples of your experiment.

Load Matched Peak List Dialog - Labeled

The operation loads one Labeled Matched Peak List Set containing one Labeled Peak List element for each label.
Note that you must add one folder per label in your experiment. Each label folder must contain one folder per sample, and the sample folders contain the peak list files.
For example, imagine that you have two labels: CONDITION-A and CONDITION-B. For each label you have three samples and for each sample you have three peak lists. You must have the following organization in your file system:

/CONDITION-A
	/SAMPLE-A.1
		/SAMPLE-A.1.1.csv
		/SAMPLE-A.1.2.csv
		/SAMPLE-A.1.3.csv
	/SAMPLE-A.2
		/SAMPLE-A.2.1.csv
		/SAMPLE-A.2.2.csv
		/SAMPLE-A.2.3.csv
	/SAMPLE-A.3
		/SAMPLE-A.3.1.csv
		/SAMPLE-A.3.2.csv
		/SAMPLE-A.3.3.csv
/CONDITION-B
	/SAMPLE-B.1
		/SAMPLE-B.1.1.csv
		/SAMPLE-B.1.2.csv
		/SAMPLE-B.1.3.csv
	/SAMPLE-B.2
		/SAMPLE-B.2.1.csv
		/SAMPLE-B.2.2.csv
		/SAMPLE-B.2.3.csv
	/SAMPLE-B.3
		/SAMPLE-B.3.1.csv
		/SAMPLE-B.3.2.csv
		/SAMPLE-B.3.3.csv

Unlabeled experiment

If you choose Unlabeled, the next dialog will allow you to choose the folder containing the samples of your experiment.

Load Matched Peak List Dialog - Unlabeled

/SAMPLE-1
	/SAMPLE-1.1.csv
	/SAMPLE-1.2.csv
	/SAMPLE-1.3.csv
/SAMPLE-2
	/SAMPLE-2.1.csv
	/SAMPLE-2.2.csv
	/SAMPLE-2.3.csv
/SAMPLE-3
	/SAMPLE-3.1.csv
	/SAMPLE-3.2.csv
	/SAMPLE-3.3.csv
/SAMPLE-4
	/SAMPLE-4.1.csv
	/SAMPLE-4.2.csv
	/SAMPLE-4.3.csv
/SAMPLE-5
	/SAMPLE-5.1.csv
	/SAMPLE-5.2.csv
	/SAMPLE-5.3.csv
/SAMPLE-6
	/SAMPLE-6.1.csv
	/SAMPLE-6.2.csv
	/SAMPLE-6.3.csv

Operation

Load Discriminant Peak List

This operation loads a discriminant peak list file (.csv) into one Discriminant Peak List element.

Usage

You can execute this operation by clicking the button or following the menu File/Load Data/Discriminant Peak List.

A dialog will appear allowing you to choose the file that contains the discriminant peak list.

Load Discriminant Peak List Dialog

Operation

Load Hierarchical Clustering

This operation loads the results of the Hierarchical Clustering operation.

Usage

You can execute this operation by following the menu File/Load Analysis/Clustering.

A dialog will appear allowing you to choose the folder containing the results of the Hierarchical Clustering operation.

Load Clustering Results dialog

Operation

Load Biclustering

This operation loads the results of the Biclustering operation.

Usage

You can execute this operation by following the menu File/Load Analysis/Biclustering.

A dialog will appear allowing you to choose the folder containing the results of the Biclustering operation.

Load Biclustering Results dialog

Operation

Save Data

This operation saves loaded data to .csv files.

Usage

You can execute this operation by clicking the button or following the menu File/Save Data.

A dialog will appear allowing you to choose the data that you want to save.

Save Data dialog

The first combobox allows you to select the type of data (Labeled Raw Data, Labeled Peak List, Labeled Aligned Peak List, Unlabeled Raw Data, Unlabeled Peak List or Unlabeled Aligned Peak List) that is shown in the second combobox.
Note that you can only save data of the same type in one operation.

Interoperability

Import data saved with Mass-Up in external applications

The Save Data operation allows you to save your data into .csv files. This may be interesting if you have preprocessed a raw dataset and want to store the preprocessed data for further analysis with Mass-Up or other applications, such as R.

R

If you want to load .csv spectra files with R, the easiest way is to use the MALDIquant and MALDIquantForeign packages, which allow you to import spectra from .csv files with the import function.

Import a single .csv spectra file with MALDIquant

Consider that you have a file called spectra.csv with the following content:

	Mass,Intensity
	72.38649,4.7928915
	92.86101,11.554423
	103.110954,23.025375
	115.28742,8.338575
	135.57188,76.37024
	137.58994,57.889793

You can import this spectrum into a list by running the following R commands:

	library("MALDIquantForeign");
	spectra <- import("spectra.csv");

Note that spectra is a list so if you type spectra[[1]] in the R console, you will see the loaded data:

	> spectra[[1]]
	S4 class type            : MassSpectrum         
	Number of m/z values     : 6                    
	Range of m/z values      : 72.386 - 137.59      
	Range of intensity values: 4.793e+00 - 7.637e+01
	Memory usage             : 1.523 KiB            
	File                     : /tmp/spectra.csv

Import a dataset with MALDIquant

Consider that you have saved your preprocessed dataset into a directory called dataset, which has three condition sub-directories called:

HEALTHY: which has two samples (HA and HB) with five replicates each.
MYELOMA: which has five samples (MA, MB, MC, MD and ME) with five replicates each.
LYMPHOMA: which has five samples (LA, LB, LC, LD and LE) with five replicates each.

Each sub-directory may contain one or more sub-directories for each sample, which at the same time can have one or more .csv spectra files.

If you want to load all the spectra into a list, you just have to run the following R commands:

	library("MALDIquantForeign");
	spectra <- import("dataset");

With this command, all the spectra are loaded into a plain list, so you have to process this list in order to extract the spectra of the sample or condition that you want. Let's consider that you want to create one separate list for each sample. The first thing you can do is to get the sample names by reading the directory names:

	sampleNames <- list.dirs(path="dataset", recursive=TRUE)
	# Keep only the sample directories, skipping the root and the three condition directories
	sampleNames <- sampleNames[c(3:4,6:10,12:16)]
	# Drop the leading path so that only the sample directory name remains
	sampleNames <- basename(sampleNames)

Now, in sampleNames you have a list with the names of your samples:

	> sampleNames
	[1] "HA" "HB" "LA" "LB" "LC" "LD" "LE" "MA" "MB" "MC" "MD" "ME"

Since all samples have the same number of replicates, it is easy to retrieve them with the following code snippet: if you want to get a list with the spectra of the ith sample, you just have to set the ith variable:

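	ith		<- 1 # A value between 1 and length(sampleNames)
	replicates	<- 5 # Number of replicates per sample
	spectraIndex 	<- (ith-1)*replicates + 1 # Position of the first spectrum of the sample (R lists are 1-based)
	sample.name 	<- sampleNames[ith]
	sample.spectra 	<- spectra[spectraIndex:(spectraIndex+replicates-1)]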

And now, you have the information of the first sample stored in sample.name and sample.spectra:

	> sample.name
	[1] "HA"
	
	> sample.spectra
	[[1]]
	S4 class type            : MassSpectrum      
	Number of m/z values     : 155               
	Range of m/z values      : 656.165 - 3349.394
	Range of intensity values: 3e-03 - 1e+00     
	Memory usage             : 3.938 KiB         
	File                     : /tmp/dataset/HEALTHY/HA/spectrum1.csv
	
	[[2]]
	S4 class type            : MassSpectrum      
	Number of m/z values     : 144               
	Range of m/z values      : 656.152 - 3349.637
	Range of intensity values: 5e-03 - 1e+00     
	Memory usage             : 3.766 KiB         
	File                     : /tmp/dataset/HEALTHY/HA/spectrum2.csv
	
	[[3]]
	S4 class type            : MassSpectrum      
	Number of m/z values     : 116               
	Range of m/z values      : 656.173 - 3348.615
	Range of intensity values: 2e-03 - 1e+00     
	Memory usage             : 3.328 KiB         
	File                     : /tmp/dataset/HEALTHY/HA/spectrum3.csv
	
	[[4]]
	S4 class type            : MassSpectrum      
	Number of m/z values     : 139               
	Range of m/z values      : 656.162 - 3349.348
	Range of intensity values: 4e-03 - 1e+00     
	Memory usage             : 3.688 KiB         
	File                     : /tmp/dataset/HEALTHY/HA/spectrum4.csv
	
	[[5]]
	S4 class type            : MassSpectrum      
	Number of m/z values     : 118               
	Range of m/z values      : 656.177 - 3349.325
	Range of intensity values: 7e-03 - 1e+00     
	Memory usage             : 3.359 KiB         
	File                     : /tmp/dataset/HEALTHY/HA/spectrum5.csv

Mass-Up Configuration File

This section describes the Mass-Up Configuration File in detail so that you can create your own .muc files to import datasets into Mass-Up.

	<?xml version="1.0" encoding="UTF-8"?>
	
	<!-- Dataset definition indicating whether it is labeled or not and the data type: RAW Spectra, Peak lists or Aligned peak lists -->
	<massupdatasetloader labeled="true" type="RAW Spectra">
	
	  <!-- Absolute paths to data files -->
	  <files>
	    <file>D:\Mass-Up-Data\sample1_replicate1.csv</file>
	    <file>D:\Mass-Up-Data\sample1_replicate2.csv</file>
	    <file>D:\Mass-Up-Data\sample1_replicate3.csv</file>
	    <file>D:\Mass-Up-Data\sample2_replicate1.csv</file>
	    <file>D:\Mass-Up-Data\sample2_replicate2.csv</file>
	    <file>D:\Mass-Up-Data\sample2_replicate3.csv</file>
	    <file>D:\Mass-Up-Data\sample3_replicate1.csv</file>
	    <file>D:\Mass-Up-Data\sample3_replicate2.csv</file>
	    <file>D:\Mass-Up-Data\sample3_replicate3.csv</file>
	    <file>D:\Mass-Up-Data\sample4_replicate1.csv</file>
	    <file>D:\Mass-Up-Data\sample4_replicate2.csv</file>
	    <file>D:\Mass-Up-Data\sample4_replicate3.csv</file>
	  </files>
	  
	  <!-- Names of samples in dataset -->
	  <samplenames>
	    <samplename>sample1</samplename>
	    <samplename>sample2</samplename>
	    <samplename>sample3</samplename>
	    <samplename>sample4</samplename>
	  </samplenames>
	  
	  <!-- Names of classes (labels) in dataset -->
	  <classes>
	    <class>A</class>
	    <class>B</class>
	  </classes>
	  
	  <!-- Mappings of files to samples -->
	  <filesamplemappings>
	    <mapping>0</mapping>
	    <mapping>0</mapping>
	    <mapping>0</mapping>
	    <mapping>1</mapping>
	    <mapping>1</mapping>
	    <mapping>1</mapping>
	    <mapping>2</mapping>
	    <mapping>2</mapping>
	    <mapping>2</mapping>
	    <mapping>3</mapping>
	    <mapping>3</mapping>
	    <mapping>3</mapping>
	  </filesamplemappings>
	  
	  <!-- Mappings of samples to classes -->
	  <sampleclassmappings>
	    <mapping>0</mapping>
	    <mapping>0</mapping>
	    <mapping>1</mapping>
	    <mapping>1</mapping>
	  </sampleclassmappings>
	</massupdatasetloader>

The file above corresponds to a .muc file with:

<files>: twelve RAW spectra files in .csv format.
<samplenames>: four samples (sample1, sample2, sample3 and sample4).
<classes>: two classes (A, B).

Each file must be mapped to a sample and each sample must be mapped to a class, using <filesamplemappings> and <sampleclassmappings> respectively. The following image illustrates this mapping.

Mass-Up Configuration File

Labeled and unlabeled experiments

The file format defined corresponds to a labeled experiment. If your experiment is unlabeled and there are no classes, just omit <classes> and <sampleclassmappings> blocks.
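
If you need to create .muc files for many datasets, generating them with a script may be easier than writing them by hand. The following Python sketch builds a document with the element names shown above. The function name `build_muc` is hypothetical, and the value labeled="false" for unlabeled experiments is an assumption, since the example above only shows labeled="true".

```python
import xml.etree.ElementTree as ET


def build_muc(files, sample_names, file_to_sample,
              classes=None, sample_to_class=None, data_type="RAW Spectra"):
    """Build the text of a .muc file from plain Python lists.

    The mappings are zero-based positions, as in the example file above.
    """
    labeled = classes is not None
    # Assumption: unlabeled experiments are written with labeled="false".
    root = ET.Element("massupdatasetloader",
                      labeled="true" if labeled else "false", type=data_type)

    def add_list(parent_tag, child_tag, values):
        parent = ET.SubElement(root, parent_tag)
        for value in values:
            ET.SubElement(parent, child_tag).text = str(value)

    add_list("files", "file", files)
    add_list("samplenames", "samplename", sample_names)
    if labeled:
        add_list("classes", "class", classes)
    add_list("filesamplemappings", "mapping", file_to_sample)
    if labeled:
        add_list("sampleclassmappings", "mapping", sample_to_class)
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# The same dataset as the example file: twelve files, four samples, two classes.
files = [f"D:\\Mass-Up-Data\\sample{s}_replicate{r}.csv"
         for s in range(1, 5) for r in range(1, 4)]
muc = build_muc(files,
                ["sample1", "sample2", "sample3", "sample4"],
                [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
                classes=["A", "B"],
                sample_to_class=[0, 0, 1, 1])
```

Leaving out `classes` and `sample_to_class` produces a document without the <classes> and <sampleclassmappings> blocks, as described for unlabeled experiments.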

Processing large datasets

Some experiments can involve hundreds to thousands of MALDI-TOF MS spectra. The amount of data that Mass-Up can handle simultaneously depends on both the number of spectra and the size (i.e., number of peaks) of each one.

If you are experiencing problems processing large datasets, here are some general recommendations:

The memory used by Mass-Up can be increased by editing the run.bat or run.sh file found in the installation directory. In this file you can change the value of the MEMORY parameter. By default, this parameter is set to -Xmx2G, which means that Mass-Up will use up to 2 gigabytes of RAM. To process large datasets, this amount can be increased up to a value close to your computer's available RAM (for example, if you have 8 GB of RAM, you can set this parameter to -Xmx6G or -Xmx8G).
Do not keep spectra loaded in Mass-Up that you don't need. If you preprocess your dataset with Mass-Up, don't keep the raw spectra in the Mass-Up clipboard unless they are needed, because raw spectra use much more memory than processed spectra. The preprocessing operations discard the original data by default, so you don't have to worry unless you change this option.
If Mass-Up can't load your raw dataset all at once, you can preprocess your spectra in batches. A raw dataset requires much more memory than a peak list. In addition, the preprocessing operations that precede peak matching can be done independently for each sample. Therefore, you can load only one batch of your samples, preprocess them and then store them. Once you have preprocessed all your dataset batches, you can load the full preprocessed dataset (i.e., peak lists) and perform the peak matching.
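
Planning the batches can be done with a few lines of code. This Python sketch only splits a list of sample folders into groups to be loaded one at a time. The function name, the folder names and the batch size of three are hypothetical, and the right batch size depends on your spectra and available memory.

```python
def split_into_batches(samples, batch_size):
    """Yield consecutive groups of at most `batch_size` sample folders."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# Seven hypothetical sample folders processed three at a time.
sample_dirs = [f"SAMPLE-{i}" for i in range(1, 8)]
batches = list(split_into_batches(sample_dirs, 3))
```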

Operation

Preprocess data

This operation preprocesses one or more Raw Data elements, applying the selected methods to each spectrum. If you apply peak detection, this operation returns one or more Peak List elements. Otherwise, it returns one or more Raw Data elements.

Usage

You can execute this operation by clicking the button or following the menu Preprocess/Preprocess data.

A dialog will appear allowing you to select the following options:

Raw Data: previously loaded Raw Data.
Intensity transformation (scaling) method: None (not applied), Logarithmic, Logarithmic with base 10, Logarithmic with base 2 or Square root.
Smoothing method: None (not applied), moving average window (MALDIquant) or Savitzky Golay (MALDIquant).
Baseline correction method: None (not applied), Top Hat, Snip, Median or Convex Hull (all from MALDIquant).
Standardization method: None (not applied), total ion current (TIC), Probabilistic Quotient Normalization (PQN) or median (all from MALDIquant).
Peak detection method: None (not applied), MassSpecWavelet or MALDIquant.
- If you choose "None", no peak detection will be performed. However, you can choose whether you want to convert the input data into a peak list (which can be useful if peak detection has been already applied to the input raw data) or not.
- If you choose "MassSpecWavelet", you also have to set:
  - Signal to noise ratio: SNR threshold used to identify peaks. Default is 6.
  - Peak scale range: The CWT scale range of the peak. Default is 2.
  - Amplitude threshold: The minimum peak amplitude. Default is 0.0001.
- If you choose "MALDIquant", you also have to set:
  - Signal to noise ratio: SNR threshold used to identify peaks. Default is 3.
  - Half window size: The resulting window reaches from mass[currentIndex-halfWindowSize] to mass[currentIndex+halfWindowSize]. A local maximum has to be the highest one in the given window to be recognized as a peak. Default is 60.
Minimum peak intensity: a non-negative number indicating the minimum peak intensity used to filter out peaks. Peaks with an intensity lower than this threshold are discarded.
Keep original data: if you select this option, the original data is kept in the clipboard. Otherwise (by default), the original data is removed.
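
To make the roles of the half window size, the signal to noise ratio and the minimum peak intensity more concrete, the following Python sketch applies the three criteria to a toy spectrum. It is a simplified illustration, not the MALDIquant or Mass-Up implementation: the function name is hypothetical, and the noise is assumed to be a single constant for the whole spectrum, estimated as the unscaled median absolute deviation of the intensities.

```python
import statistics


def detect_peaks(mass, intensity, half_window_size=60, snr=3.0, min_intensity=0.0):
    """Return (mass, intensity) pairs that pass the three criteria."""
    # Assumption: one constant noise level, estimated as the median absolute
    # deviation (MAD) of all intensities.
    center = statistics.median(intensity)
    noise = statistics.median(abs(value - center) for value in intensity)
    peaks = []
    for i, value in enumerate(intensity):
        # Window from index i - half_window_size to i + half_window_size,
        # clipped at both ends of the spectrum.
        lo = max(0, i - half_window_size)
        hi = min(len(intensity), i + half_window_size + 1)
        is_window_maximum = value == max(intensity[lo:hi])
        if is_window_maximum and value > snr * noise and value >= min_intensity:
            peaks.append((mass[i], value))
    return peaks


# Toy spectrum with two clear local maxima.
mass = [100.0 + i for i in range(11)]
intensity = [1, 2, 1, 2, 50, 2, 1, 2, 30, 1, 2]
all_peaks = detect_peaks(mass, intensity, half_window_size=2)
strong_peaks = detect_peaks(mass, intensity, half_window_size=2, min_intensity=40)
```

In this sketch, equal values on a plateau are all reported as peaks. A real implementation would need to handle that case.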

Preprocess data dialog

Operation

Peak Matching

This operation matches one or more Peak List elements, following the workflow presented below.

Peak Lists Alignment Workflow

Usage

You can execute this operation by clicking the button or following the menu Preprocess/Align Peak Lists.

A dialog will appear allowing you to select the following options:

Peak Lists to match: previously loaded Peak Lists. You can choose between Labeled or Unlabeled Peak Lists.
Intra-sample matching method (optional): None (not applied), the Forward algorithm or MALDIquant.
If you choose the Forward algorithm, you also have to set:
- Tolerance type: PPM (parts per million), Absolute or Relative. Default is PPM.
- Tolerance value: tolerance used together with the tolerance type to consider two peaks the same. Default is 300.
- Reference type: First, Median, Last, Average, Average up or Average down. Default is Average (AVG).
If you choose MALDIquant, you also have to set:
- Tolerance value: tolerance used to consider two peaks the same. Default is 0.002.
You can also set here whether you want to generate a consensus spectrum for each sample and the Percentage of Presence (POP) value used to control this process.
Inter-sample matching method: Forward algorithm or MALDIquant.
If you choose the Forward algorithm, you also have to set:
- Tolerance type: PPM (parts per million), Absolute or Relative. Default is PPM.
- Tolerance value: tolerance used together with the tolerance type to consider two peaks the same. Default is 300.
- Reference type: First, Median, Last, Average, Average up or Average down. Default is Average (AVG).
If you choose MALDIquant, you also have to set:
- Tolerance value: tolerance used to consider two peaks the same. Default is 0.002.
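
The tolerance type and tolerance value decide when two peaks are considered the same. The Python sketch below shows one plausible reading of the three tolerance types. It is not the Forward algorithm or the MALDIquant matching itself: the function name is hypothetical, the way the reference mass is chosen is left out, and the exact formulas are assumptions.

```python
def same_peak(mass, reference, tolerance, tolerance_type="PPM"):
    """Decide whether `mass` falls within the tolerance around `reference`."""
    difference = abs(mass - reference)
    if tolerance_type == "PPM":
        # Difference expressed in parts per million of the reference mass.
        return difference / reference * 1e6 <= tolerance
    if tolerance_type == "Relative":
        # Difference expressed as a fraction of the reference mass.
        return difference / reference <= tolerance
    if tolerance_type == "Absolute":
        # Difference expressed directly in m/z units.
        return difference <= tolerance
    raise ValueError(f"unknown tolerance type: {tolerance_type}")


# With a 300 PPM tolerance, 1000.2 matches a reference of 1000.0 but 1000.4 does not.
close_enough = same_peak(1000.2, 1000.0, 300)
too_far = same_peak(1000.4, 1000.0, 300)
```

Under this reading, a PPM tolerance widens the accepted window as the mass grows, while an absolute tolerance keeps it constant across the whole mass range.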

Align Peak Lists dialog

Operation

Quality Control

This operation performs Quality Control on Peak Lists.

Usage

You can execute this operation by clicking the button or following the menu Analysis/Peak List Quality Control.

A dialog will appear allowing you to select the following options:

Peak Lists: previously loaded Peak Lists. You can choose between Labeled or Unlabeled Peak Lists.
Intra-sample alignment method: Forward algorithm or MALDIquant.
If you choose the Forward algorithm, you also have to set:
- Tolerance type: PPM (parts per million), Absolute or Relative. Default is PPM.
- Tolerance value: tolerance used together with the tolerance type to consider two peaks the same. Default is 300.
- Reference type: First, Median, Last, Average, Average up or Average down. Default is Average (AVG).
If you choose MALDIquant, you also have to set:
- Tolerance value: tolerance used to consider two peaks the same. Default is 0.002.

Peak List Quality Control dialog

Output and visualization

As a result of this operation, a new Quality Control Result is added to the clipboard. Every time a new Quality Control Result is accessible, it can be inspected with the Quality Control View.

This view presents two tabs: the samples tab, with information relative to the samples, and the replicates tab, with information relative to the replicates.

The samples tab has one data table and two chart tabs: the global analysis and the sample analysis. If the Peak Lists used are Labeled, there will be a third chart tab called Labels analysis.
The data table has as many rows as samples. For each sample row, it presents the following columns:

Sample: the name of the sample.
Class: if available, the label of the sample.
Spectra: number of spectra (replicates) contained in the sample.
Min. Mass: minimum m/z value present in the sample.
Max. Mass: maximum m/z value present in the sample.
Min. Masses: number of peaks of the spectrum with the fewest peaks in the sample.
Max. Masses: number of peaks of the spectrum with the most peaks in the sample.
Avg. Masses: average number of peaks of the spectra in the sample.
Std. Dev.: standard deviation of the number of peaks of the spectra in the sample.
POPXX: where XX is a percentage of the number of spectra. Count of the number of peaks with Percentage of Presence (POP) XX.
Align. Masses: count of masses that have been matched across the spectra in the sample.
Split >= XX: percentage of masses that have a POP greater than or equal to XX.
Count >= XX: count of masses that have a POP greater than or equal to XX.
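
The POP-based columns can be illustrated with a small calculation. In this Python sketch the function names and the sample data are hypothetical. It assumes that the POP of a mass is the percentage of replicates of the sample in which that mass was found after matching.

```python
def percentage_of_presence(presence_by_mass):
    """Map each matched mass to the percentage of replicates that contain it."""
    return {
        mass: 100.0 * sum(flags) / len(flags)
        for mass, flags in presence_by_mass.items()
    }


def count_at_least(pop_by_mass, threshold):
    """Count masses whose POP is greater than or equal to `threshold`."""
    return sum(1 for pop in pop_by_mass.values() if pop >= threshold)


# Hypothetical sample with four replicates and three matched masses.
sample = {
    1000.5: [True, True, True, True],
    1500.2: [True, False, True, False],
    2000.9: [True, False, False, False],
}
pop = percentage_of_presence(sample)
count_50 = count_at_least(pop, 50)
```

Here two of the three masses have a POP of at least 50, which would correspond to a "Count >= 50" value of 2.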

Global analysis chart

Labels analysis chart

Samples analysis chart

The replicates tab has one data table and two chart tabs: the global analysis and the replicates analysis. If the Peak Lists used are Labeled, there will be a third chart tab called Labels analysis.
The data table has as many rows as replicates in the data set. For each replicate row, it presents the following columns:

Spectra: the name of the spectrum.
Sample: sample to which the spectrum belongs.
Class: if available, the label of the spectrum.
Masses count: number of peaks of the spectrum.
Min. Mass: minimum m/z value present in the spectrum.
Max. Mass: maximum m/z value present in the spectrum.
Min. Int.: minimum intensity value present in the spectrum.
Max. Int.: maximum intensity value present in the spectrum.

Global analysis chart

Labels analysis chart

Replicates analysis chart

Export results

You can export the data tables by clicking the right-corner button of the tables and choosing the "Export to CSV" option. You can find more information about this in the Export results section.

You can also select the entire table (by pressing CTRL+A) or a specific range of cells and copy its contents by pressing CTRL+C. Then, you can paste this information into other programs, such as spreadsheets or text editors, either by pressing CTRL+V or by using the software-specific options.

You can export charts by right-clicking on them and then selecting the "Save as..." option. A dialog will appear allowing you to select the file to save the chart.

Operation

Biomarker discovery (Intra-class Analysis)

This operation performs a Biomarker Discovery Analysis on a single Aligned Peak List.

Usage

You can execute this operation by clicking the button or following the menu Analysis/Biomarker discovery (Intra-class Analysis).

A dialog will appear allowing you to select the following options:

Data: a single Labeled/Unlabeled Aligned Peak List.

Biomarker discovery (Intra-class Analysis) dialog

Output and visualization

As a result of this operation, a new Labeled/Unlabeled Intersection is added to the clipboard. Every time a new Labeled/Unlabeled Intersection Result is accessible, it can be inspected with the Biomarker Discovery View.

Biomarker Discovery View

This view allows the user to filter out peaks in two modes: individual or group. In the individual mode, there are two types of filter:

Difference: to find those peaks whose intra-strain presence is abnormally different in one sample (in bold) vs. the others. The user can customize this difference (min) and how many of the remaining samples should be different.
Thresholds: to find those peaks which are present (or not) in one sample (in bold) vs. the others. The user can customize the meaning of 'presence' and the meaning of 'absence' (in percentage of replicas).

In the group mode, the user can directly choose the samples in which a peak must be 'in' and 'not in', and can also change the meaning of 'in' and 'not in' (in percentage of replicas).
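
The Thresholds filter can be pictured with the following Python sketch. It is only an approximation under stated assumptions: the function name `peaks_exclusive_to`, the data layout and the inclusive comparisons are hypothetical, and the manual does not describe how Mass-Up implements the filter internally.

```python
def peaks_exclusive_to(presence, target, present_min=100.0, absent_max=0.0):
    """Return the peaks present in `target` and absent in every other sample.

    `presence` maps each peak mass to a dict {sample name: percentage of
    replicates containing the peak}. Both thresholds are percentages.
    """
    selected = []
    for mass, by_sample in presence.items():
        in_target = by_sample[target] >= present_min
        absent_elsewhere = all(
            pct <= absent_max
            for sample, pct in by_sample.items()
            if sample != target
        )
        if in_target and absent_elsewhere:
            selected.append(mass)
    return sorted(selected)


presence = {
    1001.5: {"S1": 100.0, "S2": 0.0, "S3": 0.0},
    1250.2: {"S1": 100.0, "S2": 66.7, "S3": 0.0},
    1833.9: {"S1": 80.0, "S2": 20.0, "S3": 0.0},
}
strict = peaks_exclusive_to(presence, "S1")
relaxed = peaks_exclusive_to(presence, "S1", present_min=75.0, absent_max=25.0)
print(strict, relaxed)
```

Relaxing the two thresholds, as in the second call, lets a peak that is missing from a few replicates of the target sample, or that appears in a few replicates of other samples, still be reported.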

Export results

You can export the results of this view by clicking the button. A dialog will appear allowing you to select the file to save the results.

Operation

Biomarker discovery (Inter-class Analysis)

This operation performs a Biomarker Discovery Analysis on a Labeled Aligned Peak List set.

Usage

You can execute this operation by clicking the button or following the menu Analysis/Biomarker discovery (Inter-class Analysis).

A dialog will appear allowing you to select the following options:

Labeled Matched Peak List: you have to select two or more Labeled Matched Peak Lists contained in the same Labeled Matched Peak List Set.

Biomarker discovery (Inter-class Analysis) dialog

Output and visualization

As a result of this operation, a new Inter Labeled Intersection is added to the clipboard. Every time a new Inter Labeled Intersection is accessible, it can be inspected with the Biomarker Discovery View. This view presents three tabs: (i) the analysis tab, (ii) the intra-strain matching tab and (iii) the inter-strain matching tab.

The analysis tab shows a matrix where columns are the samples and rows are the peaks. For each peak (row) there are two extra columns: its p-value and its q-value. Each cell value represents the percentage of presence of the row peak in the column sample.

Using the presence/absence thresholds, the user can modify the meaning of absence (by default it is 0%) and presence (by default it is 100%). Changing these values may affect the discriminant power of the peaks.
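
As a rough illustration of what each cell of the analysis matrix holds, the following Python sketch computes the percentage of presence of every peak in every sample. The function name `presence_matrix` and the input layout (one set of already matched masses per replicate) are assumptions made for this example and do not come from Mass-Up.

```python
def presence_matrix(samples):
    """Compute, per peak, the percentage of replicates of each sample containing it.

    `samples` maps a sample name to a list of replicates; each replicate is a
    set of already matched peak masses.
    """
    all_masses = sorted({m for reps in samples.values() for rep in reps for m in rep})
    matrix = {}
    for mass in all_masses:
        matrix[mass] = {
            name: 100.0 * sum(mass in rep for rep in reps) / len(reps)
            for name, reps in samples.items()
        }
    return matrix


samples = {
    "S1": [{1001.5, 1250.2}, {1001.5}, {1001.5, 1250.2}, {1001.5}],
    "S2": [{1250.2}, {1250.2}],
}
matrix = presence_matrix(samples)
print(matrix[1001.5], matrix[1250.2])
```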

Biomarker Discovery View - Analysis tab

The intra-strain tab contains, in turn, one tab per sample in the Labeled Matched Peak List used. A chart and a matching table are generated for each sample, showing the peaks of each of its replicates.

Biomarker Discovery View - Intra-strain Matching tab

The inter-strain tab shows a matrix where the rows are all the peaks present in all the samples of the dataset and the columns are all the samples of the dataset.

Biomarker Discovery View - Inter-strain Matching tab

Export results

You can export the data tables by clicking the button in the right corner of each table and choosing the "Export to CSV" option. You can find more information about this in the Export results section.

Operation

Principal Component Analysis

This operation performs a Principal Component Analysis on Labeled Aligned Peak Lists.

Usage

You can execute this operation by clicking the button or following the menu Analysis/Principal Component Analysis.

A dialog will appear allowing you to choose the experiment type (Labeled or Unlabeled).

Principal Component Analysis experiment type selection dialog

After selecting the experiment type, a dialog will appear allowing you to select the following options:

Data: Labeled Matched Peak List Set or Unlabeled Matched Peak List.
Max. Components: maximum number of principal components (PCs) to retain.
Variance Covered: amount of variance to account for when retaining PCs.
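
One plausible way to read the interaction between Max. Components and Variance Covered is sketched below in Python. It assumes that components are retained in decreasing eigenvalue order until either limit is reached, and that the variance is expressed as a fraction between 0 and 1; both points, as well as the function name `components_to_retain`, are assumptions not confirmed by this manual.

```python
def components_to_retain(eigenvalues, max_components, variance_covered):
    """Return how many principal components to keep.

    Components are taken in decreasing eigenvalue order until the cumulative
    share of variance reaches `variance_covered` (a fraction in (0, 1]) or
    `max_components` is reached, whichever happens first.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    kept = 0
    for value in ordered:
        if kept >= max_components:
            break
        cumulative += value
        kept += 1
        if cumulative / total >= variance_covered:
            break
    return kept


eigenvalues = [4.0, 3.0, 2.0, 1.0]
capped = components_to_retain(eigenvalues, max_components=3, variance_covered=0.95)
by_variance = components_to_retain(eigenvalues, max_components=3, variance_covered=0.6)
print(capped, by_variance)
```

In the first call the component cap stops the selection before 95% of the variance is covered; in the second, two components already cover 70% of the variance, so the third is not needed.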

Principal Component Analysis dialog

Output and visualization

As a result of this operation, a new PCA Data is added to the clipboard. Every time a new PCA Data is accessible, it can be inspected with the 3D Principal Component Analysis View.

Principal Component Analysis View

The upper part of this view shows you the 3D representation of the data, using the principal components selected with the controls on the right.

The bottom part of this view has two tabs:

Principal components table: a table showing the input samples transformed into their principal components.
Summary: the analysis summary, showing you the correlation matrix obtained, the eigenvalues and the eigenvectors.

Principal Component Analysis View - Summary

Export results

You can export the graphical representation of the PCA by clicking the button. A dialog will appear allowing you to select the file to save the image.

You can export the principal components table by clicking the button in the right corner of the table and choosing the "Export to CSV" option. You can find more information about this in the Export results section.

Finally, you can also export the analysis summary by selecting all the text and copying the selection.

Operation

Clustering Analysis

This operation performs a Clustering Analysis on Aligned Peak Lists.

Usage

You can execute this operation by clicking the button or following the menu Analysis/Clustering Analysis.

A dialog will appear allowing you to select the following options:

Data: Labeled Aligned Peak List Set or Unlabeled Peak List.
Minimum variance: peaks with a variance lower than or equal to this value are removed.
Peak list: if provided, only these peaks will be analyzed.
Cluster Reference Value: which value to use when comparing two clusters.
Distance Function: function used to measure the distance between two clusters.
Conversion Values: presence, percentage of presence or intensity.
Intra-sample Minimum Presence: when using the percentage of presence, a value between 0 and 100 indicating the minimum percentage of presence of a peak to be considered.
Deep Clustering: check if you want to perform a spectrum-based clustering instead of sample-based.
Output directory: optionally, a directory to store the clustering results. If it is not provided, a temporary directory will be used.
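
The effect of the Minimum variance option can be sketched in Python as follows. The function name `filter_by_variance` and the data layout are hypothetical, and the use of the population variance is an assumption: the manual does not state which variance estimator Mass-Up applies.

```python
from statistics import pvariance


def filter_by_variance(matrix, minimum_variance):
    """Drop peaks whose values across samples have variance <= minimum_variance.

    `matrix` maps each peak mass to the list of its values, one per sample.
    """
    return {
        mass: values
        for mass, values in matrix.items()
        if pvariance(values) > minimum_variance
    }


matrix = {
    1001.5: [100.0, 100.0, 100.0],  # constant across samples: variance 0
    1250.2: [100.0, 0.0, 50.0],
    1833.9: [0.0, 0.0, 100.0],
}
kept = filter_by_variance(matrix, minimum_variance=0.0)
print(sorted(kept))
```

A peak with the same value in every sample carries no information for separating clusters, which is why a threshold of 0 already removes it.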

Clustering Analysis dialog

Output and visualization

As a result of this operation, a new Clustering is added to the clipboard. Every time a new Clustering Result is accessible, it can be inspected with the Cluster Explorer View.

Cluster Explorer View

Export results

The files generated by this analysis that are stored in the chosen Output directory can be opened using JTreeViewer (http://jtreeview.sourceforge.net/).

In addition, you can export the hierarchical clustering using the "Export" menu. This menu offers you the following options:

Cluster Explorer View Export Menu

Export to Postscript: exports the entire hierarchical clustering into a Postscript file. Note that if you have selected a specific area of the heatmap, this will be the exported area.
Export to Image: exports the entire hierarchical clustering into a PNG/JPG image file. Note that if you have selected a specific area of the heatmap, this will be the exported area.
Export ColorBar to Postscript: exports the color bar used in the heatmap into a Postscript file.
Export ColorBar to Image: exports the color bar used in the heatmap into a PNG/JPG file.
Save Tree Image: exports the peaks tree into a PNG/JPG file.
Save Thumbnail Image: exports the heatmap into a PNG/JPG file.
Save Zoomed Image: exports the zoomed heatmap into a PNG/JPG file.
Export as CSV: exports the heat map matrix into a CSV file.

Operation

Biclustering Analysis

This operation performs a Biclustering Analysis on Aligned Peak Lists.

Usage

You can execute this operation by clicking the button or following the menu Analysis/Biclustering Analysis.

A dialog will appear allowing you to select the following options:

Data: Labeled Aligned Peak List Set or Unlabeled Peak List.
Minimum inter-sample presence: masses must be present in at least this percentage of the total samples.
Maximum inter-sample presence: masses must be present in at most this percentage of the total samples.
Use Missing Values.
Bicluster type: pattern type of target biclusters. It can be a Presence pattern, an Absence pattern or a Presence-Absence pattern.
Biclustering algorithm: biclustering algorithm to use. It can be BiMax or Bibit.
Biclustering mode: specify whether samples or masses are passed in rows to the biclustering.
Min. samples in bicluster: minimum number of samples in a bicluster.
Min. masses in bicluster: minimum number of masses in a bicluster.
Class biclusters: Check if you want to retrieve only class-biclusters.
Output directory: directory to store the biclustering results.
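
The two inter-sample presence options can be understood as a filter applied to the masses before biclustering. The Python sketch below shows one way such a filter might work; the function name `filter_by_inter_sample_presence`, the boolean presence layout and the inclusive bounds are assumptions made for illustration.

```python
def filter_by_inter_sample_presence(presence, min_pct, max_pct):
    """Keep masses present in between min_pct and max_pct percent of samples.

    `presence` maps each mass to a list of booleans, one per sample.
    """
    kept = {}
    for mass, flags in presence.items():
        pct = 100.0 * sum(flags) / len(flags)
        if min_pct <= pct <= max_pct:
            kept[mass] = flags
    return kept


presence = {
    1001.5: [True, True, True, True],     # present in 100% of the samples
    1250.2: [True, False, True, False],   # present in 50% of the samples
    1833.9: [False, False, False, True],  # present in 25% of the samples
}
kept = filter_by_inter_sample_presence(presence, min_pct=30.0, max_pct=90.0)
print(sorted(kept))
```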

Biclustering Analysis dialog

Output and visualization

As a result of this operation, a new Biclustering is added to the clipboard. Every time a new Biclustering Result is accessible, it can be inspected with the Bicluster Explorer View.

Bicluster Explorer View

Export results

You can export the biclustering results using the "Export" menu.

Bicluster Explorer View Menu

This menu offers you the following options:

Export table to CSV: allows you to export the biclusters table as a CSV file. This can also be done by clicking the button in the right corner of the table and choosing the "Export to CSV" option. You can find more information about this in the Export results section.
Export heat map to image: allows you to export the biclusters heatmap as an image. A dialog will appear allowing you to select the file in which to save the image.
Export selected bicluster: if you have selected a bicluster in the table, this option allows you to export its information into a file.

You can also export the selected bicluster information by right-clicking in the bicluster information area (at the bottom of the Bicluster Explorer View). A context menu will appear allowing you to:

Copy to clipboard: copy the selected bicluster information into the clipboard.
Export to file: export the selected bicluster information into a file.

Operation

Create Classification Analysis

This operation creates a Classification Analysis from a Labeled Aligned Peak List set.

Usage

You can execute this operation by clicking the button or following the menu Analysis/Create Classification Analysis.

A dialog will appear allowing you to select the following options:

Name: experiment name.
Data: Labeled Aligned Peak List Set.

Create Classification Analysis dialog

Output and visualization

As a result of this operation, a new Classification Analysis is added to the clipboard. Every time a new Classification Analysis is accessible, it can be inspected with the Classification Analysis View. This view allows the user to choose and configure a classifier, select an evaluation model (cross-validation or percentage split) and evaluate the performance of the classifier with the input data.

Classification view
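
The two evaluation models mentioned above differ in how the input data is divided into training and test sets. The following Python sketch illustrates the general idea only; the function names `percentage_split` and `k_folds`, the shuffling and the fixed seed are assumptions and do not describe how Mass-Up performs the evaluation.

```python
import random


def percentage_split(n_items, train_pct, seed=0):
    """Shuffle indices 0..n_items-1 and split them into (train, test)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = round(n_items * train_pct / 100.0)
    return indices[:cut], indices[cut:]


def k_folds(n_items, k, seed=0):
    """Yield (train, test) index lists so that each item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    for fold in range(k):
        test = indices[fold::k]          # every k-th item, starting at `fold`
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


train, test = percentage_split(10, train_pct=70)
folds = list(k_folds(10, k=5))
print(len(train), len(test), [len(t) for _, t in folds])
```

With a percentage split the classifier is evaluated once on the held-out part, whereas cross-validation repeats the evaluation k times so that every spectrum is used for testing exactly once.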

Export results

You can select the classifier output and copy its contents by pressing CTRL+C. Then, you can paste this information into any other program.


Raw Data

Raw Data represents any set of one or more samples whose replicates are Raw Data files (i.e., .mzXML, .mzML or .csv). You can load these files with the Load Raw Data operation. Every time a new Raw Data is accessible, it can be inspected with the data viewer.

Mass-Up is able to read raw data from standard mass spectrometry formats for raw data storage, such as mzXML and mzML, as well as from comma-separated values files (.csv) in which the first column is the mass and the second is the intensity.

The following lines show an example of a valid .csv file:

	Mass,Intensity
	72.38649,4.7928915
	92.86101,11.554423
	103.110954,23.025375
	115.28742,8.338575
	135.57188,76.37024
	137.58994,57.889793
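
For readers who want to process such files outside Mass-Up, the following Python sketch reads a file in this layout using only the standard library. The function name `read_spectrum` is hypothetical, and the sketch assumes that the first line is always a header, as in the example above.

```python
import csv
import io


def read_spectrum(handle):
    """Read (mass, intensity) pairs from a CSV handle with a header row."""
    reader = csv.reader(handle)
    next(reader)  # skip the "Mass,Intensity" header
    return [(float(row[0]), float(row[1])) for row in reader if row]


example = io.StringIO(
    "Mass,Intensity\n"
    "72.38649,4.7928915\n"
    "92.86101,11.554423\n"
    "103.110954,23.025375\n"
)
peaks = read_spectrum(example)
print(peaks[0], len(peaks))
```

To read from disk instead of an in-memory string, pass an open file to the same function, for example `open(path, newline="")`.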

For further information about formats you can visit the following links:

mzML: http://www.psidev.info/mzml_1_0_0
mzXML: http://tools.proteomecenter.org/wiki/index.php?title=Formats:mzXML

Peak List

A Peak List represents any set of one or more samples whose replicates are Peak List files (i.e., .csv). It is also the result produced by applying Peak Detection to Raw Data. You can load these files with the Load Peak List operation. Every time a new Peak List is accessible, it can be inspected with the data viewer.

Mass-Up is able to read peak lists from comma-separated values files (.csv), where the first column is the mass and the second is the intensity.

The following lines show an example of a valid .csv file:

	Mass,Intensity
	72.38649,4.7928915
	92.86101,11.554423
	103.110954,23.025375
	115.28742,8.338575
	135.57188,76.37024
	137.58994,57.889793

Matched Peak List

A Matched Peak List represents any set of one or more samples whose replicates are Matched Peak List files (i.e., .csv). It is also the result produced by applying Peak Matching to a Peak List. You can load these files with the Load Matched Peak List operation. Every time a new Matched Peak List is accessible, it can be inspected with the data viewer.

Mass-Up is able to read matched peak lists from comma-separated values files (.csv), where the first column is the mass and the second is the intensity.

The following lines show an example of a valid .csv file:

	Mass,Intensity
	72.38649,4.7928915
	92.86101,11.554423
	103.110954,23.025375
	115.28742,8.338575
	135.57188,76.37024
	137.58994,57.889793

A Matched Peak List is a special case of Peak List where all samples and replicates have their peaks matched.

Matched Peak List Set

Labeled Matched Peak List Set

A Labeled Matched Peak List Set represents a set of Labeled Matched Peak Lists obtained by applying Peak Matching to several Peak Lists. You can also load these data with the Load Matched Peak List operation using the Labeled mode. Every time a new Labeled Matched Peak List Set is accessible, it can be inspected with the data viewer.

A Labeled Matched Peak List Set is intended to group together several Labeled Peak Lists that have been matched and can therefore be used in subsequent analyses.

Peak List

Discriminant Peak List

A Discriminant Peak List represents a Discriminant Peak List file (i.e., .csv). You can load these files with the Load Discriminant Peak List operation.

Mass-Up is able to read discriminant peak lists from comma-separated values files (.csv), where the first column is the mass.

The following lines show an example of a valid .csv file:

	discriminant peak list 
	93.86352
	318.46384
	325.30566
	152.7414
	322.36334
	305.01138
	266.65726
	162.83794
	291.34518
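
A file in this single-column layout can be read outside Mass-Up with a few lines of Python, as sketched below. The function name `read_discriminant_peaks` is hypothetical, and treating the first line as a header that is always present is an assumption based on the example above.

```python
import csv
import io


def read_discriminant_peaks(handle):
    """Read the masses of a discriminant peak list, skipping the header line."""
    reader = csv.reader(handle)
    next(reader)  # skip the header line
    return [float(row[0]) for row in reader if row and row[0].strip()]


example = io.StringIO(
    "discriminant peak list \n"
    "93.86352\n"
    "318.46384\n"
    "325.30566\n"
)
masses = read_discriminant_peaks(example)
print(masses)
```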
