Introduction
************

SEDA (*SEquence DAtaset builder*) is an open-source, multiplatform application for processing FASTA files containing DNA and protein sequences.

As the following image shows, SEDA uses the Input-Process-Output (IPO) model to process sequence files in FASTA format (https://en.wikipedia.org/wiki/FASTA_format). This means that every operation in SEDA takes one or more FASTA files as input and produces one or more FASTA files as output.

.. figure:: images/introduction/1.png
   :align: center

According to the FASTA format, each file may contain one or more sequences. Each sequence is composed of a header line, which begins with ``>``, followed by one or more lines containing the nucleotide or amino acid sequence represented using single-letter codes. The header typically gives a name (unique identifier) for the sequence and may also contain additional information (called the description). The description is separated from the sequence name/identifier by a blank space. The following example shows a sequence in FASTA format.

.. code-block:: console

   >SEQUENCE_NAME_IDENTIFIER Description
   ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTG
   ACTGACTGACTGACTGACTGACTG

To make the application easier to use, SEDA processing operations are organized into six main groups:

- *Alignment-related*: functions to align sequences using Clustal Omega, concatenate sequences, and create consensus sequences.
- *BLAST*: an operation for performing batch BLAST analyses and a two-way ortholog identification method.
- *Filtering*: operations to filter sequences (e.g. keeping those that meet given criteria, or removing isoforms or duplicated sequences).
- *Gene Annotation*: gene annotation pipelines based on Augustus as implemented in SAPP, getorf from EMBOSS, Splign/Compart (https://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi), and ProSplign/ProCompart (https://www.ncbi.nlm.nih.gov/sutils/static/prosplign/prosplign.html).
- *Reformatting*: operations to change the format of FASTA files, such as a powerful operation for changing sequence headers.
- *General*: operations whose functionality is not related to the other groups (e.g. splitting files, extracting random sequences, DNA-to-protein translation, or FASTA file comparison).
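
To make the FASTA structure described earlier more concrete (a header line starting with ``>``, an identifier, an optional description after the first blank space, and one or more sequence lines), the following is a minimal Python sketch of a FASTA reader. It is only an illustration and is not part of SEDA: the names ``FastaRecord`` and ``parse_fasta`` are hypothetical, and the sketch assumes well-formed input held in memory as a string.

```python
from typing import Iterator, List, NamedTuple


class FastaRecord(NamedTuple):
    identifier: str   # text between '>' and the first blank space
    description: str  # remainder of the header line (may be empty)
    sequence: str     # all sequence lines joined into one string


def _build(header: str, chunks: List[str]) -> FastaRecord:
    # Split the header at the first blank space: name first, description after.
    identifier, _, description = header.partition(" ")
    return FastaRecord(identifier, description.strip(), "".join(chunks))


def parse_fasta(text: str) -> Iterator[FastaRecord]:
    """Yield one FastaRecord per sequence found in FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore empty lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield _build(header, chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        yield _build(header, chunks)


# Illustrative input: one sequence split over two lines.
example = (
    ">SEQUENCE_NAME_IDENTIFIER Description\n"
    + "ACTG" * 12 + "\n"
    + "ACTG" * 6 + "\n"
)
records = list(parse_fasta(example))
print(records[0].identifier, records[0].description, len(records[0].sequence))
```

Joining the sequence lines into a single string reflects the fact that line breaks inside a FASTA sequence carry no meaning; only the header line marks where a new sequence starts.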