SpartaABC
A web server to simulate sequences based on indel parameters inferred using an approximate Bayesian computation algorithm

Research Site | Pupko Group



SpartaABC Manual

We provide a Docker version of SpartaABC. Docker is a platform for running and sharing programs efficiently across platforms, so SpartaABC can run on any operating system that runs Docker, including Linux, Windows, and Mac OS X. If Docker is not yet installed, please visit https://www.docker.com/ to install it. We demonstrate the usage of the pipeline on Linux. The full source code of SpartaABC is available on GitHub.

(1) Installing SpartaABC

To get the latest build of SpartaABC, enter the following in your terminal (on Windows, use the Command Prompt, which you can open by typing cmd in the Windows Start menu):

docker pull orenavram/sparta:spartaabc

When the command finishes, you should see the following message:

Status: Downloaded newer image for orenavram/sparta:spartaabc
docker.io/orenavram/sparta:spartaabc


Next, run the following two commands:

docker image tag orenavram/sparta:spartaabc spartaabc
docker rmi orenavram/sparta:spartaabc

These commands give the image the short tag spartaabc and remove the original, longer tag orenavram/sparta:spartaabc, which makes calling SpartaABC simpler. To check that everything is properly installed, please run the following command:

docker run spartaabc --help

If everything up to this point went correctly, you should see the SpartaABC help message listing the available options.

(2) Running SpartaABC

To run the pipeline, first create a target directory in which to place the input files and to which the output will be written:

mkdir target_dir

Next, move the input files to the target directory. To run SpartaABC, you must provide the following:

  • A valid MSA file in FASTA format.
  • A valid tree file in Newick format, including branch lengths.
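Before running, it can help to check the two inputs. The following is a minimal sketch using two hypothetical shell functions (check_msa_lengths and check_tree are not part of SpartaABC): the first verifies that all FASTA records have the same aligned length, and the second verifies that every sequence name appears as a label in the Newick tree and that the tree labels carry branch lengths. It is a rough check that assumes simple, unquoted Newick labels.

```shell
# Hypothetical pre-flight checks (not part of SpartaABC); POSIX sh + awk.

# check_msa_lengths MSA
# Succeeds if all FASTA records have the same aligned length. Gap characters
# ('-') count, and sequences may be wrapped over several lines.
check_msa_lengths() {
  awk '
    /^>/ { if (seq != "") len[++n] = length(seq); seq = ""; next }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END {
      if (seq != "") len[++n] = length(seq)
      if (n < 2) { print "fewer than two sequences" > "/dev/stderr"; exit 1 }
      for (i = 2; i <= n; i++)
        if (len[i] != len[1]) {
          print "record " i ": length " len[i] ", expected " len[1] > "/dev/stderr"
          exit 1
        }
    }' "$1"
}

# check_tree MSA TREE
# Succeeds if every sequence name in the MSA is a label in the Newick tree
# and every label in the tree carries a branch length ("label:length").
check_tree() {
  local tokens leaves name status=0
  # Split the Newick string on ( ) , ; into one trimmed token per line.
  tokens=$(tr '(),;' '\n\n\n\n' < "$2" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -v '^$')
  # Drop ":length" to get the bare labels.
  leaves=$(printf '%s\n' "$tokens" | sed 's/:.*//')
  if printf '%s\n' "$tokens" | grep -qv ':'; then
    echo "tree has labels without branch lengths" >&2
    status=1
  fi
  # Sequence names are the header text up to the first whitespace.
  for name in $(sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$1"); do
    if ! printf '%s\n' "$leaves" | grep -qxF -- "$name"; then
      echo "missing from tree: $name" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, check_msa_lengths msa.fasta && check_tree msa.fasta newick_tree.tree prints nothing and returns success for well-formed inputs.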

Here are the Unix commands to move these input files to the target directory:

mv msa_file target_dir
mv tree_file target_dir

Here, msa_file is the name of your MSA file, tree_file is the name of your tree file, and target_dir is the directory in which these files are placed.

To run SpartaABC, see the Examples section below; the parameters and their default values are described under SpartaABC Parameters.

Pipeline output

If SpartaABC ran correctly, target_dir should contain the results of the run, including the following file:

  • sum_res.csv
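A quick way to confirm this is sketched below with a hypothetical show_summary function (not part of SpartaABC). It fails if target_dir/sum_res.csv is missing or empty, and otherwise prints the file, aligning its columns with column(1) when that tool is available. The alignment splits naively on commas, so quoted fields containing commas would be misaligned.

```shell
# Hypothetical helper (not part of SpartaABC): confirm that a run produced a
# non-empty sum_res.csv in the target directory and print it as a table.
show_summary() {
  local summary="$1/sum_res.csv"
  if [ ! -s "$summary" ]; then
    echo "no non-empty sum_res.csv in $1" >&2
    return 1
  fi
  if command -v column >/dev/null 2>&1; then
    # Align the comma-separated columns for reading in a terminal.
    column -s, -t < "$summary"
  else
    cat "$summary"
  fi
}
```

Usage: show_summary target_dir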

Examples

Running SpartaABC in ‘amino’ mode:

msa.fasta:

>1159906.H9BR03_9NIDO
MCNCLLQLRELYKLCNERNITRDDVLELIDPLIKTRCFAYSLVVLANANPIALSILPRKILINGEPLLLEYGNIYGKDFLYRPSLQVILEEEELN
>572290.B6VDY1_THCOV
MCNCLYQVKALVEYSKTH--GKTDVLELLDPLVKTRCFAYTLVVCINANPVAFSILPRKLLINGEPLLIEYGNVYGKDFIIRPSLQVILEEEC--
>572287.B6VDX2_9NIDO
MCNCIKQVAALVQHCKATNIHPSDVLELNDPLVAVKCLAYTLVLVTNADPVAFSILPRKILINGEPLLIEHGNVYGKDFLVRPSLQVILEEEVTD
>1586324.A0A0E3Y5V9_9NIDO
MCNCHLQLRDLYRLCNKRHIRREDVPELIDPLVKTRCFAYSLVVLANANPIAFSILPRKILINGEPLLLEYGSIYGKDFIIRPSLQVILEDELN-
>1159908.H9BR28_9NIDO
MCKCAQYIKVFNTN--IHGHSNTSLLTIDDHMLKYKCFAFALANLLVDNPIAGALLPRKMLINGKPILIEYGKIKADKFLINPPSKVEFYDD---

RAxML_tree.tree:

(1159908.H9BR28_9NIDO:1.12098,(572290.B6VDY1_THCOV:2.1089e-06,(572287.B6VDX2_9NIDO:0.328448,(1159906.H9BR03_9NIDO:0.107594,1586324.A0A0E3Y5V9_9NIDO:0.137885):0.333936):0.12119):1.12098);

Run command:

docker run -v /target_dir:/input spartaabc --path /input \
  --msaf msa.fasta \
  --trf RAxML_tree.tree \
  --mode amino

Replace /target_dir with the full (absolute) path of your target directory; relative paths will not work.
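One way to avoid typing the absolute path is to let the shell resolve it. The sketch below uses hypothetical helper functions (abs_dir and run_spartaabc, not part of SpartaABC) and assumes the image was tagged spartaabc as described in the installation section; any further arguments are passed through to SpartaABC unchanged.

```shell
# Hypothetical wrapper (not part of SpartaABC). The host side of a Docker bind
# mount (-v HOST:CONTAINER) must be an absolute path, so resolve it first.

# abs_dir DIR: print the absolute, symlink-free path of an existing directory.
abs_dir() {
  # cd runs in a subshell, so the caller's working directory is unchanged.
  (cd "$1" && pwd -P)
}

# run_spartaabc DIR [SpartaABC options...]
run_spartaabc() {
  local dir
  dir=$(abs_dir "$1") || return 1
  shift
  # Quoting keeps paths that contain spaces intact.
  docker run -v "$dir:/input" spartaabc --path /input "$@"
}
```

Run from the directory that contains target_dir, the amino example then becomes: run_spartaabc target_dir --msaf msa.fasta --trf RAxML_tree.tree --mode amino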

After a successful run, you should see a summary of the results (the values may deviate slightly; see the FAQ).

The summary file ‘sum_res.csv’ should be in your target directory.

Running SpartaABC in ‘nuc’ mode:

msa.fasta:

>sbay
GTATGTTAGACCATTTAGAACGAGTCAGGATCCACTGATTTATTCCGTAA
GCG----ACATTATATTAACCAACCCTTTATA---G
>scer
GTATGTTGGATTACGCAAAGCAATTCAGGATCTACTGATCCAATC---AA
GCTTGTGATTTTTTATTAACAAATTCTCTATAATAG
>spar
GTATGTTAAATTATGCAAAACAACTCAGGATCTTCTGGTCCAGTC---AA
GCGTGTAATTT-TTATTAACAAGTTCTTTATAATAG
>smik
GTATGTTAGATTGTGCAAGAAAATTCAGATTCTACTTTCTCAATT---AA
GCGTGTAATATTTTATTAACAAATCCTTTTTA---G

newick_tree.tree:

(sbay:0.255377,(spar:0.081007,scer:0.084970):0.030395,smik:0.178788);

Run command:

docker run -v /target_dir:/input spartaabc --path /input \
  --msaf msa.fasta \
  --trf newick_tree.tree \
  --mode nuc \
  --submodel GTR \
  --freq 0.369764 0.165546 0.306709 0.157981 \
  --rates 0.443757853 0.084329474 0.115502265 0.107429571 0.000270340 \
  --inv-prop 0.000000 \
  --gamma-shape 99.852225 \
  --gamma-cats 4

As in the previous example, replace /target_dir with the full (absolute) path of your target directory; relative paths will not work.

After a successful run, you should see a summary of the results (the values may deviate slightly; see the FAQ).

The summary file ‘sum_res.csv’ should be in your target directory.

SpartaABC Parameters

SpartaABC accepts the following parameters. All of them are optional except --mode, which is required:

SpartaABC INDEL simulation parameters

--minr:

Defines the lower bound of the prior distribution of the INDEL rates (I_R and D_R under RIM). Under SIM, the lower bound of the prior of R_ID is set to the sum of the lower bounds for I_R and D_R; thus, if the lower bound is set to 0.05, the lower bound of the SIM prior is 0.1.
Default: 0
Expects: float

--maxr:

Defines the upper bound of the prior distribution of the INDEL rate.
Default: 0.05
Expects: float

--bn:

Specifies how many of the simulated MSAs are kept; the posterior estimate of the model parameters is computed from these kept MSAs.
Default: 100
Expects: integer

Example command:

docker run -v /target_dir:/input spartaabc --path /input \
  --msaf msa_file.fasta \
  --trf tree_file.tree \
  --mode nuc \
  --minr 0.01 \
  --maxr 0.1 \
  --bn 200
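The relation between --minr and the SIM prior described above is simple arithmetic: I_R and D_R share the lower bound --minr, so the SIM lower bound for R_ID is twice that value. The hypothetical shell function below (check_rate_prior, not part of SpartaABC) checks the bounds before a run and prints both lower bounds. It only reports lower bounds, since this manual does not state how the SIM upper bound is derived.

```shell
# Hypothetical check (not part of SpartaABC): validate --minr/--maxr before a
# run and print the implied lower bound of the SIM prior on R_ID, which this
# manual defines as the sum of the I_R and D_R lower bounds (both = --minr).
check_rate_prior() {
  awk -v lo="$1" -v hi="$2" 'BEGIN {
    lo += 0; hi += 0   # force numeric comparison
    if (lo < 0 || lo > hi) {
      print "expected 0 <= minr <= maxr" > "/dev/stderr"
      exit 1
    }
    printf "RIM lower bound for I_R and D_R: %g\n", lo
    printf "SIM lower bound for R_ID: %g\n", 2 * lo
  }'
}
```

For the example command above, check_rate_prior 0.01 0.1 reports a SIM lower bound of 0.02.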

MSA Bias Correction Parameters

SpartaABC simulates MSAs with INDELs along the given tree. These simulated MSAs are exact, because the true sequence history is known. The input MSA, in contrast, was produced by an alignment program, which introduces biases. Therefore, after the simulations are completed, a correction is needed to allow an accurate comparison between the simulated MSAs and the input MSA.

--numalign:

Defines the number of simulated MSAs used to learn the biases introduced by alignment programs.
Default: 200
Expects: integer

--mode:

Defines the type of characters in the alignment: amino acids (amino) or nucleotides (nuc).
Required
Expects: amino or nuc

--submodel:

Defines the substitution model used during the alignment correction. Currently supported models are JC and GTR+I+G in nuc mode, and WAG in amino mode.
Default: JC
Expects: JC or GTR

If GTR is used, the model expects five additional parameters:

--freq:

Specifies the nucleotide frequencies in the order T C A G.
Default: 0.25 0.25 0.25 0.25
Expects: float float float float

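--rates:

Specifies the substitution rate parameters a, b, c, d and e, in this order, of the general rate matrix Q shown below. Rows give the nucleotide being replaced and columns the nucleotide it changes to; \(\pi_T, \pi_C, \pi_A, \pi_G\) are the nucleotide frequencies set with --freq.

\[
\begin{array}{c|cccc}
\text{From} \backslash \text{To} & T & C & A & G \\ \hline
T & \cdot & a \times \pi_C & b \times \pi_A & c \times \pi_G \\
C & a \times \pi_T & \cdot & d \times \pi_A & e \times \pi_G \\
A & b \times \pi_T & d \times \pi_C & \cdot & f \times \pi_G \\
G & c \times \pi_T & e \times \pi_C & f \times \pi_A & \cdot
\end{array}
\]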

Expects: float float float float float
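The rate matrix can be made concrete with a small calculation. The hypothetical shell function gtr_q below (not part of SpartaABC) builds the unnormalized Q from the --freq and --rates values and fills each diagonal entry so that its row sums to zero, the usual convention for a rate matrix. The table contains a sixth rate, f, for which no value is passed; the sketch assumes f = 1, the convention INDELible uses for its GTR model, and whether SpartaABC does the same is an assumption.

```shell
# Hypothetical helper (not part of SpartaABC): print the unnormalized GTR rate
# matrix Q (rows = from, columns = to, order T C A G) from --freq and --rates.
# ASSUMPTION: the sixth rate f (A<->G) is fixed to 1; the manual lists only a-e.
# Arguments: piT piC piA piG a b c d e
gtr_q() {
  awk -v args="$*" 'BEGIN {
    n = split(args, v, " ")
    if (n != 9) { print "expected 4 frequencies and 5 rates" > "/dev/stderr"; exit 1 }
    for (i = 1; i <= 4; i++) pi[i] = v[i]
    a = v[5]; b = v[6]; c = v[7]; d = v[8]; e = v[9]; f = 1
    # Symmetric exchangeabilities s[i,j] (1=T, 2=C, 3=A, 4=G), as in the table.
    s[1,2] = a; s[1,3] = b; s[1,4] = c; s[2,3] = d; s[2,4] = e; s[3,4] = f
    for (i = 1; i <= 4; i++) {
      row = 0
      for (j = 1; j <= 4; j++) {
        if (i == j) continue
        r = (i < j) ? s[i,j] : s[j,i]
        q[i,j] = r * pi[j]          # rate from i to j = exchangeability x target frequency
        row += q[i,j]
      }
      q[i,i] = -row                 # diagonal makes each row sum to zero
    }
    for (i = 1; i <= 4; i++)
      printf "%.6f %.6f %.6f %.6f\n", q[i,1], q[i,2], q[i,3], q[i,4]
  }'
}
```

For the 'nuc' example values, gtr_q 0.369764 0.165546 0.306709 0.157981 0.443757853 0.084329474 0.115502265 0.107429571 0.000270340 prints four rows; the second entry of the first row is the T to C rate a × π_C ≈ 0.073462.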

--inv-prop:

Specifies the proportion of invariable sites.
Default: 0.25
Expects: float

--gamma-shape:

Specifies the shape parameter for the gamma distribution.
Default: 0.5
Expects: float

--gamma-cats:

Specifies the number of categories to be used in the discrete gamma approximation.
Default: 10
Expects: integer

Here is an example command to run the pipeline with the GTR+I+G model:

docker run -v /target_dir:/input spartaabc --path /input \
  --msaf msa_file.fasta \
  --trf tree_file.tree \
  --mode nuc \
  --submodel GTR \
  --freq 0.419586 0.213656 0.263450 0.103308 \
  --rates 0.156974803 0.036887524 0.001154569 0.000002267 0.389328070 \
  --inv-prop 0.232846 \
  --gamma-shape 23.439602 \
  --gamma-cats 4
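Because the four --freq values are nucleotide frequencies, they should be non-negative and sum to 1; the frequencies in the command above and in the 'nuc' example do. A hypothetical check_freqs function (not part of SpartaABC) can catch typos before a long run; the tolerance of 1e-4 is an arbitrary choice.

```shell
# Hypothetical check (not part of SpartaABC): the four --freq values (T C A G)
# are probabilities, so they must be non-negative and sum to 1.
check_freqs() {
  awk -v args="$*" 'BEGIN {
    n = split(args, v, " ")
    if (n != 4) { print "expected 4 frequencies (T C A G)" > "/dev/stderr"; exit 1 }
    sum = 0
    for (i = 1; i <= 4; i++) {
      if (v[i] + 0 < 0) { print "negative frequency: " v[i] > "/dev/stderr"; exit 1 }
      sum += v[i]
    }
    # Allow for rounding in the printed values.
    if (sum < 0.9999 || sum > 1.0001) {
      printf "frequencies sum to %g, not 1\n", sum > "/dev/stderr"
      exit 1
    }
  }'
}
```

Usage: check_freqs 0.419586 0.213656 0.263450 0.103308 && echo ok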