SpartaABC: A web server to simulate sequences based on indel parameters inferred using an approximate Bayesian computation algorithm
We provide a Docker version of SpartaABC. Docker is a platform for running and sharing programs across multiple platforms, so SpartaABC can run on any operating system that runs Docker, including Linux, Windows, and Mac OS X. If Docker is not yet installed, please visit https://www.docker.com/ to install it. We demonstrate the usage of the pipeline on Linux. The full source for SpartaABC is available on GitHub.
To get the latest build of SpartaABC, enter the following in your terminal (on Windows, type it in the command prompt, which you can open by running CMD from the Windows Start menu):
docker pull orenavram/sparta:spartaabc
When the command ends you should see the following message:
Status: Downloaded newer image for orenavram/sparta:spartaabc
docker.io/orenavram/sparta:spartaabc
Next please run the following two commands:
docker image tag orenavram/sparta:spartaabc spartaabc
docker rmi orenavram/sparta:spartaabc
These commands simplify calling the SpartaABC program (in technical terms, they allow simplifying the image tag). To check that everything is properly installed, please run the following command:
docker run spartaabc --help
If everything up to this point went correctly, the command should print the SpartaABC help message.
To run the pipeline we first need to create a target directory in which we will place our input files and to which the output will be delivered:
mkdir target_dir
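Next we need to move our input files to the target directory. To run SpartaABC you must provide a multiple sequence alignment (MSA) file and a tree file (the examples below use the FASTA and Newick formats, respectively).
Here are the Unix commands to move these input files to the target directory: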
mv msa_file target_dir
mv tree_file target_dir
Where msa_file is the name of your MSA file, tree_file is the name of your tree file and target_dir is the directory in which these files are placed.
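Before launching a run, it can help to confirm that the MSA file really is an alignment, that is, that every sequence has the same length once gaps are counted. The following is a minimal sketch, not part of SpartaABC: the function name check_msa_lengths is hypothetical, and it assumes a FASTA file in which sequences may be wrapped over several lines.

```shell
# Sketch only: check_msa_lengths is a hypothetical helper, not a SpartaABC command.
# Succeeds (exit status 0) only if the FASTA file contains at least one sequence
# and all sequences have the same length, gaps included.
check_msa_lengths() {
    awk '
        # Close the current record and compare its length with the first one seen.
        function finish_record() {
            if (seen) {
                if (count == 0) ref = n
                else if (n != ref) bad = 1
                count++
            }
        }
        /^>/ { finish_record(); seen = 1; n = 0; next }   # header line starts a new record
             { gsub(/[ \t\r]/, ""); n += length($0) }     # sequence line (may be wrapped)
        END  { finish_record(); exit (bad || count == 0) }
    ' "$1"
}

# Usage: check_msa_lengths target_dir/msa_file && echo "all sequences have equal length"
```

The check only looks at lengths; it does not validate the characters against the amino or nuc alphabets.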
Then to run SpartaABC with default parameters see the parameters and examples sections below.
If SpartaABC ran correctly, target_dir should contain the results of the run, including the summary file sum_res.csv.
The following example runs SpartaABC on an amino acid alignment.
msa.fasta:
>1159906.H9BR03_9NIDO
MCNCLLQLRELYKLCNERNITRDDVLELIDPLIKTRCFAYSLVVLANANPIALSILPRKILINGEPLLLEYGNIYGKDFLYRPSLQVILEEEELN
>572290.B6VDY1_THCOV
MCNCLYQVKALVEYSKTH--GKTDVLELLDPLVKTRCFAYTLVVCINANPVAFSILPRKLLINGEPLLIEYGNVYGKDFIIRPSLQVILEEEC--
>572287.B6VDX2_9NIDO
MCNCIKQVAALVQHCKATNIHPSDVLELNDPLVAVKCLAYTLVLVTNADPVAFSILPRKILINGEPLLIEHGNVYGKDFLVRPSLQVILEEEVTD
>1586324.A0A0E3Y5V9_9NIDO
MCNCHLQLRDLYRLCNKRHIRREDVPELIDPLVKTRCFAYSLVVLANANPIAFSILPRKILINGEPLLLEYGSIYGKDFIIRPSLQVILEDELN-
>1159908.H9BR28_9NIDO
MCKCAQYIKVFNTN--IHGHSNTSLLTIDDHMLKYKCFAFALANLLVDNPIAGALLPRKMLINGKPILIEYGKIKADKFLINPPSKVEFYDD---
RAxML_tree.tree:
(1159908.H9BR28_9NIDO:1.12098,(572290.B6VDY1_THCOV:2.1089e-06,(572287.B6VDX2_9NIDO:0.328448,(1159906.H9BR03_9NIDO:0.107594,1586324.A0A0E3Y5V9_9NIDO:0.137885):0.333936):0.12119):1.12098);
Run command:
docker run -v /target_dir:/input spartaabc --path /input \
--msaf msa.fasta \
--trf RAxML_tree.tree \
--mode amino
Please make sure that the target directory provided is a full path. Relative paths will not work.
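Because the mount requires a full path, one way to avoid mistakes is to let the shell resolve the directory before building the command. The sketch below uses two hypothetical helper names (abs_dir and print_sparta_cmd); it only prints the docker command so that it can be inspected before being run, and the SpartaABC flags are the ones used in the example above.

```shell
# Sketch only: abs_dir and print_sparta_cmd are hypothetical helper names.

# Resolve a directory to its absolute path; fails if it does not exist.
abs_dir() {
    (cd "$1" 2>/dev/null && pwd)
}

# Print (without running) a docker command that mounts the resolved directory.
# Arguments: target directory, MSA file name, tree file name, mode (amino or nuc).
print_sparta_cmd() {
    target="$(abs_dir "$1")" || { echo "no such directory: $1" >&2; return 1; }
    printf 'docker run -v "%s:/input" spartaabc --path /input --msaf %s --trf %s --mode %s\n' \
        "$target" "$2" "$3" "$4"
}

# Usage: print_sparta_cmd target_dir msa.fasta tree_file.tree amino
```

The file names are passed as arguments and are relative to the mounted directory, as in the run command above.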
After a successful run, you should see a summary of the results (there may be small deviations between runs; see the FAQ).
The summary file ‘sum_res.csv’ should be in your input folder.
The next example runs SpartaABC on a nucleotide alignment with the GTR+I+G model.
msa.fasta:
>sbay
GTATGTTAGACCATTTAGAACGAGTCAGGATCCACTGATTTATTCCGTAA
GCG----ACATTATATTAACCAACCCTTTATA---G
>scer
GTATGTTGGATTACGCAAAGCAATTCAGGATCTACTGATCCAATC---AA
GCTTGTGATTTTTTATTAACAAATTCTCTATAATAG
>spar
GTATGTTAAATTATGCAAAACAACTCAGGATCTTCTGGTCCAGTC---AA
GCGTGTAATTT-TTATTAACAAGTTCTTTATAATAG
>smik
GTATGTTAGATTGTGCAAGAAAATTCAGATTCTACTTTCTCAATT---AA
GCGTGTAATATTTTATTAACAAATCCTTTTTA---G
newick_tree.tree:
(sbay:0.255377,(spar:0.081007,scer:0.084970):0.030395,smik:0.178788);
Run command:
docker run -v /target_dir:/input spartaabc --path /input \
--msaf msa.fasta \
--trf newick_tree.tree \
--mode nuc \
--submodel GTR \
--freq 0.369764 0.165546 0.306709 0.157981 \
--rates 0.443757853 0.084329474 0.115502265 0.107429571 0.000270340 \
--inv-prop 0.000000 \
--gamma-shape 99.852225 \
--gamma-cats 4
Please make sure that the target directory provided is a full path. Relative paths will not work.
After a successful run, you should see a summary of the results (there may be small deviations between runs; see the FAQ).
The summary file ‘sum_res.csv’ should be in your input folder.
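In both examples above, the sequence names in the MSA are identical to the tip labels in the tree. A quick way to check this for your own files is sketched below. The helper names (msa_names, tree_names, names_match) are hypothetical, and the sketch assumes that each FASTA header line holds only the sequence name and that the Newick labels are unquoted and contain no whitespace.

```shell
# Sketch only: these helpers are hypothetical and not part of SpartaABC.

# Sequence names from a FASTA file: header lines without the leading ">".
msa_names() {
    sed -n 's/^>//p' "$1" | tr -d '\r' | sort
}

# Tip labels from a Newick tree: text that directly follows "(" or ",".
# Branch lengths follow ":" and internal labels follow ")", so neither is matched.
tree_names() {
    grep -oE '[(,][^(),:;]+' "$1" | tr -d '(,' | sort
}

# Succeeds only if both files contain exactly the same set of names.
names_match() {
    [ "$(msa_names "$1")" = "$(tree_names "$2")" ]
}

# Usage: names_match target_dir/msa.fasta target_dir/newick_tree.tree || echo "names differ"
```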
SpartaABC includes several optional parameters:
--minr
Defines the lower bound of the prior distribution of the INDEL rate. Under RIM, this bound applies to I_R and D_R. Under SIM, the lower bound of the prior of R_ID is set to the sum of the lower bounds for I_R and D_R; for example, if the lower bound is set to 0.05, the lower bound for the SIM prior will be 0.1.
Default: 0
Expects: float
--maxr
Defines the upper bound of the prior distribution of the INDEL rate.
Default: 0.05
Expects: float
--bn
Specifies the number of kept MSAs, out of all simulated MSAs. The posterior estimate of model parameters is computed from these kept MSAs.
Default: 100
Expects: integer
Example command:
docker run -v /target_dir:/input spartaabc --path /input \
--msaf msa_file.fasta \
--trf tree_file.tree \
--minr 0.01 \
--maxr 0.1 \
--bn 200
SpartaABC simulates MSAs with INDELs according to the given tree. This process creates exact MSAs since the sequence history is known. After the simulations are completed, a correction is needed to allow for an accurate comparison with the original MSA.
With this parameter we can define the number of generated MSAs to be used for learning the biases introduced by the alignment programs.
Default: 200
Expects: integer
--mode
Defines the type of characters in the alignment.
Required
Expects: amino or nuc
--submodel
Defines which substitution model should be used during the alignment correction. Currently supported models are JC and GTR+I+G when running in nuc mode, and WAG when running in amino mode.
Default: JC
Expects: JC or GTR
If GTR is used, the model expects five additional parameters:
--freq
Specifies the nucleotide frequencies in the order T C A G.
Default: 0.25 0.25 0.25 0.25
Expects: float float float float
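Nucleotide frequencies are proportions, so the four values should sum to 1 (both GTR examples on this page do). A small pre-flight check can catch copy-paste errors before a long run. This is a sketch under assumptions: the function name freqs_sum_to_one and the tolerance of 1e-4 are illustrative choices, not SpartaABC behavior.

```shell
# Sketch only: freqs_sum_to_one is a hypothetical helper; the tolerance is an assumption.
# Succeeds only if exactly four values are given and they sum to 1 within the tolerance.
freqs_sum_to_one() {
    awk -v tol=1e-4 'BEGIN {
        s = 0
        for (i = 1; i < ARGC; i++) s += ARGV[i]   # ARGV[1..4] hold the frequencies
        d = s - 1
        if (d < 0) d = -d
        exit !(ARGC == 5 && d <= tol + 0)
    }' "$@"
}

# Usage: freqs_sum_to_one 0.419586 0.213656 0.263450 0.103308 || echo "check --freq values"
```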
--rates
Specifies the substitution rate parameters used in the general rate matrix Q shown below, in the order a b c d e.
| From / To | T | C | A | G |
|---|---|---|---|---|
| T | . | \(a \times \pi_C\) | \(b \times \pi_A\) | \(c \times \pi_G\) |
| C | \(a \times \pi_T\) | . | \(d \times \pi_A\) | \(e \times \pi_G\) |
| A | \(b \times \pi_T\) | \(d \times \pi_C\) | . | \(f \times \pi_G\) |
| G | \(c \times \pi_T\) | \(e \times \pi_C\) | \(f \times \pi_A\) | . |
Expects: float float float float float
--inv-prop
Specifies the proportion of invariable sites.
Default: 0.25
Expects: float
--gamma-shape
Specifies the shape parameter of the gamma distribution.
Default: 0.5
Expects: float
--gamma-cats
Specifies the number of categories used in the discrete gamma approximation.
Default: 10
Expects: integer
Here is an example command to run the pipeline with the GTR+I+G model:
docker run -v /target_dir:/input spartaabc --path /input \
--msaf msa_file.fasta \
--trf tree_file.tree \
--mode nuc \
--submodel GTR \
--freq 0.419586 0.213656 0.263450 0.103308 \
--rates 0.156974803 0.036887524 0.001154569 0.000002267 0.389328070 \
--inv-prop 0.232846 \
--gamma-shape 23.439602 \
--gamma-cats 4