General information

Dispom has been implemented within the framework of Jstacs, an open-source Java library for statistical analysis and classification of biological sequences. Since Jstacs has been released under GPL 3 (or later), Dispom is distributed under GPL 3 as well. The sources of Dispom can be obtained as part of the Jstacs sources from the Jstacs downloads page. The main class of Dispom is projects.dispom.Dispom.java.

Dispom as well as Jstacs are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Start instructions

Once you have unzipped the archive, you can start Dispom, for example, by invoking

java -jar Dispom.jar home=path/to/data/directory/ fg=fgfile.txt bg=bgfile.txt init=best-random=100 p-val=1E-4

This call searches for motifs that are over-represented in path/to/data/directory/fgfile.txt but not in path/to/data/directory/bgfile.txt, initializes Dispom with the best of 100 randomly drawn starting values, and predicts motif occurrences with a p-value of less than 1E-4.

The arguments have the following meaning:

name           type                       comment

home           String                     the path to the data directory, default = ./
ignore         Character                  the char that is used to mask comment lines in data files, e.g., '>' in a FASTA file, default = >
fg             String                     the file name of the foreground data file (the file containing sequences that are expected to contain binding sites of a common motif)
bg             String                     the file name of the background data file, OPTIONAL
position       String                     a switch for choosing a uniform, skew-normal, or mixture position distribution, range = {UNIFORM, SKEW_NORMAL, MIXTURE}, default = MIXTURE
mean           Double                     the mean of the a priori TFBS distribution, default = 250.0
sd             Double                     the standard deviation of the a priori TFBS distribution, valid range = [1.0, Infinity], default = 150.0
motifs         Integer                    the number of motifs to be searched for, valid range = [1, 5], default = 1
length         Integer                    the motif length that is used at the beginning, valid range = [1, 50], default = 15
flankOrder     Integer                    the Markov order of the model for the flanking sequence and the background sequence, valid range = [0, 5], default = 0
motifOrder     Integer                    the Markov order of the motif model, valid range = [0, 3], default = 0
bothStrands    Boolean                    a switch whether to use both strands or not, default = true
init           String=[Integer | String]  the method that is used for initialization, one of 'best-random=', 'best-random-plugin=', 'best-random-motif=', 'enum-all=', 'enum-data=', 'heuristic=', and 'specific='
adjust         Boolean                    a switch whether to adjust the motif length, i.e., either to shrink or expand it, default = true
maxPos         Boolean                    a switch whether to use max. pos. in the heuristic or not, default = true
learning       String                     a switch for the learning principle, range = {ML, MAP, MCL, MSP}, default = MSP
threads        Integer                    the number of threads that are used to evaluate the objective function and its gradient, valid range = [1, 128], default = 4
starts         Integer                    the number of independent starts of Dispom, valid range = [1, 100], default = 1
xml            String                     the file name of the xml file the classifier is written to, default = ./classifier.xml
p-val          Double                     a p-value for predicting binding sites, valid range = [0.0, 1.0], OPTIONAL
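
The arguments listed above are passed as key=value pairs, so several of them can be combined in one call. The following is a minimal sketch, not a recommended configuration: the paths, file names, and chosen values are placeholders within the documented ranges, and the run and train_dispom helpers are hypothetical names introduced only for this illustration. By default the helper only prints the command, so it can be inspected before anything is executed.

```shell
#!/bin/sh
# Sketch only: paths, file names, and the chosen values are placeholders.
# With DRY_RUN=1 (the default here) the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Hypothetical helper: one training call that combines several documented arguments.
train_dispom() {
    run java -jar Dispom.jar \
        home=path/to/data/directory/ \
        fg=fgfile.txt \
        bg=bgfile.txt \
        position=SKEW_NORMAL mean=250.0 sd=150.0 \
        motifs=2 length=12 \
        flankOrder=1 motifOrder=0 \
        threads=8 starts=5 \
        init=best-random=100 \
        xml=./classifier.xml \
        p-val=1E-4
}

train_dispom
```

Setting DRY_RUN=0 in the environment makes the same script execute the command instead of printing it.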

Case studies

In the case studies presented in the paper, we started Dispom 50 times. For predicting binding sites, we used 'p-val=1E-4'. For additional information, visit the Dispom page at www.jstacs.de.

Dispom Predictor

In addition to the Dispom binary, we also provide a program (DispomPredictor.jar) that can be used to predict binding sites using an already trained classifier. Typical applications of DispomPredictor.jar are predicting binding sites of a motif that was found on some set of training data in additional, independent test data, or testing different p-values for predictions without repeating the training process.

You can start the Dispom predictor by invoking

java -jar DispomPredictor.jar home=path/to/data/directory/ fg=fgfile.txt bg=bgfile.txt p-val=1E-4 xml=./classifier.xml

The arguments have the following meaning:

name           type       comment

home           String     the path to the data directory, default = ./
ignore         Character  the char that is used to mask comment lines in data files, e.g., '>' in a FASTA file, default = >
fg             String     the file name of the foreground data file (the file containing sequences that are expected to contain binding sites of a common motif)
bg             String     the file name of the background data file, OPTIONAL
xml            String     the file name of the xml file the classifier has been written to, default = ./classifier.xml
p-val          Double     a p-value for predicting binding sites, valid range = [0.0, 1.0]
one-histogram  Boolean    if no background file is specified, p-values are computed using either a joint histogram (true) or a sequence-wise histogram (false), default = true, OPTIONAL
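
Because the predictor reuses a stored classifier, several p-value thresholds can be tried on the same data without retraining. The following is a minimal sketch of such a sweep: the file name testfile.txt, the three thresholds, and the run and predict_sweep helpers are assumptions made for illustration. The bg argument is omitted here, which the table above marks as optional. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: file names and thresholds are placeholders.
# With DRY_RUN=1 (the default here) each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Hypothetical helper: one predictor call per p-value, all using the same classifier file.
predict_sweep() {
    for p in 1E-3 1E-4 1E-5; do
        run java -jar DispomPredictor.jar \
            home=path/to/data/directory/ \
            fg=testfile.txt \
            xml=./classifier.xml \
            p-val="$p"
    done
}

predict_sweep
```

Setting DRY_RUN=0 in the environment executes the three predictor calls one after another.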