Mercurial > repos > rlegendre > ribo_tools

--- a/get_codon_frequency.xml	Thu Jan 22 14:34:38 2015 +0100
+++ b/get_codon_frequency.xml	Thu Jan 22 14:34:53 2015 +0100
@@ -1,5 +1,5 @@
 <tool id="Codon_analysis" name="Codon_density">
-	<description> Analyse Ribo-seq alignments between two conditions to extract codon occupancy </description>
+	<description>To compare Ribo-seq alignments between two sets of conditions and determine codon occupancy</description>
 	<requirements>
 	    <requirement type="package">samtools</requirement>
 	    <requirement type="python-module">matplotlib</requirement>
@@ -32,7 +32,7 @@
 	</command>

 	<inputs>
-		<param name="annot" type="data" label="References Input Annotation File (gff)" format="gff" />
+		<param name="annot" type="data" label="Reference annotation file (GFF)" format="gff" />
 		<conditional name="replicat_opt">
 			<param name="rep" type="select" label="Replicate option">
 				<option value="yes">Yes (only two replicates by condition)</option>
@@ -40,23 +40,23 @@
 			</param>
 			<when value="yes">
 			## Use conditional balise : if rep =yes : 4 files, else 2 files
-				<param name="file1" type="data" label="First replicate of first condition (bam file)" format="bam" />
-				<param name="file11" type="data" label="Second replicate of first condition (bam file)" format="bam" />
-				<param name="file2" type="data" label="First replicate of second condition (bam file)" format="bam" />
-				<param name="file22" type="data" label="First replicate of second condition (bam file)" format="bam" />
+				<param name="file1" type="data" label="First replicate of first condition (BAM file)" format="bam" />
+				<param name="file11" type="data" label="Second replicate of first condition (BAM file)" format="bam" />
+				<param name="file2" type="data" label="First replicate of second condition (BAM file)" format="bam" />
+				<param name="file22" type="data" label="Second replicate of second condition (BAM file)" format="bam" />
 			</when>
 			<when value="no">
-				<param name="file1" type="data" label="First bam file" format="bam" />
-				<param name="file2" type="data" label="Second bam File" format="bam" />
+				<param name="file1" type="data" label="First BAM file" format="bam" />
+				<param name="file2" type="data" label="Second BAM file" format="bam" />
 			</when>
 		</conditional>
-		<param name="site" type="select" label="Please choose a ribosome site for codon analysis">
+		<param name="site" type="select" label="Please choose a ribosomal site for codon analysis">
 			<option value="A">A</option>
 			<option value="P">P</option>
 			<option value="E">E</option>
 		</param>
 		<param name="asite" type="integer" label="Off-set from the 5'end of the footprint to the A-site" value ="15"  />
-		<param name="kmer" type="integer" label="Size of the best phasing reads" value ="28"  />
+		<param name="kmer" type="integer" label="Length of the best-phasing footprints" value ="28"  />
 		<param name="cond1" type="text" size="25" label="Condition one" help="Required even if no replicate" />
 		<param name="cond2" type="text" size="25" label="Condition two" help="Required even if no replicate" />
 		<param name="color1" type="text" size="50" label="Color for first condition" value ="SkyBlue" help="Enter standard name, hex color string, or rgb code. You cand find all colors here : http://pythonhosted.org/ete2/reference/reference_svgcolors.html" />
@@ -74,14 +74,12 @@

 Summary
 -------
-This tool uses Ribo-seq (bam file) to compare codon translation between two conditions.
-For each footprint, codons at choosen site are saved and an histogram with all normalized codon numbers is plotted in both conditions.
-A second histogram groups all codons corresponding to an amino acid.
-A chisquare test is used for testing if distribution of used codons is the same in both conditions.
+This tool uses Ribo-seq data (BAM file) to determine whether codon occupancy differs between two sets of conditions. For each footprint, the codon at the chosen site is recorded, and a histogram of all normalised codon counts is plotted for both sets of conditions. A second histogram groups together all the codons corresponding to a given amino acid. A chi-squared test is then carried out to determine whether the distribution of the codons used is the same in both sets of conditions.
+

 Output
 -------
-This tool provides an html output with graphical outputs and a statistical test result. An additionnal csv file with codon numbers is provided.
+This tool provides an html file containing graphs and a statistical test result. An additional csv file with codon numbers is provided.


 Dependances
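The codon-occupancy procedure described in the Summary of get_codon_frequency.xml can be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not the tool's own script. It assumes that footprints are already given as (5' start, length) pairs in 0-based, CDS-relative coordinates. It also assumes that the P-site and E-site codons lie 3 nt and 6 nt upstream of the A-site codon, and that only in-frame codons fully inside the CDS are counted. The names `codon_counts`, `normalise` and `chi_squared` are hypothetical. The `asite=15` and `kmer=28` defaults mirror the tool's parameter defaults.

```python
from collections import Counter

# Assumption: the P-site codon sits 3 nt 5' of the A-site codon and the
# E-site codon 6 nt 5' of it, so the offset from the 5' end shrinks accordingly.
SITE_SHIFT = {"A": 0, "P": 3, "E": 6}


def codon_counts(cds, footprints, site="A", asite=15, kmer=28):
    """Count the codons found at one ribosomal site.

    cds:        CDS sequence starting at the first base of the start codon
    footprints: (start, length) pairs; start is the 0-based position of the
                footprint 5' end relative to the CDS (hypothetical input)
    """
    counts = Counter()
    for start, length in footprints:
        if length != kmer:  # keep only the best-phasing footprint length
            continue
        pos = start + asite - SITE_SHIFT[site]
        # keep in-frame codons that lie entirely inside the CDS
        if pos < 0 or pos % 3 != 0 or pos + 3 > len(cds):
            continue
        counts[cds[pos:pos + 3]] += 1
    return counts


def normalise(counts):
    """Convert raw codon counts into proportions of the total."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def chi_squared(counts1, counts2):
    """Pearson chi-squared statistic and degrees of freedom for a 2 x k table."""
    codons = sorted(set(counts1) | set(counts2))
    rows = [[c.get(codon, 0) for codon in codons] for c in (counts1, counts2)]
    row_totals = [sum(row) for row in rows]
    col_totals = [a + b for a, b in zip(*rows)]
    grand = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand
            if expected:  # an empty condition contributes nothing
                stat += (observed - expected) ** 2 / expected
    return stat, len(codons) - 1
```

For example, `codon_counts("ATGAAACCCGGGTTTTAA", [(-12, 28)])` records the AAA codon, because 15 nt downstream of the footprint 5' end is position 3 of the CDS. To obtain a p-value, `scipy.stats.chi2_contingency` can be applied to the same 2 x k table. Note that it applies Yates' continuity correction by default when there is only one degree of freedom.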
--- a/kmer_analysis.xml	Thu Jan 22 14:34:38 2015 +0100
+++ b/kmer_analysis.xml	Thu Jan 22 14:34:53 2015 +0100
@@ -1,5 +1,5 @@
 <tool id="kmer_analysis" name="Kmer">
-	<description>Compute proportion of each kmer and phasing</description>
+	<description>To calculate the proportion and phasing of each kmer</description>
 	<requirements>
 		<requirement type="package">samtools</requirement>
 	    <requirement type="package">numpy</requirement>
@@ -12,7 +12,7 @@
 	</command>

 	<inputs>
-		<param name="gff" type="data" label="References Input Annotation File (gff)" format="gff" />
+		<param name="gff" type="data" label="Reference annotation file (GFF)" format="gff" />
 		<param name="bamfile" type="data" label="Bam file" format="bam" />
 	</inputs>

@@ -25,11 +25,11 @@

 Summary
 -------
-This tool uses Ribo-seq data (bam file) to compute proportion of each kmer (lenght of footprints) and phasing.
+The kmer tool computes the distribution of footprint lengths in the BAM file and determines the proportion of footprints beginning in each frame, for all genes annotated in the GFF file.

 Output
 -------
-This tool provides an html report with all kmer proportion and phasing.
+This tool provides an html report detailing the proportions and phasing of the kmers.


 Dependances
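The length distribution and phasing reported by the Kmer tool (kmer_analysis.xml) can be illustrated with the following sketch. It assumes that each footprint 5' end has already been converted to an offset, in nt, from the annotated start codon of its gene. Frames are numbered 1 to 3, with frame 1 meaning that the footprint starts on the first base of a codon. This numbering is an assumption chosen to match the 1, 2 or 3 values accepted by the Frame parameter of the Frame tool. `kmer_report` and `best_phasing` are hypothetical names.

```python
from collections import Counter, defaultdict


def kmer_report(footprints):
    """Length proportions and per-frame phasing of footprints.

    footprints: (offset, length) pairs, where offset is the distance (nt)
                from the annotated start codon to the footprint 5' end
    """
    lengths = Counter()
    frames = defaultdict(Counter)
    for offset, length in footprints:
        lengths[length] += 1
        frames[length][offset % 3 + 1] += 1  # frame 1 = first codon base
    total = sum(lengths.values())
    report = {}
    for length in sorted(lengths):
        n = lengths[length]
        report[length] = {
            "proportion": n / total,
            "phasing": {f: frames[length][f] / n for f in (1, 2, 3)},
        }
    return report


def best_phasing(report):
    """Return the (kmer, frame) pair with the highest phasing proportion."""
    return max(
        ((length, frame) for length in report for frame in (1, 2, 3)),
        key=lambda lf: report[lf[0]]["phasing"][lf[1]],
    )
```

The pair returned by `best_phasing` corresponds to the Kmer and Frame values that the metagene_frameshift_analysis.xml help text suggests obtaining from kmer_analysis. In practice, a minimum number of footprints per length would probably also be required, because a rare length can appear perfectly phased by chance.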
--- a/metagene_frameshift_analysis.xml	Thu Jan 22 14:34:38 2015 +0100
+++ b/metagene_frameshift_analysis.xml	Thu Jan 22 14:34:53 2015 +0100
@@ -1,5 +1,5 @@
 <tool id="frameshift_analysis" name="Frame">
-	<description> Analyse Ribo-seq alignment to extract translational ambiguities events</description>
+	<description>To analyse Ribo-seq alignments for the extraction of translational ambiguities</description>
 	<requirements>
 	    <requirement type="package">samtools</requirement>
 	    <requirement type="python-module">matplotlib</requirement>
@@ -8,16 +8,18 @@
 	    <requirement type="python-module">Bio</requirement>
 	</requirements>
 	<command interpreter="python">
-		metagene_frameshift_analysis.py --input $reference --bam $mapping --cutoff $cutoff --kmer $kmer --fasta $fasta --dirout $output,$output.files_path --box $boxplot> $log
+		metagene_frameshift_analysis.py --gff $reference --bam $mapping --cutoff $cutoff --kmer $kmer --fasta $fasta --dirout $output,$output.files_path --box $boxplot --orf_length $orf --frame $frame > $log

 	</command>

 	<inputs>
-		<param name="reference" type="data" label="References Input Annotation File (gff)" format="gff" />
-		<param name="mapping" type="data" label="Bam Input File" format="bam" />
-		<param name="fasta" type="data" label="Reference in fasta format" format="fasta" />
-		<param name="kmer" type="integer" label="Longer of the best phasing reads" value ="28"  />
+		<param name="reference" type="data" label="Reference annotation file (GFF)" format="gff" />
+		<param name="mapping" type="data" label="BAM file" format="bam" />
+		<param name="fasta" type="data" label="Reference genome in Fasta format" format="fasta" />
+		<param name="kmer" type="integer" label="Length of the best-phasing footprints" value ="28"  />
+		<param name="frame" type="integer" label="Frame in which footprints show the best phasing (1, 2 or 3)" value ="1"  />
 		<param name="cutoff" type="integer" label="Cutoff for frame proportion in coding phase (default = 60 %)" value ="60"  />
+		<param name="orf" type="integer" label="Approximate size of the segment (in bp)" value ="300"  />
 	</inputs>

 	<outputs>
@@ -30,35 +32,35 @@
 	<help>
 Summary
 -------
-This tool uses Ribo-seq data (bam file) to extract out-of-frame footprints in all genes from a reference annotation file (GFF3). Subprofile are plotted for each gene with dual coding events.
+This tool uses Ribo-seq data (BAM file) to extract out-of-frame footprints in all genes from a reference annotation file (GFF3). Subprofiles are plotted for each gene with dual coding events.


-*- GFF3 file* : It must contain 9 tabulate-delimited columns : Chromosome, source, feature, start, stop, score, strand, phasing, note. The gene ID was retrieved in note field by "ID=" tag.
+*- GFF3 file*: This file must have nine tab-delimited columns: Chromosome, source, feature, start, stop, score, strand, phase, attributes. The gene ID is retrieved from the attributes field, using the "ID=" tag.

-*- Fasta file* : Reference fasta file. Be careful about the chromosome nomenclature used : it must be compatible with your GFF3 annotation file.
+*- Fasta file*: Reference fasta file. Care should be taken with the chromosome nomenclature used, which must be compatible with the GFF3 annotation file.

-*- BAM file* : It must be sorted. It can contain either multiples or unaligned footprints
+*- BAM file*: This file should be sorted and may contain multiply aligned or unaligned footprints.

-*- Kmer* : Lenght of the best phasing footprint. You can compute it running kmer_analysis
-
-*- Cutoff* : Integer value for selecting all genes that have less than 60 % (default) of footprints in coding frame.
+*- Kmer*: Length of the best-phasing footprints. It can be determined by running kmer_analysis.

-
+*- Frame*: Frame in which footprint phasing is best (1, 2 or 3). It can be determined by running kmer_analysis.
+
+*- Cutoff*: An integer percentage. All genes with fewer than this proportion of footprints in the coding frame (60 % by default) are selected.

-.................................................................................................................................................................................................
+*- Orf*: Approximate size of the segment, in bp.


 Output
 -------
-This tool generates 2 output files :
+This tool generates 3 output files:

-*- html file* : relative to translational ambiguities detection and visualization.
+*- html file*: for the detection and visualisation of translational ambiguities.

-*- Stat file* : statistiques about treated footprints and phasing.
+*- Stat file*: this file provides statistics on the processed footprints and on phasing.

-*- Boxplot* : Proportion of footprints in the three frames for all genes.
-
+*- Boxplot*: Proportion of footprints in the three frames, for all genes.
+

 Dependances
 ------------
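The cutoff rule in the metagene_frameshift_analysis.xml help text selects genes with fewer than 60 % of their footprints in the coding frame. It can be sketched as follows. The sketch assumes that each gene is represented by the 5' offsets of its kmer-length footprints relative to the annotated start codon. It uses frame numbering from 1 to 3, matching the Frame parameter, with frame 1 meaning the first base of a codon. The function names are hypothetical. The Orf segment-size parameter is not modelled, because the help text does not say how it is used. The per-frame proportions returned by `frame_proportions` are the kind of values summarised in the Boxplot output.

```python
from collections import Counter


def frame_proportions(offsets):
    """Share of footprints whose 5' end lies in frame 1, 2 or 3."""
    counts = Counter(offset % 3 + 1 for offset in offsets)
    total = sum(counts.values())
    return {f: (counts[f] / total if total else 0.0) for f in (1, 2, 3)}


def select_genes(genes, frame=1, cutoff=60):
    """Return {gene_id: % in coding frame} for genes below the cutoff.

    genes:  {gene_id: [5' offsets of kmer-length footprints]}
    frame:  frame in which footprints show the best phasing (1, 2 or 3)
    cutoff: minimum percentage of footprints expected in that frame
    """
    selected = {}
    for gene_id, offsets in genes.items():
        if not offsets:  # no footprints, nothing to judge
            continue
        pct = 100 * frame_proportions(offsets)[frame]
        if pct < cutoff:
            selected[gene_id] = pct
    return selected
```

Genes without footprints are skipped rather than selected, so that a lack of coverage is not mistaken for a translational ambiguity.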
--- a/metagene_readthrough.xml	Thu Jan 22 14:34:38 2015 +0100
+++ b/metagene_readthrough.xml	Thu Jan 22 14:34:53 2015 +0100
@@ -1,5 +1,5 @@
 <tool id="readthrough_analysis" name="Stop_supp">
-	<description> Analyse Ribo-seq alignment to detect readthrough events</description>
+	<description>To analyse Ribo-seq alignments for the detection of stop codon readthrough events</description>
 	<requirements>
 	    <requirement type="package">samtools</requirement>
 	    <requirement type="python-module">HTseq</requirement>
@@ -8,14 +8,15 @@
 	    <requirement type="python-module">Bio</requirement>
 	</requirements>
 	<command interpreter="python">
-		metagene_readthrough.py --gff $gff --fasta $fasta --bam $mapping --dirout=$output,$output.files_path
+		metagene_readthrough.py --gff $gff --fasta $fasta --bam $mapping --dirout=$output,$output.files_path --extend $ext

 	</command>

 	<inputs>
-		<param name="gff" type="data" label="References Input Annotation File (gff)" format="gff" />
-		<param name="fasta" type="data" label="Reference in fasta format" format="fasta" />
-		<param name="mapping" type="data" label="Bam Input File" format="bam" />
+		<param name="gff" type="data" label="Reference annotation file (GFF)" format="gff" />
+		<param name="fasta" type="data" label="Reference genome in Fasta format" format="fasta" />
+		<param name="mapping" type="data" label="BAM file" format="bam" />
+		<param name="ext" type="integer" label="Length of 3’ UTR extension downstream of the annotated stop codon (in bp)" value="300" />
 	</inputs>

 	<outputs>
@@ -25,28 +26,28 @@
 	<help>
 Summary
 -------
-This tool uses Ribo-seq data (bam file) to extract potential genes with readthrough events from a reference annotation file (GFF3).
+This tool uses Ribo-seq data (BAM file) to extract genes displaying potential stop codon readthrough events from a reference annotation file (GFF3).

-C-terminal protein extensions were identified as previously described (Dunn J.G. and al, 2013). Only uniquely mapped footprints whose size is in the range 25 to 34 are considered.
-A gene is read-though if :
+C-terminal protein extensions were identified as previously described (Dunn J.G. et al., 2013). Only uniquely mapped footprints 25 to 34 nt long are considered.
+A gene is considered to display readthrough if:

  i) It is covered by more than 128 footprints.

- ii) There are footprints after stop codon.
+ ii) There are footprints after the stop codon.

- iii) There are footprints overlapping the next in frame stop codon.
+ iii) There are footprints overlapping the next in-frame stop codon.

- iv) There is not Methionine in the next five codons downstream the official stop codon of CDS.
+ iv) There is no methionine codon among the five codons downstream of the annotated stop codon of the CDS.

- v) The coverage is homogeneous within the extension.
+ v) Coverage is homogeneous within the extension.

-Stop codon readthrough was estimated by calculating a ratio between footprints in the C-terminal extension and in the CDS. Ribosome density footprints were estimated in RPKM (reads per kilobase per million).
-To control variability due to stop codon peaks, footprints mapping to stop codons are excluded to RPKM computing.
+Stop codon readthrough was estimated by calculating the ratio of footprint density in the C-terminal extension to that in the CDS, with footprint density expressed in RPKM (reads per kilobase per million mapped reads). We controlled for variability due to stop codon peaks by excluding footprints mapping to stop codons from the RPKM calculation.

+
 Output
 -------
-This tool produces html file with plots for each readthrough gene.
-
+This tool produces an html file with plots for each gene displaying readthrough.
+

 Dependances
 ------------
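The selection criteria and the readthrough ratio from the metagene_readthrough.xml help text can be sketched as follows. The sketch relies on several assumptions that the help text does not state.

Each gene is given as a single sequence: the CDS, from the start codon up to and including the annotated stop codon, followed by the 3' extension. Footprints are (5' start, length) pairs in that coordinate system and are already restricted to uniquely mapped reads. A footprint "maps to" a stop codon when it overlaps it. Footprints are assigned to the CDS or to the extension according to their 5' end.

Criterion (v), homogeneous coverage, is omitted because the help text does not define how homogeneity is measured. All names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def rpkm(n_reads, length_nt, total_mapped):
    """Reads per kilobase of region per million mapped reads."""
    return n_reads * 1e9 / (length_nt * total_mapped)


def next_inframe_stop(seq, cds_len):
    """0-based start of the first in-frame stop codon after the annotated one."""
    for pos in range(cds_len, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOPS:
            return pos
    return None


def overlaps(fp, start, end):
    """True if footprint (5' start, length) overlaps the half-open [start, end)."""
    s, length = fp
    return s < end and s + length > start


def readthrough(seq, cds_len, footprints, total_mapped, min_reads=128):
    """Return (criteria dict, extension/CDS RPKM ratio or None)."""
    # Only footprints 25 to 34 nt long are considered.
    fps = [fp for fp in footprints if 25 <= fp[1] <= 34]
    nxt = next_inframe_stop(seq, cds_len)
    downstream = seq[cds_len:cds_len + 15]  # five codons after the stop
    criteria = {
        "i_coverage": len(fps) > min_reads,
        "ii_after_stop": any(s >= cds_len for s, _ in fps),
        "iii_next_stop": nxt is not None
        and any(overlaps(fp, nxt, nxt + 3) for fp in fps),
        "iv_no_met": "ATG" not in [downstream[i:i + 3] for i in range(0, 15, 3)],
    }
    # Footprints touching either stop codon are left out of the RPKM counts.
    ext_end = nxt if nxt is not None else len(seq)
    kept = [
        fp for fp in fps
        if not overlaps(fp, cds_len - 3, cds_len)
        and not (nxt is not None and overlaps(fp, nxt, nxt + 3))
    ]
    n_cds = sum(1 for s, _ in kept if 0 <= s < cds_len - 3)
    n_ext = sum(1 for s, _ in kept if cds_len <= s < ext_end)
    ratio = None
    if n_cds and ext_end > cds_len:
        ratio = rpkm(n_ext, ext_end - cds_len, total_mapped) / rpkm(
            n_cds, cds_len - 3, total_mapped)
    return criteria, ratio
```

Both RPKM values use the same total number of mapped reads, so that term cancels in the ratio. The ratio therefore reduces to footprint density in the extension divided by footprint density in the CDS.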