# HG changeset patch
# User rlegendre
# Date 1421933693 -3600
# Node ID 313b8f7d2a92e2a0ba8f3d9eb14df83e5e453879
# Parent 707807fee542dc5c811e6732bbc8c9ce03f6b3c2
diff -r 707807fee542 -r 313b8f7d2a92 get_codon_frequency.xml
--- a/get_codon_frequency.xml Thu Jan 22 14:34:38 2015 +0100
+++ b/get_codon_frequency.xml Thu Jan 22 14:34:53 2015 +0100
@@ -1,5 +1,5 @@
- Analyse Ribo-seq alignments between two conditions to extract codon occupancy
+ To compare Ribo-seq alignments between two sets of conditions, to determine codon occupancy
 samtools
 matplotlib
@@ -32,7 +32,7 @@
-
+
@@ -40,23 +40,23 @@
 ## Use conditional balise : if rep =yes : 4 files, else 2 files
-
-
-
-
+
+
+
+
-
-
+
+
-
+
-
+
@@ -74,14 +74,12 @@
 Summary
 -------
-This tool uses Ribo-seq (bam file) to compare codon translation between two conditions.
-For each footprint, codons at choosen site are saved and an histogram with all normalized codon numbers is plotted in both conditions.
-A second histogram groups all codons corresponding to an amino acid.
-A chisquare test is used for testing if distribution of used codons is the same in both conditions.
+This tool uses Ribo-seq (BAM file) to determine whether codon occupancy differs between two sets of conditions. For each footprint, the codons at the chosen site are recorded and a histogram displaying all the normalised codon numbers is plotted for both sets of conditions. A second histogram groups together all the codons corresponding to a given amino acid. A chi-squared test is then carried out to determine whether the distribution of the codons used is the same in both sets of conditions.
+
 Output
 -------
-This tool provides an html output with graphical outputs and a statistical test result. An additionnal csv file with codon numbers is provided.
+This tool provides an html file containing graphs and a statistical test result. An additional csv file with codon numbers is provided.
 Dependances
diff -r 707807fee542 -r 313b8f7d2a92 kmer_analysis.xml
--- a/kmer_analysis.xml Thu Jan 22 14:34:38 2015 +0100
+++ b/kmer_analysis.xml Thu Jan 22 14:34:53 2015 +0100
@@ -1,5 +1,5 @@
- Compute proportion of each kmer and phasing
+ To calculate the proportion and phasing of each kmer
 samtools
 numpy
@@ -12,7 +12,7 @@
-
+
@@ -25,11 +25,11 @@
 Summary
 -------
-This tool uses Ribo-seq data (bam file) to compute proportion of each kmer (lenght of footprints) and phasing.
+The kmer tool computes the distribution of footprints length from Bam file and determines the proportion of footprints beginning in each frame, for all annotated genes in the GFF file.
 Output
 -------
-This tool provides an html report with all kmer proportion and phasing.
+This tool provides an html report detailing the proportions and phasing of the kmers.
 Dependances
diff -r 707807fee542 -r 313b8f7d2a92 metagene_frameshift_analysis.xml
--- a/metagene_frameshift_analysis.xml Thu Jan 22 14:34:38 2015 +0100
+++ b/metagene_frameshift_analysis.xml Thu Jan 22 14:34:53 2015 +0100
@@ -1,5 +1,5 @@
- Analyse Ribo-seq alignment to extract translational ambiguities events
+ To analyse Ribo-seq alignments for the extraction of translational ambiguities
 samtools
 matplotlib
@@ -8,16 +8,18 @@
 Bio
- metagene_frameshift_analysis.py --input $reference --bam $mapping --cutoff $cutoff --kmer $kmer --fasta $fasta --dirout $output,$output.files_path --box $boxplot> $log
+ metagene_frameshift_analysis.py --gff $reference --bam $mapping --cutoff $cutoff --kmer $kmer --fasta $fasta --dirout $output,$output.files_path --box $boxplot --orf_length $orf --frame $frame > $log
-
-
-
-
+
+
+
+
+
+
@@ -30,35 +32,35 @@
 Summary
 -------
-This tool uses Ribo-seq data (bam file) to extract out-of-frame footprints in all genes from a reference annotation file (GFF3). Subprofile are plotted for each gene with dual coding events.
+This tool uses Ribo-seq data (BAM file) to extract out-of-frame footprints in all genes from a reference annotation file (GFF3). Subprofiles are plotted for each gene with dual coding events.
-*- GFF3 file* : It must contain 9 tabulate-delimited columns : Chromosome, source, feature, start, stop, score, strand, phasing, note. The gene ID was retrieved in note field by "ID=" tag.
+*- GFF3 file*: This file must have nine tabulated-delimited columns: Chromosome, source, feature, start, stop, score, strand, phasing, note. The gene ID is retrieved from the note field, using the "ID=" tag.
-*- Fasta file* : Reference fasta file. Be careful about the chromosome nomenclature used : it must be compatible with your GFF3 annotation file.
+*- Fasta file*: Reference fasta file. Care should be taken with the chromosome nomenclature used, which must be compatible with the GFF3 annotation file.
-*- BAM file* : It must be sorted. It can contain either multiples or unaligned footprints
+*- BAM file*: This file should be sorted and may contain either multiple or unaligned footprints
-*- Kmer* : Lenght of the best phasing footprint. You can compute it running kmer_analysis
-
-*- Cutoff* : Integer value for selecting all genes that have less than 60 % (default) of footprints in coding frame.
+*- Kmer*: Length of the best-phased footprints. It can be calculated by running kmer_analysis
-
+*- Frame*: Frame for which the phasing of the footprints is best. It can be calculated by running kmer_analysis
+
+*- Cutoff*: An integer value for selecting all genes for which fewer than 60 % (default) of the footprints are in the coding frame.
-.................................................................................................................................................................................................
+*- Orf*: Approximate size of the segment.
 Output
 -------
-This tool generates 2 output files :
+This tool generates 3 output files:
-*- html file* : relative to translational ambiguities detection and visualization.
+*- html file*: for the detection and visualisation of translational ambiguities.
-*- Stat file* : statistiques about treated footprints and phasing.
+*- Stat file*: this file provides statistics for the treated footprints and phasing.
-*- Boxplot* : Proportion of footprints in the three frames for all genes.
-
+*- Boxplot*: Proportion of footprints in the three frames, for all genes.
+
 Dependances
 ------------
diff -r 707807fee542 -r 313b8f7d2a92 metagene_readthrough.xml
--- a/metagene_readthrough.xml Thu Jan 22 14:34:38 2015 +0100
+++ b/metagene_readthrough.xml Thu Jan 22 14:34:53 2015 +0100
@@ -1,5 +1,5 @@
- Analyse Ribo-seq alignment to detect readthrough events
+ To analyse Ribo-seq alignments for the detection of stop codon readthrough events
 samtools
 HTseq
@@ -8,14 +8,15 @@
 Bio
- metagene_readthrough.py --gff $gff --fasta $fasta --bam $mapping --dirout=$output,$output.files_path
+ metagene_readthrough.py --gff $gff --fasta $fasta --bam $mapping --dirout=$output,$output.files_path --extend $ext
-
-
-
+
+
+
+
@@ -25,28 +26,28 @@
 Summary
 -------
-This tool uses Ribo-seq data (bam file) to extract potential genes with readthrough events from a reference annotation file (GFF3).
+This tool uses Ribo-seq data (BAM file) to extract genes displaying potential stop codon readthrough events from a reference annotation file (GFF3).
-C-terminal protein extensions were identified as previously described (Dunn J.G. and al, 2013). Only uniquely mapped footprints whose size is in the range 25 to 34 are considered.
-A gene is read-though if :
+C-terminal protein extensions were identified as previously described (Dunn J.G. et al, 2013). Only uniquely mapped footprints with a size in the 25 to 34 range are considered.
+A gene is considered to display readthrough if:
 i) It is covered by more than 128 footprints.
- ii) There are footprints after stop codon.
+ ii) There are footprints after the stop codon.
- iii) There are footprints overlapping the next in frame stop codon.
+ iii) There are footprints overlapping the next in-frame stop codon.
- iv) There is not Methionine in the next five codons downstream the official stop codon of CDS.
+ iv) There is no methionine codon in the next five codons downstream from the official stop codon of the CDS.
- v) The coverage is homogeneous within the extension.
+ v) Coverage is homogeneous within the extension.
-Stop codon readthrough was estimated by calculating a ratio between footprints in the C-terminal extension and in the CDS. Ribosome density footprints were estimated in RPKM (reads per kilobase per million).
-To control variability due to stop codon peaks, footprints mapping to stop codons are excluded to RPKM computing.
+Stop codon readthrough was estimated by calculating the ratio of the number of footprints in the C-terminal extension to that in the CDS. Ribosome density footprints were estimated in RPKM (reads per kilobase per million). We controlled for variability due to stop codon peaks, by excluding footprints mapping to stop codons from the calculation of RPKM.
+
 Output
 -------
-This tool produces html file with plots for each readthrough gene.
-
+This tool produces an html file with plots for each gene displaying readthrough.
+
 Dependances
 ------------
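The get_codon_frequency help text in this patch describes normalising codon counts per condition, grouping codons by amino acid, and running a chi-squared test on the two distributions. The following is a minimal sketch of those statistics, not the code of get_codon_frequency. It assumes that codon counts have already been extracted from the BAM file into one `collections.Counter` per condition, that "normalised" means each codon's share of its condition's total, and that the test is Pearson's chi-squared test of homogeneity on a 2 x k table of raw counts. All helper names are hypothetical.

```python
"""Sketch of codon-usage statistics for two conditions (hypothetical helpers)."""
from collections import Counter


def normalise(counts: Counter) -> dict:
    """Return each codon's share of all codons counted in one condition."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def group_by_amino_acid(counts: Counter, codon_table: dict) -> Counter:
    """Sum codon counts per amino acid; codons missing from the table are skipped."""
    grouped = Counter()
    for codon, n in counts.items():
        if codon in codon_table:
            grouped[codon_table[codon]] += n
    return grouped


def chi_squared_homogeneity(a: Counter, b: Counter) -> tuple:
    """Pearson chi-squared statistic and degrees of freedom for a 2 x k table.

    Both conditions must contain at least one codon. Codons seen in neither
    condition are dropped, because their expected counts would be zero.
    """
    codons = sorted(c for c in set(a) | set(b) if a[c] + b[c] > 0)
    total_a = sum(a[c] for c in codons)
    total_b = sum(b[c] for c in codons)
    grand = total_a + total_b
    stat = 0.0
    for c in codons:
        column = a[c] + b[c]
        for observed, row_total in ((a[c], total_a), (b[c], total_b)):
            expected = row_total * column / grand
            stat += (observed - expected) ** 2 / expected
    dof = len(codons) - 1  # (2 rows - 1) * (k columns - 1)
    return stat, dof


if __name__ == "__main__":
    cond1 = Counter({"GCT": 30, "GCC": 10})
    cond2 = Counter({"GCT": 10, "GCC": 30})
    print(normalise(cond1))
    print(group_by_amino_acid(cond1, {"GCT": "A", "GCC": "A"}))
    print(chi_squared_homogeneity(cond1, cond2))
```

To obtain a p-value, the same 2 x k table can be passed to `scipy.stats.chi2_contingency`. With `correction=False`, that function returns the same uncorrected statistic. Its default applies Yates' correction when there is one degree of freedom.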
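The kmer_analysis help text in this patch describes two results: the distribution of footprint lengths, and the proportion of footprints beginning in each frame. The following is a minimal sketch of both. It assumes that each plus-strand footprint has already been reduced to a hypothetical `(length, five_prime, cds_start)` tuple, and that the frame is the offset of the footprint's 5' end from the CDS start, modulo 3. The `best_phased` helper is an additional assumption about how the Kmer and Frame values consumed by metagene_frameshift_analysis might be chosen. The help text does not state the tool's own rule.

```python
"""Sketch of footprint-length distribution and frame phasing (hypothetical helpers)."""
from collections import Counter, defaultdict


def kmer_phasing(footprints):
    """Count footprints per length and the fraction starting in frames 0, 1 and 2.

    `footprints` yields (length, five_prime, cds_start) tuples for plus-strand
    reads; the frame is the distance from the CDS start to the 5' end, modulo 3.
    """
    lengths = Counter()
    frames = defaultdict(Counter)
    for length, five_prime, cds_start in footprints:
        lengths[length] += 1
        frames[length][(five_prime - cds_start) % 3] += 1
    phasing = {
        k: [frames[k][f] / n for f in range(3)] for k, n in sorted(lengths.items())
    }
    return lengths, phasing


def length_proportions(lengths):
    """Share of all footprints contributed by each length (kmer)."""
    total = sum(lengths.values())
    return {k: n / total for k, n in sorted(lengths.items())}


def best_phased(lengths, phasing, min_reads=1):
    """Pick the length whose reads are most concentrated in a single frame.

    Lengths with fewer than `min_reads` footprints are ignored, so that a
    handful of reads cannot look perfectly phased by chance.
    """
    candidates = [k for k in phasing if lengths[k] >= min_reads]
    kmer = max(candidates, key=lambda k: max(phasing[k]))
    frame = phasing[kmer].index(max(phasing[kmer]))
    return kmer, frame


if __name__ == "__main__":
    reads = [(28, 100, 100), (28, 103, 100), (28, 101, 100), (30, 102, 100)]
    lengths, phasing = kmer_phasing(reads)
    print(length_proportions(lengths))
    print(phasing)
    print(best_phased(lengths, phasing, min_reads=2))
```

Without the `min_reads` threshold, the single 30-nt read in the usage example would be selected, because one read is trivially 100 % in one frame. This is why the sketch exposes the threshold as a parameter.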
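The metagene_frameshift_analysis help text in this patch selects genes for which fewer than 60 % (the default cutoff) of the footprints are in the coding frame. It also plots per-gene frame proportions as a boxplot. The following is a sketch of that selection, not the tool's implementation. It assumes that per-gene footprint counts for frames 0, 1 and 2 are already available in a hypothetical dictionary, and that the coding frame is the Frame value reported by kmer_analysis.

```python
"""Sketch of the in-frame cutoff used to flag candidate genes (hypothetical helpers)."""


def frame_proportions(counts):
    """Fraction of a gene's footprints in each of the three frames (boxplot input)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0, 0.0, 0.0]


def out_of_frame_genes(frame_counts, coding_frame=0, cutoff=60):
    """Return {gene: percent of footprints in coding frame} for genes below `cutoff`.

    `cutoff` is an integer percentage, mirroring the tool's Cutoff parameter.
    Genes without any footprints are skipped rather than flagged.
    """
    selected = {}
    for gene, counts in frame_counts.items():
        total = sum(counts)
        if total == 0:
            continue
        in_frame_pct = 100 * counts[coding_frame] / total
        if in_frame_pct < cutoff:
            selected[gene] = in_frame_pct
    return selected


if __name__ == "__main__":
    counts = {"geneA": [90, 5, 5], "geneB": [40, 50, 10], "geneC": [0, 0, 0]}
    print(out_of_frame_genes(counts))
```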
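The metagene_readthrough help text in this patch estimates readthrough as the ratio of RPKM in the C-terminal extension to RPKM in the CDS. Footprints mapping to stop codons are excluded from that calculation. Criterion iv) additionally rejects genes with an ATG among the first five codons after the annotated stop. The following is a minimal sketch of those two calculations. It assumes plus-strand, half-open `(start, end)` intervals, a CDS interval that excludes the stop codon, and that "mapping to a stop codon" means overlapping it. The function names and interval layout are hypothetical. Because both RPKM values use the same total mapped read count, that term cancels in the ratio.

```python
"""Sketch of the readthrough ratio and the downstream-ATG check (hypothetical helpers)."""


def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def count_in(footprints, region, exclude):
    """Count footprints overlapping `region` but not `exclude` (the stop codon)."""
    return sum(1 for fp in footprints if overlaps(fp, region) and not overlaps(fp, exclude))


def rpkm(reads, length_bp, total_mapped):
    """Reads per kilobase of region per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped)


def readthrough_ratio(footprints, cds, stop, extension, total_mapped):
    """RPKM in the C-terminal extension divided by RPKM in the CDS.

    Raises ZeroDivisionError if no footprint is counted in the CDS.
    """
    cds_reads = count_in(footprints, cds, stop)
    ext_reads = count_in(footprints, extension, stop)
    cds_rpkm = rpkm(cds_reads, cds[1] - cds[0], total_mapped)
    ext_rpkm = rpkm(ext_reads, extension[1] - extension[0], total_mapped)
    return ext_rpkm / cds_rpkm


def no_methionine_downstream(downstream, n_codons=5):
    """True if none of the first `n_codons` codons after the stop codon is ATG."""
    codons = [downstream[i:i + 3].upper() for i in range(0, 3 * n_codons, 3)]
    return "ATG" not in codons


if __name__ == "__main__":
    reads = [(0, 30), (40, 70), (85, 115), (110, 140)]
    cds, stop, ext = (0, 99), (99, 102), (102, 150)
    print(readthrough_ratio(reads, cds, stop, ext, total_mapped=1_000_000))
    print(no_methionine_downstream("GCTGCAGCTATGGCT"))
```

In the usage example, the footprint at (85, 115) spans the stop codon and is therefore counted in neither region. The ratio reduces to (1/48) / (2/99).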