view metagene_readthrough.xml @ 8:adc01e560eae

Uploaded
author rlegendre
date Mon, 20 Oct 2014 11:34:11 -0400
parents 015db5db052c
children 313b8f7d2a92

<tool id="readthrough_analysis" name="Stop_supp">
	<description>Analyse Ribo-seq alignments to detect readthrough events</description>
	<requirements>
	    <requirement type="package">samtools</requirement>
	    <requirement type="python-module">HTseq</requirement>
	    <requirement type="python-module">pysam</requirement>
	    <requirement type="python-module">csv</requirement>
	    <requirement type="python-module">Bio</requirement>
	</requirements>
	<command interpreter="python"> 
		metagene_readthrough.py --gff $gff --fasta $fasta --bam $mapping --dirout=$output,$output.files_path

	</command>

	<inputs>
		<param name="gff" type="data" label="Reference annotation file (GFF)" format="gff" />
		<param name="fasta" type="data" label="Reference sequence in FASTA format" format="fasta" />
		<param name="mapping" type="data" label="BAM input file" format="bam" />
	</inputs>
            
	<outputs>
		<data format="html"  name="output" label="[RP]Readthrough results on  ${on_string}"/>
	</outputs>
        
	<help>
Summary
-------          
This tool uses Ribo-seq data (BAM file) to identify genes with potential readthrough events, based on a reference annotation file (GFF3).

C-terminal protein extensions are identified as previously described (Dunn J.G. et al., 2013). Only uniquely mapped footprints 25 to 34 nucleotides long are considered.
A gene is considered to show readthrough if:

 i) It is covered by more than 128 footprints.

 ii) There are footprints downstream of the stop codon.

 iii) There are footprints overlapping the next in-frame stop codon.

 iv) There is no methionine codon in the five codons downstream of the annotated stop codon of the CDS.

 v) The coverage is homogeneous within the extension.
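
The five criteria above can be sketched in Python as follows. This is only an illustration and not the code of metagene_readthrough.py: the function names, the inputs and the ``min_covered_fraction`` threshold are hypothetical. In particular, the test used here for criterion v (at least half of the extension positions carry a footprint) is an assumption, since the exact homogeneity test is not described in this help.

```python
STOP_CODONS = frozenset(["TAA", "TAG", "TGA"])


def next_inframe_stop(downstream_seq):
    """Return the offset of the first in-frame stop codon, or None if absent."""
    for i in range(0, len(downstream_seq) - 2, 3):
        if downstream_seq[i:i + 3].upper() in STOP_CODONS:
            return i
    return None


def is_readthrough_candidate(cds_footprints, downstream_seq, downstream_coverage,
                             min_footprints=128, min_covered_fraction=0.5):
    """Apply criteria i to v to one gene.

    cds_footprints      -- number of footprints covering the gene
    downstream_seq      -- nucleotides right after the annotated stop codon,
                           on the coding strand
    downstream_coverage -- per-nucleotide footprint counts for the same positions
    min_covered_fraction is an ASSUMED stand-in for the homogeneity test.
    """
    # i) the gene must be covered by more than 128 footprints
    if not cds_footprints > min_footprints:
        return False
    stop_offset = next_inframe_stop(downstream_seq)
    if stop_offset is None:
        return False
    extension_cov = downstream_coverage[:stop_offset]
    next_stop_cov = downstream_coverage[stop_offset:stop_offset + 3]
    # ii) footprints downstream of the annotated stop codon
    if sum(extension_cov) == 0:
        return False
    # iii) footprints overlapping the next in-frame stop codon
    if sum(next_stop_cov) == 0:
        return False
    # iv) no methionine codon (ATG) in the first five downstream codons
    first_codons = [downstream_seq[i:i + 3].upper() for i in range(0, 15, 3)]
    if "ATG" in first_codons:
        return False
    # v) ASSUMPTION: "homogeneous" means enough extension positions are covered
    covered = sum(1 for count in extension_cov if count > 0)
    return covered / float(len(extension_cov)) >= min_covered_fraction
```

A gene whose downstream sequence contains an ATG within the first five codons, or whose extension carries no footprints, is rejected by this sketch whatever its coverage in the CDS.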

Stop codon readthrough is estimated as the ratio between footprint densities in the C-terminal extension and in the CDS. Footprint densities are expressed in RPKM (reads per kilobase per million mapped reads).
To limit variability due to footprint peaks at stop codons, footprints mapping to stop codons are excluded from the RPKM computation.
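
As a rough illustration of this computation, the sketch below uses the standard RPKM definition and takes the readthrough ratio as extension density divided by CDS density. The function names are hypothetical, the direction of the ratio is an assumption, and the caller is assumed to pass read counts from which footprints mapping to stop codons have already been removed.

```python
def rpkm(read_count, length_nt, total_mapped_reads):
    """Reads per kilobase per million mapped reads (standard definition)."""
    return read_count * 1e9 / (float(length_nt) * total_mapped_reads)


def readthrough_ratio(cds_reads, cds_length, ext_reads, ext_length,
                      total_mapped_reads):
    """ASSUMED direction: RPKM of the extension divided by RPKM of the CDS.

    Read counts are expected to exclude footprints mapping to stop codons.
    """
    cds_density = rpkm(cds_reads, cds_length, total_mapped_reads)
    ext_density = rpkm(ext_reads, ext_length, total_mapped_reads)
    return ext_density / cds_density
```

Because both densities are normalised by the same library size, the total number of mapped reads cancels out of the ratio; it only matters when the RPKM values themselves are reported.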

Output 
------- 
This tool produces an HTML file with plots for each readthrough gene.


Dependencies
------------

.. class:: warningmark

This tool depends on Python (>=2.7) and the following packages: numpy 1.8.0, Biopython 1.58, matplotlib 1.3.1, HTSeq and pysam. Samtools is used for BAM manipulation.

	</help>
</tool>