view map_peptides_to_bed.xml @ 5:8175366a591f draft

planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/map_peptides_to_bed commit bd3b68ee27bc3b50b9970d4fe801cb1a13900f4c
author galaxyp
date Thu, 15 Dec 2016 18:36:42 -0500
parents 88d2c4c2cb0a
children aac354f8d5e8
line wrap: on
line source

<tool id="map_peptides_to_bed" name="Map peptides to a bed file" version="0.1.1">
    <description>for viewing in a genome browser</description>
    <requirements>
        <requirement type="package" version="1.62">biopython</requirement>
    </requirements>
    <stdio>
        <exit_code range="1:" />
    </stdio>
    <command><![CDATA[
        python '$__tool_directory__/map_peptides_to_bed.py'
          --translated_bed='$translated_bed'
          --input='$input'
          #if $peptide_column:
            --peptide_column=$peptide_column 
          #end if
          #if $name_column:
            --name_column=$name_column
          #end if
          #if $start_column:
            --start_column=$start_column
          #end if
          $gffTags
          --bed='$mapped_peptides'
    ]]></command>
    <inputs>
        <param name="translated_bed" type="data" format="bed" label="Translated BED" 
               help="mapping Protein IDs from a Protein Search fasta to a reference genome"/>
        <param name="input" type="data" format="tabular" label="Identified Peptides" 
               help="Such as a  PSM (Peptide Spectral Match) report from a Proteomics Search Application"/>
        <param name="peptide_column" type="data_column" data_ref="input" label="PSM peptide column" optional="true" 
               help="Contains the peptide amino acid sequence. Defaults to first column"/>
        <param name="name_column" type="data_column" data_ref="input" label="PSM protein name column" optional="true" 
               help="The name in this column must match the name column in the Translate BED input. Defaults to second column." />
        <param name="start_column" type="data_column" data_ref="input" label="PSM peptide offset column (optional)" optional="true">
               <help>The offset in AnimoAcids of the peptide from the start of the protein sequence.  
                     If this column is not available, the application will expect the Translated BED file 
                     to have a 13th column with the protein sequence from which to determine the offset 
               </help>
        </param>
        <param name="gffTags" type="boolean" truevalue="--gffTags" falsevalue="" checked="true" label="Use #gffTags in output" 
               help="and use the peptide as the display name for the entry"/>
    </inputs>
    <outputs>
        <data name="mapped_peptides" format="bed" />
    </outputs>
    <tests>
        <test>
            <param name="translated_bed" ftype="bed" value="translated_bed_sequences.bed"/>
            <param name="input" ftype="tabular" value="peptides.tsv"/>
            <param name="peptide_column" value="2"/>
            <param name="name_column" value="1"/>
            <output name="mapped_peptides" file="mapped_peptides.bed"/>
        </test>
    </tests>
    <help><![CDATA[
**Map peptides to a bed file**

This tool is intended to map peptides identified by Mass Spectrometry relative to the protein sequence in the Proteomics Search database.

It generates a BED file that maps the location of peptides within proteins mapped to a reference genome so that they can be displayed in a genome browser.  


The input is a tabular file that has columns containing: peptide,  protein ID, and optionally the offset of the peptide from the start of the protein sequence. 

The other input is a BED file decribing the location of protein sequences relative to a reference genome.  
This file can be produced by the Translate BED Sequences tool when generating the fasta file of translations.  

The output is a BED file with lines from the input BED file that had peptide matches.  
The ID, thichStart, and thickEnd fields are changed to reflect the peptide match to that protein entry.


Inputs:

  - A BED file for the Proteins contained in a Proteomics Search Database. 
    (Should have a 13th column with the protein sequence.) 

  - A tabular file (Pepetide Spectral Matach report) that includes columns that contain the protein identifer and the peptide sequence.  


Output:

  - A BED file with an entry from the input BED file for each mapped input peptide, 
    with the ID, thichStart, and thickEnd fields are changed to reflect the peptide.


Example:

  Input BED file ::

    track name="novel_junction_peptides" type=bedDetail description="test"
    5	90931315	90932985	JUNC00034987_3	1	+	90931315	90932985	255,0,0	2	118,74	0,1596	AWRRGPAASRCSGAVWAEGLCEPRASPSARGAASGAAAGGPAERRDSIFQRLPLSASPFSHLFK
    X	157593859	157598461	JUNC00080388_3	11	-	157593859	157598461	255,0,0	2	14,151	0,4451	SPGLVRMVLCRPRPFLFPFGVSAPGREPLRAPAASACALPRGASVRPSKEIICVF


  Input PSM (Peptide Spectral Match) file ::

    25895	JUNC00034987_3	ASPSARGAASGAAAGGPAER
    15253	JUNC00080388_3	APAASACALPR


  Output BED file::

    track name="novel_junction_peptides" type=bedDetail description="test"
    #gffTags
    5	90931315	90932985	ID=JUNC00034987_3;Name=ASPSARGAASGAAAGGPAER	1	+	90931387	90932925	255,0,0	2	118,74	0,1596	ASPSARGAASGAAAGGPAER	AWRRGPAASRCSGAVWAEGLCEPRASPSARGAASGAAAGGPAERRDSIFQRLPLSASPFSHLFK
    X	157593859	157598461	ID=JUNC00080388_3;Name=APAASACALPR	11	-	157598338	157598371	255,0,0	2	14,151	0,4451	APAASACALPR	SPGLVRMVLCRPRPFLFPFGVSAPGREPLRAPAASACALPRGASVRPSKEIICVF


Usage: map_peptides_to_bed.py [options]

Options:
  -h, --help            show this help message and exit
  -t TRANSLATED_BED, --translated_bed=TRANSLATED_BED
                        A bed file with added 13th column having a translation
  -i INPUT, --input=INPUT
                        Tabular file with peptide_sequence column
  -p PEPTIDE_COLUMN, --peptide_column=PEPTIDE_COLUMN
                        column ordinal with peptide sequence
  -n NAME_COLUMN, --name_column=NAME_COLUMN
                        column ordinal with protein name
  -s START_COLUMN, --start_column=START_COLUMN
                        column with peptide start position in protein
  -B BED, --bed=BED     Output a bed file with added 13th column having
                        translation
  -T, --gffTags         Add #gffTags to bed output for IGV
  -d, --debug           Turn on wrapper debugging to stderr

    ]]></help>
</tool>