# HG changeset patch
# User peterjc
# Date 1307482804 14400
# Node ID 1426b2bae76df0b26e74cecb25a033db5e30c9be
# Parent  fe10f448d64131ff332c361156baa2b26d8f1adf
Migrated tool version 0.0.7 from old tool shed archive to new tool shed repository

diff -r fe10f448d641 -r 1426b2bae76d tools/protein_analysis/README
--- a/tools/protein_analysis/README	Tue Jun 07 17:39:26 2011 -0400
+++ b/tools/protein_analysis/README	Tue Jun 07 17:40:04 2011 -0400
@@ -73,6 +73,9 @@
 v0.0.4 - Ignore comment lines in tmhmm2 output.
 v0.0.5 - Explicitly request tmhmm short output (may not be the default)
 v0.0.6 - Improvement to how sub-jobs are run (should be faster)
+v0.0.7 - Change SignalP default truncation from 60 to 70 to match the
+         SignalP webservice.
+
 
 Developers
 ==========
diff -r fe10f448d641 -r 1426b2bae76d tools/protein_analysis/seq_analysis_utils.py
diff -r fe10f448d641 -r 1426b2bae76d tools/protein_analysis/signalp3.xml
--- a/tools/protein_analysis/signalp3.xml	Tue Jun 07 17:39:26 2011 -0400
+++ b/tools/protein_analysis/signalp3.xml	Tue Jun 07 17:40:04 2011 -0400
@@ -1,4 +1,4 @@
-
+
 Find signal peptides in protein sequences
 signalp3.py $organism $truncate 8 $fasta_file $tabular_file
@@ -11,7 +11,7 @@
-
+
@@ -46,6 +46,12 @@
+
+
+
+
+
+
@@ -67,9 +73,9 @@
 The NN output comprises three different scores (C-max, S-max and Y-max) and two scores derived from them (S-mean and D-score).
 
-The C-score is the 'cleavage site' score. For each position in the submitted sequence, a C-score is reported, which should only be significantly high at the cleavage site. Confusion is often seen with the position numbering of the cleavage site. When a cleavage site position is referred to by a single number, the number indicates the first residue in the mature protein, meaning that a reported cleavage site between amino acid 26-27 corresponds to that the mature protein starts at (and include) position 27.
+The C-score is the 'cleavage site' score. For each position in the submitted sequence, a C-score is reported, which should only be significantly high at the cleavage site. Confusion is often seen with the position numbering of the cleavage site. When a cleavage site position is referred to by a single number, the number indicates the first residue in the mature protein, meaning that a predicted cleavage site between amino acid 26-27 is reported as 27, corresponding to the mature protein starting at (and including) position 27.
 
-The S-score for the signal peptide prediction is calculateded for every single amino acid position in the submitted sequence (not shown in the output via Galaxy), with high scores indicating that the corresponding amino acid is part of a signal peptide, and low scores indicating that the amino acid is part of a mature protein.
+The S-score for the signal peptide prediction is calculated for every single amino acid position in the submitted sequence (not shown in the output via Galaxy), with high scores indicating that the corresponding amino acid is part of a signal peptide, and low scores indicating that the amino acid is part of a mature protein.
 
 Y-max is a derivative of the C-score combined with the S-score resulting in a better cleavage site prediction than the raw C-score alone. This is due to the fact that multiple high-peaking C-scores can be found in one sequence, where only one is the true cleavage site. The cleavage site is assigned from the Y-score where the slope of the S-score is steep and a significant C-score is found.
diff -r fe10f448d641 -r 1426b2bae76d tools/protein_analysis/suite_config.xml
--- a/tools/protein_analysis/suite_config.xml	Tue Jun 07 17:39:26 2011 -0400
+++ b/tools/protein_analysis/suite_config.xml	Tue Jun 07 17:40:04 2011 -0400
@@ -1,9 +1,9 @@
-
+
 Wrappers for TMHMM and SignalP
 Find transmembrane domains in protein sequences
-
+
 Find signal peptides in protein sequences