# HG changeset patch # User galaxyp # Date 1551718182 18000 # Node ID e8822850243ad803bdfaf55ec900376ce32c8696 # Parent 6caa9011f24544359e15287b1b89cd8a684b95e7 planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/dia_umpire commit 2379480213ba2e084a93bf82052fac858ffd074f diff -r 6caa9011f245 -r e8822850243a datatypes_conf.xml --- a/datatypes_conf.xml Mon Mar 04 11:49:18 2019 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,7 +0,0 @@ - - - - - - - diff -r 6caa9011f245 -r e8822850243a dia_umpire_quant.xml --- a/dia_umpire_quant.xml Mon Mar 04 11:49:18 2019 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,443 +0,0 @@ - - DIA quantitation and targeted re-extraction - - dia_umpire_macros.xml - - - - - $dia_umpire_quant && echo "Thread = \$GALAXY_SLOTS" >> $dia_umpire_quant -&& cp -rp $se_input.extra_files_path.__str__ $work_path.__str__ -&& ln -s $protxml_input ${work_path}/$interact_prot_xml -&& ln -s $searchdb_input ${work_path}/$searchdb_fa -#for $input in $mzxml_inputs: -&& ln -s $input ${work_path}/${input.name} -#end for -#for $input in $pepxml_inputs: -&& ln -s $input ${work_path}/${input.name} -#end for -## Make sure pep.xml and prot.xml start with "interact-" -## && echo "# $quant_params" >> $dia_umpire_quant -&& java -jar \$DIA_UMPIRE_QUANT_JAR $quant_params -&& cp $work_path/ProtSummary*.xls "$ProtSummary" -&& cp $work_path/PeptideSummary*.xls "$PeptideSummary" -&& cp $work_path/FragSummary*.xls "$FragSummary" -&& cp $work_path/IDNoSummary*.xls "$IDNoSummary" -&& cat $work_path/*.log "$logfile" -]]> - - - - - - - - 1 - - - 0 - - test modification with neutral losses - 123.456789 - 0 - 0 - - B - O - - - - 456.789123 - 0 - 0 - - - 789.123456 - 0 - 0 - - - 00 - testMod - - - - - - - - - - - - - - - - - - -InternalLibID: Identifier for the internal spectral library. -If you are processing the dataset for the first time, it will be used as the name for the new library, if you are reprocessing data (e.g. 
using different thresholds/FDR levels, etc.) first a library with that name will be looked up and used if found. -Recommended value: you can use the same name for all analyses; however it is beneficial to provide unique meaningful names, to make the library more easily reusable. - - - - - -Whether to process targeted re-extraction across samples and replicates. - - - - - -Typical values: if you are unsure what that prefix was, check protein names in the FASTA file. "rev_" and "DECOY_" are common choices. - - - - - - - - - - - - - - - - - - - - -ExternalLibSearch: Whether to process targeted extraction across samples and replicates to re-search unidentified peptide ions from the specified external spectral library. Peptide ions in the external library will be re-searched if they satisfy two conditions: (1) unidentified from initial database search, and (2) unidentified or identified but the probability was lower than the specified threshold described below. - - - - - - - - -ExternalLibPath (new parameter in v1.4): File path of external spectral library file. Currently only traML and custom binary (.serFS) formats are supported, and a decoy spectrum for each forward peptide ion sequence is required in the library file. (Effective only when ExternalLibSearch is set as true) - - - - -ExternalLibDecoyTag: Decoy tag of decoy spectra. (default: DECOY) - - - - -ReSearchProb: Probability threshold to determine which peptide ion will be re-searched using external spectral library. (default: 0.5) - - - - - - - - - - - - - - - -PeptideFDR: Target peptide level FDR. -DIA-Umpire estimates peptide level FDR by target-decoy approach according to peptide ion's maximum PeptideProphet probability. (default: 0.01) -Recommended value: 0.01 or 0.05 are the standard thresholds used in proteomics studies, corresponding to 1% and 5% FDR. - - - - -ProteinFDR: Target protein level FDR. 
-DIA-Umpire first removes protein identifications with low protein group probability (<0.5) and estimates protein level FDR of the remaining list by target-decoy approach according to the maximum peptide ion probability. (default: 0.01) -Recommended value: 0.01 or 0.05. - - - - -ProbThreshold: (0.0~0.99) Probability threshold for peptide-centric targeted extraction. This probability is calculated by DIA-Umpire based on LDA analysis of true and decoy targeted identifications. (default: 0.99) -Recommended value: 0.99 corresponds to 99% confidence in an ID, which means the FDR should be less than 1% in that case. - - - - - - - - - - - - - - - - - - -Minimum weight (peptide group weight or peptide weight chosen from the previous option) threshold of peptides to be considered for protein quantitation. Higher weight (closer to 1) of a peptide for a protein is more likely to be a unique peptide for the protein. (default: 0.9) -Recommended value: 0.9 - - - - -Top N fragments in terms of fragment score (Pearson correlation fragment intensity) used for determining peptide ion intensity (default:6). -Recommended value: 3 - 6 - - - - -Top N peptide ions in terms of peptide ion intensity (determined by top fragments) used for determining protein intensity (default:6) -Recommended value: 3~6 - - - - -Minimum frequency of a peptide ion or fragment across all samples/replicates to -be considered for Top N ranking. (default:0.5) Recommended value: 0.5 or more - - - - - - - - - - - - - - - - - - - - denotes the name of the raw file in which a peptide was identified) - - 1. Columns printed in protein summary table (ProtSummary.xls) - - 1. Protein Key: Protein accession number - 2. _Prob: Protein identification probability - 3. _Peptides: Number of identified peptide ions assigned to a protein - 4. _PSMs: Number of identified pseudo MS/MS spectra assigned to a protein - 5. 
_MS1_iBAQ: Protein abundance estimated by MS1 peptide intensities (See manuscript for details) (iBAQ: sum of all identified peptide intensities divided by the number of theoretical tryptic peptides) - 6. _TopNpep/TopNfra, Freq>freq: Protein abundance estimated by top scored peptide ions and fragments (See manuscript for details). - - 2. Columns printed in peptide ion summary table (PeptideSummary.xls) - - 1. Peptide Key: Peptide ion identifier - 2. Sequence: Peptide sequence - 3. ModSeq: Peptide sequence with modification information - 4. Proteins: Parent proteins - 5. mz: Precursor m/z of peptide ion - 6. Charge: Charge state of peptide ion - 7. MaxProb: Maximum identification probability of peptide ion across the whole dataset from untargeted MS/MS database search - 8. _Spec_Centric_Prob: Identification probability of a peptide ion from untargeted MS/MS database search - 9. _Pep_Centric_Prob: Identification probability of a peptide ion from targeted re-extraction matching - 10. _PSMs: The number of identified pseudo MS/MS spectra assigned to a peptide ion - 11. _RT: Retention time of a peptide ion - 12. _MS1: Peptide abundance estimated by MS1 precursor intensity - 13. _TopNfra: Peptide abundance estimated by top N fragment ions - - 3. Columns printed in fragment summary table (FragSummary.xls) - - 1. Fragment Key: Fragment ion identifier - 2. Protein: Parent protein accession number - 3. Peptide: Parent peptide ion identifier - 4. Fragment: Fragment ion type - 5. FragMz: m/z of fragment ion - 6. _RT: Retention time of parent peptide ion - 7. _Spec_Centric_Prob: Identification probability of peptide ion from untargeted MS/MS database search - 8. _Pep_Centric_Prob: Identification probability of peptide ion from targeted re-extraction matching - 9. _Intensity: fragment intensity - 10. _Corr: Elution profile Pearson correlation between fragment ion and precursor peptide ion - 11. 
_PPM: Mass error of an observed fragment m/z to the theoretical one - -]]> - - - diff -r 6caa9011f245 -r e8822850243a dia_umpire_se.xml --- a/dia_umpire_se.xml Mon Mar 04 11:49:18 2019 -0500 +++ b/dia_umpire_se.xml Mon Mar 04 11:49:42 2019 -0500 @@ -7,27 +7,16 @@ $se_ser -#else: -#set se_params = $params mkdir '$output_dir' && cat $se_config > $se_params -#end if -## && echo " " >> $se_params && echo "Thread = \$GALAXY_SLOTS" >> $se_params #if $input_prefix and len($input_prefix.strip()) > 0: #set $input_path = str($output_dir) + '/' + $input_prefix.__str__ + '_rep' + str($i + 1) + '.mzXML' #else: -#set $input_path = str($output_dir) + '/' + $re.sub('\.[mM]\w+$','',$re.sub('[^-a-zA-Z0-9_.]','_',$input.name)) + '.mzXML' +#set $input_path = str($output_dir) + '/' + $re.sub('\.[mM]\w+$','',$re.sub('[^-a-zA-Z0-9_.]','_',$input.element_identifier)) + '.mzXML' #end if && ln -s '${input}' '$input_path' && dia_umpire_se '$input_path' '$se_params' @@ -201,8 +190,6 @@ [a-zA-Z][a-zA-Z0-9_-]* - - @@ -210,7 +197,6 @@ - SE.MS1PPM: (Unit: ppm) Maximum mass error for two MS1 peaks in consecutive spectra to be considered signal of the same ion. Used in MS1 signal detection and precursor alignment between samples/runs. @@ -224,7 +210,6 @@ - @@ -287,7 +272,6 @@ RTOverlap: Retention time overlap. (Default: 0.3) - DeltaApex: (Unit: minute) Maximum retention time difference of LC profile apexes between precursor and fragment (the lower, the more stringent). (Default: 0.6) @@ -313,7 +297,6 @@ - SE.MinMSIntensity: Minimum signal intensity for a peak in an MS1 spectrum to be considered as a valid signal. Any MS1 peak having intensity lower than this threshold will be ignored. It is the main parameter controlling how many peaks and isotopic envelopes will be detected. @@ -378,7 +361,6 @@ - @@ -394,7 +376,6 @@ SE.MinMS2NoPeakCluster (new parameter in v1.4): Minimum number of isotope peaks for a MS2 feature. 
When it is set as 1, the algorithm will group fragments even for peaks without any isotope signal being found. For these cases, the assumed charged states will be from the parameter SE.StartCharge to SE.EndCharge. - @@ -437,8 +418,6 @@ - @@ -446,12 +425,7 @@ - - se_extraction_data - - - not se_extraction_data - + ExportPrecursorPeak @@ -489,7 +463,7 @@ - + @@ -561,9 +535,7 @@ Note: Each file corresponds to a different "quality level" of precursor ions (Q1= More than two isotopic peaks detected in MS1, Q2 = only two isotopic peak detected, Q3 = detected unfragmented precursor in MS2). These spectra are written to separate files, because they must be searched separately against a protein database as a consequence of differences in FDR estimates for these varying quality data. - 2. *DIA_Umpire_SE Signal Extraction data* - includes the binary files (.ser) containing contain all necessary information for quantitation procedures (parameter settings, all detected precursor and fragment peaks, precursor-fragment grouping information). - - 3. If ExportPrecursorPeak and/or ExportFragmentPeak options were set to true, text files with detailed information about detected MS1 and/or MS2 features will be generated. + 2. If ExportPrecursorPeak and/or ExportFragmentPeak options were set to true, text files with detailed information about detected MS1 and/or MS2 features will be generated. 
]]> diff -r 6caa9011f245 -r e8822850243a test-data/LongSwath_UPS1_1ug_rep1_xs_Q2.mgf --- a/test-data/LongSwath_UPS1_1ug_rep1_xs_Q2.mgf Mon Mar 04 11:49:18 2019 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,25 +0,0 @@ -BEGIN IONS -PEPMASS=740.93756 -CHARGE=4+ -RTINSECONDS=23.515736 -TITLE=LongSwath_UPS1_1ug_rep1_xs_Q2.1.1.4 -289.0418 0.13421604 -462.80182 0.34596336 -476.83914 0.076175064 -495.8407 0.28123242 -505.83884 0.40484485 -510.82834 0.26279047 -512.8057 0.08942752 -516.8521 0.09888018 -528.8025 0.17339894 -539.8589 0.034855265 -548.77325 0.2268137 -561.8681 0.36307892 -563.7804 0.02051069 -566.7381 0.3546458 -581.84204 0.34910008 -588.8908 0.33360612 -600.7914 0.04130452 -647.8723 0.42873022 -END IONS -