comparison mqppep_preproc.xml @ 1:2ccb9727516b draft

planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/mqppep commit 43e7a43b545c24b2dc33d039198551c032aa79be
author galaxyp
date Fri, 28 Oct 2022 18:24:43 +0000
parents ba62d93a9ef5
children 37203e501ea6
0:ba62d93a9ef5 1:2ccb9727516b
286 </assert_contents> 286 </assert_contents>
287 </output> 287 </output>
288 </test> 288 </test>
289 </tests> 289 </tests>
290 <help><![CDATA[ 290 <help><![CDATA[
291 ========================================================= 291 =============================================================
292 Phosphoproteomic Enrichment Pipeline Preprocessing Steps 292 **Phosphoproteomic Enrichment Pipeline Preprocessing Steps**
293 ========================================================= 293 =============================================================
294 294
295 **Overview** 295 *Overview*
296 ==========
296 297
297 Prior to statistical analysis, it is necessary to perform 298 Prior to statistical analysis, it is necessary to perform
298 three steps to transform the MaxQuant output 299 three steps to transform the MaxQuant output
299 for phosphoproteome-enriched samples. 300 for phosphoproteome-enriched samples.
300 301
301 **Workflow position** 302 *Workflow position*
302 303 ===================
303 ``upstream tool`` 304
304 The input data file for this tool is the ``Phospho (STY)Sites.txt`` file that is produced: 305 Upstream tool
305 306 =============
306 - by the Galaxy "MaxQuant" (``maxquant``) tool 307
307 - or by the Galaxy "Maxquant (using mqpar.xml)" (``maxquant_mqpar``) tool 308 The input dataset for this tool is the ``Phospho (STY)Sites.txt`` file that is produced:
308 - or by the desktop version of MaxQuant. 309
309 310 - by the Galaxy "MaxQuant" (``maxquant``) tool
310 ``downstream tool`` 311 - or by the Galaxy "Maxquant (using mqpar.xml)" (``maxquant_mqpar``) tool
311 The "MaxQuant Phosphopeptide ANOVA" tool (``mqppep_anova``) consumes the ``merged/filtered`` output file ``preproc_tab`` that this tool produces. 312 - or by the desktop version of MaxQuant.
312 313
313 ====================================================================== 314 Downstream tool
314 Phosphoproteomic Enrichment Pipeline Localization-Probability Cut-Off 315 ===============
315 ====================================================================== 316
317 The "MaxQuant Phosphopeptide ANOVA" tool (``mqppep_anova``) consumes the "preprocessed" output file ``preproc_tab`` that this tool produces.
318
319 *Phosphoproteomic Enrichment Pipeline Localization-Probability Cut-Off*
320 ========================================================================
316 321
317 This step applies a "localization-probability cut-off" to each phosphopeptide. 322 This step applies a "localization-probability cut-off" to each phosphopeptide.
318 Higher values may reduce the number of peptides in the output. 323 Higher values may reduce the number of peptides in the output.
319 The default value of 0.75 reflects the text of [Cheng 2018]: 324 The default value of 0.75 reflects the text of [Cheng 2018]:
320 325
334 [Bielow 2016] (available at `https://github.com/cbielow/PTXQC/ 339 [Bielow 2016] (available at `https://github.com/cbielow/PTXQC/
335 <https://github.com/cbielow/PTXQC/>`_) is run by the Galaxy wrappers for MaxQuant, 340 <https://github.com/cbielow/PTXQC/>`_) is run by the Galaxy wrappers for MaxQuant,
336 so it is omitted here even though it was included in Larry Cheng's original script. 341 so it is omitted here even though it was included in Larry Cheng's original script.
337 342
338 343
339 **Input dataset** 344 Input dataset
340 345 =============
341 ``phosphoSites`` 346
342 This is the ``MaxQuant Phospho (STY)Sites.txt`` file produced by MaxQuant. 347 Phospho (STY)Sites.txt
343 If you use the desktop version of MaxQuant, you will find this file in the ``txt`` folder. 348 This is the ``MaxQuant Phospho (STY)Sites.txt`` file produced by MaxQuant.
344 349 If you use the desktop version of MaxQuant, you will find this file in the ``txt`` folder.
345 **Output datasets** 350
351 Input parameters
352 ================
353
354 Localization probability cutoff
355 Minimum localization probability; see above.
356
357 Intensity merge-function
358 Specifies how intensities for identical phosphosites should be merged; see above.
359
360 Output datasets
361 ===============
346 362
347 ``ppep_intensities`` 363 ``ppep_intensities``
348 Data table (in tabular format) presenting, for each sample, the mass-spectral intensity of each phosphopeptide having localization probability greater than the cutoff. 364 Data table (in tabular format) presenting, for each sample, the mass-spectral intensity of each phosphopeptide having localization probability greater than the cutoff.
365
349 ``enrichment.pdf`` 366 ``enrichment.pdf``
350 Graph (in PDF format) presenting non-zero proportions of pS, pT, and pY among the phosphosites; note that a phosphopeptide may have multiple phosphosites. 367 Graph (in PDF format) presenting non-zero proportions of pS, pT, and pY among the phosphosites; note that a phosphopeptide may have multiple phosphosites.
368
351 ``locProbCutoff.pdf`` 369 ``locProbCutoff.pdf``
352 Graph (in PDF format) contrasting proportion of phosphopeptides above the localization probability cutoff with the proportion below. 370 Graph (in PDF format) contrasting proportion of phosphopeptides above the localization probability cutoff with the proportion below.
371
353 ``enrichment.svg`` 372 ``enrichment.svg``
354 Enrichment graph (in downloadable "scalable vector graphics" format) for incorporation into documents. 373 Enrichment graph (in downloadable "scalable vector graphics" format) for incorporation into documents.
374
355 ``locProbCutoff.svg`` 375 ``locProbCutoff.svg``
356 Localization probability cutoff graph (in downloadable "scalable vector graphics" format) for incorporation into documents. 376 Localization probability cutoff graph (in downloadable "scalable vector graphics" format) for incorporation into documents.
377
357 ``filteredData`` 378 ``filteredData``
358 Data table (in tabular format) comprising rows of the ``phosphoSites`` input file that are not flagged as contaminants or reversed sequences. 379 Data table (in tabular format) comprising rows of the ``phosphoSites`` input file that are not flagged as contaminants or reversed sequences.
380
359 ``quantData`` 381 ``quantData``
360 Data table (in tabular format) comprising rows of the ``filteredData`` file whose localization probability exceeds the **Localization Probability Cutoff** parameter. 382 Data table (in tabular format) comprising rows of the ``filteredData`` file whose localization probability exceeds the **Localization Probability Cutoff** parameter.
361 383
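The relationship between ``filteredData`` and ``quantData`` described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the column names ``Localization prob``, ``Reverse``, and ``Potential contaminant``, the ``+`` flag value, and the helper name ``filter_sites`` are assumptions that should be checked against the header of your own ``Phospho (STY)Sites.txt`` file.

```python
import csv
import io

# Assumed MaxQuant column names; verify against your own file's header.
LOC_PROB_COL = "Localization prob"
FLAG_COLS = ("Reverse", "Potential contaminant")


def filter_sites(rows, cutoff=0.75):
    """Return (filtered, quant) row lists in the sense described above."""
    # Drop rows flagged ("+", an assumed marker) as reversed or contaminant.
    filtered = [
        row for row in rows
        if not any((row.get(col) or "").strip() == "+" for col in FLAG_COLS)
    ]
    # Keep rows whose localization probability exceeds the cutoff.
    quant = [row for row in filtered if float(row[LOC_PROB_COL]) > cutoff]
    return filtered, quant


# Tiny made-up table standing in for Phospho (STY)Sites.txt.
example = (
    "id\tLocalization prob\tReverse\tPotential contaminant\n"
    "1\t0.99\t\t\n"
    "2\t0.50\t\t\n"
    "3\t0.95\t+\t\n"
    "4\t0.80\t\t+\n"
)
rows = list(csv.DictReader(io.StringIO(example), delimiter="\t"))
filtered, quant = filter_sites(rows)
print([r["id"] for r in filtered], [r["id"] for r in quant])
```

Raising ``cutoff`` can only shrink the second list, which is why higher values may reduce the number of peptides in the output.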
362 **Authors** 384 Authors
385 =======
363 386
364 ``Nicholas A. Graham`` 387 ``Nicholas A. Graham``
365 (`ORCiD 0000-0002-6811-1941 <https://orcid.org/0000-0002-6811-1941>`_) initiated the original script. 388 (`ORCiD 0000-0002-6811-1941 <https://orcid.org/0000-0002-6811-1941>`_) initiated the original script.
366 389
367 ``Larry C. Cheng`` 390 ``Larry C. Cheng``
372 395
373 ``James E. Johnson`` 396 ``James E. Johnson``
374 (University of Minnesota Supercomputing Institute) adapted the script to run in Galaxy. 397 (University of Minnesota Supercomputing Institute) adapted the script to run in Galaxy.
375 398
376 399
377 ============================================================= 400 *Phosphoproteomic Enrichment Pipeline Upstream Kinase Mapping*
378 Phosphoproteomic Enrichment Pipeline Upstream Kinase Mapping 401 ===============================================================
379 =============================================================
380 402
381 This step searches phosphopeptides against several databases for known or predicted sites. 403 This step searches phosphopeptides against several databases for known or predicted sites.
382 404
383 **Input databases** 405 Input databases
406 ===============
384 407
385 ``networkin`` 408 ``networkin``
386 This table is the result of filtering the NetworkKIN database [Linding 2007; Horn 2014] for cutoff score > 2.0. The ENSEMBL data used to generate the file were from Ensembl, `ensembl.org <https://web.archive.org/web/20220308011159/http://useast.ensembl.org/index.html>`_ [Howe 2021]. 409 This table is the result of filtering the NetworkKIN database [Linding 2007; Horn 2014] for cutoff score > 2.0. The ENSEMBL data used to generate the file were from Ensembl, `ensembl.org <https://web.archive.org/web/20220308011159/http://useast.ensembl.org/index.html>`_ [Howe 2021].
387 410
388 *To generate this file:* 411 To generate this file:
389 412
390 **(1)** Download the "precomputed data for all available kinase predictors against ENSEMBL" 413 (1) Download the "precomputed data for all available kinase predictors against ENSEMBL" (available at the NetworkKIN predictions link on the downloads page at https://web.archive.org/web/20200208000403/http://networkin.info/download/networkin_human_predictions_3.1.tsv.xz; N.B.: "Commercial users are requested to contact the authors before using the data on the networkin.info website");
391 (Available at the NetworkKIN predictions link on the downloads page at https://web.archive.org/web/20200208000403/http://networkin.info/download/networkin_human_predictions_3.1.tsv.xz; N.B.: "Commercial users are requested to contact the authors before using the data on the networkin.info website"); 414 (2) Decompress the .tsv.xz file with "unxz" (from XZ Utils `https://tukaani.org/xz/ <https://tukaani.org/xz/>`_);
392 415 (3) Filter out the rows having "network_kin" less than 2.0.
393 **(2)** Decompress the .tsv.xz file with "unxz" (from XZ Utils `https://tukaani.org/xz/ <https://tukaani.org/xz/>`_); 416
394 417 The result should be a tab-separated file with the following columns:
395 **(3)** Filter out the rows having "network_kin" less than 2.0. 418
396 419 - ``#substrate``
397 The result should be a tab-separated file with the following columns: 420 - ``position``
398 421 - ``id``
399 1. ``#substrate`` 422 - ``networkin_score``
400 2. ``position`` 423 - ``tree``
401 3. ``id`` 424 - ``netphorest_group``
402 4. ``networkin_score`` 425 - ``netphorest_score``
403 5. ``tree`` 426 - ``string_identifier``
404 6. ``netphorest_group`` 427 - ``string_score``
405 7. ``netphorest_score`` 428 - ``substrate_name``
406 8. ``string_identifier`` 429 - ``sequence``
407 9. ``string_score`` 430 - ``string_path``
408 10. ``substrate_name``
409 11. ``sequence``
410 12. ``string_path``
411 431
412 432
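Step (3) of the ``networkin`` recipe above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure the authors used: it assumes that the score to test is the ``networkin_score`` column from the column list, that rows scoring exactly 2.0 are kept, and the helper name ``filter_networkin`` is hypothetical.

```python
def filter_networkin(lines, min_score=2.0, score_col="networkin_score"):
    """Yield the header plus every row whose score is at least min_score."""
    lines = iter(lines)
    header = next(lines).rstrip("\n")
    # Locate the score column by name rather than by position.
    idx = header.split("\t").index(score_col)
    yield header
    for line in lines:
        line = line.rstrip("\n")
        if line and float(line.split("\t")[idx]) >= min_score:
            yield line


# Made-up rows with only a few of the columns, for illustration.
rows = [
    "#substrate\tposition\tid\tnetworkin_score",
    "SUB1\t10\tKIN_A\t2.5",
    "SUB2\t20\tKIN_B\t1.9",
    "SUB3\t30\tKIN_C\t2.0",
]
kept = list(filter_networkin(rows))
print("\n".join(kept))
```

For the real download, the same generator could be fed from ``lzma.open(path, "rt")`` in the Python standard library, which reads the ``.tsv.xz`` file directly and makes the separate ``unxz`` step optional.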
413 ``p_sty_motifs`` 433 ``p_sty_motifs``
414 This database merges motif patterns from [Amanchy 2007] and Phosida [Gnad 2011]. 434 This database merges motif patterns from [Amanchy 2007] and Phosida [Gnad 2011].
415 435
416 The Amanchy data are adapted from `http://hprd.org/serine_motifs <http://hprd.org/serine_motifs>`_ and `http://hprd.org/tyrosine_motifs <http://hprd.org/tyrosine_motifs>`_ (both links cite the reference where each motif was published), and the patterns are translated into Perl regular expression format (`https://perldoc.perl.org/perlre <https://perldoc.perl.org/perlre>`_). 436 The Amanchy data are adapted from `https://web.archive.org/web/*/http://hprd.org/serine_motifs <https://web.archive.org/web/*/http://hprd.org/serine_motifs>`_ and `https://web.archive.org/web/*/http://hprd.org/tyrosine_motifs <https://web.archive.org/web/*/http://hprd.org/tyrosine_motifs>`_ (both links cite the reference where each motif was published), and the patterns are translated into Perl regular expression format (`https://perldoc.perl.org/perlre <https://perldoc.perl.org/perlre>`_).
417 437
418 The Phosida data are adapted (translated to Perl-formatted regular expressions) from `http://pegasus.biochem.mpg.de/phosida/help/motifs.aspx <http://pegasus.biochem.mpg.de/phosida/help/motifs.aspx>`_ (this link cites the reference where each motif was published). 438 The Phosida data are adapted (translated to Perl-formatted regular expressions) from `http://pegasus.biochem.mpg.de/phosida/help/motifs.aspx <http://pegasus.biochem.mpg.de/phosida/help/motifs.aspx>`_ (this link cites the reference where each motif was published).
419 439
420 This file has three tab-separated columns (and no header): 440 This file has three tab-separated columns (and no header):
421 441
422 1. column 1 is an (ignored) identifier 442 - column 1 is an (ignored) identifier
423 2. column 2 is a Perl regular expression 443 - column 2 is a Perl regular expression
424 3. column 3 is a descriptor. 444 - column 3 is a descriptor.
425 445
426 Two examples: 446 Two examples:
427 447
428 ``2<TAB>R.R..(pS|pT)<TAB>Akt kinase substrate motif (HPRD)`` 448 ``2<TAB>R.R..(pS|pT)<TAB>Akt kinase substrate motif (HPRD)``
429 449
430 ``10<TAB>R..(pS|pT)V<TAB>CAMK2_Phosida`` 450 ``10<TAB>R..(pS|pT)V<TAB>CAMK2_Phosida``
431 451
432 ``psp_kinase_substrate`` 452 ``psp_kinase_substrate``
433 'Kinase-substrate dataset: experimentally determined substrates, sequences, cognate kinases, and metadata curated from the literature' [Hornbeck 2011]. This tabular-formatted file may be downloaded for non-commercial purposes as 'Kinase_Substrate_Dataset.gz' from `https://www.phosphosite.org/staticDownloads.action <https://www.phosphosite.org/staticDownloads.action>`_. 453 'Kinase-substrate dataset: experimentally determined substrates, sequences, cognate kinases, and metadata curated from the literature' [Hornbeck 2011]. This tabular-formatted file may be downloaded for non-commercial purposes as 'Kinase_Substrate_Dataset.gz' from `https://www.phosphosite.org/staticDownloads.action <https://www.phosphosite.org/staticDownloads.action>`_.
434 454
435 Data extracted from PhosphoSitePlus(R), created by Cell Signaling Technology Inc. PhosphoSitePlus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (`https://creativecommons.org/licenses/by-nc-sa/3.0/ <https://creativecommons.org/licenses/by-nc-sa/3.0/>`_). Attribution must be given in written, oral and digital presentations to PhosphoSitePlus, www.phosphosite.org. Written documents should additionally cite: 455 Data extracted from PhosphoSitePlus(R), created by Cell Signaling Technology Inc. PhosphoSitePlus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (`https://creativecommons.org/licenses/by-nc-sa/3.0/ <https://creativecommons.org/licenses/by-nc-sa/3.0/>`_). Attribution must be given in written, oral and digital presentations to PhosphoSitePlus, www.phosphosite.org. Written documents should additionally cite:
436 456
437 Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M (2012) PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 40, D261-D270.; www.phosphosite.org. 457 Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M (2012) PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 40, D261-D270.; www.phosphosite.org.
438 458
439 ``psp_regulatory_sites`` 459 ``psp_regulatory_sites``
440 'Regulatory sites: information curated from the literature about modification sites shown to regulate molecular functions, biological processes, and molecular interactions including protein-protein interactions' [Hornbeck 2011]. This tabular-formatted file may be downloaded for non-commercial purposes as 'Regulatory_sites.gz' from `https://www.phosphosite.org/staticDownloads.action <https://www.phosphosite.org/staticDownloads.action>`_. 460 'Regulatory sites: information curated from the literature about modification sites shown to regulate molecular functions, biological processes, and molecular interactions including protein-protein interactions' [Hornbeck 2011]. This tabular-formatted file may be downloaded for non-commercial purposes as 'Regulatory_sites.gz' from `https://www.phosphosite.org/staticDownloads.action <https://www.phosphosite.org/staticDownloads.action>`_.
441 461
442 Terms of use and citation are as for the ``psp_kinase_substrate`` file. 462 Terms of use and citation are as for the ``psp_kinase_substrate`` file.
443 463
444 **Output datasets** 464 Output datasets
465 ===============
445 466
446 ``ppep_map`` 467 ``ppep_map``
447 Data table (in tabular format, consumed by the merge/filter step) presenting, for each phosphopeptide, the kinase mappings, the mass-spectral intensities for each sample, and the metadata from UniProtKB/SwissProt, phospho-sites, phospho-motifs, and regulatory sites. Data in the columns marked "``Domain``", "``ON_...``", or "``..._PhosphoSite``" are available subject to the following terms: 468 Data table (in tabular format, consumed by the merge/filter step) presenting, for each phosphopeptide, the kinase mappings, the mass-spectral intensities for each sample, and the metadata from UniProtKB/SwissProt, phospho-sites, phospho-motifs, and regulatory sites. Data in the columns marked "``Domain``", "``ON_...``", or "``..._PhosphoSite``" are available subject to the following terms:
448 469
449 "PhosphoSitePlus\ |reg| (PSP) was created by Cell Signaling Technology Inc. It is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License(`https://creativecommons.org/licenses/by-nc-sa/3.0/ <https://creativecommons.org/licenses/by-nc-sa/3.0/>`_). When using PSP data or analyses in printed publications or in online resources, the following acknowledgements must be included: (a) the words 'PhosphoSitePlus(R), www.phosphosite.org' must be included at appropriate places in the text or webpage, and (b) citation of [Hornbeck 2011 (`PMID: 25514926 <https://pubmed.ncbi.nlm.nih.gov/25514926>`_)] must be included in the bibliography." 470 "PhosphoSitePlus\ |reg| (PSP) was created by Cell Signaling Technology Inc. It is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License(`https://creativecommons.org/licenses/by-nc-sa/3.0/ <https://creativecommons.org/licenses/by-nc-sa/3.0/>`_). When using PSP data or analyses in printed publications or in online resources, the following acknowledgements must be included: (a) the words 'PhosphoSitePlus(R), www.phosphosite.org' must be included at appropriate places in the text or webpage, and (b) citation of [Hornbeck 2011 (`PMID: 25514926 <https://pubmed.ncbi.nlm.nih.gov/25514926>`_)] must be included in the bibliography."
453 Data table (in tabular format) presenting, for each phosphopeptide, the gene and one of the phospho-motifs or kinase-substrate sites. 474 Data table (in tabular format) presenting, for each phosphopeptide, the gene and one of the phospho-motifs or kinase-substrate sites.
454 475
455 ``ppep_mapping_sqlite`` 476 ``ppep_mapping_sqlite``
456 SQLite database (consumed by the merge/filter step). 477 SQLite database (consumed by the merge/filter step).
457 478
458 **Authors** 479 Authors
480 =======
459 481
460 ``Nicholas A. Graham`` 482 ``Nicholas A. Graham``
461 (`ORCiD 0000-0002-6811-1941 <https://orcid.org/0000-0002-6811-1941>`_) wrote the original script. 483 (`ORCiD 0000-0002-6811-1941 <https://orcid.org/0000-0002-6811-1941>`_) wrote the original script.
462 484
463 ``Arthur C. Eschenlauer`` 485 ``Arthur C. Eschenlauer``
464 (`ORCiD 0000-0002-2882-0508 <https://orcid.org/0000-0002-2882-0508>`_) adapted the script to run in Galaxy. 486 (`ORCiD 0000-0002-2882-0508 <https://orcid.org/0000-0002-2882-0508>`_) adapted the script to run in Galaxy.
465 487
466 488
467 ====================================================== 489 *Phosphoproteomic Enrichment Pipeline Merge and Filter*
468 Phosphoproteomic Enrichment Pipeline Merge and Filter 490 ========================================================
469 ======================================================
470 491
471 This step merges mapped metadata into metadata for phosphopeptides, filtering by species. 492 This step merges mapped metadata into metadata for phosphopeptides, filtering by species.
472 493
473 **Input parameters** 494 Input parameters
495 ================
474 496
475 ``species`` 497 ``species``
476 Limit PhosphoSitePlus data to the indicated species. Default: **human** 498 Limit PhosphoSitePlus data to the indicated species. Default: **human**
477 499
478 **Output datasets** 500 Output datasets
501 ===============
479 502
480 ``preproc_tab`` 503 ``preproc_tab``
481 Phosphopeptides annotated with SwissProt and phosphosite metadata, in tabular format. This file is designed to be consumed by the downstream ANOVA tool. Some data in the columns marked "PSP" are available subject to the following terms: 504 Phosphopeptides annotated with SwissProt and phosphosite metadata, in tabular format. This file is designed to be consumed by the downstream ANOVA tool. Some data in the columns marked "PSP" are available subject to the following terms:
482 505
483 "PhosphoSitePlus\ |reg| (PSP) was created by Cell Signaling Technology Inc. It is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License(`https://creativecommons.org/licenses/by-nc-sa/3.0/ <https://creativecommons.org/licenses/by-nc-sa/3.0/>`_). When using PSP data or analyses in printed publications or in online resources, the following acknowledgements must be included: (a) the words 'PhosphoSitePlus(R), www.phosphosite.org' must be included at appropriate places in the text or webpage, and (b) citation of [Hornbeck 2011 (`PMID: 25514926 <https://pubmed.ncbi.nlm.nih.gov/25514926>`_)] must be included in the bibliography." 506 "PhosphoSitePlus\ |reg| (PSP) was created by Cell Signaling Technology Inc. It is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License(`https://creativecommons.org/licenses/by-nc-sa/3.0/ <https://creativecommons.org/licenses/by-nc-sa/3.0/>`_). When using PSP data or analyses in printed publications or in online resources, the following acknowledgements must be included: (a) the words 'PhosphoSitePlus(R), www.phosphosite.org' must be included at appropriate places in the text or webpage, and (b) citation of [Hornbeck 2011 (`PMID: 25514926 <https://pubmed.ncbi.nlm.nih.gov/25514926>`_)] must be included in the bibliography."
486 Phosphopeptides annotated with SwissProt and phosphosite metadata, in CSV format. 509 Phosphopeptides annotated with SwissProt and phosphosite metadata, in CSV format.
487 510
488 ``preproc_sqlite`` 511 ``preproc_sqlite``
489 ``ppep_mapping_sqlite`` updated with annotations, in SQLite format. 512 ``ppep_mapping_sqlite`` updated with annotations, in SQLite format.
490 513
491 **Authors** 514 Authors
515 =======
492 516
493 ``Nicholas A. Graham`` 517 ``Nicholas A. Graham``
494 (`ORCiD 0000-0002-6811-1941 <https://orcid.org/0000-0002-6811-1941>`_) initiated the original script. 518 (`ORCiD 0000-0002-6811-1941 <https://orcid.org/0000-0002-6811-1941>`_) initiated the original script.
495 519
496 ``Larry C. Cheng`` 520 ``Larry C. Cheng``