Galaxy | (sandbox for testing) |

Changeset 1:34cbb9e749da (2013-06-12)

Previous changeset 0:82e0af566160 (2013-06-12)
Next changeset 2:515a5e61874c (2013-06-12)

Commit message:
Uploaded

removed:
rgedgeR/rgGSEA.py
rgedgeR/rgGSEAcolumns.xml
rgedgeR/rgedgeR.xml
rgedgeR/rgedgeRglm.xml
rgedgeR/rgedgeRpaired.xml.iaas1

diff -r 82e0af566160 -r 34cbb9e749da rgedgeR/rgGSEA.py
--- a/rgedgeR/rgGSEA.py Wed Jun 12 02:58:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000

b'@@ -1,494 +0,0 @@\n-"""\n-April 2013\n-eeesh GSEA does NOT respect the mode flag!\n-\n-Now realise that the creation of the input rank file for gsea needs to take the lowest p value for duplicate \n-feature names. To make Ish\'s life easier, remove duplicate gene ids from any gene set to stop GSEA from \n-barfing.\n-\n-October 14 2012\n-Amazingly long time to figure out that GSEA fails with useless error message if any filename contains a dash "-"\n-eesh.\n-\n-Added history .gmt source - requires passing a faked name to gsea \n-Wrapper for GSEA http://www.broadinstitute.org/gsea/index.jsp\n-Started Feb 22 \n-Copyright 2012 Ross Lazarus\n-All rights reserved\n-Licensed under the LGPL\n-\n-called eg as\n-\n-#!/bin/sh\n-GALAXY_LIB="/data/extended/galaxy/lib"\n-if [ "$GALAXY_LIB" != "None" ]; then\n- if [ -n "$PYTHONPATH" ]; then\n- PYTHONPATH="$GALAXY_LIB:$PYTHONPATH"\n- else\n- PYTHONPATH="$GALAXY_LIB"\n- fi\n- export PYTHONPATH\n-fi\n-\n-cd /data/extended/galaxy/database/job_working_directory/027/27311\n-python /data/extended/galaxy/tools/rgenetics/rgGSEA.py --input_tab "/data/extended/galaxy/database/files/033/dataset_33806.dat" --adjpvalcol "5" --signcol "2"\n---idcol "1" --outhtml "/data/extended/galaxy/database/files/034/dataset_34455.dat" --input_name "actaearly-Controlearly-actalate-Controllate_topTable.xls"\n---setMax "500" --setMin "15" --nPerm "1000" --plotTop "20"\n---gsea_jar "/data/extended/galaxy/tool-data/shared/jars/gsea2-2.0.12.jar"\n---output_dir "/data/extended/galaxy/database/job_working_directory/027/27311/dataset_34455_files" --mode "Max_probe"\n- --title " actaearly-Controlearly-actalate-Controllate_interpro_GSEA" --builtin_gmt "/data/genomes/gsea/3.1/IPR_DOMAIN.gmt"\n-\n-\n-"""\n-import optparse\n-import tempfile\n-import os\n-import sys\n-import subprocess\n-import time\n-import shutil\n-import glob\n-import math\n-import re\n-\n-KEEPSELECTION = False # detailed records for selection of multiple probes\n-\n-def timenow():\n- """return 
current time as a string\n- """\n- return time.strftime(\'%d/%m/%Y %H:%M:%S\', time.localtime(time.time()))\n-\n-\n-\n-def fix_subdir(adir,destdir):\n- """ Galaxy wants everything in the same files_dir\n- if os.path.exists(adir):\n- for (d,dirs,files) in os.path.walk(adir):\n- for f in files:\n- sauce = os.path.join(d,f) \n- shutil.copy(sauce,destdir) \n- """\n-\n- def fixAffycrap(apath=\'\'):\n- """class=\'richTable\'>RUNNING ES</th><th class=\'richTable\'>CORE ENRICHMENT</th><tr><td class=\'lessen\'>1</td>\n- <td><a href=\'https://www.affymetrix.com/LinkServlet?probeset=LBR\'>LBR</a></td><td></td><td></td><td>1113</td>\n- <td>0.194</td><td>-0.1065</td><td>No</td></tr><tr><td class=\'lessen\'>2</td><td>\n- <a href=\'https://www.affymetrix.com/LinkServlet?probeset=GGPS1\'>GGPS1</a></td><td></td><td></td><td>4309</td><td>0.014</td><td>-0.4328</td>\n- <td>No</td></tr>\n- """\n- html = []\n- try:\n- html = open(apath,\'r\').readlines() \n- except:\n- return html\n- for i,row in enumerate(html):\n- row = re.sub(\'https\\:\\/\\/www.affymetrix.com\\/LinkServlet\\?probeset=\',"http://www.genecards.org/index.php?path=/Search/keyword/",row)\n- html[i] = row\n- return html\n-\n- cleanup = False\n- if os.path.exists(adir):\n- flist = os.listdir(adir) # get all files created\n- for f in flist:\n- apath = os.path.join(adir,f)\n- dest = os.path.join(destdir,f)\n- if not os.path.isdir(apath):\n- if os.path.splitext(f)[1].lower() == \'.html\':\n- html = fixAffycrap(apath)\n- fixed = open(apath,\'w\')\n- fixed.write(\'\\n\'.join(html))\n- fixed.write(\'\\n\')\n- fixed.close()\n- if not os.path.isfile(dest):\n- shutil.copy(apath,dest)\n- else:\n- fix_subdir(apath,destdir)\n- if cleanup:\n- '..b'n(html))\n- htmlf.write(\'\\n\')\n- htmlf.close()\n- os.unlink(self.fakeRanks)\n- os.unlink(self.fakeGMT)\n- if opts.outtab_neg:\n- tabs = glob.glob(os.path.join(opts.output_dir,"gsea_report_for_*.xls"))\n- if len(tabs) > 0:\n- for tabi,t in enumerate(tabs):\n- tkind = 
os.path.basename(t).split(\'_\')[4].lower()\n- if tkind == \'neg\':\n- outtab = opts.outtab_neg\n- elif tkind == \'pos\':\n- outtab = opts.outtab_pos\n- else:\n- print >> sys.stderr, \'## tab file matched %s which is not "neg" or "pos" in 4th segment %s\' % (t,tkind)\n- sys.exit()\n- content = open(t).readlines()\n- tabf = open(outtab,\'w\')\n- tabf.write(\'\'.join(content))\n- tabf.close()\n- else:\n- print >> sys.stdout, \'Odd, maketab = %s but no matches - tabs = %s\' % (makeTab,tabs)\n- return retval\n- \n-\n-if __name__ == "__main__":\n- """ \n- called as:\n- <command interpreter="python">rgGSEA.py --input_ranks "$input1" --outhtml "$html_file"\n- --setMax "$setMax" --setMin "$setMin" --nPerm "$nPerm" --plotTop "$plotTop" --gsea_jar "$GALAXY_DATA_INDEX_DIR/shared/jars/gsea2-2.07.jar" \n- --output_dir "$html_file.files_path" --use_gmt ""${use_gmt.fields.path}"" --chip "${use_chip.fields.path}"\n- </command>\n- """\n- op = optparse.OptionParser()\n- a = op.add_option\n- a(\'--input_ranks\',default=None)\n- a(\'--input_tab\',default=None)\n- a(\'--input_name\',default=None)\n- a(\'--use_gmt\',default=None)\n- a(\'--history_gmt\',default=None)\n- a(\'--builtin_gmt\',default=None)\n- a(\'--history_gmt_name\',default=None)\n- a(\'--setMax\',default="500")\n- a(\'--setMin\',default="15")\n- a(\'--nPerm\',default="1000") \n- a(\'--title\',default="GSEA report") \n- a(\'--chip\',default=\'\')\n- a(\'--plotTop\',default=\'20\')\n- a(\'--outhtml\',default=None)\n- a(\'--makeTab\',default=None)\n- a(\'--output_dir\',default=None)\n- a(\'--outtab_neg\',default=None)\n- a(\'--outtab_pos\',default=None)\n- a(\'--adjpvalcol\',default=None)\n- a(\'--signcol\',default=None)\n- a(\'--idcol\',default=None)\n- a(\'--mode\',default=\'Max_probe\')\n- a(\'-j\',\'--gsea_jar\',default=\'/usr/local/bin/gsea2-2.07.jar\')\n- opts, args = op.parse_args() \n- assert os.path.isfile(opts.gsea_jar),\'## GSEA runner unable to find supplied gsea java desktop executable file %s\' % 
opts.gsea_jar\n- if opts.input_ranks:\n- inpf = opts.input_ranks\n- else:\n- inpf = opts.input_tab\n- assert opts.idcol <> None, \'## GSEA runner needs an id column if a tabular file provided\'\n- assert opts.signcol <> None, \'## GSEA runner needs a sign column if a tabular file provided\'\n- assert opts.adjpvalcol <> None, \'## GSEA runner needs an adjusted p value column if a tabular file provided\'\n- assert os.path.isfile(inpf),\'## GSEA runner unable to open supplied input file %s\' % inpf\n- if opts.chip > \'\':\n- assert os.path.isfile(opts.chip),\'## GSEA runner unable to open supplied chip file %s\' % opts.chip\n- some = None\n- if opts.history_gmt <> None:\n- some = 1\n- assert os.path.isfile(opts.history_gmt),\'## GSEA runner unable to open supplied history gene set matrix (.gmt) file %s\' % opts.history_gmt\n- if opts.builtin_gmt <> None:\n- some = 1\n- assert os.path.isfile(opts.builtin_gmt),\'## GSEA runner unable to open supplied history gene set matrix (.gmt) file %s\' % opts.builtin_gmt\n- assert some, \'## GSEA runner needs a gene set matrix file - none chosen?\'\n- opts.title = re.sub(\'[^a-zA-Z0-9_]+\', \'\', opts.title)\n- myName=os.path.split(sys.argv[0])[-1]\n- gse = gsea_wrapper(myName, opts=opts)\n- retcode = gse.run()\n- if retcode <> 0:\n- sys.exit(retcode) # indicate failure to job runner\n- \n- \n'
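
The docstring of the removed rgGSEA.py records two workarounds: keep the lowest p value when a feature name is duplicated, and keep characters such as "-" out of names handed to GSEA (the script strips the title with re.sub('[^a-zA-Z0-9_]+', '', opts.title)). A minimal Python 3 sketch of both ideas follows. The function names, the (id, p value) row layout and the sample rows are hypothetical; the middle of the removed file is elided above, so this is not a reconstruction of its actual code.

```python
import re


def clean_title(title):
    """Drop every character except letters, digits and underscore.

    Uses the same pattern the removed script applies to --title.
    """
    return re.sub('[^a-zA-Z0-9_]+', '', title)


def lowest_p_per_id(rows):
    """Keep one p value per feature id: the smallest one seen.

    rows is an iterable of (feature_id, p_value) tuples; this layout is
    an assumption made for illustration only.
    """
    best = {}
    for feature_id, p in rows:
        if feature_id not in best or p < best[feature_id]:
            best[feature_id] = p
    return best


if __name__ == "__main__":
    print(clean_title(" actaearly-Controlearly_GSEA"))
    print(lowest_p_per_id([("LBR", 0.2), ("GGPS1", 0.01), ("LBR", 0.05)]))
```

Note that the removed script is Python 2 (it uses `<>` and `print >>`), so it would need porting before it could share code with a sketch like this.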

diff -r 82e0af566160 -r 34cbb9e749da rgedgeR/rgGSEAcolumns.xml
--- a/rgedgeR/rgGSEAcolumns.xml Wed Jun 12 02:58:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000

b'@@ -1,834 +0,0 @@\n-<tool id="rgGSEAcolumns" name="Gene set enrichment" version="0.05">\n- <description>using a generic tabular file</description>\n- <requirements>\n- <requirement type="package" version="2.0.12">gsea_jar</requirement>\n- <requirement type="set_environment">GSEAJAR_PATH</requirement>\n- </requirements>\n- <command interpreter="python">rgGSEA.py --input_tab "$input1" --adjpvalcol "$adjpvalcol" --signcol "$signcol" \n- --idcol "$idcol" --outhtml "$html_file" --input_name "${input1.name}"\n- --setMax "$setMax" --setMin "$setMin" --nPerm "$nPerm" --plotTop "$plotTop" \n- --gsea_jar "\\$GSEAJAR_PATH" \n- --output_dir "$html_file.files_path" --mode "$mode" --title "$title"\n-#if $makeTab.value==\'Yes\'\n- --outtab_pos "$outtab_pos" --outtab_neg "$outtab_neg"\n-#end if\n-#if $gmtSource.refgmtSource == "indexed" or $gmtSource.refgmtSource == "both":\n---builtin_gmt "${gmtSource.builtinGMT.fields.path}"\n-#end if\n-#if $gmtSource.refgmtSource == "history" or $gmtSource.refgmtSource == "both":\n---history_gmt "${gmtSource.ownGMT}" --history_gmt_name "${gmtSource.ownGMT.name}"\n-#end if\n-#if $chipSource.refchipSource=="builtin"\n- --chip "${chipSource.builtinChip.fields.path}"\n-#end if\n-#if $chipSource.refchipSource=="history"\n- --chip "${chipSource.ownChip}"\n-#end if\n-</command>\n- <inputs>\n- <param name="input1" type="data" format="tabular" label="Select a tab delimited file with a probe id, adjusted p value and a signed weight (eg t-test statistic) on each row"\n- help=""/>\n- <param name="adjpvalcol" label="Column containing a p value for the DEG statistical test" \n- help = "Use RAW p-values - not FDR adjusted as these have a non GSEA friendly non-uniform distribution"\n- type="data_column" data_ref="input1" numerical="True" \n- multiple="false" use_header_names="true" size="5">\n- <validator type="no_options" message="Please select a p value column."/>\n- </param>\n- <param name="signcol" label="Column containing the DE sign - eg log fold 
change so positive/negative values = upregulated/downregulated in treatment" \n- type="data_column" data_ref="input1" numerical="True" \n- multiple="false" use_header_names="true" size="5">\n- <validator type="no_options" message="Please select a sign column."/>\n- </param>\n- <param name="idcol" label="Column containing a gene id (refseq, symbol or Entrez" \n- type="data_column" data_ref="input1" numerical="False" \n- multiple="false" use_header_names="true" size="5">\n- <validator type="no_options" message="Please select an id column."/>\n- </param>\n-\n- <param name="title" type="text" value="GSEA" size="80" label="Title for job outputs" help="Supply a meaningful name here to remind you what the outputs contain"/>\n- <param name="setMin" type="integer" label="Minimum gene set size to prune (default=15)" size="5" value="15"/>\n- <param name="setMax" type="integer" label="Maximum gene set size to prune (default=500)" size="5" value="500"/>\n- <param name="nPerm" type="integer" label="Number of permutations (default=1000)" size="7" value="1000"/>\n- <param name="plotTop" type="integer" label="Number of top gene sets to plot and present in detailed reports(default=20)" size="10" value="20"/>\n- <param name="mode" type="select" label="Mode for dealing with duplicated gene ids" >\n- <option value="Max_probe" selected="true">Use the most extreme value</option>\n- <option value="Median_of_probes">Use the median of all supplied values</option>\n- </param>\n- <param name="makeTab" type="select" label="Create a tabular report containing ALL gene sets for downstream analysis" >\n- <option value="Yes">Yes</option>\n- <option value="No" selected="true">No</option>\n- </param>\n- <conditional name="chipSource" >\n- <param name="refchipSource" type="select" label="Translate the rank file IDs (first column) using a GSEA \'chip'..b"-regulated 4-like\n-\n- Rosetta.chip \n- Probe Set ID Gene Symbol Gene Title\n- NM_000504 F10 coagulation factor X\n- Contig32955_RC ARL6IP6 
ADP-ribosylation-like factor 6 interacting protein 6\n- AK000455 MGC16733 hypothetical gene MGC16733 similar to CG12113\n-\n- RT_U34.chip \n- Probe Set ID Gene Symbol Gene Title\n- AA108277_at HSPH1 heat shock 105kDa/110kDa protein 1\n- AA108308_i_at MDM2_PREDICTED Transformed mouse 3T3 cell double minute 2 homolog (mouse) (predicted)\n- AA108308_s_at --- ---\n-\n- RZPD_Human_Ensembl1.1.chip \n- Probe Set ID Gene Symbol Gene Title\n- RZPDp203A011001D SARDH sarcosine dehydrogenase\n- RZPDp203A011002D ARHGAP22 Rho GTPase activating protein 22\n- RZPDp203A011003D CNGA3 cyclic nucleotide gated channel alpha 3\n-\n- RZPD_Human_ORF_Clones_Gateway.chip \n- Probe Set ID Gene Symbol Gene Title\n- RZPDo834A0110D - Gateway (closed) PTD015 PTD015 protein\n- RZPDo834A0114D - Gateway (closed) TRIB3 tribbles homolog 3 (Drosophila)\n- RZPDo834A0116D - Gateway (closed) ORM2 orosomucoid 2\n-\n- RZPD_Human_Unigene3.1.chip \n- Probe Set ID Gene Symbol Gene Title\n- HU3_p983A011001D SARDH sarcosine dehydrogenase\n- HU3_p983A011002D ARHGAP22 Rho GTPase activating protein 22\n- HU3_p983A011003D CNGA3 cyclic nucleotide gated channel alpha 3\n-\n- Seq_Accession.chip \n- Probe Set ID Gene Symbol\n- AA017197 C21ORF36\n- AA191116 MTVR2\n- AA280701 CXYORF7\n-\n- Stanford.chip \n- Probe Set ID Gene Symbol Gene Title\n- IMAGE:703849 DDB2 damage-specific DNA binding protein 2, 48kDa\n- IMAGE:1301778 ZFY zinc finger protein, Y-linked\n- IMAGE:795810 HS.99503HS.520681 Homo sapiens transcribed sequenceHomo sapiens, clone IMAGE:4823270, mRNA\n-\n- Stanford_Source_Accessions.chip \n- Probe Set ID Gene Symbol Gene Title\n- AI848107 0610010K14RIK RIKEN cDNA 0610010K14 gene\n- AK002491 0610010K14RIK RIKEN cDNA 0610010K14 gene\n- AK003842 0610010K14RIK RIKEN cDNA 0610010K14 gene\n-\n- TIGR_31K_Human_Set.chip \n- Probe Set ID Gene Symbol Gene Title\n- 1-1 NULL NULL\n- 1-10 WNT2 wingless-type MMTV integration site family member 2\n- 1-11 VHL von Hippel-Lindau tumor suppressor\n-\n- TIGR_40K_Human_Set.chip 
\n- Probe Set ID Gene Symbol Gene Title\n- 10 NULL NULL\n- 100 TEX27 testis expressed sequence 27\n- 1000 HOXA1 homeo box A1\n-\n- U133_X3P.chip \n- Probe Set ID Gene Symbol Gene Title\n- 1053_3p_at RFC2 replication factor C (activator 1) 2, 40kDa\n- 117_3p_at HSPA6 /// LOC652878 heat shock 70kDa protein 6 (HSP70B') /// similar to heat shock 70kDa protein 6 (HSP70B)\n- 1494_3p_f_at CYP2A6 cytochrome P450, family 2, subfamily A, polypeptide 6\n-\n- UCLA_NIH_33K.chip \n- Probe Set ID Gene Symbol Gene Title\n- 1020181 NULL NULL\n- 1020315 VAV2 vav 2 oncogene\n- 1020478 AP1GBP1 AP1 gamma subunit binding protein 1\n-\n- Zebrafish.chip \n- Probe Set ID Gene Symbol Gene Title\n- AFFX-BioB-3_at --- ---\n- AFFX-BioB-5_at --- ---\n- AFFX-BioB-M_at --- ---\n-\n-\n- .. _LGPL: http://www.gnu.org/copyleft/lesser.html\n- .. _GSEA: http://www.broadinstitute.org/gsea\n- .. _GUIDE: http://www.broadinstitute.org/gsea/doc/GSEAUserGuideFrame.html?_Interpreting_GSEA_Results\n- .. _MSigDB: http://www.broadinstitute.org/gsea/msigdb/index.jsp\n- .. _2005Paper: http://www.pnas.org/content/102/43/15545.full\n-\n-</help>\n-\n-</tool>\n-\n-\n"

diff -r 82e0af566160 -r 34cbb9e749da rgedgeR/rgedgeR.xml
--- a/rgedgeR/rgedgeR.xml Wed Jun 12 02:58:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000

b'@@ -1,504 +0,0 @@\n-<tool id="rgedgeR" name="edgeR" version="0.18">\n- <description>digital DGE between two groups of replicates</description>\n- <command interpreter="python">\n- rgToolFactory.py --script_path "$runme" --interpreter "Rscript" --tool_name "edgeR" \n- --output_dir "$html_file.files_path" --output_html "$html_file" --output_tab "$outtab" --make_HTML "yes"\n- </command>\n- <inputs>\n- <param name="input1" type="data" format="tabular" label="Select an input matrix - rows are contigs, columns are counts for each sample"\n- help="Use the HTSeq based count matrix preparation tool to create these count matrices from BAM files and a GTF file"/>\n- <param name="title" type="text" value="DGE" size="80" label="Title for job outputs" help="Supply a meaningful name here to remind you what the outputs contain">\n- <sanitizer invalid_char="">\n- <valid initial="string.letters,string.digits"><add value="_" /> </valid>\n- </sanitizer>\n- </param>\n- <param name="treatment_name" type="text" value="Treatment" size="50" label="Treatment Name"/>\n- <param name="Treat_cols" label="Select columns containing treatment." type="data_column" data_ref="input1" numerical="True" \n- multiple="true" use_header_names="true" size="120" display="checkboxes">\n- <validator type="no_options" message="Please select at least one column."/>\n- </param>\n- <param name="control_name" type="text" value="Control" size="50" label="Control Name"/>\n- <param name="Control_cols" label="Select columns containing control." type="data_column" data_ref="input1" numerical="True" \n- multiple="true" use_header_names="true" size="120" display="checkboxes" optional="true">\n- </param>\n- <param name="fQ" type="float" value="0.3" size="5" label="Non-differential contig count quantile threshold - zero to analyze all non-zero read count contigs"\n- help="May be a good or a bad idea depending on the biology and the question. 
EG 0.3 = sparsest 30% of contigs with at least one read are removed before analysis"/>\n- <param name="useQuantile" type="boolean" truevalue="T" checked=\'false\' falsevalue="" size="1" label="Non differential filter - remove contigs below a threshold (1 per million) for half or more samples"\n- help="May be a good or a bad idea depending on the biology and the question. This was the old default. Quantile based is available as an alternative"/>\n- <param name="priorn" type="integer" value="4" size="3" label="prior.df for tagwise dispersion - lower value = more emphasis on each tag\'s variance - note this used to be prior.n"\n- help="Zero = auto-estimate. 1 to force high variance tags out. Use a small value to \'smooth\' small samples. See edgeR docs and note below"/>\n- <param name="fdrthresh" type="float" value="0.05" size="5" label="P value threshold for FDR filtering for amily wise error rate control"\n- help="Conventional default value of 0.05 recommended"/>\n- <param name="fdrtype" type="select" label="FDR (Type II error) control method" \n- help="Use fdr or bh typically to control for the number of tests in a reliable way">\n- <option value="fdr" selected="true">fdr</option>\n- <option value="BH">Benjamini Hochberg</option>\n- <option value="BY">Benjamini Yukateli</option>\n- <option value="bonferroni">Bonferroni</option>\n- <option value="hochberg">Hochberg</option>\n- <option value="holm">Holm</option>\n- <option value="hommel">Hommel</option>\n- <option value="none">no control for multiple tests</option>\n- </param>\n- </inputs>\n- <outputs>\n- <data format="tabular" name="outtab" label="${title}.xls"/>\n- <data format="html" name="html_file" label="${title}.html"/>\n- <data format="gsearank" name="outgsea" label="${title}.gsearank">\n- <filter> makeRank == \'Yes\' </filter>\n- </data>\n- </outputs>\n-<configfiles>\n-<configfile name="runme">\n-\n-# edgeR.Rscript\n-#'..b'names(Count_Matrix),sep="_") #Relable columns\n-if (priorn <= 0) {priorn = 
ceiling(20/(length(group)-1))} # estimate prior.n if not provided\n-# see http://comments.gmane.org/gmane.comp.lang.r.sequencing/2009 \n-results = edgeIt(Count_Matrix=Count_Matrix,group=group,outputfilename=outputfilename,fdrtype=fdrtype,priorn=priorn,fdrthresh=fdrthresh,\n- outputdir=Out_Dir,myTitle=myTitle,libSize=c(),useQuantile=useQuantile,filterquantile=fQ) #Run the main function\n-# for the log\n-\n-\n-sessionInfo()\n-\n-\n-</configfile>\n-</configfiles>\n-<tests>\n-<test>\n-<param name=\'input1\' value=\'DGEtest.xls\' ftype=\'tabular\' />\n- <param name=\'treatment_name\' value=\'case\' />\n- <param name=\'title\' value=\'DGEtest\' />\n- <param name=\'fdrtype\' value=\'fdr\' />\n- <param name=\'priorn\' value="5" />\n- <param name=\'fdrthresh\' value="0.05" />\n- <param name=\'control_name\' value=\'control\' />\n- <param name=\'Treat_cols\' value=\'c3,c6,c9\' />\n- <param name=\'Control_cols\' value=\'c2,c5,c8\' />\n- <output name=\'outtab\' file=\'DGEtest1out.xls\' ftype=\'tabular\' compare=\'diff\' />\n- <output name=\'html_file\' file=\'DGEtest1out.html\' ftype=\'html\' compare=\'diff\' lines_diff=\'20\' />\n-</test>\n-</tests>\n-<help>\n-**What it does**\n-\n-Performs digital gene expression analysis between a treatment and control on a matrix.\n-\n-**Documentation** Please see documentation_ for methods and parameter details \n-\n-**Input**\n-\n-A matrix consisting of non-negative integers. The matrix must have a unique header row identifiying the samples, as well as a unique set of row names \n-as the first column.\n-\n-**Output**\n-\n-A matrix which consists the original data and relative expression levels and some helpful plots\n-\n-**Note on edgeR versions**\n-\n-The edgeR authors made a small cosmetic change in the name of one important variable (from p.value to PValue) \n-breaking this and all other code that assumed the old name for this variable, \n-between edgeR2.4.4 and 2.4.6 (the version for R 2.14 as at the time of writing). 
\n-This means that all code using edgeR is sensitive to the version. I think this was a very unwise thing \n-to do because it wasted hours of my time to track down and will similarly cost other edgeR users dearly\n-when their old scripts break. This tool currently now works with 2.4.6.\n-\n-**Note on prior.N**\n-\n-http://seqanswers.com/forums/showthread.php?t=5591 says:\n-\n-*prior.n*\n-\n-The value for prior.n determines the amount of smoothing of tagwise dispersions towards the common dispersion. \n-You can think of it as like a "weight" for the common value. (It is actually the weight for the common likelihood \n-in the weighted likelihood equation). The larger the value for prior.n, the more smoothing, i.e. the closer your \n-tagwise dispersion estimates will be to the common dispersion. If you use a prior.n of 1, then that gives the \n-common likelihood the weight of one observation.\n-\n-In answer to your question, it is a good thing to squeeze the tagwise dispersions towards a common value, \n-or else you will be using very unreliable estimates of the dispersion. I would not recommend using the value that \n-you obtained from estimateSmoothing()---this is far too small and would result in virtually no moderation \n-(squeezing) of the tagwise dispersions. How many samples do you have in your experiment? \n-What is the experimental design? If you have few samples (less than 6) then I would suggest a prior.n of at least 10. \n-If you have more samples, then the tagwise dispersion estimates will be more reliable, \n-so you could consider using a smaller prior.n, although I would hesitate to use a prior.n less than 5. \n-\n-**Attribution** Copyright Ross Lazarus (ross period lazarus at gmail period com) May 2012\n-Derived from the implementation by Antony Kaspi and Sebastian Lunke at the BakerIDI\n-\n-All rights reserved.\n-\n-Licensed under the LGPL_\n-\n-.. _LGPL: http://www.gnu.org/copyleft/lesser.html\n-.. 
_documentation: http://bioconductor.org/packages/release/bioc/html/edgeR.html\n-</help>\n-\n-</tool>\n-\n-\n'
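
The visible part of the removed configfile falls back to ceiling(20/(length(group)-1)) when priorn is zero or negative. The same arithmetic is sketched below in Python. The function name is hypothetical, and the guard against fewer than two samples is an added assumption that does not appear in the R shown above.

```python
import math


def default_priorn(priorn, n_samples):
    """Mirror the R fallback: if priorn <= 0, use ceiling(20 / (n_samples - 1))."""
    if n_samples < 2:
        # Assumption: the R code has no such check; added here to avoid
        # a division by zero in the sketch.
        raise ValueError("need at least two samples")
    if priorn <= 0:
        return math.ceil(20 / (n_samples - 1))
    return priorn


if __name__ == "__main__":
    # The tool's test case selects three treatment and three control columns.
    print(default_priorn(0, 6))
```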

diff -r 82e0af566160 -r 34cbb9e749da rgedgeR/rgedgeRglm.xml
--- a/rgedgeR/rgedgeRglm.xml Wed Jun 12 02:58:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000

b'@@ -1,537 +0,0 @@\n-\n-<tool id="rgedgeRglm" name="edgeRglm" version="0.18">\n- <description>digital DGE glm</description>\n- <command interpreter="python">\n- rgToolFactory.py --script_path "$runme" --interpreter "Rscript" --tool_name "edgeRglm" \n- --output_dir "$html_file.files_path" --output_html "$html_file" --make_HTML "yes"\n- </command>\n- <inputs>\n- <param name="input1" type="data" format="tabular" label="Select an input matrix - rows are contigs, columns are counts for each sample"\n- help="Use the DGE matrix preparation tool to create these matrices from BAM files and a BED file of contigs"/>\n- <param name="title" type="text" value="Factorial DGE" size="80" label="Title for job outputs" help="Supply a meaningful name here to remind you what the outputs contain">\n- <sanitizer invalid_char="">\n- <valid initial="string.letters,string.digits"><add value="_" /> </valid>\n- </sanitizer>\n- </param>\n- <param name="factor1name" type="text" value="Factor 1" size="80" label="Factor 1 name" help="Supply a meaningful name here to remind you when looking at the results">\n- <sanitizer invalid_char="">\n- <valid initial="string.letters,string.digits"><add value="_" /> </valid>\n- </sanitizer>\n- </param>\n- <param name="factor1" type="text" optional="false" size="120"\n- label="Enter comma separated values to indicate the factor level for all input file count columns"\n- help="EG if there are 4 columns of counts from 2 treatment replicates (2,4) and 2 control replicates (3,5) then enter \'treat,control,treat,control\'">\n- <sanitizer>\n- <valid initial="string.digits,string.letters"><add value="," /> </valid>\n- </sanitizer>\n- </param>\n- <param name="factor2name" type="text" value="Factor 2" size="80" label="Factor 2 name" help="Supply a meaningful name here to remind you when looking at the results">\n- <sanitizer invalid_char="">\n- <valid initial="string.letters,string.digits"><add value="_" /> </valid>\n- </sanitizer>\n- </param>\n- <param name="factor2" 
type="text" optional="false" size="120"\n- label="Enter comma separated values to indicate factor 2 level for all input file count columns"\n- help="Leave blank if no factor 2, but eg if data from sample id A99 is in columns 2,4 and id C21 is in 3,5 then enter \'1,2,1,2\'">\n- <sanitizer>\n- <valid initial="string.digits,string.letters"><add value="," /> </valid>\n- </sanitizer>\n- </param>\n- <param name="fQ" type="float" value="0.3" size="5" label="Non-differential contig count quantile threshold - zero to analyze all non-zero read count contigs"\n- help="May be a good or a bad idea depending on the biology and the question. EG 0.3 = sparsest 30% of contigs with at least one read are removed before analysis"/>\n- <param name="useNDfilt" type="boolean" truevalue="T" checked=\'false\' falsevalue="" size="1" label="Non differential filter - remove contigs below a threshold (1 per million) for half or more samples"\n- help="May be a good or a bad idea depending on the biology and the question. This was the old default. Quantile based is available as an alternative"/>\n- <param name="priorn" type="integer" value="4" size="3" label="prior.df for tagwise dispersion - higher value = more emphasis on each tag\'s variance - note this used to be prior.n"\n- help="Zero = auto-estimate. 1 to force high variance tags out. Use a small value to \'smooth\' small samples. See edgeR docs and note below"/>\n- <param name="fdrthresh" type="float" value="0.05" size="5" label="P value threshold for FDR filtering for family wise error rate control"\n- help="Conventional default value of 0.05 recommended"/>\n- <param name="fdrtype" type="select" label="FDR (Type II error) control method" \n- help="Use fdr or bh typically to control for the number of tests in a reliable way">\n- <option value="fdr" selected="true">fdr</option>\n- <op'..b'parison\n-best thought of as control and treatment (whatever that means) for each of the main comparisons. 
\n-\n-The interaction is defined as the difference between those two comparisons and is reported as a topTable as are the \n-primary comparisons.\n-\n-All comparisons are reported as separate tabular spreadsheets ordered by p value and a comprehensive summary is \n-provided in the html output.\n-\n-This code essentially embelishes the code described by Gordon Smythe in the limma documentation for a factorial \n-analysis.\n-\n-**Input**\n-\n-A matrix consisting of non-negative integers. The matrix must have a unique header row identifiying the samples, as well as a unique set of row names \n-as the first column.\n-\n-**Output**\n-\n-Tabular files which contain the statistical results and the raw and transformed counts and some colourful\n-and helpful plots\n-\n-**Note on edgeR versions**\n-\n-The edgeR authors made a small cosmetic change in the name of one important variable (from p.value to PValue) \n-breaking this and all other code that assumed the old name for this variable, \n-between edgeR2.4.4 and 2.4.6 (the version for R 2.14 as at the time of writing). \n-This means that all code using edgeR is sensitive to the version. I think this was a very unwise thing \n-to do because it wasted hours of my time to track down and will similarly cost other edgeR users dearly\n-when their old scripts break. This tool currently now works with 2.4.6.\n-\n-**Note on prior.N - now replaced with prior.df**\n-\n-http://seqanswers.com/forums/showthread.php?t=5591 says:\n-\n-*prior.n*\n-\n-The value for prior.n determines the amount of smoothing of tagwise dispersions towards the common dispersion. \n-You can think of it as like a "weight" for the common value. (It is actually the weight for the common likelihood \n-in the weighted likelihood equation). The larger the value for prior.n, the more smoothing, i.e. the closer your \n-tagwise dispersion estimates will be to the common dispersion. 
If you use a prior.n of 1, then that gives the \n-common likelihood the weight of one observation.\n-\n-In answer to your question, it is a good thing to squeeze the tagwise dispersions towards a common value, \n-or else you will be using very unreliable estimates of the dispersion. I would not recommend using the value that \n-you obtained from estimateSmoothing()---this is far too small and would result in virtually no moderation \n-(squeezing) of the tagwise dispersions. How many samples do you have in your experiment? \n-What is the experimental design? If you have few samples (less than 6) then I would suggest a prior.n of at least 10. \n-If you have more samples, then the tagwise dispersion estimates will be more reliable, \n-so you could consider using a smaller prior.n, although I would hesitate to use a prior.n less than 5. \n-\n-From Bioconductor Digest, Vol 118, Issue 5, Gordon writes:\n-\n-Dear Dorota,\n-\n-The important settings are prior.df and trend.\n-\n-prior.n and prior.df are related through prior.df = prior.n * residual.df,\n-and your experiment has residual.df = 36 - 12 = 24. So the old setting of\n-prior.n=10 is equivalent for your data to prior.df = 240, a very large\n-value. Going the other way, the new setting of prior.df=10 is equivalent\n-to prior.n=10/24.\n-\n-To recover old results with the current software you would use\n-\n- estimateTagwiseDisp(object, prior.df=240, trend="none")\n-\n-To get the new default from old software you would use\n-\n- estimateTagwiseDisp(object, prior.n=10/24, trend=TRUE)\n-\n-Actually the old trend method is equivalent to trend="loess" in the new\n-software. You should use plotBCV(object) to see whether a trend is\n-required.\n-\n-Note you could also use\n-\n- prior.n = getPriorN(object, prior.df=10)\n-\n-to map between prior.df and prior.n.\n-\n-\n- .. _edgeR: http://www.bioconductor.org/packages/release/bioc/html/edgeR.html\n- .. 
_edgeR_Manual: http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf\n-\n-</help>\n-\n-</tool>\n-\n-\n'
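
The quoted Bioconductor Digest reply relates the two settings through prior.df = prior.n * residual.df. A small Python sketch of that conversion, using the numbers from the quoted example (residual.df = 36 - 12 = 24), is given here. The function names are hypothetical, and this is only the arithmetic from the quote, not a replacement for edgeR's own getPriorN.

```python
def prior_df_from_prior_n(prior_n, residual_df):
    """prior.df = prior.n * residual.df, as stated in the quoted reply."""
    return prior_n * residual_df


def prior_n_from_prior_df(prior_df, residual_df):
    """Inverse mapping: prior.n = prior.df / residual.df."""
    return prior_df / residual_df


if __name__ == "__main__":
    residual_df = 36 - 12  # value from the quoted example
    print(prior_df_from_prior_n(10, residual_df))
    print(prior_n_from_prior_df(10, residual_df))
```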

diff -r 82e0af566160 -r 34cbb9e749da rgedgeR/rgedgeRpaired.xml.iaas1
--- a/rgedgeR/rgedgeRpaired.xml.iaas1 Wed Jun 12 02:58:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000

b'@@ -1,627 +0,0 @@\n-<tool id="rgedgeRpaired" name="edgeR paired" version="0.18">\n- <description>2 level Anova for counts</description>\n- <command interpreter="python">\n- rgToolFactory.py --script_path "$runme" --interpreter "Rscript" --tool_name "edgeR" \n- --output_dir "$html_file.files_path" --output_html "$html_file" --output_tab "$outtab" --make_HTML "yes"\n- </command>\n- <inputs>\n- <param name="input1" type="data" format="tabular" label="Select an input matrix - rows are contigs, columns are counts for each sample"\n- help="Use the HTSeq based count matrix preparation tool to create these matrices from BAM/SAM files and a GTF file of genomic features"/>\n- <param name="title" type="text" value="edgeR" size="80" label="Title for job outputs" help="Supply a meaningful name here to remind you what the outputs contain">\n- <sanitizer invalid_char="">\n- <valid initial="string.letters,string.digits"><add value="_" /> </valid>\n- </sanitizer>\n- </param>\n- <param name="treatment_name" type="text" value="Treatment" size="50" label="Treatment Name"/>\n- <param name="Treat_cols" label="Select columns containing treatment." type="data_column" data_ref="input1" numerical="True" \n- multiple="true" use_header_names="true" size="120" display="checkboxes">\n- <validator type="no_options" message="Please select at least one column."/>\n- </param>\n- <param name="control_name" type="text" value="Control" size="50" label="Control Name"/>\n- <param name="Control_cols" label="Select columns containing control." type="data_column" data_ref="input1" numerical="True" \n- multiple="true" use_header_names="true" size="120" display="checkboxes" optional="true">\n- </param>\n- <param name="subjectids" type="text" optional="true" size="120"\n- label="IF SUBJECTS NOT ALL INDEPENDENT! 
Enter integers to indicate sample pairing for every column in input"\n- help="Leave blank if no pairing, but eg if data from sample id A99 is in columns 2,4 and id C21 is in 3,5 then enter \'1,2,1,2\'">\n- <sanitizer>\n- <valid initial="string.digits"><add value="," /> </valid>\n- </sanitizer>\n- </param>\n- <param name="fQ" type="float" value="0.3" size="5" label="Non-differential contig count quantile threshold - zero to analyze all non-zero read count contigs"\n- help="May be a good or a bad idea depending on the biology and the question. EG 0.3 = sparsest 30% of contigs with at least one read are removed before analysis"/>\n- <param name="useNDF" type="boolean" truevalue="T" checked=\'false\' falsevalue="" size="1" label="Non differential filter - remove contigs below a threshold (1 per million) for half or more samples"\n- help="May be a good or a bad idea depending on the biology and the question. This was the old default. Quantile based is available as an alternative"/>\n- <param name="priordf" type="integer" value="20" size="3" label="prior.df for tagwise dispersion - lower value = more emphasis on each tag\'s variance. Replaces prior.n and prior.df = prior.n * residual.df"\n- help="Zero = Use edgeR default. Use a small value to \'smooth\' small samples. 
See edgeR docs and note below"/>\n- <param name="fdrthresh" type="float" value="0.05" size="5" label="P value threshold for FDR filtering for family wise error rate control"\n- help="Conventional default value of 0.05 recommended"/>\n- <param name="fdrtype" type="select" label="FDR (Type I error) control method" \n- help="Use fdr or bh typically to control for the number of tests in a reliable way">\n- <option value="fdr" selected="true">fdr</option>\n- <option value="BH">Benjamini Hochberg</option>\n- <option value="BY">Benjamini Yekutieli</option>\n- <option value="bonferroni">Bonferroni</option>\n- <option value="hochberg">Hochberg</option>\n- <option value="holm">Holm</option>\n- <option value="hommel">Hommel</option>\n- '..b'ethods.\n-\n-If you have (eg) paired samples and wish to include a term in the GLM to account for some other factor (subject in the case of paired samples),\n-put a comma separated list of indicators for every sample (whether modelled or not!) indicating (eg) the subject number:\n-a list of integers, one for each sample, or an empty string if samples are all independent.\n-If not empty, there must be exactly as many integers in the supplied integer list as there are columns (samples) in the count matrix.\n-Integers for samples that are not in the analysis *must* be present in the string as filler even if not used.\n-\n-So if you have 2 pairs out of 6 samples, you need to put in unique integers for the unpaired ones\n-eg if you had 6 samples with the first two independent but the second and third pairs each being from independent subjects. 
you might use\n-8,9,1,1,2,2 \n-as subject IDs to indicate two paired samples from the same subject in columns 3/4 and 5/6\n-\n-**Output**\n-\n-A matrix which consists of the original data and relative expression levels and some helpful plots\n-\n-**Note on edgeR versions**\n-\n-The edgeR authors made a small cosmetic change in the name of one important variable (from p.value to PValue) \n-breaking this and all other code that assumed the old name for this variable, \n-between edgeR 2.4.4 and 2.4.6 (the version for R 2.14 as at the time of writing). \n-This means that all code using edgeR is sensitive to the version. I think this was a very unwise thing \n-to do because it wasted hours of my time to track down and will similarly cost other edgeR users dearly\n-when their old scripts break. This tool currently works with 2.4.6.\n-\n-**Note on prior.N**\n-\n-http://seqanswers.com/forums/showthread.php?t=5591 says:\n-\n-*prior.n*\n-\n-The value for prior.n determines the amount of smoothing of tagwise dispersions towards the common dispersion. \n-You can think of it as like a "weight" for the common value. (It is actually the weight for the common likelihood \n-in the weighted likelihood equation). The larger the value for prior.n, the more smoothing, i.e. the closer your \n-tagwise dispersion estimates will be to the common dispersion. If you use a prior.n of 1, then that gives the \n-common likelihood the weight of one observation.\n-\n-In answer to your question, it is a good thing to squeeze the tagwise dispersions towards a common value, \n-or else you will be using very unreliable estimates of the dispersion. I would not recommend using the value that \n-you obtained from estimateSmoothing()---this is far too small and would result in virtually no moderation \n-(squeezing) of the tagwise dispersions. How many samples do you have in your experiment? \n-What is the experimental design? 
If you have few samples (less than 6) then I would suggest a prior.n of at least 10. \n-If you have more samples, then the tagwise dispersion estimates will be more reliable, \n-so you could consider using a smaller prior.n, although I would hesitate to use a prior.n less than 5. \n-\n-\n-From Bioconductor Digest, Vol 118, Issue 5, Gordon writes:\n-\n-Dear Dorota,\n-\n-The important settings are prior.df and trend.\n-\n-prior.n and prior.df are related through prior.df = prior.n * residual.df,\n-and your experiment has residual.df = 36 - 12 = 24. So the old setting of\n-prior.n=10 is equivalent for your data to prior.df = 240, a very large\n-value. Going the other way, the new setting of prior.df=10 is equivalent\n-to prior.n=10/24.\n-\n-To recover old results with the current software you would use\n-\n- estimateTagwiseDisp(object, prior.df=240, trend="none")\n-\n-To get the new default from old software you would use\n-\n- estimateTagwiseDisp(object, prior.n=10/24, trend=TRUE)\n-\n-Actually the old trend method is equivalent to trend="loess" in the new\n-software. You should use plotBCV(object) to see whether a trend is\n-required.\n-\n-Note you could also use\n-\n- prior.n = getPriorN(object, prior.df=10)\n-\n-to map between prior.df and prior.n.\n-\n-</help>\n-\n-</tool>\n-\n-\n'
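
The relationship quoted in the removed help text, prior.df = prior.n * residual.df, can be checked with a few lines of arithmetic. The following is only a minimal sketch in Python, not part of the removed tool and not an edgeR API: the function names prior_df_from_prior_n and prior_n_from_prior_df are hypothetical helpers introduced here for illustration, and the numbers are the ones from the quoted email (36 samples, 12 coefficients, so residual.df = 24).

```python
# Hypothetical helpers (not edgeR functions): they only restate the
# arithmetic prior.df = prior.n * residual.df from the quoted email.

def prior_df_from_prior_n(prior_n, residual_df):
    """Map an old-style prior.n to the equivalent prior.df."""
    return prior_n * residual_df


def prior_n_from_prior_df(prior_df, residual_df):
    """Map a prior.df back to the equivalent old-style prior.n."""
    if residual_df <= 0:
        raise ValueError("residual_df must be positive")
    return float(prior_df) / residual_df


if __name__ == "__main__":
    residual_df = 36 - 12  # samples minus coefficients, as in the email
    print(prior_df_from_prior_n(10, residual_df))
    print(prior_n_from_prior_df(10, residual_df))
```

With residual.df = 24, the old setting prior.n=10 corresponds to prior.df=240, and the new setting prior.df=10 corresponds to prior.n=10/24, matching the figures in the email.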