# HG changeset patch
# User peterjc
# Date 1430991830 14400
# Node ID 367a0403b7d2868f5e8b7ac00943021513939abd
# Parent 035727913cae6f16d9078bb8da969e2796ecb24d
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/venn_list commit 6c4ac223d511bbcd0ec9cbada730613a5fe9f1af-dirty
diff -r 035727913cae -r 367a0403b7d2 README.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.rst Thu May 07 05:43:50 2015 -0400
@@ -0,0 +1,125 @@
+Galaxy tool to draw a Venn Diagram with up to 3 sets
+====================================================
+
+This tool is copyright 2011-2015 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below.
+
+This tool is a short Python script (using both the Galaxy and Biopython library
+functions) to extract ID lists from tabular, FASTA, FASTQ or SFF files to build
+sets, which are then drawn using the R limma package function vennDiagram
+(called from Python using rpy).
+
+This tool is available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/peterjc/venn_list
+
+
+Automated Installation
+======================
+
+This should be straightforward, Galaxy should automatically download the tool
+and the Biopython dependency.
+
+You will still need to install the R/Bioconductor package limma.
+
+
+Manual Installation
+===================
+
+There are just two files to install:
+
+* ``venn_list.py`` (the Python script)
+* ``venn_list.xml`` (the Galaxy tool definition)
+
+The suggested location is in the Galaxy folder ``tools/plotting`` next to other
+graph drawing tools, or a dedicated ``tools/venn_list`` directory.
+
+You will also need to install Biopython 1.54 or later, and the R/Bioconductor
+package limma. You should already have rpy installed for other Galaxy tools.
+
+You will also need to modify the ``tools_conf.xml`` file to tell Galaxy to offer the
+tool. The suggested location is in the "Graph/Display Data" section. Simply add
+the line::
+
+
+
+If you wish to run the unit tests, also move/copy the ``test-data/`` files
+under Galaxy's ``test-data/`` folder. Then::
+
+    ./run_tests.sh -id venn_list
+
+
+History
+=======
+
+======= ======================================================================
+Version Changes
+------- ----------------------------------------------------------------------
+v0.0.3  - Initial public release.
+v0.0.4  - Ignore blank lines when loading IDs from tabular files
+v0.0.5  - Explicit Galaxy error handling of return codes
+v0.0.6  - Added unit tests.
+        - Use reStructuredText for this README file.
+        - Adopt standard MIT licence.
+        - Updated citation information (Cock et al. 2013).
+        - Development moved to GitHub, https://github.com/peterjc/pico_galaxy
+v0.0.7  - Renamed folder and README file.
+        - Tool definition now embeds citation information.
+v0.0.8  - Reorder XML elements (internal change only).
+        - Fixed and improved error handling when rpy is not available.
+        - Test output relaxed to cope with more variation in PDF output.
+        - Declare Biopython dependency via the Tool Shed.
+======= ======================================================================
+
+
+Developers
+==========
+
+This script and related tools were initially developed on the following hg branch:
+http://bitbucket.org/peterjc/galaxy-central/src/tools
+
+Development has now moved to a dedicated GitHub repository:
+https://github.com/peterjc/pico_galaxy
+
+For pushing a release to the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/
+use the following Planemo command (which requires you have set your Tool Shed
+access details in ``~/.planemo.yml`` and that you have access rights on the Tool Shed)::
+
+    $ planemo shed_upload --tar_only ~/repositories/pico_galaxy/tools/venn_list/
+    ...
+    $ tar -tzf shed_upload.tar.gz
+    README.rst
+    test-data/magic.pdf
+    test-data/rhodopsin_proteins.fasta
+    test-data/venn_list.tabular
+    tool_dependencies.xml
+    venn_list.py
+    venn_list.xml
+
+This tar-ball can be uploaded to the Tool Shed via the web interface (using
+the ``--tar`` command or via
+Planemo. More simply, the following single command can be used:
+
+    $ planemo shed_upload
+
+
+Licence (MIT)
+=============
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
diff -r 035727913cae -r 367a0403b7d2 tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml Thu May 07 05:43:50 2015 -0400
@@ -0,0 +1,6 @@
+
+
+
+
+
+
diff -r 035727913cae -r 367a0403b7d2 tools/venn_list/README.rst
--- a/tools/venn_list/README.rst Thu Apr 30 05:53:34 2015 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,120 +0,0 @@
-Galaxy tool to draw a Venn Diagram with up to 3 sets
-====================================================
-
-This tool is copyright 2011-2015 by Peter Cock, The James Hutton Institute
-(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
-See the licence text below.
-
-This tool is a short Python script (using both the Galaxy and Biopython library
-functions) to extract ID lists from tabular, FASTA, FASTQ or SFF files to build
-sets, which are then drawn using the R limma package function vennDiagram
-(called from Python using rpy).
-
-This tool is available from the Galaxy Tool Shed at:
-http://toolshed.g2.bx.psu.edu/view/peterjc/venn_list
-
-
-Automated Installation
-======================
-
-This should be straightforward, Galaxy should automatically download the tool
-and the Biopython dependency.
-
-You will still need to install the R/Bioconductor package limma.
-
-
-Manual Installation
-===================
-
-There are just two files to install:
-
-* ``venn_list.py`` (the Python script)
-* ``venn_list.xml`` (the Galaxy tool definition)
-
-The suggested location is in the Galaxy folder ``tools/plotting`` next to other
-graph drawing tools, or a dedicated ``tools/venn_list`` directory.
-
-You will also need to install Biopython 1.54 or later, and the R/Bioconductor
-package limma. You should already have rpy installed for other Galaxy tools.
-
-You will also need to modify the ``tools_conf.xml`` file to tell Galaxy to offer the
-tool. The suggested location is in the "Graph/Display Data" section. Simply add
-the line::
-
-
-
-If you wish to run the unit tests, also move/copy the ``test-data/`` files
-under Galaxy's ``test-data/`` folder. Then::
-
-    ./run_tests.sh -id venn_list
-
-
-History
-=======
-
-======= ======================================================================
-Version Changes
-------- ----------------------------------------------------------------------
-v0.0.3  - Initial public release.
-v0.0.4  - Ignore blank lines when loading IDs from tabular files
-v0.0.5  - Explicit Galaxy error handling of return codes
-v0.0.6  - Added unit tests.
-        - Use reStructuredText for this README file.
-        - Adopt standard MIT licence.
-        - Updated citation information (Cock et al. 2013).
-        - Development moved to GitHub, https://github.com/peterjc/pico_galaxy
-v0.0.7  - Renamed folder and README file.
-        - Tool definition now embeds citation information.
-v0.0.8  - Reorder XML elements (internal change only).
-        - Fixed and improved error handling when rpy is not available.
-        - Test output relaxed to cope with more variation in PDF output.
-        - Declare Biopython dependency via the Tool Shed.
-======= ======================================================================
-
-
-Developers
-==========
-
-This script and related tools were initially developed on the following hg branch:
-http://bitbucket.org/peterjc/galaxy-central/src/tools
-
-Development has now moved to a dedicated GitHub repository:
-https://github.com/peterjc/pico_galaxy
-
-For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
-the following command from the Galaxy root folder::
-
-    $ tar -czf venn_list.tar.gz tools/venn_list/README.rst tools/venn_list/venn_list.* tools/venn_list/tool_dependencies.xml test-data/magic.pdf test-data/venn_list.tabular test-data/rhodopsin_proteins.fasta
-
-Check this worked::
-
-    $ tar -tzf venn_list.tar.gz
-    tools/venn_list/README.rst
-    tools/venn_list/venn_list.py
-    tools/venn_list/venn_list.xml
-    tools/venn_list/tool_dependencies.xml
-    test-data/magic.pdf
-    test-data/venn_list.tabular
-    test-data/rhodopsin_proteins.fasta
-
-
-Licence (MIT)
-=============
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.
diff -r 035727913cae -r 367a0403b7d2 tools/venn_list/tool_dependencies.xml
--- a/tools/venn_list/tool_dependencies.xml Thu Apr 30 05:53:34 2015 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,6 +0,0 @@
-
-
-
-
-
-
diff -r 035727913cae -r 367a0403b7d2 tools/venn_list/venn_list.py
--- a/tools/venn_list/venn_list.py Thu Apr 30 05:53:34 2015 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,137 +0,0 @@
-#!/usr/bin/env python
-"""Plot up to 3-way Venn Diagram using R limma vennDiagram (via rpy)
-
-This script is copyright 2010 by Peter Cock, The James Hutton Institute
-(formerly SCRI), UK. All rights reserved.
-See accompanying text file for licence details (MIT/BSD style).
-
-This is version 0.0.8 of the script.
-"""
-
-
-import sys
-
-def sys_exit(msg, error_level=1):
-    """Print error message to stdout and quit with given error level."""
-    sys.stderr.write("%s\n" % msg)
-    sys.exit(error_level)
-
-try:
-    import rpy
-except ImportError:
-    sys_exit("Requires the Python library rpy (to call R)")
-except RuntimeError, e:
-    sys_exit("The Python library rpy is not availble for the current R version\n\n%s" % e)
-
-try:
-    rpy.r.library("limma")
-except:
-    sys_exit("Requires the R library limma (for vennDiagram function)")
-
-
-if len(sys.argv)-1 not in [7, 10, 13]:
-    sys_exit("Expected 7, 10 or 13 arguments (for 1, 2 or 3 sets), not %i" % (len(sys.argv)-1))
-
-all_file, all_type, all_label = sys.argv[1:4]
-set_data = []
-if len(sys.argv)-1 >= 7:
-    set_data.append(tuple(sys.argv[4:7]))
-if len(sys.argv)-1 >= 10:
-    set_data.append(tuple(sys.argv[7:10]))
-if len(sys.argv)-1 >= 13:
-    set_data.append(tuple(sys.argv[10:13]))
-pdf_file = sys.argv[-1]
-n = len(set_data)
-print "Doing %i-way Venn Diagram" % n
-
-def load_ids(filename, filetype):
-    if filetype=="tabular":
-        for line in open(filename):
-            line = line.rstrip("\n")
-            if line and not line.startswith("#"):
-                yield line.split("\t",1)[0]
-    elif filetype=="fasta":
-        for line in open(filename):
-            if line.startswith(">"):
-                yield line[1:].rstrip("\n").split(None,1)[0]
-    elif filetype.startswith("fastq"):
-        #Use the Galaxy library not Biopython to cope with CS
-        from galaxy_utils.sequence.fastq import fastqReader
-        handle = open(filename, "rU")
-        for record in fastqReader(handle):
-            #The [1:] is because the fastaReader leaves the @ on the identifer.
-            yield record.identifier.split()[0][1:]
-        handle.close()
-    elif filetype=="sff":
-        try:
-            from Bio.SeqIO import index
-        except ImportError:
-            sys_exit("Require Biopython 1.54 or later (to read SFF files)")
-        #This will read the SFF index block if present (very fast)
-        for name in index(filename, "sff"):
-            yield name
-    else:
-        sys_exit("Unexpected file type %s" % filetype)
-
-def load_ids_whitelist(filename, filetype, whitelist):
-    for name in load_ids(filename, filetype):
-        if name in whitelist:
-            yield name
-        else:
-            sys_exit("Unexpected ID %s in %s file %s" % (name, filetype, filename))
-
-if all_file in ["", "-", '""', '"-"']:
-    #Load without white list
-    sets = [set(load_ids(f,t)) for (f,t,c) in set_data]
-    #Take union
-    all = set()
-    for s in sets:
-        all.update(s)
-    print "Inferred total of %i IDs" % len(all)
-else:
-    all = set(load_ids(all_file, all_type))
-    print "Total of %i IDs" % len(all)
-    sets = [set(load_ids_whitelist(f,t,all)) for (f,t,c) in set_data]
-
-for s, (f,t,c) in zip(sets, set_data):
-    print "%i in %s" % (len(s), c)
-
-#Now call R library to draw simple Venn diagram
-try:
-    #Create dummy Venn diagram counts object for three groups
-    cols = 'c("%s")' % '","'.join("Set%i" % (i+1) for i in range(n))
-    rpy.r('groups <- cbind(%s)' % ','.join(['1']*n))
-    rpy.r('colnames(groups) <- %s' % cols)
-    rpy.r('vc <- vennCounts(groups)')
-    #Populate the 2^n classes with real counts
-    #Don't make any assumptions about the class order
-    #print rpy.r('vc')
-    for index, row in enumerate(rpy.r('vc[,%s]' % cols)):
-        if isinstance(row, int) or isinstance(row, float):
-            #Hack for rpy being too clever for single element row
-            row = [row]
-        names = all
-        for wanted, s in zip(row, sets):
-            if wanted:
-                names = names.intersection(s)
-            else:
-                names = names.difference(s)
-        rpy.r('vc[%i,"Counts"] <- %i' % (index+1, len(names)))
-    #print rpy.r('vc')
-    if n == 1:
-        #Single circle, don't need to add (Total XXX) line
-        names = [c for (t,f,c) in set_data]
-    else:
-        names = ["%s\n(Total %i)" % (c, len(s)) for s, (f,t,c) in zip(sets, set_data)]
-    rpy.r.assign("names", names)
-    rpy.r.assign("colors", ["red","green","blue"][:n])
-    rpy.r.pdf(pdf_file, 8, 8)
-    rpy.r("""vennDiagram(vc, include="both", names=names,
-                         main="%s", sub="(Total %i)",
-                         circle.col=colors)
-                         """ % (all_label, len(all)))
-    rpy.r.dev_off()
-except Exception, exc:
-    sys_exit( "%s" %str( exc ) )
-rpy.r.quit( save="no" )
-print "Done"
diff -r 035727913cae -r 367a0403b7d2 tools/venn_list/venn_list.xml
--- a/tools/venn_list/venn_list.xml Thu Apr 30 05:53:34 2015 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,130 +0,0 @@
-
-    from lists
-
-        rpy
-        biopython
-        Bio
-
-
-
-
-
-
-
-venn_list.py
-#if $universe.type_select=="implicit":
-  - -
-#else:
-  "$main" $main.ext
-#end if
-"$main_lab"
-#for $s in $sets:
-  "$s.set" $s.set.ext "$s.lab"
-#end for
-$PDF
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-.. class:: infomark
-
-**TIP:** If your data is in tabular files, the identifier is assumed to be in column one.
-
-**What it does**
-
-Draws Venn Diagram for one, two or three sets (as a PDF file).
-
-You must supply one, two or three sets of identifiers -- corresponding
-to one, two or three circles on the Venn Diagram.
-
-In general you should also give the full list of all the identifiers
-explicitly. This is used to calculate the number of identifers outside
-the circles (and check the identifiers in the other files match up).
-The full list can be omitted by implicitly taking the union of the
-category sets. In this case, the count outside the categories (circles)
-will always be zero.
-
-The identifiers can be taken from the first column of a tabular file
-(e.g. query names in BLAST tabular output, or signal peptide predictions
-after filtering, etc), or from a sequence file (FASTA, FASTQ, SFF).
-
-For example, you may have a set of NGS reads (as a FASTA, FASTQ or SFF
-file), and the results of several different read mappings (e.g. to
-different references) as tabular files (filtered to have just the mapped
-reads). You could then show the different mappings (and their overlaps)
-as a Venn Diagram, and the outside count would be the unmapped reads.
-
-**Citations**
-
-The Venn Diagrams are drawn using Gordon Smyth's limma package from
-R/Bioconductor, http://www.bioconductor.org/
-
-The R library is called from Python via rpy, http://rpy.sourceforge.net/
-
-If you use this Galaxy tool in work leading to a scientific publication please
-cite:
-
-Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
-Galaxy tools and workflows for sequence analysis with applications
-in molecular plant pathology. PeerJ 1:e167
-http://dx.doi.org/10.7717/peerj.167
-
-This tool uses Biopython to read and write SFF files, so you may also wish to
-cite the Biopython application note (and Galaxy too of course):
-
-Cock et al 2009. Biopython: freely available Python tools for computational
-molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
-http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
-
-
-
-        10.7717/peerj.167
-        10.1093/bioinformatics/15.5.356
-
-
diff -r 035727913cae -r 367a0403b7d2 venn_list.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/venn_list.py Thu May 07 05:43:50 2015 -0400
@@ -0,0 +1,137 @@
+#!/usr/bin/env python
+"""Plot up to 3-way Venn Diagram using R limma vennDiagram (via rpy)
+
+This script is copyright 2010 by Peter Cock, The James Hutton Institute
+(formerly SCRI), UK. All rights reserved.
+See accompanying text file for licence details (MIT/BSD style).
+
+This is version 0.0.8 of the script.
+"""
+
+
+import sys
+
+def sys_exit(msg, error_level=1):
+    """Print error message to stdout and quit with given error level."""
+    sys.stderr.write("%s\n" % msg)
+    sys.exit(error_level)
+
+try:
+    import rpy
+except ImportError:
+    sys_exit("Requires the Python library rpy (to call R)")
+except RuntimeError, e:
+    sys_exit("The Python library rpy is not availble for the current R version\n\n%s" % e)
+
+try:
+    rpy.r.library("limma")
+except:
+    sys_exit("Requires the R library limma (for vennDiagram function)")
+
+
+if len(sys.argv)-1 not in [7, 10, 13]:
+    sys_exit("Expected 7, 10 or 13 arguments (for 1, 2 or 3 sets), not %i" % (len(sys.argv)-1))
+
+all_file, all_type, all_label = sys.argv[1:4]
+set_data = []
+if len(sys.argv)-1 >= 7:
+    set_data.append(tuple(sys.argv[4:7]))
+if len(sys.argv)-1 >= 10:
+    set_data.append(tuple(sys.argv[7:10]))
+if len(sys.argv)-1 >= 13:
+    set_data.append(tuple(sys.argv[10:13]))
+pdf_file = sys.argv[-1]
+n = len(set_data)
+print "Doing %i-way Venn Diagram" % n
+
+def load_ids(filename, filetype):
+    if filetype=="tabular":
+        for line in open(filename):
+            line = line.rstrip("\n")
+            if line and not line.startswith("#"):
+                yield line.split("\t",1)[0]
+    elif filetype=="fasta":
+        for line in open(filename):
+            if line.startswith(">"):
+                yield line[1:].rstrip("\n").split(None,1)[0]
+    elif filetype.startswith("fastq"):
+        #Use the Galaxy library not Biopython to cope with CS
+        from galaxy_utils.sequence.fastq import fastqReader
+        handle = open(filename, "rU")
+        for record in fastqReader(handle):
+            #The [1:] is because the fastaReader leaves the @ on the identifer.
+            yield record.identifier.split()[0][1:]
+        handle.close()
+    elif filetype=="sff":
+        try:
+            from Bio.SeqIO import index
+        except ImportError:
+            sys_exit("Require Biopython 1.54 or later (to read SFF files)")
+        #This will read the SFF index block if present (very fast)
+        for name in index(filename, "sff"):
+            yield name
+    else:
+        sys_exit("Unexpected file type %s" % filetype)
+
+def load_ids_whitelist(filename, filetype, whitelist):
+    for name in load_ids(filename, filetype):
+        if name in whitelist:
+            yield name
+        else:
+            sys_exit("Unexpected ID %s in %s file %s" % (name, filetype, filename))
+
+if all_file in ["", "-", '""', '"-"']:
+    #Load without white list
+    sets = [set(load_ids(f,t)) for (f,t,c) in set_data]
+    #Take union
+    all = set()
+    for s in sets:
+        all.update(s)
+    print "Inferred total of %i IDs" % len(all)
+else:
+    all = set(load_ids(all_file, all_type))
+    print "Total of %i IDs" % len(all)
+    sets = [set(load_ids_whitelist(f,t,all)) for (f,t,c) in set_data]
+
+for s, (f,t,c) in zip(sets, set_data):
+    print "%i in %s" % (len(s), c)
+
+#Now call R library to draw simple Venn diagram
+try:
+    #Create dummy Venn diagram counts object for three groups
+    cols = 'c("%s")' % '","'.join("Set%i" % (i+1) for i in range(n))
+    rpy.r('groups <- cbind(%s)' % ','.join(['1']*n))
+    rpy.r('colnames(groups) <- %s' % cols)
+    rpy.r('vc <- vennCounts(groups)')
+    #Populate the 2^n classes with real counts
+    #Don't make any assumptions about the class order
+    #print rpy.r('vc')
+    for index, row in enumerate(rpy.r('vc[,%s]' % cols)):
+        if isinstance(row, int) or isinstance(row, float):
+            #Hack for rpy being too clever for single element row
+            row = [row]
+        names = all
+        for wanted, s in zip(row, sets):
+            if wanted:
+                names = names.intersection(s)
+            else:
+                names = names.difference(s)
+        rpy.r('vc[%i,"Counts"] <- %i' % (index+1, len(names)))
+    #print rpy.r('vc')
+    if n == 1:
+        #Single circle, don't need to add (Total XXX) line
+        names = [c for (t,f,c) in set_data]
+    else:
+        names = ["%s\n(Total %i)" % (c, len(s)) for s, (f,t,c) in zip(sets, set_data)]
+    rpy.r.assign("names", names)
+    rpy.r.assign("colors", ["red","green","blue"][:n])
+    rpy.r.pdf(pdf_file, 8, 8)
+    rpy.r("""vennDiagram(vc, include="both", names=names,
+                         main="%s", sub="(Total %i)",
+                         circle.col=colors)
+                         """ % (all_label, len(all)))
+    rpy.r.dev_off()
+except Exception, exc:
+    sys_exit( "%s" %str( exc ) )
+rpy.r.quit( save="no" )
+print "Done"
diff -r 035727913cae -r 367a0403b7d2 venn_list.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/venn_list.xml Thu May 07 05:43:50 2015 -0400
@@ -0,0 +1,130 @@
+
+    from lists
+
+        rpy
+        biopython
+        Bio
+
+
+
+
+
+
+
+venn_list.py
+#if $universe.type_select=="implicit":
+  - -
+#else:
+  "$main" $main.ext
+#end if
+"$main_lab"
+#for $s in $sets:
+  "$s.set" $s.set.ext "$s.lab"
+#end for
+$PDF
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+.. class:: infomark
+
+**TIP:** If your data is in tabular files, the identifier is assumed to be in column one.
+
+**What it does**
+
+Draws Venn Diagram for one, two or three sets (as a PDF file).
+
+You must supply one, two or three sets of identifiers -- corresponding
+to one, two or three circles on the Venn Diagram.
+
+In general you should also give the full list of all the identifiers
+explicitly. This is used to calculate the number of identifers outside
+the circles (and check the identifiers in the other files match up).
+The full list can be omitted by implicitly taking the union of the
+category sets. In this case, the count outside the categories (circles)
+will always be zero.
+
+The identifiers can be taken from the first column of a tabular file
+(e.g. query names in BLAST tabular output, or signal peptide predictions
+after filtering, etc), or from a sequence file (FASTA, FASTQ, SFF).
+
+For example, you may have a set of NGS reads (as a FASTA, FASTQ or SFF
+file), and the results of several different read mappings (e.g. to
+different references) as tabular files (filtered to have just the mapped
+reads). You could then show the different mappings (and their overlaps)
+as a Venn Diagram, and the outside count would be the unmapped reads.
+
+**Citations**
+
+The Venn Diagrams are drawn using Gordon Smyth's limma package from
+R/Bioconductor, http://www.bioconductor.org/
+
+The R library is called from Python via rpy, http://rpy.sourceforge.net/
+
+If you use this Galaxy tool in work leading to a scientific publication please
+cite:
+
+Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
+Galaxy tools and workflows for sequence analysis with applications
+in molecular plant pathology. PeerJ 1:e167
+http://dx.doi.org/10.7717/peerj.167
+
+This tool uses Biopython to read and write SFF files, so you may also wish to
+cite the Biopython application note (and Galaxy too of course):
+
+Cock et al 2009. Biopython: freely available Python tools for computational
+molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
+http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
+
+
+
+        10.7717/peerj.167
+        10.1093/bioinformatics/15.5.356
+
+