comparison README.rst @ 2:317726be0703 draft

planemo upload for repository https://github.com/galaxyproject/tools-devteam/blob/master/tool_collections/kraken/kraken_filter/ commit cb6ebb843c71dcfc73aa05cc616f8e3229170108-dirty
author devteam
date Wed, 15 Jul 2015 15:22:22 -0400
1:f093ba52debe 2:317726be0703
Introduction
============

`Kraken <http://ccb.jhu.edu/software/kraken/>`__ is a taxonomic sequence
classifier that assigns taxonomic labels to short DNA reads. It does
this by examining the :math:`k`-mers within a read and querying a
database with those :math:`k`-mers. This database contains a mapping of
every :math:`k`-mer in
`Kraken <http://ccb.jhu.edu/software/kraken/>`__'s genomic library to
the lowest common ancestor (LCA) in a taxonomic tree of all genomes that
contain that :math:`k`-mer. The set of LCA taxa that correspond to the
:math:`k`-mers in a read is then analyzed to create a single taxonomic
label for the read; this label can be any of the nodes in the taxonomic
tree. `Kraken <http://ccb.jhu.edu/software/kraken/>`__ is designed to be
rapid, sensitive, and highly precise. Our tests on various real and
simulated data have shown
`Kraken <http://ccb.jhu.edu/software/kraken/>`__ to have slightly lower
sensitivity than Megablast and slightly higher precision. On a
set of simulated 100 bp reads,
`Kraken <http://ccb.jhu.edu/software/kraken/>`__ processed over 1.3
million reads per minute on a single core in normal operation, and over
4.1 million reads per minute in quick operation.

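The lookup-and-label idea described above can be illustrated with a toy
sketch. This is not Kraken's code: the tiny taxonomy, the
``kmer_to_lca`` dictionary, and the function names are all hypothetical,
and the scoring rule (sum the hits along each root-to-node path and keep
the best path) is an assumption about how the LCA hits might be
combined, since the paragraph above only says they are "analyzed". Ties
are broken arbitrarily and reverse complements are ignored.

```python
from collections import Counter


def kmers(read, k):
    """Yield every overlapping k-mer of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def path_to_root(taxon, parent):
    """Return the taxa from `taxon` up to the root, inclusive."""
    path = [taxon]
    while parent[taxon] is not None:  # the root's parent is None
        taxon = parent[taxon]
        path.append(taxon)
    return path


def classify(read, k, kmer_to_lca, parent):
    """Assign one taxon to a read, or None if no k-mer is in the database."""
    # Count how many k-mers of the read map to each LCA taxon.
    hits = Counter(
        kmer_to_lca[km] for km in kmers(read, k) if km in kmer_to_lca
    )
    if not hits:
        return None

    # Assumed scoring rule: a taxon's score is the number of hits on
    # itself plus all of its ancestors.
    def score(taxon):
        return sum(hits[t] for t in path_to_root(taxon, parent))

    return max(hits, key=score)


# Hypothetical toy data: a four-node taxonomy and a k=3 database.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "E. coli": "Bacteria",
    "Salmonella": "Bacteria",
}
KMER_TO_LCA = {
    "ACG": "Bacteria",
    "CGT": "E. coli",
    "GTA": "E. coli",
    "TAC": "Salmonella",
}

print(classify("ACGTAC", 3, KMER_TO_LCA, PARENT))  # E. coli
```

In the toy read, two :math:`k`-mers point at ``E. coli``, one at its
ancestor ``Bacteria``, and one at ``Salmonella``, so the path ending at
``E. coli`` collects the most support.
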
The latest released version of Kraken is available at the `Kraken
website <http://ccb.jhu.edu/software/kraken/>`__, and the latest updates
to the Kraken source code are available at the `Kraken GitHub
repository <https://github.com/DerrickWood/kraken>`__.

If you use `Kraken <http://ccb.jhu.edu/software/kraken/>`__ in your
research, please cite the `Kraken
paper <http://genomebiology.com/2014/15/3/R46>`__. Thank you!

System Requirements
===================

Note: Users concerned about the disk or memory requirements should read
the paragraph about MiniKraken, below.

-  **Disk space**: Construction of Kraken's standard database will
   require at least 160 GB of disk space. Customized databases may
   require more or less space. Disk space used is linearly proportional
   to the number of distinct :math:`k`-mers; as of Feb. 2015, Kraken's
   default database contains just under 6 billion (6e9) distinct
   :math:`k`-mers.

   In addition, the disk used to store the database should be
   locally-attached storage. Storing the database on a network
   filesystem (NFS) partition can cause Kraken's operation to be very
   slow, or to be stopped completely. As NFS accesses are much slower
   than local disk accesses, both preloading and database building will
   be slowed by use of NFS.

-  **Memory**: To run efficiently, Kraken requires enough free memory to
   hold the database in RAM. While this can be accomplished using a
   ramdisk, Kraken supplies a utility for loading the database into RAM
   via the OS cache. The default database size is 75 GB (as of Feb.
   2015), and so you will need at least that much RAM if you want to
   build or run with the default database.

-  **Dependencies**: Kraken currently makes extensive use of Linux
   utilities such as sed, find, and wget. Many scripts are written using
   the Bash shell, and the main scripts are written using Perl. Core
   programs needed to build the database and run the classifier are
   written in C++, and need to be compiled using g++. Multithreading is
   handled using OpenMP. Downloads of NCBI data are performed by wget
   and in some cases, by rsync. Most Linux systems that have any sort of
   development package installed will have all of the above listed
   programs and libraries available.

   Finally, if you want to build your own database, you will need to
   install the
   `Jellyfish <http://www.cbcb.umd.edu/software/jellyfish/>`__
   :math:`k`-mer counter. Note that Kraken only supports use of
   Jellyfish version 1. Jellyfish version 2 is not yet compatible with
   Kraken.

-  **Network connectivity**: Kraken's standard database build and
   download commands expect unfettered FTP and rsync access to the NCBI
   FTP server. If you're working behind a proxy, you may need to set
   certain environment variables (such as ``ftp_proxy`` or
   ``RSYNC_PROXY``) in order to get these commands to work properly.

-  **MiniKraken**: To allow users with low-memory computing environments
   to use Kraken, we supply a reduced standard database that can be
   downloaded from the Kraken web site. When Kraken is run with a
   reduced database, we call it MiniKraken.

   The database we make available is only 4 GB in size, and should run
   well on computers with as little as 8 GB of RAM. Disk space required
   for this database is also only 4 GB.

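The disk and memory figures in the list above lend themselves to a
rough planning calculation. The sketch below scales the quoted default
database (about 6e9 distinct :math:`k`-mers in 75 GB, roughly 12.5 bytes
per :math:`k`-mer) linearly to a custom database and compares a database
directory against installed RAM. The function names are hypothetical,
"GB" is assumed to mean 1e9 bytes, the per-:math:`k`-mer figure is only
the ratio of the two numbers quoted above, and the estimate covers the
finished database rather than the larger working space needed while
building it.

```python
import os

# Figures quoted in the list above (Feb. 2015); GB taken as 1e9 bytes.
DEFAULT_KMERS = 6e9
DEFAULT_DB_BYTES = 75e9


def estimate_db_bytes(num_distinct_kmers):
    """Scale the default database size linearly with the k-mer count."""
    bytes_per_kmer = DEFAULT_DB_BYTES / DEFAULT_KMERS  # about 12.5
    return num_distinct_kmers * bytes_per_kmer


def directory_size_bytes(path):
    """Sum the sizes of all regular files under a database directory."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def physical_memory_bytes():
    """Installed RAM in bytes, using sysconf names available on Linux."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def fits_in_ram(db_bytes, ram_bytes):
    """True if the whole database could be held in memory at once."""
    return db_bytes <= ram_bytes


if __name__ == "__main__":
    # A hypothetical custom database with half as many distinct k-mers.
    estimate = estimate_db_bytes(3e9)
    print(f"estimated size: {estimate / 1e9:.1f} GB")
    print("fits in this machine's RAM:",
          fits_in_ram(estimate, physical_memory_bytes()))
```

Note that the check compares against installed memory, while the list
above asks for enough *free* memory, so a passing result is necessary
but not sufficient.
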
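For the proxy case mentioned under network connectivity, one option is
to set the two variables only for the download process rather than for
the whole shell session. The sketch below does that from Python. The
helper names and the proxy address are placeholders, and the download
command itself is left as a parameter because this section does not name
it. ``ftp_proxy`` is given as a URL, which is the form wget reads, and
``RSYNC_PROXY`` as a ``host:port`` pair, which is the form rsync reads.

```python
import os
import subprocess


def proxy_environment(ftp_proxy_url, rsync_proxy_hostport, base_env=None):
    """Return a copy of the environment with the proxy variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["ftp_proxy"] = ftp_proxy_url            # read by wget, URL form
    env["RSYNC_PROXY"] = rsync_proxy_hostport   # read by rsync, host:port
    return env


def run_behind_proxy(command, ftp_proxy_url, rsync_proxy_hostport):
    """Run a command (given as an argument list) with the proxy set."""
    env = proxy_environment(ftp_proxy_url, rsync_proxy_hostport)
    return subprocess.run(command, env=env, check=True)


if __name__ == "__main__":
    # Placeholder proxy address; substitute your site's values.
    env = proxy_environment("http://proxy.example.org:3128/",
                            "proxy.example.org:3128")
    print(env["ftp_proxy"], env["RSYNC_PROXY"])
```

Whether these two variables are enough depends on the site's proxy
setup, which is why the list above describes them only as examples.
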