annotate README @ 0:b64409be8c69 draft default tip

Uploaded
author aaronpetkau
date Sat, 04 Jul 2015 11:25:36 -0400
Tool wrapper by Brian Yeo
brian.yeo@phac.aspc.gc.ca

INTRODUCTION

FLASH (Fast Length Adjustment of SHort reads) is an accurate and fast tool
to merge paired-end reads that were generated from DNA fragments whose
lengths are shorter than twice the length of reads. Merged read pairs result
in unpaired longer reads, which are generally more desirable in genome
assembly and genome analysis processes.

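The length condition above can be made concrete with a little arithmetic.
The following is a minimal sketch, assuming both reads in a pair have the
same length and the fragment is at least as long as one read; the function
name is hypothetical and is not part of FLASH.

```python
def expected_overlap(read_length, fragment_length):
    """Number of bases shared by the two reads of a pair.

    Assumes both reads have the same length and the fragment is at least
    one read long. A result of zero or less means the reads do not
    overlap, so the pair cannot be merged.
    """
    return 2 * read_length - fragment_length


# Two 101-bp reads from a 180-bp fragment share 22 bases.
print(expected_overlap(101, 180))
# Two 101-bp reads from a 250-bp fragment do not overlap at all.
print(expected_overlap(101, 250))
```

When the overlap is positive, the merged read has the same length as the
original fragment.
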
Briefly, the FLASH algorithm considers all possible overlaps at or above a
minimum length between the reads in a pair and chooses the overlap that
results in the lowest mismatch density (proportion of mismatched bases in
the overlapped region). Ties between multiple overlaps are broken by
considering quality scores at mismatch sites. When building the merged
sequence, FLASH computes a consensus sequence in the overlapped region.
More details can be found in the original publication
(http://bioinformatics.oxfordjournals.org/content/27/21/2957.full).

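The overlap search described above can be sketched in a few lines of Python.
This is an illustration only, not the FLASH implementation: the function
names, the threshold values, the tie-break rule (prefer the overlap whose
mismatches have the lower mean quality), and the consensus rule (keep the
base with the higher quality) are all assumptions made for this sketch.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def best_overlap(read1, qual1, read2_rc, qual2_rc,
                 min_overlap=10, max_density=0.25):
    """Return (overlap_length, mismatch_density) or None.

    The suffix of read1 is compared with the prefix of read2_rc for every
    overlap length from min_overlap upward. The thresholds are illustrative
    values chosen for this sketch.
    """
    best = None
    longest = min(len(read1), len(read2_rc))
    for olen in range(min_overlap, longest + 1):
        start = len(read1) - olen
        # Quality scores at the mismatching positions of this overlap.
        mismatch_quals = [
            min(qual1[start + i], qual2_rc[i])
            for i in range(olen)
            if read1[start + i] != read2_rc[i]
        ]
        density = len(mismatch_quals) / olen
        if density > max_density:
            continue
        mean_q = (sum(mismatch_quals) / len(mismatch_quals)
                  if mismatch_quals else 0.0)
        key = (density, mean_q)  # assumed tie-break: lower mean quality wins
        if best is None or key < best[0]:
            best = (key, olen)
    return None if best is None else (best[1], best[0][0])


def merge_pair(read1, qual1, read2, qual2, min_overlap=10, max_density=0.25):
    """Merge one read pair into a single sequence, or return None."""
    read2_rc = reverse_complement(read2)
    qual2_rc = qual2[::-1]
    found = best_overlap(read1, qual1, read2_rc, qual2_rc,
                         min_overlap, max_density)
    if found is None:
        return None
    olen, _density = found
    start = len(read1) - olen
    # Assumed consensus rule: at each overlapped position keep the base
    # with the higher quality score (read1 wins ties).
    consensus = "".join(
        read1[start + i] if qual1[start + i] >= qual2_rc[i] else read2_rc[i]
        for i in range(olen)
    )
    return read1[:start] + consensus + read2_rc[olen:]


fragment = "ACGTTGCAAGGCTCATCGGATACCGTTAGC"      # 30-bp fragment
r1 = fragment[:20]                                # forward read, 20 bp
r2 = reverse_complement(fragment[10:])            # reverse read, 20 bp
merged = merge_pair(r1, [30] * 20, r2, [30] * 20)
print(merged == fragment)
```

The sketch ignores details such as a maximum overlap length and the quality
scores of the merged read; the publication describes the actual method.
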
Limitations of FLASH include:
- FLASH cannot merge paired-end reads that do not overlap.
- FLASH cannot merge read pairs that have an outward orientation, either
  due to being "jumping" reads or due to excessive trimming.
- FLASH is not designed for data that has a significant number of indel
  errors (such as Sanger sequencing data). It is best suited for Illumina
  data.

INSTALLATION

On UNIX-compatible systems, including GNU/Linux and Mac OS X, you must compile
FLASH from source. The only dependency, other than functions that are expected
to be available in the C library, is the zlib data compression library. To
install FLASH, download the tarball, untar it, and compile the code using the
provided Makefile:

    $ tar xzf FLASH-1.2.9.tar.gz
    $ cd FLASH-1.2.9
    $ make

The executable file that is produced is named 'flash'. To run it from the
command line, you must either copy it to a directory listed in your $PATH
variable or run it with a path that includes a directory, such as "./flash".

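If FLASH is driven from a script, the same lookup rule can be applied
programmatically. The sketch below is a hypothetical helper, not part of
FLASH: it checks $PATH first and then falls back to a build directory, which
is assumed to be the directory where 'make' was run.

```python
import os
import shutil


def locate_flash(build_dir="."):
    """Return a path to the 'flash' executable, or None if it is not found.

    The lookup order is an assumption of this sketch: $PATH first, then
    the given build directory (the equivalent of running "./flash").
    """
    on_path = shutil.which("flash")
    if on_path is not None:
        return on_path
    candidate = os.path.join(build_dir, "flash")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return os.path.abspath(candidate)
    return None


path = locate_flash()
if path is None:
    print("flash not found; compile it or add it to $PATH")
else:
    print("using", path)
```
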
FLASH also runs on Windows, and you can compile it on Windows using MinGW.
However, for convenience you may instead download a standalone Windows binary
from the SourceForge page (https://sourceforge.net/projects/flashpage/).

USAGE

Please compile FLASH and run 'flash --help' to see command-line usage
information and details about input and output files.

MULTITHREADING

By default, FLASH uses multiple threads. There are "combiner" threads that do
the actual read combining, as well as up to 5 threads that are used for I/O (up
to 2 readers, up to 3 writers). The default number of combiner threads is the
number of processors; however, it can be adjusted with the -t option (long
option: --threads).

When multiple combiner threads are used, the order of the combined and
uncombined reads in the output files will be nondeterministic. If you need
the output reads to appear in the same order as the input, you must specify
--threads=1.

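A script that wraps FLASH can make the ordering requirement explicit. The
following sketch only builds the argument list. The helper name and the
input file names are hypothetical, and passing the two read files as
positional arguments is an assumption; check 'flash --help' for the exact
usage.

```python
def flash_command(mates1, mates2, threads=None, preserve_order=False,
                  executable="flash"):
    """Build an argument list for running FLASH on one pair of read files.

    preserve_order=True forces --threads=1 so that output reads keep the
    input order; otherwise an explicit thread count is passed only when
    one is given, leaving FLASH to pick its default.
    """
    if threads is not None and threads < 1:
        raise ValueError("threads must be at least 1")
    cmd = [executable]
    if preserve_order:
        cmd.append("--threads=1")
    elif threads is not None:
        cmd.append(f"--threads={threads}")
    cmd += [mates1, mates2]
    return cmd


# Hypothetical input file names, used only for illustration.
print(flash_command("reads_1.fastq", "reads_2.fastq", preserve_order=True))
```

The resulting list can be handed to subprocess.run() once the executable has
been built and is reachable.
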
PERFORMANCE

Since the FLASH algorithm considers each read pair independently, FLASH will, by
default, process read pairs in parallel. FLASH v1.2.9 and later also make use
of vector instructions available on modern x86 CPUs. Consequently, FLASH runs
quickly, even on low-cost computing resources. As an example, we ran FLASH
v1.2.9 on a laptop with a dual-core 2.3 GHz AMD x86_64 processor, and it
processed one million 101-bp read pairs in 11.6 seconds with the default
parameters. Less than 2 MB of memory was used. Actual timings will vary,
depending primarily on the number of CPUs available, the speed of each CPU,
and the I/O speed of reading the input files and writing the output files.
FLASH is designed to scale to dozens of processors, although its speed may be
limited by I/O in such cases.

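The benchmark above works out to roughly 86,000 read pairs per second
(1,000,000 / 11.6). As a rough planning aid, the sketch below scales that
figure linearly to other input sizes. Linear scaling on comparable hardware
is an assumption of this sketch, and the names are hypothetical; real run
times depend on the factors listed above.

```python
# Figures taken from the benchmark described in this README.
REFERENCE_PAIRS = 1_000_000
REFERENCE_SECONDS = 11.6


def estimate_seconds(num_pairs):
    """Estimate run time by scaling the reference benchmark linearly."""
    return num_pairs * REFERENCE_SECONDS / REFERENCE_PAIRS


pairs_per_second = REFERENCE_PAIRS / REFERENCE_SECONDS
print(f"about {pairs_per_second:,.0f} read pairs per second")
print(f"50 million pairs: about {estimate_seconds(50_000_000) / 60:.1f} minutes")
```
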
ACCURACY

With a read error rate of 1% or less, FLASH processes over 99% of read pairs
correctly. With an error rate of 2%, FLASH processes over 98% of read pairs
correctly when default parameters are used. With more aggressive parameters
(i.e., -x 0.35), FLASH processes over 90% of read pairs correctly even when the
error rate is 5%.

PUBLICATION

Title: FLASH: fast length adjustment of short reads to improve genome assemblies
Authors: Tanja Magoč and Steven L. Salzberg
URL: http://bioinformatics.oxfordjournals.org/content/27/21/2957.full

LICENSE

FLASH is released under the GNU General Public License Version 3 or later (see
COPYING).

COMMENTS/QUESTIONS/REQUESTS

Send an e-mail to flash.comment@gmail.com

Other versions are available from the SourceForge page:

https://sourceforge.net/projects/flashpage/