annotate readme.rst @ 2:fc862d5bccaf draft

Uploaded
author bgruening
date Thu, 05 Sep 2013 12:42:48 -0400
parents a4ad586d1403
children 7068d1548234
rev   line source
0
ec66f9d90ef0 initial uploaded
bgruening
parents:
diff changeset
These are Galaxy wrappers for common unix text-processing tools
===============================================================

The initial work was done by Assaf Gordon and Greg Hannon's lab ( http://hannonlab.cshl.edu )
in Cold Spring Harbor Laboratory ( http://www.cshl.edu ).


The tools are:

* awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
* sed - Stream Editor ( http://sed.sf.net )
* grep - Search files ( http://www.gnu.org/software/grep/ )
* sort_columns - Sorts every line according to its columns
* GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):

  * sort - sort files
  * join - join two files, based on a common key field
  * cut - keep/discard fields from a file
  * unsorted_uniq - keep unique/duplicated lines in a file
  * sorted_uniq - keep unique/duplicated lines in a file
  * head - keep the first X lines in a file
  * tail - keep the last X lines in a file
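
The two uniq wrappers carry the same description in the list above. Going by their names, the difference is presumably whether the input has to be sorted first. The sketch below illustrates that distinction with plain command-line tools. The function names and the sample data are hypothetical, and the mapping to the actual wrappers is an assumption, not something taken from their source.

```shell
# Hypothetical sample input whose duplicates are NOT adjacent.
sample() { printf 'b\na\nb\na\n'; }

# Plain uniq only collapses adjacent duplicate lines,
# so it is only reliable on sorted input.
sorted_only() { uniq; }

# Order-preserving de-duplication for unsorted input:
# print a line only the first time it is seen.
unsorted_unique() { awk '!seen[$0]++'; }

sample | sorted_only          # unchanged: b, a, b, a
sample | sort | sorted_only   # a, b
sample | unsorted_unique      # b, a (first-seen order is kept)
```

The awk filter keeps every distinct line in memory, which may matter for very large datasets; sorting first and then running uniq trades the original line order for lower memory use.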

A few improvements over the standard tools:

* EasyJoin - A join tool that does not require pre-sorting the files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
* Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
* Find_and_Replace - Find/replace text in a line or a specific column.
* Grep with Perl syntax - uses grep with Perl-compatible regular expressions.
* HTML'd Grep - grep text in a file and produce highlighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )
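
To show why a join tool that accepts unsorted input is convenient: plain ``join`` expects both files to be sorted on the key field. The sketch below is one simple way to lift that requirement, by sorting copies of both inputs before joining. It is not the easyjoin script itself; the function name ``easy_join`` and the sample files are hypothetical, and it only handles whitespace-separated files keyed on the first field.

```shell
# easy_join FILE1 FILE2
# Join two files on their first field without requiring sorted input.
# Sketch only: sorts both inputs into temporary files, then runs plain join.
easy_join() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    # Use the same locale for sort and join so both agree on the ordering.
    LC_ALL=C sort -k1,1 "$1" > "$tmp1"
    LC_ALL=C sort -k1,1 "$2" > "$tmp2"
    LC_ALL=C join "$tmp1" "$tmp2"
    status=$?
    rm -f "$tmp1" "$tmp2"
    return $status
}

# Usage with two small unsorted sample files.
dir=$(mktemp -d)
printf 'c 3\na 1\nb 2\n' > "$dir/left.txt"
printf 'b x\nc y\na z\n' > "$dir/right.txt"
easy_join "$dir/left.txt" "$dir/right.txt"
# a 1 z
# b 2 x
# c 3 y
rm -rf "$dir"
```

The output comes back ordered by key rather than in the original line order, which is usually acceptable for a join.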

Requirements
------------

1. Coreutils version 8.19 or later.
2. AWK version 4.0.1 or later.
3. SED version 4.2 *with* a special patch
4. Grep with PCRE support

These will be installed automatically with the Galaxy Tool Shed.


-------------------
NOTE About Security
-------------------

The included tools are secure (barring unintentional bugs).
The main concern might be executing system commands with awk's "system" and sed's "e" commands,
or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
These commands are DISABLED using the "--sandbox" parameter to awk and sed.

A user trying to run an awk program similar to::

    BEGIN { system("ls") }

will get an error (in Galaxy) saying::

    fatal: 'system' function not allowed in sandbox mode.

A user trying to run a sed program similar to::

    1els

will get an error (in Galaxy) saying::

    sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode
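
The awk behaviour described above can be reproduced outside Galaxy. The following is a minimal sketch, assuming GNU awk is installed as ``gawk``; the helper name ``run_awk_sandboxed`` is hypothetical and is not how the wrapper itself is written.

```shell
# Run a user-supplied awk program with gawk's sandbox enabled.
# In sandbox mode gawk refuses system(), input redirection with getline,
# and output redirection with print/printf.
run_awk_sandboxed() {
    program=$1
    shift
    gawk --sandbox "$program" "$@"
}

if command -v gawk >/dev/null 2>&1; then
    # Ordinary text processing still works.
    printf 'a b\n' | run_awk_sandboxed '{ print $2 }'   # prints: b

    # A program that tries to shell out is stopped with a fatal error
    # and a non-zero exit status.
    run_awk_sandboxed 'BEGIN { system("ls") }' || echo "rejected by the sandbox"
fi
```

Passing the program as a single quoted argument, rather than building a command string, keeps user input from being interpreted by the shell before gawk sees it.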

That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it.

------------
Installation
------------

Should be done with the Galaxy `Tool Shed`_.

.. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed


----
TODO
----

- unit-tests
- uniq will get a new --group function with the 8.22 release; it is currently commented out
- also shuf will get a major performance improvement with large files ( http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=20d7bce0f7e57d9a98f0ee811e31c757e9fedfff );
  we can then remove the random feature from sort and use shuf instead
- move some advanced settings under a conditional, for example the cut tool offers to cut bytes
- cut wrapper has some output conditional magic for interval files, that needs to be checked
- comm wrapper, see the Galaxy default one
- evaluate the join wrappers against the Galaxy ones, maybe we should drop them
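
As background for the shuf item above, the two commands are not interchangeable when the input contains duplicate lines. This is a small sketch using GNU coreutils, with hypothetical sample data:

```shell
# Five lines, one value duplicated.
sample() { printf '1\n2\n2\n3\n4\n'; }

# shuf outputs a random permutation of the input lines.
sample | shuf

# sort -R orders lines by a random hash of the key,
# so identical lines always end up next to each other.
sample | sort -R
```

If the sort wrapper's random option is replaced by shuf, datasets with repeated lines would be shuffled line by line instead of being grouped, which is probably what users expect from a shuffle.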