Galaxy wrappers for common unix text-processing tools
=====================================================

The initial work was done by Assaf Gordon and Greg Hannon's lab ( http://hannonlab.cshl.edu )
in Cold Spring Harbor Laboratory ( http://www.cshl.edu ).


Tools:

* awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
* sed - Stream Editor ( http://sed.sf.net )
* grep - Search files ( http://www.gnu.org/software/grep/ )
* sort_columns - Sorting every line according to its columns
* GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):

  * sort - sort files
  * join - join two files, based on a common key field.
  * cut - keep/discard fields from a file
  * unsorted_uniq - keep unique/duplicated lines in an unsorted file
  * sorted_uniq - keep unique/duplicated lines in a sorted file
  * head - keep the first X lines in a file.
  * tail - keep the last X lines in a file.

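
The two uniq wrappers in the list above have nearly identical descriptions; their names suggest that they differ in whether the input has to be sorted beforehand. The Python sketch below illustrates that difference under this assumption. It is not the wrappers' implementation, and the function names are only borrowed for illustration.

```python
from itertools import groupby


def sorted_uniq(lines):
    """Collapse runs of adjacent identical lines.

    This only removes every duplicate if the input is already sorted,
    which is the assumption made here about the sorted_uniq wrapper.
    """
    return [key for key, _group in groupby(lines)]


def unsorted_uniq(lines):
    """Keep the first occurrence of every line, preserving input order.

    Assumption: this is one plausible behaviour for an "unsorted" uniq;
    the real wrapper may order its output differently.
    """
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


if __name__ == "__main__":
    data = ["b", "a", "b", "a", "a"]
    print(sorted_uniq(data))          # only adjacent duplicates are collapsed
    print(sorted_uniq(sorted(data)))  # every duplicate is collapsed after sorting
    print(unsorted_uniq(data))        # every duplicate is dropped, order kept
```

On unsorted input the first call still returns repeated values, which is why a sorted-input tool needs a preceding sort step.
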

A few improvements over the standard tools:

* EasyJoin - A Join tool that does not require pre-sorted files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
* Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
* Find_and_Replace - Find/Replace text in a line or specific column.
* Grep with Perl syntax - uses grep with Perl-Compatible regular expressions.
* HTML'd Grep - grep text in a file and produce highlighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )

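
To illustrate why a join does not have to depend on pre-sorted input, here is a minimal Python sketch of a hash-based join on a key column. This is an assumption about one possible approach, not a description of how EasyJoin is implemented (it may just as well sort its inputs first). ``hash_join`` and its parameters are hypothetical names.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows, left_key=0, right_key=0):
    """Inner-join two lists of rows (each row a list of fields) on a key column.

    Neither input needs to be sorted: the right-hand rows are indexed
    in a dictionary, then the left-hand rows are scanned once.
    """
    index = defaultdict(list)
    for row in right_rows:
        index[row[right_key]].append(row)

    joined = []
    for row in left_rows:
        # .get() avoids creating empty entries for keys without a match.
        for match in index.get(row[left_key], []):
            # Emit the left row followed by the right row minus its key field.
            rest = match[:right_key] + match[right_key + 1:]
            joined.append(row + rest)
    return joined


if __name__ == "__main__":
    genes = [["g2", "chr1"], ["g1", "chr3"], ["g3", "chr2"]]
    scores = [["g3", "0.7"], ["g1", "0.1"]]
    for fields in hash_join(genes, scores):
        print("\t".join(fields))
```

The output follows the order of the left-hand input, so rows without a partner (``g2`` in the example) are simply skipped.
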


Requirements
------------

1. Coreutils version 8.19 or later.
2. AWK version 4.0.1 or later.
3. SED version 4.2 *with* a special patch.
4. Grep with PCRE support.

These will be installed automatically with the Galaxy `Tool Shed`_.


-------------------
NOTE About Security
-------------------

The included tools are secure (barring unintentional bugs):
The main concern might be executing system commands with awk's "system" and sed's "e" commands,
or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
These commands are DISABLED using the "--sandbox" parameter to awk and sed.

A user trying to run an awk program similar to::

    BEGIN { system("ls") }

will get an error (in Galaxy) saying::

    fatal: 'system' function not allowed in sandbox mode.

A user trying to run a sed program similar to::

    1els

will get an error (in Galaxy) saying::

    sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode

That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it.

|
72 ------------
|
|
73 Installation
|
|
74 ------------
|
|
75
|
3
|
76 Should be done via the Galaxy `Tool Shed`_.
|
0
|
77
|
|
78 .. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed
|
|
79
|
|
80
|
|

----
TODO
----

- unit-tests
- uniq will get a new --group function with the 8.22 release; it is currently commented out
- shuf will also get a major performance improvement with large files
  ( http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=20d7bce0f7e57d9a98f0ee811e31c757e9fedfff );
  we can then remove the random feature from sort and use shuf instead
- move some advanced settings under a conditional, for example the cut tool offers to cut bytes
- the cut wrapper has some conditional output magic for interval files that needs to be checked
- comm wrapper, see the Galaxy default one
- evaluate the join wrappers against the Galaxy ones, maybe we should drop them


-------
License
-------

* Copyright (c) 2009-2013   A. Gordon  (gordon <at> cshl dot edu)
* Copyright (c) 2013   B. Gruening  (bjoern dot gruening <at> gmail dot com)


Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.