These are Galaxy wrappers for common Unix text-processing tools.

Source:
http://hannonlab.cshl.edu/galaxy_unix_tools/index.html

Contact: gordon at cshl dot edu

NOTE: You must install some programs manually. See below for details.

The tools are:

* awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
* sed - Stream Editor ( http://sed.sf.net )
* grep - Search files ( http://www.gnu.org/software/grep/ )
* GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):
    * sort - sort files
    * join - join two files, based on a common key field
    * cut - keep/discard fields from a file
    * uniq - keep unique/duplicated lines in a file
    * head - keep the first X lines in a file
    * tail - keep the last X lines in a file

A few improvements over the standard tools:

* EasyJoin - A join tool that does not require pre-sorted files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
* Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
* Sort-Header - Sort a file while keeping the first line as a header line ( https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )
* Find_and_Replace - Find/replace text in a line or a specific column.
* Grep with Perl syntax - Uses grep with Perl-compatible regular expressions.
* HTML'd Grep - Grep text in a file and produce highlighted HTML output for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )


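The idea behind Sort-Header and EasyJoin can be approximated with plain coreutils. The following is only a minimal sketch, not the actual filo scripts: the function names sort_with_header and join_unsorted are hypothetical, and it assumes whitespace-separated files that are keyed on their first field.

```shell
# Hypothetical helper (not the filo sort-header script):
# print the first line unchanged, then the remaining lines sorted.
sort_with_header() {
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort
}

# Hypothetical helper (not the filo easyjoin script):
# sort both inputs on field 1 into temporary files, then join them.
join_unsorted() {
    left=$(mktemp) && right=$(mktemp) || return 1
    LC_ALL=C sort -k 1b,1 "$1" > "$left"
    LC_ALL=C sort -k 1b,1 "$2" > "$right"
    LC_ALL=C join "$left" "$right"
    status=$?
    rm -f "$left" "$right"
    return "$status"
}
```

For example, `sort_with_header data.txt` keeps the column names on top, and `join_unsorted a.txt b.txt` joins on the first field regardless of the order of the input lines. The same locale is forced for sort and join because join expects its inputs in the collation order it uses itself.
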
Requirements
============
1. Coreutils version 8.19 or later.
2. AWK (gawk) version 4.0.1 or later.
3. SED version 4.2 *with* a special patch (see the SED steps under Installation).
4. Grep with PCRE support.


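A quick way to see which versions are currently on the PATH is sketched below. This is an illustrative check, not part of the wrappers: report_tool_versions is a hypothetical name, the output still has to be compared against the list above by eye, and the script cannot tell whether sed carries the sandbox patch.

```shell
# Hypothetical helper: print the first line of each tool's --version output,
# or a note when the tool is missing from PATH.
report_tool_versions() {
    for tool in sort gawk sed grep; do
        if command -v "$tool" >/dev/null 2>&1; then
            version=$("$tool" --version 2>/dev/null | head -n 1)
            echo "$tool: ${version:-version unknown}"
        else
            echo "$tool: not found on PATH"
        fi
    done
    # grep only accepts -P when it was built with PCRE support.
    if printf 'a1\n' | grep -qP '^a\d$' 2>/dev/null; then
        echo "grep -P: supported"
    else
        echo "grep -P: not supported"
    fi
}

report_tool_versions
```
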
NOTE About Security
===================
The included tools are secure (barring unintentional bugs).
The main concerns are executing system commands with awk's "system" function and sed's "e" command,
and reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
These commands are DISABLED using the "--sandbox" parameter to awk and sed.

A user trying to run an awk program similar to:
    BEGIN { system("ls") }
will get an error (in Galaxy) saying:
    fatal: 'system' function not allowed in sandbox mode.

A user trying to run a sed program similar to:
    1els
will get an error (in Galaxy) saying:
    sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode

That being said, if you do find a vulnerability in these tools, please let me know and I'll fix it.


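The awk half of this behaviour can be checked from a shell before wiring the tools into Galaxy. The sketch below assumes a gawk that understands --sandbox (the requirements above ask for 4.0.1 or later); the function name awk_sandbox_blocks_system is hypothetical, and a harmless `true` command stands in for whatever a user might try to run.

```shell
# Hypothetical helper. Returns:
#   0 if gawk refuses to run system() under --sandbox,
#   1 if the program ran anyway (the sandbox is not effective),
#   2 if no gawk binary is found on PATH.
awk_sandbox_blocks_system() {
    command -v gawk >/dev/null 2>&1 || return 2
    if gawk --sandbox 'BEGIN { system("true") }' >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

rc=0
awk_sandbox_blocks_system || rc=$?
case "$rc" in
    0) echo "gawk --sandbox rejects system()" ;;
    1) echo "WARNING: system() was allowed" ;;
    2) echo "gawk not found on PATH" ;;
esac
```
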
Installation
============

## GNU coreutils
wget http://ftp.gnu.org/gnu/coreutils/coreutils-8.19.tar.xz
tar -xJf coreutils-8.19.tar.xz
cd coreutils-8.19
./configure --prefix=/INSTALL/PATH
make
sudo make install


## AWK
wget http://ftp.gnu.org/gnu/gawk/gawk-4.0.1.tar.gz
tar -xf gawk-4.0.1.tar.gz
cd gawk-4.0.1
./configure --prefix=/INSTALL/PATH
make
sudo make install

## SED
wget ftp://ftp.gnu.org/gnu/sed/sed-4.2.tar.gz
wget http://cancan.cshl.edu/labmembers/gordon/files/sed-4.2-sandbox.patch
tar -xf sed-4.2.tar.gz
patch -p0 < sed-4.2-sandbox.patch
cd sed-4.2
./configure --prefix=/INSTALL/PATH
make
sudo make install

## Grep
wget ftp://ftp.gnu.org/gnu/grep/grep-2.14.tar.xz
tar -xJf grep-2.14.tar.xz
cd grep-2.14
./configure --enable-perl-regexp --prefix=/INSTALL/PATH
make
sudo make install

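Each build above installs under the chosen prefix, so with the usual ./configure layout the binaries end up in /INSTALL/PATH/bin. A minimal sketch of putting that directory first on PATH follows. It assumes the Galaxy process inherits this environment, which is not stated above; prepend_to_path is a hypothetical helper and /INSTALL/PATH remains the placeholder used in the build steps.

```shell
# Hypothetical helper: put a directory at the front of PATH,
# unless it is already listed there.
prepend_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# /INSTALL/PATH is the same placeholder prefix used in the build steps.
prepend_to_path /INSTALL/PATH/bin
```

Afterwards `command -v sort gawk sed grep` should print paths under the prefix if the builds succeeded.
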