annotate README @ 0:631dfde45073 draft default tip

First tool-shed public version
author gordon
date Tue, 09 Oct 2012 18:48:06 -0400

These are Galaxy wrappers for common Unix text-processing tools.

Source:
http://hannonlab.cshl.edu/galaxy_unix_tools/index.html

Contact: gordon at cshl dot edu

NOTE: You must install some programs manually. See below for details.

The tools are:

* awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
* sed - Stream Editor ( http://sed.sf.net )
* grep - Search files ( http://www.gnu.org/software/grep/ )
* GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):
    * sort - sort files
    * join - join two files, based on a common key field.
    * cut - keep/discard fields from a file
    * uniq - keep unique/duplicated lines in a file
    * head - keep the first X lines in a file.
    * tail - keep the last X lines in a file.

A few improvements over the standard tools:

* EasyJoin - A Join tool that does not require pre-sorted files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
* Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
* Sort-Header - Sort a file while keeping the first line as a header line ( https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )
* Find_and_Replace - Find/Replace text in a line or specific column.
* Grep with Perl syntax - uses grep with Perl-Compatible regular expressions.
* HTML'd Grep - grep text in a file and produce highlighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )
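
For readers unfamiliar with why EasyJoin and Sort-Header are listed as improvements: plain "join" expects both inputs to be sorted on the join field, and plain "sort" treats a header line like any other line. The sketch below shows roughly what one would otherwise do by hand. It is not the code of the linked filo scripts; the function names join_unsorted and sort_keep_header are made up for illustration, and tab-separated input keyed on the first field is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of the wrappers.

# Join two tab-separated files on their first field without requiring
# pre-sorted input: sort temporary copies first, then run join on them.
join_unsorted() {
    tab=$(printf '\t')
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || return 1
    # Use the same locale for sort and join so they agree on ordering.
    LC_ALL=C sort -t "$tab" -k1,1 "$1" > "$a_sorted"
    LC_ALL=C sort -t "$tab" -k1,1 "$2" > "$b_sorted"
    LC_ALL=C join -t "$tab" "$a_sorted" "$b_sorted"
    status=$?
    rm -f "$a_sorted" "$b_sorted"
    return $status
}

# Sort a file on its first field while keeping the first line as a header.
sort_keep_header() {
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort -k1,1
}
```

Usage would look like "join_unsorted a.tsv b.tsv" or "sort_keep_header table.tsv", where the file names are placeholders.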


Requirements
============
1. Coreutils version 8.19 or later.
2. AWK version 4.0.1 or later.
3. SED version 4.2 *with* a special patch
4. Grep with PCRE support
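
One possible way to check a machine against this list is sketched below. It assumes GNU tools that answer "--version" and a "sort" that supports "-V" (version sort). The function names are hypothetical, and the script only tests whether "grep -P" works and whether "sed" accepts "--sandbox"; it cannot tell whether a particular patch was applied.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is greater than or equal to B.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# tool_version CMD: print the first dotted number on the first line of
# "CMD --version" (prints nothing if CMD is missing or is not a GNU tool).
tool_version() {
    "$1" --version 2>/dev/null | sed -n '1s/^[^0-9]*\([0-9][0-9.]*\).*/\1/p'
}

# check_one CMD MINIMUM: report whether CMD is at least version MINIMUM.
check_one() {
    found=$(tool_version "$1")
    if [ -n "$found" ] && version_ge "$found" "$2"; then
        echo "ok:      $1 $found (need $2 or later)"
    else
        echo "problem: $1 ${found:-not found} (need $2 or later)"
    fi
}

check_requirements() {
    check_one sort 8.19
    check_one gawk 4.0.1
    check_one sed 4.2
    if printf 'a1\n' | grep -P '\d' >/dev/null 2>&1; then
        echo "ok:      grep accepts -P (PCRE)"
    else
        echo "problem: grep does not accept -P (PCRE)"
    fi
    if sed --sandbox -n 'p' /dev/null >/dev/null 2>&1; then
        echo "ok:      sed accepts --sandbox"
    else
        echo "problem: sed does not accept --sandbox"
    fi
    return 0
}
```

Calling "check_requirements" prints one line per requirement.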


NOTE About Security
===================
The included tools are meant to be secure (barring unintentional bugs).
The main concerns are executing system commands with awk's "system" and sed's "e" commands,
or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
These commands are DISABLED by passing the "--sandbox" parameter to awk and sed.

A user trying to run an awk program similar to:
    BEGIN { system("ls") }
will get an error (in Galaxy) saying:
    fatal: 'system' function not allowed in sandbox mode.

A user trying to run a SED program similar to:
    1els
will get an error (in Galaxy) saying:
    sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode
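
The same behaviour can be tried outside Galaxy. The sketch below assumes "gawk" 4.0 or later and a "sed" that accepts "--sandbox" (the patched 4.2 described in this README, or a later GNU sed release that includes the option). The wrapper function names are hypothetical; the actual Galaxy tool definitions may build their command lines differently.

```shell
#!/bin/sh
# Hypothetical wrappers showing where the sandbox flag goes.
sandboxed_awk() {
    gawk --sandbox "$@"
}

sandboxed_sed() {
    sed --sandbox "$@"
}

if command -v gawk >/dev/null 2>&1; then
    # An ordinary program runs as usual.
    printf 'chr1 100\nchr2 200\n' | sandboxed_awk '{ print $2 }'

    # A program that calls system() stops with a fatal error and a
    # non-zero exit status.
    sandboxed_awk 'BEGIN { system("ls") }' || echo "awk program rejected"
fi

# With a sed that accepts --sandbox, the "e" command is refused as well.
# (A sed without the option also fails here, but with a different message.)
printf 'hello\n' | sandboxed_sed '1els' || echo "sed program rejected"
```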

That being said, if you do find a vulnerability in these tools, please let me know and I'll fix it.


Installation
============

## GNU coreutils
    wget http://ftp.gnu.org/gnu/coreutils/coreutils-8.19.tar.xz
    tar -xJf coreutils-8.19.tar.xz
    cd coreutils-8.19
    ./configure --prefix=/INSTALL/PATH
    make
    sudo make install


## AWK
    wget http://ftp.gnu.org/gnu/gawk/gawk-4.0.1.tar.gz
    tar -xf gawk-4.0.1.tar.gz
    cd gawk-4.0.1
    ./configure --prefix=/INSTALL/PATH
    make
    sudo make install

## SED
    wget ftp://ftp.gnu.org/gnu/sed/sed-4.2.tar.gz
    wget http://cancan.cshl.edu/labmembers/gordon/files/sed-4.2-sandbox.patch
    tar -xf sed-4.2.tar.gz
    patch -p0 < sed-4.2-sandbox.patch
    cd sed-4.2
    ./configure --prefix=/INSTALL/PATH
    make
    sudo make install

## Grep
    wget ftp://ftp.gnu.org/gnu/grep/grep-2.14.tar.xz
    tar -xJf grep-2.14.tar.xz
    cd grep-2.14
    ./configure --enable-perl-regexp --prefix=/INSTALL/PATH
    make
    sudo make install
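
After installing, the new programs are only used if their directory comes first on the PATH. A minimal sketch of that step follows, assuming the same /INSTALL/PATH placeholder as above and assuming that the environment Galaxy runs jobs in inherits this PATH (how that is configured depends on your Galaxy setup). The helper name prepend_path is hypothetical.

```shell
#!/bin/sh
# PREFIX is the placeholder passed to ./configure --prefix above;
# replace it with the real installation directory.
PREFIX=/INSTALL/PATH

# prepend_path DIR: put DIR at the front of PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

prepend_path "$PREFIX/bin"

# Show which executable each tool name now resolves to.
for tool in sort join cut uniq head tail gawk sed grep; do
    printf '%s -> %s\n' "$tool" "$(command -v "$tool" || echo 'not found')"
done
```

Each tool should resolve to a path under the installation prefix; any that still points elsewhere is being picked up from a different directory.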