diff README @ 0:631dfde45073 draft default tip
First tool-shed public version
| author | gordon |
|---|---|
| date | Tue, 09 Oct 2012 18:48:06 -0400 |
| parents | |
| children | |
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/README	Tue Oct 09 18:48:06 2012 -0400
@@ -0,0 +1,99 @@
+These are Galaxy wrappers for common unix text-processing tools.
+
+Source:
+http://hannonlab.cshl.edu/galaxy_unix_tools/index.html
+
+Contact: gordon at cshl dot edu
+
+NOTE: You must install some programs manually. See below for details.
+
+The tools are:
+
+* awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
+* sed - Stream Editor ( http://sed.sf.net )
+* grep - Search files ( http://www.gnu.org/software/grep/ )
+* GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):
+    * sort - sort files
+    * join - join two files, based on a common key field.
+    * cut - keep/discard fields from a file
+    * uniq - keep unique/duplicated lines in a file
+    * head - keep the first X lines in a file.
+    * tail - keep the last X lines in a file.
+
+A few improvements over the standard tools:
+
+ * EasyJoin - A Join tool that does not require pre-sorting the files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
+ * Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
+ * Sort-Header - Sort a file, while maintaining the first line as a header line ( https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )
+ * Find_and_Replace - Find/Replace text in a line or specific column.
+ * Grep with Perl syntax - uses grep with Perl-Compatible regular expressions.
+ * HTML'd Grep - grep text in a file, and produce highlighted HTML output, for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )
+
+
+Requirements
+============
+1. Coreutils version 8.19 or later.
+2. AWK version 4.0.1 or later.
+3. SED version 4.2 *with* a special patch
+4. Grep with PCRE support
+
+
+NOTE About Security
+===================
+The included tools are secure (barring unintentional bugs):
+The main concern might be executing system commands with awk's "system" and sed's "e" commands,
+or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
+These commands are DISABLED using the "--sandbox" parameter to awk and sed.
+
+A user trying to run an awk program similar to:
+    BEGIN { system("ls") }
+will get an error (in Galaxy) saying:
+    fatal: 'system' function not allowed in sandbox mode.
+
+A user trying to run a SED program similar to:
+    1els
+will get an error (in Galaxy) saying:
+    sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode
+
+That being said, if you do find a vulnerability in these tools, please let me know and I'll fix it.
+
+
+Installation
+============
+
+## GNU coreutils
+    wget http://ftp.gnu.org/gnu/coreutils/coreutils-8.19.tar.xz
+    tar -xJf coreutils-8.19.tar.xz
+    cd coreutils-8.19
+    ./configure --prefix=/INSTALL/PATH
+    make
+    sudo make install
+
+
+## AWK
+    wget http://ftp.gnu.org/gnu/gawk/gawk-4.0.1.tar.gz
+    tar -xf gawk-4.0.1.tar.gz
+    cd gawk-4.0.1
+    ./configure --prefix=/INSTALL/PATH
+    make
+    sudo make install
+
+## SED
+    wget ftp://ftp.gnu.org/gnu/sed/sed-4.2.tar.gz
+    wget http://cancan.cshl.edu/labmembers/gordon/files/sed-4.2-sandbox.patch
+    tar -xf sed-4.2.tar.gz
+    patch -p0 < sed-4.2-sandbox.patch
+    cd sed-4.2
+    ./configure --prefix=/INSTALL/PATH
+    make
+    sudo make install
+
+## Grep
+    wget ftp://ftp.gnu.org/gnu/grep/grep-2.14.tar.xz
+    tar -xJf grep-2.14.tar.xz
+    cd grep-2.14
+    ./configure --enable-perl-regexp --prefix=/INSTALL/PATH
+    make
+    sudo make install
+
+
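The requirements and the security note in the README can be spot-checked once the programs are installed. The following is only a sketch and is not part of the original package. The `PREFIX` variable, every function name, and the assumption that the binaries end up in `$PREFIX/bin` as `sort`, `gawk`, `sed`, and `grep` are illustrative. It compares versions with `sort -V` (a GNU coreutils extension) and detects the sandbox by looking for the phrase "sandbox mode" in the error text, which is the wording quoted in the README. Only the coreutils version is compared numerically; awk and sed are checked by behaviour instead.

```shell
#!/bin/sh
# Hypothetical post-install check. PREFIX and all function names are assumptions.
PREFIX=${PREFIX:-/INSTALL/PATH}

# version_ge A B: succeed when version A >= version B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# sandbox_blocks CMD...: succeed when CMD's output mentions "sandbox mode",
# i.e. the tool recognised --sandbox and refused the command.
sandbox_blocks() {
    out=$("$@" 2>&1 </dev/null || true)
    case $out in
        *"sandbox mode"*) return 0 ;;
        *) return 1 ;;
    esac
}

# grep_has_pcre GREP: succeed when GREP accepts a Perl-compatible pattern (-P).
grep_has_pcre() {
    echo abc123 | "$1" -q -P '[a-z]+\d+'
}

# report DESC CMD...: run CMD quietly and print ok/FAIL without aborting.
report() {
    desc=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "ok:   $desc"
    else
        echo "FAIL: $desc"
    fi
}

run_checks() {
    # The version is the last field of the first line, e.g. "sort (GNU coreutils) 8.19".
    coreutils_ver=$("$PREFIX/bin/sort" --version 2>/dev/null | head -n 1 | awk '{ print $NF }')
    report "coreutils >= 8.19 (found: ${coreutils_ver:-none})" \
        version_ge "${coreutils_ver:-0}" 8.19

    # awk's system() should be refused in sandbox mode.
    report "awk --sandbox blocks system()" \
        sandbox_blocks "$PREFIX/bin/gawk" --sandbox 'BEGIN { system("ls") }'

    # sed's e/r/w commands should be refused in sandbox mode.
    report "sed --sandbox blocks the e command" \
        sandbox_blocks "$PREFIX/bin/sed" --sandbox '1els' /dev/null

    report "grep supports Perl-compatible regular expressions" \
        grep_has_pcre "$PREFIX/bin/grep"
}
```

To use it, set `PREFIX` to the value passed to `--prefix` at configure time, source the file, and call `run_checks`. Each check prints one `ok:` or `FAIL:` line. A sed built without the sandbox patch is reported as `FAIL` because its "unrecognized option" error does not contain the phrase "sandbox mode".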
