annotate readme.rst @ 28:e8b38ade9b3e draft default tip
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/text_processing/text_processing commit 7cdafed6c6a1387395e5a869186518f129aa3132
author | bgruening |
---|---|
date | Tue, 25 Mar 2025 14:33:35 +0000 |
parents | d9819ccb9ca7 |
children |
Galaxy wrappers for common Unix text-processing tools
=====================================================

The initial work was done by Assaf Gordon and Greg Hannon's lab ( http://hannonlab.cshl.edu )
at Cold Spring Harbor Laboratory ( http://www.cshl.edu ). In late 2013, maintenance and
further development were taken over by Bjoern Gruening. Feel free to contribute any general-purpose
text manipulation tool to this repository.

Tools:
------

* awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
* sed - Stream Editor ( http://sed.sf.net )
* grep - Search files ( http://www.gnu.org/software/grep/ )
* sort_columns - Sort every line according to its columns
* GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):

  * sort - sort files
  * join - join two files, based on a common key field.
  * cut - keep/discard fields from a file
  * unsorted_uniq - keep unique/duplicated lines in a file
  * sorted_uniq - keep unique/duplicated lines in a file
  * head - keep the first X lines in a file.
  * tail - keep the last X lines in a file.
  * unfold_column - unfold a column with multiple entities into multiple lines

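To make the last entry more concrete, here is a small shell sketch of what unfolding a column means. It is not the wrapper's actual implementation; the tab-separated sample data, the comma delimiter, and the choice of column 2 are all assumptions made for illustration:

```shell
# Illustration only: unfold column 2 of a tab-separated file, assuming the
# entities inside that column are separated by commas (hypothetical sample data).
unfolded=$(printf 'gene1\tGO:1,GO:2\ngene2\tGO:3\n' |
    awk -F '\t' -v OFS='\t' '{
        # Split the second column into its individual entities ...
        n = split($2, parts, ",")
        # ... and emit one output line per entity, repeating the first column.
        for (i = 1; i <= n; i++) print $1, parts[i]
    }')
printf '%s\n' "$unfolded"
```

Each input line yields as many output lines as there are entities in the unfolded column, so ``gene1`` appears twice in the result and ``gene2`` once.
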
A few improvements over the standard tools:
-------------------------------------------

* EasyJoin - A join tool that does not require pre-sorted files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
* Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
* Find_and_Replace - Find/replace text in a line or in a specific column.
* Grep with Perl syntax - uses grep with Perl-compatible regular expressions.
* HTML'd Grep - grep text in a file and produce highlighted HTML output for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )

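The motivation for EasyJoin can be seen from how plain ``join`` is normally used: both inputs have to be sorted on the join field first. The sketch below does that sorting by hand; it is not EasyJoin itself, and the file names and sample rows are made up for illustration:

```shell
# Illustration only: plain join(1) expects both inputs sorted on the join field,
# so each (hypothetical) sample file is sorted explicitly before joining.
export LC_ALL=C                      # make sort and join agree on ordering
tab=$(printf '\t')
workdir=$(mktemp -d)

printf 'b\t2\na\t1\n' > "$workdir/left.tsv"    # deliberately unsorted
printf 'b\ty\na\tx\n' > "$workdir/right.tsv"   # deliberately unsorted

sort -t "$tab" -k1,1 "$workdir/left.tsv"  > "$workdir/left.sorted"
sort -t "$tab" -k1,1 "$workdir/right.tsv" > "$workdir/right.sorted"

# Join on the first field of both files, keeping tab as the separator.
joined=$(join -t "$tab" "$workdir/left.sorted" "$workdir/right.sorted")
printf '%s\n' "$joined"

rm -r "$workdir"
```

A tool that handles unsorted input spares the user the two intermediate ``sort`` steps.
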
Requirements:
-------------

* Coreutils version 8.22 or later.
* AWK version 4.0.1 or later.
* SED version 4.2 *with* a special patch
* Grep with PCRE support

All dependencies will be installed automatically with the Galaxy `Tool Shed`_ and the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing

-------------------
NOTE About Security
-------------------

The included tools are secure (barring unintentional bugs):
The main concern might be executing system commands with awk's "system" and sed's "e" commands,
or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
These commands are DISABLED using the "--sandbox" parameter to awk and sed.

A user trying to run an awk program similar to::

    BEGIN { system("ls") }

will get an error (in Galaxy) saying::

    fatal: 'system' function not allowed in sandbox mode.

A user trying to run a SED program similar to::

    1els

will get an error (in Galaxy) saying::

    sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode

That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it.

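The same behaviour can be reproduced outside Galaxy. The sketch below assumes GNU awk is installed as ``gawk`` and that ``sed`` is a GNU sed build that understands ``--sandbox``. It only checks that both programs are refused, and the exact command lines the wrappers generate may differ:

```shell
# Illustration only: assumes `gawk` is GNU awk and `sed` is a GNU sed build
# with --sandbox support; the Galaxy wrappers may assemble their commands differently.

# awk: system() is refused in sandbox mode, so gawk stops with a fatal error.
if awk_msg=$(gawk --sandbox 'BEGIN { system("ls") }' 2>&1); then
    awk_status=0
else
    awk_status=$?
fi

# sed: the "e" command is refused in sandbox mode, so the script is rejected.
if sed_msg=$(sed --sandbox '1els' </dev/null 2>&1); then
    sed_status=0
else
    sed_status=$?
fi

printf 'gawk exited with %s: %s\n' "$awk_status" "$awk_msg"
printf 'sed exited with %s: %s\n' "$sed_status" "$sed_msg"
```

A non-zero exit status from both commands indicates that the program was refused. On a system without these GNU tools the commands fail for a different reason, so the printed messages should be checked as well.
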
------------
Installation
------------

Installation should be done via the Galaxy `Tool Shed`_.
Install the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing

.. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed

----
TODO
----

* add shuf; we can then remove the random feature from sort and use shuf instead
* move some advanced settings under a conditional; for example, the cut tool offers to cut bytes
* the cut wrapper has some output conditional magic for interval files that needs to be checked
* comm wrapper, see the Galaxy default one
* evaluate the join wrappers against the Galaxy ones; maybe we should drop them

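For the first item, the difference between the two tools can be sketched as follows, assuming GNU coreutils is available: ``sort -R`` orders lines by a random hash of their keys, so identical lines stay next to each other, whereas ``shuf`` produces a random permutation of all lines. The sample input is made up for illustration:

```shell
# Illustration only, assuming GNU coreutils (sort -R and shuf) on hypothetical input.
sample=$(printf 'a\nb\na\nb\n')

# sort -R hashes each key, so equal lines always end up adjacent.
random_sorted=$(printf '%s\n' "$sample" | sort -R)

# shuf permutes the lines, so equal lines may be separated.
shuffled=$(printf '%s\n' "$sample" | shuf)

printf 'sort -R:\n%s\n' "$random_sorted"
printf 'shuf:\n%s\n' "$shuffled"
```

Both commands keep all four input lines; only ``shuf`` treats duplicate lines independently of each other.
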
-------
License
-------

* Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu)
* Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com)


Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.