annotate readme.rst @ 28:e8b38ade9b3e draft default tip

planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/text_processing/text_processing commit 7cdafed6c6a1387395e5a869186518f129aa3132
author bgruening
date Tue, 25 Mar 2025 14:33:35 +0000
parents d9819ccb9ca7
Galaxy wrappers for common unix text-processing tools
=====================================================

The initial work was done by Assaf Gordon and Greg Hannon's lab ( http://hannonlab.cshl.edu )
at Cold Spring Harbor Laboratory ( http://www.cshl.edu ). In late 2013, maintenance and
further development were taken over by Bjoern Gruening. Feel free to contribute any general-purpose
text manipulation tool to this repository.


Tools:
------

* awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
* sed - Stream Editor ( http://sed.sf.net )
* grep - Search files ( http://www.gnu.org/software/grep/ )
* sort_columns - Sort every line according to its columns
* GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):

  * sort - sort files
  * join - join two files, based on a common key field.
  * cut - keep/discard fields from a file
  * unsorted_uniq - keep unique/duplicated lines in a file
  * sorted_uniq - keep unique/duplicated lines in a file
  * head - keep the first X lines in a file.
  * tail - keep the last X lines in a file.

* unfold_column - unfold a column with multiple entities into multiple lines

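To make the behaviour of unfold_column concrete, here is a minimal Python sketch. It is only an illustration, not the wrapper's implementation: the function name ``unfold_column``, the tab-separated input, and the comma delimiter are assumptions.

```python
def unfold_column(lines, column, delimiter=","):
    """Yield one output line per entity found in the given 1-based column."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Split the chosen column and repeat the other columns for each entity.
        for entity in fields[column - 1].split(delimiter):
            unfolded = fields[:column - 1] + [entity] + fields[column:]
            yield "\t".join(unfolded)


# Hypothetical input: the second column holds one or more entities.
rows = ["a\tx,y\t1", "b\tz\t2"]
result = list(unfold_column(rows, column=2))
print("\n".join(result))
```

Each entity in the chosen column ends up on its own line, with the remaining columns repeated unchanged.
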

A few improvements over the standard tools:
-------------------------------------------

* EasyJoin - A join tool that does not require pre-sorted files ( https://github.com/agordon/filo/blob/scripts/src/scripts/easyjoin )
* Multi-Join - Join multiple (>2) files ( https://github.com/agordon/filo/blob/scripts/src/scripts/multijoin )
* Find_and_Replace - Find/Replace text in a line or specific column.
* Grep with Perl syntax - uses grep with Perl-compatible regular expressions.
* HTML'd Grep - grep text in a file and produce highlighted HTML output for easier viewing ( uses https://github.com/agordon/filo/blob/scripts/src/scripts/sort-header )

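The idea behind joining files that are not pre-sorted can be sketched in Python with an in-memory hash join. This is an assumption-laden illustration, not how the easyjoin script itself works (it may, for instance, sort its inputs internally); the function name ``easy_join`` and the list-of-fields row format are hypothetical.

```python
from collections import defaultdict


def easy_join(left_rows, right_rows, left_key=0, right_key=0):
    """Join two lists of rows on a key column without sorting either input."""
    # Index the right-hand rows by their key so lookups do not depend on order.
    index = defaultdict(list)
    for row in right_rows:
        index[row[right_key]].append(row)

    joined = []
    for row in left_rows:
        for match in index.get(row[left_key], []):
            # Append the right-hand fields, leaving out the duplicated key.
            rest = match[:right_key] + match[right_key + 1:]
            joined.append(row + rest)
    return joined


# Hypothetical, deliberately unsorted inputs.
left = [["b", "2"], ["a", "1"]]
right = [["a", "x"], ["c", "z"], ["b", "y"]]
joined = easy_join(left, right)
print(joined)
```

The output keeps the order of the left-hand input, and rows whose key has no partner are dropped, as in a plain inner join.
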

Requirements:
-------------

* Coreutils version 8.22 or later.
* AWK version 4.0.1 or later.
* SED version 4.2 *with* a special patch
* Grep with PCRE support

All dependencies will be installed automatically with the Galaxy `Tool Shed`_ and the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing


-------------------
NOTE About Security
-------------------

The included tools are secure (barring unintentional bugs):
The main concern might be executing system commands with awk's "system" and sed's "e" commands,
or reading/writing arbitrary files with awk's redirection and sed's "r/w" commands.
These commands are DISABLED using the "--sandbox" parameter to awk and sed.

A user trying to run an awk program similar to::

    BEGIN { system("ls") }

will get an error (in Galaxy) saying::

    fatal: 'system' function not allowed in sandbox mode.

A user trying to run a SED program similar to::

    1els

will get an error (in Galaxy) saying::

    sed: -e expression #1, char 2: e/r/w commands disabled in sandbox mode

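For readers who want to reproduce the awk case outside Galaxy, the following Python sketch builds a command line that always places ``--sandbox`` before the user program. It is a hedged illustration only: the helper name ``sandboxed_awk_command`` is hypothetical, it assumes GNU awk is installed as ``gawk``, and it does not show how the Galaxy wrappers actually invoke awk.

```python
import shutil
import subprocess


def sandboxed_awk_command(program, input_path, awk="gawk"):
    """Build an argument list that passes --sandbox before the awk program."""
    # "--" ends option parsing, so a program starting with "-" is not read as a flag.
    return [awk, "--sandbox", "--", program, input_path]


command = sandboxed_awk_command('BEGIN { system("ls") }', "/dev/null")
print(command)

# Only attempt the run when gawk is actually installed on this machine.
if shutil.which(command[0]):
    completed = subprocess.run(command, capture_output=True, text=True)
    # In sandbox mode gawk is expected to refuse system() and exit non-zero.
    print(completed.returncode, completed.stderr.strip())
```

Passing the command as a list rather than a shell string means the user-supplied program is never interpreted by a shell.
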

That being said, if you do find a vulnerability in these tools, please let me know and I'll try to fix it.

------------
Installation
------------

Installation should be done via the Galaxy `Tool Shed`_.
Install the following repository: https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing

.. _`Tool Shed`: http://wiki.galaxyproject.org/Tool%20Shed


----
TODO
----

* add shuf; we can remove the random feature from sort and use shuf instead
* move some advanced settings under a conditional; for example, the cut tool offers to cut bytes
* cut wrapper has some output conditional magic for interval files that needs to be checked
* comm wrapper, see the Galaxy default one
* evaluate the join wrappers against the Galaxy ones; maybe we should drop them


-------
License
-------

* Copyright (c) 2009-2013 A. Gordon (gordon <at> cshl dot edu)
* Copyright (c) 2013-2015 B. Gruening (bjoern dot gruening <at> gmail dot com)


Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.