annotate docs/scripts/txt/ExtractFromSequenceFiles.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
parents
children
NAME
    ExtractFromSequenceFiles.pl - Extract data from sequence and alignment
    files

SYNOPSIS
    ExtractFromSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...

    ExtractFromSequenceFiles.pl [-h, --help] [-i, --IgnoreGaps yes | no]
    [-m, --mode SequenceID | SequenceNum | SequenceNumRange] [-o,
    --overwrite] [-r, --root rootname] [-s, --Sequences "SequenceID,
    [SequenceID,...]" | "SequenceNum, [SequenceNum,...]" | "StartingSeqNum,
    EndingSeqNum"] [--SequenceIDMatch Exact | Relaxed] [--SequenceLength
    number] [-w, --WorkingDir dirname] SequenceFile(s) AlignmentFile(s)...

DESCRIPTION
    Extract specific data from *SequenceFile(s) and AlignmentFile(s)* and
    generate FASTA files. You can extract sequences using sequence IDs or
    sequence numbers.

    The file names are separated by spaces. All the sequence files in the
    current directory can be specified by **.aln*, **.msf*, **.fasta*,
    **.fta*, **.pir* or any other supported format; additionally, *DirName*
    corresponds to all the sequence files in the current directory with any
    of the supported file extensions: *.aln, .msf, .fasta, .fta, and .pir*.

    Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*,
    *Pearson/FASTA*, and *NBRF/PIR*. File formats are detected by parsing
    the contents of *SequenceFile(s) and AlignmentFile(s)* rather than by
    relying on file extensions.

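As an illustration of content-based format detection, here is a minimal Python sketch. It is not the script's own code: the function name guess_sequence_format and the specific heuristics (a leading "CLUSTAL" line for ALN, an "MSF:" header plus a "//" separator for MSF, a ">P1;"-style header for NBRF/PIR) are assumptions chosen for this example.

```python
def guess_sequence_format(text):
    """Guess a sequence file format from its contents (illustrative heuristics only)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return "Unknown"
    first = lines[0]
    # ClustalW alignments begin with a line starting with "CLUSTAL".
    if first.upper().startswith("CLUSTAL"):
        return "ALN/ClustalW"
    # MSF files carry an "MSF:" header line and a "//" line before the alignment.
    if any("MSF:" in line for line in lines) and "//" in lines:
        return "MSF"
    if first.startswith(">"):
        # NBRF/PIR headers look like ">P1;ID": two-character type code, then ';'.
        if len(first) > 3 and first[3] == ";":
            return "NBRF/PIR"
        return "Pearson/FASTA"
    return "Unknown"
```

With a check like this, a FASTA file saved under an unexpected extension would still be recognized from its ">" header line.
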
OPTIONS
    -h, --help
        Print this help message.

    -i, --IgnoreGaps *yes | no*
        Ignore gaps or gap columns during generation of new sequence or
        alignment file(s). Possible values: *yes or no*. Default value:
        *yes*.

        To remove gap columns, all the sequences must have the same
        length; otherwise, this option is ignored.

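The equal-length requirement can be seen in a small Python sketch of gap column removal. This is a hedged illustration, not the script's implementation: it assumes a "gap column" is a column holding a gap in every sequence, that gaps are written as "-" or ".", and the name remove_gap_columns is hypothetical.

```python
GAP_CHARS = "-."  # assumed gap characters

def remove_gap_columns(sequences):
    """Drop alignment columns that contain a gap in every sequence."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) != 1:
        # Unequal lengths: columns are not well defined, so return the input unchanged.
        return list(sequences)
    keep = [i for i in range(lengths.pop())
            if not all(seq[i] in GAP_CHARS for seq in sequences)]
    return ["".join(seq[i] for i in keep) for seq in sequences]
```

Columns only exist when every sequence has the same length, which is why the sketch falls back to returning its input untouched otherwise.
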
    -m, --mode *SequenceID | SequenceNum | SequenceNumRange*
        Specify how to extract data from sequence files: extract sequences
        using sequence IDs or sequence numbers. Possible values: *SequenceID
        | SequenceNum | SequenceNumRange*. Default: *SequenceNum* with value
        of 1.

        The sequence numbers correspond to the position of sequences,
        starting from 1 for the first sequence in *SequenceFile(s) and
        AlignmentFile(s)*.

    -o, --overwrite
        Overwrite existing files.

    -r, --root *rootname*
        New sequence file name is generated using the root:
        <Root><Mode>.<Ext>. Default new file:
        <SequenceFileName><Mode>.<Ext>. This option is ignored for multiple
        input files.

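The naming rule <Root><Mode>.<Ext> can be sketched in Python as follows. The helper output_file_name and its parameters are hypothetical, and the "fasta" extension is assumed from the statement that the script generates FASTA files.

```python
import os

def output_file_name(input_file, mode, root=None, n_input_files=1, ext="fasta"):
    """Build <Root><Mode>.<Ext>; fall back to the input file's base name."""
    if root is None or n_input_files > 1:
        # No root given, or several input files: use <SequenceFileName> instead.
        root = os.path.splitext(os.path.basename(input_file))[0]
    return f"{root}{mode}.{ext}"
```

For instance, output_file_name("Sample1.aln", "SequenceNum") gives "Sample1SequenceNum.fasta", which matches the output file names used in the EXAMPLES section.
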
    -s, --Sequences *"SequenceID,[SequenceID,...]" |
    "SequenceNum,[SequenceNum,...]" | "StartingSeqNum,EndingSeqNum"*
        This value is -m, --mode specific. In general, it's a
        comma-delimited list of sequence IDs or sequence numbers.

        For *SequenceID* value of -m, --mode option, input value format is:
        *SequenceID,...*. Examples:

            ACHE_BOVIN
            ACHE_BOVIN,ACHE_HUMAN

        For *SequenceNum* value of -m, --mode option, input value format is:
        *SequenceNum,...*. Examples:

            2
            1,5

        For *SequenceNumRange* value of -m, --mode option, input value
        format is: *StartingSeqNum,EndingSeqNum*. Examples:

            2,4

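To make the mode-specific interpretation of the --Sequences value concrete, here is a rough Python sketch. The function names are hypothetical, the error handling is invented for the example, and treating the range as inclusive at both ends is an assumption based on the "from sequence number 1 to 4" wording in the examples.

```python
def parse_sequences_option(mode, value="1"):
    """Interpret a --Sequences value according to the -m, --mode setting."""
    items = [item.strip() for item in value.split(",") if item.strip()]
    if mode == "SequenceID":
        return items
    numbers = [int(item) for item in items]
    if any(n < 1 for n in numbers):
        raise ValueError("sequence numbers start from 1")
    if mode == "SequenceNum":
        return numbers
    if mode == "SequenceNumRange":
        if len(numbers) != 2 or numbers[0] > numbers[1]:
            raise ValueError("expected StartingSeqNum,EndingSeqNum")
        # Assumed inclusive range: "2,4" selects sequences 2, 3 and 4.
        return list(range(numbers[0], numbers[1] + 1))
    raise ValueError(f"unknown mode: {mode}")

def select_by_number(sequences, numbers):
    """Pick sequences by 1-based position, skipping numbers past the end."""
    return [sequences[n - 1] for n in numbers if n <= len(sequences)]
```

Note that the same string "2,4" means two individual sequences under SequenceNum but a run of three sequences under SequenceNumRange.
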
    --SequenceIDMatch *Exact | Relaxed*
        Sequence ID matching criterion for *SequenceID* value of -m,
        --mode option: match the specified sequence ID exactly or as a
        substring against sequence IDs in the files. Possible values:
        *Exact | Relaxed*. Default: *Relaxed*. Sequence ID matching is
        case-insensitive for both values.

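The difference between the two matching criteria can be sketched in a few lines of Python. This is an illustrative reading of the description above, with a hypothetical helper name; it is not taken from the script itself.

```python
def sequence_id_matches(specified_id, file_id, match="Relaxed"):
    """Case-insensitive ID comparison: whole string (Exact) or substring (Relaxed)."""
    specified, candidate = specified_id.lower(), file_id.lower()
    if match == "Exact":
        return specified == candidate
    return specified in candidate
```

Under this reading, "ache" would match "ACHE_BOVIN" with Relaxed but not with Exact, while "ache_bovin" would match it either way.
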
    --SequenceLength *number*
        Maximum sequence length per line in sequence file(s). Default: *80*.

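A per-line length limit of this kind amounts to wrapping the sequence when a FASTA record is written. The Python sketch below shows the idea under that assumption; format_fasta_record is a hypothetical name and the exact header layout the script writes is not specified here.

```python
def format_fasta_record(seq_id, sequence, max_line_length=80):
    """Return a FASTA record with the sequence wrapped at max_line_length characters."""
    chunks = [sequence[i:i + max_line_length]
              for i in range(0, len(sequence), max_line_length)]
    return "\n".join([f">{seq_id}"] + chunks)
```

With max_line_length set to 4, a ten-residue sequence would be written as two full lines followed by a final line of two residues.
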
    -w, --WorkingDir *text*
        Location of working directory. Default: current directory.

EXAMPLES
    To extract the first sequence from the Sample1.fasta sequence file and
    generate the Sample1SequenceNum.fasta sequence file, type:

        % ExtractFromSequenceFiles.pl -o Sample1.fasta

    To extract the first sequence from the Sample1.aln alignment file and
    generate the Sample1SequenceNum.fasta sequence file without any column
    gaps, type:

        % ExtractFromSequenceFiles.pl -o Sample1.aln

    To extract the first sequence from the Sample1.aln alignment file and
    generate the Sample1SequenceNum.fasta sequence file with column gaps,
    type:

        % ExtractFromSequenceFiles.pl --IgnoreGaps No -o Sample1.aln

    To extract sequence numbers 1 and 4 from the Sample1.fasta sequence file
    and generate the Sample1SequenceNum.fasta sequence file, type:

        % ExtractFromSequenceFiles.pl -m SequenceNum --Sequences 1,4
          -o Sample1.fasta

    To extract sequences from sequence number 1 to 4 from the Sample1.fasta
    sequence file and generate the Sample1SequenceNumRange.fasta sequence
    file, type:

        % ExtractFromSequenceFiles.pl -m SequenceNumRange --Sequences
          1,4 -o Sample1.fasta

    To extract the sequence with ID "Q9P993/104-387" from the Sample1.fasta
    sequence file and generate the Sample1SequenceID.fasta sequence file,
    type:

        % ExtractFromSequenceFiles.pl -m SequenceID --Sequences
          "Q9P993/104-387" --SequenceIDMatch Exact -o Sample1.fasta

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    AnalyzeSequenceFilesData.pl, InfoSequenceFiles.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.