annotate docs/scripts/txt/ExtractFromSequenceFiles.txt @ 3:90ea638ce878 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:11:59 -0500
parents 2abf0d43254d
children

NAME
    ExtractFromSequenceFiles.pl - Extract data from sequence and alignment
    files

SYNOPSIS
    ExtractFromSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...

    ExtractFromSequenceFiles.pl [-h, --help] [-i, --IgnoreGaps yes | no]
    [-m, --mode SequenceID | SequenceNum | SequenceNumRange] [-o,
    --overwrite] [-r, --root rootname] [-s, --Sequences "SequenceID,
    [SequenceID,...]" | "SequenceNum, [SequenceNum,...]" | "StartingSeqNum,
    EndingSeqNum"] [--SequenceIDMatch Exact | Relaxed] [--SequenceLength
    number] [-w, --WorkingDir dirname] SequenceFile(s) AlignmentFile(s)...

DESCRIPTION
    Extract specific data from *SequenceFile(s) and AlignmentFile(s)* and
    generate FASTA files. You can extract sequences using sequence IDs or
    sequence numbers.

    The file names are separated by spaces. All the sequence files in the
    current directory can be specified by **.aln*, **.msf*, **.fasta*,
    **.fta*, **.pir* or any other supported format; additionally, *DirName*
    corresponds to all the sequence files in the current directory with any
    of the supported file extensions: *.aln, .msf, .fasta, .fta, and .pir*.

    Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*,
    *Pearson/FASTA*, and *NBRF/PIR*. File formats are detected by parsing
    the contents of *SequenceFile(s) and AlignmentFile(s)* rather than by
    relying on file extensions.
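
    Content-based format detection can be pictured with the following
    Python sketch. It is not the script's Perl implementation: the function
    name and the specific checks (a "CLUSTAL" banner, a ">XX;" PIR header,
    a ">" FASTA header, an "MSF:" field plus a "//" divider) are assumptions
    based on the usual layout of these formats.

```python
def guess_sequence_format(text):
    """Guess a sequence file format from its contents, not its extension."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    first = lines[0]
    # ClustalW alignments conventionally begin with a "CLUSTAL" banner.
    if first.upper().startswith("CLUSTAL"):
        return "ALN/ClustalW"
    # NBRF/PIR entries start with ">XX;" where XX is a two-character code.
    if first.startswith(">") and len(first) > 3 and first[3] == ";":
        return "NBRF/PIR"
    # Pearson/FASTA entries start with ">" followed by the sequence ID.
    if first.startswith(">"):
        return "Pearson/FASTA"
    # MSF headers carry an "MSF:" field and end with a "//" divider line.
    if any("MSF:" in line for line in lines) and "//" in lines:
        return "MSF"
    return None
```

    A FASTA ID whose third character happens to be ";" would be
    misclassified as PIR here; a real parser would need stricter checks.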

OPTIONS
    -h, --help
        Print this help message.

    -i, --IgnoreGaps *yes | no*
        Ignore gaps or gap columns during generation of new sequence or
        alignment file(s). Possible values: *yes or no*. Default value:
        *yes*.

        Gap columns can be removed only when all the sequences have the
        same length; otherwise, this option is ignored.
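
        The equal-length requirement follows from how gap columns are
        found: a column can only be compared across sequences that share
        the same coordinates. A minimal Python sketch, assuming that "-"
        and "." are the gap symbols and that a gap column is one where
        every selected sequence has a gap (both are assumptions, as is the
        function name):

```python
GAP_CHARS = "-."  # assumed gap symbols; the script may recognize others


def remove_gap_columns(sequences):
    """Drop columns that are gaps in every sequence.

    If the sequences differ in length, return them unchanged, mirroring
    the documented behavior of ignoring the option in that case.
    """
    lengths = {len(seq) for seq in sequences}
    if len(lengths) != 1:
        return list(sequences)
    keep = [
        col
        for col in range(lengths.pop())
        if not all(seq[col] in GAP_CHARS for seq in sequences)
    ]
    return ["".join(seq[col] for col in keep) for seq in sequences]
```

        With a single extracted sequence, every gap position is a gap
        column, so the result is the ungapped sequence.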

    -m, --mode *SequenceID | SequenceNum | SequenceNumRange*
        Specify how to extract data from sequence files: extract sequences
        using sequence IDs or sequence numbers. Possible values: *SequenceID
        | SequenceNum | SequenceNumRange*. Default: *SequenceNum* with a
        value of 1.

        Sequence numbers correspond to the position of sequences in
        *SequenceFile(s) and AlignmentFile(s)*, starting from 1 for the
        first sequence.

    -o, --overwrite
        Overwrite existing files.

    -r, --root *rootname*
        New sequence file name is generated using the root:
        <Root><Mode>.<Ext>. Default new file name:
        <SequenceFileName><Mode>.<Ext>. This option is ignored for multiple
        input files.
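
        The naming rule can be illustrated with a short Python sketch. The
        function name is hypothetical, and the "fasta" extension is an
        assumption inferred from the output names shown in EXAMPLES:

```python
import os


def output_file_name(sequence_file, mode, root=None, ext="fasta"):
    """Build <Root><Mode>.<Ext>, defaulting Root to the input file's base name."""
    if root:
        base = root
    else:
        # Strip the directory and the extension from the input file name.
        base = os.path.splitext(os.path.basename(sequence_file))[0]
    return f"{base}{mode}.{ext}"
```

        For instance, output_file_name("Sample1.aln", "SequenceNum") yields
        "Sample1SequenceNum.fasta". The sketch handles a single input file;
        with several input files the root would be ignored, as stated above.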

    -s, --Sequences *"SequenceID,[SequenceID,...]" |
    "SequenceNum,[SequenceNum,...]" | "StartingSeqNum,EndingSeqNum"*
        This value is -m, --mode specific. In general, it's a comma
        delimited list of sequence IDs or sequence numbers.

        For *SequenceID* value of -m, --mode option, input value format is:
        *SequenceID,...*. Examples:

            ACHE_BOVIN
            ACHE_BOVIN,ACHE_HUMAN

        For *SequenceNum* value of -m, --mode option, input value format is:
        *SequenceNum,...*. Examples:

            2
            1,5

        For *SequenceNumRange* value of -m, --mode option, input value
        format is: *StartingSeqNum,EndingSeqNum*. Examples:

            2,4
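
        How the value is interpreted for each mode can be sketched in
        Python as follows. The function name and error handling are
        hypothetical, and treating the range as inclusive at both ends is
        an assumption drawn from the "1 to 4" wording in EXAMPLES:

```python
def parse_sequences_value(mode, value):
    """Turn a --Sequences value into a list of IDs or 1-based numbers."""
    items = [item.strip() for item in value.split(",") if item.strip()]
    if mode == "SequenceID":
        return items
    numbers = [int(item) for item in items]
    if any(number < 1 for number in numbers):
        raise ValueError("sequence numbers start at 1")
    if mode == "SequenceNum":
        return numbers
    if mode == "SequenceNumRange":
        if len(numbers) != 2 or numbers[0] > numbers[1]:
            raise ValueError("expected StartingSeqNum,EndingSeqNum")
        start, end = numbers
        # Expand the pair into every sequence number in the range.
        return list(range(start, end + 1))
    raise ValueError(f"unknown mode: {mode}")
```

        Note that the same text, such as "1,4", selects two sequences in
        *SequenceNum* mode and four in *SequenceNumRange* mode.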

    --SequenceIDMatch *Exact | Relaxed*
        Sequence ID matching criterion for *SequenceID* value of -m, --mode
        option: match the specified sequence ID exactly or as a substring
        against sequence IDs in the files. Possible values: *Exact |
        Relaxed*. Default: *Relaxed*. Sequence ID matching is case
        insensitive for both values.
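
        The two criteria differ only in the comparison applied after case
        folding. A minimal Python sketch with a hypothetical function name:

```python
def sequence_id_matches(specified_id, file_id, match="Relaxed"):
    """Compare a requested sequence ID against an ID read from a file."""
    specified = specified_id.lower()
    candidate = file_id.lower()
    if match == "Exact":
        # Exact: the whole ID must be equal, ignoring case.
        return specified == candidate
    # Relaxed: the requested text may appear anywhere inside the ID.
    return specified in candidate
```

        Under *Relaxed*, "q9p993" matches "Q9P993/104-387"; under *Exact*
        it does not, because the residue range is part of the ID.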

    --SequenceLength *number*
        Maximum sequence length per line in sequence file(s). Default: *80*.
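
        The effect of this option on the generated FASTA output can be
        sketched in Python as below. The function name is hypothetical and
        the header is reduced to ">" plus the sequence ID, which is an
        assumption about the output layout:

```python
def format_fasta_record(sequence_id, sequence, line_length=80):
    """Write one FASTA record, wrapping the sequence at line_length."""
    lines = [f">{sequence_id}"]
    # Slice the sequence into consecutive chunks of at most line_length.
    lines.extend(
        sequence[start:start + line_length]
        for start in range(0, len(sequence), line_length)
    )
    return "\n".join(lines) + "\n"
```

        A 170-residue sequence with the default setting is written as two
        lines of 80 residues followed by one line of 10.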

    -w, --WorkingDir *dirname*
        Location of working directory. Default: current directory.

EXAMPLES
    To extract first sequence from Sample1.fasta sequence file and generate
    Sample1SequenceNum.fasta sequence file, type:

        % ExtractFromSequenceFiles.pl -o Sample1.fasta

    To extract first sequence from Sample1.aln alignment file and generate
    Sample1SequenceNum.fasta sequence file without any column gaps, type:

        % ExtractFromSequenceFiles.pl -o Sample1.aln

    To extract first sequence from Sample1.aln alignment file and generate
    Sample1SequenceNum.fasta sequence file with column gaps, type:

        % ExtractFromSequenceFiles.pl --IgnoreGaps No -o Sample1.aln

    To extract sequence numbers 1 and 4 from Sample1.fasta sequence file
    and generate Sample1SequenceNum.fasta sequence file, type:

        % ExtractFromSequenceFiles.pl -o -m SequenceNum --Sequences 1,4
          Sample1.fasta

    To extract sequences from sequence number 1 to 4 from Sample1.fasta
    sequence file and generate Sample1SequenceNumRange.fasta sequence file,
    type:

        % ExtractFromSequenceFiles.pl -o -m SequenceNumRange --Sequences
          1,4 Sample1.fasta

    To extract sequence ID "Q9P993/104-387" from Sample1.fasta sequence
    file and generate Sample1SequenceID.fasta sequence file, type:

        % ExtractFromSequenceFiles.pl -o -m SequenceID --Sequences
          "Q9P993/104-387" --SequenceIDMatch Exact Sample1.fasta

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    AnalyzeSequenceFilesData.pl, InfoSequenceFiles.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.