annotate mayachemtools/docs/scripts/txt/ExtractFromSequenceFiles.txt @ 9:ab29fa5c8c1f draft default tip

Uploaded
author deepakjadmin
date Thu, 15 Dec 2016 14:18:03 -0500
parents 73ae111cf86f
NAME
    ExtractFromSequenceFiles.pl - Extract data from sequence and alignment
    files

SYNOPSIS
    ExtractFromSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...

    ExtractFromSequenceFiles.pl [-h, --help] [-i, --IgnoreGaps yes | no]
    [-m, --mode SequenceID | SequenceNum | SequenceNumRange] [-o,
    --overwrite] [-r, --root rootname] [-s, --Sequences "SequenceID,
    [SequenceID,...]" | "SequenceNum, [SequenceNum,...]" | "StartingSeqNum,
    EndingSeqNum"] [--SequenceIDMatch Exact | Relaxed] [--SequenceLength
    number] [-w, --WorkingDir dirname] SequenceFile(s) AlignmentFile(s)...

DESCRIPTION
    Extract specific data from *SequenceFile(s) and AlignmentFile(s)* and
    generate FASTA files. You can extract sequences using sequence IDs or
    sequence numbers.

    The file names are separated by spaces. All the sequence files in the
    current directory can be specified by **.aln*, **.msf*, **.fasta*,
    **.fta*, **.pir* or any other supported formats; additionally, *DirName*
    corresponds to all the sequence files in the current directory with any
    of the supported file extensions: *.aln, .msf, .fasta, .fta, and .pir*.

    Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*,
    *Pearson/FASTA*, and *NBRF/PIR*. File formats are detected by parsing
    the contents of *SequenceFile(s) and AlignmentFile(s)* rather than by
    relying on file extensions.

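    As an illustration of content-based format detection, the following
    Python sketch guesses the format from well-known markers of each
    format. It is not the detection logic used by the script, which is
    written in Perl; the function name guess_sequence_format and the exact
    heuristics are assumptions made for this example.

```python
def guess_sequence_format(text):
    """Guess a sequence/alignment format from file contents.

    Hypothetical helper: the heuristics below rely only on common
    markers of each format and are not taken from MayaChemTools.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    first = lines[0]
    # ClustalW alignments begin with a header line starting with "CLUSTAL".
    if first.upper().startswith("CLUSTAL"):
        return "ALN/ClustalW"
    # GCG and PILEUP MSF files carry a header line containing "MSF:" and
    # separate the header from the alignment block with a "//" line.
    if any("MSF:" in line for line in lines) and "//" in lines:
        return "MSF"
    # NBRF/PIR entries start with ">XX;" where XX is a two-letter type code.
    if len(first) >= 4 and first[0] == ">" and first[3] == ";":
        return "NBRF/PIR"
    # Pearson/FASTA entries start with ">" followed by the sequence ID.
    if first.startswith(">"):
        return "Pearson/FASTA"
    return None
```

    With this sketch, a file whose first non-blank line is ">P1;ID1" is
    reported as NBRF/PIR regardless of its extension.
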
OPTIONS
    -h, --help
        Print this help message.

    -i, --IgnoreGaps *yes | no*
        Ignore gaps or gap columns during generation of new sequence or
        alignment file(s). Possible values: *yes or no*. Default value:
        *yes*.

        In order to remove gap columns, all the sequences must have the
        same length; otherwise, this option is ignored.

    -m, --mode *SequenceID | SequenceNum | SequenceNumRange*
        Specify how to extract data from sequence files: extract sequences
        using sequence IDs or sequence numbers. Possible values: *SequenceID
        | SequenceNum | SequenceNumRange*. Default: *SequenceNum* with value
        of 1.

        The sequence numbers correspond to the position of sequences,
        starting from 1 for the first sequence in *SequenceFile(s) and
        AlignmentFile(s)*.

    -o, --overwrite
        Overwrite existing files.

    -r, --root *rootname*
        New sequence file name is generated using the root:
        <Root><Mode>.<Ext>. Default new file:
        <SequenceFileName><Mode>.<Ext>. This option is ignored for multiple
        input files.

    -s, --Sequences *"SequenceID,[SequenceID,...]" |
    "SequenceNum,[SequenceNum,...]" | "StartingSeqNum,EndingSeqNum"*
        This value is -m, --mode specific. In general, it's a comma
        delimited list of sequence IDs or sequence numbers.

        For *SequenceID* value of -m, --mode option, input value format is:
        *SequenceID,...*. Examples:

            ACHE_BOVIN
            ACHE_BOVIN,ACHE_HUMAN

        For *SequenceNum* value of -m, --mode option, input value format is:
        *SequenceNum,...*. Examples:

            2
            1,5

        For *SequenceNumRange* value of -m, --mode option, input value
        format is: *StartingSeqNum,EndingSeqNum*. Examples:

            2,4

    --SequenceIDMatch *Exact | Relaxed*
        Sequence ID matching criterion for the *SequenceID* value of -m,
        --mode option: match the specified sequence ID exactly or as a
        substring against sequence IDs in the files. Possible values: *Exact
        | Relaxed*. Default: *Relaxed*. Sequence ID match is case
        insensitive for both values.

    --SequenceLength *number*
        Maximum sequence length per line in sequence file(s). Default: *80*.

    -w, --WorkingDir *dirname*
        Location of working directory. Default: current directory.

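    The effect of --SequenceLength on the generated FASTA files can be
    pictured with the short Python sketch below. It only illustrates
    wrapping sequence data at a maximum line length; the function name
    format_fasta_record is hypothetical and the code is not taken from the
    Perl script.

```python
def format_fasta_record(seq_id, sequence, line_length=80):
    """Return a FASTA record with sequence data wrapped at line_length.

    Hypothetical helper; line_length=80 mirrors the documented default
    of --SequenceLength.
    """
    lines = [">" + seq_id]
    # Emit the sequence in consecutive chunks of at most line_length.
    for start in range(0, len(sequence), line_length):
        lines.append(sequence[start:start + line_length])
    return "\n".join(lines)
```

    A 170-residue sequence is written as two lines of 80 residues followed
    by one line of 10 residues after the header line.
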
EXAMPLES
    To extract first sequence from Sample1.fasta sequence file and generate
    Sample1SequenceNum.fasta sequence file, type:

        % ExtractFromSequenceFiles.pl -o Sample1.fasta

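    The output name in this example follows the <Root><Mode>.<Ext> rule
    described for -r, --root. A minimal Python sketch of that rule is
    shown here; the function name new_sequence_file_name is hypothetical,
    and the "fasta" extension is assumed because the script generates
    FASTA files.

```python
import os


def new_sequence_file_name(input_file, mode, root=None, ext="fasta"):
    """Build <Root><Mode>.<Ext> for a generated sequence file.

    Hypothetical helper: when no root is given, the input file name
    without its extension is used as the root.
    """
    if root is None:
        root = os.path.splitext(os.path.basename(input_file))[0]
    return f"{root}{mode}.{ext}"
```

    For Sample1.fasta and the default SequenceNum mode, this yields
    Sample1SequenceNum.fasta, matching the file name given above.
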
    To extract first sequence from Sample1.aln alignment file and generate
    Sample1SequenceNum.fasta sequence file without any column gaps, type:

        % ExtractFromSequenceFiles.pl -o Sample1.aln

    To extract first sequence from Sample1.aln alignment file and generate
    Sample1SequenceNum.fasta sequence file with column gaps, type:

        % ExtractFromSequenceFiles.pl --IgnoreGaps No -o Sample1.aln

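    To make the difference between keeping and ignoring gaps concrete,
    the Python sketch below shows one plausible reading of gap handling:
    dropping alignment columns that contain only gaps, and stripping gaps
    from a single sequence. The gap characters "-" and ".", and the names
    remove_gap_columns and remove_gaps, are assumptions for this example
    and are not taken from the Perl script.

```python
GAP_CHARS = "-."  # assumed gap characters


def remove_gap_columns(sequences):
    """Drop alignment columns in which every sequence has a gap.

    If the sequences differ in length, columns are not well defined,
    so the input is returned unchanged.
    """
    if not sequences:
        return []
    if len({len(s) for s in sequences}) > 1:
        return list(sequences)
    keep = [
        i for i in range(len(sequences[0]))
        if not all(s[i] in GAP_CHARS for s in sequences)
    ]
    return ["".join(s[i] for i in keep) for s in sequences]


def remove_gaps(sequence):
    """Strip every gap character from a single sequence."""
    return "".join(c for c in sequence if c not in GAP_CHARS)
```

    For the two aligned sequences "AC--GT" and "A-C-GT", only the fourth
    column is a gap in both, so remove_gap_columns returns "AC-GT" and
    "A-CGT".
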
    To extract sequence numbers 1 and 4 from Sample1.fasta sequence file and
    generate Sample1SequenceNum.fasta sequence file, type:

        % ExtractFromSequenceFiles.pl -o -m SequenceNum --Sequences 1,4
          Sample1.fasta

    To extract sequences from sequence number 1 to 4 from Sample1.fasta
    sequence file and generate Sample1SequenceNumRange.fasta sequence file,
    type:

        % ExtractFromSequenceFiles.pl -o -m SequenceNumRange --Sequences
          1,4 Sample1.fasta

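    The same value "1,4" selects two sequences in SequenceNum mode and
    four sequences in SequenceNumRange mode. The Python sketch below
    shows how a -s, --Sequences value could be interpreted for each mode,
    using 1-based sequence numbers. The function names and the error
    handling are assumptions made for illustration, not the behavior of
    the Perl script.

```python
def parse_sequences_option(mode, value):
    """Interpret a --Sequences value according to the --mode value.

    Hypothetical helper returning a list of IDs or 1-based numbers.
    """
    items = [item.strip() for item in value.split(",") if item.strip()]
    if mode == "SequenceID":
        return items
    numbers = [int(item) for item in items]
    if any(n < 1 for n in numbers):
        raise ValueError("sequence numbers start from 1")
    if mode == "SequenceNum":
        return numbers
    if mode == "SequenceNumRange":
        if len(numbers) != 2 or numbers[0] > numbers[1]:
            raise ValueError("expected StartingSeqNum,EndingSeqNum")
        start, end = numbers
        # The range is inclusive at both ends.
        return list(range(start, end + 1))
    raise ValueError(f"unknown mode: {mode}")


def select_by_number(sequences, numbers):
    """Pick sequences by 1-based position, skipping out-of-range numbers."""
    return [sequences[n - 1] for n in numbers if n <= len(sequences)]
```

    In this sketch, parse_sequences_option("SequenceNum", "1,4") gives
    [1, 4], while parse_sequences_option("SequenceNumRange", "1,4") gives
    [1, 2, 3, 4].
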
    To extract sequence ID "Q9P993/104-387" from Sample1.fasta sequence
    file and generate Sample1SequenceID.fasta sequence file, type:

        % ExtractFromSequenceFiles.pl -o -m SequenceID --Sequences
          "Q9P993/104-387" --SequenceIDMatch Exact Sample1.fasta

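    The two --SequenceIDMatch values differ only in how the specified ID
    is compared with the IDs in the file. A minimal Python sketch of
    that comparison, assuming a plain case-insensitive equality test for
    Exact and a case-insensitive substring test for Relaxed, is given
    here; the function name sequence_id_matches is hypothetical.

```python
def sequence_id_matches(specified_id, file_id, match="Relaxed"):
    """Compare a specified sequence ID against an ID read from a file.

    Hypothetical helper: both match types ignore case; "Exact" requires
    equality and "Relaxed" accepts the specified ID as a substring.
    """
    specified, candidate = specified_id.lower(), file_id.lower()
    if match == "Exact":
        return specified == candidate
    return specified in candidate
```

    Under these assumptions, "Q9P993" matches the file ID
    "Q9P993/104-387" with Relaxed but not with Exact.
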
AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    AnalyzeSequenceFilesData.pl, InfoSequenceFiles.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.