comparison docs/scripts/txt/ExtractFromSequenceFiles.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
author | deepakjadmin |
---|---|
date | Wed, 20 Jan 2016 09:23:18 -0500 |
parents | |
children |
-1:000000000000 | 0:4816e4a8ae95 |
---|---|
1 NAME | |
2 ExtractFromSequenceFiles.pl - Extract data from sequence and alignment | |
3 files | |
4 | |
5 SYNOPSIS | |
6 ExtractFromSequenceFiles.pl SequenceFile(s) AlignmentFile(s)... | |
7 | |
8 ExtractFromSequenceFiles.pl [-h, --help] [-i, --IgnoreGaps yes | no] | |
9 [-m, --mode SequenceID | SequenceNum | SequenceNumRange] [-o, | |
10 --overwrite] [-r, --root rootname] [-s, --Sequences "SequenceID, | |
11 [SequenceID,...]" | "SequenceNum, [SequenceNum,...]" | "StartingSeqNum, | |
12 EndingSeqNum"] [--SequenceIDMatch Exact | Relaxed] [-w, --WorkingDir | |
13 dirname] SequenceFile(s) AlignmentFile(s)... | |
14 | |
15 DESCRIPTION | |
16 Extract specific data from *SequenceFile(s) and AlignmentFile(s)* and | |
17 generate FASTA files. You can extract sequences using sequence IDs or | |
18 sequence numbers. | |
19 | |
20 The file names are separated by spaces. All the sequence files in the | |
21 current directory can be specified by **.aln*, **.msf*, **.fasta*, | |
22 **.fta*, **.pir* or any other supported format; additionally, *DirName* | |
23 corresponds to all the sequence files in the current directory with any | |
24 of the supported file extensions: *.aln, .msf, .fasta, .fta, and .pir*. | |
25 | |
26 Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*, | |
27 *Pearson/FASTA*, and *NBRF/PIR*. Instead of using file extensions, file | |
28 formats are detected by parsing the contents of *SequenceFile(s) and | |
29 AlignmentFile(s)*. | |
30 | |
31 OPTIONS | |
32 -h, --help | |
33 Print this help message. | |
34 | |
35 -i, --IgnoreGaps *yes | no* | |
36 Ignore gaps or gap columns during generation of new sequence | |
37 or alignment file(s). Possible values: *yes or no*. Default value: | |
38 *yes*. | |
39 | |
40 In order to remove gap columns, the length of all the sequences must | |
41 be the same; otherwise, this option is ignored. | |
42 | |
43 -m, --mode *SequenceID | SequenceNum | SequenceNumRange* | |
44 Specify how to extract data from sequence files: extract sequences | |
45 using sequence IDs or sequence numbers. Possible values: *SequenceID | |
46 | SequenceNum | SequenceNumRange*. Default: *SequenceNum* with value | |
47 of 1. | |
48 | |
49 The sequence numbers correspond to the position of sequences, starting | |
50 from 1 for the first sequence in *SequenceFile(s) and AlignmentFile(s)*. | |
51 | |
52 -o, --overwrite | |
53 Overwrite existing files. | |
54 | |
55 -r, --root *rootname* | |
56 New sequence file name is generated using the root: | |
57 <Root><Mode>.<Ext>. Default new file: | |
58 <SequenceFileName><Mode>.<Ext>. This option is ignored for multiple | |
59 input files. | |
60 | |
61 -s, --Sequences *"SequenceID,[SequenceID,...]" | | |
62 "SequenceNum,[SequenceNum,...]" | "StartingSeqNum,EndingSeqNum"* | |
63 This value is -m, --mode specific. In general, it's a comma | |
64 delimited list of sequence IDs or sequence numbers. | |
65 | |
66 For *SequenceID* value of -m, --mode option, input value format is: | |
67 *SequenceID,...*. Examples: | |
68 | |
69 ACHE_BOVIN | |
70 ACHE_BOVIN,ACHE_HUMAN | |
71 | |
72 For *SequenceNum* value of -m, --mode option, input value format is: | |
73 *SequenceNum,...*. Examples: | |
74 | |
75 2 | |
76 1,5 | |
77 | |
78 For *SequenceNumRange* value of -m, --mode option, input value format is: | |
79 *StartingSeqNum,EndingSeqNum*. Examples: | |
80 | |
81 2,4 | |
82 | |
83 --SequenceIDMatch *Exact | Relaxed* | |
84 Sequence ID matching criterion for the *SequenceID* value of the -m, | |
85 --mode option: match the specified sequence ID exactly or as a substring | |
86 against sequence IDs in the files. Possible values: *Exact | | |
87 Relaxed*. Default: *Relaxed*. Sequence ID matching is case insensitive | |
88 for both values. | |
89 | |
90 --SequenceLength *number* | |
91 Maximum sequence length per line in sequence file(s). Default: *80*. | |
92 | |
93 -w, --WorkingDir *text* | |
94 Location of working directory. Default: current directory. | |
95 | |
96 EXAMPLES | |
97 To extract first sequence from Sample1.fasta sequence file and generate | |
98 Sample1SequenceNum.fasta sequence file, type: | |
99 | |
100 % ExtractFromSequenceFiles.pl -o Sample1.fasta | |
101 | |
102 To extract first sequence from Sample1.aln alignment file and generate | |
103 Sample1SequenceNum.fasta sequence file without any column gaps, type: | |
104 | |
105 % ExtractFromSequenceFiles.pl -o Sample1.aln | |
106 | |
107 To extract first sequence from Sample1.aln alignment file and generate | |
108 Sample1SequenceNum.fasta sequence file with column gaps, type: | |
109 | |
110 % ExtractFromSequenceFiles.pl --IgnoreGaps No -o Sample1.aln | |
111 | |
112 To extract sequence number 1 and 4 from Sample1.fasta sequence file and | |
113 generate Sample1SequenceNum.fasta sequence file, type: | |
114 | |
115 % ExtractFromSequenceFiles.pl -o -m SequenceNum --Sequences 1,4 | |
116 Sample1.fasta | |
117 | |
118 To extract sequences from sequence number 1 to 4 from Sample1.fasta | |
119 sequence file and generate Sample1SequenceNumRange.fasta sequence file, | |
120 type: | |
121 | |
122 % ExtractFromSequenceFiles.pl -o -m SequenceNumRange --Sequences | |
123 1,4 Sample1.fasta | |
124 | |
125 To extract the sequence with ID "Q9P993/104-387" from the Sample1.fasta | |
126 sequence file and generate Sample1SequenceID.fasta sequence file, type: | |
127 | |
128 % ExtractFromSequenceFiles.pl -o -m SequenceID --Sequences | |
129 "Q9P993/104-387" --SequenceIDMatch Exact Sample1.fasta | |
130 | |
131 AUTHOR | |
132 Manish Sud <msud@san.rr.com> | |
133 | |
134 SEE ALSO | |
135 AnalyzeSequenceFilesData.pl, InfoSequenceFiles.pl | |
136 | |
137 COPYRIGHT | |
138 Copyright (C) 2015 Manish Sud. All rights reserved. | |
139 | |
140 This file is part of MayaChemTools. | |
141 | |
142 MayaChemTools is free software; you can redistribute it and/or modify it | |
143 under the terms of the GNU Lesser General Public License as published by | |
144 the Free Software Foundation; either version 3 of the License, or (at | |
145 your option) any later version. | |
146 |
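
The selection rules described under OPTIONS (-m, --mode, -s, --Sequences, --SequenceIDMatch and -i, --IgnoreGaps) can be illustrated with a minimal Python sketch. This is not the script's Perl implementation. The function names select_sequences and drop_gap_columns, the (id, sequence) record layout, and the choice of "-" and "." as gap characters are all assumptions made for illustration only.

```python
def select_sequences(records, mode="SequenceNum", spec="1", id_match="Relaxed"):
    """Pick records the way the -m/-s options are documented to behave.

    records: list of (sequence_id, sequence) tuples in file order
             (hypothetical layout, not the script's internal structure).
    spec:    the comma delimited --Sequences value.
    """
    values = [v.strip() for v in spec.split(",") if v.strip()]

    if mode == "SequenceID":
        # ID matching is documented as case insensitive for both criteria.
        wanted = [v.lower() for v in values]

        def keep(seq_id):
            seq_id = seq_id.lower()
            if id_match == "Exact":
                return seq_id in wanted
            # Relaxed: the specified ID may match as a substring.
            return any(w in seq_id for w in wanted)

        return [rec for rec in records if keep(rec[0])]

    numbers = [int(v) for v in values]
    if mode == "SequenceNumRange":
        if len(numbers) != 2:
            raise ValueError("expected StartingSeqNum,EndingSeqNum")
        start, end = numbers
        numbers = list(range(start, end + 1))
    elif mode != "SequenceNum":
        raise ValueError("unknown mode: " + mode)

    # Sequence numbers start from 1 for the first sequence in the file.
    return [records[n - 1] for n in numbers if 1 <= n <= len(records)]


def drop_gap_columns(sequences, gap_chars="-."):
    """Remove columns that contain only gap characters.

    As documented, this applies only when all sequences have the same
    length; otherwise the input is returned unchanged.
    gap_chars is an assumption for this sketch.
    """
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        return list(sequences)
    width = lengths.pop()
    keep = [i for i in range(width)
            if any(s[i] not in gap_chars for s in sequences)]
    return ["".join(s[i] for i in keep) for s in sequences]
```

With a list such as records = [("ACHE_BOVIN", "MA-K"), ("ACHE_HUMAN", "MR-K")], calling select_sequences(records) with no further arguments mirrors the documented default of SequenceNum with a value of 1, and select_sequences(records, "SequenceID", "ache") would select both entries under the Relaxed criterion.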