comparison docs/scripts/txt/ExtractFromSequenceFiles.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
-1:000000000000 0:4816e4a8ae95
1 NAME
2 ExtractFromSequenceFiles.pl - Extract data from sequence and alignment
3 files
4
5 SYNOPSIS
6 ExtractFromSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...
7
8 ExtractFromSequenceFiles.pl [-h, --help] [-i, --IgnoreGaps yes | no]
9 [-m, --mode SequenceID | SequenceNum | SequenceNumRange] [-o,
10 --overwrite] [-r, --root rootname] [-s, --Sequences "SequenceID,
11 [SequenceID,...]" | "SequenceNum, [SequenceNum,...]" | "StartingSeqNum,
12 EndingSeqNum"] [--SequenceIDMatch Exact | Relaxed] [-w, --WorkingDir
13 dirname] SequenceFile(s) AlignmentFile(s)...
14
15 DESCRIPTION
16 Extract specific data from *SequenceFile(s) and AlignmentFile(s)* and
17 generate FASTA files. You can extract sequences using sequence IDs or
18 sequence numbers.
19
20 The file names are separated by spaces. All the sequence files in the
21 current directory can be specified by **.aln*, **.msf*, **.fasta*,
22 **.fta*, **.pir* or any other supported format; additionally, *DirName*
23 corresponds to all the sequence files in the current directory with any
24 of the supported file extensions: *.aln, .msf, .fasta, .fta, and .pir*.
25
26 Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*,
27 *Pearson/FASTA*, and *NBRF/PIR*. Instead of using file extensions, file
28 formats are detected by parsing the contents of *SequenceFile(s) and
29 AlignmentFile(s)*.
30
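Content-based format detection, as described above, can be illustrated with a minimal Python sketch. It is not the script's actual Perl implementation; the function name `detect_sequence_format` and the specific heuristics (a leading "CLUSTAL" header, an "MSF:" line plus a "//" divider, a ">XX;" PIR header, a ">" FASTA header) are assumptions chosen for illustration.

```python
def detect_sequence_format(text: str) -> str:
    """Guess the format of sequence/alignment file contents (hypothetical helper)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return "Unknown"
    first = lines[0]
    # ClustalW alignments start with a header line beginning with "CLUSTAL".
    if first.upper().startswith("CLUSTAL"):
        return "ALN"
    # GCG/PILEUP MSF files carry an "MSF:" header line and a "//" divider line.
    if any("MSF:" in line for line in lines) and any(line == "//" for line in lines):
        return "MSF"
    # NBRF/PIR entries start with ">XX;" where XX is a two-character type code.
    if first.startswith(">") and len(first) > 3 and first[3] == ";":
        return "PIR"
    # Pearson/FASTA entries start with ">" followed by the sequence ID.
    if first.startswith(">"):
        return "FASTA"
    return "Unknown"
```

Because the check looks only at the contents, a FASTA file saved with an unexpected extension would still be recognized under these assumptions.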
31 OPTIONS
32 -h, --help
33 Print this help message.
34
35 -i, --IgnoreGaps *yes | no*
36 Ignore gaps or gap columns during generation of new sequence
37 or alignment file(s). Possible values: *yes or no*. Default value:
38 *yes*.
39
40 In order to remove gap columns, the lengths of all the sequences must
41 be the same; otherwise, this option is ignored.
42
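The gap column rule above can be sketched in Python as follows. This is an illustration only, not the script's code: the helper name `remove_gap_columns` is hypothetical, and it assumes that "-" and "." are the gap characters and that a gap column is a position where every retained sequence has a gap.

```python
GAP_CHARS = "-."  # assumption: characters treated as gaps


def remove_gap_columns(sequences):
    """Drop columns that hold only gaps; leave input unchanged if lengths differ."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) != 1:
        # Unequal lengths (or no sequences): the option is ignored.
        return list(sequences)
    width = lengths.pop()
    # Keep a column when at least one sequence has a non-gap character there.
    keep = [i for i in range(width)
            if any(seq[i] not in GAP_CHARS for seq in sequences)]
    return ["".join(seq[i] for i in keep) for seq in sequences]
```

With a single extracted sequence, every gap position is a gap column under this definition, so the output contains no gaps at all.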
43 -m, --mode *SequenceID | SequenceNum | SequenceNumRange*
44 Specify how to extract data from sequence files: extract sequences
45 using sequence IDs or sequence numbers. Possible values: *SequenceID
46 | SequenceNum | SequenceNumRange*. Default: *SequenceNum* with value
47 of 1.
48
49 The sequence numbers correspond to position of sequences starting
50 from 1 for first sequence in *SequenceFile(s) and AlignmentFile(s)*.
51
52 -o, --overwrite
53 Overwrite existing files.
54
55 -r, --root *rootname*
56 New sequence file name is generated using the root:
57 <Root><Mode>.<Ext>. Default new file:
58 <SequenceFileName><Mode>.<Ext>. This option is ignored for multiple
59 input files.
60
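The naming rule for new files can be sketched in Python. The function `output_file_name` and its parameters are hypothetical; the "fasta" extension is an assumption based on the EXAMPLES section, where both .fasta and .aln inputs produce .fasta output.

```python
import os


def output_file_name(input_file, mode, root=None, n_inputs=1, ext="fasta"):
    """Build <Root><Mode>.<Ext>, falling back to the input file's base name."""
    # The root is ignored for multiple input files, as stated above.
    if root is None or n_inputs > 1:
        root = os.path.splitext(os.path.basename(input_file))[0]
    return f"{root}{mode}.{ext}"
```

For example, `output_file_name("Sample1.aln", "SequenceNum")` yields `Sample1SequenceNum.fasta`, matching the file names used in the EXAMPLES section.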
61 -s, --Sequences *"SequenceID,[SequenceID,...]" |
62 "SequenceNum,[SequenceNum,...]" | "StartingSeqNum,EndingSeqNum"*
63 This value is -m, --mode specific. In general, it's a
64 comma-delimited list of sequence IDs or sequence numbers.
65
66 For *SequenceID* value of -m, --mode option, input value format is:
67 *SequenceID,...*. Examples:
68
69 ACHE_BOVIN
70 ACHE_BOVIN,ACHE_HUMAN
71
72 For *SequenceNum* value of -m, --mode option, input value format is:
73 *SequenceNum,...*. Examples:
74
75 2
76 1,5
77
78 For *SequenceNumRange* value of -m, --mode option, input value format
79 is: *StartingSeqNum,EndingSeqNum*. Examples:
80
81 2,4
82
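How the -s, --Sequences value might be interpreted for each mode can be sketched in Python. This is a hedged illustration, not the script's parser: the function name `parse_sequences_option` is hypothetical, and treating the range as inclusive of both end points is an assumption based on the "from sequence number 1 to 4" wording in the EXAMPLES section.

```python
def parse_sequences_option(mode: str, value: str):
    """Turn a --Sequences value into a list of IDs or 1-based sequence numbers."""
    items = [item.strip() for item in value.split(",") if item.strip()]
    if mode == "SequenceID":
        return items
    if mode not in ("SequenceNum", "SequenceNumRange"):
        raise ValueError(f"unknown mode: {mode}")
    numbers = [int(item) for item in items]
    if any(number < 1 for number in numbers):
        raise ValueError("sequence numbers start from 1")
    if mode == "SequenceNum":
        return numbers
    # SequenceNumRange: exactly two numbers, assumed inclusive on both ends.
    if len(numbers) != 2 or numbers[0] > numbers[1]:
        raise ValueError("expected StartingSeqNum,EndingSeqNum")
    start, end = numbers
    return list(range(start, end + 1))
```

Under these assumptions the value "2,4" selects sequences 2 and 4 in *SequenceNum* mode but sequences 2, 3 and 4 in *SequenceNumRange* mode.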
83 --SequenceIDMatch *Exact | Relaxed*
84 Sequence ID matching criterion for the *SequenceID* value of -m,
85 --mode option: match the specified sequence ID exactly or as a substring
86 against sequence IDs in the files. Possible values: *Exact |
87 Relaxed*. Default: *Relaxed*. Sequence ID matching is case insensitive
88 for both values.
89
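The two matching criteria can be sketched in Python. The helper `id_matches` is hypothetical and only mirrors the behavior described above: case-insensitive comparison, with *Relaxed* meaning that the specified ID occurs as a substring of the ID in the file.

```python
def id_matches(specified: str, candidate: str, match: str = "Relaxed") -> bool:
    """Compare a specified sequence ID against an ID read from a file."""
    specified, candidate = specified.lower(), candidate.lower()
    if match == "Exact":
        return specified == candidate
    # Relaxed: the specified ID only needs to occur inside the file's ID.
    return specified in candidate
```

With the default *Relaxed* criterion, a short value such as "ACHE" would select both ACHE_BOVIN and ACHE_HUMAN, so *Exact* is the safer choice when IDs share a prefix.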
90 --SequenceLength *number*
91 Maximum sequence length per line in sequence file(s). Default: *80*.
92
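The effect of the sequence length limit on the generated FASTA output can be sketched in Python. The function `format_fasta_record` is a hypothetical helper, not part of the script; it assumes a plain ">ID" header line followed by the sequence wrapped at the given length.

```python
def format_fasta_record(seq_id: str, sequence: str, line_length: int = 80) -> str:
    """Write one FASTA record with at most line_length characters per sequence line."""
    chunks = [sequence[i:i + line_length]
              for i in range(0, len(sequence), line_length)]
    return "\n".join([f">{seq_id}"] + chunks) + "\n"
```

A 200-residue sequence would be written as three lines of 80, 80 and 40 characters under the default value.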
93 -w, --WorkingDir *text*
94 Location of working directory. Default: current directory.
95
96 EXAMPLES
97 To extract first sequence from Sample1.fasta sequence file and generate
98 Sample1SequenceNum.fasta sequence file, type:
99
100 % ExtractFromSequenceFiles.pl -o Sample1.fasta
101
102 To extract first sequence from Sample1.aln alignment file and generate
103 Sample1SequenceNum.fasta sequence file without any column gaps, type:
104
105 % ExtractFromSequenceFiles.pl -o Sample1.aln
106
107 To extract first sequence from Sample1.aln alignment file and generate
108 Sample1SequenceNum.fasta sequence file with column gaps, type:
109
110 % ExtractFromSequenceFiles.pl --IgnoreGaps No -o Sample1.aln
111
112 To extract sequence numbers 1 and 4 from Sample1.fasta sequence file and
113 generate Sample1SequenceNum.fasta sequence file, type:
114
115 % ExtractFromSequenceFiles.pl -o -m SequenceNum --Sequences 1,4
116 Sample1.fasta
117
118 To extract sequences from sequence number 1 to 4 from Sample1.fasta
119 sequence file and generate Sample1SequenceNumRange.fasta sequence file,
120 type:
121
122 % ExtractFromSequenceFiles.pl -o -m SequenceNumRange --Sequences
123 1,4 Sample1.fasta
124
125 To extract the sequence with ID "Q9P993/104-387" from Sample1.fasta
126 sequence file and generate Sample1SequenceID.fasta sequence file, type:
127
128 % ExtractFromSequenceFiles.pl -o -m SequenceID --Sequences
129 "Q9P993/104-387" --SequenceIDMatch Exact Sample1.fasta
130
131 AUTHOR
132 Manish Sud <msud@san.rr.com>
133
134 SEE ALSO
135 AnalyzeSequenceFilesData.pl, InfoSequenceFiles.pl
136
137 COPYRIGHT
138 Copyright (C) 2015 Manish Sud. All rights reserved.
139
140 This file is part of MayaChemTools.
141
142 MayaChemTools is free software; you can redistribute it and/or modify it
143 under the terms of the GNU Lesser General Public License as published by
144 the Free Software Foundation; either version 3 of the License, or (at
145 your option) any later version.
146