comparison docs/scripts/txt/ExtractFromSequenceFiles.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
| author | deepakjadmin |
|---|---|
| date | Wed, 20 Jan 2016 09:23:18 -0500 |
| parents | |
| children |
| -1:000000000000 | 0:4816e4a8ae95 |
|---|---|
| 1 NAME | |
| 2 ExtractFromSequenceFiles.pl - Extract data from sequence and alignment | |
| 3 files | |
| 4 | |
| 5 SYNOPSIS | |
| 6 ExtractFromSequenceFiles.pl SequenceFile(s) AlignmentFile(s)... | |
| 7 | |
| 8 ExtractFromSequenceFiles.pl [-h, --help] [-i, --IgnoreGaps yes | no] | |
| 9 [-m, --mode SequenceID | SequenceNum | SequenceNumRange] [-o, | |
| 10 --overwrite] [-r, --root rootname] [-s, --Sequences "SequenceID, | |
| 11 [SequenceID,...]" | "SequenceNum, [SequenceNum,...]" | "StartingSeqNum, | |
| 12 EndingSeqNum"] [--SequenceIDMatch Exact | Relaxed] [-w, --WorkingDir | |
| 13 dirname] SequenceFile(s) AlignmentFile(s)... | |
| 14 | |
| 15 DESCRIPTION | |
| 16 Extract specific data from *SequenceFile(s) and AlignmentFile(s)* and | |
| 17 generate FASTA files. You can extract sequences using sequence IDs or | |
| 18 sequence numbers. | |
| 19 | |
| 20 The file names are separated by spaces. All the sequence files in the | |
| 21 current directory can be specified by **.aln*, **.msf*, **.fasta*, | |
| 22 **.fta*, **.pir* or any other supported format; additionally, *DirName* | |
| 23 corresponds to all the sequence files in the current directory with any | |
| 24 of the supported file extensions: *.aln, .msf, .fasta, .fta, and .pir*. | |
| 25 | |
| 26 Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*, | |
| 27 *Pearson/FASTA*, and *NBRF/PIR*. Instead of using file extensions, file | |
| 28 formats are detected by parsing the contents of *SequenceFile(s) and | |
| 29 AlignmentFile(s)*. | |
| 30 | |
| 31 OPTIONS | |
| 32 -h, --help | |
| 33 Print this help message. | |
| 34 | |
| 35 -i, --IgnoreGaps *yes | no* | |
| 36 Ignore gaps or gap columns during generation of new sequence | |
| 37 or alignment file(s). Possible values: *yes or no*. Default value: | |
| 38 *yes*. | |
| 39 | |
| 40 In order to remove gap columns, all the sequences must have the | |
| 41 same length; otherwise, this option is ignored. | |
| 42 | |
| 43 -m, --mode *SequenceID | SequenceNum | SequenceNumRange* | |
| 44 Specify how to extract data from sequence files: extract sequences | |
| 45 using sequence IDs or sequence numbers. Possible values: *SequenceID | |
| 46 | SequenceNum | SequenceNumRange*. Default: *SequenceNum* with value | |
| 47 of 1. | |
| 48 | |
| 49 The sequence numbers correspond to position of sequences starting | |
| 50 from 1 for first sequence in *SequenceFile(s) and AlignmentFile(s)*. | |
| 51 | |
| 52 -o, --overwrite | |
| 53 Overwrite existing files. | |
| 54 | |
| 55 -r, --root *rootname* | |
| 56 New sequence file name is generated using the root: | |
| 57 <Root><Mode>.<Ext>. Default new file: | |
| 58 <SequenceFileName><Mode>.<Ext>. This option is ignored for multiple | |
| 59 input files. | |
| 60 | |
| 61 -s, --Sequences *"SequenceID,[SequenceID,...]" | | |
| 62 "SequenceNum,[SequenceNum,...]" | "StartingSeqNum,EndingSeqNum"* | |
| 63 This value is -m, --mode specific. In general, it's a comma | |
| 64 delimited list of sequence IDs or sequence numbers. | |
| 65 | |
| 66 For *SequenceID* value of -m, --mode option, input value format is: | |
| 67 *SequenceID,...*. Examples: | |
| 68 | |
| 69 ACHE_BOVIN | |
| 70 ACHE_BOVIN,ACHE_HUMAN | |
| 71 | |
| 72 For *SequenceNum* value of -m, --mode option, input value format is: | |
| 73 *SequenceNum,...*. Examples: | |
| 74 | |
| 75 2 | |
| 76 1,5 | |
| 77 | |
| 78 For *SequenceNumRange* value of -m, --mode option, input value format is: | |
| 79 *StartingSeqNum,EndingSeqNum*. Examples: | |
| 80 | |
| 81 2,4 | |
| 82 | |
| 83 --SequenceIDMatch *Exact | Relaxed* | |
| 84 Sequence ID matching criterion for *SequenceID* value of -m, | |
| 85 --mode option: match specified sequence ID exactly or as a substring | |
| 86 against sequence IDs in the files. Possible values: *Exact | | |
| 87 Relaxed*. Default: *Relaxed*. Sequence ID matching is case insensitive | |
| 88 for both values. | |
| 89 | |
| 90 --SequenceLength *number* | |
| 91 Maximum sequence length per line in sequence file(s). Default: *80*. | |
| 92 | |
| 93 -w, --WorkingDir *dirname* | |
| 94 Location of working directory. Default: current directory. | |
| 95 | |
| 96 EXAMPLES | |
| 97 To extract first sequence from Sample1.fasta sequence file and generate | |
| 98 Sample1SequenceNum.fasta sequence file, type: | |
| 99 | |
| 100 % ExtractFromSequenceFiles.pl -o Sample1.fasta | |
| 101 | |
| 102 To extract first sequence from Sample1.aln alignment file and generate | |
| 103 Sample1SequenceNum.fasta sequence file without any column gaps, type: | |
| 104 | |
| 105 % ExtractFromSequenceFiles.pl -o Sample1.aln | |
| 106 | |
| 107 To extract first sequence from Sample1.aln alignment file and generate | |
| 108 Sample1SequenceNum.fasta sequence file with column gaps, type: | |
| 109 | |
| 110 % ExtractFromSequenceFiles.pl --IgnoreGaps No -o Sample1.aln | |
| 111 | |
| 112 To extract sequence numbers 1 and 4 from Sample1.fasta sequence file and | |
| 113 generate Sample1SequenceNum.fasta sequence file, type: | |
| 114 | |
| 115 % ExtractFromSequenceFiles.pl -o -m SequenceNum --Sequences 1,4 | |
| 116 Sample1.fasta | |
| 117 | |
| 118 To extract sequences from sequence number 1 to 4 from Sample1.fasta | |
| 119 sequence file and generate Sample1SequenceNumRange.fasta sequence file, | |
| 120 type: | |
| 121 | |
| 122 % ExtractFromSequenceFiles.pl -o -m SequenceNumRange --Sequences | |
| 123 1,4 Sample1.fasta | |
| 124 | |
| 125 To extract the sequence with ID "Q9P993/104-387" from the Sample1.fasta | |
| 126 sequence file and generate Sample1SequenceID.fasta sequence file, type: | |
| 127 | |
| 128 % ExtractFromSequenceFiles.pl -o -m SequenceID --Sequences | |
| 129 "Q9P993/104-387" --SequenceIDMatch Exact Sample1.fasta | |
| 130 | |
| 131 AUTHOR | |
| 132 Manish Sud <msud@san.rr.com> | |
| 133 | |
| 134 SEE ALSO | |
| 135 AnalyzeSequenceFilesData.pl, InfoSequenceFiles.pl | |
| 136 | |
| 137 COPYRIGHT | |
| 138 Copyright (C) 2015 Manish Sud. All rights reserved. | |
| 139 | |
| 140 This file is part of MayaChemTools. | |
| 141 | |
| 142 MayaChemTools is free software; you can redistribute it and/or modify it | |
| 143 under the terms of the GNU Lesser General Public License as published by | |
| 144 the Free Software Foundation; either version 3 of the License, or (at | |
| 145 your option) any later version. | |
| 146 |
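
The selection rules described under -m, --mode, -s, --Sequences and --SequenceIDMatch can be illustrated with a minimal Python sketch. This is not the Perl implementation of ExtractFromSequenceFiles.pl; it only mirrors the documented semantics. The function names, the `(id, sequence)` tuple layout, and the handling of out-of-range sequence numbers (silently skipped here) are assumptions made for illustration.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (sequence ID, sequence text); hypothetical layout


def select_by_num(records: List[Record], nums: List[int]) -> List[Record]:
    """SequenceNum mode: pick sequences by 1-based position, e.g. "1,5"."""
    # Assumption: positions outside the file are skipped rather than reported.
    return [records[n - 1] for n in nums if 1 <= n <= len(records)]


def select_by_range(records: List[Record], start: int, end: int) -> List[Record]:
    """SequenceNumRange mode: pick positions start..end inclusive, e.g. "2,4"."""
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= StartingSeqNum <= EndingSeqNum")
    return records[start - 1:end]


def select_by_id(records: List[Record], ids: List[str],
                 match: str = "Relaxed") -> List[Record]:
    """SequenceID mode: case-insensitive Exact or Relaxed (substring) match."""
    wanted = [i.lower() for i in ids]
    selected = []
    for seq_id, seq in records:
        candidate = seq_id.lower()
        if match == "Exact":
            hit = candidate in wanted
        else:
            # Relaxed: the specified ID only needs to occur inside the file's ID.
            hit = any(w in candidate for w in wanted)
        if hit:
            selected.append((seq_id, seq))
    return selected


def extract(records: List[Record], mode: str = "SequenceNum",
            sequences: str = "1", id_match: str = "Relaxed") -> List[Record]:
    """Dispatch on mode; defaults follow the documented SequenceNum value of 1."""
    values = [v.strip() for v in sequences.split(",") if v.strip()]
    if mode == "SequenceID":
        return select_by_id(records, values, id_match)
    if mode == "SequenceNum":
        return select_by_num(records, [int(v) for v in values])
    if mode == "SequenceNumRange":
        if len(values) != 2:
            raise ValueError("expected StartingSeqNum,EndingSeqNum")
        return select_by_range(records, int(values[0]), int(values[1]))
    raise ValueError("unknown mode: " + mode)
```

Note that the same input value, such as `1,4`, selects two sequences in *SequenceNum* mode but four sequences in *SequenceNumRange* mode, which is why the -s, --Sequences value has to be read together with -m, --mode.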
