Mercurial > repos > deepakjadmin > mayatool3_test2
diff docs/scripts/txt/ExtractFromSequenceFiles.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
author | deepakjadmin |
---|---|
date | Wed, 20 Jan 2016 09:23:18 -0500 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/docs/scripts/txt/ExtractFromSequenceFiles.txt Wed Jan 20 09:23:18 2016 -0500 @@ -0,0 +1,146 @@ +NAME + ExtractFromSequenceFiles.pl - Extract data from sequence and alignment + files + +SYNOPSIS + ExtractFromSequenceFiles.pl SequenceFile(s) AlignmentFile(s)... + + ExtractFromSequenceFiles.pl [-h, --help] [-i, --IgnoreGaps yes | no] + [-m, --mode SequenceID | SequenceNum | SequenceNumRange] [-o, + --overwrite] [-r, --root rootname] [-s, --Sequences "SequenceID, + [SequenceID,...]" | "SequenceNum, [SequenceNum,...]" | "StartingSeqNum, + EndingSeqNum"] [--SequenceIDMatch Exact | Relaxed] [-w, --WorkingDir + dirname] SequenceFile(s) AlignmentFile(s)... + +DESCRIPTION + Extract specific data from *SequenceFile(s) and AlignmentFile(s)* and + generate FASTA files. You can extract sequences using sequence IDs or + sequence numbers. + + The file names are separated by spaces. All the sequence files in a + current directory can be specified by **.aln*, **.msf*, **.fasta*, + **.fta*, **.pir* or any other supported formats; additionally, *DirName* + corresponds to all the sequence files in the current directory with any + of the supported file extension: *.aln, .msf, .fasta, .fta, and .pir*. + + Supported sequence formats are: *ALN/CLustalW*, *GCG/MSF*, *PILEUP/MSF*, + *Pearson/FASTA*, and *NBRF/PIR*. Instead of using file extensions, file + formats are detected by parsing the contents of *SequenceFile(s) and + AlignmentFile(s)*. + +OPTIONS + -h, --help + Print this help message. + + -i, --IgnoreGaps *yes | no* + Ignore gaps or gap columns during during generation of new sequence + or alignment file(s). Possible values: *yes or no*. Default value: + *yes*. + + In order to remove gap columns, length of all the sequence must be + same; otherwise, this option is ignored. + + -m, --mode *SequenceID | SequenceNum | SequenceNumRange* + Specify how to extract data from sequence files: extract sequences + using sequence IDs or sequence numbers. Possible values: *SequenceID + | SequenceNum | SequenceNumRange*. Default: *SequenceNum* with value + of 1. + + The sequence numbers correspond to position of sequences starting + from 1 for first sequence in *SequenceFile(s) and AlignmentFile(s)*. + + -o, --overwrite + Overwrite existing files. + + -r, --root *rootname* + New sequence file name is generated using the root: + <Root><Mode>.<Ext>. Default new file: + <SequenceFileName><Mode>.<Ext>. This option is ignored for multiple + input files. + + -s, --Sequences *"SequenceID,[SequenceID,...]" | + "SequenceNum,[SequenceNum,...]" | "StartingSeqNum,EndingSeqNum"* + This value is -m, --mode specific. In general, it's a comma + delimites list of sequence IDs or sequence numbers. + + For *SequenceID* value of -m, --mode option, input value format is: + *SequenceID,...*. Examples: + + ACHE_BOVIN + ACHE_BOVIN,ACHE_HUMAN + + For *SequenceNum* value of -m, --mode option, input value format is: + *SequenceNum,...*. Examples: + + 2 + 1,5 + + For *SequenceNum* value of -m, --mode option, input value format is: + *StaringSeqNum,EndingSeqNum*. Examples: + + 2,4 + + --SequenceIDMatch *Exact | Relaxed* + Sequence IDs matching criterion during *SequenceID* value of -m, + --mode option: match specified sequence ID exactly or as sub string + against sequence IDs in the files. Possible values: *Exact | + Relaxed*. Default: *Relaxed*. Sequence ID match is case insenstitive + during both options. + + --SequenceLength *number* + Maximum sequence length per line in sequence file(s). Default: *80*. + + -w --WorkingDir *text* + Location of working directory. Default: current directory. + +EXAMPLES + To extract first sequence from Sample1.fasta sequence file and generate + Sample1SequenceNum.fasta sequence file, type: + + % ExtractFromSequenceFiles.pl -o Sample1.fasta + + To extract first sequence from Sample1.aln alignment file and generate + Sample1SequenceNum.fasta sequence file without any column gaps, type: + + % ExtractFromSequenceFiles.pl -o Sample1.aln + + To extract first sequence from Sample1.aln alignment file and generate + Sample1SequenceNum.fasta sequence file with column gaps, type: + + % ExtractFromSequenceFiles.pl --IgnroreGaps No -o Sample1.aln + + To extract sequence number 1 and 4 from Sample1.fasta sequence file and + generate Sample1SequenceNum.fasta sequence file, type: + + % ExtractFromSequenceFiles.pl -o -m SequenceNum --Sequences 1,4 + -o Sample1.fasta + + To extract sequences from sequence number 1 to 4 from Sample1.fasta + sequence file and generate Sample1SequenceNumRange.fasta sequence file, + type: + + % ExtractFromSequenceFiles.pl -o -m SequenceNumRange --Sequences + 1,4 -o Sample1.fasta + + To extract sequence ID "Q9P993/104-387" from sequence from Sample1.fasta + sequence file and generate Sample1SequenceID.fasta sequence file, type: + + % ExtractFromSequenceFiles.pl -o -m SequenceID --Sequences + "Q9P993/104-387" --SequenceIDMatch Exact -o Sample1.fasta + +AUTHOR + Manish Sud <msud@san.rr.com> + +SEE ALSO + AnalyzeSequenceFilesData.pl, InfoSequenceFiles.pl + +COPYRIGHT + Copyright (C) 2015 Manish Sud. All rights reserved. + + This file is part of MayaChemTools. + + MayaChemTools is free software; you can redistribute it and/or modify it + under the terms of the GNU Lesser General Public License as published by + the Free Software Foundation; either version 3 of the License, or (at + your option) any later version. +