view docs/scripts/txt/ExtractFromSequenceFiles.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
author | deepakjadmin |
---|---|
date | Wed, 20 Jan 2016 09:23:18 -0500 |
parents | |
children |
NAME
    ExtractFromSequenceFiles.pl - Extract data from sequence and alignment files

SYNOPSIS
    ExtractFromSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...

    ExtractFromSequenceFiles.pl [-h, --help] [-i, --IgnoreGaps yes | no]
    [-m, --mode SequenceID | SequenceNum | SequenceNumRange]
    [-o, --overwrite] [-r, --root rootname]
    [-s, --Sequences "SequenceID, [SequenceID,...]" |
    "SequenceNum, [SequenceNum,...]" | "StartingSeqNum, EndingSeqNum"]
    [--SequenceIDMatch Exact | Relaxed] [-w, --WorkingDir dirname]
    SequenceFile(s) AlignmentFile(s)...

DESCRIPTION
    Extract specific data from *SequenceFile(s) and AlignmentFile(s)* and
    generate FASTA files. You can extract sequences using sequence IDs or
    sequence numbers.

    The file names are separated by spaces. All the sequence files in the
    current directory can be specified by **.aln*, **.msf*, **.fasta*,
    **.fta*, **.pir* or any other supported format; additionally, *DirName*
    corresponds to all the sequence files in the current directory with any
    of the supported file extensions: *.aln, .msf, .fasta, .fta, and .pir*.

    Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*,
    *Pearson/FASTA*, and *NBRF/PIR*. File formats are detected by parsing the
    contents of *SequenceFile(s) and AlignmentFile(s)* rather than by relying
    on file extensions.

OPTIONS
    -h, --help
        Print this help message.

    -i, --IgnoreGaps *yes | no*
        Ignore gaps or gap columns during generation of new sequence or
        alignment file(s). Possible values: *yes or no*. Default value: *yes*.

        Gap columns can be removed only when all the sequences have the same
        length; otherwise, this option is ignored.

    -m, --mode *SequenceID | SequenceNum | SequenceNumRange*
        Specify how to extract data from sequence files: extract sequences
        using sequence IDs or sequence numbers. Possible values: *SequenceID |
        SequenceNum | SequenceNumRange*. Default: *SequenceNum* with value of
        1.
        The sequence numbers correspond to the position of sequences in
        *SequenceFile(s) and AlignmentFile(s)*, starting from 1 for the first
        sequence.

    -o, --overwrite
        Overwrite existing files.

    -r, --root *rootname*
        New sequence file name is generated using the root:
        <Root><Mode>.<Ext>. Default new file: <SequenceFileName><Mode>.<Ext>.
        This option is ignored for multiple input files.

    -s, --Sequences *"SequenceID,[SequenceID,...]" |
    "SequenceNum,[SequenceNum,...]" | "StartingSeqNum,EndingSeqNum"*
        This value is -m, --mode specific. In general, it's a comma delimited
        list of sequence IDs or sequence numbers.

        For *SequenceID* value of -m, --mode option, input value format is:
        *SequenceID,...*. Examples:

            ACHE_BOVIN
            ACHE_BOVIN,ACHE_HUMAN

        For *SequenceNum* value of -m, --mode option, input value format is:
        *SequenceNum,...*. Examples:

            2
            1,5

        For *SequenceNumRange* value of -m, --mode option, input value format
        is: *StartingSeqNum,EndingSeqNum*. Examples:

            2,4

    --SequenceIDMatch *Exact | Relaxed*
        Sequence ID matching criterion for *SequenceID* value of -m, --mode
        option: match the specified sequence ID exactly or as a substring
        against sequence IDs in the files. Possible values: *Exact | Relaxed*.
        Default: *Relaxed*. Sequence ID matching is case insensitive for both
        values.

    --SequenceLength *number*
        Maximum sequence length per line in sequence file(s). Default: *80*.

    -w, --WorkingDir *text*
        Location of working directory. Default: current directory.
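The way the -s, --Sequences value depends on -m, --mode can be illustrated with a short Python sketch. This is not the script's implementation: the function name `select_sequences`, the `(id, sequence)` record tuples, and the handling of out-of-range numbers (silently skipped here) are assumptions made only for illustration. What it takes from the text above is the 1-based numbering, the three modes, and the case-insensitive *Exact* and *Relaxed* ID matching.

```python
def select_sequences(records, mode="SequenceNum", spec="1", id_match="Relaxed"):
    """Pick records according to a --mode / --Sequences style pair.

    records  : list of (seq_id, sequence) tuples in file order (hypothetical layout)
    mode     : "SequenceID", "SequenceNum" or "SequenceNumRange"
    spec     : comma delimited value, as described for -s, --Sequences
    id_match : "Exact" or "Relaxed"; only used in SequenceID mode
    """
    values = [v.strip() for v in spec.split(",") if v.strip()]

    if mode == "SequenceID":
        wanted = [v.lower() for v in values]  # matching is case insensitive

        def matches(seq_id):
            sid = seq_id.lower()
            if id_match == "Exact":
                return sid in wanted
            # Relaxed: the specified ID may be a substring of the file's ID
            return any(w in sid for w in wanted)

        return [rec for rec in records if matches(rec[0])]

    if mode == "SequenceNum":
        # Sequence numbers start from 1; out-of-range numbers are skipped
        # here, which is an assumption of this sketch.
        nums = [int(v) for v in values]
        return [records[n - 1] for n in nums if 1 <= n <= len(records)]

    if mode == "SequenceNumRange":
        if len(values) != 2:
            raise ValueError("range needs StartingSeqNum,EndingSeqNum")
        start, end = int(values[0]), int(values[1])
        if start < 1 or start > end:
            raise ValueError("invalid sequence number range")
        return records[start - 1:end]  # inclusive of both ends

    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Made-up records, used only to exercise the function.
    demo = [("ACHE_BOVIN", "AAA"), ("ACHE_HUMAN", "CCC"), ("Q9P993/104-387", "GGG")]
    for seq_id, seq in select_sequences(demo, "SequenceID", "ache"):
        print(seq_id, seq)
```

With the default arguments the sketch returns only the first record, mirroring the documented default of *SequenceNum* with value of 1.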
EXAMPLES
    To extract the first sequence from the Sample1.fasta sequence file and
    generate the Sample1SequenceNum.fasta sequence file, type:

        % ExtractFromSequenceFiles.pl -o Sample1.fasta

    To extract the first sequence from the Sample1.aln alignment file and
    generate the Sample1SequenceNum.fasta sequence file without any column
    gaps, type:

        % ExtractFromSequenceFiles.pl -o Sample1.aln

    To extract the first sequence from the Sample1.aln alignment file and
    generate the Sample1SequenceNum.fasta sequence file with column gaps,
    type:

        % ExtractFromSequenceFiles.pl --IgnoreGaps no -o Sample1.aln

    To extract sequence numbers 1 and 4 from the Sample1.fasta sequence file
    and generate the Sample1SequenceNum.fasta sequence file, type:

        % ExtractFromSequenceFiles.pl -o -m SequenceNum --Sequences 1,4
          Sample1.fasta

    To extract sequences from sequence number 1 to 4 from the Sample1.fasta
    sequence file and generate the Sample1SequenceNumRange.fasta sequence
    file, type:

        % ExtractFromSequenceFiles.pl -o -m SequenceNumRange --Sequences 1,4
          Sample1.fasta

    To extract the sequence with ID "Q9P993/104-387" from the Sample1.fasta
    sequence file and generate the Sample1SequenceID.fasta sequence file,
    type:

        % ExtractFromSequenceFiles.pl -o -m SequenceID --Sequences
          "Q9P993/104-387" --SequenceIDMatch Exact Sample1.fasta

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    AnalyzeSequenceFilesData.pl, InfoSequenceFiles.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.