Mercurial > repos > deepakjadmin > mayatool3_test1
diff docs/scripts/txt/InfoSequenceFiles.txt @ 1:2abf0d43254d draft
Uploaded
author | deepakjadmin |
---|---|
date | Wed, 20 Jan 2016 09:10:43 -0500 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/docs/scripts/txt/InfoSequenceFiles.txt Wed Jan 20 09:10:43 2016 -0500 @@ -0,0 +1,139 @@ +NAME + InfoSequenceFiles.pl - List information about sequence and alignment + files + +SYNOPSIS + InfoSequenceFiles.pl SequenceFile(s) AlignmentFile(s)... + + InfoSequenceFiles.pl [-a, --all] [-c, --count] [-d, --detail infolevel] + [-f, --frequency] [--FrequencyBins number | "number, number, + [number,...]"] [-h, --help] [-i, --IgnoreGaps yes | no] [-l, --longest] + [-s, --shortest] [--SequenceLengths] [-w, --workingdir dirname] + SequenceFile(s)... + +DESCRIPTION + List information about contents of *SequenceFile(s) and + AlignmentFile(s)*: number of sequences, shortest and longest sequences, + distribution of sequence lengths and so on. The file names are separated + by spaces. All the sequence files in a current directory can be + specified by **.aln*, **.msf*, **.fasta*, **.fta*, **.pir* or any other + supported formats; additionally, *DirName* corresponds to all the + sequence files in the current directory with any of the supported file + extension: *.aln, .msf, .fasta, .fta, and .pir*. + + Supported sequence formats are: *ALN/CLustalW*, *GCG/MSF*, *PILEUP/MSF*, + *Pearson/FASTA*, and *NBRF/PIR*. Instead of using file extensions, file + formats are detected by parsing the contents of *SequenceFile(s) and + AlignmentFile(s)*. + +OPTIONS + -a, --all + List all the available information. + + -c, --count + List number of of sequences. This is default behavior. + + -d, --detail *InfoLevel* + Level of information to print about sequences during various + options. Default: *1*. Possible values: *1, 2 or 3*. + + -f, --frequency + List distribution of sequence lengths using the specified number of + bins or bin range specified using FrequencyBins option. + + This option is ignored for input files containing only single + sequence. + + --FrequencyBins *number | "number,number,[number,...]"* + This value is used with -f, --frequency option to list distribution + of sequence lengths using the specified number of bins or bin range. + Default value: *10*. + + The bin range list is used to group sequence lengths into different + groups; It must contain values in ascending order. Examples: + + 100,200,300,400,500,600 + 200,400,600,800,1000 + + The frequency value calculated for a specific bin corresponds to all + the sequence lengths which are greater than the previous bin value + and less than or equal to the current bin value. + + -h, --help + Print this help message. + + -i, --IgnoreGaps *yes | no* + Ignore gaps during calculation of sequence lengths. Possible values: + *yes or no*. Default value: *no*. + + -l, --longest + List information about longest sequence: ID, sequence and sequence + length. This option is ignored for input files containing only + single sequence. + + -s, --shortest + List information about shortest sequence: ID, sequence and sequence + length. This option is ignored for input files containing only + single sequence. + + --SequenceLengths + List information about sequence lengths. + + -w, --WorkingDir *dirname* + Location of working directory. Default: current directory. + +EXAMPLES + To count number of sequences in sequence files, type: + + % InfoSequenceFiles.pl Sample1.fasta + % InfoSequenceFiles.pl Sample1.msf Sample1.aln Sample1.pir + % InfoSequenceFiles.pl *.fasta *.fta *.msf *.pir *.aln + + To list all available information with maximum level of available detail + for a sequence alignment file Sample1.msf, type: + + % InfoSequenceFiles.pl -a -d 3 Sample1.msf + + To list sequence length information after ignoring sequence gaps in + Sample1.aln file, type: + + % InfoSequenceFiles.pl --SequenceLengths --IgnoreGaps Yes + Sample1.aln + + To list shortest and longest sequence length information after ignoring + sequence gaps in Sample1.aln file, type: + + % InfoSequenceFiles.pl --longest --shortest --IgnoreGaps Yes + Sample1.aln + + To list distribution of sequence lengths after ignoring sequence gaps in + Sample1.aln file and report the frequency distribution into 10 bins, + type: + + % InfoSequenceFiles.pl --frequency --FrequencyBins 10 + --IgnoreGaps Yes Sample1.aln + + To list distribution of sequence lengths after ignoring sequence gaps in + Sample1.aln file and report the frequency distribution into specified + bin range, type: + + % InfoSequenceFiles.pl --frequency --FrequencyBins + "150,200,250,300,350" --IgnoreGaps Yes Sample1.aln + +AUTHOR + Manish Sud <msud@san.rr.com> + +SEE ALSO + AnalyzeSequenceFilesData.pl, ExtractFromSequenceFiles.pl, + InfoAminoAcids.pl, InfoNucleicAcids.pl + +COPYRIGHT + Copyright (C) 2015 Manish Sud. All rights reserved. + + This file is part of MayaChemTools. + + MayaChemTools is free software; you can redistribute it and/or modify it + under the terms of the GNU Lesser General Public License as published by + the Free Software Foundation; either version 3 of the License, or (at + your option) any later version. +