Mercurial > repos > deepakjadmin > mayatool3_test3
view mayachemtools/docs/scripts/txt/InfoSequenceFiles.txt @ 5:9a001a14a022 draft
Deleted selected files
author | deepakjadmin |
---|---|
date | Thu, 15 Dec 2016 14:04:29 -0500 |
parents | 73ae111cf86f |
children |
line wrap: on
line source
NAME InfoSequenceFiles.pl - List information about sequence and alignment files SYNOPSIS InfoSequenceFiles.pl SequenceFile(s) AlignmentFile(s)... InfoSequenceFiles.pl [-a, --all] [-c, --count] [-d, --detail infolevel] [-f, --frequency] [--FrequencyBins number | "number, number, [number,...]"] [-h, --help] [-i, --IgnoreGaps yes | no] [-l, --longest] [-s, --shortest] [--SequenceLengths] [-w, --workingdir dirname] SequenceFile(s)... DESCRIPTION List information about contents of *SequenceFile(s) and AlignmentFile(s)*: number of sequences, shortest and longest sequences, distribution of sequence lengths and so on. The file names are separated by spaces. All the sequence files in a current directory can be specified by **.aln*, **.msf*, **.fasta*, **.fta*, **.pir* or any other supported formats; additionally, *DirName* corresponds to all the sequence files in the current directory with any of the supported file extension: *.aln, .msf, .fasta, .fta, and .pir*. Supported sequence formats are: *ALN/CLustalW*, *GCG/MSF*, *PILEUP/MSF*, *Pearson/FASTA*, and *NBRF/PIR*. Instead of using file extensions, file formats are detected by parsing the contents of *SequenceFile(s) and AlignmentFile(s)*. OPTIONS -a, --all List all the available information. -c, --count List number of of sequences. This is default behavior. -d, --detail *InfoLevel* Level of information to print about sequences during various options. Default: *1*. Possible values: *1, 2 or 3*. -f, --frequency List distribution of sequence lengths using the specified number of bins or bin range specified using FrequencyBins option. This option is ignored for input files containing only single sequence. --FrequencyBins *number | "number,number,[number,...]"* This value is used with -f, --frequency option to list distribution of sequence lengths using the specified number of bins or bin range. Default value: *10*. The bin range list is used to group sequence lengths into different groups; It must contain values in ascending order. Examples: 100,200,300,400,500,600 200,400,600,800,1000 The frequency value calculated for a specific bin corresponds to all the sequence lengths which are greater than the previous bin value and less than or equal to the current bin value. -h, --help Print this help message. -i, --IgnoreGaps *yes | no* Ignore gaps during calculation of sequence lengths. Possible values: *yes or no*. Default value: *no*. -l, --longest List information about longest sequence: ID, sequence and sequence length. This option is ignored for input files containing only single sequence. -s, --shortest List information about shortest sequence: ID, sequence and sequence length. This option is ignored for input files containing only single sequence. --SequenceLengths List information about sequence lengths. -w, --WorkingDir *dirname* Location of working directory. Default: current directory. EXAMPLES To count number of sequences in sequence files, type: % InfoSequenceFiles.pl Sample1.fasta % InfoSequenceFiles.pl Sample1.msf Sample1.aln Sample1.pir % InfoSequenceFiles.pl *.fasta *.fta *.msf *.pir *.aln To list all available information with maximum level of available detail for a sequence alignment file Sample1.msf, type: % InfoSequenceFiles.pl -a -d 3 Sample1.msf To list sequence length information after ignoring sequence gaps in Sample1.aln file, type: % InfoSequenceFiles.pl --SequenceLengths --IgnoreGaps Yes Sample1.aln To list shortest and longest sequence length information after ignoring sequence gaps in Sample1.aln file, type: % InfoSequenceFiles.pl --longest --shortest --IgnoreGaps Yes Sample1.aln To list distribution of sequence lengths after ignoring sequence gaps in Sample1.aln file and report the frequency distribution into 10 bins, type: % InfoSequenceFiles.pl --frequency --FrequencyBins 10 --IgnoreGaps Yes Sample1.aln To list distribution of sequence lengths after ignoring sequence gaps in Sample1.aln file and report the frequency distribution into specified bin range, type: % InfoSequenceFiles.pl --frequency --FrequencyBins "150,200,250,300,350" --IgnoreGaps Yes Sample1.aln AUTHOR Manish Sud <msud@san.rr.com> SEE ALSO AnalyzeSequenceFilesData.pl, ExtractFromSequenceFiles.pl, InfoAminoAcids.pl, InfoNucleicAcids.pl COPYRIGHT Copyright (C) 2015 Manish Sud. All rights reserved. This file is part of MayaChemTools. MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.