diff docs/scripts/txt/InfoSequenceFiles.txt @ 1:2abf0d43254d draft

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:10:43 -0500
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/docs/scripts/txt/InfoSequenceFiles.txt	Wed Jan 20 09:10:43 2016 -0500
@@ -0,0 +1,139 @@
+NAME
+    InfoSequenceFiles.pl - List information about sequence and alignment
+    files
+
+SYNOPSIS
+    InfoSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...
+
+    InfoSequenceFiles.pl [-a, --all] [-c, --count] [-d, --detail infolevel]
+    [-f, --frequency] [--FrequencyBins number | "number, number,
+    [number,...]"] [-h, --help] [-i, --IgnoreGaps yes | no] [-l, --longest]
+    [-s, --shortest] [--SequenceLengths] [-w, --workingdir dirname]
+    SequenceFile(s)...
+
+DESCRIPTION
+    List information about contents of *SequenceFile(s) and
+    AlignmentFile(s)*: number of sequences, shortest and longest sequences,
+    distribution of sequence lengths and so on. The file names are separated
+    by spaces. All the sequence files in the current directory can be
+    specified by **.aln*, **.msf*, **.fasta*, **.fta*, **.pir* or any other
+    supported format; additionally, *DirName* corresponds to all the
+    sequence files in the current directory with any of the supported file
+    extensions: *.aln, .msf, .fasta, .fta, and .pir*.
+
+    Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*,
+    *Pearson/FASTA*, and *NBRF/PIR*. Instead of using file extensions, file
+    formats are detected by parsing the contents of *SequenceFile(s) and
+    AlignmentFile(s)*.
+
+OPTIONS
+    -a, --all
+        List all the available information.
+
+    -c, --count
+        List number of sequences. This is the default behavior.
+
+    -d, --detail *InfoLevel*
+        Level of information to print about sequences during various
+        options. Default: *1*. Possible values: *1, 2 or 3*.
+
+    -f, --frequency
+        List distribution of sequence lengths using the specified number of
+        bins or the bin range specified using the --FrequencyBins option.
+
+        This option is ignored for input files containing only a single
+        sequence.
+
+    --FrequencyBins *number | "number,number,[number,...]"*
+        This value is used with the -f, --frequency option to list the
+        distribution of sequence lengths using the specified number of bins
+        or bin range. Default value: *10*.
+
+        The bin range list is used to group sequence lengths into different
+        groups; it must contain values in ascending order. Examples:
+
+            100,200,300,400,500,600
+            200,400,600,800,1000
+
+        The frequency value calculated for a specific bin corresponds to all
+        the sequence lengths which are greater than the previous bin value
+        and less than or equal to the current bin value.
+
+    -h, --help
+        Print this help message.
+
+    -i, --IgnoreGaps *yes | no*
+        Ignore gaps during calculation of sequence lengths. Possible values:
+        *yes or no*. Default value: *no*.
+
+    -l, --longest
+        List information about the longest sequence: ID, sequence and
+        sequence length. This option is ignored for input files containing
+        only a single sequence.
+
+    -s, --shortest
+        List information about the shortest sequence: ID, sequence and
+        sequence length. This option is ignored for input files containing
+        only a single sequence.
+
+    --SequenceLengths
+        List information about sequence lengths.
+
+    -w, --WorkingDir *dirname*
+        Location of working directory. Default: current directory.
+
+EXAMPLES
+    To count the number of sequences in sequence files, type:
+
+        % InfoSequenceFiles.pl Sample1.fasta
+        % InfoSequenceFiles.pl Sample1.msf Sample1.aln Sample1.pir
+        % InfoSequenceFiles.pl *.fasta *.fta *.msf *.pir *.aln
+
+    To list all available information with the maximum level of detail for
+    a sequence alignment file Sample1.msf, type:
+
+        % InfoSequenceFiles.pl -a -d 3 Sample1.msf
+
+    To list sequence length information after ignoring sequence gaps in
+    the Sample1.aln file, type:
+
+        % InfoSequenceFiles.pl --SequenceLengths --IgnoreGaps Yes
+          Sample1.aln
+
+    To list shortest and longest sequence length information after ignoring
+    sequence gaps in the Sample1.aln file, type:
+
+        % InfoSequenceFiles.pl --longest --shortest --IgnoreGaps Yes
+          Sample1.aln
+
+    To list the distribution of sequence lengths after ignoring sequence
+    gaps in the Sample1.aln file and report the frequency distribution in
+    10 bins, type:
+
+        % InfoSequenceFiles.pl --frequency --FrequencyBins 10
+          --IgnoreGaps Yes Sample1.aln
+
+    To list the distribution of sequence lengths after ignoring sequence
+    gaps in the Sample1.aln file and report the frequency distribution
+    using the specified bin range, type:
+
+        % InfoSequenceFiles.pl --frequency --FrequencyBins
+          "150,200,250,300,350" --IgnoreGaps Yes Sample1.aln
+
+AUTHOR
+    Manish Sud <msud@san.rr.com>
+
+SEE ALSO
+    AnalyzeSequenceFilesData.pl, ExtractFromSequenceFiles.pl,
+    InfoAminoAcids.pl, InfoNucleicAcids.pl
+
+COPYRIGHT
+    Copyright (C) 2015 Manish Sud. All rights reserved.
+
+    This file is part of MayaChemTools.
+
+    MayaChemTools is free software; you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published by
+    the Free Software Foundation; either version 3 of the License, or (at
+    your option) any later version.
+
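
The bin rule described under --FrequencyBins (a length is counted in a bin when it is greater than the previous bin value and less than or equal to the current one) and the effect of --IgnoreGaps can be illustrated with a small sketch. This is not the script's Perl implementation. The function names, the gap characters `-` and `.`, and the choice to skip lengths above the last bin value are all assumptions made for illustration.

```python
from bisect import bisect_left

# Assumption: gaps are written as '-' or '.'; the actual tool may use a
# different set of gap symbols.
GAP_CHARS = "-."


def sequence_length(sequence, ignore_gaps=False):
    """Return the sequence length, optionally not counting gap characters."""
    if ignore_gaps:
        return sum(1 for ch in sequence if ch not in GAP_CHARS)
    return len(sequence)


def bin_frequencies(lengths, bin_values):
    """Count lengths per bin: previous bin value < length <= current bin value.

    Lengths at or below the first bin value fall into the first bin.
    Lengths above the last bin value are skipped (an assumption of this sketch).
    """
    if list(bin_values) != sorted(bin_values):
        raise ValueError("bin values must be in ascending order")
    counts = [0] * len(bin_values)
    for length in lengths:
        # bisect_left returns the first index whose bin value is >= length.
        index = bisect_left(bin_values, length)
        if index < len(bin_values):
            counts[index] += 1
    return counts


if __name__ == "__main__":
    sequences = ["ACDE--FG", "AC-D", "ACDEFGHIKL"]  # made-up sample data
    ungapped = [sequence_length(s, ignore_gaps=True) for s in sequences]
    print(ungapped)                              # [6, 3, 10]
    print(bin_frequencies(ungapped, [3, 6, 9]))  # [1, 1, 0]
```

With gaps counted, the same sample sequences have lengths 8, 4 and 10, so the counts for the bins 3, 6, 9 shift to 0, 1, 1.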