comparison docs/scripts/txt/InfoSequenceFiles.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
NAME
    InfoSequenceFiles.pl - List information about sequence and alignment
    files

SYNOPSIS
    InfoSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...

    InfoSequenceFiles.pl [-a, --all] [-c, --count] [-d, --detail infolevel]
    [-f, --frequency] [--FrequencyBins number | "number,number,[number,...]"]
    [-h, --help] [-i, --IgnoreGaps yes | no] [-l, --longest]
    [-s, --shortest] [--SequenceLengths] [-w, --workingdir dirname]
    SequenceFile(s)...

DESCRIPTION
    List information about the contents of *SequenceFile(s) and
    AlignmentFile(s)*: number of sequences, shortest and longest sequences,
    distribution of sequence lengths, and so on. File names are separated
    by spaces. Sequence files in the current directory can be specified
    using wildcards such as *.aln, *.msf, *.fasta, *.fta, *.pir, or the
    corresponding wildcard for any other supported format; additionally, a
    directory name, *DirName*, corresponds to all the sequence files in
    that directory with any of the supported file extensions: *.aln, .msf,
    .fasta, .fta, and .pir*.
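
    The directory expansion described above can be sketched in Python as
    follows. This is a minimal illustration, not the script's actual code.
    The helper name expand_sequence_args is hypothetical. Matching
    extensions case-insensitively and returning files in sorted order are
    assumptions.

```python
import os

# Hypothetical helper; the extension list mirrors the one documented above.
SUPPORTED_EXTENSIONS = (".aln", ".msf", ".fasta", ".fta", ".pir")


def expand_sequence_args(args):
    """Replace each directory argument with the supported sequence files it
    contains; any other argument is passed through unchanged."""
    files = []
    for arg in args:
        if os.path.isdir(arg):
            # Sorted for a stable order; case-insensitive matching is assumed.
            for name in sorted(os.listdir(arg)):
                path = os.path.join(arg, name)
                if os.path.isfile(path) and name.lower().endswith(SUPPORTED_EXTENSIONS):
                    files.append(path)
        else:
            files.append(arg)
    return files
```

    For example, expand_sequence_args(["SampleDir", "Sample1.fasta"])
    lists the supported files found in SampleDir, followed by
    Sample1.fasta.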

    Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*,
    *Pearson/FASTA*, and *NBRF/PIR*. Instead of relying on file extensions,
    the file format is detected by parsing the contents of *SequenceFile(s)
    and AlignmentFile(s)*.
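
    Content-based detection can be approximated with a few checks on the
    file text, as in the hypothetical Python sketch below. The markers it
    looks for are common conventions of these formats:

      - a leading CLUSTAL header
      - a leading "!!" line, or an "MSF:" line ending in ".."
      - a ">P1;"-style PIR entry line
      - a plain ">" FASTA header

    The checks and their order are assumptions, not the script's own logic.
    The sketch also does not try to tell GCG/MSF apart from PILEUP/MSF.

```python
import re


def guess_sequence_format(text):
    """Guess a sequence file format from its contents (hypothetical heuristic)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    first = lines[0]
    if first.upper().startswith("CLUSTAL"):
        return "ALN/ClustalW"
    # GCG and PileUp MSF: an optional "!!..." header and an "MSF:" line ending in "..".
    if first.startswith("!!") or any("MSF:" in line and line.endswith("..") for line in lines):
        return "MSF"
    # NBRF/PIR entries start with ">" plus a two-character type code and ";", e.g. ">P1;".
    if re.match(r">[A-Z][A-Z0-9];", first):
        return "NBRF/PIR"
    if first.startswith(">"):
        return "Pearson/FASTA"
    return None
```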

OPTIONS
    -a, --all
        List all the available information.

    -c, --count
        List the number of sequences. This is the default behavior.
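
        Counting sequences amounts to counting records. In the
        Pearson/FASTA format, each record starts with a ">" header line.
        A minimal Python sketch for that format follows. The function names
        are hypothetical. Taking the first whitespace-delimited token of the
        header as the sequence ID is an assumption.

```python
def read_fasta(text):
    """Return a list of (ID, sequence) tuples from Pearson/FASTA text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumed: the ID is the first whitespace-delimited token of the header.
            fields = line[1:].split()
            records.append((fields[0] if fields else "", []))
        elif records:
            # Sequence lines may be wrapped; collect them for the current record.
            records[-1][1].append(line)
    return [(seq_id, "".join(chunks)) for seq_id, chunks in records]


def count_sequences(text):
    """Number of records, i.e. what the default -c, --count reports."""
    return len(read_fasta(text))
```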

    -d, --detail *InfoLevel*
        Level of detail to print about sequences for the various options.
        Default: *1*. Possible values: *1, 2, or 3*.

    -f, --frequency
        List the distribution of sequence lengths using the number of bins
        or the bin range specified with the --FrequencyBins option.

        This option is ignored for input files containing only a single
        sequence.

    --FrequencyBins *number | "number,number,[number,...]"*
        This value is used with the -f, --frequency option to list the
        distribution of sequence lengths using the specified number of bins
        or bin range. Default value: *10*.

        The bin range list is used to group sequence lengths into different
        groups; it must contain values in ascending order. Examples:

            100,200,300,400,500,600
            200,400,600,800,1000

        The frequency value calculated for a specific bin corresponds to all
        the sequence lengths which are greater than the previous bin value
        and less than or equal to the current bin value.
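
        The bin rule above (greater than the previous bin value, less than or
        equal to the current one) maps directly onto a binary search. The
        Python sketch below is illustrative only, and its function names are
        hypothetical. It makes three assumptions about behavior this page
        does not specify:

          - Lengths at or below the first bin value fall in the first bin.
          - Lengths above the last bin value are counted separately.
          - A plain bin count such as the default 10 becomes equal-width bins
            between the shortest and longest length.

```python
from bisect import bisect_left


def equal_width_bins(lengths, num_bins):
    """Assumed handling of a plain bin count: equal-width upper bin values
    from the shortest to the longest length."""
    lo, hi = min(lengths), max(lengths)
    if lo == hi:
        return [hi]
    width = (hi - lo) / num_bins
    # Use hi itself as the last bin value to avoid floating-point round-off.
    return [lo + width * (i + 1) for i in range(num_bins - 1)] + [hi]


def length_frequencies(lengths, bins):
    """Count lengths per bin, where bin i holds lengths in (bins[i-1], bins[i]].

    Lengths at or below bins[0] go to the first bin; lengths above the last
    bin value are returned separately as an overflow count (assumed handling).
    """
    if any(prev >= cur for prev, cur in zip(bins, bins[1:])):
        raise ValueError("bin values must be in ascending order")
    counts = [0] * len(bins)
    overflow = 0
    for length in lengths:
        index = bisect_left(bins, length)  # first bin value >= length
        if index == len(bins):
            overflow += 1
        else:
            counts[index] += 1
    return counts, overflow
```

        For example, length_frequencies([150, 151, 200, 260, 350, 400],
        [150, 200, 250, 300, 350]) returns ([1, 2, 0, 1, 1], 1). The length
        200 lands in the 200 bin, not the 250 bin, because each upper bound
        is inclusive.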

    -h, --help
        Print this help message.

    -i, --IgnoreGaps *yes | no*
        Ignore gaps during calculation of sequence lengths. Possible values:
        *yes or no*. Default value: *no*.
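
        A gap-aware length calculation can be sketched as below. The set of
        gap characters ("-", "." and space) is an assumption based on common
        alignment conventions. The helper parse_yes_no is hypothetical. It
        accepts values case-insensitively because the examples in this
        document pass "Yes".

```python
# Assumed gap characters: "-" (as in ClustalW), "." (as in MSF) and space.
GAP_CHARACTERS = frozenset("-. ")


def parse_yes_no(value):
    """Hypothetical parser for --IgnoreGaps; accepts Yes/yes/No/no."""
    normalized = value.strip().lower()
    if normalized not in ("yes", "no"):
        raise ValueError(f"expected yes or no, got {value!r}")
    return normalized == "yes"


def sequence_length(sequence, ignore_gaps=False):
    """Length of a sequence, optionally skipping gap characters."""
    if ignore_gaps:
        return sum(1 for residue in sequence if residue not in GAP_CHARACTERS)
    return len(sequence)
```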

    -l, --longest
        List information about the longest sequence: ID, sequence, and
        sequence length. This option is ignored for input files containing
        only a single sequence.

    -s, --shortest
        List information about the shortest sequence: ID, sequence, and
        sequence length. This option is ignored for input files containing
        only a single sequence.
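
        Selecting the longest and shortest sequences takes a single pass
        over (ID, sequence) pairs. This hypothetical Python sketch makes
        three assumptions:

          - Gap characters are "-", "." and space.
          - Ties go to the first sequence encountered. The script itself may
            resolve ties differently.
          - Inputs with fewer than two sequences return None, mirroring the
            rule above.

```python
# Assumed gap characters, used only when ignore_gaps is true.
GAP_CHARACTERS = frozenset("-. ")


def measured_length(sequence, ignore_gaps):
    """Sequence length, optionally skipping the assumed gap characters."""
    if ignore_gaps:
        return sum(1 for residue in sequence if residue not in GAP_CHARACTERS)
    return len(sequence)


def longest_and_shortest(records, ignore_gaps=False):
    """Return ((ID, sequence, length) of the longest, same for the shortest),
    or None when there are fewer than two sequences."""
    if len(records) < 2:
        return None  # mirrors: ignored for files with a single sequence
    measured = [(seq_id, seq, measured_length(seq, ignore_gaps)) for seq_id, seq in records]
    # max() and min() keep the first item among equal lengths.
    longest = max(measured, key=lambda item: item[2])
    shortest = min(measured, key=lambda item: item[2])
    return longest, shortest
```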

    --SequenceLengths
        List information about sequence lengths.

    -w, --WorkingDir *dirname*
        Location of the working directory. Default: current directory.

EXAMPLES
    To count the number of sequences in sequence files, type:

        % InfoSequenceFiles.pl Sample1.fasta
        % InfoSequenceFiles.pl Sample1.msf Sample1.aln Sample1.pir
        % InfoSequenceFiles.pl *.fasta *.fta *.msf *.pir *.aln

    To list all available information with the maximum level of detail for
    the sequence alignment file Sample1.msf, type:

        % InfoSequenceFiles.pl -a -d 3 Sample1.msf

    To list sequence length information after ignoring sequence gaps in the
    Sample1.aln file, type:

        % InfoSequenceFiles.pl --SequenceLengths --IgnoreGaps Yes Sample1.aln

    To list shortest and longest sequence length information after ignoring
    sequence gaps in the Sample1.aln file, type:

        % InfoSequenceFiles.pl --longest --shortest --IgnoreGaps Yes Sample1.aln

    To list the distribution of sequence lengths after ignoring sequence gaps
    in the Sample1.aln file and report the frequency distribution in 10
    bins, type:

        % InfoSequenceFiles.pl --frequency --FrequencyBins 10 \
          --IgnoreGaps Yes Sample1.aln

    To list the distribution of sequence lengths after ignoring sequence gaps
    in the Sample1.aln file and report the frequency distribution using a
    specified bin range, type:

        % InfoSequenceFiles.pl --frequency --FrequencyBins \
          "150,200,250,300,350" --IgnoreGaps Yes Sample1.aln

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    AnalyzeSequenceFilesData.pl, ExtractFromSequenceFiles.pl,
    InfoAminoAcids.pl, InfoNucleicAcids.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.