annotate docs/scripts/txt/InfoSequenceFiles.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
NAME
    InfoSequenceFiles.pl - List information about sequence and alignment
    files

SYNOPSIS
    InfoSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...

    InfoSequenceFiles.pl [-a, --all] [-c, --count] [-d, --detail infolevel]
    [-f, --frequency] [--FrequencyBins number | "number, number,
    [number,...]"] [-h, --help] [-i, --IgnoreGaps yes | no] [-l, --longest]
    [-s, --shortest] [--SequenceLengths] [-w, --workingdir dirname]
    SequenceFile(s)...

DESCRIPTION
    List information about contents of *SequenceFile(s) and
    AlignmentFile(s)*: number of sequences, shortest and longest sequences,
    distribution of sequence lengths and so on. The file names are separated
    by spaces. All the sequence files in the current directory can be
    specified by **.aln*, **.msf*, **.fasta*, **.fta*, **.pir* or any other
    supported format; additionally, *DirName* corresponds to all the
    sequence files in the current directory with any of the supported file
    extensions: *.aln, .msf, .fasta, .fta, and .pir*.

    Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*,
    *Pearson/FASTA*, and *NBRF/PIR*. Instead of using file extensions, file
    formats are detected by parsing the contents of *SequenceFile(s) and
    AlignmentFile(s)*.

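    As an illustration of content-based detection, the Python sketch below
    guesses a format from the leading lines of a file. It is not the logic
    used by InfoSequenceFiles.pl: the function name guess_sequence_format
    and the signatures it checks (a leading CLUSTAL line, an "MSF:" header
    token, a ">P1;"-style PIR entry code, a leading ">" for FASTA) are
    assumptions chosen for this example.

```python
import re


def guess_sequence_format(text):
    """Guess a sequence file format from its contents.

    Illustrative heuristics only; not the detection code of the script.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    first = lines[0]
    # ClustalW alignments conventionally begin with a "CLUSTAL ..." banner.
    if first.upper().startswith("CLUSTAL"):
        return "ALN/ClustalW"
    # GCG/PileUp MSF headers carry a line such as "MSF: 120 Type: P Check: 1 .."
    if any(re.search(r"\bMSF:\s*\d+", line) for line in lines[:20]):
        return "MSF"
    # NBRF/PIR entries start with ">", a two-character type code, then ";".
    if re.match(r"^>[A-Z][A-Z0-9];", first):
        return "NBRF/PIR"
    # Any other leading ">" is treated as a Pearson/FASTA header.
    if first.startswith(">"):
        return "Pearson/FASTA"
    return "unknown"


print(guess_sequence_format(">seq1 example\nACGTACGT\n"))
```

    The PIR check runs before the FASTA check because both kinds of entry
    begin with ">"; a real parser would need to look further into the file.
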
OPTIONS
    -a, --all
        List all the available information.

    -c, --count
        List the number of sequences. This is the default behavior.

    -d, --detail *InfoLevel*
        Level of information to print about sequences during various
        options. Default: *1*. Possible values: *1, 2 or 3*.

    -f, --frequency
        List distribution of sequence lengths using the specified number of
        bins or the bin range specified using the --FrequencyBins option.

        This option is ignored for input files containing only a single
        sequence.

    --FrequencyBins *number | "number,number,[number,...]"*
        This value is used with the -f, --frequency option to list the
        distribution of sequence lengths using the specified number of bins
        or bin range. Default value: *10*.

        The bin range list is used to group sequence lengths into different
        groups; it must contain values in ascending order. Examples:

            100,200,300,400,500,600
            200,400,600,800,1000

        The frequency value calculated for a specific bin corresponds to all
        the sequence lengths which are greater than the previous bin value
        and less than or equal to the current bin value.

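        The bin rule above (previous bin value < length <= current bin
        value) can be sketched in Python as follows. This is a minimal
        illustration, not the script's own code: the function name
        bin_length_frequencies is hypothetical, and counting lengths above
        the last bin value in a separate tally is an assumption, since the
        text does not say how such lengths are reported.

```python
from bisect import bisect_left


def bin_length_frequencies(lengths, bin_values):
    """Count lengths per bin: previous bin value < length <= current bin value."""
    if list(bin_values) != sorted(bin_values):
        raise ValueError("bin values must be in ascending order")
    counts = [0] * len(bin_values)
    above_last_bin = 0  # assumption: lengths beyond the last bin are tallied separately
    for length in lengths:
        # bisect_left returns the index of the first bin value >= length,
        # which is exactly the bin whose upper edge includes this length.
        index = bisect_left(bin_values, length)
        if index < len(bin_values):
            counts[index] += 1
        else:
            above_last_bin += 1
    return counts, above_last_bin


counts, overflow = bin_length_frequencies(
    [90, 100, 101, 250, 600, 750], [100, 200, 300, 400, 500, 600]
)
print(counts, overflow)  # [2, 1, 1, 0, 0, 1] 1
```

        A length of exactly 100 falls in the first bin and 101 in the
        second, which is the "less than or equal to" edge described above.
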
    -h, --help
        Print this help message.

    -i, --IgnoreGaps *yes | no*
        Ignore gaps during calculation of sequence lengths. Possible values:
        *yes or no*. Default value: *no*.

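        To show what ignoring gaps means for a length calculation, here is
        a small Python sketch. The helper name sequence_length and the set
        of gap characters ("-", "." and "~") are assumptions made for this
        example; the text does not list which characters the script treats
        as gaps.

```python
GAP_CHARACTERS = "-.~"  # assumption: common alignment gap symbols


def sequence_length(sequence, ignore_gaps=False):
    """Return the sequence length, optionally skipping gap characters."""
    if ignore_gaps:
        return sum(1 for ch in sequence if ch not in GAP_CHARACTERS)
    return len(sequence)


aligned = "MK--LV.TA"
print(sequence_length(aligned))                    # 9
print(sequence_length(aligned, ignore_gaps=True))  # 6
```
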
    -l, --longest
        List information about the longest sequence: ID, sequence and
        sequence length. This option is ignored for input files containing
        only a single sequence.

    -s, --shortest
        List information about the shortest sequence: ID, sequence and
        sequence length. This option is ignored for input files containing
        only a single sequence.

    --SequenceLengths
        List information about sequence lengths.

    -w, --WorkingDir *dirname*
        Location of working directory. Default: current directory.

EXAMPLES
    To count the number of sequences in sequence files, type:

        % InfoSequenceFiles.pl Sample1.fasta
        % InfoSequenceFiles.pl Sample1.msf Sample1.aln Sample1.pir
        % InfoSequenceFiles.pl *.fasta *.fta *.msf *.pir *.aln

    To list all available information at the maximum level of detail for a
    sequence alignment file Sample1.msf, type:

        % InfoSequenceFiles.pl -a -d 3 Sample1.msf

    To list sequence length information after ignoring sequence gaps in
    Sample1.aln file, type:

        % InfoSequenceFiles.pl --SequenceLengths --IgnoreGaps Yes
          Sample1.aln

    To list shortest and longest sequence length information after ignoring
    sequence gaps in Sample1.aln file, type:

        % InfoSequenceFiles.pl --longest --shortest --IgnoreGaps Yes
          Sample1.aln

    To list distribution of sequence lengths after ignoring sequence gaps in
    Sample1.aln file and report the frequency distribution into 10 bins,
    type:

        % InfoSequenceFiles.pl --frequency --FrequencyBins 10
          --IgnoreGaps Yes Sample1.aln

    To list distribution of sequence lengths after ignoring sequence gaps in
    Sample1.aln file and report the frequency distribution into specified
    bin range, type:

        % InfoSequenceFiles.pl --frequency --FrequencyBins
          "150,200,250,300,350" --IgnoreGaps Yes Sample1.aln

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    AnalyzeSequenceFilesData.pl, ExtractFromSequenceFiles.pl,
    InfoAminoAcids.pl, InfoNucleicAcids.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.