annotate docs/scripts/txt/InfoSequenceFiles.txt @ 3:90ea638ce878 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:11:59 -0500
parents 2abf0d43254d
children
NAME
    InfoSequenceFiles.pl - List information about sequence and alignment
    files

SYNOPSIS
    InfoSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...

    InfoSequenceFiles.pl [-a, --all] [-c, --count] [-d, --detail InfoLevel]
    [-f, --frequency] [--FrequencyBins number | "number,number,[number,...]"]
    [-h, --help] [-i, --IgnoreGaps yes | no] [-l, --longest]
    [-s, --shortest] [--SequenceLengths] [-w, --WorkingDir dirname]
    SequenceFile(s)...

DESCRIPTION
    List information about contents of *SequenceFile(s) and
    AlignmentFile(s)*: number of sequences, shortest and longest sequences,
    distribution of sequence lengths and so on. The file names are separated
    by spaces. All the sequence files in the current directory can be
    specified by **.aln*, **.msf*, **.fasta*, **.fta*, **.pir* or any other
    supported file extension; additionally, *DirName* corresponds to all the
    sequence files in the current directory with any of the supported file
    extensions: *.aln, .msf, .fasta, .fta, and .pir*.

    Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*,
    *Pearson/FASTA*, and *NBRF/PIR*. Instead of using file extensions, file
    formats are detected by parsing the contents of *SequenceFile(s) and
    AlignmentFile(s)*.

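    As an illustration of content-based detection, the following Python
    sketch guesses a format from well-known signatures of these file
    formats. It is not the detection logic used by InfoSequenceFiles.pl:
    the function name detect_format and the heuristics are assumptions made
    for this example, and both MSF variants are reported under one label.

```python
import re


def detect_format(text):
    """Guess a sequence file format from its contents (illustrative heuristics)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    first = lines[0]
    # ClustalW alignments begin with a line starting with the word CLUSTAL.
    if first.upper().startswith("CLUSTAL"):
        return "ALN/ClustalW"
    # NBRF/PIR entries begin with '>', a two-character type code and ';'.
    if re.match(r">[A-Z][A-Z0-9];", first):
        return "NBRF/PIR"
    # Any other leading '>' is treated as a Pearson/FASTA header.
    if first.startswith(">"):
        return "Pearson/FASTA"
    # MSF files carry a header line containing 'MSF:' and a '//' divider line.
    if any("MSF:" in line for line in lines) and "//" in lines:
        return "GCG/MSF"
    return None
```

    Checking the PIR signature before the generic '>' test matters, because
    a PIR header would otherwise be mistaken for a FASTA header.
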
OPTIONS
    -a, --all
        List all the available information.

    -c, --count
        List number of sequences. This is the default behavior.

    -d, --detail *InfoLevel*
        Level of information to print about sequences during various
        options. Default: *1*. Possible values: *1, 2 or 3*.

    -f, --frequency
        List distribution of sequence lengths using the specified number of
        bins or bin range specified using the --FrequencyBins option.

        This option is ignored for input files containing only a single
        sequence.

    --FrequencyBins *number | "number,number,[number,...]"*
        This value is used with the -f, --frequency option to list the
        distribution of sequence lengths using the specified number of bins
        or bin range. Default value: *10*.

        The bin range list is used to group sequence lengths into different
        groups; it must contain values in ascending order. Examples:

            100,200,300,400,500,600
            200,400,600,800,1000

        The frequency value calculated for a specific bin corresponds to all
        the sequence lengths which are greater than the previous bin value
        and less than or equal to the current bin value.

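        The binning rule above can be sketched in Python as follows. This
        is a minimal illustration, not the script's own code: the function
        name bin_frequencies is hypothetical, and tallying lengths above
        the last bin value separately is an assumption, since the handling
        of such lengths is not described here.

```python
from bisect import bisect_left


def bin_frequencies(lengths, bins):
    """Count lengths per bin, where each bin covers (previous bin, bin]."""
    if list(bins) != sorted(bins):
        raise ValueError("bin values must be in ascending order")
    counts = [0] * len(bins)
    overflow = 0  # assumption: lengths above the last bin are tallied here
    for length in lengths:
        # bisect_left gives the first index i with bins[i] >= length,
        # which is the bin whose upper bound is the smallest value >= length.
        i = bisect_left(bins, length)
        if i < len(bins):
            counts[i] += 1
        else:
            overflow += 1
    return counts, overflow


counts, overflow = bin_frequencies(
    [120, 150, 151, 200, 349, 350, 400], [150, 200, 250, 300, 350]
)
```

        With the bin range 150,200,250,300,350, a length of 150 is counted
        in the first bin and a length of 151 in the second.
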
    -h, --help
        Print this help message.

    -i, --IgnoreGaps *yes | no*
        Ignore gaps during calculation of sequence lengths. Possible values:
        *yes or no*. Default value: *no*.

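        A minimal Python sketch of the effect of this option on a length
        calculation. The set of gap characters ('-' and '.') and the
        function name sequence_length are assumptions for illustration;
        the characters actually treated as gaps by InfoSequenceFiles.pl
        are not listed here.

```python
GAP_CHARS = "-."  # assumption: characters treated as alignment gaps


def sequence_length(sequence, ignore_gaps=False):
    """Return the sequence length, optionally skipping gap characters."""
    if ignore_gaps:
        return sum(1 for ch in sequence if ch not in GAP_CHARS)
    return len(sequence)


aligned = "AC--GT.A"
with_gaps = sequence_length(aligned)
without_gaps = sequence_length(aligned, ignore_gaps=True)
```

        Here the aligned sequence has 8 characters, 3 of which are gap
        characters under the stated assumption.
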
    -l, --longest
        List information about the longest sequence: ID, sequence and
        sequence length. This option is ignored for input files containing
        only a single sequence.

    -s, --shortest
        List information about the shortest sequence: ID, sequence and
        sequence length. This option is ignored for input files containing
        only a single sequence.

    --SequenceLengths
        List information about sequence lengths.

    -w, --WorkingDir *dirname*
        Location of working directory. Default: current directory.

EXAMPLES
    To count the number of sequences in sequence files, type:

        % InfoSequenceFiles.pl Sample1.fasta
        % InfoSequenceFiles.pl Sample1.msf Sample1.aln Sample1.pir
        % InfoSequenceFiles.pl *.fasta *.fta *.msf *.pir *.aln

    To list all available information with the maximum level of available
    detail for a sequence alignment file Sample1.msf, type:

        % InfoSequenceFiles.pl -a -d 3 Sample1.msf

    To list sequence length information after ignoring sequence gaps in
    the Sample1.aln file, type:

        % InfoSequenceFiles.pl --SequenceLengths --IgnoreGaps Yes
          Sample1.aln

    To list shortest and longest sequence length information after ignoring
    sequence gaps in the Sample1.aln file, type:

        % InfoSequenceFiles.pl --longest --shortest --IgnoreGaps Yes
          Sample1.aln

    To list the distribution of sequence lengths after ignoring sequence
    gaps in the Sample1.aln file and report the frequency distribution in
    10 bins, type:

        % InfoSequenceFiles.pl --frequency --FrequencyBins 10
          --IgnoreGaps Yes Sample1.aln

    To list the distribution of sequence lengths after ignoring sequence
    gaps in the Sample1.aln file and report the frequency distribution
    using a specified bin range, type:

        % InfoSequenceFiles.pl --frequency --FrequencyBins
          "150,200,250,300,350" --IgnoreGaps Yes Sample1.aln

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    AnalyzeSequenceFilesData.pl, ExtractFromSequenceFiles.pl,
    InfoAminoAcids.pl, InfoNucleicAcids.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.