annotate mayachemtool/mayachemtools/docs/scripts/txt/InfoSequenceFiles.txt @ 0:68300206e90d draft default tip

Uploaded
author deepakjadmin
date Thu, 05 Nov 2015 02:41:30 -0500
NAME
    InfoSequenceFiles.pl - List information about sequence and alignment
    files

SYNOPSIS
    InfoSequenceFiles.pl SequenceFile(s) AlignmentFile(s)...

    InfoSequenceFiles.pl [-a, --all] [-c, --count] [-d, --detail InfoLevel]
    [-f, --frequency] [--FrequencyBins number | "number,number,[number,...]"]
    [-h, --help] [-i, --IgnoreGaps yes | no] [-l, --longest]
    [-s, --shortest] [--SequenceLengths] [-w, --WorkingDir dirname]
    SequenceFile(s)...

DESCRIPTION
    List information about the contents of *SequenceFile(s) and
    AlignmentFile(s)*: number of sequences, shortest and longest sequences,
    distribution of sequence lengths, and so on. The file names are
    separated by spaces. All the sequence files in the current directory
    can be specified by *.aln, *.msf, *.fasta, *.fta, *.pir, or any other
    supported format; additionally, *DirName* corresponds to all the
    sequence files in the current directory with any of the supported file
    extensions: .aln, .msf, .fasta, .fta, and .pir.

    Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*,
    *Pearson/FASTA*, and *NBRF/PIR*. File formats are detected by parsing
    the contents of *SequenceFile(s) and AlignmentFile(s)* rather than by
    relying on file extensions.

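    As an illustration of content-based detection, the following Python
    sketch guesses a format from header conventions commonly seen in each
    file type. It is not the logic used by InfoSequenceFiles.pl; the
    function name guess_sequence_format and the specific markers checked
    are assumptions made for this example.

```python
def guess_sequence_format(text):
    """Guess a sequence file format from its contents (hypothetical helper)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    first = lines[0]
    # ClustalW alignments conventionally begin with a "CLUSTAL ..." header.
    if first.upper().startswith("CLUSTAL"):
        return "ALN/ClustalW"
    # GCG and PILEUP MSF files carry a header line with "MSF:" ending in "..".
    if any("MSF:" in line and line.endswith("..") for line in lines):
        return "MSF"
    # NBRF/PIR records start with ">", a two-character type code, then ";".
    if first.startswith(">") and len(first) > 3 and first[3] == ";":
        return "NBRF/PIR"
    # Pearson/FASTA records start with ">" followed by an identifier.
    if first.startswith(">"):
        return "Pearson/FASTA"
    return "unknown"


print(guess_sequence_format(">seq1 sample protein\nMKTAYIAKQR\n"))
```

    Checking the PIR pattern before the generic ">" test matters in this
    sketch, because both formats begin records with the same character.
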
OPTIONS
    -a, --all
        List all the available information.

    -c, --count
        List number of sequences. This is the default behavior.

    -d, --detail *InfoLevel*
        Level of information to print about sequences for the selected
        options. Default: *1*. Possible values: *1, 2 or 3*.

    -f, --frequency
        List distribution of sequence lengths using the number of bins or
        the bin range specified by the --FrequencyBins option.

        This option is ignored for input files containing only a single
        sequence.

    --FrequencyBins *number | "number,number,[number,...]"*
        This value is used with the -f, --frequency option to list the
        distribution of sequence lengths using the specified number of bins
        or bin range. Default value: *10*.

        The bin range list is used to group sequence lengths into different
        groups; it must contain values in ascending order. Examples:

            100,200,300,400,500,600
            200,400,600,800,1000

        The frequency value calculated for a specific bin corresponds to all
        the sequence lengths which are greater than the previous bin value
        and less than or equal to the current bin value.

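        The bin rule above (previous bin value < length <= current bin
        value) can be sketched in Python as follows. This is only an
        illustration, not the script's own code: the function name
        bin_sequence_lengths is hypothetical, and counting lengths above
        the last bin value separately is an assumption, since the text
        does not say how such lengths are reported.

```python
from bisect import bisect_left


def bin_sequence_lengths(lengths, bin_values):
    """Count lengths per bin: previous bin value < length <= current bin value."""
    bin_values = list(bin_values)
    if any(a >= b for a, b in zip(bin_values, bin_values[1:])):
        raise ValueError("bin values must be in ascending order")
    counts = {value: 0 for value in bin_values}
    overflow = 0  # Assumption: lengths above the last bin are tallied separately.
    for length in lengths:
        # Index of the first bin value that is >= length.
        index = bisect_left(bin_values, length)
        if index == len(bin_values):
            overflow += 1
        else:
            counts[bin_values[index]] += 1
    return counts, overflow


counts, overflow = bin_sequence_lengths(
    [90, 100, 101, 200, 450, 700], [100, 200, 300, 400, 500, 600]
)
print(counts, overflow)
```

        Here a length of exactly 200 falls into the 200 bin and a length
        of 101 does too, while 100 stays in the 100 bin.
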
    -h, --help
        Print this help message.

    -i, --IgnoreGaps *yes | no*
        Ignore gaps during calculation of sequence lengths. Possible values:
        *yes or no*. Default value: *no*.

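        A minimal Python sketch of what ignoring gaps means for a length
        calculation is shown below. The function name sequence_length and
        the gap characters "-" and "." are assumptions for illustration;
        the set of gap symbols recognized by InfoSequenceFiles.pl is not
        stated here and may differ.

```python
GAP_CHARS = "-."  # Assumption: symbols treated as alignment gaps.


def sequence_length(sequence, ignore_gaps=False):
    """Return the sequence length, optionally skipping gap characters."""
    if ignore_gaps:
        return sum(1 for residue in sequence if residue not in GAP_CHARS)
    return len(sequence)


aligned = "AC--GT.A"
print(sequence_length(aligned), sequence_length(aligned, ignore_gaps=True))
```
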
    -l, --longest
        List information about the longest sequence: ID, sequence and
        sequence length. This option is ignored for input files containing
        only a single sequence.

    -s, --shortest
        List information about the shortest sequence: ID, sequence and
        sequence length. This option is ignored for input files containing
        only a single sequence.

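        The -l and -s options can be pictured with the following Python
        sketch, which picks the shortest and longest entries from a
        mapping of sequence IDs to sequences. The function name
        shortest_and_longest is hypothetical, and returning the first
        match when several sequences share a length is an assumption; the
        script's handling of ties is not described here.

```python
def shortest_and_longest(sequences):
    """Return ((id, length), (id, length)) for the shortest and longest sequences.

    Returns None for fewer than two sequences, mirroring the note that these
    options are ignored for files containing only a single sequence.
    """
    if len(sequences) < 2:
        return None
    lengths = {seq_id: len(seq) for seq_id, seq in sequences.items()}
    shortest = min(lengths, key=lengths.get)  # First ID wins on ties.
    longest = max(lengths, key=lengths.get)
    return (shortest, lengths[shortest]), (longest, lengths[longest])


print(shortest_and_longest({"s1": "ACGT", "s2": "AC", "s3": "ACGTAC"}))
```
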
    --SequenceLengths
        List information about sequence lengths.

    -w, --WorkingDir *dirname*
        Location of working directory. Default: current directory.

EXAMPLES
    To count the number of sequences in sequence files, type:

        % InfoSequenceFiles.pl Sample1.fasta
        % InfoSequenceFiles.pl Sample1.msf Sample1.aln Sample1.pir
        % InfoSequenceFiles.pl *.fasta *.fta *.msf *.pir *.aln

    To list all available information at the maximum level of detail for a
    sequence alignment file Sample1.msf, type:

        % InfoSequenceFiles.pl -a -d 3 Sample1.msf

    To list sequence length information after ignoring sequence gaps in
    Sample1.aln file, type:

        % InfoSequenceFiles.pl --SequenceLengths --IgnoreGaps Yes
          Sample1.aln

    To list shortest and longest sequence length information after ignoring
    sequence gaps in Sample1.aln file, type:

        % InfoSequenceFiles.pl --longest --shortest --IgnoreGaps Yes
          Sample1.aln

    To list distribution of sequence lengths after ignoring sequence gaps in
    Sample1.aln file and report the frequency distribution into 10 bins,
    type:

        % InfoSequenceFiles.pl --frequency --FrequencyBins 10
          --IgnoreGaps Yes Sample1.aln

    To list distribution of sequence lengths after ignoring sequence gaps in
    Sample1.aln file and report the frequency distribution into a specified
    bin range, type:

        % InfoSequenceFiles.pl --frequency --FrequencyBins
          "150,200,250,300,350" --IgnoreGaps Yes Sample1.aln

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    AnalyzeSequenceFilesData.pl, ExtractFromSequenceFiles.pl,
    InfoAminoAcids.pl, InfoNucleicAcids.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.