comparison docs/scripts/txt/InfoSequenceFiles.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
author | deepakjadmin |
---|---|
date | Wed, 20 Jan 2016 09:23:18 -0500 |
-1:000000000000 | 0:4816e4a8ae95 |
---|---|
1 NAME | |
2 InfoSequenceFiles.pl - List information about sequence and alignment | |
3 files | |
4 | |
5 SYNOPSIS | |
6 InfoSequenceFiles.pl SequenceFile(s) AlignmentFile(s)... | |
7 | |
8 InfoSequenceFiles.pl [-a, --all] [-c, --count] [-d, --detail infolevel] | |
9 [-f, --frequency] [--FrequencyBins number | "number, number, | |
10 [number,...]"] [-h, --help] [-i, --IgnoreGaps yes | no] [-l, --longest] | |
11 [-s, --shortest] [--SequenceLengths] [-w, --workingdir dirname] | |
12 SequenceFile(s)... | |
13 | |
14 DESCRIPTION | |
15 List information about contents of *SequenceFile(s) and | |
16 AlignmentFile(s)*: number of sequences, shortest and longest sequences, | |
17 distribution of sequence lengths and so on. The file names are separated | |
18 by spaces. All the sequence files in the current directory can be | |
19 specified by **.aln*, **.msf*, **.fasta*, **.fta*, **.pir* or any other | |
20 supported format; additionally, *DirName* corresponds to all the | |
21 sequence files in the current directory with any of the supported file | |
22 extensions: *.aln, .msf, .fasta, .fta, and .pir*. | |
23 | |
24 Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*, | |
25 *Pearson/FASTA*, and *NBRF/PIR*. Instead of using file extensions, file | |
26 formats are detected by parsing the contents of *SequenceFile(s) and | |
27 AlignmentFile(s)*. | |
28 | |
29 OPTIONS | |
30 -a, --all | |
31 List all the available information. | |
32 | |
33 -c, --count | |
34 List number of sequences. This is the default behavior. | |
35 | |
36 -d, --detail *InfoLevel* | |
37 Level of information to print about sequences during various | |
38 options. Default: *1*. Possible values: *1, 2 or 3*. | |
39 | |
40 -f, --frequency | |
41 List distribution of sequence lengths using the number of bins or | |
42 the bin range specified by the --FrequencyBins option. | |
43 | |
44 This option is ignored for input files containing only a single | |
45 sequence. | |
46 | |
47 --FrequencyBins *number | "number,number,[number,...]"* | |
48 This value is used with -f, --frequency option to list distribution | |
49 of sequence lengths using the specified number of bins or bin range. | |
50 Default value: *10*. | |
51 | |
52 The bin range list is used to group sequence lengths into different | |
53 groups; it must contain values in ascending order. Examples: | |
54 | |
55 100,200,300,400,500,600 | |
56 200,400,600,800,1000 | |
57 | |
58 The frequency value calculated for a specific bin corresponds to all | |
59 the sequence lengths which are greater than the previous bin value | |
60 and less than or equal to the current bin value. | |
61 | |
62 -h, --help | |
63 Print this help message. | |
64 | |
65 -i, --IgnoreGaps *yes | no* | |
66 Ignore gaps during calculation of sequence lengths. Possible values: | |
67 *yes or no*. Default value: *no*. | |
68 | |
69 -l, --longest | |
70 List information about the longest sequence: ID, sequence and sequence | |
71 length. This option is ignored for input files containing only a | |
72 single sequence. | |
73 | |
74 -s, --shortest | |
75 List information about the shortest sequence: ID, sequence and sequence | |
76 length. This option is ignored for input files containing only a | |
77 single sequence. | |
78 | |
79 --SequenceLengths | |
80 List information about sequence lengths. | |
81 | |
82 -w, --WorkingDir *dirname* | |
83 Location of working directory. Default: current directory. | |
84 | |
85 EXAMPLES | |
86 To count the number of sequences in sequence files, type: | |
87 | |
88 % InfoSequenceFiles.pl Sample1.fasta | |
89 % InfoSequenceFiles.pl Sample1.msf Sample1.aln Sample1.pir | |
90 % InfoSequenceFiles.pl *.fasta *.fta *.msf *.pir *.aln | |
91 | |
92 To list all available information with the maximum level of detail | |
93 for a sequence alignment file Sample1.msf, type: | |
94 | |
95 % InfoSequenceFiles.pl -a -d 3 Sample1.msf | |
96 | |
97 To list sequence length information after ignoring sequence gaps in | |
98 Sample1.aln file, type: | |
99 | |
100 % InfoSequenceFiles.pl --SequenceLengths --IgnoreGaps Yes | |
101 Sample1.aln | |
102 | |
103 To list shortest and longest sequence length information after ignoring | |
104 sequence gaps in Sample1.aln file, type: | |
105 | |
106 % InfoSequenceFiles.pl --longest --shortest --IgnoreGaps Yes | |
107 Sample1.aln | |
108 | |
109 To list distribution of sequence lengths after ignoring sequence gaps in | |
110 Sample1.aln file and report the frequency distribution into 10 bins, | |
111 type: | |
112 | |
113 % InfoSequenceFiles.pl --frequency --FrequencyBins 10 | |
114 --IgnoreGaps Yes Sample1.aln | |
115 | |
116 To list distribution of sequence lengths after ignoring sequence gaps in | |
117 Sample1.aln file and report the frequency distribution into specified | |
118 bin range, type: | |
119 | |
120 % InfoSequenceFiles.pl --frequency --FrequencyBins | |
121 "150,200,250,300,350" --IgnoreGaps Yes Sample1.aln | |
122 | |
123 AUTHOR | |
124 Manish Sud <msud@san.rr.com> | |
125 | |
126 SEE ALSO | |
127 AnalyzeSequenceFilesData.pl, ExtractFromSequenceFiles.pl, | |
128 InfoAminoAcids.pl, InfoNucleicAcids.pl | |
129 | |
130 COPYRIGHT | |
131 Copyright (C) 2015 Manish Sud. All rights reserved. | |
132 | |
133 This file is part of MayaChemTools. | |
134 | |
135 MayaChemTools is free software; you can redistribute it and/or modify it | |
136 under the terms of the GNU Lesser General Public License as published by | |
137 the Free Software Foundation; either version 3 of the License, or (at | |
138 your option) any later version. | |
139 |
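
The bin rule described under --FrequencyBins (a length is counted in a bin when it is greater than the previous bin value and less than or equal to the current one) and the effect of --IgnoreGaps can be illustrated with a short Python sketch. This is not the script's own Perl code. The function names, the gap characters `-` and `.`, and the separate overflow count for lengths above the last bin value are all assumptions made for illustration; the text above does not specify them.

```python
from bisect import bisect_left

# Assumption: the documentation does not say which characters count as gaps.
GAP_CHARS = "-."


def sequence_length(sequence, ignore_gaps=False):
    """Return the sequence length, optionally skipping gap characters."""
    if ignore_gaps:
        return sum(1 for ch in sequence if ch not in GAP_CHARS)
    return len(sequence)


def bin_lengths(lengths, bin_values):
    """Count lengths per bin: previous bin value < length <= current bin value.

    The first bin has no previous value, so it takes every length up to and
    including bin_values[0]. Lengths above the last bin value are returned
    as a separate overflow count (an assumption of this sketch).
    """
    if list(bin_values) != sorted(bin_values):
        raise ValueError("bin values must be in ascending order")
    counts = [0] * len(bin_values)
    overflow = 0
    for length in lengths:
        # bisect_left finds the first bin value that is >= length.
        index = bisect_left(bin_values, length)
        if index == len(bin_values):
            overflow += 1
        else:
            counts[index] += 1
    return counts, overflow


if __name__ == "__main__":
    sequences = ["ACDE--FG", "AC-DEFGHIK", "ACD"]  # made-up example data
    bins = [3, 6, 9]
    for ignore in (False, True):
        lengths = [sequence_length(s, ignore_gaps=ignore) for s in sequences]
        print(ignore, lengths, bin_lengths(lengths, bins))
```

With the made-up sequences above, ignoring gaps shortens the first two sequences, so they land in lower bins than they do when gaps are counted.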