comparison mayachemtools/docs/scripts/html/AnalyzeSequenceFilesData.html @ 0:73ae111cf86f draft

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 11:55:01 -0500
parents
children
-1:000000000000 0:73ae111cf86f
1 <html>
2 <head>
3 <title>MayaChemTools:Documentation:AnalyzeSequenceFilesData.pl</title>
4 <meta http-equiv="content-type" content="text/html;charset=utf-8">
5 <link rel="stylesheet" type="text/css" href="../../css/MayaChemTools.css">
6 </head>
7 <body leftmargin="20" rightmargin="20" topmargin="10" bottommargin="10">
8 <br/>
9 <center>
10 <a href="http://www.mayachemtools.org" title="MayaChemTools Home"><img src="../../images/MayaChemToolsLogo.gif" border="0" alt="MayaChemTools"></a>
11 </center>
12 <br/>
13 <div class="DocNav">
14 <table width="100%" border=0 cellpadding=0 cellspacing=2>
15 <tr align="left" valign="top"><td width="33%" align="left"><a href="./AnalyzeSDFilesData.html" title="AnalyzeSDFilesData.html">Previous</a>&nbsp;&nbsp;<a href="./index.html" title="Table of Contents">TOC</a>&nbsp;&nbsp;<a href="./AnalyzeTextFilesData.html" title="AnalyzeTextFilesData.html">Next</a></td><td width="34%" align="middle"><strong>AnalyzeSequenceFilesData.pl</strong></td><td width="33%" align="right"><a href="././code/AnalyzeSequenceFilesData.html" title="View source code">Code</a>&nbsp;|&nbsp;<a href="./../pdf/AnalyzeSequenceFilesData.pdf" title="PDF US Letter Size">PDF</a>&nbsp;|&nbsp;<a href="./../pdfgreen/AnalyzeSequenceFilesData.pdf" title="PDF US Letter Size with narrow margins: www.changethemargins.com">PDFGreen</a>&nbsp;|&nbsp;<a href="./../pdfa4/AnalyzeSequenceFilesData.pdf" title="PDF A4 Size">PDFA4</a>&nbsp;|&nbsp;<a href="./../pdfa4green/AnalyzeSequenceFilesData.pdf" title="PDF A4 Size with narrow margins: www.changethemargins.com">PDFA4Green</a></td></tr>
16 </table>
17 </div>
18 <p>
19 </p>
20 <h2>NAME</h2>
21 <p>AnalyzeSequenceFilesData.pl - Analyze sequence and alignment files</p>
22 <p>
23 </p>
24 <h2>SYNOPSIS</h2>
25 <p>AnalyzeSequenceFilesData.pl SequenceFile(s) AlignmentFile(s)...</p>
26 <p>AnalyzeSequenceFilesData.pl [<strong>-h, --help</strong>] [<strong>-i, --IgnoreGaps</strong> yes | no]
27 [<strong>-m, --mode</strong> PercentIdentityMatrix | ResidueFrequencyAnalysis | All]
28 [<strong>--outdelim</strong> comma | tab | semicolon] [<strong>-o, --overwrite</strong>] [<strong>-p, --precision</strong> number] [<strong>-q, --quote</strong> yes | no]
29 [<strong>--ReferenceSequence</strong> SequenceID | UseFirstSequenceID]
30 [<strong>--region</strong> &quot;StartResNum, EndResNum, [StartResNum, EndResNum...]&quot; | UseCompleteSequence]
31 [<strong>--RegionResiduesMode</strong> AminoAcids | NucleicAcids | None]
32 [<strong>-r, --root</strong> rootname] [<strong>-w, --WorkingDir</strong> dirname] SequenceFile(s) AlignmentFile(s)...</p>
33 <p>
34 </p>
35 <h2>DESCRIPTION</h2>
36 <p>Analyze <em>SequenceFile(s) and AlignmentFile(s)</em> data: calculate pairwise percent identity matrix or
37 calculate percent occurrence of various residues in specified sequence regions. All the sequences
38 in the input file must have the same sequence lengths; otherwise, the sequence file is ignored.</p>
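<p>As a rough illustration of the pairwise percent identity calculation described above, the following Python sketch compares aligned sequences column by column. It is not taken from the script itself: the gap symbols, the choice to skip columns with a gap in either sequence, and the function names <em>percent_identity</em> and <em>percent_identity_matrix</em> are all assumptions made for this example.</p>

```python
from itertools import product

GAP_CHARS = {"-", "."}  # assumed gap symbols; the script's own set may differ


def percent_identity(seq_a, seq_b, ignore_gaps=True):
    """Percent of compared alignment columns where both residues are identical."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: a column with a gap in either sequence is not compared.
        if ignore_gaps and (res_a in GAP_CHARS or res_b in GAP_CHARS):
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def percent_identity_matrix(alignment, precision=2):
    """alignment maps sequence ID to aligned sequence; returns {(id_a, id_b): value}."""
    ids = list(alignment)
    return {
        (a, b): round(percent_identity(alignment[a], alignment[b]), precision)
        for a, b in product(ids, repeat=2)
    }
```

<p>The <em>precision</em> argument plays the role of <strong>-p, --precision</strong>; the real script may count gap columns differently.</p>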
39 <p>The file names are separated by spaces. All the sequence files in the current directory can
40 be specified by <em>*.aln</em>, <em>*.msf</em>, <em>*.fasta</em>, <em>*.fta</em>, <em>*.pir</em> or any other supported
41 format; additionally, <em>DirName</em> corresponds to all the sequence files in the current directory
42 with any of the supported file extensions: <em>.aln, .msf, .fasta, .fta, and .pir</em>.</p>
43 <p>Supported sequence formats are: <em>ALN/ClustalW</em>, <em>GCG/MSF</em>, <em>PILEUP/MSF</em>, <em>Pearson/FASTA</em>,
44 and <em>NBRF/PIR</em>. Instead of using file extensions, file formats are detected by parsing the contents
45 of <em>SequenceFile(s) and AlignmentFile(s)</em>.</p>
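<p>Content-based format detection of this kind can be sketched in Python as shown below. This is only an illustration built on commonly seen signatures of these formats (a leading "CLUSTAL" line, a ">XX;" PIR entry line, a ">" FASTA header, and an MSF header line containing "MSF:" and ending in ".."); the checks the script actually performs are not documented here, and the function name <em>guess_sequence_format</em> is hypothetical.</p>

```python
import re


def guess_sequence_format(text):
    """Guess a sequence file format from its contents; returns None if unknown."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    first = lines[0]
    # ClustalW alignments conventionally begin with a "CLUSTAL ..." title line.
    if first.upper().startswith("CLUSTAL"):
        return "ALN/ClustalW"
    # NBRF/PIR entries begin with ">" plus a two-character type code and ";".
    if re.match(r"^>[A-Z][A-Z0-9];", first):
        return "NBRF/PIR"
    # Any other leading ">" is treated as a Pearson/FASTA header.
    if first.startswith(">"):
        return "Pearson/FASTA"
    # GCG/PILEUP MSF files carry a header line with "MSF:" that ends in "..".
    if any("MSF:" in line and line.endswith("..") for line in lines):
        return "MSF"
    return None
```

<p>A FASTA header that happens to look like a PIR type code would be misclassified by this sketch, so a more careful parser would inspect more than the first line.</p>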
46 <p>
47 </p>
48 <h2>OPTIONS</h2>
49 <dl>
50 <dt><strong><strong>-h, --help</strong></strong></dt>
51 <dd>
52 <p>Print this help message.</p>
53 </dd>
54 <dt><strong><strong>-i, --IgnoreGaps</strong> <em>yes | no</em></strong></dt>
55 <dd>
56 <p>Ignore gaps during calculation of sequence lengths and specification of regions during residue
57 frequency analysis. Possible values: <em>yes or no</em>. Default value: <em>yes</em>.</p>
58 </dd>
59 <dt><strong><strong>-m, --mode</strong> <em>PercentIdentityMatrix | ResidueFrequencyAnalysis | All</em></strong></dt>
60 <dd>
61 <p>Specify how to analyze data in sequence files: calculate percent identity matrix or calculate
62 frequency of occurrence of residues in specific regions. For the <em>ResidueFrequencyAnalysis</em> value
63 of <strong>-m, --mode</strong> option, output files are generated for both the residue count and percent residue
64 count. Possible values: <em>PercentIdentityMatrix, ResidueFrequencyAnalysis, or All</em>. Default value:
65 <em>PercentIdentityMatrix</em>.</p>
66 </dd>
67 <dt><strong><strong>--outdelim</strong> <em>comma | tab | semicolon</em></strong></dt>
68 <dd>
69 <p>Output text file delimiter. Possible values: <em>comma, tab, or semicolon</em>.
70 Default value: <em>comma</em>.</p>
71 </dd>
72 <dt><strong><strong>-o, --overwrite</strong></strong></dt>
73 <dd>
74 <p>Overwrite existing files.</p>
75 </dd>
76 <dt><strong><strong>-p, --precision</strong> <em>number</em></strong></dt>
77 <dd>
78 <p>Precision of calculated values in the output file. Default: up to <em>2</em> decimal places.
79 Valid values: positive integers.</p>
80 </dd>
81 <dt><strong><strong>-q, --quote</strong> <em>yes | no</em></strong></dt>
82 <dd>
83 <p>Put quotes around column values in output text file. Possible values: <em>yes or
84 no</em>. Default value: <em>yes</em>.</p>
85 </dd>
86 <dt><strong><strong>--ReferenceSequence</strong> <em>SequenceID | UseFirstSequenceID</em></strong></dt>
87 <dd>
88 <p>Specify reference sequence ID to identify regions for performing <em>ResidueFrequencyAnalysis</em> specified
89 using <strong>-m, --mode</strong> option. Default: <em>UseFirstSequenceID</em>.</p>
90 </dd>
91 <dt><strong><strong>--region</strong> <em>StartResNum,EndResNum,[StartResNum,EndResNum...] | UseCompleteSequence</em></strong></dt>
92 <dd>
93 <p>Specify how to perform frequency of occurrence analysis for residues: use specific regions
94 indicated by starting and ending residue numbers in reference sequence or use the whole reference
95 sequence as one region. Default: <em>UseCompleteSequence</em>.</p>
96 <p>Based on the value of <strong>-i, --IgnoreGaps</strong> option, specified residue numbers <em>StartResNum,EndResNum</em>
97 correspond to the positions in the reference sequence without gaps or with gaps.</p>
98 <p>For residue numbers corresponding to the reference sequence including gaps, percent occurrence
99 of various residues corresponding to gap position in reference sequence is also calculated.</p>
100 </dd>
101 <dt><strong><strong>--RegionResiduesMode</strong> <em>AminoAcids | NucleicAcids | None</em></strong></dt>
102 <dd>
103 <p>Specify how to process residues in the regions specified using <strong>--region</strong> option during
104 <em>ResidueFrequencyAnalysis</em> calculation: categorize residues as amino acids, nucleic acids, or simply
105 ignore residue category during the calculation. Possible values: <em>AminoAcids, NucleicAcids or None</em>.
106 Default value: <em>None</em>.</p>
107 <p>For <em>AminoAcids</em> or <em>NucleicAcids</em> values of <strong>--RegionResiduesMode</strong> option, all the standard amino
108 acids or nucleic acids are listed in the output file for each region; any gaps and other non-standard residues
109 are added to the list as encountered.</p>
110 <p>For <em>None</em> value of <strong>--RegionResiduesMode</strong> option, no assumption is made about the type of residues.
111 Residues and gaps are added to the list as encountered.</p>
112 </dd>
113 <dt><strong><strong>-r, --root</strong> <em>rootname</em></strong></dt>
114 <dd>
115 <p>New output file names are generated using the root: &lt;Root&gt;&lt;Mode&gt;.&lt;Ext&gt; and
116 &lt;Root&gt;&lt;Mode&gt;&lt;RegionNum&gt;.&lt;Ext&gt;. Default new file
117 name: &lt;SequenceFileName&gt;&lt;Mode&gt;.&lt;Ext&gt; for <em>PercentIdentityMatrix</em> value of <strong>-m, --mode</strong> option
118 and &lt;SequenceFileName&gt;&lt;Mode&gt;&lt;RegionNum&gt;.&lt;Ext&gt; for <em>ResidueFrequencyAnalysis</em>.
119 The csv and tsv &lt;Ext&gt; values are used for comma/semicolon and tab delimited text
120 files respectively. This option is ignored for multiple input files.</p>
121 </dd>
122 <dt><strong><strong>-w, --WorkingDir</strong> <em>dirname</em></strong></dt>
123 <dd>
124 <p>Location of working directory. Default: current directory.</p>
125 </dd>
126 </dl>
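<p>The interplay of <strong>--region</strong>, <strong>--ReferenceSequence</strong>, and <strong>-i, --IgnoreGaps</strong> can be illustrated with a short Python sketch. It assumes that residue numbers are 1-based, that gaps are written as "-" or ".", and that frequencies are tallied per alignment column across all sequences; none of this is confirmed by the script's source, and the function names are hypothetical.</p>

```python
from collections import Counter

GAP_CHARS = {"-", "."}  # assumed gap symbols


def parse_regions(spec):
    """Turn a --region style string such as "5,15,30,40" into [(5, 15), (30, 40)]."""
    numbers = [int(tok) for tok in spec.split(",")]
    if len(numbers) % 2 != 0:
        raise ValueError("region values must come in start,end pairs")
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    for start, end in pairs:
        if start < 1 or end < start:
            raise ValueError(f"invalid region {start},{end}")
    return pairs


def region_columns(reference, start, end, ignore_gaps=True):
    """0-based alignment columns covered by reference residues start..end (1-based)."""
    columns = []
    res_num = 0
    for col, res in enumerate(reference):
        # With ignore_gaps, gap positions in the reference are not numbered.
        if ignore_gaps and res in GAP_CHARS:
            continue
        res_num += 1
        if start <= res_num <= end:
            columns.append(col)
    return columns


def residue_frequencies(alignment, columns):
    """For each column, return (counts, percents) of residues across all sequences."""
    results = []
    for col in columns:
        counts = Counter(seq[col].upper() for seq in alignment.values())
        total = sum(counts.values())
        percents = {res: 100.0 * n / total for res, n in counts.items()}
        results.append((dict(counts), percents))
    return results
```

<p>With <em>ignore_gaps=False</em>, gap positions in the reference sequence are numbered like any other position, so the same start and end values select different alignment columns.</p>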
127 <p>
128 </p>
129 <h2>EXAMPLES</h2>
130 <p>To calculate percent identity matrix for all sequences in Sample1.msf file and generate
131 Sample1PercentIdentityMatrix.csv, type:</p>
132 <div class="ExampleBox">
133 % AnalyzeSequenceFilesData.pl Sample1.msf</div>
134 <p>To perform residue frequency analysis for all sequences in Sample1.aln file corresponding to
135 non-gap positions in the first sequence and generate Sample1ResidueFrequencyAnalysisRegion1.csv
136 and Sample1PercentResidueFrequencyAnalysisRegion1.csv files, type:</p>
137 <div class="ExampleBox">
138 % AnalyzeSequenceFilesData.pl -m ResidueFrequencyAnalysis -o
139 Sample1.aln</div>
140 <p>To perform residue frequency analysis for all sequences in Sample1.aln file corresponding to
141 all positions in the first sequence and generate TestResidueFrequencyAnalysisRegion1.csv
142 and TestPercentResidueFrequencyAnalysisRegion1.csv files, type:</p>
143 <div class="ExampleBox">
144 % AnalyzeSequenceFilesData.pl -m ResidueFrequencyAnalysis --IgnoreGaps
145 No -o -r Test Sample1.aln</div>
146 <p>To perform residue frequency analysis for all sequences in Sample1.msf file corresponding to
147 non-gap residue positions 5 to 15, and 30 to 40 in sequence ACHE_BOVIN and generate
148 Sample1ResidueFrequencyAnalysisRegion1.csv, Sample1ResidueFrequencyAnalysisRegion2.csv,
149 Sample1PercentResidueFrequencyAnalysisRegion1.csv, and
150 Sample1PercentResidueFrequencyAnalysisRegion2.csv files, type:</p>
151 <div class="ExampleBox">
152 % AnalyzeSequenceFilesData.pl -m ResidueFrequencyAnalysis
153 --ReferenceSequence ACHE_BOVIN --region &quot;5,15,30,40&quot; -o Sample1.msf</div>
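<p>The output file names in the examples above appear to follow the pattern described for the <strong>-r, --root</strong> option. The Python sketch below reproduces that pattern as inferred from these examples only; the function <em>output_file_names</em> and its parameters are hypothetical and are not part of the script.</p>

```python
from pathlib import Path


def output_file_names(sequence_file, mode, region_count=1, root=None, outdelim="comma"):
    """Build output names as inferred from the documented examples."""
    # csv is used for comma and semicolon delimiters, tsv for tab.
    ext = "tsv" if outdelim == "tab" else "csv"
    # Without a root, the input file name minus its extension is used.
    base = root if root else Path(sequence_file).stem
    if mode == "PercentIdentityMatrix":
        return [f"{base}PercentIdentityMatrix.{ext}"]
    if mode == "ResidueFrequencyAnalysis":
        names = []
        for region in range(1, region_count + 1):
            # One count file and one percent file per region.
            names.append(f"{base}ResidueFrequencyAnalysisRegion{region}.{ext}")
            names.append(f"{base}PercentResidueFrequencyAnalysisRegion{region}.{ext}")
        return names
    raise ValueError(f"unsupported mode: {mode}")
```

<p>For instance, calling it with <em>Sample1.aln</em>, <em>ResidueFrequencyAnalysis</em>, and a root of <em>Test</em> yields the two Test file names shown in the third example.</p>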
154 <p>
155 </p>
156 <h2>AUTHOR</h2>
157 <p><a href="mailto:msud@san.rr.com">Manish Sud</a></p>
158 <p>
159 </p>
160 <h2>SEE ALSO</h2>
161 <p><a href="./ExtractFromSequenceFiles.html">ExtractFromSequenceFiles.pl</a>,&nbsp;<a href="./InfoSequenceFiles.html">InfoSequenceFiles.pl</a>
162 </p>
163 <p>
164 </p>
165 <h2>COPYRIGHT</h2>
166 <p>Copyright (C) 2015 Manish Sud. All rights reserved.</p>
167 <p>This file is part of MayaChemTools.</p>
168 <p>MayaChemTools is free software; you can redistribute it and/or modify it under
169 the terms of the GNU Lesser General Public License as published by the Free
170 Software Foundation; either version 3 of the License, or (at your option)
171 any later version.</p>
172 <p>&nbsp;</p><p>&nbsp;</p><div class="DocNav">
173 <table width="100%" border=0 cellpadding=0 cellspacing=2>
174 <tr align="left" valign="top"><td width="33%" align="left"><a href="./AnalyzeSDFilesData.html" title="AnalyzeSDFilesData.html">Previous</a>&nbsp;&nbsp;<a href="./index.html" title="Table of Contents">TOC</a>&nbsp;&nbsp;<a href="./AnalyzeTextFilesData.html" title="AnalyzeTextFilesData.html">Next</a></td><td width="34%" align="middle"><strong>March 29, 2015</strong></td><td width="33%" align="right"><strong>AnalyzeSequenceFilesData.pl</strong></td></tr>
175 </table>
176 </div>
177 <br />
178 <center>
179 <img src="../../images/h2o2.png">
180 </center>
181 </body>
182 </html>