<html>
<head>
<title>MayaChemTools:Documentation:SequenceFileUtil.pm</title>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<link rel="stylesheet" type="text/css" href="../../css/MayaChemTools.css">
</head>
<body leftmargin="20" rightmargin="20" topmargin="10" bottommargin="10">
<br/>
<center>
<a href="http://www.mayachemtools.org" title="MayaChemTools Home"><img src="../../images/MayaChemToolsLogo.gif" border="0" alt="MayaChemTools"></a>
</center>
<br/>
<div class="DocNav">
<table width="100%" border=0 cellpadding=0 cellspacing=2>
<tr align="left" valign="top"><td width="33%" align="left"><a href="./SDFileUtil.html" title="SDFileUtil.html">Previous</a>  <a href="./index.html" title="Table of Contents">TOC</a>  <a href="./StatisticsUtil.html" title="StatisticsUtil.html">Next</a></td><td width="34%" align="middle"><strong>SequenceFileUtil.pm</strong></td><td width="33%" align="right"><a href="././code/SequenceFileUtil.html" title="View source code">Code</a> | <a href="./../pdf/SequenceFileUtil.pdf" title="PDF US Letter Size">PDF</a> | <a href="./../pdfgreen/SequenceFileUtil.pdf" title="PDF US Letter Size with narrow margins: www.changethemargins.com">PDFGreen</a> | <a href="./../pdfa4/SequenceFileUtil.pdf" title="PDF A4 Size">PDFA4</a> | <a href="./../pdfa4green/SequenceFileUtil.pdf" title="PDF A4 Size with narrow margins: www.changethemargins.com">PDFA4Green</a></td></tr>
</table>
</div>
<p>
</p>
<h2>NAME</h2>
<p>SequenceFileUtil</p>
<p>
</p>
<h2>SYNOPSIS</h2>
<p>use SequenceFileUtil;</p>
<p>use SequenceFileUtil qw(:all);</p>
<p>
</p>
<h2>DESCRIPTION</h2>
<p>The <strong>SequenceFileUtil</strong> module provides various methods to process sequence files and retrieve appropriate information. It provides the following functions:</p>
<p> <a href="#aresequencelengthsidentical">AreSequenceLengthsIdentical</a>, <a href="#calcuatepercentsequenceidentity">CalcuatePercentSequenceIdentity</a>
, <a href="#calculatepercentsequenceidentitymatrix">CalculatePercentSequenceIdentityMatrix</a>, <a href="#getlongestsequence">GetLongestSequence</a>, <a href="#getsequencelength">GetSequenceLength</a>
, <a href="#getshortestsequence">GetShortestSequence</a>, <a href="#isclustalwsequencefile">IsClustalWSequenceFile</a>, <a href="#isgapresidue">IsGapResidue</a>, <a href="#ismsfsequencefile">IsMSFSequenceFile</a>
, <a href="#ispirfastasequencefile">IsPIRFastaSequenceFile</a>, <a href="#ispearsonfastasequencefile">IsPearsonFastaSequenceFile</a>, <a href="#issupportedsequencefile">IsSupportedSequenceFile</a>
, <a href="#readclustalwsequencefile">ReadClustalWSequenceFile</a>, <a href="#readmsfsequencefile">ReadMSFSequenceFile</a>, <a href="#readpirfastasequencefile">ReadPIRFastaSequenceFile</a>
, <a href="#readpearsonfastasequencefile">ReadPearsonFastaSequenceFile</a>, <a href="#readsequencefile">ReadSequenceFile</a>, <a href="#removesequencealignmentgapcolumns">RemoveSequenceAlignmentGapColumns</a>
, <a href="#removesequencegaps">RemoveSequenceGaps</a>, <a href="#writepearsonfastasequencefile">WritePearsonFastaSequenceFile</a>
</p><p>
</p>
<h2>FUNCTIONS</h2>
<dl>
<dt><strong><a name="aresequencelengthsidentical" class="item"><strong>AreSequenceLengthsIdentical</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $Status = AreSequenceLengthsIdentical($SequencesDataRef);</div>
<p>Checks the lengths of all the sequences available in <em>SequencesDataRef</em> and returns 1
or 0 based on whether all the sequences have the same length.</p>
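<p>A minimal Perl sketch of this length check, for illustration only. <em>are_lengths_identical</em> is a hypothetical stand-in rather than the module's implementation; it assumes the <em>{IDs}</em>/<em>{Sequence}{$ID}</em> layout listed under <strong>ReadSequenceFile</strong> and, as a further assumption, compares raw lengths with gap characters included.</p>

```perl
use strict;
use warnings;

# Hypothetical stand-in for AreSequenceLengthsIdentical: returns 1 when every
# sequence listed in {IDs} has the same raw length (gap characters included).
sub are_lengths_identical {
    my ($SequencesDataRef) = @_;
    my %SeenLengths;
    for my $ID (@{ $SequencesDataRef->{IDs} }) {
        $SeenLengths{ length($SequencesDataRef->{Sequence}{$ID}) } = 1;
    }
    # At most one distinct length means all lengths are identical.
    return scalar(keys %SeenLengths) <= 1 ? 1 : 0;
}

my $Data = {
    IDs      => ['seq1', 'seq2'],
    Count    => 2,
    Sequence => { seq1 => 'ACD-EF', seq2 => 'AC-DEF' },
};
print are_lengths_identical($Data), "\n";    # prints 1
```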
</dd>
<dt><strong><a name="calcuatepercentsequenceidentity" class="item"><strong>CalcuatePercentSequenceIdentity</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $PercentIdentity =
       CalcuatePercentSequenceIdentity(
          $Sequence1, $Sequence2, [$IgnoreGaps, $Precision]);</div>
<p>Returns the percent identity between <em>Sequence1</em> and <em>Sequence2</em>. Optional arguments
<em>IgnoreGaps</em> and <em>Precision</em> control handling of gaps in sequences and precision of the
returned value. By default, gaps are ignored and precision is set to 1 decimal place.</p>
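<p>The sketch below illustrates one common way to compute percent identity; it is not taken from the module. <em>percent_identity</em> is hypothetical, and it assumes the two sequences are already aligned, compares only up to the shorter length, and, when gaps are ignored, skips any column in which either residue is a gap (a non-letter, following the <strong>IsGapResidue</strong> rule). The module may define the denominator differently.</p>

```perl
use strict;
use warnings;

# Hypothetical percent-identity sketch. Assumptions (not from the module):
# aligned input, comparison over the shorter length, and with $IgnoreGaps set,
# columns where either residue is a non-letter are skipped.
sub percent_identity {
    my ($Seq1, $Seq2, $IgnoreGaps, $Precision) = @_;
    $IgnoreGaps = 1 unless defined $IgnoreGaps;    # documented default: ignore gaps
    $Precision  = 1 unless defined $Precision;     # documented default: 1 decimal place

    my $Len = length($Seq1) < length($Seq2) ? length($Seq1) : length($Seq2);
    my ($Compared, $Identical) = (0, 0);
    for my $i (0 .. $Len - 1) {
        my ($R1, $R2) = (substr($Seq1, $i, 1), substr($Seq2, $i, 1));
        next if $IgnoreGaps && ($R1 !~ /[A-Za-z]/ || $R2 !~ /[A-Za-z]/);
        $Compared++;
        $Identical++ if uc($R1) eq uc($R2);
    }
    return $Compared ? sprintf("%.${Precision}f", 100 * $Identical / $Compared) : 0;
}

print percent_identity('ACDEFG', 'ACDQFG'), "\n";    # prints 83.3
```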
</dd>
<dt><strong><a name="calculatepercentsequenceidentitymatrix" class="item"><strong>CalculatePercentSequenceIdentityMatrix</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $IdentityMatrixDataRef = CalculatePercentSequenceIdentityMatrix(
                             $SequencesDataRef, [$IgnoreGaps,
                             $Precision]);</div>
<p>Calculates pairwise percent identity between all the sequences available in <em>SequencesDataRef</em>
and returns a reference to a hash containing the identity matrix. Optional arguments <em>IgnoreGaps</em> and
<em>Precision</em> control handling of gaps in sequences and precision of the returned values. By default, gaps
are ignored and precision is set to 1 decimal place.</p>
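<p>A hedged sketch of the pairwise loop. The compact helper <em>pair_identity</em> always skips gap columns, and the returned layout <em>$Matrix->{$ID1}{$ID2}</em> is an assumption for illustration; the module's actual identity matrix hash may be organized differently. Only the upper triangle is computed, because identity is symmetric.</p>

```perl
use strict;
use warnings;

# Compact pairwise identity helper (assumes aligned input; gap columns, i.e.
# columns where either residue is a non-letter, are always skipped).
sub pair_identity {
    my ($S1, $S2, $Precision) = @_;
    my ($Compared, $Identical) = (0, 0);
    my $Len = length($S1) < length($S2) ? length($S1) : length($S2);
    for my $i (0 .. $Len - 1) {
        my ($R1, $R2) = (uc(substr($S1, $i, 1)), uc(substr($S2, $i, 1)));
        next if $R1 !~ /[A-Z]/ || $R2 !~ /[A-Z]/;
        $Compared++;
        $Identical++ if $R1 eq $R2;
    }
    return $Compared ? sprintf("%.${Precision}f", 100 * $Identical / $Compared) : 0;
}

# Hypothetical matrix layout: $Matrix->{$ID1}{$ID2} holds the percent identity.
sub identity_matrix {
    my ($SequencesDataRef, $Precision) = @_;
    $Precision = 1 unless defined $Precision;
    my @IDs = @{ $SequencesDataRef->{IDs} };
    my %Matrix;
    for my $i (0 .. $#IDs) {
        for my $j ($i .. $#IDs) {
            my ($ID1, $ID2) = @IDs[$i, $j];
            my $Value = pair_identity($SequencesDataRef->{Sequence}{$ID1},
                                      $SequencesDataRef->{Sequence}{$ID2}, $Precision);
            $Matrix{$ID1}{$ID2} = $Matrix{$ID2}{$ID1} = $Value;    # symmetric fill
        }
    }
    return \%Matrix;
}

my $Data = { IDs => ['s1', 's2'], Sequence => { s1 => 'ACDE', s2 => 'ACDF' } };
my $M = identity_matrix($Data);
print "$M->{s1}{s2}\n";    # prints 75.0
```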
</dd>
<dt><strong><a name="getsequencelength" class="item"><strong>GetSequenceLength</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $SequenceLength = GetSequenceLength($Sequence, [$IgnoreGaps]);</div>
<p>Returns the length of the specified sequence. Optional argument <em>IgnoreGaps</em> controls handling
of gaps. By default, gaps are ignored.</p>
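<p>A minimal sketch of the length calculation, assuming gaps are any non-letter characters as described for <strong>IsGapResidue</strong>; <em>sequence_length</em> is a hypothetical name. Perl's <em>tr///</em> with an empty replacement list counts matching characters without modifying the string.</p>

```perl
use strict;
use warnings;

# Hypothetical sketch: with $IgnoreGaps true (the documented default), count
# only letters; otherwise return the raw length including gap characters.
sub sequence_length {
    my ($Sequence, $IgnoreGaps) = @_;
    $IgnoreGaps = 1 unless defined $IgnoreGaps;
    return length $Sequence unless $IgnoreGaps;
    return ($Sequence =~ tr/A-Za-z//);    # counts letters, leaves $Sequence unchanged
}

print sequence_length('AC--DE.F'), "\n";       # prints 5
print sequence_length('AC--DE.F', 0), "\n";    # prints 8
```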
</dd>
<dt><strong><a name="getshortestsequence" class="item"><strong>GetShortestSequence</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
   ($ID, $Sequence, $SeqLen, $Description) = GetShortestSequence(
          $SequencesDataRef, [$IgnoreGaps]);</div>
<p>Checks the lengths of all the sequences available in <em>SequencesDataRef</em> and returns <strong>ID</strong>,
<strong>Sequence</strong>, <strong>SeqLen</strong>, and <strong>Description</strong> values for the shortest sequence. Optional argument
<em>IgnoreGaps</em> controls handling of gaps in sequences. By default, gaps are ignored.</p>
</dd>
<dt><strong><a name="getlongestsequence" class="item"><strong>GetLongestSequence</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
   ($ID, $Sequence, $SeqLen, $Description) = GetLongestSequence(
          $SequencesDataRef, [$IgnoreGaps]);</div>
<p>Checks the lengths of all the sequences available in <em>SequencesDataRef</em> and returns <strong>ID</strong>,
<strong>Sequence</strong>, <strong>SeqLen</strong>, and <strong>Description</strong> values for the longest sequence. Optional argument
<em>IgnoreGaps</em> controls handling of gaps in sequences. By default, gaps are ignored.</p>
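<p>Both <strong>GetShortestSequence</strong> and <strong>GetLongestSequence</strong> amount to a single scan over <em>{IDs}</em>. The sketch below, using a hypothetical <em>extreme_sequence</em> helper, shows that scan under two assumptions: gap characters are non-letters, and ties are resolved in favor of the sequence that appears first in <em>{IDs}</em>.</p>

```perl
use strict;
use warnings;

# Hypothetical shortest/longest scan returning ($ID, $Sequence, $SeqLen,
# $Description). With $IgnoreGaps true (the documented default), only letters
# are counted. Ties keep the first ID in {IDs}.
sub extreme_sequence {
    my ($SequencesDataRef, $WantLongest, $IgnoreGaps) = @_;
    $IgnoreGaps = 1 unless defined $IgnoreGaps;
    my ($BestID, $BestLen);
    for my $ID (@{ $SequencesDataRef->{IDs} }) {
        my $Seq = $SequencesDataRef->{Sequence}{$ID};
        my $Len = $IgnoreGaps ? ($Seq =~ tr/A-Za-z//) : length $Seq;
        if (!defined $BestLen
            || ($WantLongest ? $Len > $BestLen : $Len < $BestLen)) {
            ($BestID, $BestLen) = ($ID, $Len);
        }
    }
    return unless defined $BestID;
    return ($BestID, $SequencesDataRef->{Sequence}{$BestID}, $BestLen,
            $SequencesDataRef->{Description}{$BestID});
}

my $Data = {
    IDs         => ['s1', 's2'],
    Sequence    => { s1 => 'ACD--E', s2 => 'ACDEFG' },
    Description => { s1 => 'short one', s2 => 'long one' },
};
my ($LongID, undef, $LongLen)   = extreme_sequence($Data, 1);
my ($ShortID, undef, $ShortLen) = extreme_sequence($Data, 0);
print "$LongID $LongLen / $ShortID $ShortLen\n";    # prints "s2 6 / s1 4"
```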
</dd>
<dt><strong><a name="isgapresidue" class="item"><strong>IsGapResidue</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $Status = IsGapResidue($Residue);</div>
<p>Returns 1 or 0 based on whether <em>Residue</em> corresponds to a gap. Any character other than A to Z is
considered a gap residue.</p>
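<p>The documented rule maps onto a one-line regular expression. In this sketch <em>is_gap_residue</em> is a hypothetical name, and treating lowercase letters as residues (the <em>/i</em> flag) is an assumption; drop the flag if only uppercase A to Z should count.</p>

```perl
use strict;
use warnings;

# Sketch of the documented rule: any character other than A to Z is a gap.
# Assumption: lowercase letters also count as residues (case-insensitive match).
sub is_gap_residue {
    my ($Residue) = @_;
    return $Residue =~ /^[A-Z]\z/i ? 0 : 1;
}

print join(' ', map { is_gap_residue($_) } qw(A - . x)), "\n";    # prints "0 1 1 0"
```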
</dd>
<dt><strong><a name="issupportedsequencefile" class="item"><strong>IsSupportedSequenceFile</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $Status = IsSupportedSequenceFile($SequenceFile);</div>
<p>Returns 1 or 0 based on whether <em>SequenceFile</em> corresponds to a supported sequence
format.</p>
</dd>
<dt><strong><a name="isclustalwsequencefile" class="item"><strong>IsClustalWSequenceFile</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $Status = IsClustalWSequenceFile($SequenceFile);</div>
<p>Returns 1 or 0 based on whether <em>SequenceFile</em> corresponds to the Clustal sequence alignment
format.</p>
</dd>
<dt><strong><a name="ispearsonfastasequencefile" class="item"><strong>IsPearsonFastaSequenceFile</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $Status = IsPearsonFastaSequenceFile($SequenceFile);</div>
<p>Returns 1 or 0 based on whether <em>SequenceFile</em> corresponds to the Pearson FASTA sequence
format.</p>
</dd>
<dt><strong><a name="ispirfastasequencefile" class="item"><strong>IsPIRFastaSequenceFile</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $Status = IsPIRFastaSequenceFile($SequenceFile);</div>
<p>Returns 1 or 0 based on whether <em>SequenceFile</em> corresponds to the PIR FASTA sequence
format.</p>
</dd>
<dt><strong><a name="ismsfsequencefile" class="item"><strong>IsMSFSequenceFile</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $Status = IsMSFSequenceFile($SequenceFile);</div>
<p>Returns 1 or 0 based on whether <em>SequenceFile</em> corresponds to the MSF sequence alignment
format.</p>
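<p>The format checks above can be approximated by inspecting file contents. The sketch below is a hypothetical content-based sniffer, not the module's logic (which may rely on file extensions instead), and to stay self-contained it takes the file text as a string. It relies on well-known format markers: ClustalW files start with a <em>CLUSTAL</em> header line, PIR entries start with a header such as <em>&gt;P1;ID</em>, Pearson FASTA entries start with <em>&gt;</em>, and MSF files carry a header line containing <em>MSF:</em>.</p>

```perl
use strict;
use warnings;

# Hypothetical content-based format sniffer (the module may use extensions).
#   ClustalW      - first non-blank line starts with "CLUSTAL"
#   PIRFasta      - first header looks like ">P1;ID" (two-letter code + ';')
#   PearsonFasta  - first header starts with ">" without a PIR type code
#   MSF           - some header line contains "MSF:"
sub guess_sequence_format {
    my ($Text) = @_;
    my @Lines = grep { /\S/ } split /\r?\n/, $Text;    # ignore blank lines
    return 'Unknown' unless @Lines;
    return 'ClustalW'     if $Lines[0] =~ /^CLUSTAL/;
    return 'PIRFasta'     if $Lines[0] =~ /^>[A-Z][A-Z0-9];/;
    return 'PearsonFasta' if $Lines[0] =~ /^>/;
    return 'MSF'          if grep { /\bMSF:/ } @Lines;
    return 'Unknown';
}

# Analogue of IsSupportedSequenceFile for text input.
sub is_supported_text {
    my ($Text) = @_;
    return guess_sequence_format($Text) eq 'Unknown' ? 0 : 1;
}

print guess_sequence_format(">P1;SEQ_1\nexample protein\nACDE*\n"), "\n";    # prints PIRFasta
print guess_sequence_format(">seq1 demo\nACDEFG\n"), "\n";                  # prints PearsonFasta
```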
</dd>
<dt><strong><a name="readsequencefile" class="item"><strong>ReadSequenceFile</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $SequenceDataMapRef = ReadSequenceFile($SequenceFile);</div>
<p>Reads <em>SequenceFile</em> and returns a reference to a hash containing the following key/value
pairs:</p>
<div class="OptionsBox">
    $SequenceDataMapRef->{IDs} - Array of sequence IDs
<br/>    $SequenceDataMapRef->{Count} - Number of sequences
<br/>    $SequenceDataMapRef->{Description}{$ID} - Sequence description
<br/>    $SequenceDataMapRef->{Sequence}{$ID} - Sequence for a specific ID
<br/>    $SequenceDataMapRef->{Sequence}{InputFileType} - File format</div>
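<p>Because <em>{IDs}</em> keeps the sequences in file order, it is the natural key list for walking the returned hash; iterating over the keys of <em>{Sequence}</em> would not preserve that order. The sketch below uses a hand-built hash in place of real <strong>ReadSequenceFile</strong> output; <em>summarize_sequences</em> and its output format are hypothetical.</p>

```perl
use strict;
use warnings;

# Hand-built stand-in for the hash reference returned by ReadSequenceFile,
# following the documented keys.
my $SequenceDataMapRef = {
    IDs         => ['seq1', 'seq2'],
    Count       => 2,
    Description => { seq1 => 'first entry', seq2 => 'second entry' },
    Sequence    => { seq1 => 'ACDEFGHIK', seq2 => 'LMNPQ' },
};

# Summarize each sequence in file order, driven by the {IDs} array.
sub summarize_sequences {
    my ($DataRef) = @_;
    my @Lines;
    for my $ID (@{ $DataRef->{IDs} }) {
        push @Lines, sprintf('%s (%d residues): %s', $ID,
            length($DataRef->{Sequence}{$ID}), $DataRef->{Description}{$ID});
    }
    return @Lines;
}

print "$_\n" for summarize_sequences($SequenceDataMapRef);
```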
</dd>
<dt><strong><a name="readclustalwsequencefile" class="item"><strong>ReadClustalWSequenceFile</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $SequenceDataMapRef = ReadClustalWSequenceFile($SequenceFile);</div>
<p>Reads ClustalW <em>SequenceFile</em> and returns a reference to a hash containing the key/value
pairs described in the <strong>ReadSequenceFile</strong> method.</p>
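<p>A simplified sketch of how a ClustalW alignment maps onto this hash layout. <em>parse_clustalw</em> is hypothetical and works on a string rather than a file; it assumes each alignment line is an ID followed by a residue block, skips the <em>CLUSTAL</em> header, blank lines, and the indented conservation lines, and leaves descriptions empty because ClustalW files carry none.</p>

```perl
use strict;
use warnings;

# Hypothetical ClustalW parser producing the {IDs}/{Count}/{Description}/
# {Sequence} layout; blocks for the same ID are concatenated in order.
sub parse_clustalw {
    my ($Text) = @_;
    my %Data = (IDs => [], Count => 0, Description => {}, Sequence => {});
    for my $Line (split /\r?\n/, $Text) {
        next if $Line =~ /^CLUSTAL/;    # header line
        next if $Line =~ /^\s*$/;       # blank separator between blocks
        next if $Line =~ /^\s/;         # indented conservation line (*, :, .)
        my ($ID, $Residues) = split /\s+/, $Line;
        next unless defined $Residues;
        if (!exists $Data{Sequence}{$ID}) {
            push @{ $Data{IDs} }, $ID;
            $Data{Sequence}{$ID}    = '';
            $Data{Description}{$ID} = '';    # ClustalW has no descriptions
        }
        $Data{Sequence}{$ID} .= $Residues;
    }
    $Data{Count} = scalar @{ $Data{IDs} };
    return \%Data;
}

my $Aln = "CLUSTAL W (1.83) multiple sequence alignment\n\n"
        . "seq1      ACDEF-GHIK\nseq2      ACDEFQGH-K\n          ***** ** *\n\n"
        . "seq1      LMN\nseq2      LMN\n          ***\n";
my $Data = parse_clustalw($Aln);
print "$Data->{Sequence}{seq1}\n";    # prints ACDEF-GHIKLMN
```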
</dd>
<dt><strong><a name="readmsfsequencefile" class="item"><strong>ReadMSFSequenceFile</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $SequenceDataMapRef = ReadMSFSequenceFile($SequenceFile);</div>
<p>Reads MSF <em>SequenceFile</em> and returns a reference to a hash containing the key/value
pairs described in the <strong>ReadSequenceFile</strong> method.</p>
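<p>A comparable sketch for MSF input, again hypothetical (<em>parse_msf</em>) and string-based: IDs are taken from the <em>Name:</em> lines of the header, and residue rows after the <em>//</em> separator are concatenated with the spaces between residue groups removed. Gap characters such as <em>.</em> and <em>~</em> are kept unchanged, and descriptions are left empty by assumption.</p>

```perl
use strict;
use warnings;

# Hypothetical MSF parser producing the documented hash layout.
sub parse_msf {
    my ($Text) = @_;
    my %Data = (IDs => [], Count => 0, Description => {}, Sequence => {});
    my $InAlignment = 0;
    for my $Line (split /\r?\n/, $Text) {
        if (!$InAlignment) {
            # Header section: collect IDs from "Name:" lines until "//".
            if ($Line =~ /^\s*Name:\s*(\S+)/) {
                my $ID = $1;
                push @{ $Data{IDs} }, $ID;
                $Data{Sequence}{$ID}    = '';
                $Data{Description}{$ID} = '';
            }
            $InAlignment = 1 if $Line =~ m{^\s*//};
            next;
        }
        my ($ID, @Blocks) = split ' ', $Line;    # awk-style split skips leading blanks
        next unless defined $ID && exists $Data{Sequence}{$ID};    # skips ruler lines
        $Data{Sequence}{$ID} .= join '', @Blocks;
    }
    $Data{Count} = scalar @{ $Data{IDs} };
    return \%Data;
}

my $Msf = "PileUp\n\n   MSF: 13  Type: P  Check: 1234  ..\n\n"
        . " Name: seq1  Len: 13  Check: 1111  Weight: 1.00\n"
        . " Name: seq2  Len: 13  Check: 2222  Weight: 1.00\n\n//\n\n"
        . "seq1  ACDEF.GHIK LMN\nseq2  ACDEFQGH~K LMN\n";
my $Data = parse_msf($Msf);
print "$Data->{Sequence}{seq1}\n";    # prints ACDEF.GHIKLMN
```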
</dd>
<dt><strong><a name="readpirfastasequencefile" class="item"><strong>ReadPIRFastaSequenceFile</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $SequenceDataMapRef = ReadPIRFastaSequenceFile($SequenceFile);</div>
<p>Reads PIR FASTA <em>SequenceFile</em> and returns a reference to a hash containing the key/value
pairs described in the <strong>ReadSequenceFile</strong> method.</p>
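<p>A simplified sketch of PIR parsing with a hypothetical <em>parse_pir</em>: each entry has a <em>&gt;XX;ID</em> header with a two-letter type code such as <em>P1</em>, a single description line, and sequence lines ending in <em>*</em>. Stripping whitespace and the terminating <em>*</em> from the stored sequence is an assumption about the module's behavior.</p>

```perl
use strict;
use warnings;

# Hypothetical PIR parser: ">XX;ID" header, one description line, then
# sequence lines terminated by '*'.
sub parse_pir {
    my ($Text) = @_;
    my %Data = (IDs => [], Count => 0, Description => {}, Sequence => {});
    my ($ID, $ExpectDescription);
    for my $Line (split /\r?\n/, $Text) {
        if ($Line =~ /^>[A-Z][A-Z0-9];(\S+)/) {
            $ID = $1;
            push @{ $Data{IDs} }, $ID;
            $Data{Sequence}{$ID} = '';
            $ExpectDescription = 1;
        }
        elsif (defined $ID && $ExpectDescription) {
            $Data{Description}{$ID} = $Line;    # the line right after the header
            $ExpectDescription = 0;
        }
        elsif (defined $ID) {
            (my $Residues = $Line) =~ s/[\s*]//g;    # drop blanks and the '*' terminator
            $Data{Sequence}{$ID} .= $Residues;
        }
    }
    $Data{Count} = scalar @{ $Data{IDs} };
    return \%Data;
}

my $Pir = ">P1;SEQ_1\nexample protein\n  ACDEFGHIKL MNPQRSTVWY\n  ACDE*\n";
my $Data = parse_pir($Pir);
print "$Data->{Sequence}{SEQ_1}\n";    # prints ACDEFGHIKLMNPQRSTVWYACDE
```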
</dd>
<dt><strong><a name="readpearsonfastasequencefile" class="item"><strong>ReadPearsonFastaSequenceFile</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $SequenceDataMapRef = ReadPearsonFastaSequenceFile($SequenceFile);</div>
<p>Reads Pearson FASTA <em>SequenceFile</em> and returns a reference to a hash containing the key/value
pairs described in the <strong>ReadSequenceFile</strong> method.</p>
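<p>A simplified sketch of Pearson FASTA parsing with a hypothetical <em>parse_pearson_fasta</em>. Splitting the header into an ID (the first word after <em>&gt;</em>) and a description (the rest of the line) is an assumption; the module may split headers differently.</p>

```perl
use strict;
use warnings;

# Hypothetical Pearson FASTA parser: ">ID description" header lines, followed
# by sequence lines that are concatenated with whitespace removed.
sub parse_pearson_fasta {
    my ($Text) = @_;
    my %Data = (IDs => [], Count => 0, Description => {}, Sequence => {});
    my $ID;
    for my $Line (split /\r?\n/, $Text) {
        if ($Line =~ /^>\s*(\S+)\s*(.*)$/) {
            my ($NewID, $Desc) = ($1, $2);
            $ID = $NewID;
            push @{ $Data{IDs} }, $ID;
            $Data{Description}{$ID} = $Desc;
            $Data{Sequence}{$ID}    = '';
        }
        elsif (defined $ID) {
            (my $Residues = $Line) =~ s/\s+//g;
            $Data{Sequence}{$ID} .= $Residues;
        }
    }
    $Data{Count} = scalar @{ $Data{IDs} };
    return \%Data;
}

my $Fasta = ">seq1 demo protein\nACDEFGHIKL\nMNPQ\n>seq2\nRSTVWY\n";
my $Data = parse_pearson_fasta($Fasta);
print "$Data->{Count} $Data->{Sequence}{seq1}\n";    # prints "2 ACDEFGHIKLMNPQ"
```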
</dd>
<dt><strong><a name="removesequencegaps" class="item"><strong>RemoveSequenceGaps</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $SeqWithoutGaps = RemoveSequenceGaps($Sequence);</div>
<p>Removes gaps from <em>Sequence</em> and returns the resulting sequence without any gaps.</p>
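<p>A one-substitution sketch of gap removal (<em>remove_sequence_gaps</em> is hypothetical), treating every non-letter as a gap in line with <strong>IsGapResidue</strong>; keeping lowercase letters is an assumption.</p>

```perl
use strict;
use warnings;

# Hypothetical sketch: delete every non-letter character from a copy of the
# input sequence; the caller's string is not modified.
sub remove_sequence_gaps {
    my ($Sequence) = @_;
    (my $SeqWithoutGaps = $Sequence) =~ s/[^A-Za-z]//g;
    return $SeqWithoutGaps;
}

print remove_sequence_gaps('AC--DE..F~G'), "\n";    # prints ACDEFG
```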
</dd>
<dt><strong><a name="removesequencealignmentgapcolumns" class="item"><strong>RemoveSequenceAlignmentGapColumns</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    $NewAlignmentDataMapRef = RemoveSequenceAlignmentGapColumns(
                              $AlignmentDataMapRef);</div>
<p>Using the input alignment data map reference, which contains the following keys, generates a new hash with
the same set of keys after removing residue columns that contain only gaps, and returns a reference to it:</p>
<div class="OptionsBox">
    {IDs} : Array of IDs in the order they appear in the file
<br/>    {Count} : ID count
<br/>    {Description}{$ID} : Description data
<br/>    {Sequence}{$ID} : Sequence data</div>
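<p>A sketch of column-wise gap removal with a hypothetical <em>remove_gap_columns</em>. It assumes all sequences are aligned to the same length, keeps a column if at least one sequence has a letter there, copies <em>{IDs}</em>, <em>{Count}</em>, and <em>{Description}</em>, and leaves the input hash untouched.</p>

```perl
use strict;
use warnings;

# Hypothetical sketch: drop every column in which all sequences have a gap
# (non-letter) and return a new hash with the same keys.
sub remove_gap_columns {
    my ($AlignmentDataMapRef) = @_;
    my @IDs = @{ $AlignmentDataMapRef->{IDs} };
    my %Seq = map { ($_ => $AlignmentDataMapRef->{Sequence}{$_}) } @IDs;
    my $Len = @IDs ? length($Seq{ $IDs[0] }) : 0;

    my @KeepColumns;
    for my $Col (0 .. $Len - 1) {
        # Keep the column if at least one sequence has a residue (letter) there.
        push @KeepColumns, $Col
            if grep { substr($Seq{$_}, $Col, 1) =~ /[A-Za-z]/ } @IDs;
    }

    my %New = (
        IDs         => [@IDs],
        Count       => $AlignmentDataMapRef->{Count},
        Description => { %{ $AlignmentDataMapRef->{Description} || {} } },
        Sequence    => {},
    );
    for my $ID (@IDs) {
        $New{Sequence}{$ID} = join '', map { substr($Seq{$ID}, $_, 1) } @KeepColumns;
    }
    return \%New;
}

my $Aln = {
    IDs      => ['s1', 's2'],
    Count    => 2,
    Sequence => { s1 => 'AC-D-E', s2 => 'A--DKE' },
};
my $New = remove_gap_columns($Aln);
print "$New->{Sequence}{s1} $New->{Sequence}{s2}\n";    # prints "ACD-E A-DKE"
```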
</dd>
<dt><strong><a name="writepearsonfastasequencefile" class="item"><strong>WritePearsonFastaSequenceFile</strong></a></strong></dt>
<dd>
<div class="OptionsBox">
    WritePearsonFastaSequenceFile($SequenceFileName, $SequenceDataRef,
                                  [$MaxLength]);</div>
<p>Using the sequence data specified via <em>SequenceDataRef</em>, writes out a Pearson FASTA sequence
file. Optional argument <em>MaxLength</em> controls the maximum sequence length written on each line; default is
80.</p>
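<p>A sketch of the line-wrapping logic, split into a hypothetical formatter (<em>format_pearson_fasta</em>) that returns text and a thin hypothetical writer (<em>write_pearson_fasta</em>) that saves it. Appending the description after the ID on the header line is an assumption about the output layout.</p>

```perl
use strict;
use warnings;

# Hypothetical formatter: renders the {IDs}/{Description}/{Sequence} layout as
# Pearson FASTA text, wrapping sequences at $MaxLength residues (default 80).
sub format_pearson_fasta {
    my ($SequenceDataRef, $MaxLength) = @_;
    $MaxLength = 80 unless defined $MaxLength && $MaxLength > 0;
    my $Text = '';
    for my $ID (@{ $SequenceDataRef->{IDs} }) {
        my $Desc = $SequenceDataRef->{Description}{$ID};
        # Header: ">ID" plus " description" when one is present (assumed layout).
        $Text .= '>' . $ID . ((defined $Desc && length $Desc) ? " $Desc" : '') . "\n";
        my $Seq = $SequenceDataRef->{Sequence}{$ID};
        for (my $i = 0; $i < length $Seq; $i += $MaxLength) {
            $Text .= substr($Seq, $i, $MaxLength) . "\n";
        }
    }
    return $Text;
}

# Hypothetical writer: saves the formatted text to $SequenceFileName.
sub write_pearson_fasta {
    my ($SequenceFileName, $SequenceDataRef, $MaxLength) = @_;
    open my $FH, '>', $SequenceFileName or die "Cannot open $SequenceFileName: $!";
    print {$FH} format_pearson_fasta($SequenceDataRef, $MaxLength);
    close $FH or die "Cannot close $SequenceFileName: $!";
    return 1;
}

my $Data = {
    IDs         => ['seq1'],
    Description => { seq1 => 'demo' },
    Sequence    => { seq1 => 'ACDEFGHIKLMN' },
};
print format_pearson_fasta($Data, 5);
# >seq1 demo
# ACDEF
# GHIKL
# MN
```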
</dd>
</dl>
<p>
</p>
<h2>AUTHOR</h2>
<p><a href="mailto:msud@san.rr.com">Manish Sud</a></p>
<p>
</p>
<h2>SEE ALSO</h2>
<p><a href="./PDBFileUtil.html">PDBFileUtil.pm</a>
</p>
<p>
</p>
<h2>COPYRIGHT</h2>
<p>Copyright (C) 2015 Manish Sud. All rights reserved.</p>
<p>This file is part of MayaChemTools.</p>
<p>MayaChemTools is free software; you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License as published by the Free
Software Foundation; either version 3 of the License, or (at your option)
any later version.</p>
<p> </p><p> </p><div class="DocNav">
<table width="100%" border=0 cellpadding=0 cellspacing=2>
<tr align="left" valign="top"><td width="33%" align="left"><a href="./SDFileUtil.html" title="SDFileUtil.html">Previous</a>  <a href="./index.html" title="Table of Contents">TOC</a>  <a href="./StatisticsUtil.html" title="StatisticsUtil.html">Next</a></td><td width="34%" align="middle"><strong>March 29, 2015</strong></td><td width="33%" align="right"><strong>SequenceFileUtil.pm</strong></td></tr>
</table>
</div>
<br />
<center>
<img src="../../images/h2o2.png">
</center>
</body>
</html>