view mayachemtools/docs/modules/man3/SequenceFileUtil.3 @ 3:e420415a1799 draft
Uploaded
author | deepakjadmin |
---|---|
date | Wed, 20 Jan 2016 12:16:47 -0500 |
parents | 73ae111cf86f |
children |
.\" Automatically generated by Pod::Man 2.25 (Pod::Simple 3.22)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "SEQUENCEFILEUTIL 1"
.TH SEQUENCEFILEUTIL 1 "2015-03-29" "perl v5.14.2" "MayaChemTools"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
SequenceFileUtil
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
use SequenceFileUtil ;
.PP
use SequenceFileUtil qw(:all);
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
\&\fBSequenceFileUtil\fR module provides the following functions:
.PP
AreSequenceLengthsIdentical, CalcuatePercentSequenceIdentity,
CalculatePercentSequenceIdentityMatrix, GetLongestSequence, GetSequenceLength,
GetShortestSequence, IsClustalWSequenceFile, IsGapResidue, IsMSFSequenceFile,
IsPIRFastaSequenceFile, IsPearsonFastaSequenceFile, IsSupportedSequenceFile,
ReadClustalWSequenceFile, ReadMSFSequenceFile, ReadPIRFastaSequenceFile,
ReadPearsonFastaSequenceFile, ReadSequenceFile, RemoveSequenceAlignmentGapColumns,
RemoveSequenceGaps, WritePearsonFastaSequenceFile
.PP
The SequenceFileUtil module provides various functions to process sequence files and
retrieve appropriate information.
.SH "FUNCTIONS"
.IX Header "FUNCTIONS"
.IP "\fBAreSequenceLengthsIdentical\fR" 4
.IX Item "AreSequenceLengthsIdentical"
.Vb 1
\&    $Status = AreSequenceLengthsIdentical($SequencesDataRef);
.Ve
.Sp
Checks the lengths of all the sequences available in \fISequencesDataRef\fR and returns 1
or 0 based on whether all the sequences have the same length.
.IP "\fBCalcuatePercentSequenceIdentity\fR" 4
.IX Item "CalcuatePercentSequenceIdentity"
.Vb 3
\&    $PercentIdentity =
\&       CalcuatePercentSequenceIdentity(
\&          $Sequence1, $Sequence2, [$IgnoreGaps, $Precision]);
.Ve
.Sp
Returns percent identity between \fISequence1\fR and \fISequence2\fR. Optional arguments
\&\fIIgnoreGaps\fR and \fIPrecision\fR control handling of gaps in sequences and precision of
the returned value. By default, gaps are ignored and precision is set to 1 decimal place.
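.Sp
The percent identity calculation described above can be sketched in plain Perl. This is a minimal illustration under stated assumptions, not the code in SequenceFileUtil: the helper name SketchPercentIdentity is hypothetical, both sequences are assumed to be aligned and of equal length, only uppercase A to Z count as residues (following the gap rule given for IsGapResidue), and a position is skipped when gaps are ignored and either residue is a gap. The module may count gap positions differently.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not part of SequenceFileUtil.
sub SketchPercentIdentity {
  my ($Sequence1, $Sequence2, $IgnoreGaps, $Precision) = @_;

  # Defaults mirror the documented ones: ignore gaps, 1 decimal place.
  $IgnoreGaps = 1 unless defined $IgnoreGaps;
  $Precision  = 1 unless defined $Precision;

  die "Sequences must have the same length\n"
    if length($Sequence1) != length($Sequence2);

  my ($Compared, $Identical) = (0, 0);
  for my $Index (0 .. length($Sequence1) - 1) {
    my $Residue1 = substr($Sequence1, $Index, 1);
    my $Residue2 = substr($Sequence2, $Index, 1);

    # Assumed gap rule: any character other than A to Z is a gap.
    my $HasGap = ($Residue1 !~ /^[A-Z]$/ || $Residue2 !~ /^[A-Z]$/);
    next if $IgnoreGaps && $HasGap;

    $Compared++;
    $Identical++ if $Residue1 eq $Residue2;
  }

  # Nothing to compare: report zero rather than divide by zero.
  return sprintf("%.${Precision}f", 0) unless $Compared;
  return sprintf("%.${Precision}f", 100 * $Identical / $Compared);
}

print SketchPercentIdentity("ACD-EFG", "ACDKEYG"), "\n";     # prints 83.3
print SketchPercentIdentity("ACD-EFG", "ACDKEYG", 0), "\n";  # prints 71.4
```

With gaps ignored, the gapped column is dropped and 5 of the remaining 6 positions match; with gaps counted, 5 of 7 positions match.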
.IP "\fBCalculatePercentSequenceIdentityMatrix\fR" 4
.IX Item "CalculatePercentSequenceIdentityMatrix"
.Vb 3
\&    $IdentityMatrixDataRef = CalculatePercentSequenceIdentityMatrix(
\&                             $SequencesDataRef, [$IgnoreGaps,
\&                             $Precision]);
.Ve
.Sp
Calculates pairwise percent identity between all the sequences available in
\&\fISequencesDataRef\fR and returns a reference to an identity matrix hash. Optional
arguments \fIIgnoreGaps\fR and \fIPrecision\fR control handling of gaps in sequences and
precision of the returned values. By default, gaps are ignored and precision is set to
1 decimal place.
.IP "\fBGetSequenceLength\fR" 4
.IX Item "GetSequenceLength"
.Vb 1
\&    $SequenceLength = GetSequenceLength($Sequence, [$IgnoreGaps]);
.Ve
.Sp
Returns the length of the specified sequence. Optional argument \fIIgnoreGaps\fR controls
handling of gaps. By default, gaps are ignored.
.IP "\fBGetShortestSequence\fR" 4
.IX Item "GetShortestSequence"
.Vb 2
\&    ($ID, $Sequence, $SeqLen, $Description) = GetShortestSequence(
\&       $SequencesDataRef, [$IgnoreGaps]);
.Ve
.Sp
Checks the lengths of all the sequences available in \f(CW$SequencesDataRef\fR and returns
\&\f(CW$ID\fR, \f(CW$Sequence\fR, \f(CW$SeqLen\fR, and \f(CW$Description\fR values for the
shortest sequence. Optional argument \f(CW$IgnoreGaps\fR controls handling of gaps in
sequences. By default, gaps are ignored.
.IP "\fBGetLongestSequence\fR" 4
.IX Item "GetLongestSequence"
.Vb 2
\&    ($ID, $Sequence, $SeqLen, $Description) = GetLongestSequence(
\&       $SequencesDataRef, [$IgnoreGaps]);
.Ve
.Sp
Checks the lengths of all the sequences available in \fISequencesDataRef\fR and returns
\&\fB\s-1ID\s0\fR, \fBSequence\fR, \fBSeqLen\fR, and \fBDescription\fR values for the longest
sequence. Optional argument \fIIgnoreGaps\fR controls handling of gaps in sequences. By
default, gaps are ignored.
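.Sp
The length-based lookups above can be illustrated without the module by scanning a hand-built data map. The sketch below assumes the hash layout documented for ReadSequenceFile (IDs, Description, Sequence); the helper names SketchSequenceLength and SketchShortestAndLongestIDs, the sample IDs, and the sample sequences are all made up for illustration, and ties are resolved in favor of the first ID, which may not match the module's behavior.

```perl
use strict;
use warnings;

# Hand-built stand-in for a sequences data map; all values are illustrative.
my $SequencesDataRef = {
  IDs         => ['Seq1', 'Seq2', 'Seq3'],
  Count       => 3,
  Description => {Seq1 => 'first', Seq2 => 'second', Seq3 => 'third'},
  Sequence    => {Seq1 => 'ACDEF--GH', Seq2 => 'AC--F--GH', Seq3 => 'ACDEFKLGH'},
};

# Hypothetical helper: sequence length with gaps ignored (default) or counted.
sub SketchSequenceLength {
  my ($Sequence, $IgnoreGaps) = @_;
  $IgnoreGaps = 1 unless defined $IgnoreGaps;
  return length($Sequence) unless $IgnoreGaps;

  # tr/// with an empty replacement list only counts the matching characters.
  my $ResidueCount = ($Sequence =~ tr/A-Z//);
  return $ResidueCount;
}

# Hypothetical helper: one pass over the IDs, tracking minimum and maximum.
sub SketchShortestAndLongestIDs {
  my ($DataRef, $IgnoreGaps) = @_;
  my ($ShortestID, $MinLength, $LongestID, $MaxLength);

  for my $ID (@{$DataRef->{IDs}}) {
    my $Length = SketchSequenceLength($DataRef->{Sequence}{$ID}, $IgnoreGaps);
    if (!defined $MinLength || $Length < $MinLength) {
      ($ShortestID, $MinLength) = ($ID, $Length);
    }
    if (!defined $MaxLength || $Length > $MaxLength) {
      ($LongestID, $MaxLength) = ($ID, $Length);
    }
  }
  return ($ShortestID, $MinLength, $LongestID, $MaxLength);
}

print join(" ", SketchShortestAndLongestIDs($SequencesDataRef)), "\n";
# prints: Seq2 5 Seq3 9
```

When gaps are counted instead (second argument 0), all three sample sequences have length 9, so identical lengths are exactly the case AreSequenceLengthsIdentical is documented to detect.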
.IP "\fBIsGapResidue\fR" 4
.IX Item "IsGapResidue"
.Vb 1
\&    $Status = IsGapResidue($Residue);
.Ve
.Sp
Returns 1 or 0 based on whether \fIResidue\fR corresponds to a gap. Any character other than
A to Z is considered a gap residue.
.IP "\fBIsSupportedSequenceFile\fR" 4
.IX Item "IsSupportedSequenceFile"
.Vb 1
\&    $Status = IsSupportedSequenceFile($SequenceFile);
.Ve
.Sp
Returns 1 or 0 based on whether \fISequenceFile\fR corresponds to a supported sequence
format.
.IP "\fBIsClustalWSequenceFile\fR" 4
.IX Item "IsClustalWSequenceFile"
.Vb 1
\&    $Status = IsClustalWSequenceFile($SequenceFile);
.Ve
.Sp
Returns 1 or 0 based on whether \fISequenceFile\fR corresponds to Clustal sequence alignment
format.
.IP "\fBIsPearsonFastaSequenceFile\fR" 4
.IX Item "IsPearsonFastaSequenceFile"
.Vb 1
\&    $Status = IsPearsonFastaSequenceFile($SequenceFile);
.Ve
.Sp
Returns 1 or 0 based on whether \fISequenceFile\fR corresponds to Pearson \s-1FASTA\s0 sequence
format.
.IP "\fBIsPIRFastaSequenceFile\fR" 4
.IX Item "IsPIRFastaSequenceFile"
.Vb 1
\&    $Status = IsPIRFastaSequenceFile($SequenceFile);
.Ve
.Sp
Returns 1 or 0 based on whether \fISequenceFile\fR corresponds to \s-1PIR\s0 \s-1FASTA\s0
sequence format.
.IP "\fBIsMSFSequenceFile\fR" 4
.IX Item "IsMSFSequenceFile"
.Vb 1
\&    $Status = IsMSFSequenceFile($SequenceFile);
.Ve
.Sp
Returns 1 or 0 based on whether \fISequenceFile\fR corresponds to \s-1MSF\s0 sequence alignment
format.
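.Sp
The format checks above are documented only by their return values. As a rough illustration of what such a check can look for, the sketch below guesses a format from file content held in a string, using widely known signatures: a ClustalW file starts with a CLUSTAL header line, an MSF file has a header line containing MSF: and ending in two dots, a PIR entry starts with a greater-than sign, a two-character type code, and a semicolon, and a Pearson FASTA entry starts with a greater-than sign. The helper name SketchGuessSequenceFormat and the returned labels are hypothetical, and the tests SequenceFileUtil actually applies may differ.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not part of SequenceFileUtil.
# Returns a format label, or an empty string when nothing matches.
sub SketchGuessSequenceFormat {
  my ($Content) = @_;

  for my $Line (split /\n/, $Content) {
    next if $Line =~ /^\s*$/;    # skip blank lines

    # Order matters: the PIR pattern is a special case of the FASTA pattern.
    return 'ClustalW'     if $Line =~ /^CLUSTAL/;
    return 'MSF'          if $Line =~ /MSF:.*\.\.\s*$/;
    return 'PIRFasta'     if $Line =~ /^>[A-Z][A-Z0-9];/;
    return 'PearsonFasta' if $Line =~ /^>/;
  }
  return '';
}

print SketchGuessSequenceFormat(">Seq1 demo\nACDE\n"), "\n";
print SketchGuessSequenceFormat(">P1;Seq1\ndemo\nACDE*\n"), "\n";
```

A Pearson header whose ID happens to look like a PIR type code followed by a semicolon would be misclassified by this sketch, which is one reason a real check usually inspects more than one line.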
.IP "\fBReadSequenceFile\fR" 4
.IX Item "ReadSequenceFile"
.Vb 1
\&    $SequenceDataMapRef = ReadSequenceFile($SequenceFile);
.Ve
.Sp
Reads \fISequenceFile\fR and returns a reference to a hash containing the following
key/value pairs:
.Sp
.Vb 5
\&    $SequenceDataMapRef\->{IDs} \- Array of sequence IDs
\&    $SequenceDataMapRef\->{Count} \- Number of sequences
\&    $SequenceDataMapRef\->{Description}{$ID} \- Sequence description
\&    $SequenceDataMapRef\->{Sequence}{$ID} \- Sequence for a specific ID
\&    $SequenceDataMapRef\->{Sequence}{InputFileType} \- File format
.Ve
.IP "\fBReadClustalWSequenceFile\fR" 4
.IX Item "ReadClustalWSequenceFile"
.Vb 1
\&    $SequenceDataMapRef = ReadClustalWSequenceFile($SequenceFile);
.Ve
.Sp
Reads ClustalW \fISequenceFile\fR and returns a reference to a hash containing the
key/value pairs described for the \fBReadSequenceFile\fR function.
.IP "\fBReadMSFSequenceFile\fR" 4
.IX Item "ReadMSFSequenceFile"
.Vb 1
\&    $SequenceDataMapRef = ReadMSFSequenceFile($SequenceFile);
.Ve
.Sp
Reads \s-1MSF\s0 \fISequenceFile\fR and returns a reference to a hash containing the
key/value pairs described for the \fBReadSequenceFile\fR function.
.IP "\fBReadPIRFastaSequenceFile\fR" 4
.IX Item "ReadPIRFastaSequenceFile"
.Vb 1
\&    $SequenceDataMapRef = ReadPIRFastaSequenceFile($SequenceFile);
.Ve
.Sp
Reads \s-1PIR\s0 \s-1FASTA\s0 \fISequenceFile\fR and returns a reference to a hash containing
the key/value pairs described for the \fBReadSequenceFile\fR function.
.IP "\fBReadPearsonFastaSequenceFile\fR" 4
.IX Item "ReadPearsonFastaSequenceFile"
.Vb 1
\&    $SequenceDataMapRef = ReadPearsonFastaSequenceFile($SequenceFile);
.Ve
.Sp
Reads Pearson \s-1FASTA\s0 \fISequenceFile\fR and returns a reference to a hash containing
the key/value pairs described for the \fBReadSequenceFile\fR function.
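.Sp
To make the returned data layout concrete, the sketch below parses Pearson FASTA text from a string into a hash with the IDs, Count, Description, and Sequence keys listed for ReadSequenceFile, then walks it. It is an illustration only, not the module's reader: the helper name SketchParsePearsonFasta is hypothetical, the rule that the first word after the greater-than sign is the ID and the rest of the line is the description is an assumption, and the InputFileType key is left out.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not part of SequenceFileUtil.
sub SketchParsePearsonFasta {
  my ($Content) = @_;
  my %DataMap = (IDs => [], Count => 0, Description => {}, Sequence => {});
  my $CurrentID;

  for my $Line (split /\n/, $Content) {
    $Line =~ s/\s+$//;           # trim trailing whitespace
    next unless length $Line;    # skip blank lines

    if (my ($ID, $Description) = $Line =~ /^>\s*(\S+)\s*(.*)$/) {
      # Assumption: first word is the ID, the remainder is the description.
      $CurrentID = $ID;
      push @{$DataMap{IDs}}, $ID;
      $DataMap{Count}++;
      $DataMap{Description}{$ID} = $Description;
      $DataMap{Sequence}{$ID}    = '';
    }
    elsif (defined $CurrentID && $Line !~ /^>/) {
      # Sequence data may span several lines; join them in order.
      $DataMap{Sequence}{$CurrentID} .= $Line;
    }
  }
  return \%DataMap;
}

my $SequenceDataMapRef = SketchParsePearsonFasta(
  ">Seq1 first example\nACDE\nFGH\n>Seq2\nAC--\nFGH\n");

# Walk the map in file order using the IDs array.
for my $ID (@{$SequenceDataMapRef->{IDs}}) {
  print join("\t", $ID, $SequenceDataMapRef->{Sequence}{$ID},
             $SequenceDataMapRef->{Description}{$ID}), "\n";
}
```

Keeping the IDs in a separate array preserves file order, which a plain hash of sequences would not.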
.IP "\fBRemoveSequenceGaps\fR" 4
.IX Item "RemoveSequenceGaps"
.Vb 1
\&    $SeqWithoutGaps = RemoveSequenceGaps($Sequence);
.Ve
.Sp
Removes gaps from \fISequence\fR and returns a sequence without any gaps.
.IP "\fBRemoveSequenceAlignmentGapColumns\fR" 4
.IX Item "RemoveSequenceAlignmentGapColumns"
.Vb 2
\&    $NewAlignmentDataMapRef = RemoveSequenceAlignmentGapColumns(
\&                              $AlignmentDataMapRef);
.Ve
.Sp
Takes an input alignment data map reference containing the following keys and generates
a new hash with the same set of keys after residue columns containing only gaps have been
removed:
.Sp
.Vb 4
\&    {IDs} : Array of IDs in order as they appear in file
\&    {Count} : ID count
\&    {Description}{$ID} : Description data
\&    {Sequence}{$ID} : Sequence data
.Ve
.IP "\fBWritePearsonFastaSequenceFile\fR" 4
.IX Item "WritePearsonFastaSequenceFile"
.Vb 2
\&    WritePearsonFastaSequenceFile($SequenceFileName, $SequenceDataRef,
\&                                  [$MaxLength]);
.Ve
.Sp
Using sequence data specified via \fISequenceDataRef\fR, writes out a Pearson \s-1FASTA\s0
sequence file. Optional argument \fIMaxLength\fR controls the maximum length of sequence
written on each line; default is 80.
.SH "AUTHOR"
.IX Header "AUTHOR"
Manish Sud <msud@san.rr.com>
.SH "SEE ALSO"
.IX Header "SEE ALSO"
PDBFileUtil.pm
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright (C) 2015 Manish Sud. All rights reserved.
.PP
This file is part of MayaChemTools.
.PP
MayaChemTools is free software; you can redistribute it and/or modify it under
the terms of the \s-1GNU\s0 Lesser General Public License as published by the Free
Software Foundation; either version 3 of the License, or (at your option)
any later version.