NAME
    SequenceFileUtil

SYNOPSIS
    use SequenceFileUtil;

    use SequenceFileUtil qw(:all);

DESCRIPTION
    SequenceFileUtil module provides the following functions:

    AreSequenceLengthsIdentical, CalcuatePercentSequenceIdentity,
    CalculatePercentSequenceIdentityMatrix, GetLongestSequence,
    GetSequenceLength, GetShortestSequence, IsClustalWSequenceFile,
    IsGapResidue, IsMSFSequenceFile, IsPIRFastaSequenceFile,
    IsPearsonFastaSequenceFile, IsSupportedSequenceFile,
    ReadClustalWSequenceFile, ReadMSFSequenceFile, ReadPIRFastaSequenceFile,
    ReadPearsonFastaSequenceFile, ReadSequenceFile,
    RemoveSequenceAlignmentGapColumns, RemoveSequenceGaps,
    WritePearsonFastaSequenceFile

    The SequenceFileUtil module provides various methods to process
    sequence files and retrieve appropriate information.

FUNCTIONS
    AreSequenceLengthsIdentical
            $Status = AreSequenceLengthsIdentical($SequencesDataRef);

        Checks the lengths of all the sequences available in
        *SequencesDataRef* and returns 1 or 0 based on whether all the
        sequences have the same length.

    CalcuatePercentSequenceIdentity
            $PercentIdentity = CalcuatePercentSequenceIdentity(
                $Sequence1, $Sequence2, [$IgnoreGaps, $Precision]);

        Returns percent identity between *Sequence1* and *Sequence2*.
        Optional arguments *IgnoreGaps* and *Precision* control handling of
        gaps in sequences and precision of the returned value. By default,
        gaps are ignored and precision is set to 1 decimal place.
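
        The calculation can be pictured with a minimal, self-contained
        sketch. It is not the module's implementation: the name
        SketchPercentIdentity is hypothetical, and the rules it uses (skip
        a column when either residue is a gap, compare residues without
        regard to case) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in, NOT the module's code: percent identity between
# two aligned sequences of equal length.
sub SketchPercentIdentity {
    my ($Sequence1, $Sequence2, $IgnoreGaps, $Precision) = @_;
    $IgnoreGaps = 1 unless defined $IgnoreGaps;    # documented default
    $Precision  = 1 unless defined $Precision;     # documented default

    die "Sequences must have the same length\n"
        unless length($Sequence1) == length($Sequence2);

    my ($Compared, $Identical) = (0, 0);
    for my $Index (0 .. length($Sequence1) - 1) {
        my $Residue1 = substr($Sequence1, $Index, 1);
        my $Residue2 = substr($Sequence2, $Index, 1);

        # Assumption: a column is skipped when either residue is a gap,
        # where a gap is any character that is not a letter.
        next if $IgnoreGaps
            && ($Residue1 !~ /[A-Za-z]/ || $Residue2 !~ /[A-Za-z]/);

        $Compared++;
        $Identical++ if uc($Residue1) eq uc($Residue2);
    }

    # Avoid dividing by zero when no column could be compared.
    my $Percent = $Compared ? 100 * $Identical / $Compared : 0;
    return sprintf("%.${Precision}f", $Percent);
}

# Four of the five non-gap columns match.
print SketchPercentIdentity("ACD-FG", "ACE-FG"), "\n";    # prints 80.0
```

        Passing 0 for *IgnoreGaps* in this sketch counts the gap column as
        a compared position, which changes the denominator from 5 to 6.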

    CalculatePercentSequenceIdentityMatrix
            $IdentityMatrixDataRef = CalculatePercentSequenceIdentityMatrix(
                $SequencesDataRef, [$IgnoreGaps, $Precision]);

        Calculates pairwise percent identity between all the sequences
        available in *SequencesDataRef* and returns a reference to an
        identity matrix hash. Optional arguments *IgnoreGaps* and
        *Precision* control handling of gaps in sequences and precision of
        the returned values. By default, gaps are ignored and precision is
        set to 1 decimal place.

    GetSequenceLength
            $SequenceLength = GetSequenceLength($Sequence, [$IgnoreGaps]);

        Returns the length of the specified sequence. Optional argument
        *IgnoreGaps* controls handling of gaps. By default, gaps are
        ignored.

    GetShortestSequence
            ($ID, $Sequence, $SeqLen, $Description) = GetShortestSequence(
                $SequencesDataRef, [$IgnoreGaps]);

        Checks the lengths of all the sequences available in
        *SequencesDataRef* and returns ID, Sequence, SeqLen, and Description
        values for the shortest sequence. Optional argument *IgnoreGaps*
        controls handling of gaps in sequences. By default, gaps are
        ignored.

    GetLongestSequence
            ($ID, $Sequence, $SeqLen, $Description) = GetLongestSequence(
                $SequencesDataRef, [$IgnoreGaps]);

        Checks the lengths of all the sequences available in
        *SequencesDataRef* and returns ID, Sequence, SeqLen, and Description
        values for the longest sequence. Optional argument *IgnoreGaps*
        controls handling of gaps in sequences. By default, gaps are
        ignored.

    IsGapResidue
            $Status = IsGapResidue($Residue);

        Returns 1 or 0 based on whether *Residue* corresponds to a gap. Any
        character other than A to Z is considered a gap residue.
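
        The rule above can be written as a one-line test. The sketch below
        is a hypothetical equivalent named SketchIsGapResidue, not the
        module's code, and treating lowercase letters as residues is an
        assumption the documentation does not settle.

```perl
use strict;
use warnings;

# Hypothetical equivalent of the documented rule: a single character that
# is not a letter A to Z counts as a gap. Accepting lowercase letters as
# residues is an assumption.
sub SketchIsGapResidue {
    my ($Residue) = @_;
    return ($Residue =~ /\A[A-Z]\z/i) ? 0 : 1;
}

for my $Residue ('A', 'g', '-', '.', '*') {
    printf "%s => %d\n", $Residue, SketchIsGapResidue($Residue);
}
```

        Under this rule the usual alignment gap characters '-' and '.'
        are gaps, and so is any other punctuation such as '*'.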

    IsSupportedSequenceFile
            $Status = IsSupportedSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to a
        supported sequence format.

    IsClustalWSequenceFile
            $Status = IsClustalWSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to
        Clustal sequence alignment format.

    IsPearsonFastaSequenceFile
            $Status = IsPearsonFastaSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to
        Pearson FASTA sequence format.

    IsPIRFastaSequenceFile
            $Status = IsPIRFastaSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to PIR
        FASTA sequence format.

    IsMSFSequenceFile
            $Status = IsMSFSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to MSF
        sequence alignment format.

    ReadSequenceFile
            $SequenceDataMapRef = ReadSequenceFile($SequenceFile);

        Reads *SequenceFile* and returns a reference to a hash containing
        the following key/value pairs:

            $SequenceDataMapRef->{IDs} - Array of sequence IDs
            $SequenceDataMapRef->{Count} - Number of sequences
            $SequenceDataMapRef->{Description}{$ID} - Sequence description
            $SequenceDataMapRef->{Sequence}{$ID} - Sequence for a specific ID
            $SequenceDataMapRef->{Sequence}{InputFileType} - File format
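
        To show how this structure is typically walked, the sketch below
        builds a small hash by hand with the same documented keys instead
        of reading a file. The IDs, descriptions, and sequences are made
        up, SummarizeSequences is a hypothetical helper, and storing
        {IDs} as an array reference is an assumption.

```perl
use strict;
use warnings;

# Hand-built stand-in for the hash reference described above; all values
# are invented for illustration.
my $SequenceDataMapRef = {
    IDs         => ['Seq1', 'Seq2'],
    Count       => 2,
    Description => { Seq1 => 'First example', Seq2 => 'Second example' },
    Sequence    => { Seq1 => 'ACDEFG', Seq2 => 'AC-EFG' },
};

# Hypothetical helper: one summary line per sequence, in {IDs} order.
sub SummarizeSequences {
    my ($DataMapRef) = @_;
    my @Lines;
    for my $ID (@{ $DataMapRef->{IDs} }) {
        my $Sequence    = $DataMapRef->{Sequence}{$ID};
        my $Description = $DataMapRef->{Description}{$ID};
        push @Lines, sprintf("%s: %s (%d characters) - %s",
            $ID, $Sequence, length($Sequence), $Description);
    }
    return @Lines;
}

print "$_\n" for SummarizeSequences($SequenceDataMapRef);
print "Total sequences: $SequenceDataMapRef->{Count}\n";
```

        Iterating over {IDs} rather than over the keys of {Sequence}
        keeps the sequences in a stable order.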

    ReadClustalWSequenceFile
            $SequenceDataMapRef = ReadClustalWSequenceFile($SequenceFile);

        Reads ClustalW *SequenceFile* and returns a reference to a hash
        containing the key/value pairs described for the ReadSequenceFile
        function.

    ReadMSFSequenceFile
            $SequenceDataMapRef = ReadMSFSequenceFile($SequenceFile);

        Reads MSF *SequenceFile* and returns a reference to a hash
        containing the key/value pairs described for the ReadSequenceFile
        function.

    ReadPIRFastaSequenceFile
            $SequenceDataMapRef = ReadPIRFastaSequenceFile($SequenceFile);

        Reads PIR FASTA *SequenceFile* and returns a reference to a hash
        containing the key/value pairs described for the ReadSequenceFile
        function.

    ReadPearsonFastaSequenceFile
            $SequenceDataMapRef = ReadPearsonFastaSequenceFile($SequenceFile);

        Reads Pearson FASTA *SequenceFile* and returns a reference to a hash
        containing the key/value pairs described for the ReadSequenceFile
        function.

    RemoveSequenceGaps
            $SeqWithoutGaps = RemoveSequenceGaps($Sequence);

        Removes gaps from *Sequence* and returns a sequence without any
        gaps.

    RemoveSequenceAlignmentGapColumns
            $NewAlignmentDataMapRef = RemoveSequenceAlignmentGapColumns(
                $AlignmentDataMapRef);

        Takes an input alignment data map reference containing the
        following keys and generates a new hash with the same set of keys
        after residue columns containing only gaps have been removed:

            {IDs}              : Array of IDs in order as they appear in file
            {Count}            : ID count
            {Description}{$ID} : Description data
            {Sequence}{$ID}    : Sequence data
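
        The column removal can be sketched as follows. This is an
        illustration under assumptions, not the module's code: the name
        SketchRemoveGapColumns is hypothetical, a gap is taken to be any
        non-letter character, {IDs} is assumed to be an array reference,
        and all sequences are assumed to have the same length.

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT the module's code: drop alignment columns in
# which every sequence has a gap (any character that is not a letter).
sub SketchRemoveGapColumns {
    my ($AlignmentDataMapRef) = @_;
    my @IDs = @{ $AlignmentDataMapRef->{IDs} };

    # Assumption: the sequences are aligned and therefore equally long.
    my $Length = @IDs ? length($AlignmentDataMapRef->{Sequence}{ $IDs[0] }) : 0;

    # Keep a column when at least one sequence has a letter in it.
    my @KeepColumns;
    for my $Column (0 .. $Length - 1) {
        my $LetterCount = grep {
            substr($AlignmentDataMapRef->{Sequence}{$_}, $Column, 1) =~ /[A-Za-z]/
        } @IDs;
        push @KeepColumns, $Column if $LetterCount;
    }

    # Build a new hash with the same keys; the input is left untouched.
    my %NewMap = (
        IDs         => [@IDs],
        Count       => scalar(@IDs),
        Description => { %{ $AlignmentDataMapRef->{Description} } },
        Sequence    => {},
    );
    for my $ID (@IDs) {
        my $Sequence = $AlignmentDataMapRef->{Sequence}{$ID};
        $NewMap{Sequence}{$ID} = join '', map { substr($Sequence, $_, 1) } @KeepColumns;
    }
    return \%NewMap;
}

my $Alignment = {
    IDs         => ['Seq1', 'Seq2'],
    Count       => 2,
    Description => { Seq1 => 'First', Seq2 => 'Second' },
    Sequence    => { Seq1 => 'AC--EF', Seq2 => 'A-D-EF' },
};
my $NewAlignment = SketchRemoveGapColumns($Alignment);
print "$_ $NewAlignment->{Sequence}{$_}\n" for @{ $NewAlignment->{IDs} };
# Only the fourth column is all gaps, so the output is:
#   Seq1 AC-EF
#   Seq2 A-DEF
```

        Columns where only some sequences have a gap are kept, so the
        remaining residues stay aligned with each other.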

    WritePearsonFastaSequenceFile
            WritePearsonFastaSequenceFile($SequenceFileName, $SequenceDataRef,
                [$MaxLength]);

        Using sequence data specified via *SequenceDataRef*, writes out a
        Pearson FASTA sequence file named *SequenceFileName*. Optional
        argument *MaxLength* controls the maximum number of sequence
        characters on each line; the default is 80.
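
        The effect of *MaxLength* can be seen in a small sketch that
        formats records as text rather than writing a file. The name
        SketchFormatPearsonFasta is hypothetical, and the header layout
        (">ID Description") is an assumption; the module's actual output
        may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT the module's code: format Pearson FASTA records
# with sequence lines wrapped at $MaxLength characters. The text is
# returned instead of written so the result is easy to inspect.
sub SketchFormatPearsonFasta {
    my ($SequenceDataRef, $MaxLength) = @_;
    $MaxLength = 80 unless defined $MaxLength;    # documented default

    my $Text = '';
    for my $ID (@{ $SequenceDataRef->{IDs} }) {
        # Assumption: the header line is ">ID Description".
        my $Description = $SequenceDataRef->{Description}{$ID};
        my $HasDescription = defined($Description) && length($Description);
        $Text .= $HasDescription ? ">$ID $Description\n" : ">$ID\n";

        # Emit the sequence in chunks of at most $MaxLength characters.
        my $Sequence = $SequenceDataRef->{Sequence}{$ID};
        for (my $Offset = 0; $Offset < length($Sequence); $Offset += $MaxLength) {
            $Text .= substr($Sequence, $Offset, $MaxLength) . "\n";
        }
    }
    return $Text;
}

my $SequenceDataRef = {
    IDs         => ['Seq1'],
    Count       => 1,
    Description => { Seq1 => 'Example' },
    Sequence    => { Seq1 => 'ACDEFGHIKL' },
};
print SketchFormatPearsonFasta($SequenceDataRef, 4);
# Prints:
#   >Seq1 Example
#   ACDE
#   FGHI
#   KL
```

        A short line length is used here only to make the wrapping
        visible; with the default of 80 this sequence fits on one line.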

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    PDBFileUtil.pm

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.