annotate docs/modules/txt/SequenceFileUtil.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
NAME
    SequenceFileUtil

SYNOPSIS
    use SequenceFileUtil;

    use SequenceFileUtil qw(:all);

DESCRIPTION
    SequenceFileUtil module provides the following functions:

    AreSequenceLengthsIdentical, CalcuatePercentSequenceIdentity,
    CalculatePercentSequenceIdentityMatrix, GetLongestSequence,
    GetSequenceLength, GetShortestSequence, IsClustalWSequenceFile,
    IsGapResidue, IsMSFSequenceFile, IsPIRFastaSequenceFile,
    IsPearsonFastaSequenceFile, IsSupportedSequenceFile,
    ReadClustalWSequenceFile, ReadMSFSequenceFile, ReadPIRFastaSequenceFile,
    ReadPearsonFastaSequenceFile, ReadSequenceFile,
    RemoveSequenceAlignmentGapColumns, RemoveSequenceGaps,
    WritePearsonFastaSequenceFile

    These functions process sequence files and retrieve information from
    them.

FUNCTIONS
    AreSequenceLengthsIdentical
            $Status = AreSequenceLengthsIdentical($SequencesDataRef);

        Checks the lengths of all the sequences available in
        *SequencesDataRef* and returns 1 if all the sequences have the
        same length, 0 otherwise.

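        A minimal sketch of such a length check, written against the hash
        layout documented under ReadSequenceFile. It is not the module's
        own code: the helper name lengths_identical and the sample data are
        made up for illustration, and gap characters are assumed to count
        toward the length.

```perl
use strict;
use warnings;

# Hypothetical stand-in for AreSequenceLengthsIdentical: returns 1 when
# every sequence in the data map has the same length, 0 otherwise.
sub lengths_identical {
    my ($data_ref) = @_;
    my %lengths;
    $lengths{ length($data_ref->{Sequence}{$_}) } = 1 for @{ $data_ref->{IDs} };
    my $distinct = scalar(keys %lengths);
    return $distinct <= 1 ? 1 : 0;
}

# Hand-built data maps using the keys documented for ReadSequenceFile.
my %aligned = (
    IDs      => [ 'Seq1', 'Seq2' ],
    Count    => 2,
    Sequence => { Seq1 => 'ACD-FG', Seq2 => 'ACDEFG' },
);
my %ragged = (
    IDs      => [ 'Seq1', 'Seq2' ],
    Count    => 2,
    Sequence => { Seq1 => 'ACDFG', Seq2 => 'ACDEFG' },
);

print lengths_identical(\%aligned), "\n";    # prints 1
print lengths_identical(\%ragged), "\n";     # prints 0
```
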
    CalcuatePercentSequenceIdentity
            $PercentIdentity = CalcuatePercentSequenceIdentity(
                               $Sequence1, $Sequence2,
                               [$IgnoreGaps, $Precision]);

        Returns percent identity between *Sequence1* and *Sequence2*.
        Optional arguments *IgnoreGaps* and *Precision* control handling of
        gaps in sequences and precision of the returned value. By default,
        gaps are ignored and precision is set to 1 decimal place.

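        The effect of *IgnoreGaps* and *Precision* can be illustrated with
        a small position-by-position comparison. This is a sketch under
        stated assumptions, not the module's algorithm: percent_identity is
        a hypothetical helper, a column is skipped when either residue is a
        gap, and letters are compared without regard to case.

```perl
use strict;
use warnings;

# Hypothetical position-by-position identity calculation; the module's
# own algorithm may differ in detail.
sub percent_identity {
    my ($seq1, $seq2, $ignore_gaps, $precision) = @_;
    $ignore_gaps = 1 unless defined $ignore_gaps;    # documented default
    $precision   = 1 unless defined $precision;      # documented default

    my ($matches, $compared) = (0, 0);
    my $last = (length($seq1) < length($seq2) ? length($seq1) : length($seq2)) - 1;
    for my $i (0 .. $last) {
        my ($r1, $r2) = (substr($seq1, $i, 1), substr($seq2, $i, 1));
        # Assumption: a column is skipped when either residue is a gap.
        next if $ignore_gaps && ($r1 !~ /[A-Za-z]/ || $r2 !~ /[A-Za-z]/);
        $compared++;
        $matches++ if uc($r1) eq uc($r2);
    }
    my $identity = $compared ? 100 * $matches / $compared : 0;
    return sprintf("%.${precision}f", $identity);
}

print percent_identity('ACD-FG', 'ACDEFA'), "\n";          # prints 80.0
print percent_identity('ACD-FG', 'ACDEFA', 0, 2), "\n";    # prints 66.67
```

        With gaps ignored, 4 of the 5 compared columns match; with gaps
        compared as ordinary characters, 4 of 6 match.
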
    CalculatePercentSequenceIdentityMatrix
            $IdentityMatrixDataRef = CalculatePercentSequenceIdentityMatrix(
                                     $SequencesDataRef, [$IgnoreGaps,
                                     $Precision]);

        Calculates pairwise percent identity between all the sequences
        available in *SequencesDataRef* and returns a reference to an
        identity matrix hash. Optional arguments *IgnoreGaps* and
        *Precision* control handling of gaps in sequences and precision of
        the returned values. By default, gaps are ignored and precision is
        set to 1 decimal place.

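        One way to picture a pairwise identity matrix is a nested hash
        keyed by row ID and then column ID. The layout below is an
        assumption made for illustration and may not match the hash the
        module returns; the identity helper is hypothetical and expects
        uppercase sequences of equal length.

```perl
use strict;
use warnings;

# Simplified identity helper (hypothetical): percentage of aligned,
# non-gap positions at which two equal-length sequences agree.
sub identity {
    my ($s1, $s2) = @_;
    my ($same, $total) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my ($r1, $r2) = (substr($s1, $i, 1), substr($s2, $i, 1));
        next unless $r1 =~ /[A-Z]/ && $r2 =~ /[A-Z]/;
        $total++;
        $same++ if $r1 eq $r2;
    }
    return sprintf('%.1f', $total ? 100 * $same / $total : 0);
}

# Invented sample alignment.
my %data = (
    IDs      => [ 'Seq1', 'Seq2', 'Seq3' ],
    Sequence => { Seq1 => 'ACDEFG', Seq2 => 'ACDEFA', Seq3 => 'AC-EFA' },
);

# Fill a nested hash keyed by row ID, then column ID (assumed layout).
my %matrix;
for my $row (@{ $data{IDs} }) {
    for my $col (@{ $data{IDs} }) {
        $matrix{$row}{$col} = identity($data{Sequence}{$row}, $data{Sequence}{$col});
    }
}

# Print one tab-separated row per sequence.
for my $row (@{ $data{IDs} }) {
    print join("\t", $row, map { $matrix{$row}{$_} } @{ $data{IDs} }), "\n";
}
```
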
    GetSequenceLength
            $SequenceLength = GetSequenceLength($Sequence, [$IgnoreGaps]);

        Returns length of the specified sequence. Optional argument
        *IgnoreGaps* controls handling of gaps. By default, gaps are
        ignored.

    GetShortestSequence
            ($ID, $Sequence, $SeqLen, $Description) = GetShortestSequence(
                $SequencesDataRef, [$IgnoreGaps]);

        Checks the lengths of all the sequences available in
        *SequencesDataRef* and returns ID, Sequence, SeqLen, and
        Description values for the shortest sequence. Optional argument
        *IgnoreGaps* controls handling of gaps in sequences. By default,
        gaps are ignored.

    GetLongestSequence
            ($ID, $Sequence, $SeqLen, $Description) = GetLongestSequence(
                $SequencesDataRef, [$IgnoreGaps]);

        Checks the lengths of all the sequences available in
        *SequencesDataRef* and returns ID, Sequence, SeqLen, and
        Description values for the longest sequence. Optional argument
        *IgnoreGaps* controls handling of gaps in sequences. By default,
        gaps are ignored.

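        The length-based lookups can be sketched as a single scan over the
        IDs array. The helpers seq_length and extreme_sequence are
        hypothetical and only mimic the documented return values; how the
        module breaks ties between sequences of equal length is not stated
        here, so the sketch simply keeps the earliest ID.

```perl
use strict;
use warnings;

# Hypothetical helper: length of a sequence, counting only letters when
# $ignore_gaps is true (the documented default).
sub seq_length {
    my ($seq, $ignore_gaps) = @_;
    $ignore_gaps = 1 unless defined $ignore_gaps;
    $seq =~ s/[^A-Za-z]//g if $ignore_gaps;
    return length $seq;
}

# Hypothetical helper: return (ID, sequence, length, description) for the
# shortest or longest entry; ties go to the earliest ID in {IDs}.
sub extreme_sequence {
    my ($data_ref, $want_longest, $ignore_gaps) = @_;
    my ($best_id, $best_len);
    for my $id (@{ $data_ref->{IDs} }) {
        my $len = seq_length($data_ref->{Sequence}{$id}, $ignore_gaps);
        if (!defined $best_len
            || ($want_longest ? $len > $best_len : $len < $best_len)) {
            ($best_id, $best_len) = ($id, $len);
        }
    }
    return ($best_id, $data_ref->{Sequence}{$best_id}, $best_len,
            $data_ref->{Description}{$best_id});
}

# Invented sample data.
my %data = (
    IDs         => [ 'Seq1', 'Seq2' ],
    Count       => 2,
    Description => { Seq1 => 'first', Seq2 => 'second' },
    Sequence    => { Seq1 => 'AC--FG', Seq2 => 'ACDEF' },
);

my ($id, $seq, $len, $desc) = extreme_sequence(\%data, 0);
print "shortest: $id ($len residues, $desc)\n";
($id, $seq, $len, $desc) = extreme_sequence(\%data, 1, 0);
print "longest with gaps counted: $id ($len characters, $desc)\n";
```

        Seq1 is the shortest entry when its two gap characters are
        ignored, yet the longest when they are counted.
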
    IsGapResidue
            $Status = IsGapResidue($Residue);

        Returns 1 or 0 based on whether *Residue* corresponds to a gap. Any
        character other than A to Z is considered a gap residue.

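        The gap rule amounts to a single character-class test. The helper
        is_gap_residue below is a hypothetical equivalent, and treating
        lowercase letters as residues is an assumption.

```perl
use strict;
use warnings;

# Hypothetical equivalent of the documented rule: anything that is not a
# letter A to Z counts as a gap. Accepting lowercase is an assumption.
sub is_gap_residue {
    my ($residue) = @_;
    return $residue =~ /^[A-Za-z]$/ ? 0 : 1;
}

for my $residue ('A', '-', '.', '*') {
    printf "%s => %d\n", $residue, is_gap_residue($residue);
}
```
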
    IsSupportedSequenceFile
            $Status = IsSupportedSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to a
        supported sequence format.

    IsClustalWSequenceFile
            $Status = IsClustalWSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to
        Clustal sequence alignment format.

    IsPearsonFastaSequenceFile
            $Status = IsPearsonFastaSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to
        Pearson FASTA sequence format.

    IsPIRFastaSequenceFile
            $Status = IsPIRFastaSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to PIR
        FASTA sequence format.

    IsMSFSequenceFile
            $Status = IsMSFSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to MSF
        sequence alignment format.

    ReadSequenceFile
            $SequenceDataMapRef = ReadSequenceFile($SequenceFile);

        Reads *SequenceFile* and returns a reference to a hash containing
        the following key/value pairs:

            $SequenceDataMapRef->{IDs} - Array of sequence IDs
            $SequenceDataMapRef->{Count} - Number of sequences
            $SequenceDataMapRef->{Description}{$ID} - Sequence description
            $SequenceDataMapRef->{Sequence}{$ID} - Sequence for a specific ID
            $SequenceDataMapRef->{Sequence}{InputFileType} - File format

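        The returned structure is typically walked through its IDs array.
        The sketch below builds a stand-in hash by hand so that it runs
        without the module or an input file; the IDs, descriptions, and
        sequences are invented sample values.

```perl
use strict;
use warnings;

# Hand-built stand-in for the hash ReadSequenceFile returns; the IDs,
# descriptions, and sequences are invented sample values.
my $SequenceDataMapRef = {
    IDs         => [ 'Seq1', 'Seq2' ],
    Count       => 2,
    Description => { Seq1 => 'Seq1 sample protein', Seq2 => 'Seq2 sample protein' },
    Sequence    => { Seq1 => 'ACDEFG', Seq2 => 'ACD-FG' },
};

# Visit the sequences in the order given by the IDs array and collect
# one tab-separated line (ID, raw length, description) per sequence.
my @report;
for my $ID (@{ $SequenceDataMapRef->{IDs} }) {
    push @report, sprintf("%s\t%d\t%s", $ID,
        length($SequenceDataMapRef->{Sequence}{$ID}),
        $SequenceDataMapRef->{Description}{$ID});
}
print "$SequenceDataMapRef->{Count} sequences\n";
print "$_\n" for @report;
```
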
    ReadClustalWSequenceFile
            $SequenceDataMapRef = ReadClustalWSequenceFile($SequenceFile);

        Reads ClustalW *SequenceFile* and returns a reference to a hash
        containing the key/value pairs described under ReadSequenceFile.

    ReadMSFSequenceFile
            $SequenceDataMapRef = ReadMSFSequenceFile($SequenceFile);

        Reads MSF *SequenceFile* and returns a reference to a hash
        containing the key/value pairs described under ReadSequenceFile.

    ReadPIRFastaSequenceFile
            $SequenceDataMapRef = ReadPIRFastaSequenceFile($SequenceFile);

        Reads PIR FASTA *SequenceFile* and returns a reference to a hash
        containing the key/value pairs described under ReadSequenceFile.

    ReadPearsonFastaSequenceFile
            $SequenceDataMapRef = ReadPearsonFastaSequenceFile($SequenceFile);

        Reads Pearson FASTA *SequenceFile* and returns a reference to a
        hash containing the key/value pairs described under
        ReadSequenceFile.

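        To show how Pearson FASTA text maps onto that hash, here is a
        minimal parser sketch. It is an illustration only, not the module's
        reader: parse_pearson_fasta is a hypothetical name, the first word
        of each header is taken as the ID, and the full header text is
        stored as the description. Input comes from an in-memory
        filehandle so the example needs no file on disk.

```perl
use strict;
use warnings;

# Hypothetical minimal Pearson FASTA parser: header lines start with '>',
# the first word is taken as the ID and the whole header as the
# description (both choices are assumptions); other lines are sequence.
sub parse_pearson_fasta {
    my ($fh) = @_;
    my %data = (IDs => [], Count => 0, Description => {}, Sequence => {});
    my $id;
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;              # drop newline and trailing blanks
        next unless length $line;
        if ($line =~ /^>\s*(\S+)/) {
            $id = $1;
            push @{ $data{IDs} }, $id;
            $data{Count}++;
            ($data{Description}{$id} = $line) =~ s/^>\s*//;
            $data{Sequence}{$id} = '';
        }
        elsif (defined $id) {
            $data{Sequence}{$id} .= $line;   # join wrapped sequence lines
        }
    }
    return \%data;
}

my $text = ">Seq1 first sample\nACDEF\nGHIKL\n>Seq2 second sample\nACD-F\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my $map = parse_pearson_fasta($fh);
close $fh;

print "$_: $map->{Sequence}{$_}\n" for @{ $map->{IDs} };
```
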
    RemoveSequenceGaps
            $SeqWithoutGaps = RemoveSequenceGaps($Sequence);

        Removes gaps from *Sequence* and returns a sequence without any
        gaps.

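        Using the gap definition given under IsGapResidue, gap removal can
        be sketched as one substitution. The helper name remove_gaps is
        hypothetical, and keeping lowercase letters is an assumption.

```perl
use strict;
use warnings;

# Hypothetical equivalent: delete every character that is not a letter,
# following the gap definition given under IsGapResidue.
sub remove_gaps {
    my ($sequence) = @_;
    (my $ungapped = $sequence) =~ s/[^A-Za-z]//g;
    return $ungapped;
}

print remove_gaps('AC--DE.FG'), "\n";    # prints ACDEFG
```
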
    RemoveSequenceAlignmentGapColumns
            $NewAlignmentDataMapRef = RemoveSequenceAlignmentGapColumns(
                                      $AlignmentDataMapRef);

        Takes a reference to an alignment data map containing the keys
        listed below and returns a reference to a new hash with the same
        set of keys, after residue columns containing only gaps have been
        removed:

            {IDs} : Array of IDs in order as they appear in file
            {Count} : ID count
            {Description}{$ID} : Description data
            {Sequence}{$ID} : Sequence data

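        The column filter can be sketched in two passes: first mark every
        column in which at least one sequence has a letter, then rebuild
        each sequence from the marked columns. The helper
        remove_gap_columns is a hypothetical illustration rather than the
        module's code, and the sample alignment is invented.

```perl
use strict;
use warnings;

# Hypothetical sketch: build a new data map in which every alignment
# column made up only of gap characters has been dropped.
sub remove_gap_columns {
    my ($in) = @_;
    my @ids = @{ $in->{IDs} };

    # Pass 1: mark columns where at least one sequence has a letter.
    my %keep;
    for my $id (@ids) {
        my $seq = $in->{Sequence}{$id};
        for my $col (0 .. length($seq) - 1) {
            $keep{$col} = 1 if substr($seq, $col, 1) =~ /[A-Za-z]/;
        }
    }
    my @columns = sort { $a <=> $b } keys %keep;

    # Pass 2: copy the map, keeping only the marked columns.
    my %out = (IDs => [@ids], Count => scalar @ids, Description => {}, Sequence => {});
    for my $id (@ids) {
        my $seq = $in->{Sequence}{$id};
        $out{Description}{$id} = $in->{Description}{$id};
        $out{Sequence}{$id} = join '',
            map { substr($seq, $_, 1) } grep { $_ < length($seq) } @columns;
    }
    return \%out;
}

my %alignment = (
    IDs         => [ 'Seq1', 'Seq2' ],
    Count       => 2,
    Description => { Seq1 => 'first', Seq2 => 'second' },
    Sequence    => { Seq1 => 'AC--EF', Seq2 => 'A---EG' },
);
my $new = remove_gap_columns(\%alignment);
print "$_ $new->{Sequence}{$_}\n" for @{ $new->{IDs} };
```

        Columns 3 and 4 hold gaps in both sequences and are dropped; the
        gap in column 2 of Seq2 stays because Seq1 has a residue there.
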
    WritePearsonFastaSequenceFile
            WritePearsonFastaSequenceFile($SequenceFileName, $SequenceDataRef,
                                          [$MaxLength]);

        Using sequence data specified via *SequenceDataRef*, writes out a
        Pearson FASTA sequence file. Optional argument *MaxLength* controls
        the maximum number of sequence characters written on each line;
        the default is 80.

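        The line wrapping controlled by *MaxLength* can be sketched as
        follows. The helper format_pearson_fasta is hypothetical and
        returns the text instead of writing a file; using the description
        as the header line is an assumption about the output layout.

```perl
use strict;
use warnings;

# Hypothetical sketch of Pearson FASTA output: a '>' header line per
# sequence, then the sequence split into lines of at most $max_length.
sub format_pearson_fasta {
    my ($data_ref, $max_length) = @_;
    $max_length = 80 unless defined $max_length;    # documented default
    my $text = '';
    for my $id (@{ $data_ref->{IDs} }) {
        # Assumption: the description is used as the header text.
        $text .= ">$data_ref->{Description}{$id}\n";
        my $seq = $data_ref->{Sequence}{$id};
        $text .= "$1\n" while $seq =~ /(.{1,${max_length}})/g;
    }
    return $text;
}

# Invented sample data.
my %data = (
    IDs         => [ 'Seq1' ],
    Count       => 1,
    Description => { Seq1 => 'Seq1 sample' },
    Sequence    => { Seq1 => 'ACDEFGHIKLMN' },
);

# Wrap at 5 characters to make the line breaks visible.
print format_pearson_fasta(\%data, 5);
```

        With a limit of 5, the 12-residue sample sequence is written as
        three lines of 5, 5, and 2 characters.
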
AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    PDBFileUtil.pm

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.