annotate mayachemtool/mayachemtools/docs/modules/txt/SequenceFileUtil.txt @ 0:68300206e90d draft default tip

Uploaded
author deepakjadmin
date Thu, 05 Nov 2015 02:41:30 -0500
NAME
    SequenceFileUtil

SYNOPSIS
    use SequenceFileUtil;

    use SequenceFileUtil qw(:all);

DESCRIPTION
    SequenceFileUtil module provides the following functions:

    AreSequenceLengthsIdentical, CalcuatePercentSequenceIdentity,
    CalculatePercentSequenceIdentityMatrix, GetLongestSequence,
    GetSequenceLength, GetShortestSequence, IsClustalWSequenceFile,
    IsGapResidue, IsMSFSequenceFile, IsPIRFastaSequenceFile,
    IsPearsonFastaSequenceFile, IsSupportedSequenceFile,
    ReadClustalWSequenceFile, ReadMSFSequenceFile, ReadPIRFastaSequenceFile,
    ReadPearsonFastaSequenceFile, ReadSequenceFile,
    RemoveSequenceAlignmentGapColumns, RemoveSequenceGaps,
    WritePearsonFastaSequenceFile

    SequenceFileUtil module provides various methods to process sequence
    files and retrieve appropriate information.

FUNCTIONS
    AreSequenceLengthsIdentical
            $Status = AreSequenceLengthsIdentical($SequencesDataRef);

        Checks the lengths of all the sequences available in
        *SequencesDataRef* and returns 1 or 0 based on whether all the
        sequences have the same length.

    CalcuatePercentSequenceIdentity
            $PercentIdentity = CalcuatePercentSequenceIdentity(
                                  $Sequence1, $Sequence2,
                                  [$IgnoreGaps, $Precision]);

        Returns percent identity between *Sequence1* and *Sequence2*.
        Optional arguments *IgnoreGaps* and *Precision* control handling of
        gaps in sequences and precision of the returned value. By default,
        gaps are ignored and precision is set to 1 decimal place.
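
The description above can be illustrated with a small standalone sketch. It is not the module's own code: the helper name SketchPercentIdentity is hypothetical, and it assumes that the two sequences are already aligned to the same length, that a gap is any character other than uppercase A to Z, and that ignoring gaps means skipping every column in which either sequence has a gap.

```perl
use strict;
use warnings;

# Hypothetical standalone helper; a sketch of the documented behaviour,
# not the implementation used by SequenceFileUtil.
sub SketchPercentIdentity {
    my ($Sequence1, $Sequence2, $IgnoreGaps, $Precision) = @_;
    $IgnoreGaps = 1 unless defined $IgnoreGaps;   # documented default
    $Precision  = 1 unless defined $Precision;    # documented default

    die "Sequences must have the same length\n"
        unless length($Sequence1) == length($Sequence2);

    my ($Compared, $Identical) = (0, 0);
    for my $Index (0 .. length($Sequence1) - 1) {
        my $Residue1 = substr($Sequence1, $Index, 1);
        my $Residue2 = substr($Sequence2, $Index, 1);

        # Assumption: anything other than A to Z counts as a gap.
        my $HasGap = ($Residue1 !~ /^[A-Z]\z/ || $Residue2 !~ /^[A-Z]\z/);
        next if $IgnoreGaps && $HasGap;

        $Compared++;
        $Identical++ if $Residue1 eq $Residue2;
    }

    my $Percent = $Compared ? 100 * $Identical / $Compared : 0;
    return sprintf("%.${Precision}f", $Percent);
}

print SketchPercentIdentity("ACDE-G", "ACDF-G"), "\n";        # 80.0
print SketchPercentIdentity("ACDE-G", "ACDF-G", 0, 2), "\n";  # 83.33
```

With gaps ignored, four of the five compared columns match. With gaps kept, the shared gap column is counted as a match in this sketch, which may differ from how the module treats it.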

    CalculatePercentSequenceIdentityMatrix
            $IdentityMatrixDataRef = CalculatePercentSequenceIdentityMatrix(
                                        $SequencesDataRef,
                                        [$IgnoreGaps, $Precision]);

        Calculates pairwise percent identity between all the sequences
        available in *SequencesDataRef* and returns a reference to an
        identity matrix hash. Optional arguments *IgnoreGaps* and
        *Precision* control handling of gaps in sequences and precision of
        the returned values. By default, gaps are ignored and precision is
        set to 1 decimal place.

    GetSequenceLength
            $SequenceLength = GetSequenceLength($Sequence, [$IgnoreGaps]);

        Returns length of the specified sequence. Optional argument
        *IgnoreGaps* controls handling of gaps. By default, gaps are
        ignored.

    GetShortestSequence
            ($ID, $Sequence, $SeqLen, $Description) = GetShortestSequence(
                                   $SequencesDataRef, [$IgnoreGaps]);

        Checks the lengths of all the sequences available in
        *SequencesDataRef* and returns ID, Sequence, SeqLen, and
        Description values for the shortest sequence. Optional argument
        *IgnoreGaps* controls handling of gaps in sequences. By default,
        gaps are ignored.
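
As a rough illustration of the selection described above, the following standalone sketch picks the shortest sequence from a hand-built data hash. The helper name SketchGetShortestSequence and the sample data are hypothetical. The sketch also assumes that gaps are characters other than uppercase A to Z, that the first sequence wins a tie, and that the stored sequence is returned unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper; a sketch of the documented behaviour only.
sub SketchGetShortestSequence {
    my ($SequencesDataRef, $IgnoreGaps) = @_;
    $IgnoreGaps = 1 unless defined $IgnoreGaps;   # documented default

    my ($ShortestID, $ShortestLength);
    for my $ID (@{ $SequencesDataRef->{IDs} }) {
        my $Sequence = $SequencesDataRef->{Sequence}{$ID};

        # tr/A-Z// with an empty replacement counts residues without
        # modifying the string.
        my $Length = $IgnoreGaps ? ($Sequence =~ tr/A-Z//) : length($Sequence);

        if (!defined $ShortestLength || $Length < $ShortestLength) {
            ($ShortestID, $ShortestLength) = ($ID, $Length);
        }
    }
    return () unless defined $ShortestID;

    return ($ShortestID,
            $SequencesDataRef->{Sequence}{$ShortestID},
            $ShortestLength,
            $SequencesDataRef->{Description}{$ShortestID});
}

# Made-up illustration data.
my $Data = {
    IDs         => ['Seq1', 'Seq2'],
    Count       => 2,
    Description => { Seq1 => 'First example', Seq2 => 'Second example' },
    Sequence    => { Seq1 => 'AC--DEFG',      Seq2 => 'ACDEFGH' },
};

my ($ID, $Sequence, $SeqLen, $Description) = SketchGetShortestSequence($Data);
print "$ID $SeqLen\n";    # Seq1 6

($ID, $Sequence, $SeqLen, $Description) = SketchGetShortestSequence($Data, 0);
print "$ID $SeqLen\n";    # Seq2 7
```

The two calls show why the *IgnoreGaps* setting matters: the answer changes once gap characters are counted towards the length.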

    GetLongestSequence
            ($ID, $Sequence, $SeqLen, $Description) = GetLongestSequence(
                                   $SequencesDataRef, [$IgnoreGaps]);

        Checks the lengths of all the sequences available in
        *SequencesDataRef* and returns ID, Sequence, SeqLen, and
        Description values for the longest sequence. Optional argument
        *IgnoreGaps* controls handling of gaps in sequences. By default,
        gaps are ignored.

    IsGapResidue
            $Status = IsGapResidue($Residue);

        Returns 1 or 0 based on whether *Residue* corresponds to a gap. Any
        character other than A to Z is considered a gap residue.
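
The rule stated above fits in a single pattern match. The helper below is a hypothetical stand-in, not the module's code, and it reads "A to Z" literally as uppercase letters only. Whether the module also accepts lowercase letters is not stated here.

```perl
use strict;
use warnings;

# Hypothetical stand-in mirroring the documented rule.
sub SketchIsGapResidue {
    my ($Residue) = @_;
    # Assumption: only a single uppercase letter counts as a residue.
    return ($Residue =~ /^[A-Z]\z/) ? 0 : 1;
}

for my $Residue ('A', '-', '.', '*') {
    printf "%s => %d\n", $Residue, SketchIsGapResidue($Residue);
}
```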

    IsSupportedSequenceFile
            $Status = IsSupportedSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to a
        supported sequence format.

    IsClustalWSequenceFile
            $Status = IsClustalWSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to
        Clustal sequence alignment format.

    IsPearsonFastaSequenceFile
            $Status = IsPearsonFastaSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to
        Pearson FASTA sequence format.

    IsPIRFastaSequenceFile
            $Status = IsPIRFastaSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to PIR
        FASTA sequence format.

    IsMSFSequenceFile
            $Status = IsMSFSequenceFile($SequenceFile);

        Returns 1 or 0 based on whether *SequenceFile* corresponds to MSF
        sequence alignment format.

    ReadSequenceFile
            $SequenceDataMapRef = ReadSequenceFile($SequenceFile);

        Reads *SequenceFile* and returns a reference to a hash containing
        the following key/value pairs:

            $SequenceDataMapRef->{IDs} - Array of sequence IDs
            $SequenceDataMapRef->{Count} - Number of sequences
            $SequenceDataMapRef->{Description}{$ID} - Sequence description
            $SequenceDataMapRef->{Sequence}{$ID} - Sequence for a specific ID
            $SequenceDataMapRef->{Sequence}{InputFileType} - File format
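
To make the layout above concrete, here is a hand-built hash with the same IDs, Count, Description and Sequence keys, followed by a loop over it. Nothing is read from a file. The IDs, descriptions and sequences are made-up illustration data, and storing IDs as an array reference is an assumption based on the "Array of sequence IDs" wording.

```perl
use strict;
use warnings;

# Hand-built stand-in for the hash described above; all values are
# invented for illustration.
my $SequenceDataMapRef = {
    IDs         => ['Seq1', 'Seq2'],
    Count       => 2,
    Description => { Seq1 => 'First example', Seq2 => 'Second example' },
    Sequence    => { Seq1 => 'ACDE-G',        Seq2 => 'ACDF-G' },
};

# Walk the sequences in the order given by the IDs array.
my @Lines;
for my $ID (@{ $SequenceDataMapRef->{IDs} }) {
    push @Lines, sprintf("%s (%s): %s", $ID,
        $SequenceDataMapRef->{Description}{$ID},
        $SequenceDataMapRef->{Sequence}{$ID});
}
print "$_\n" for @Lines;
```

Iterating over the IDs array rather than over the hash keys keeps the sequences in a stable order, since Perl does not preserve hash key order.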

    ReadClustalWSequenceFile
            $SequenceDataMapRef = ReadClustalWSequenceFile($SequenceFile);

        Reads ClustalW *SequenceFile* and returns a reference to a hash
        containing the key/value pairs described for the ReadSequenceFile
        function.

    ReadMSFSequenceFile
            $SequenceDataMapRef = ReadMSFSequenceFile($SequenceFile);

        Reads MSF *SequenceFile* and returns a reference to a hash
        containing the key/value pairs described for the ReadSequenceFile
        function.

    ReadPIRFastaSequenceFile
            $SequenceDataMapRef = ReadPIRFastaSequenceFile($SequenceFile);

        Reads PIR FASTA *SequenceFile* and returns a reference to a hash
        containing the key/value pairs described for the ReadSequenceFile
        function.

    ReadPearsonFastaSequenceFile
            $SequenceDataMapRef = ReadPearsonFastaSequenceFile($SequenceFile);

        Reads Pearson FASTA *SequenceFile* and returns a reference to a
        hash containing the key/value pairs described for the
        ReadSequenceFile function.

    RemoveSequenceGaps
            $SeqWithoutGaps = RemoveSequenceGaps($Sequence);

        Removes gaps from *Sequence* and returns a sequence without any
        gaps.
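
A minimal sketch of gap removal, assuming the same gap rule given for IsGapResidue (any character other than uppercase A to Z). The helper name SketchRemoveGaps is hypothetical and this is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper; strips every character that is not A to Z.
sub SketchRemoveGaps {
    my ($Sequence) = @_;
    # Copy first so the caller's string is left untouched.
    (my $SeqWithoutGaps = $Sequence) =~ s/[^A-Z]//g;
    return $SeqWithoutGaps;
}

print SketchRemoveGaps("AC--DE.G"), "\n";   # ACDEG
```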

    RemoveSequenceAlignmentGapColumns
            $NewAlignmentDataMapRef = RemoveSequenceAlignmentGapColumns(
                                      $AlignmentDataMapRef);

        Using an input alignment data map reference containing the
        following keys, generates a new hash with the same set of keys
        after residue columns containing only gaps have been removed:

            {IDs}              : Array of IDs in the order they appear in the file
            {Count}            : ID count
            {Description}{$ID} : Description data
            {Sequence}{$ID}    : Sequence data
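
The column removal described above can be sketched as follows. This is a standalone illustration, not the module's code: the helper name SketchRemoveGapColumns and the sample alignment are hypothetical. It assumes that all sequences have the same aligned length and that a gap is any character other than uppercase A to Z.

```perl
use strict;
use warnings;

# Hypothetical helper; drops alignment columns in which every sequence
# has a gap, and returns a new hash with the same keys.
sub SketchRemoveGapColumns {
    my ($AlignmentDataMapRef) = @_;
    my @IDs = @{ $AlignmentDataMapRef->{IDs} };

    my %New = (
        IDs         => [@IDs],
        Count       => scalar @IDs,
        Description => { %{ $AlignmentDataMapRef->{Description} } },
        Sequence    => {},
    );
    return \%New unless @IDs;

    $New{Sequence}{$_} = '' for @IDs;
    my $Length = length($AlignmentDataMapRef->{Sequence}{ $IDs[0] });

    for my $Column (0 .. $Length - 1) {
        my @Residues =
            map { substr($AlignmentDataMapRef->{Sequence}{$_}, $Column, 1) } @IDs;

        # Keep the column if at least one sequence has a residue there.
        next unless grep { /^[A-Z]\z/ } @Residues;

        for my $Index (0 .. $#IDs) {
            $New{Sequence}{ $IDs[$Index] } .= $Residues[$Index];
        }
    }
    return \%New;
}

# Made-up alignment: columns 3 and 4 contain only gap characters.
my $Aligned = {
    IDs         => ['Seq1', 'Seq2'],
    Count       => 2,
    Description => { Seq1 => 'First', Seq2 => 'Second' },
    Sequence    => { Seq1 => 'AC--DE', Seq2 => 'A--.DF' },
};

my $Trimmed = SketchRemoveGapColumns($Aligned);
printf "%s: %s\n", $_, $Trimmed->{Sequence}{$_} for @{ $Trimmed->{IDs} };
```

A gap that is aligned against a residue in another sequence is kept, so Seq2 still contains one gap character after trimming.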

    WritePearsonFastaSequenceFile
            WritePearsonFastaSequenceFile($SequenceFileName, $SequenceDataRef,
                                          [$MaxLength]);

        Using sequence data specified via *SequenceDataRef*, writes out a
        Pearson FASTA sequence file. Optional argument *MaxLength* controls
        the maximum number of sequence characters on each line; default is
        80.
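
The effect of *MaxLength* can be seen in a short sketch that builds the file contents as a string instead of writing a file. The helper name SketchFormatPearsonFasta and the sample data are hypothetical. The header layout used here, ">" followed by the ID and then the description, is an assumption about the output and not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helper; formats sequence data as Pearson FASTA text with
# at most $MaxLength sequence characters per line.
sub SketchFormatPearsonFasta {
    my ($SequenceDataRef, $MaxLength) = @_;
    $MaxLength = 80 unless defined $MaxLength;    # documented default

    my $Text = '';
    for my $ID (@{ $SequenceDataRef->{IDs} }) {
        my $Description = $SequenceDataRef->{Description}{$ID};
        my $Header = ">$ID";
        # Assumed header layout: ID, a space, then the description.
        $Header .= " $Description"
            if defined($Description) && length($Description);
        $Text .= "$Header\n";

        my $Sequence = $SequenceDataRef->{Sequence}{$ID};
        for (my $Offset = 0; $Offset < length($Sequence); $Offset += $MaxLength) {
            $Text .= substr($Sequence, $Offset, $MaxLength) . "\n";
        }
    }
    return $Text;
}

# Made-up illustration data, wrapped at 4 characters to show the effect.
my $Fasta = SketchFormatPearsonFasta(
    {
        IDs         => ['Seq1'],
        Count       => 1,
        Description => { Seq1 => 'Example' },
        Sequence    => { Seq1 => 'ACDEFGHIK' },
    },
    4
);
print $Fasta;
```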

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    PDBFileUtil.pm

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.