comparison docs/scripts/txt/AnalyzeSequenceFilesData.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
| author | deepakjadmin |
|---|---|
| date | Wed, 20 Jan 2016 09:23:18 -0500 |
| parents | |
| children | |
| -1:000000000000 | 0:4816e4a8ae95 |
|---|---|
| 1 NAME | |
| 2 AnalyzeSequenceFilesData.pl - Analyze sequence and alignment files | |
| 3 | |
| 4 SYNOPSIS | |
| 5 AnalyzeSequenceFilesData.pl SequenceFile(s) AlignmentFile(s)... | |
| 6 | |
| 7 AnalyzeSequenceFilesData.pl [-h, --help] [-i, --IgnoreGaps yes | no] | |
| 8 [-m, --mode PercentIdentityMatrix | ResidueFrequencyAnalysis | All] | |
| 9 [--outdelim comma | tab | semicolon] [-o, --overwrite] [-p, --precision | |
| 10 number] [-q, --quote yes | no] [--ReferenceSequence SequenceID | | |
| 11 UseFirstSequenceID] [--region "StartResNum, EndResNum, [StartResNum, | |
| 12 EndResNum...]" | UseCompleteSequence] [--RegionResiduesMode AminoAcids | | |
| 13 NucleicAcids | None] [-r, --root rootname] [-w, --WorkingDir dirname] | |
| 14 SequenceFile(s) AlignmentFile(s)... | |
| 15 | |
| 16 DESCRIPTION | |
| 17 Analyze *SequenceFile(s) and AlignmentFile(s)* data: calculate pairwise | |
| 18 percent identity matrix or calculate percent occurrence of various | |
| 19 residues in specified sequence regions. All the sequences in an input | |
| 20 file must have the same length; otherwise, that sequence file is | |
| 21 ignored. | |
| 22 | |
| 23 The file names are separated by spaces. All the sequence files in the | |
| 24 current directory can be specified by **.aln*, **.msf*, **.fasta*, | |
| 25 **.fta*, **.pir* or any other supported format; additionally, *DirName* | |
| 26 corresponds to all the sequence files in the current directory with any | |
| 27 of the supported file extensions: *.aln, .msf, .fasta, .fta, and .pir*. | |
| 28 | |
| 29 Supported sequence formats are: *ALN/ClustalW*, *GCG/MSF*, *PILEUP/MSF*, | |
| 30 *Pearson/FASTA*, and *NBRF/PIR*. Instead of using file extensions, file | |
| 31 formats are detected by parsing the contents of *SequenceFile(s) and | |
| 32 AlignmentFile(s)*. | |
| 33 | |
| 34 OPTIONS | |
| 35 -h, --help | |
| 36 Print this help message. | |
| 37 | |
| 38 -i, --IgnoreGaps *yes | no* | |
| 39 Ignore gaps during calculation of sequence lengths and specification | |
| 40 of regions during residue frequency analysis. Possible values: *yes | |
| 41 or no*. Default value: *yes*. | |
| 42 | |
| 43 -m, --mode *PercentIdentityMatrix | ResidueFrequencyAnalysis | All* | |
| 44 Specify how to analyze data in sequence files: calculate percent | |
| 45 identity matrix or calculate frequency of occurrence of residues in | |
| 46 specific regions. For the *ResidueFrequencyAnalysis* value of -m, | |
| 47 --mode option, output files are generated for both the residue count | |
| 48 and percent residue count. Possible values: *PercentIdentityMatrix, | |
| 49 ResidueFrequencyAnalysis, or All*. Default value: | |
| 50 *PercentIdentityMatrix*. | |
| 51 | |
| 52 --outdelim *comma | tab | semicolon* | |
| 53 Output text file delimiter. Possible values: *comma, tab, or | |
| 54 semicolon*. Default value: *comma*. | |
| 55 | |
| 56 -o, --overwrite | |
| 57 Overwrite existing files. | |
| 58 | |
| 59 -p, --precision *number* | |
| 60 Precision of calculated values in the output file. Default: up to | |
| 61 *2* decimal places. Valid values: positive integers. | |
| 62 | |
| 63 -q, --quote *yes | no* | |
| 64 Put quotes around column values in output text file. Possible | |
| 65 values: *yes or no*. Default value: *yes*. | |
| 66 | |
| 67 --ReferenceSequence *SequenceID | UseFirstSequenceID* | |
| 68 Specify reference sequence ID to identify regions for performing | |
| 69 *ResidueFrequencyAnalysis* specified using -m, --mode option. | |
| 70 Default: *UseFirstSequenceID*. | |
| 71 | |
| 72 --region *StartResNum,EndResNum,[StartResNum,EndResNum...] | | |
| 73 UseCompleteSequence* | |
| 74 Specify how to perform frequency of occurrence analysis for | |
| 75 residues: use specific regions indicated by starting and ending | |
| 76 residue numbers in reference sequence or use the whole reference | |
| 77 sequence as one region. Default: *UseCompleteSequence*. | |
| 78 | |
| 79 Based on the value of -i, --IgnoreGaps option, specified residue | |
| 80 numbers *StartResNum,EndResNum* correspond to the positions in the | |
| 81 reference sequence without gaps or with gaps. | |
| 82 | |
| 83 For residue numbers corresponding to the reference sequence | |
| 84 including gaps, percent occurrence of various residues corresponding | |
| 85 to gap position in reference sequence is also calculated. | |
| 86 | |
| 87 --RegionResiduesMode *AminoAcids | NucleicAcids | None* | |
| 88 Specify how to process residues in the regions specified using | |
| 89 --region option during *ResidueFrequencyAnalysis* calculation: | |
| 90 categorize residues as amino acids, nucleic acids, or simply ignore | |
| 91 residue category during the calculation. Possible values: | |
| 92 *AminoAcids, NucleicAcids or None*. Default value: *None*. | |
| 93 | |
| 94 For *AminoAcids* or *NucleicAcids* values of --RegionResiduesMode | |
| 95 option, all the standard amino acids or nucleic acids are listed in | |
| 96 the output file for each region; any gaps and other non-standard | |
| 97 residues are added to the list as encountered. | |
| 98 | |
| 99 For *None* value of --RegionResiduesMode option, no assumption is | |
| 100 made about type of residues. Residues and gaps are added to the list | |
| 101 as encountered. | |
| 102 | |
| 103 -r, --root *rootname* | |
| 104 New sequence file name is generated using the root: | |
| 105 <Root><Mode>.<Ext> and <Root><Mode><RegionNum>.<Ext>. Default new | |
| 106 file name: <SequenceFileName><Mode>.<Ext> for | |
| 107 *PercentIdentityMatrix* value of -m, --mode option and | |
| 108 <SequenceFileName><Mode><RegionNum>.<Ext> for | |
| 109 *ResidueFrequencyAnalysis*. The csv and tsv <Ext> values are used | |
| 110 for comma/semicolon and tab delimited text files, respectively. This | |
| 111 option is ignored for multiple input files. | |
| 112 | |
| 113 -w, --WorkingDir *dirname* | |
| 114 Location of working directory. Default: current directory. | |
| 115 | |
| 116 EXAMPLES | |
| 117 To calculate percent identity matrix for all sequences in Sample1.msf | |
| 118 file and generate Sample1PercentIdentityMatrix.csv, type: | |
| 119 | |
| 120 % AnalyzeSequenceFilesData.pl Sample1.msf | |
| 121 | |
| 122 To perform residue frequency analysis for all sequences in Sample1.aln | |
| 123 file corresponding to non-gap positions in the first sequence and | |
| 124 generate Sample1ResidueFrequencyAnalysisRegion1.csv and | |
| 125 Sample1PercentResidueFrequencyAnalysisRegion1.csv files, type: | |
| 126 | |
| 127 % AnalyzeSequenceFilesData.pl -m ResidueFrequencyAnalysis -o | |
| 128 Sample1.aln | |
| 129 | |
| 130 To perform residue frequency analysis for all sequences in Sample1.aln | |
| 131 file corresponding to all positions in the first sequence and generate | |
| 132 TestResidueFrequencyAnalysisRegion1.csv and | |
| 133 TestPercentResidueFrequencyAnalysisRegion1.csv files, type: | |
| 134 | |
| 135 % AnalyzeSequenceFilesData.pl -m ResidueFrequencyAnalysis --IgnoreGaps | |
| 136 No -o -r Test Sample1.aln | |
| 137 | |
| 138 To perform residue frequency analysis for all sequences in Sample1.msf | |
| 139 file corresponding to non-gap residue positions 5 to 15, and 30 to 40 in | |
| 140 sequence ACHE_BOVIN and generate | |
| 141 Sample1ResidueFrequencyAnalysisRegion1.csv, | |
| 142 Sample1ResidueFrequencyAnalysisRegion2.csv, | |
| 143 Sample1PercentResidueFrequencyAnalysisRegion1.csv, and | |
| 144 Sample1PercentResidueFrequencyAnalysisRegion2.csv files, type: | |
| 145 | |
| 146 % AnalyzeSequenceFilesData.pl -m ResidueFrequencyAnalysis | |
| 147 --ReferenceSequence ACHE_BOVIN --region "5,15,30,40" -o Sample1.msf | |
| 148 | |
| 149 AUTHOR | |
| 150 Manish Sud <msud@san.rr.com> | |
| 151 | |
| 152 SEE ALSO | |
| 153 ExtractFromSequenceFiles.pl, InfoSequenceFiles.pl | |
| 154 | |
| 155 COPYRIGHT | |
| 156 Copyright (C) 2015 Manish Sud. All rights reserved. | |
| 157 | |
| 158 This file is part of MayaChemTools. | |
| 159 | |
| 160 MayaChemTools is free software; you can redistribute it and/or modify it | |
| 161 under the terms of the GNU Lesser General Public License as published by | |
| 162 the Free Software Foundation; either version 3 of the License, or (at | |
| 163 your option) any later version. | |
| 164 | |
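
The *PercentIdentityMatrix* mode described in the documentation above can be illustrated with a small Python sketch. This is not the script's Perl implementation: the function names, the sample sequences, and the identity definition (matching columns divided by alignment length, with a gap aligned to a gap counted as a match) are all assumptions made for illustration. The script's actual treatment of gaps under -i, --IgnoreGaps may differ.

```python
def percent_identity(seq_a, seq_b, precision=2):
    """Percent of aligned columns at which two equal-length sequences agree.

    Assumption: every column counts, and identical gap characters in the
    same column count as a match. The real script may handle gaps differently.
    """
    if len(seq_a) != len(seq_b):
        # The documentation requires equal lengths within one input file.
        raise ValueError("sequences must have the same length")
    if not seq_a:
        return 0.0
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return round(100.0 * matches / len(seq_a), precision)


def percent_identity_matrix(sequences, precision=2):
    """Build a nested dict: matrix[row_id][col_id] -> percent identity."""
    ids = list(sequences)
    return {
        row: {
            col: percent_identity(sequences[row], sequences[col], precision)
            for col in ids
        }
        for row in ids
    }


# Hypothetical three-sequence alignment, already padded to equal length.
alignment = {"Seq1": "ACDE-F", "Seq2": "ACDEGF", "Seq3": "TCDE-F"}
matrix = percent_identity_matrix(alignment)

for row_id, row in matrix.items():
    print(row_id, [row[col_id] for col_id in alignment])
```

The `precision` argument mirrors the -p, --precision option (default 2 decimal places); the matrix is symmetric with 100.0 on the diagonal under this definition.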

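The interaction between --region and -i, --IgnoreGaps in *ResidueFrequencyAnalysis* mode can be sketched in Python as follows. The helper names, the gap characters, the sample alignment, and the choice to tally residues per alignment column are assumptions for illustration only; the script's own output layout and gap conventions are not specified in the documentation above.

```python
from collections import Counter

GAP_CHARS = "-."  # assumed gap symbols; the script may recognize others


def region_columns(reference, start, end, ignore_gaps=True):
    """Map 1-based residue numbers start..end to 0-based alignment columns.

    With ignore_gaps=True, only non-gap residues of the reference sequence
    are numbered, so gap columns in the reference are skipped. With
    ignore_gaps=False, every alignment column is numbered.
    """
    if ignore_gaps:
        columns = [i for i, res in enumerate(reference) if res not in GAP_CHARS]
    else:
        columns = list(range(len(reference)))
    return columns[start - 1:end]


def residue_counts(sequences, columns):
    """Count the residues seen in each selected column across all sequences."""
    return [Counter(seq[col] for seq in sequences.values()) for col in columns]


def percent_counts(counts, precision=2):
    """Convert one column's residue counts to percent occurrence."""
    total = sum(counts.values())
    return {res: round(100.0 * n / total, precision) for res, n in counts.items()}


# Hypothetical alignment; "Ref" plays the role of --ReferenceSequence.
alignment = {"Ref": "AC-DE", "S2": "ACGDE", "S3": "TC-DA"}

# Residues 2..3 of the reference, numbered without and with gaps.
without_gaps = region_columns(alignment["Ref"], 2, 3, ignore_gaps=True)
with_gaps = region_columns(alignment["Ref"], 2, 3, ignore_gaps=False)

for col, counts in zip(with_gaps, residue_counts(alignment, with_gaps)):
    print(col, dict(counts), percent_counts(counts))
```

With gaps ignored, residues 2 and 3 of the reference land on columns 1 and 3; with gaps included they land on columns 1 and 2, so the gap column of the reference is also tallied, matching the documentation's note on gap positions.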