Mercurial > repos > deepakjadmin > mayatool3_test2
diff docs/scripts/txt/AnalyzeSequenceFilesData.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
author | deepakjadmin |
---|---|
date | Wed, 20 Jan 2016 09:23:18 -0500 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/docs/scripts/txt/AnalyzeSequenceFilesData.txt Wed Jan 20 09:23:18 2016 -0500 @@ -0,0 +1,164 @@ +NAME + AnalyzeSequenceFilesData.pl - Analyze sequence and alignment files + +SYNOPSIS + AnalyzeSequenceFilesData.pl SequenceFile(s) AlignmentFile(s)... + + AnalyzeSequenceFilesData.pl [-h, --help] [-i, --IgnoreGaps yes | no] + [-m, --mode PercentIdentityMatrix | ResidueFrequencyAnalysis | All] + [--outdelim comma | tab | semicolon] [-o, --overwrite] [-p, --precision + number] [-q, --quote yes | no] [--ReferenceSequence SequenceID | + UseFirstSequenceID] [--region "StartResNum, EndResNum, [StartResNum, + EndResNum...]" | UseCompleteSequence] [--RegionResiduesMode AminoAcids | + NucleicAcids | None] [-w, --WorkingDir dirname] SequenceFile(s) + AlignmentFile(s)... + +DESCRIPTION + Analyze *SequenceFile(s) and AlignmentFile(s)* data: calculate pairwise + percent identity matrix or calculate percent occurrence of various + residues in specified sequence regions. All the sequences in the input + file must have the same sequence lengths; otherwise, the sequence file + is ignored. + + The file names are separated by spaces. All the sequence files in a + current directory can be specified by **.aln*, **.msf*, **.fasta*, + **.fta*, **.pir* or any other supported formats; additionally, *DirName* + corresponds to all the sequence files in the current directory with any + of the supported file extension: *.aln, .msf, .fasta, .fta, and .pir*. + + Supported sequence formats are: *ALN/CLustalW*, *GCG/MSF*, *PILEUP/MSF*, + *Pearson/FASTA*, and *NBRF/PIR*. Instead of using file extensions, file + formats are detected by parsing the contents of *SequenceFile(s) and + AlignmentFile(s)*. + +OPTIONS + -h, --help + Print this help message. + + -i, --IgnoreGaps *yes | no* + Ignore gaps during calculation of sequence lengths and specification + of regions during residue frequency analysis. Possible values: *yes + or no*. Default value: *yes*. + + -m, --mode *PercentIdentityMatrix | ResidueFrequencyAnalysis | All* + Specify how to analyze data in sequence files: calculate percent + identity matrix or calculate frequency of occurrence of residues in + specific regions. During *ResidueFrequencyAnalysis* value of -m, + --mode option, output files are generated for both the residue count + and percent residue count. Possible values: *PercentIdentityMatrix, + ResidueFrequencyAnalysis, or All*. Default value: + *PercentIdentityMatrix*. + + --outdelim *comma | tab | semicolon* + Output text file delimiter. Possible values: *comma, tab, or + semicolon*. Default value: *comma*. + + -o, --overwrite + Overwrite existing files. + + -p, --precision *number* + Precision of calculated values in the output file. Default: up to + *2* decimal places. Valid values: positive integers. + + -q, --quote *yes | no* + Put quotes around column values in output text file. Possible + values: *yes or no*. Default value: *yes*. + + --ReferenceSequence *SequenceID | UseFirstSequenceID* + Specify reference sequence ID to identify regions for performing + *ResidueFrequencyAnalysis* specified using -m, --mode option. + Default: *UseFirstSequenceID*. + + --region *StartResNum,EndResNum,[StartResNum,EndResNum...] | + UseCompleteSequence* + Specify how to perform frequency of occurrence analysis for + residues: use specific regions indicated by starting and ending + residue numbers in reference sequence or use the whole reference + sequence as one region. Default: *UseCompleteSequence*. + + Based on the value of -i, --IgnoreGaps option, specified residue + numbers *StartResNum,EndResNum* correspond to the positions in the + reference sequence without gaps or with gaps. + + For residue numbers corresponding to the reference sequence + including gaps, percent occurrence of various residues corresponding + to gap position in reference sequence is also calculated. + + --RegionResiduesMode *AminoAcids | NucleicAcids | None* + Specify how to process residues in the regions specified using + --region option during *ResidueFrequencyAnalysis* calculation: + categorize residues as amino acids, nucleic acids, or simply ignore + residue category during the calculation. Possible values: + *AminoAcids, NucleicAcids or None*. Default value: *None*. + + For *AminoAcids* or *NucleicAcids* values of --RegionResiduesMode + option, all the standard amino acids or nucleic acids are listed in + the output file for each region; Any gaps and other non standard + residues are added to the list as encountered. + + For *None* value of --RegionResiduesMode option, no assumption is + made about type of residues. Residue and gaps are added to the list + as encountered. + + -r, --root *rootname* + New sequence file name is generated using the root: + <Root><Mode>.<Ext> and <Root><Mode><RegionNum>.<Ext>. Default new + file name: <SequenceFileName><Mode>.<Ext> for + *PercentIdentityMatrix* value m, --mode option and + <SequenceFileName><Mode><RegionNum>.<Ext> for + *ResidueFrequencyAnalysis*. The csv, and tsv <Ext> values are used + for comma/semicolon, and tab delimited text files respectively. This + option is ignored for multiple input files. + + -w --WorkingDir *text* + Location of working directory. Default: current directory. + +EXAMPLES + To calculate percent identity matrix for all sequences in Sample1.msf + file and generate Sample1PercentIdentityMatrix.csv, type: + + % AnalyzeSequenceFilesData.pl Sample1.msf + + To perform residue frequency analysis for all sequences in Sample1.aln + file corresponding to non-gap positions in the first sequence and + generate Sample1ResidueFrequencyAnalysisRegion1.csv and + Sample1PercentResidueFrequencyAnalysisRegion1.csv files, type: + + % AnalyzeSequenceFilesData.pl -m ResidueFrequencyAnalysis -o + Sample1.aln + + To perform residue frequency analysis for all sequences in Sample1.aln + file corresponding to all positions in the first sequence and generate + TestResidueFrequencyAnalysisRegion1.csv and + TestPercentResidueFrequencyAnalysisRegion1.csv files, type: + + % AnalyzeSequenceFilesData.pl -m ResidueFrequencyAnalysis --IgnoreGaps + No -o -r Test Sample1.aln + + To perform residue frequency analysis for all sequences in Sample1.aln + file corresponding to non-gap residue positions 5 to 10, and 30 to 40 in + sequence ACHE_BOVIN and generate + Sample1ResidueFrequencyAnalysisRegion1.csv, + Sample1ResidueFrequencyAnalysisRegion2.csv, + SamplePercentResidueFrequencyAnalysisRegion1.csv, and + SamplePercentResidueFrequencyAnalysisRegion2.csv files, type: + + % AnalyzeSequenceFilesData.pl -m ResidueFrequencyAnalysis + --ReferenceSequence ACHE_BOVIN --region "5,15,30,40" -o Sample1.msf + +AUTHOR + Manish Sud <msud@san.rr.com> + +SEE ALSO + ExtractFromSequenceFiles.pl, InfoSequenceFiles.pl + +COPYRIGHT + Copyright (C) 2015 Manish Sud. All rights reserved. + + This file is part of MayaChemTools. + + MayaChemTools is free software; you can redistribute it and/or modify it + under the terms of the GNU Lesser General Public License as published by + the Free Software Foundation; either version 3 of the License, or (at + your option) any later version. +