Mercurial > repos > deepakjadmin > mayatool3_test2
diff docs/scripts/txt/ModifySDFilesDataFields.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
author | deepakjadmin |
---|---|
date | Wed, 20 Jan 2016 09:23:18 -0500 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/docs/scripts/txt/ModifySDFilesDataFields.txt Wed Jan 20 09:23:18 2016 -0500 @@ -0,0 +1,254 @@ +NAME + ModifySDFilesDataFields.pl - Modify data fields in SDFile(s) + +SYNOPSIS + ModifySDFilesDataFields.pl SDFile(s)... + + ModifySDFilesDataFields.pl [-d, --detail infolevel] [--datafieldscommon + newfieldlabel, newfieldvalue, [newfieldlabel, newfieldvalue,...]] + [--datafieldsmap newfieldlabel, oldfieldlabel, [oldfieldlabel,...]; + [newfieldlabel, oldfieldlabel, [oldfieldlabel,...]]] + [--datafieldsmapfile filename] [--datafieldURL URLDataFieldLabel, + CGIScriptPath, CGIParamName, CmpdIDFieldLabel] [-h, --help] [-k, + --keepolddatafields all | unmappedonly | none] [-m, --mode molname | + datafields | both] [--molnamemode datafield | labelprefix] [--molname + datafieldname or prefixstring] [--molnamereplace always | empty] [-o, + --overwrite] [-r, --root rootname] [-w, --workingdir dirname] + SDFile(s)... + +DESCRIPTION + Modify molname line and data fields in *SDFile(s)*. Molname line can be + replaced by a data field value or assigned a sequential ID prefixed with + a specific string. For data fields and modification of their values, + these types of options are supported: replace data field labels by + another set of labels; combine values of multiple data fields and assign + a new label; add specific set of data field labels and values to all + compound records; and others. + + The file names are separated by space.The valid file extensions are + *.sdf* and *.sd*. All other file names are ignored. All the SD files in + a current directory can be specified either by **.sdf* or the current + directory name. + +OPTIONS + -d, --detail *infolevel* + Level of information to print about compound records being ignored. + Default: *1*. Possible values: *1, 2 or 3*. + + --datafieldscommon *newfieldlabel, newfieldvalue, [newfieldlabel, + newfieldvalue,...]* + Specify data field labels and values for addition to each compound + record. It's a comma delimited list of data field label and values + pair. Default: *none*. + + Examples: + + DepositionDate,YYYY-MM-DD + Source,www.domainname.org,ReleaseData,YYYY-MM-DD + + --datafieldsmap *newfieldlabel, oldfieldlabel, [oldfieldlabel,...]; + [newfieldlabel, oldfieldlabel, [oldfieldlabel,...]]* + Specify how various data field labels and values are combined to + generate a new data field labels and their values. All the comma + delimited data fields, with in a semicolon delimited set, are mapped + to the first new data field label along with the data field values + joined via new line character. Default: *none*. + + Examples: + + Synonym,Name,SystematicName,Synonym;CmpdID,Extreg + HBondDonors,SumNHOH + + --datafieldsmapfile *filename* + Filename containing mapping of data fields. Format of data fields + line in this file corresponds to --datafieldsmap option. Example: + + Line 1: Synonym,Name,SystematicName,Synonym;CmpdID,Extreg + Line 2: HBondDonors,SumNHOH + + --datafieldURL *URLDataFieldLabel, CGIScriptPath, CGIParamName, + CmpdIDFieldLabel* + Specify how to generate a URL for retrieving compound data from a + web server and add it to each compound record. *URLDataFieldLabel* + is used as the data field label for URL value which is created by + combining *CGIScriptPath,CGIParamName,CmpdIDFieldLabel* values: + CGIScriptPath?CGIParamName=CmpdIDFieldLabelValue. Default: *none*. + + Example: + + Source,http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID + + -h, --help + Print this help message. + + -k, --keepolddatafields *all | unmappedonly | none* + Specify how to transfer old data fields from input SDFile(s) to new + SDFile(s) during *datafields | both* value of -m, --mode option: + keep all old data fields; write out the ones not mapped to new + fields as specified by --datafieldsmap or <--datafieldsmapfile> + options; or ignore all old data field labels. For *molname* -m + --mode, old datafields are always kept. Possible values: *all | + unmappedonly | none*. Default: *none*. + + -m, --mode *molname | datafields | both* + Specify how to modify SDFile(s): *molname* - change molname line by + another datafield or value; *datafield* - modify data field labels + and values by replacing one label by another, combining multiple + data field labels and values, adding specific set of data field + labels and values to all compound, or inserting an URL for compound + retrieval to each record; *both* - change molname line and + datafields simultaneously. Possible values: *molname | datafields | + both*. Default: *molname* + + --molnamemode *datafield | labelprefix* + Specify how to change molname line for -m --mode option values of + *molname | both*: use a datafield label value or assign a sequential + ID prefixed with *labelprefix*. Possible values: *datafield | + labelprefix*. Default: *labelprefix*. + + --molname *datafieldname or prefixstring* + Molname generation method. For *datafield* value of --molnamemode + option, it corresponds to datafield label name whose value is used + for molname; otherwise, it's a prefix string used for generating + compound IDs like labelprefixstring<Number>. Default value, *Cmpd*, + generates compound IDs like Cmpd<Number> for molname. + + --molnamereplace *always | empty* + Specify when to replace molname line for -m --mode option values of + *molname | both*: always replace the molname line using --molname + option or only when it's empty. Possible values: *always | empty*. + Default: *empty*. + + -o, --overwrite + Overwrite existing files. + + -r, --root *rootname* + New SD file name is generated using the root: <Root>.<Ext>. Default + new file name: <InitialSDFileName>ModifiedDataFields.<Ext>. This + option is ignored for multiple input files. + + -w, --workingdir *dirname* + Location of working directory. Default: current directory. + +EXAMPLES + To replace empty molname lines by Cmpd<CmpdNumber> and generate a new SD + file NewSample1.sdf, type: + + % ModifySDFilesDataFields.pl -o -r NewSample1 Sample1.sdf + + To replace all molname lines by Mol_ID data field generate a new SD file + NewSample1.sdf, type: + + % ModifySDFilesDataFields.pl --molnamemode datafield + --molnamereplace always -r NewSample1 -o Sample1.sdf + + To replace all molname lines by Mol_ID data field, map Name and + CompoundName to a new datafield Synonym, and generate a new SD file + NewSample1.sdf, type: + + % ModifySDFilesDataFields.pl --molnamemode datafield + --molnamereplace always --molname Mol_ID --mode both + --datafieldsmap "Synonym,Name,CompoundName" -r + NewSample1 -o Sample1.sdf + + To replace all molname lines by Mol_ID data field, map Name and + CompoundName to a new datafield Synonym, add common fields ReleaseDate + and Source, and generate a new SD file NewSample1.sdf without keeping + any old SD data fields, type: + + % ModifySDFilesDataFields.pl --molnamemode datafield + --molnamereplace always --molname Mol_ID --mode both + --datafieldsmap "Synonym,Name,CompoundName" + --datafieldscommon "ReleaseDate,yyyy-mm-dd,Source, + www.mayachemtools.org" --keepolddatafields none -r + NewSample1 -o Sample1.sdf + + Preparing SD files PubChem deposition: + + Consider a SD file with these fields: Mol_ID, Name, Synonyms and + Systematic_Name. And Mol_ID data field uniquely identifies your + compound. + + To prepare a new SD file CmpdDataForPubChem.sdf containing only required + PUBCHEM_EXT_DATASOURCE_REGID field, type: + + % ModifySDFilesDataFields.pl --m datafields + --datafieldsmap + "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID" + -r CmpdDataForPubChem -o Sample1.sdf + + To prepare a new SD file CmpdDataForPubChem.sdf containing only required + PUBCHEM_EXT_DATASOURCE_REGID field and replace molname line with Mol_ID, + type: + + % ModifySDFilesDataFields.pl --molnamemode datafield + --molnamereplace always --molname Mol_ID --mode both + --datafieldsmap + "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID" + -r CmpdDataForPubChem -o Sample1.sdf + + In addition to required PubChem data field, you can also add optional + PubChem data fields. + + To map your Name, Synonyms and Systematic_Name data fields to optional + PUBCHEM_SUBSTANCE_SYNONYM data field along with required ID field, type: + + % ModifySDFilesDataFields.pl --molnamemode datafield + --molnamereplace always --molname Mol_ID --mode both + --datafieldsmap + "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID; + PUBCHEM_SUBSTANCE_SYNONYM,Name,CompoundName" + -r CmpdDataForPubChem -o Sample1.sdf + + To add your <domain.org> as PUBCHEM_EXT_SUBSTANCE_URL and link substance + retrieval to your CGI script + <http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID> via + PUBCHEM_EXT_DATASOURCE_REGID field along with optional and required data + fields, type: + + % ModifySDFilesDataFields.pl --molnamemode datafield + --molnamereplace always --molname Mol_ID --mode both + --datafieldsmap + "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID; + PUBCHEM_SUBSTANCE_SYNONYM,Name,CompoundName" + --datafieldscommon + "PUBCHEM_EXT_SUBSTANCE_URL,domain.org" + --datafieldURL "PUBCHEM_EXT_DATASOURCE_URL, + http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID" + -r CmpdDataForPubChem -o Sample1.sdf + + And to add a publication date and request a release data using + PUBCHEM_PUBLICATION_DATE and PUBCHEM_DEPOSITOR_RECORD_DATE data fields + along with all the data fields in earlier examples, type: optional + fields, type: + + % ModifySDFilesDataFields.pl --molnamemode datafield + --molnamereplace always --molname Mol_ID --mode both + --datafieldsmap + "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID; + PUBCHEM_SUBSTANCE_SYNONYM,Name,CompoundName" + --datafieldURL "PUBCHEM_EXT_DATASOURCE_URL, + http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID" + --datafieldscommon + "PUBCHEM_EXT_SUBSTANCE_URL,domain.org, + PUBCHEM_PUBLICATION_DATE,YYY-MM-DD, + PUBCHEM_DEPOSITOR_RECORD_DATE,YYYY-MM-DD" + -r CmpdDataForPubChem -o Sample1.sdf + +AUTHOR + Manish Sud <msud@san.rr.com> + +SEE ALSO + InfoSDFiles.pl, JoinSDFiles.pl, MergeTextFilesWithSD.pl, + SplitSDFiles.pl, SDFilesToHTML.pl + +COPYRIGHT + Copyright (C) 2015 Manish Sud. All rights reserved. + + This file is part of MayaChemTools. + + MayaChemTools is free software; you can redistribute it and/or modify it + under the terms of the GNU Lesser General Public License as published by + the Free Software Foundation; either version 3 of the License, or (at + your option) any later version. +