diff docs/scripts/txt/ModifySDFilesDataFields.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/docs/scripts/txt/ModifySDFilesDataFields.txt	Wed Jan 20 09:23:18 2016 -0500
@@ -0,0 +1,254 @@
+NAME
+    ModifySDFilesDataFields.pl - Modify data fields in SDFile(s)
+
+SYNOPSIS
+    ModifySDFilesDataFields.pl SDFile(s)...
+
+    ModifySDFilesDataFields.pl [-d, --detail infolevel] [--datafieldscommon
+    newfieldlabel, newfieldvalue, [newfieldlabel, newfieldvalue,...]]
+    [--datafieldsmap newfieldlabel, oldfieldlabel, [oldfieldlabel,...];
+    [newfieldlabel, oldfieldlabel, [oldfieldlabel,...]]]
+    [--datafieldsmapfile filename] [--datafieldURL URLDataFieldLabel,
+    CGIScriptPath, CGIParamName, CmpdIDFieldLabel] [-h, --help] [-k,
+    --keepolddatafields all | unmappedonly | none] [-m, --mode molname |
+    datafields | both] [--molnamemode datafield | labelprefix] [--molname
+    datafieldname or prefixstring] [--molnamereplace always | empty] [-o,
+    --overwrite] [-r, --root rootname] [-w, --workingdir dirname]
+    SDFile(s)...
+
+DESCRIPTION
+    Modify molname line and data fields in *SDFile(s)*. Molname line can be
+    replaced by a data field value or assigned a sequential ID prefixed with
+    a specific string. For data fields and modification of their values,
+    these types of options are supported: replace data field labels by
+    another set of labels; combine values of multiple data fields and assign
+    a new label; add specific set of data field labels and values to all
+    compound records; and others.
+
+    The file names are separated by space.The valid file extensions are
+    *.sdf* and *.sd*. All other file names are ignored. All the SD files in
+    a current directory can be specified either by **.sdf* or the current
+    directory name.
+
+OPTIONS
+    -d, --detail *infolevel*
+        Level of information to print about compound records being ignored.
+        Default: *1*. Possible values: *1, 2 or 3*.
+
+    --datafieldscommon *newfieldlabel, newfieldvalue, [newfieldlabel,
+    newfieldvalue,...]*
+        Specify data field labels and values for addition to each compound
+        record. It's a comma delimited list of data field label and values
+        pair. Default: *none*.
+
+        Examples:
+
+            DepositionDate,YYYY-MM-DD
+            Source,www.domainname.org,ReleaseData,YYYY-MM-DD
+
+    --datafieldsmap *newfieldlabel, oldfieldlabel, [oldfieldlabel,...];
+    [newfieldlabel, oldfieldlabel, [oldfieldlabel,...]]*
+        Specify how various data field labels and values are combined to
+        generate a new data field labels and their values. All the comma
+        delimited data fields, with in a semicolon delimited set, are mapped
+        to the first new data field label along with the data field values
+        joined via new line character. Default: *none*.
+
+        Examples:
+
+            Synonym,Name,SystematicName,Synonym;CmpdID,Extreg
+            HBondDonors,SumNHOH
+
+    --datafieldsmapfile *filename*
+        Filename containing mapping of data fields. Format of data fields
+        line in this file corresponds to --datafieldsmap option. Example:
+
+            Line 1: Synonym,Name,SystematicName,Synonym;CmpdID,Extreg
+            Line 2: HBondDonors,SumNHOH
+
+    --datafieldURL *URLDataFieldLabel, CGIScriptPath, CGIParamName,
+    CmpdIDFieldLabel*
+        Specify how to generate a URL for retrieving compound data from a
+        web server and add it to each compound record. *URLDataFieldLabel*
+        is used as the data field label for URL value which is created by
+        combining *CGIScriptPath,CGIParamName,CmpdIDFieldLabel* values:
+        CGIScriptPath?CGIParamName=CmpdIDFieldLabelValue. Default: *none*.
+
+        Example:
+
+            Source,http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID
+
+    -h, --help
+        Print this help message.
+
+    -k, --keepolddatafields *all | unmappedonly | none*
+        Specify how to transfer old data fields from input SDFile(s) to new
+        SDFile(s) during *datafields | both* value of -m, --mode option:
+        keep all old data fields; write out the ones not mapped to new
+        fields as specified by --datafieldsmap or <--datafieldsmapfile>
+        options; or ignore all old data field labels. For *molname* -m
+        --mode, old datafields are always kept. Possible values: *all |
+        unmappedonly | none*. Default: *none*.
+
+    -m, --mode *molname | datafields | both*
+        Specify how to modify SDFile(s): *molname* - change molname line by
+        another datafield or value; *datafield* - modify data field labels
+        and values by replacing one label by another, combining multiple
+        data field labels and values, adding specific set of data field
+        labels and values to all compound, or inserting an URL for compound
+        retrieval to each record; *both* - change molname line and
+        datafields simultaneously. Possible values: *molname | datafields |
+        both*. Default: *molname*
+
+    --molnamemode *datafield | labelprefix*
+        Specify how to change molname line for -m --mode option values of
+        *molname | both*: use a datafield label value or assign a sequential
+        ID prefixed with *labelprefix*. Possible values: *datafield |
+        labelprefix*. Default: *labelprefix*.
+
+    --molname *datafieldname or prefixstring*
+        Molname generation method. For *datafield* value of --molnamemode
+        option, it corresponds to datafield label name whose value is used
+        for molname; otherwise, it's a prefix string used for generating
+        compound IDs like labelprefixstring<Number>. Default value, *Cmpd*,
+        generates compound IDs like Cmpd<Number> for molname.
+
+    --molnamereplace *always | empty*
+        Specify when to replace molname line for -m --mode option values of
+        *molname | both*: always replace the molname line using --molname
+        option or only when it's empty. Possible values: *always | empty*.
+        Default: *empty*.
+
+    -o, --overwrite
+        Overwrite existing files.
+
+    -r, --root *rootname*
+        New SD file name is generated using the root: <Root>.<Ext>. Default
+        new file name: <InitialSDFileName>ModifiedDataFields.<Ext>. This
+        option is ignored for multiple input files.
+
+    -w, --workingdir *dirname*
+        Location of working directory. Default: current directory.
+
+EXAMPLES
+    To replace empty molname lines by Cmpd<CmpdNumber> and generate a new SD
+    file NewSample1.sdf, type:
+
+        % ModifySDFilesDataFields.pl -o -r NewSample1 Sample1.sdf
+
+    To replace all molname lines by Mol_ID data field generate a new SD file
+    NewSample1.sdf, type:
+
+        % ModifySDFilesDataFields.pl --molnamemode datafield
+        --molnamereplace always -r NewSample1 -o Sample1.sdf
+
+    To replace all molname lines by Mol_ID data field, map Name and
+    CompoundName to a new datafield Synonym, and generate a new SD file
+    NewSample1.sdf, type:
+
+        % ModifySDFilesDataFields.pl --molnamemode datafield
+          --molnamereplace always --molname Mol_ID --mode both
+          --datafieldsmap "Synonym,Name,CompoundName" -r
+          NewSample1 -o Sample1.sdf
+
+    To replace all molname lines by Mol_ID data field, map Name and
+    CompoundName to a new datafield Synonym, add common fields ReleaseDate
+    and Source, and generate a new SD file NewSample1.sdf without keeping
+    any old SD data fields, type:
+
+        % ModifySDFilesDataFields.pl --molnamemode datafield
+          --molnamereplace always --molname Mol_ID --mode both
+          --datafieldsmap "Synonym,Name,CompoundName"
+          --datafieldscommon "ReleaseDate,yyyy-mm-dd,Source,
+          www.mayachemtools.org" --keepolddatafields none -r
+          NewSample1 -o Sample1.sdf
+
+    Preparing SD files PubChem deposition:
+
+    Consider a SD file with these fields: Mol_ID, Name, Synonyms and
+    Systematic_Name. And Mol_ID data field uniquely identifies your
+    compound.
+
+    To prepare a new SD file CmpdDataForPubChem.sdf containing only required
+    PUBCHEM_EXT_DATASOURCE_REGID field, type:
+
+        % ModifySDFilesDataFields.pl --m datafields
+          --datafieldsmap
+          "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID"
+          -r CmpdDataForPubChem -o Sample1.sdf
+
+    To prepare a new SD file CmpdDataForPubChem.sdf containing only required
+    PUBCHEM_EXT_DATASOURCE_REGID field and replace molname line with Mol_ID,
+    type:
+
+        % ModifySDFilesDataFields.pl --molnamemode datafield
+          --molnamereplace always --molname Mol_ID --mode both
+          --datafieldsmap
+           "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID"
+          -r CmpdDataForPubChem -o Sample1.sdf
+
+    In addition to required PubChem data field, you can also add optional
+    PubChem data fields.
+
+    To map your Name, Synonyms and Systematic_Name data fields to optional
+    PUBCHEM_SUBSTANCE_SYNONYM data field along with required ID field, type:
+
+        % ModifySDFilesDataFields.pl --molnamemode datafield
+          --molnamereplace always --molname Mol_ID --mode both
+          --datafieldsmap
+          "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID;
+          PUBCHEM_SUBSTANCE_SYNONYM,Name,CompoundName"
+          -r CmpdDataForPubChem -o Sample1.sdf
+
+    To add your <domain.org> as PUBCHEM_EXT_SUBSTANCE_URL and link substance
+    retrieval to your CGI script
+    <http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID> via
+    PUBCHEM_EXT_DATASOURCE_REGID field along with optional and required data
+    fields, type:
+
+        % ModifySDFilesDataFields.pl --molnamemode datafield
+          --molnamereplace always --molname Mol_ID --mode both
+          --datafieldsmap
+          "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID;
+          PUBCHEM_SUBSTANCE_SYNONYM,Name,CompoundName"
+          --datafieldscommon
+          "PUBCHEM_EXT_SUBSTANCE_URL,domain.org"
+          --datafieldURL "PUBCHEM_EXT_DATASOURCE_URL,
+          http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID"
+          -r CmpdDataForPubChem -o Sample1.sdf
+
+    And to add a publication date and request a release data using
+    PUBCHEM_PUBLICATION_DATE and PUBCHEM_DEPOSITOR_RECORD_DATE data fields
+    along with all the data fields in earlier examples, type: optional
+    fields, type:
+
+        % ModifySDFilesDataFields.pl --molnamemode datafield
+          --molnamereplace always --molname Mol_ID --mode both
+          --datafieldsmap
+          "PUBCHEM_EXT_DATASOURCE_REGID,Mol_ID;
+          PUBCHEM_SUBSTANCE_SYNONYM,Name,CompoundName"
+          --datafieldURL "PUBCHEM_EXT_DATASOURCE_URL,
+          http://www.yourdomain.org/GetCmpd.pl,Reg_ID,Mol_ID"
+          --datafieldscommon
+          "PUBCHEM_EXT_SUBSTANCE_URL,domain.org,
+          PUBCHEM_PUBLICATION_DATE,YYY-MM-DD,
+          PUBCHEM_DEPOSITOR_RECORD_DATE,YYYY-MM-DD"
+          -r CmpdDataForPubChem -o Sample1.sdf
+
+AUTHOR
+    Manish Sud <msud@san.rr.com>
+
+SEE ALSO
+    InfoSDFiles.pl, JoinSDFiles.pl, MergeTextFilesWithSD.pl,
+    SplitSDFiles.pl, SDFilesToHTML.pl
+
+COPYRIGHT
+    Copyright (C) 2015 Manish Sud. All rights reserved.
+
+    This file is part of MayaChemTools.
+
+    MayaChemTools is free software; you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published by
+    the Free Software Foundation; either version 3 of the License, or (at
+    your option) any later version.
+