# HG changeset patch # User ric # Date 1475143755 14400 # Node ID e54d14bed3f589c298d9df40b45999b272a3c0d7 Uploaded diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/biosample.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/biosample.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,195 @@ + + import BioSample definitions within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + #if $move_to_common_space + --move-to-common-space + #end if + #if $blocking_validation + --blocking-validator + #end if + biosample + #if str($study) != 'use_provided' + --study ${study} + #end if + #if str($source_type) != 'use_provided' + --source-type ${source_type} + #end if + #if str($vessel_type_selector.vessel_type) != 'use_provided' + --vessel-type ${vessel_type_selector.vessel_type} + #end if + #if str($vessel_content) != 'use_provided' + --vessel-content=${vessel_content} + #end if + #if str($vessel_status) != 'use_provided' + --vessel-status=${vessel_status} + #end if + #if str($vessel_type_selector) == 'IlluminaBeadChipArray' + #if str($vessel_type_selector.assay_type) != 'use_provided' + --bead-chip-assay-type=${vessel_type_selector.assay_type} + #end if + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +A biosample record will have, at least, the following fields:: + + label source + I001-bs-2 V932814892 + I002-bs-2 V932814892 + I003-bs-2 None + +Where label is the label of the biosample container. If a 'None' value +has been passed in the source column, the biosample will be imported +as a new unlinked object into the biobanks. 
Another example, this time +involving DNA samples:: + + label source used_volume current_volume activation_date + I001-dna V932814899 0.3 0.2 17/03/2007 + I002-dna V932814900 0.22 0.2 21/01/2004 + +A special case is when records refer to biosamples contained in plate +wells. In this case, an additional column must be present with the VID +of the corresponding TiterPlate object. For instance:: + + plate label source + V39030 A01 V932814892 + V39031 A02 V932814893 + V39032 A03 V932814894 + +where the label column is now the label of the well position. + +If row and column (optional) are provided, the program will use them; +if they are not provided, it will infer them from label (e.g., J01 -> +row=10, column=1). Missing labels will be generated as:: + + '%s%03d' % (chr(row+ord('A')-1), column) + +A badly formed label will result in the rejection of the record; the +same will happen if label, row and column are inconsistent. The well +will be filled by current_volume material produced by removing +used_volume material taken from the bio material contained in the +vessel identified by source. row and column are base 1. 
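The label inference described above can be sketched in Python. This is only an illustration under stated assumptions, not the importer's actual code: the names `parse_well_label` and `is_consistent` are hypothetical, and only single-letter row labels (A to Z) are handled.

```python
import re

# Hypothetical helper, not part of the importer: one uppercase letter for the
# row followed by one or more digits for the column, e.g. 'J01'.
_WELL_LABEL = re.compile(r'^([A-Z])(\d+)$')


def parse_well_label(label):
    """Return 1-based (row, column) for a well label, or raise ValueError."""
    match = _WELL_LABEL.match(label)
    if match is None:
        raise ValueError('badly formed well label: %r' % (label,))
    row = ord(match.group(1)) - ord('A') + 1
    column = int(match.group(2))
    if column < 1:
        raise ValueError('column must be >= 1: %r' % (label,))
    return row, column


def is_consistent(label, row, column):
    """Check that explicit row/column values agree with the label."""
    return parse_well_label(label) == (row, column)
```

A record whose label fails to parse, or for which `is_consistent` returns False, corresponds to the rejection cases mentioned in the text.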
+ +If the sample is an IlluminaBeadChipArray, the plate column used in the +PlateWell case becomes an illumina_array column, and a new column, named +bead_chip_assay_type, is required:: + + illumina_array label source bead_chip_assay_type + V1351235 R01C01 V412441 HUMANEXOME_12V1_B + V1351235 R01C02 V351151 HUMANEXOME_12V1_B + V1351235 R02C01 V345115 HUMANEXOME_12V1_B + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/birth_data.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/birth_data.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,57 @@ + + import birth data within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + birth_data + #if str($study) != 'use_provided' + --study ${study} + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Will read in a tsv file with the following columns:: + + study individual timestamp birth_date birth_place + ASTUDY V1234 1310057541608 12/03/1978 006171 + ASTUDY V14112 1310057541608 25/04/1983 006149 + ASTUDY V1241 1310057541608 12/03/2001 006172 + ..... + +where birth_place is a valid ISTAT code for an Italian city or a +foreign country, and birth_date must be in dd/mm/YYYY format. 
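A quick way to check the dd/mm/YYYY constraint before submitting a file is sketched below. The function name `parse_birth_date` is hypothetical, and this is not necessarily how the importer validates dates; ISTAT codes are not checked here.

```python
from datetime import datetime


def parse_birth_date(value):
    """Parse a dd/mm/YYYY string into a date, raising ValueError if malformed."""
    return datetime.strptime(value, '%d/%m/%Y').date()
```

Note that `strptime` also accepts days and months without a leading zero, so this check is slightly more lenient than a strict two-digit format.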
+ + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/data_collection.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/data_collection.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,120 @@ + + import DataCollection definitions within + OMERO.biobank + + launcher.sh + --interpreter=python + --runner=importer.py + #if $omero_configuration.level == 'advanced' + --host=$omero_configuration.vl_host + --user=$omero_configuration.vl_user + --passwd=$omero_configuration.vl_passwd + #else + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + #end if + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + #if $blocking_validation + --blocking-validator + #end if + data_collection + #if str($study) != 'use_provided' + --study ${study} + #end if + #if str($data_sample_type) != 'use_provided' + --data_sample-type=${data_sample_type} + #end if + #if str($label) + --label=${label} + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Will read in a tsv file with the following columns:: + + study label data_sample + BSTUDY dc-01 V0390290 + BSTUDY dc-01 V0390291 + BSTUDY dc-02 V0390292 + BSTUDY dc-02 V390293 + ... + +This will create new DataCollection(s), whose label is defined by the +label column, and link to it, using DataCollectionItem objects, +the DataSample object(s) identified by data_sample (a VID). + +Records that point to an unknown DataSample will abort the data +collection loading. Previously seen collections will be noisily +ignored. It is not legal to use the importer to add items to a +previously known collection. 
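To preview which DataSample VIDs would end up in each collection, the input file can be grouped by label. The sketch below assumes a tab-separated file with a header row naming the columns shown above; `group_by_label` is a hypothetical helper and does not talk to OMERO.

```python
import csv
import io
from collections import OrderedDict


def group_by_label(tsv_text):
    """Map each collection label to the list of data_sample VIDs it contains."""
    collections = OrderedDict()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
    for record in reader:
        # Records sharing a label belong to the same DataCollection.
        collections.setdefault(record['label'], []).append(record['data_sample'])
    return collections
```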
+ + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/data_object.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/data_object.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,82 @@ + + import DataObject definitions within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + #if $blocking_validation + --blocking-validator + #end if + data_object + #if str($study) != 'use_provided' + --study ${study} + #end if + #if str($mimetype) != 'use_provided' + --mimetype=${mimetype} + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Will read in a tsv file with the following columns:: + + study path data_sample mimetype size sha1 + + TEST01 file:/share/fs/v039303.cel V2902 x-vl/affymetrix-cel 39090 E909090 + .... + +Records that point to an unknown data sample will be noisily +ignored. 
The same will happen to records that have the same path of a +previously seen data_object + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/data_sample.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/data_sample.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,165 @@ + + import DataSample definitions within OMERO.biobank + + launcher.sh + --interpreter=python + --runner=importer.py + #if $omero_configuration.level == 'advanced' + --host=$omero_configuration.vl_host + --user=$omero_configuration.vl_user + --passwd=$omero_configuration.vl_passwd + #else + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + #end if + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + #if $blocking_validation + --blocking-validator + #end if + data_sample + #if str($study) != 'use_provided' + --study ${study} + #end if + #if str($source_type) != 'use_provided' + --source-type=${source_type} + #end if + #if str($device_type) != 'use_provided' + --device-type=${device_type} + #end if + #if str($scanner) != 'use_provided' + --scanner=${scanner} + #end if + #if str($data_sample_type) != 'use_provided' + --data-sample-type=${data_sample_type} + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Will read in a tsv file with the following columns:: + + study label source device device_type scanner options + ASTUDY foo01 v03909 v9309 Chip v99020 celID=0009099090 + ASTUDY foo02 v03909 v99022 Scanner v99022 conf1=...,conf2=... + .... + +In this example, the first line corresponds to a dataset obtained by +using chip v9309 on scanner v99020, while the second datasample has +been obtained using a technology directly using a scanner, e.g., an +Illumina HiSeq 2000. 
The '''scanner''' column is there as a +convenience to support a more detailed description of a chip-based +acquisition. + +The general strategy is to decide what data objects should be +instantiated by looking at the chip column and at its corresponding +maker, model, release. + +The optional column '''scanner''', the VID of the scanner device, is +used in cases, such as Affymetrix genotyping, where it is relevant. + +It is also possible to import DataSample(s) that are the results of +processing other DataSample(s). Here is an example:: + + study label source device device_type options + ASTUDY foo01 v03909 v99021 SoftwareProgram conf1=...,conf2=... + ASTUDY foo02 v03909 v99021 SoftwareProgram conf1=...,conf2=... + .... + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/device.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/device.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,99 @@ + + import Device definitions within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + #if $blocking_validation + --blocking-validator + #end if + device + #if str($study) != 'use_provided' + --study ${study} + #end if + #if str($device_type) != 'use_provided' + --device-type=${device_type} + #end if + #if str($maker) + --maker=${maker} + #end if + #if str($model) + --model=${model} + #end if + #if str($release) + --release=${release} + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Will read in a tsv file with the following columns:: + + study device_type label barcode maker model release location + BSTUDY Scanner pula01 8989898 Affymetrix GeneChip Scanner 3000 7G Pula bld. 
5 + BSTUDY Chip chip001 8329482 Affymetrix Genome-Wide Human SNP Array 6.0 None + +All devices have a type, a label, an optional barcode, a maker, a +model, a release and an optional physical location. In the example +above, the first line defines a scanner, which is +physically located in the building 5 lab in Pula. The second line +defines a chip. + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/diagnosis.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/diagnosis.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,53 @@ + + import diagnosis data within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + diagnosis + #if str($study) != 'use_provided' + --study ${study} + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + +Will read in a tsv file with the following columns:: + + study individual timestamp diagnosis + ASTUDY V899 1310057541608 icd10-cm:E10 + ASTUDY V899 1310057541608 icd10-cm:G35 + ASTUDY V1806 1310057541608 exclusion-problem_diagnosis + ... 
+ + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/enrollment.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/enrollment.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,78 @@ + + Create new enrollments for existing individuals within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + #if $blocking_validation + --blocking-validator + #end if + enrollment + #if str($study_label) != 'use_provided' + --study=$study_label + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Import of new enrollments related to existing individuals. +An enrollment is characterized by the following fields:: + + source study label + V044DE795E7F9F42FEB9855288CF577A77 xxx id1 + V06C59B915C0FD47DABE6AE02C731780AF xxx id2 + V01654DCFC5BB640C0BB7EE088194E629D xxx id3 + +where source must be the VID of an existing Individual object, study the +label of an existing Study object and label the enrollment code for +the patient in the study. + +The enrollment sub-operation will retrieve the source individual from +the DB, create a new enrollment related to it and output the VIDs of +newly created enrollments. It is not possible to create two +enrollments with the same code related to the same study, nor is it +possible to enroll a patient twice in the same study, even with +different codes. 
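The two uniqueness rules above can be checked on an input file before it is submitted. The sketch below is a hypothetical pre-check (`find_enrollment_conflicts` is not an importer function) and only detects conflicts among the records of one file; it cannot see enrollments already stored in the DB.

```python
def find_enrollment_conflicts(records):
    """Return the records that would violate one of the two uniqueness rules.

    Each record is a dict with 'source', 'study' and 'label' keys.
    """
    seen_codes = set()        # (study, label): enrollment codes already used
    seen_individuals = set()  # (study, source): individuals already enrolled
    conflicts = []
    for record in records:
        code_key = (record['study'], record['label'])
        individual_key = (record['study'], record['source'])
        if code_key in seen_codes or individual_key in seen_individuals:
            conflicts.append(record)
            continue
        seen_codes.add(code_key)
        seen_individuals.add(individual_key)
    return conflicts
```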
+ + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/group.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/group.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,53 @@ + + Create a new group within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + group + #if str($group_label) != '' + --group=$group_label + #end if + + + + + + + + + + + + + + + + + + + +Will create a new group of individuals from a file with the following columns:: + + study label individual + foo I0000 V06C59B915C0FD47DABE6AE02C731780AF + foo I0001 V0B718B77691B145BFA8901FCCF6B37998 + ... + +where the column study is optional (it can be provided via the +group_label param). Labels should be unique within the file and the +individual field should contain VIDs of existing (within omero/vl) +Individual objects. 
+ + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/illumina_bead_chip_measures.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/illumina_bead_chip_measures.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,90 @@ + + import IlluminaBeadChipMeasures definitions within OMERO + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + #if $blocking_validation + --blocking-validator + #end if + illumina_bead_chip_measures + #if str($study) != 'use_provided' + --study=${study} + #end if + #if str($source_type) != 'use_provided' + --source_type=${source_type} + #end if + #if str($action_category) != 'use_provided' + --action_category=${action_category} + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Will read a tsv file with the following columns:: + + study label red_channel green_channel source source_type + ASTUDY CHIP_01_R01C01 V1415151235513 V135135661356161 V351351351551 IlluminaBeadChipArray + ASTUDY CHIP_01_R01C02 V2346262462462 V112395151351623 V135113513223 IlluminaBeadChipArray + ASTUDY CHIP_01_R02C01 V1351362899135 V913977551235981 V100941215192 IlluminaBeadChipArray + +This will create new IlluminaBeadChipMeasures whose labels are defined in the +label column. 
+ + + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/importer.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/importer.py Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,7 @@ +# BEGIN_COPYRIGHT +# END_COPYRIGHT + +import sys +from bl.vl.app.importer.main import main + +main(sys.argv[1:]) diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/individual.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/individual.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,71 @@ + + import individual definitions within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=$input + --ofile=$output + --report_file=$report + --logfile=$logfile + #if $blocking_validation + --blocking-validator + #end if + individual + #if str($study) != 'use_provided' + --study $study + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Will import a stream of new individual definitions defined by the +following columns:: + + label gender father mother + id2 male None None + id3 female None None + .... + +It is not possible to import the same individual twice: the related +file rows will be noisily ignored. 
+ + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/laneslot.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/laneslot.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,101 @@ + + import LaneSlot definitions within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + #if $blocking_validation + --blocking-validator + #end if + laneslot + #if str($study) != 'use_provided' + --study=${study} + #end if + #if str($source_type) != 'use_provided' + --source_type=${source_type} + #end if + #if str($content) != 'use_provided' + --content=${content} + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +A lane slot record will have the following fields:: + + lane tag content source + V123411 ATCACG DNA V4512415 + V123411 CGATGT DNA V1415512 + V412511 DNA V1909012 + V661251 TGACCA DNA V1123111 + V661251 CTTGTA DNA V1211141 + .... + +The content column is optional if the content is passed as an input parameter of the script; +the tag column is optional too. + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/launcher.sh --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/launcher.sh Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,56 @@ +#!/bin/bash + +CMD="" +PYTH_PATH="PYTHONPATH=/SHARE/USERFS/els7/users/galaxy/develop/usr-cluster/lib/p\ +ython2.7/site-packages/:/SHARE/USERFS/els7/users/biobank/lib/" +runner="$(dirname ${BASH_SOURCE[0]})/" +until [ -z "$1" ] + do + + opt_host='--host=' + opt_user='--user=' + opt_passwd='--passwd=' + opt_interpreter='--interpreter=' + opt_runner='--runner=' + if [[ $1 == $opt_host* ]]; then + host=`echo $1 | cut -d '=' -f2 | cut -d '.' -f1` + if [ -z "$host" -o "$host" == 'None' ]; then + echo 'ERROR. 
Missing omero host parameter. Please, set Omero Host in your user preferences' >&2 + exit -1 + fi + PYTH_PATH+=$host + HOST=`echo $1 | cut -d '=' -f2` + CMD+=' '$1 + elif [[ $1 == $opt_user* ]]; then + user=`echo $1 | cut -d '=' -f2` + if [ -z "$user" -o "$user" == 'None' ]; then + echo 'ERROR. Missing omero user parameter. Please, set Omero User in your user preferences' >&2 + exit -1 + fi + CMD+=' '$1 + elif [[ $1 == $opt_passwd* ]]; then + passwd=`echo $1 | cut -d '=' -f2` + if [ -z "$passwd" -o "$passwd" == 'None' ]; then + echo 'ERROR. Missing omero password parameter. Please, set Omero Password in your user preferences' >&2 + exit -1 + fi + CMD+=' '$1 + elif [[ $1 == $opt_runner* ]]; then + runner+=`echo $1 | cut -d '=' -f2` + elif [[ $1 == $opt_interpreter* ]]; then + interpreter=`echo $1 | cut -d '=' -f2` + else + CMD+=' '$1 + fi + shift +done +export $PYTH_PATH/:$PYTHONPATH +profile="/SHARE/USERFS/els7/users/biobank/lib/${HOST}.biobank.profile" +if [ -f "$profile" ]; then + source "$profile" + CMD=$interpreter' '$runner$CMD + $CMD +else + echo "ERROR. Biobank profile file doesn't exist. 
Please, check Omero Host in your user preferences" >&2 + exit -1 +fi diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/marker_alignment.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/marker_alignment.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,113 @@ + + import marker alignments within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + marker_alignment + #if str($study) != 'use_provided' + --study ${study} + #end if + #if str($ref_genome) + --ref-genome ${ref_genome} + #end if + #if str($message) + --message ${message} + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Will read in a tsv file with the following columns:: + + marker_vid ref_genome chromosome pos strand allele copies + V0909090 hg18 10 82938938 True A 1 + V0909091 hg18 1 82938999 True A 2 + V0909092 hg18 1 82938938 True B 2 + ... + +Since pos is relative to 5', if the marker has been aligned on the +other strand, it is the responsibility of the aligner app to report +the actual distance from 5', while, at the same time, registering that +the SNP has actually been aligned on the other strand. + +The chromosome field is an integer field with values in the [1, 26] +range, with 23-26 representing, respectively, the X chromosome, the Y +chromosome, the pseudoautosomal regions (XY) and the mitochondrial DNA +(MT). 
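The integer chromosome encoding described above can be turned back into conventional names with a small lookup. This is a sketch for readers preparing or inspecting input files; `chromosome_name` is a hypothetical helper, not an importer function.

```python
# Codes 23-26 follow the mapping given in the help text; 1-22 are autosomes.
SPECIAL_CHROMOSOMES = {23: 'X', 24: 'Y', 25: 'XY', 26: 'MT'}


def chromosome_name(code):
    """Translate an integer chromosome code in [1, 26] to its usual name."""
    if not 1 <= code <= 26:
        raise ValueError('chromosome code out of range: %r' % (code,))
    return SPECIAL_CHROMOSOMES.get(code, str(code))
```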
+ + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/marker_definition.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/marker_definition.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,92 @@ + + import Marker definitions within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + marker_definition + #if str($study) != 'use_provided' + --study ${study} + #end if + --source ${source} + --context ${context} + --release ${release} + --ref-genome ${ref_genome} + --dbsnp-build ${dbsnp_build} + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Will read in a tsv file with the following columns:: + + label rs_label mask strand allele_a allele_b + SNP_A-1780419 rs6576700 [A/G] TOP A G + ... + +Where label is supposed to be the unique label for this marker in the +(source, context, release) context, rs_label is the dbSNP db label for +this SNP (it could be the string ``None`` if not defined or not +known). The column mask contains the SNP definition. The strand column +could either be the actual 'illumina style' strand used to define the +alleles in the alleles columns, or the string 'None', which means that +the alleles in the allele columns are defined with respect to the mask in the +mask column. + +It will, for each row, convert the mask to the TOP strand following +Illumina conventions and then save a record for it in VL. The saved +tuple is (source, context, release, label, rs_label, TOP_mask). There +are no collision controls. + +It will output a tsv file with the following columns:: + + study label type vid + ASTUDY SNP_A-xxx Marker V000002222 + ... 
+ + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/markers_set.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/markers_set.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,74 @@ + + import Marker definitions within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + markers_set + #if str($study) != 'use_provided' + --study ${study} + #end if + #if str($label) + --label ${label} + #end if + #if str($maker) != 'use_provided' + --maker ${maker} + #end if + #if str($model) != 'use_provided' + --model ${model} + #end if + #if str($release) + --release ${release} + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Will read in a tsv file with the following columns:: + + marker_vid marker_indx allele_flip + V902909090 0 False + V902909091 1 False + V902909092 2 True + ... 
+ + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/samples_container.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/samples_container.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,221 @@ + + import samples container definitions within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + #if $blocking_validation + --blocking-validator + #end if + samples_container + #if str($study) != 'use_provided' + --study=${study} + #end if + #if str($container_type_selector.container_type) != 'use_provided' + --container-type=${container_type_selector.container_type} + #if str($container_type_selector.container_type) == 'TiterPlate' + #if str($container_type_selector.plate_shape) != 'use_provided' + --plate-shape=${container_type_selector.plate_shape} + #end if + #elif str($container_type_selector.container_type) == 'FlowCell' + #if str($container_type_selector.flow_cell_slots) != 'use_provided' + --number-of-slots=${container_type_selector.flow_cell_slots} + #end if + #elif str($container_type_selector.container_type) == 'IlluminaArrayOfArrays' + #if str($container_type_selector.ill_shape) != 'use_provided' + --plate-shape=${container_type_selector.ill_shape} + #end if + #if str($container_type_selector.ill_slots) != 'use_provided' + --number_of_slots=${container_type_selector.ill_slots} + #end if + #if str($container_type_selector.array_type) != 'use_provided' + --illumina-array-type=${container_type_selector.array_type} + #end if + #if str($container_type_selector.array_class) != 'use_provided' + --illumina-array-class=${container_type_selector.array_class} + #end if + #if str($container_type_selector.assay_type) != 'use_provided' + 
--illumina-assay-type=${container_type_selector.assay_type} + #end if + #end if + #end if + #if str($container_status) != 'use_provided' + --container-status=${container_status} + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +A container record will have the following fields:: + + label container_status creation_date + A_CONTAINER USABLE 13/02/2012 + B_CONTAINER INSTOCK 12/01/2001 + C_CONTAINER USABLE 25/04/2012 + .... + +The creation_date column is optional: if it is not specified, the current date +is used as the object's creation date. The container_status +column is also optional if this value is passed as an input parameter. + + +When importing new containers, special fields can be included in the +CSV file depending on the type of the objects that you want to +import. + +For TITER PLATES objects the syntax can be the following:: + + label barcode container_status rows columns + A_TITERPLATE XXYYZZ111 INSTOCK 8 12 + B_TITERPLATE XXYYZZ112 INSTOCK 8 12 + C_TITERPLATE XXYYZZ113 READY 8 12 + .... + +The rows and columns values are optional if they are passed as +input parameters; the barcode column is optional. + +For ILLUMINA ARRAY OF ARRAYS objects the syntax can be the following:: + + label barcode container_status rows columns illumina_array_type illumina_array_class illumina_assay_type + A_ILLARRAY XXYYZZ111 INSTOCK 4 2 BeadChip_12x1Q Slide Infinium_HD + B_ILLARRAY XXYYZZ112 INSTOCK 4 2 BeadChip_12x1Q Slide Infinium_HD + C_ILLARRAY XXYYZZ113 INSTOCK 4 2 BeadChip_12x1Q Slide Infinium_HD + +rows, columns, illumina_array_type, illumina_array_class and illumina_assay_type +are optional if these values are passed as input parameters; the barcode column +is optional. 
+ +For FLOW CELL objects the syntax can be the following:: + + label barcode container_status number_of_slots + A_FLOWCELL XXYYZZ221 INSTOCK 8 + B_FLOWCELL XXYYZZ222 INSTOCK 8 + C_FLOWCELL XXYYZZ223 INSTOCK 8 + .... + +The number_of_slots column is optional if this value is passed as an +input parameter; the barcode column is optional. + +For LANE objects the syntax can be the following:: + + flow_cell slot container_status + V112441441 1 INSTOCK + V112441441 2 INSTOCK + V112441441 3 INSTOCK + V351145519 1 INSTOCK + V351145519 2 INSTOCK + .... + +For Lane objects, no label column has to be provided: the importer +will automatically calculate the labels for each imported object. + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/sequencing_data_sample.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/sequencing_data_sample.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,184 @@ + + + Import sequencing related DataSample definitions within OMERO.biobank + + + launcher.sh + --interpreter=python + --runner=importer.py + #if $omero_configuration.level == 'advanced' + --host=$omero_configuration.vl_host + --user=$omero_configuration.vl_user + --passwd=$omero_configuration.vl_passwd + #else + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + #end if + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + #if $blocking_validation + --blocking-validator + #end if + seq_data_sample + #if str($study) != 'use_provided' + --study=${study} + #end if + #if str($source_type) != 'use_provided' + --source-type=${source_type} + #end if + #if str($seq_dsample_type) != 'use_provided' + --seq-dsample-type=${seq_dsample_type} + #end if + #if str($dsample_status) != 'use_provided' + --status=${dsample_status} + #end if + #if str($device) != 'use_provided' + --device=${device} + #end if + #if str($history) != 'None' + --history=${history} 
+ #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Will read a tsv file with the following columns:: + + study label source source_type seq_dsample_type status device + FOOBAR seq_out_1 V012141 FlowCell SequencerOutput USABLE V123141 + FOOBAR seq_out_2 V012141 FlowCell SequencerOutput USABLE V123141 + FOOBAR seq_out_3 V1AD124 FlowCell SequencerOutput USABLE V123141 + ... + +where + * seq_dsample_type can assume one of the following values: SequencerOutput, RawSeqDataSample, SeqDataSample + * source_type can assume one of the following values: FlowCell, SequencerOutput, RawSeqDataSample + +study, source_type, seq_dsample_type, status and device columns can be +overwritten by using command line options. + +A special case of the previous file is when seq_dsample_type is +SeqDataSample: in this case a sample column is mandatory, and +this column has to contain IDs of Tube objects. +The file will look like this:: + + study label source source_type seq_dsample_type status device sample + FOOBAR seq_dsample_1 V041241 SequencerOutput SeqDataSample USABLE VBB2351 V124AA41 + FOOBAR seq_dsample_2 V051561 SequencerOutput SeqDataSample USABLE VBB2351 V4151AAE + FOOBAR seq_dsample_3 V151561 SequencerOutput SeqDataSample USABLE VBB2351 V15199CD + ... + +A file containing an export of the Galaxy history that produced the +data that are going to be imported can be passed as an input parameter; +history details must be represented as a string serialized in JSON +format. 
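As a rough illustration of the JSON requirement for the history file, the sketch below serializes a Python structure to a JSON string and parses it back. The helper names and the keys of the example are placeholders: the real layout of a Galaxy history export is not described in this help text.

```python
import json


def serialize_history(details):
    """Serialize history details (dicts, lists, strings, numbers) to JSON."""
    return json.dumps(details, sort_keys=True)


def parse_history(text):
    """Parse a JSON string; raises ValueError if the text is not valid JSON."""
    return json.loads(text)


# Placeholder content, purely illustrative: not the actual Galaxy export schema.
example = {'name': 'example history', 'steps': []}
assert parse_history(serialize_history(example)) == example
```

Checking that a history file parses with `parse_history` before running the tool is a cheap way to catch malformed input early.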
+ + + + + + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/study.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/study.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,59 @@ + + import study definitions within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + #if $blocking_validation + --blocking-validator + #end if + study + + + + + + + + + + + + + + + + + + + + + + + + + + + +Will import a stream of new study definitions defined by the following +tab-separated columns. A typical file will look like the following:: + + label description + BSTUDY A basically empty description of BSTUDY + CSTUDY A basically empty description of CSTUDY + .... + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/unauthorized_access.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/unauthorized_access.py Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,6 @@ +# BEGIN_COPYRIGHT +# END_COPYRIGHT + +import sys + +sys.exit("You are not authorized to use this tool") diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/importer/vessels_collection.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/importer/vessels_collection.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,94 @@ + + import VesselsCollection definitions within omero/vl + + launcher.sh + --interpreter=python + --runner=importer.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --ifile=${input} + --ofile=${output} + --report_file=${report} + --logfile=${logfile} + #if $blocking_validation + --blocking-validator + #end if + vessels_collection + #if str($study) != 'use_provided' + --study 
${study} + #end if + #if str($vessel_type) != 'use_provided' + --vessel_type=${vessel_type} + #end if + #if str($label) + --label=${label} + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +The input file to correctly import collections of vessels must have the following +format:: + + label vessel vessel_type + COLLECTION-A V1234545 Tube + COLLECTION-A V1212434 Tube + COLLECTION-A V3434176 Tube + COLLECTION-B V2321001 Tube + COLLECTION-B V1210402 Tube + .... + +Column 'label' contains the names of the collections to be imported, while 'vessel' +contains the VID of the tubes or the plates being part of the collections. + + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/tools/all_enrollments.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/tools/all_enrollments.py Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,89 @@ +import csv, os, sys, argparse + +from bl.vl.kb import KnowledgeBase as KB +import bl.vl.utils.ome_utils as vlu +from bl.vl.utils import LOG_LEVELS, get_logger + + +def make_parser(): + parser = argparse.ArgumentParser(description='Retrieve all enrollments') + parser.add_argument('--logfile', type=str, help='log file (default=stderr)') + parser.add_argument('--loglevel', type=str, choices = LOG_LEVELS, + help='logger level', default='INFO') + parser.add_argument('--host', type=str, help='omero hostname') + parser.add_argument('--user', type=str, help='omero user') + parser.add_argument('--passwd', type=str, help='omero password') + parser.add_argument('--ofile', type=str, help='output file path', + required=True) + return parser + + +def main(argv): + parser = make_parser() + args = parser.parse_args(argv) + + # This is a temporary hack!!! 
+ to_be_ignored = ['IMMUNOCHIP_DISCARDED', 'CASI_MS_CSM_TMP', + 'CASI_MS_CSM_CODES'] + + logger = get_logger('all_enrollments', level=args.loglevel, + filename=args.logfile) + + try: + host = args.host or vlu.ome_host() + user = args.user or vlu.ome_user() + passwd = args.passwd or vlu.ome_passwd() + except ValueError, ve: + logger.critical(ve) + sys.exit(ve) + + try: + out_file_path = args.ofile + except IndexError: + logger.error('Mandatory field missing.') + parser.print_help() + sys.exit(2) + + # Create the KnowledgeBase object + kb = KB(driver='omero')(host, user, passwd) + + # Retrieve all studies from omero + studies = kb.get_objects(kb.Study) + studies = [s for s in studies if s.label not in to_be_ignored] + logger.info('Retrieved %d studies from database' % len(studies)) + + csv_header = ['individual_uuid'] + enrolls_map = {} + + # For each study, retrieve all enrollments + for s in studies: + logger.info('Retrieving enrollments for study %s' % s.label) + enrolls = kb.get_enrolled(s) + logger.info('%s enrollments retrieved' % len(enrolls)) + if len(enrolls) > 0: + logger.debug('Building lookup dictionary....') + for e in enrolls: + enrolls_map.setdefault(e.individual.omero_id, {})['individual_uuid'] = e.individual.id + enrolls_map[e.individual.omero_id].setdefault('studies', {}) + enrolls_map[e.individual.omero_id]['studies'].setdefault(s.label,[]) + enrolls_map[e.individual.omero_id]['studies'][s.label].append(e.studyCode) + label = "{0} #{1}".format(s.label,len(enrolls_map[e.individual.omero_id]['studies'][s.label])) + enrolls_map[e.individual.omero_id][label] = e.studyCode + if label not in csv_header: + csv_header.append(label) # Add study label to CSV header + else: + logger.debug('No enrollments found, skip study %s' % s.label) + + # Write to CSV file + logger.debug('Writing CSV file %s' % out_file_path) + with open(out_file_path, 'w') as f: + writer = csv.DictWriter(f, csv_header, + delimiter='\t', quotechar='"', + restval = 'None') + 
writer.writeheader() + for k, v in enrolls_map.iteritems(): + v.pop("studies",{}) + writer.writerow(v) + +if __name__ == '__main__': + main(sys.argv[1:]) diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/tools/all_enrollments.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/tools/all_enrollments.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,29 @@ + + + Retrieve all enrollment codes from Omero server + + + launcher.sh + --interpreter=python + --runner=all_enrollments.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --ofile=${output1} + --logfile=${logfile} + + + + + + + + + + + + + It will output a TSV file with the following columns: individual_uuid, followed by one 'STUDY_LABEL #N' column for each enrollment code. + + + \ No newline at end of file diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/tools/build_miniped.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/tools/build_miniped.py Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,216 @@ +# BEGIN_COPYRIGHT +# END_COPYRIGHT + +""" +A rough example of basic pedigree info generation.
+""" + +import argparse +import collections +import csv +import os +import sys +import yaml + +from bl.vl.kb import KnowledgeBase as KB +from bl.vl.kb.drivers.omero.ehr import EHR +import bl.vl.individual.pedigree as ped +import bl.vl.utils.ome_utils as vlu +from bl.vl.utils import LOG_LEVELS, get_logger + +PLINK_MISSING = -9 +PLINK_UNAFFECTED = 1 +PLINK_AFFECTED = 2 + +FIELDS = ["fam_label", "ind_label", "fat_label", "mot_label", "gender"] + + +def load_config(config_file): + with open(config_file) as cfg: + conf = yaml.load(cfg) + return conf + + +class Diagnosis(object): + def __init__(self, logger, yaml_file): + self.logger = logger + if os.path.isfile(yaml_file): + self.conf = load_config(yaml_file) + else: + self.logger.critical('The config file {} does not exists'.format( + yaml_file)) + sys.exit() + + def get_openehr_data(self): + return self.conf['openEHR']['archetype'], self.conf['openEHR']['field'] + + def get_diagnosis_label(self): + return [l for l in self.get_diagnosis().iterkeys()] + + def get_diagnosis(self): + results = collections.OrderedDict() + diagnosis = collections.OrderedDict(sorted( + self.conf['diagnosis'].items())) + for v in diagnosis.itervalues(): + results[v['label']] = v['values'] + return results + + def get_unaffected_diagnosis_dictionary(self): + labels = self.get_diagnosis_label() + d = {} + for k in labels: + d[k] = 1 + return d + + +def make_parser(): + parser = argparse.ArgumentParser( + description='build the first columns of a ped file from VL') + parser.add_argument('--logfile', type=str, help='log file (default=stderr)') + parser.add_argument('--loglevel', type=str, choices=LOG_LEVELS, + help='logging level', default='INFO') + parser.add_argument('--configfile', type=str, default=os.path.join( + os.path.dirname(os.path.realpath(__file__)), 'build_miniped.yaml'), + help='config file (yaml) with diagnosis dictionary') + parser.add_argument('-H', '--host', type=str, help='omero hostname') + parser.add_argument('-U', '--user', 
type=str, help='omero user') + parser.add_argument('-P', '--passwd', type=str, help='omero password') + parser.add_argument('-S', '--study', type=str, required=True, + help="a list of comma separated studies used to " + "retrieve individuals that will be written to " + "ped file") + parser.add_argument('--ofile', type=str, help='output file path', + required=True) + parser.add_argument('--write_header', action='store_true', default=False, + help='Write header into the output file') + return parser + + +def build_families(individuals, logger): + # Individuals with only one parent will be considered like founders + # for i in individuals: + # if ((i.mother is None) or (i.father is None)): + # i.mother = None + # i.father = None + logger.info("individuals: %d" % len(individuals)) + # logger.info("individuals: with 0 or 2 parents: %d" % len(not_one_parent)) + logger.info("analyzing pedigree") + founders, non_founders, dangling, couples, children = ped.analyze( + individuals + ) + logger.info("splitting into families") + return ped.split_disjoint(individuals, children) + + +def main(argv): + parser = make_parser() + args = parser.parse_args(argv) + + logger = get_logger('build_miniped', level=args.loglevel, + filename=args.logfile) + + dobj = Diagnosis(logger, args.configfile) + logger.debug("l {}".format(dobj.get_diagnosis_label())) + + try: + host = args.host or vlu.ome_host() + user = args.user or vlu.ome_user() + passwd = args.passwd or vlu.ome_passwd() + except ValueError, ve: + logger.critical(ve) + sys.exit(ve) + + kb = KB(driver='omero')(host, user, passwd) + logger.debug('Loading all individuals from omero') + all_inds = kb.get_objects(kb.Individual) # store all inds to cache + logger.debug('%d individuals loaded' % len(all_inds)) + studies = [kb.get_study(s) for s in args.study.split(',')] + # Removing None values + studies = set(studies) + try: + studies.remove(None) + except KeyError: + pass + studies = list(studies) + # Sorting studies + studies = 
sorted(studies, key=lambda k: k.label.lower()) + if len(studies) == 0: + logger.error( + 'No matches found for labels %s, stopping program' % args.study) + sys.exit(2) + enrolled_map = {} + for study in studies: + logger.info('Loading enrolled individuals for study %s' % study.label) + enrolled = kb.get_enrolled(study) + logger.debug('%d individuals loaded' % len(enrolled)) + for en in enrolled: + if en.individual.id not in enrolled_map: + enrolled_map[en.individual.id] = ( + '%s:%s' % (en.study.label, en.studyCode), + en.individual) + else: + logger.debug('Individual %s already mapped' % en.individual.id) + logger.debug('Loading EHR records') + ehr_records = kb.get_ehr_records() + logger.debug('%s EHR records loaded' % len(ehr_records)) + ehr_records_map = {} + for r in ehr_records: + ehr_records_map.setdefault(r['i_id'], []).append(r) + affection_map = {} + arch, field = dobj.get_openehr_data() + for ind_id, ehr_recs in ehr_records_map.iteritems(): + affection_map[ind_id] = dobj.get_unaffected_diagnosis_dictionary() + ehr = EHR(ehr_recs) + for k, v in dobj.get_diagnosis().iteritems(): + for d in v: + if ehr.matches(arch, field, d): + affection_map[ind_id][k] = PLINK_AFFECTED + + immuno_inds = [i for (ind_id, (st_code, i)) in enrolled_map.iteritems()] + families = build_families(immuno_inds, logger) + logger.info("found %d families" % len(families)) + + def resolve_label(i): + try: + return enrolled_map[i.id][0] + except KeyError: + return i.id + + def resolve_pheno(i): + try: + immuno_affection = affection_map[i.id] + except KeyError: + return [(d, PLINK_MISSING) for d in dobj.get_diagnosis_label()] + return [(d, immuno_affection[d]) for d in dobj.get_diagnosis_label()] + + kb.Gender.map_enums_values(kb) + gender_map = lambda x: 2 if x.enum_label() == kb.Gender.FEMALE.enum_label() \ + else 1 + + for d in dobj.get_diagnosis_label(): + FIELDS.append("_".join([d, "status"])) + with open(args.ofile, "w") as f: + writer = csv.DictWriter(f, FIELDS, delimiter="\t", 
lineterminator="\n") + if args.write_header: + writer.writeheader() + families_data = [] + logger.info("building families data") + for k, fam in enumerate(families): + fam_label = "FAM_%d" % (k + 1) + for i in fam: + r = {"fam_label": fam_label, + "ind_label": resolve_label(i), + "fat_label": 0 if (i.father is None or i.father not in fam) + else resolve_label(i.father), + "mot_label": 0 if (i.mother is None or i.mother not in fam) + else resolve_label(i.mother), + "gender": gender_map(i.gender)} + for p in resolve_pheno(i): + r["_".join([p[0], "status"])] = p[1] + families_data.append(r) + logger.info("writing miniped") + writer.writerows(families_data) + + +if __name__ == "__main__": + main(sys.argv[1:]) diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/tools/build_miniped.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/tools/build_miniped.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,58 @@ + + + Build a reduced ped file from Omero server + + + launcher.sh + --interpreter=python + --runner=build_miniped.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + #if $study + --study=${study} + #end if + --ofile=${output1} + --logfile=${logfile} + #if $write_header + --write_header + #end if + + + + + + + + + + + + + + + + + + It will output a TSV file with a column of codes for each group of samples.
+ + The labels of the columns are: + + family + + individual enrollment code (STUDY:CODE) + + father enrollment code (STUDY:CODE) + + mother enrollment code (STUDY:CODE) + + gender + + T1D affection status + + MS affection status + + Nefro affection status + + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/tools/build_miniped.yaml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/tools/build_miniped.yaml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,18 @@ +openEHR: + archetype: openEHR-EHR-EVALUATION.problem-diagnosis.v1 + field: at0002.1 +diagnosis: + 1: + label: t1d + values: + - icd10-cm:E10 + 2: + label: ms + values: + - icd10-cm:G35 + 3: + label: nefro + values: + - icd10-cm:E23.2 + - icd10:N00-N08 + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/tools/kb_query.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/tools/kb_query.py Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,9 @@ +# BEGIN_COPYRIGHT +# END_COPYRIGHT + +import sys +from bl.vl.app.kb_query.main import main as kb_query + +kb_query(sys.argv[1:]) + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/tools/launcher.sh --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/tools/launcher.sh Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,56 @@ +#!/bin/sh + +CMD="" +PYTH_PATH="PYTHONPATH=/SHARE/USERFS/els7/users/galaxy/develop/usr-cluster/lib/p\ +ython2.7/site-packages/:/SHARE/USERFS/els7/users/biobank/lib/" +runner="$(dirname ${BASH_SOURCE[0]})/" +until [ -z $1 ] + do + + opt_host='--host=' + opt_user='--user=' + opt_passwd='--passwd=' + opt_interpreter='--interpreter=' + opt_runner='--runner=' + if [[ $1 == $opt_host* ]]; then + host=`echo $1 | cut -d '=' -f2 | cut -d '.' -f1` + if [ -z $host -o $host == 'None' ]; then + echo 'ERROR. Missing omero host parameter. 
Please, set Omero Host in your user preferences' > /dev/null >&2 + exit -1 + fi + PYTH_PATH+=$host + HOST=`echo $1 | cut -d '=' -f2` + CMD+=' '$1 + elif [[ $1 == $opt_user* ]]; then + user=`echo $1 | cut -d '=' -f2` + if [ -z $user -o $user == 'None' ]; then + echo 'ERROR. Missing omero user parameter. Please, set Omero User in your user preferences' > /dev/null >&2 + exit -1 + fi + CMD+=' '$1 + elif [[ $1 == $opt_passwd* ]]; then + passwd=`echo $1 | cut -d '=' -f2` + if [ -z $passwd -o $passwd == 'None' ]; then + echo 'ERROR. Missing omero password parameter. Please, set Omero Password in your user preferences' > /dev/null >&2 + exit -1 + fi + CMD+=' '$1 + elif [[ $1 == $opt_runner* ]]; then + runner+=`echo $1 | cut -d '=' -f2` + elif [[ $1 == $opt_interpreter* ]]; then + interpreter=`echo $1 | cut -d '=' -f2` + else + CMD+=' '$1 + fi + shift +done +export $PYTH_PATH/:$PYTHONPATH +profile="/SHARE/USERFS/els7/users/biobank/lib/${HOST}.biobank.profile" +if [ -f $profile ]; then + source $profile + CMD=$interpreter' '$runner$CMD + $CMD +else + echo "ERROR. Biobank profile file doesn't exist. 
Please, check Omero Host in your user preferences" > /dev/null >&2 + exit -1 +fi diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/tools/map_vid.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/tools/map_vid.py Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,25 @@ +# BEGIN_COPYRIGHT +# END_COPYRIGHT + +import sys +from bl.vl.app.kb_query.main import main as kb_query + +def main(argv): + selected_column, new_column_name, input_file = argv[:3] + selected_column = int(selected_column) - 1 + new_column_name = new_column_name.strip() + + # with open(input_file) as f: + # l = f.readline().strip() + # Backport to 2.6 + fi = open(input_file) + l = fi.readline().strip() + fi.close() + + column_names = l.split('\t') + column_name = column_names[selected_column] + + argv = argv[3:] + ['--column=%s,%s' % (column_name, new_column_name)] + kb_query(argv) + +main(sys.argv[1:]) diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/tools/map_vid.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/tools/map_vid.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,145 @@ + + + Map labels of objects known to OMERO.biobank to their VID + + + launcher.sh + --interpreter=python + --runner=map_vid.py + ${selected_column} + ${new_column_name} + ${input1} + #if $omero_configuration.level == 'advanced' + --host=$omero_configuration.vl_host + --user=$omero_configuration.vl_user + --passwd=$omero_configuration.vl_passwd + #else + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + #end if + --operator=$__user_email__ + --ofile=${output1} + --logfile=${logfile} + map_vid + --ifile=${input1} + --source-type=${source_type.source_type} + #if $source_type.source_type == 'Individual' + #if str($source_type.study) != 'use_provided' + --study=${source_type.study} + #end if + #end if + #if $strict_mapping + --strict-mapping + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +The tool resolves VIDs for the given column and renames the column +itself with a new label. Usually the plain item label is enough to map +an item to its VID, but in some cases a special syntax is needed: + +* for Individual items, if no default study is provided, the pattern + to be used is **STUDY:STUDY_LABEL**. If a default study is provided, + the column must contain only the STUDY_LABEL + +* for PlateWell items the pattern is **PLATE_LABEL:WELL_LABEL** + +* for DataCollectionItem items the pattern is + **DATA_COLLECTION_LABEL:ITEM_LABEL** + + + + + + + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/tools/plate_dsamples_details.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/tools/plate_dsamples_details.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,86 @@ + + + Retrieve wells and connected data samples related to a known plate + + + launcher.sh + --interpreter=python + --runner=kb_query.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --logfile=${logfile} + --ofile=${output} + plate_data_samples + #if str($plate) != 'select_one' + --plate=${plate} + #end if + #if $fetch_all + --fetch_all + #end if + #if str($vcoll_label) != 'no_collection' + --vessels_collection=${vcoll_label} + #end if + #if $vessel_types + --ignore_types=${vessel_types} + #end if + #if str($study_label) != 'no_study' + --map_study=${study_label} + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + +Using one of the selectable plate barcodes, the tool will generate a +report file for the plate like:: + + PLATE_barcode PLATE_label WELL_label WELL_status DATA_SAMPLE_label + XXYYZZKK test_plate A01 CONTENTUSABLE a01_test_sample + XXYYZZKK test_plate A02 CONTENTUSABLE X + XXYYZZKK test_plate A03 UNKNOWN OR EMPTY X + XXYYZZKK test_plate A04 CONTENTUSABLE a04_test_sample + XXYYZZKK test_plate
A05 DISCARDED X + ... + +For each plate, all wells will be generated in the output file, even +the ones not actually recorded into the system; these wells will be +marked with an 'UNKNOWN OR EMPTY' status. + +For each well, the tool performs a query in order to find if at least +one data sample is directly connected to the well itself; if at least +one is found, the label of the data sample will be placed in the +DATA_SAMPLE_label column; if no data sample is connected to the well, an +'X' will be placed. + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/updater/change_source_item.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/updater/change_source_item.py Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,258 @@ +# The tool changes the source of an object inside the system. +# Expected input file format is +# +# target new_source +# V1415515 V1241441 +# V1351124 V1511141 +# ..... +# +# Where target is the object whose source will be replaced by the +# new_source object. The new source type is specified using a +# command line option.
+ +import csv, argparse, sys, os, json, time + +from bl.vl.kb import KnowledgeBase as KB +import bl.vl.utils.ome_utils as vlu +from bl.vl.utils import get_logger, LOG_LEVELS +import omero +import omero.model + + +def make_parser(): + parser = argparse.ArgumentParser(description='change the source for given items') + parser.add_argument('--logfile', type=str, help='log file (default=stderr)') + parser.add_argument('--loglevel', type=str, choices=LOG_LEVELS, + help='logger level', default='INFO') + parser.add_argument('-H', '--host', type=str, help='omero hostname') + parser.add_argument('-U', '--user', type=str, help='omero user') + parser.add_argument('-P', '--passwd', type=str, help='omero password') + parser.add_argument('--operator', type=str, required=True, + help='operator username') + parser.add_argument('--in_file', type=str, required=True, + help='list of items with new sources') + parser.add_argument('--target_type', type=str, required=True, + help='type of the target objects') + parser.add_argument('--source_type', type=str, required=True, + help='type of the new source objects') + return parser + + +def do_check(records, targets, sources, + target_type, source_type, + kb, logger): + logger.info('Starting consistency checks') + src_map = dict([(s.id, s) for s in sources]) + trg_map = dict([(t.id, t) for t in targets]) + good_records = [] + targets = {} + sources = {} + for i, r in enumerate(records): + if r['target'] not in trg_map: + logger.warning('No %s with ID %s, rejecting record %d' % (target_type, + r['target'], i)) + continue + if r['new_source'] not in src_map: + logger.warning('No %s with ID %s, rejecting record %d' % (source_type, + r['new_source'], i)) + continue + targets[r['target']] = trg_map[r['target']] + sources[r['new_source']] = src_map[r['new_source']] + good_records.append(r) + logger.info('Done with consistency checks') + return good_records, targets, sources + + +def update_data(records, targets, sources, operator, act_conf, + kb, 
logger, batch_size = 500): + def get_chunk(batch_size, records): + offset = 0 + while len(records[offset:]) > 0: + yield records[offset:offset+batch_size] + offset += batch_size + dev = get_device(kb, logger) + for i, recs in enumerate(get_chunk(batch_size, records)): + logger.info('Updating batch %d' % i) + batch_to_save = [] + edges_to_delete = [] + for r in recs: + target = targets[r['target']] + # Build the ActionOnAction backup object + if not target.lastUpdate: + last_action = target.action + else: + last_action = target.lastUpdate + old_action = target.action + asconf = {'backup' : {'action' : old_action.id}} + aslabel = 'updater.update_source_item-%f' % time.time() + backup = build_action(operator, old_action.context, + dev, last_action, aslabel, + asconf, kb, logger) + target.lastUpdate = backup + # Build the Action in order to attach the new source to + # the target object + new_source = sources[r['new_source']] + if new_source.is_mapped: + new_source.unload() + asconf = act_conf + aslabel = 'updater.update_source_item-%f' % time.time() + new_act = build_action(operator, old_action.context, + dev, new_source, aslabel, + asconf, kb, logger) + target.action = new_act + if old_action.OME_TABLE == 'Action': + # no old source, just save the new action + batch_to_save.append(target) + else: + # check if the old target and the new one are different + if new_source != old_action.target: + batch_to_save.append(target) + edges_to_delete.append((old_action.target, target)) + if len(batch_to_save) > 0: + kb.save_array(batch_to_save) + else: + logger.info('No record need to be updated') + for vert in edges_to_delete: + kb.dt.destroy_edge(*vert) + + +def build_action(operator, context, device, target, + action_setup_label, action_setup_conf, + kb, logger): + if action_setup_label: + asetup = get_action_setup(action_setup_label, action_setup_conf, + kb, logger) + else: + asetup = None + aconf = { + 'device' : device, + 'actionCategory' : kb.ActionCategory.IMPORT, + 
'operator' : operator, + 'context' : context, + 'target' : target, + } + if asetup: + aconf['setup'] = asetup + action = kb.factory.create(retrieve_action_type(target, kb), aconf) + return action + + +def retrieve_action_type(target, kb): + tklass = target.ome_obj.__class__.__name__ + for i, k in enumerate(target.ome_obj.__class__.__mro__): + if k is omero.model.IObject: + tklass = target.ome_obj.__class__.__mro__[i-1].__name__ + if tklass == 'Vessel': + return kb.ActionOnVessel + elif tklass == 'Individual': + return kb.ActionOnIndividual + elif tklass == 'DataSample': + return kb.ActionOnDataSample + elif tklass == 'DataCollectionItem': + return kb.ActionOnDataCollectionItem + elif tklass == 'Action': + return kb.ActionOnAction + # elif tklass == 'VLCollection': + # return kb.ActionOnCollection + else: + raise ValueError('No Action related to %s klass' % tklass) + + +def get_action_setup(label, conf, kb, logger): + asetup_conf = { + 'label' : label, + 'conf' : json.dumps(conf), + } + asetup = kb.factory.create(kb.ActionSetup, asetup_conf) + return asetup + + +def get_device(kb, logger): + dev_model = 'UPDATE' + dev_maker = 'CRS4' + dev_release = '0.1' + dev_label = 'updater-%s.update_source_item' % dev_release + device = kb.get_device(dev_label) + if not device: + logger.debug('No device with label %s, creating one' % dev_label) + conf = { + 'maker' : dev_maker, + 'model' : dev_model, + 'release' : dev_release, + 'label' : dev_label, + } + device = kb.factory.create(kb.Device, conf).save() + return device + + +def find_action_setup_conf(args): + action_setup_conf = {} + for x in dir(args): + if not (x.startswith('_') or x.startswith('func')): + action_setup_conf[x] = getattr(args, x) + if 'passwd' in action_setup_conf: + action_setup_conf.pop('passwd') # Storing passwords into an + # Omero obj is not a great idea...
+ return action_setup_conf + + +def main(argv): + parser = make_parser() + args = parser.parse_args(argv) + + logger = get_logger('change_source_item', level=args.loglevel, + filename=args.logfile) + + try: + host = args.host or vlu.ome_host() + user = args.user or vlu.ome_user() + passwd = args.passwd or vlu.ome_passwd() + except ValueError, ve: + logger.critical(ve) + sys.exit(ve) + + kb = KB(driver='omero')(host, user, passwd) + logger.info('Loading data from input file') + with open(args.in_file) as f: + reader = csv.DictReader(f, delimiter='\t') + records = list(reader) + logger.info('Loaded %d records' % len(records)) + + logger.info('Loading %s type objects' % args.target_type) + targets = kb.get_objects(getattr(kb, args.target_type)) + logger.info('Loaded %d objects' % len(targets)) + if len(targets) == 0: + msg = 'No targets loaded from the system, nothing to do' + logger.critical(msg) + sys.exit(msg) + + logger.info('Loading %s type objects' % args.source_type) + sources = kb.get_objects(getattr(kb, args.source_type)) + logger.info('Loaded %d objects' % len(sources)) + if len(sources) == 0: + msg = 'No sources loaded from the system, nothing to do' + logger.critical(msg) + sys.exit(msg) + + logger.info('Loading Action type objects') + acts = kb.get_objects(kb.Action) + logger.info('Loaded %d objects' % len(acts)) + + records, targets, sources = do_check(records, targets, sources, + args.target_type, args.source_type, + kb, logger) + if len(records) == 0: + msg = 'No records passed consistency checks, nothing to do' + logger.critical(msg) + sys.exit(msg) + + aconf = find_action_setup_conf(args) + + update_data(records, targets, sources, args.operator, + aconf, kb, logger) + + logger.info('Job completed') + + +if __name__ == '__main__': + main(sys.argv[1:]) diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/updater/change_source_item.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/updater/change_source_item.xml Thu Sep 29 
06:09:15 2016 -0400 @@ -0,0 +1,41 @@ + + + Change source items for given objects + + + launcher.sh + --interpreter=python + --runner=change_source_item.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --logfile=${logfile} + --in_file=${infile} + --target_type=${target_type} + --source_type=${source_type} + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/updater/discard_from_collection.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/updater/discard_from_collection.py Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,91 @@ +import csv, argparse, sys, os + +from bl.vl.kb import KnowledgeBase as KB +import bl.vl.utils.ome_utils as vlu +from bl.vl.utils import get_logger, LOG_LEVELS + +COLLECTION_TYPES = {'VesselsCollection' : 'VesselsCollectionItem', + 'DataCollection' : 'DataCollectionItem'} + +def make_parser(): + parser = argparse.ArgumentParser(description='remove elements from a Vessels or Data Collection') + parser.add_argument('--logfile', type=str, help='log file (default=stderr)') + parser.add_argument('--loglevel', type=str, choices=LOG_LEVELS, + help='logger level', default='INFO') + parser.add_argument('-H', '--host', type=str, help='omero hostname') + parser.add_argument('-U', '--user', type=str, help='omero user') + parser.add_argument('-P', '--passwd', type=str, help='omero password') + parser.add_argument('-I', '--ifile', type=str, required=True, + help='list of collection items that will be removed') + parser.add_argument('--collection_type', type=str, required=True, + choices=COLLECTION_TYPES.keys(), + help='type of the collection') + parser.add_argument('--collection_label', type=str, required=True, + help='label of the collection') + + return parser + +def load_collection(coll_type, coll_label, kb): + query = 'SELECT coll FROM %s coll WHERE coll.label = :coll_label' % 
coll_type + coll = kb.find_all_by_query(query, {'coll_label' : coll_label}) + return coll[0] if len(coll) > 0 else None + +def load_collection_items(collection, coll_type, kb): + if COLLECTION_TYPES[coll_type] == 'VesselsCollectionItem': + citems = kb.get_vessels_collection_items(collection) + elif COLLECTION_TYPES[coll_type] == 'DataCollectionItem': + citems = kb.get_data_collection_items(collection) + else: + raise ValueError('Unknown data collection type %s' % COLLECTION_TYPES[coll_type]) + ci_map = {} + for ci in citems: + ci_map[ci.id] = ci + return ci_map + + +def main(argv): + parser = make_parser() + args = parser.parse_args(argv) + + logger = get_logger('discard_from_collection', level=args.loglevel, + filename=args.logfile) + + try: + host = args.host or vlu.ome_host() + user = args.user or vlu.ome_user() + passwd = args.passwd or vlu.ome_passwd() + except ValueError, ve: + logger.critical(ve) + sys.exit(ve) + + kb = KB(driver='omero')(host, user, passwd) + logger.info('Loading collection %s from %s' % (args.collection_label, + args.collection_type)) + coll = load_collection(args.collection_type, args.collection_label, kb) + if not coll: + msg = 'No %s found with label %s' % (args.collection_type, + args.collection_label) + logger.error(msg) + sys.exit(msg) + logger.info('Loading items from collection') + coll_items = load_collection_items(coll, args.collection_type, kb) + logger.info('Fetched %d elements' % len(coll_items)) + + with open(args.ifile) as infile: + reader = csv.DictReader(infile, delimiter='\t') + to_be_deleted = [row['collection_item'] for row in reader] + logger.info('Found %d items to be deleted' % len(to_be_deleted)) + + for tbd in to_be_deleted: + try: + kb.delete(coll_items[tbd]) + logger.info('%s with ID %s deleted' % (COLLECTION_TYPES[args.collection_type], + tbd)) + except KeyError, ke: + logger.warning('No %s related to ID %s' % (COLLECTION_TYPES[args.collection_type], + ke)) + logger.info('Job completed') + + +if __name__ == 
'__main__': + main(sys.argv[1:]) diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/updater/discard_from_collection.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/updater/discard_from_collection.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,79 @@ + + Discard input elements from the selected collection + + launcher.sh + --interpreter=python + --runner=discard_from_collection.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --logfile=${logfile} + --ifile=${ifile} + #if str($collection_selector.collection_type) != 'no_coll_selected' + --collection_type=${collection_selector.collection_type} + #if str($collection_selector.collection_type) == 'DataCollection' + #if str($collection_selector.dcoll_label) != 'no_label_selected' + --collection_label=${collection_selector.dcoll_label} + #end if + #elif str($collection_selector.collection_type) == 'VesselsCollection' + #if str($collection_selector.vcoll_label) != 'no_label_selected' + --collection_label=${collection_selector.vcoll_label} + #end if + #end if + #end if + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +This tool discard from a DataCollection or a VesselCollection one or +more items. + +The expected input file must be like + ++---------------+ +|collection_item| ++---------------+ +|V013AFF22311 | ++---------------+ +|V0ABB3451516 | ++---------------+ +|V012441AAEEC | ++---------------+ + +Input file rows must be VIDs obtained using the **map_vid** tool. + +Collection must be selected using the specific selection lists that +show only the ones imported into the system. 
+ + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/updater/launcher.sh --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/updater/launcher.sh Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,56 @@ +#!/bin/bash + +CMD="" +PYTH_PATH="PYTHONPATH=/SHARE/USERFS/els7/users/galaxy/develop/usr-cluster/lib/p\ +ython2.7/site-packages/:/SHARE/USERFS/els7/users/biobank/lib/" +runner="$(dirname ${BASH_SOURCE[0]})/" +until [ -z "$1" ] + do + + opt_host='--host=' + opt_user='--user=' + opt_passwd='--passwd=' + opt_interpreter='--interpreter=' + opt_runner='--runner=' + if [[ $1 == $opt_host* ]]; then + host=`echo $1 | cut -d '=' -f2 | cut -d '.' -f1` + if [ -z "$host" -o "$host" == 'None' ]; then + echo 'ERROR. Missing omero host parameter. Please, set Omero Host in your user preferences' >&2 + exit -1 + fi + PYTH_PATH+=$host + HOST=`echo $1 | cut -d '=' -f2` + CMD+=' '$1 + elif [[ $1 == $opt_user* ]]; then + user=`echo $1 | cut -d '=' -f2` + if [ -z "$user" -o "$user" == 'None' ]; then + echo 'ERROR. Missing omero user parameter. Please, set Omero User in your user preferences' >&2 + exit -1 + fi + CMD+=' '$1 + elif [[ $1 == $opt_passwd* ]]; then + passwd=`echo $1 | cut -d '=' -f2` + if [ -z "$passwd" -o "$passwd" == 'None' ]; then + echo 'ERROR. Missing omero password parameter. Please, set Omero Password in your user preferences' >&2 + exit -1 + fi + CMD+=' '$1 + elif [[ $1 == $opt_runner* ]]; then + runner+=`echo $1 | cut -d '=' -f2` + elif [[ $1 == $opt_interpreter* ]]; then + interpreter=`echo $1 | cut -d '=' -f2` + else + CMD+=' '$1 + fi + shift +done +export $PYTH_PATH/:$PYTHONPATH +profile="/SHARE/USERFS/els7/users/biobank/lib/${HOST}.biobank.profile" +if [ -f $profile ]; then + source $profile + CMD=$interpreter' '$runner$CMD + $CMD +else + echo "ERROR. Biobank profile file doesn't exist. 
Please, check Omero Host in your user preferences" >&2 + exit -1 +fi diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/updater/merge_individuals.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/updater/merge_individuals.py Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,223 @@ +#======================================= +# This tool moves all information related to an individual (source) to +# another (target). The information moved is: +# * children (Individual objects) +# * ActionOnIndividual +# * Enrollments +# * EHR records +# +# The tool expects as input a TSV file like this +# source target +# V0468D2D96999548BF9FC6AD24C055E038 V060BAA01C662240D181BB98A51885C498 +# V029CC0A614E2D42D0837602B15193EB58 V01B8122A7C75A452E9F80381CEA988557 +# V0B20C93E8A88D43EFB87A7E6911292A05 V0BED85E8E76A54AA7AB0AFB09F95798A8 +# ... +# +# NOTE WELL: +# * Parents of the "source" individual WILL NOT BE ASSIGNED +# to the "target" individual +# * For the Enrollment objects, if the +# "target" individual already has a code in the same study as the "source" +# individual, the script will try to move the Enrollment to the +# "duplicated" study (this will be fixed when proper ALIASES +# management is introduced) +# ======================================= + +import sys, argparse, csv, time, json, os + +from bl.vl.kb import KnowledgeBase as KB +from bl.vl.kb import KBError +import bl.vl.utils.ome_utils as vlu +from bl.vl.utils import get_logger, LOG_LEVELS + + +def make_parser(): + parser = argparse.ArgumentParser(description='merge information related to an individual ("source") to another one ("target")') + parser.add_argument('--logfile', type=str, help='log file (default=stderr)') + parser.add_argument('--loglevel', type=str, choices = LOG_LEVELS, + help='logging level (default=INFO)', default='INFO') + parser.add_argument('-H', '--host', type=str, help='omero hostname') + parser.add_argument('-U', '--user', type=str, help='omero user') + 
parser.add_argument('-P', '--passwd', type=str, help='omero password') + parser.add_argument('-O', '--operator', type=str, help='operator', + required=True) + parser.add_argument('--in_file', type=str, required = True, + help='input TSV file') + return parser + + +def update_object(obj, backup_values, operator, kb, logger): + logger.debug('Building ActionOnAction for object %s::%s' % + (obj.get_ome_table(), + obj.id) + ) + act_setup = build_action_setup('merge-individuals-%f' % time.time(), + backup_values, kb, logger) + aoa_conf = { + 'setup': act_setup, + 'actionCategory' : kb.ActionCategory.UPDATE, + 'operator': operator, + 'target': obj.lastUpdate if obj.lastUpdate else obj.action, + 'context': obj.action.context + } + logger.debug('Updating object with new ActionOnAction') + obj.lastUpdate = kb.factory.create(kb.ActionOnAction, aoa_conf) + + +def build_action_setup(label, backup, kb, logger): + logger.debug('Creating a new ActionSetup with label %s and backup %r' % (label, backup)) + conf = { + 'label': label, + 'conf': json.dumps({'backup' : backup}) + } + asetup = kb.factory.create(kb.ActionSetup, conf) + return asetup + + +def update_children(source_ind, target_ind, operator, kb, logger): + if source_ind.gender.enum_label() == kb.Gender.MALE.enum_label(): + parent_type = 'father' + elif source_ind.gender.enum_label() == kb.Gender.FEMALE.enum_label(): + parent_type = 'mother' + else: + raise ValueError('%s is not a valid gender value' % (source_ind.gender.enum_label())) + query = ''' + SELECT ind FROM Individual ind + JOIN ind.{0} AS {0} + WHERE {0}.vid = :parent_vid + '''.format(parent_type) + children = kb.find_all_by_query(query, {'parent_vid' : source_ind.id}) + logger.info('Retrieved %d children for source individual' % len(children)) + for child in children: + backup = {} + logger.debug('Changing %s for individual %s' % (parent_type, + child.id)) + backup[parent_type] = getattr(child, parent_type).id + setattr(child, parent_type, target_ind) + 
update_object(child, backup, operator, kb, logger) + kb.save_array(children) + + +def update_action_on_ind(source_ind, target_ind, operator, kb, logger): + query = '''SELECT act FROM ActionOnIndividual act + JOIN act.target AS ind + WHERE ind.vid = :ind_vid + ''' + src_acts = kb.find_all_by_query(query, {'ind_vid' : source_ind.id}) + logger.info('Retrieved %d actions for source individual' % len(src_acts)) + connected = kb.dt.get_connected(source_ind, direction=kb.dt.DIRECTION_OUTGOING, + query_depth=1) + if source_ind in connected: + connected.remove(source_ind) + for sa in src_acts: + logger.debug('Changing target for action %s' % sa.id) + sa.target = target_ind + logger.debug('Action %s target updated' % sa.id) + kb.save_array(src_acts) + for conn in connected: + kb.dt.destroy_edge(source_ind, conn) + kb.dt.create_edge(conn.action, target_ind, conn) + + +def update_enrollments(source_ind, target_ind, operator, kb, logger): + query = '''SELECT en FROM Enrollment en + JOIN en.individual AS ind + WHERE ind.vid = :ind_vid + ''' + enrolls = kb.find_all_by_query(query, {'ind_vid' : source_ind.id}) + logger.info('Retrieved %d enrollments for source individual' % len(enrolls)) + for sren in enrolls: + try: + sren.individual = target_ind + logger.debug('Changing individual for enrollment %s in study %s' % (sren.studyCode, + sren.study.label)) + kb.save(sren) + logger.info('Changed individual for enrollment %s (study code %s -- study %s)' % (sren.id, + sren.studyCode, + sren.study.label)) + except KBError, kbe: + logger.warning('Unable to update enrollment %s (study code %s -- study %s)' % (sren.id, + sren.studyCode, + sren.study.label)) + move_to_duplicated(sren, operator, kb, logger) + + +def update_ehr_records(source_ind, target_ind, kb): + kb.update_table_rows(kb.eadpt.EAV_EHR_TABLE, '(i_vid == "%s")' % source_ind.id, + {'i_vid' : target_ind.id}) + + +# This method should be considered as a temporary hack that will be +# used until a proper ALIAS management will be 
introduced into the +# system +def move_to_duplicated(enrollment, operator, kb, logger): + old_st = enrollment.study + dupl_st = kb.get_study('%s_DUPLICATI' % old_st.label) + if not dupl_st: + logger.warning('No "duplicated" study ({0}_DUPLICATI) found for study {0}'.format(old_st.label)) + return + enrollment.study = dupl_st + try: + kb.save(enrollment) + logger.info('Enrollment %s moved from study %s to study %s' % (enrollment.studyCode, + old_st.label, dupl_st.label)) + except Exception: + logger.error('An error occurred while moving enrollment %s from study %s to %s' % (enrollment.studyCode, + old_st.label, + dupl_st.label)) + + +def main(argv): + parser = make_parser() + args = parser.parse_args(argv) + + logger = get_logger('merge_individuals', level=args.loglevel, + filename=args.logfile) + + try: + host = args.host or vlu.ome_host() + user = args.user or vlu.ome_user() + passwd = args.passwd or vlu.ome_passwd() + except ValueError, ve: + logger.critical(ve) + sys.exit(ve) + + kb = KB(driver='omero')(host, user, passwd) + + logger.debug('Retrieving Individuals') + individuals = kb.get_objects(kb.Individual) + logger.debug('Retrieved %d Individuals' % len(individuals)) + ind_lookup = {} + for i in individuals: + ind_lookup[i.id] = i + + with open(args.in_file) as in_file: + reader = csv.DictReader(in_file, delimiter='\t') + for row in reader: + try: + source = ind_lookup[row['source']] + logger.info('Selected as source individual with ID %s' % source.id) + target = ind_lookup[row['target']] + logger.info('Selected as destination individual with ID %s' % target.id) + except KeyError, ke: + logger.warning('Unable to retrieve individual with ID %s, skipping row' % ke) + continue + + logger.info('Updating children connected to source individual') + update_children(source, target, args.operator, kb, logger) + logger.info('Children update complete') + + logger.info('Updating ActionOnIndividual related to source individual') + update_action_on_ind(source, target, 
args.operator, kb, logger) + logger.info('ActionOnIndividual update completed') + + logger.info('Updating enrollments related to source individual') + update_enrollments(source, target, args.operator, kb, logger) + logger.info('Enrollments update completed') + + logger.info('Updating EHR records related to source individual') + update_ehr_records(source, target, kb) + logger.info('EHR records update completed') + +if __name__ == '__main__': + main(sys.argv[1:]) diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/updater/merge_individuals.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/updater/merge_individuals.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,52 @@ + + + Merge individuals' data + + + launcher.sh + --interpreter=python + --runner=merge_individuals.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --logfile=${logfile} + --in_file=${ifile} + + + + + + + + + + + + +This tool moves all information related to an individual (source) to +another (target). The information moved is: + + * children (Individual objects) + * ActionOnIndividual + * Enrollments + * EHR records + +The tool expects as input a TSV file like this:: + + source target + V0468D2D96999548BF9FC6AD24C055E038 V060BAA01C662240D181BB98A51885C498 + V029CC0A614E2D42D0837602B15193EB58 V01B8122A7C75A452E9F80381CEA988557 + V0B20C93E8A88D43EFB87A7E6911292A05 V0BED85E8E76A54AA7AB0AFB09F95798A8 + ... 
+ +NOTE WELL: + * Parents of the "source" individual WILL NOT BE ASSIGNED + to the "target" individual + * For the Enrollment objects, if the + "target" individual already has a code in the same study as the "source" + individual, the script will try to move the Enrollment to the + "duplicated" study + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/updater/update_parents.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/updater/update_parents.py Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,110 @@ +import sys, csv, argparse, time, json + +from bl.vl.kb import KnowledgeBase as KB +import bl.vl.utils.ome_utils as vlu +from bl.vl.utils import get_logger, LOG_LEVELS + + +def make_parser(): + parser = argparse.ArgumentParser(description='update parents') + parser.add_argument('--logfile', type=str, help='log file (default=stderr)') + parser.add_argument('--loglevel', type=str, choices=LOG_LEVELS, + help='logging level (default=INFO)', default='INFO') + parser.add_argument('-H', '--host', type=str, help='omero hostname') + parser.add_argument('-U', '--user', type=str, help='omero user') + parser.add_argument('-P', '--passwd', type=str, help='omero password') + parser.add_argument('-O', '--operator', type=str, help='operator', + required=True) + parser.add_argument('--in_file', type=str, required=True, + help='input file with individual, father and mother') + return parser + + +def update_parents(individual, father, mother, operator, kb, logger): + backup = {} + logger.info('Updating parents for individual %s', individual.id) + if individual.father != father: + backup['father'] = individual.father.id if individual.father else None + logger.info('Setting father to %s (old value %s)' % (father.id if father else None, + backup['father'])) + individual.father = father + if individual.mother != mother: + backup['mother'] = individual.mother.id if individual.mother else None + logger.info('Setting mother to %s (old value %s)' % (mother.id if mother 
else None, + backup['mother'])) + individual.mother = mother + if len(backup.items()) > 0: + update_object(individual, backup, operator, kb, logger) + return individual + else: + logger.info('No update needed for individual %s' % individual.id) + return None + + +def update_object(obj, backup_values, operator, kb, logger): + logger.debug('Building ActionOnAction for object %s' % obj.id) + act_setup = build_action_setup('update-parents-%f' % time.time(), + backup_values, kb, logger) + aoa_conf = { + 'setup': act_setup, + 'actionCategory': kb.ActionCategory.UPDATE, + 'operator': operator, + 'target': obj.lastUpdate if obj.lastUpdate else obj.action, + 'context': obj.action.context + } + logger.debug('Updating object with new ActionOnAction') + obj.lastUpdate = kb.factory.create(kb.ActionOnAction, aoa_conf) + + +def build_action_setup(label, backup, kb, logger): + logger.debug('Creating a new ActionSetup with label %s and backup %r' % (label, + backup)) + conf = { + 'label': label, + 'conf': json.dumps({'backup': backup}) + } + asetup = kb.factory.create(kb.ActionSetup, conf) + return asetup + + +def main(argv): + parser = make_parser() + args = parser.parse_args(argv) + + logger = get_logger('update_parents', level=args.loglevel, + filename=args.logfile) + + try: + host = args.host or vlu.ome_host() + user = args.user or vlu.ome_user() + passwd = args.passwd or vlu.ome_passwd() + except ValueError, ve: + logger.critical(ve) + sys.exit(ve) + + kb = KB(driver='omero')(host, user, passwd) + + logger.info('Retrieving individuals') + inds = kb.get_objects(kb.Individual) + logger.info('Retrieved %d individuals' % len(inds)) + inds_lookup = {} + for i in inds: + inds_lookup[i.id] = i + + with open(args.in_file) as in_file: + to_be_updated = [] + reader = csv.DictReader(in_file, delimiter='\t') + for row in reader: + ind = inds_lookup[row['individual']] + father = inds_lookup[row['father']] if row['father'] != 'None' else None + mother = inds_lookup[row['mother']] if 
row['mother'] != 'None' else None + ind = update_parents(ind, father, mother, args.operator, kb, logger) + if ind: + to_be_updated.append(ind) + + logger.info('%d individuals are going to be updated' % len(to_be_updated)) + kb.save_array(to_be_updated) + logger.info('Update complete') + +if __name__ == '__main__': + main(sys.argv[1:]) diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank/updater/update_parents_data.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank/updater/update_parents_data.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,35 @@ + + + Update parental info of individuals + + + launcher.sh + --interpreter=python + --runner=update_parents.py + --host=$__user_omero_host__ + --user=$__user_omero_user__ + --passwd=$__user_omero_password__ + --operator=$__user_email__ + --logfile=${logfile} + --in_file=${input1} + + + + + + + + + + + +This tool updates the parental info of individuals using information from a file like this:: + + individual father mother + V4C5363 V0A3AC5 V0CF6C8 + V0EE642 V0A3AC5 V0CF6C8 + V027BA1 V0DE514 V0C3A91 + + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/biobank_tool_conf.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/biobank_tool_conf.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,84 @@ + + + +

+ + + + + + + + + + + + + + + + + + + +

+ + + +

+ + + + + + + + + + +

+ +

+ + + -- + +

+ + + diff -r 000000000000 -r e54d14bed3f5 galaxy-tools/orione_biobank_tool_conf.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-tools/orione_biobank_tool_conf.xml Thu Sep 29 06:09:15 2016 -0400 @@ -0,0 +1,41 @@ + + + + +

+ + + + + + + + + + + + + + + + + + + +

+ +

+ + + + + +

+ +

+ + + + +