<tool id="wtavg" name="Assign weighted-average" version="1.0.0">
  <description> of the values of features overlapping an interval </description>
  <requirements>
    <requirement type="package" version="1.0.0">galaxy-ops</requirement>
    <requirement type="package" version="0.7.1">bx-python</requirement>
  </requirements>
  <command interpreter="python">WeightedAverage.py $genomic_interval $genomic_feature $out_file1 -1 ${genomic_interval.metadata.chromCol},${genomic_interval.metadata.startCol},${genomic_interval.metadata.endCol},${genomic_interval.metadata.strandCol} -2 ${genomic_feature.metadata.chromCol},${genomic_feature.metadata.startCol},${genomic_feature.metadata.endCol},${genomic_feature.metadata.strandCol},${genomic_feature.metadata.nameCol}</command>
  <inputs>
    <param format="interval" name="genomic_interval" type="data" label="Genomic intervals (first dataset)" help="Dataset missing? See Note below."/>
    <param format="interval" name="genomic_feature" label="Genomic features (second dataset)" type="data" help="Make sure the value column is specified. See Note below."/>
  </inputs>
  <outputs>
    <data format="input" name="out_file1" metadata_source="genomic_interval" />
  </outputs>
  <tests>
    <!-- Test data with valid values -->
    <test>
      <param name="genomic_interval" value="interval_interpolate.bed"/>
      <param name="genomic_feature" value="value_interpolate.bed"/>
      <output name="out_file1" file="interpolate_result.bed"/>
    </test>
  </tests>
  <help>

.. class:: infomark

**What it does**

For each interval in your first dataset, this tool calculates the weighted average of the values of the overlapping features in your second dataset.

- When a genomic interval partially or totally overlaps a single genomic feature, the value of that feature is assigned to the interval.
- When a genomic interval partially or totally overlaps more than one genomic feature, the interval is assigned the average of the feature values, weighted by the number of bases each feature shares with the interval.
- When a genomic interval does not overlap any genomic feature, 'NA' is assigned as its value.

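The three rules above can be sketched in Python as follows. This is only an illustration of the calculation, not the code in WeightedAverage.py: the function name ``weighted_average``, the tuple layout, and the half-open coordinate convention are all assumptions made for the example.

```python
def weighted_average(interval, features):
    """Return the overlap-weighted mean value for one interval, or None.

    interval: (start, end); features: iterable of (start, end, value).
    Assumption: coordinates are half-open and all on the same chromosome.
    """
    start, end = interval
    total_bases = 0
    weighted_sum = 0.0
    for f_start, f_end, value in features:
        # Number of bases shared by the interval and this feature.
        overlap = max(0, min(end, f_end) - max(start, f_start))
        if overlap > 0:
            total_bases += overlap
            weighted_sum += value * overlap
    if total_bases == 0:
        return None  # the tool reports this case as 'NA'
    return weighted_sum / total_bases
```

With a single overlapping feature the weight cancels out, so the function returns that feature's value unchanged, which matches the first rule.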

-----

.. class:: warningmark

**Note**

The input datasets should be in **bed** or **interval** format. Use the pencil icon ("Edit attributes") to set the column that holds the feature values in the second dataset as the **name/identifier** column.

The output contains all the columns of the first input plus a new column holding the value assigned to each interval.

-----
49
|
|
50 **Example**
|
|
51
|
|
52 - Suppose our first dataset contains the following **genomic intervals**::
|
|
53
|
|
54 chr start stop
|
|
55 chr1 1000 2000
|
|
56 chr1 3000 5000
|
|
57 chr1 8000 9000
|
|
58
|
|
59 - and our second dataset contains the following **genomic features** each having an associated value (in fourth column) ::
|
|
60
|
|
61 chr start stop name
|
|
62 chr1 900 1200 0.5
|
|
63 chr1 2900 3100 0.2
|
|
64 chr1 4800 5100 0.8
|
|
65
|
|
66 - For each **genomic interval** in our first dataset, this tool calculates the weighted average value of the overlapping **genomic features** in our second dataset ::
|
|
67
|
|
68 chr1 1000 2000 0.5
|
|
69 chr1 3000 5000 0.6
|
|
70 chr1 8000 9000 NA
|
|
71
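The value 0.6 for the second interval can be checked by hand. Assuming half-open coordinates as in BED, the interval 3000-5000 shares 100 bases with the feature at 2900-3100 (value 0.2) and 200 bases with the feature at 4800-5100 (value 0.8), so the weighted average works out as:

```latex
\frac{100 \times 0.2 + 200 \times 0.8}{100 + 200} = \frac{180}{300} = 0.6
```

The first interval overlaps only the feature at 900-1200, so it takes that feature's value, 0.5, and the third interval overlaps nothing, so it gets NA.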
  </help>
</tool>