comparison region-motif-compare/README.md @ 17:7afdfd4f4c1b draft
Uploaded
author | jeremyjliu |
---|---|
date | Wed, 12 Nov 2014 15:21:11 -0500 |
# Region-Motif-Compare Tools
Version 1.1 Released 2014
Park Laboratory
Center for Biomedical Informatics
Harvard University

Contact
Jeremy Liu (jeremy.liu@yale.edu)
Nils Gehlenborg (nils@hms.harvard.edu)

## Overview
### Structure
The tool suite consists of:

1. Two Rscripts: region_motif_compare.r and region_motif_intersect.r
2. Two XML files: region_motif_compare.xml and region_motif_intersect.xml
3. Motif Database Directory: region_motif_db
4. Dependency Library Directory: region_motif_lib
5. Galaxy Workflows: Files with suffix ".ga" that can be imported into the local
Galaxy instance after installation of the tool.

### Description
1. **region_motif_intersect.r** (1 bed -> 1 tsv):
Takes one BED file of regions as input and calculates the number of
intersections between the regions and the motifs. region_motif_intersect.r
outputs a tab-separated values (tsv) file of motif names and intersection counts.
**Important Note:** region_motif_intersect.r makes no assumptions about the nature
of the input regions. For example, if the input contains overlapping regions, motifs
that intersect the overlap will be double counted. Thus, it is recommended that regions
be merged before using this tool, using the merge tool in the Galaxy toolshed.

2. **region_motif_compare.r** (2 tsv -> 2 tsv & 1 png):
Takes as input two tsv files of motif / region intersection
counts. These generally originate from running region_motif_intersect.r on two
different sets of regions with the same query motif database. Based on the counts,
region_motif_compare.r then determines the enrichment (or depletion) of certain
motifs across the two regions. This is done by correcting for the size and GC
content of the regions and applying a Poisson test to the counts.
Then, region_motif_compare.r outputs the most significantly enriched and depleted
motifs as two tsv files. In addition, the tool outputs a diagnostic plot (png) containing
graphical representations of the motif counts, GC correction curves, and significant
motifs that distinguish the two regions (selected via p value).

3. **region_motif_db**: Contains motif positions as compressed, indexed tabix files.

4. **region_motif_lib**: Contains dependencies (i.e. plotting.r) for region_motif_compare.r

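The merge step recommended in the note for region_motif_intersect.r can also be approximated on the command line. The sketch below is not part of the tool suite: `merge_bed` is a hypothetical helper that assumes a tab-separated BED file with chr, start, and end in the first three columns. It treats regions that touch end-to-start as mergeable and drops any extra columns.

```shell
# merge_bed FILE: collapse overlapping or adjacent intervals in a BED file.
# Hypothetical helper, not shipped with the tools. Writes three-column BED
# (chr, start, end) to stdout; lines starting with "#" are ignored.
merge_bed() {
    grep -v '^#' "$1" |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n |
    awk 'BEGIN { FS = OFS = "\t" }
         # First record: open the first interval.
         NR == 1 { chr = $1; s = $2 + 0; e = $3 + 0; next }
         # Same chromosome and start inside the open interval: extend it.
         $1 == chr && $2 + 0 <= e { if ($3 + 0 > e) e = $3 + 0; next }
         # Otherwise emit the open interval and start a new one.
         { print chr, s, e; chr = $1; s = $2 + 0; e = $3 + 0 }
         END { if (NR > 0) print chr, s, e }'
}

# Usage (file names are placeholders):
#   merge_bed regions.bed > regions.merged.bed
```

The merged file can then be passed to region_motif_intersect.r in place of the original regions so that no motif is counted twice.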
## Installation
Directions for installing the region-motif-compare tools onto a personal computer
and into a local Galaxy instance.

1. Follow the online directions to install a local instance of Galaxy (getgalaxy.org).
Optionally, follow the directions to install Refinery (refinery-platform.readthedocs.org)

2. Clone the GitHub repository to your local computer
````
git clone https://github.com/parklab/refinery-galaxy-tools.git
cd refinery-galaxy-tools/region-motif-compare
````

3. Make a directory for the tools in the Galaxy instance. This serves as a category
for the tool in the tools sidebar. You can also place the tools in an existing
or alternatively named directory, but remember to update tool_conf.xml to reflect this.
````
cd ~/galaxy-dist/tools/
mkdir my_tools
cd my_tools
````

4. Copy over ".r" and ".xml" files, as well as `region_motif_db` and `region_motif_lib`
````
# Run from the directory where you cloned the repository in Step 2
cd refinery-galaxy-tools/region-motif-compare
cp *.r ~/galaxy-dist/tools/my_tools
cp *.xml ~/galaxy-dist/tools/my_tools
cp -r region_motif_db ~/galaxy-dist/tools/my_tools
cp -r region_motif_lib ~/galaxy-dist/tools/my_tools
````

5. Edit `~/galaxy-dist/tool_conf.xml` to reflect the addition of the new tools.
Add the following lines within the `<toolbox>` tags. If in Step 3 you chose
a directory other than `my_tools`, edit the code snippet
to reflect the correct path name.
````
<section id="mTools" name="My Tools">
<tool file="my_tools/region_motif_intersect.xml" />
<tool file="my_tools/region_motif_compare.xml" />
</section>
````

6. Download the motif databases and place them into `region_motif_db`
````
cd ~/galaxy-dist/tools/my_tools/region_motif_db
wget ????/pouya_motifs.bed.bgz
wget ????/pouya_motifs.bed.bgz.tbi
wget ????/jaspar_jolma_motifs.bed.bgz
wget ????/jaspar_jolma_motifs.bed.bgz.tbi
wget ????/mm9_motifs.bed.bgz
wget ????/mm9_motifs.bed.bgz.tbi
````

7. Install the Bioconductor R package Rsamtools for dealing with tabix files
````
$ R
> source("http://bioconductor.org/biocLite.R")
> biocLite("Rsamtools")
````

8. If in Step 3 you chose an existing directory or an alternatively
named directory, you must edit the following file paths.
In `region_motif_intersect.r` and `region_motif_compare.r` edit `commonDir`:
````
# Replace this line
commonDir = concat(workingDir, "/tools/my_tools")
# With this edited line
commonDir = concat(workingDir, "<relative_path_from_galaxy_root>/<tool_directory>")
````
In addition, edit `region_motif_intersect.xml` and `region_motif_compare.xml` to
reflect the path of the tools relative to the Galaxy root directory.
````
<command interpreter="bash">
/usr/bin/R --slave --vanilla -f $GALAXY_ROOT_DIR/<path_to_tools>/region_motif_intersect.r --args $GALAXY_ROOT_DIR $db_type $in_bed $out_tab
</command>
````
````
<command interpreter="bash">
/usr/bin/R --slave --vanilla -f $GALAXY_ROOT_DIR/<path_to_tools>/region_motif_compare.r --args $GALAXY_ROOT_DIR $db_type $in_tab_1 $in_tab_2 $out_enriched $out_depleted $out_plots
</command>
````

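Before restarting Galaxy, it can help to confirm that the files copied in Steps 4 and 6 ended up where the tools expect them. The following is a minimal sketch, not something shipped with the tools: `check_region_motif_install` is a hypothetical helper, and it only checks the file and directory names listed in this README plus the presence of a `.tbi` index next to each `.bed.bgz` file.

```shell
# check_region_motif_install DIR: report missing pieces of a tool directory.
# DIR is the tool directory, e.g. ~/galaxy-dist/tools/my_tools.
# Prints one line per problem and returns non-zero if any were found.
check_region_motif_install() {
    tool_dir=$1
    rc=0
    for f in region_motif_intersect.r region_motif_compare.r \
             region_motif_intersect.xml region_motif_compare.xml; do
        if [ ! -f "$tool_dir/$f" ]; then
            echo "missing file: $f"
            rc=1
        fi
    done
    for d in region_motif_db region_motif_lib; do
        if [ ! -d "$tool_dir/$d" ]; then
            echo "missing directory: $d"
            rc=1
        fi
    done
    # Every bgzip-compressed motif file needs its tabix index alongside it.
    for bgz in "$tool_dir"/region_motif_db/*.bed.bgz; do
        [ -e "$bgz" ] || continue
        if [ ! -f "$bgz.tbi" ]; then
            echo "missing index: $(basename "$bgz").tbi"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   check_region_motif_install ~/galaxy-dist/tools/my_tools
```

The check does not validate `tool_conf.xml` or the `commonDir` setting; those still need to be reviewed by hand if the tools live outside `my_tools`.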
## Running the Tools
### Running from Galaxy
1. To run the tools as workflows, import the .ga workflows included in the GitHub
repository via the Galaxy workflow user interface. Then, upload and select two input BED files.

2. To run the tools individually, select the tool from the tools sidebar, provide
a BED file (Region Motif Intersect) or two tsv files (Region Motif Compare), and
select a query database from the dropdown menu.

### Running from Refinery
1. Import the .ga workflows into a local Galaxy instance. These workflows have
already been annotated for Refinery.

2. Add the local Galaxy instance to the Refinery installation.
````
python manage.py create_workflowengine <instance_id> "<group_name>"
````

3. Import the Galaxy workflows into Refinery.
````
python manage.py import_workflows
````

4. Run the tools from the Refinery user interface.

### Running as Command Line Tools
You can also run the tools from the command line, as in the example below.
More information can be found in the headers of the R source files.
````
cd ~/galaxy-dist/tools/my_tools
R --slave --vanilla -f region_motif_intersect.r --args ~/galaxy-dist p <path_to_bed_file> <path_to_output_tsv>
R --slave --vanilla -f region_motif_compare.r --args ~/galaxy-dist p <path_to_region1_counts> <path_to_region2_counts> <enriched_motifs_output_tsv> <depleted_motifs_output_tsv> <plots_png>
````

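A full comparison needs three invocations: one intersect run per BED file, then one compare run on the two count files. The sketch below prints those three commands for a pair of inputs so they can be reviewed before running. `compare_regions_cmds` and the output file names built from the prefix are hypothetical, the argument order is copied from the example above, and paths are assumed to contain no spaces.

```shell
# compare_regions_cmds ROOT DB BED1 BED2 PREFIX
# Print the three R invocations needed to compare two BED files.
# ROOT is the Galaxy root, DB the database code (e.g. p, as in the example
# above). Nothing is executed; pipe the output to `sh` to run the commands.
compare_regions_cmds() {
    root=$1; db=$2; bed1=$3; bed2=$4; prefix=$5
    r="R --slave --vanilla -f"
    # Step 1 and 2: count motif intersections for each set of regions.
    echo "$r region_motif_intersect.r --args $root $db $bed1 $prefix.counts1.tsv"
    echo "$r region_motif_intersect.r --args $root $db $bed2 $prefix.counts2.tsv"
    # Step 3: compare the two count files.
    echo "$r region_motif_compare.r --args $root $db $prefix.counts1.tsv $prefix.counts2.tsv $prefix.enriched.tsv $prefix.depleted.tsv $prefix.plots.png"
}

# Usage (run from the tool directory; file names are placeholders):
#   compare_regions_cmds ~/galaxy-dist p regions1.bed regions2.bed run1 | sh
```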
## Interpreting Results
### Motif Database and Result Notation
TF motif positions for hg19 and mm9 were curated from three databases:

1. ENCODE TF motif database "Pouya" (http://compbio.mit.edu/encode-motifs/)
2. JASPAR database "Jaspar" (http://jaspar.genereg.net/)
3. DNA binding specificities of human transcription factors "Jolma" (http://www.ncbi.nlm.nih.gov/pubmed/23332764)

For ENCODE TF motifs, the genomic locations were taken straight from the database.
In addition, position weight matrices (PWMs) were obtained by averaging the
sites in the genome for a motif. These are labeled with "\_8mer\_".
Fake motifs were also generated by shuffling the PWMs of actual motifs and
mapping them to the genome; these are labeled with "\_8mer\_C".

For JASPAR and Jolma motifs, mast was run to determine genomic locations from the
provided PWMs. The motif alignment thresholds were set to the top 5k, 20k, 100k, and
250k sites, with redundant maps removed where the top 30k sites have the same score.
These are labeled with "\_t5000" and likewise for the other thresholds.

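The labels described above can be used to filter a result file, for example to set aside the shuffled control motifs. The sketch below assumes a tab-separated file with the motif name in the first column, which this README does not state explicitly; `rows_with_label` and `rows_without_label` are hypothetical helpers. Any header row is treated like a data row.

```shell
# rows_with_label FILE LABEL: print rows whose motif name contains LABEL.
# Assumes tab-separated input with the motif name in column 1.
rows_with_label() {
    awk -F '\t' -v label="$2" 'index($1, label) > 0' "$1"
}

# rows_without_label FILE LABEL: print rows whose motif name lacks LABEL.
rows_without_label() {
    awk -F '\t' -v label="$2" 'index($1, label) == 0' "$1"
}

# Usage (file names are placeholders):
#   rows_without_label enriched.tsv _8mer_C > enriched.no_controls.tsv
#   rows_with_label enriched.tsv _t5000 > enriched.top5k.tsv
```

Note that "\_8mer\_" is a substring of "\_8mer\_C", so matching on "\_8mer\_" alone selects both the averaged motifs and their shuffled controls.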
## Motif Tabix File Creation
Starting with a BED file of motif positions (minimally chr, start, end), follow
the steps below to generate a tabix file that can be placed in `region_motif_db` and
used by the tools.

1. Download Tabix (http://sourceforge.net/projects/samtools/files/tabix/) and install.
Add the `tabix` and `bgzip` binaries to your path.
````
tar -xvjf tabix-0.2.6.tar.bz2
cd tabix-0.2.6
make
````

2. Construct bgzip files and index files.
````
cd ~/galaxy-dist/tools/my_tools/region_motif_db
(grep ^"#" jaspar_motifs.bed; grep -v ^"#" jaspar_motifs.bed | sort -k1,1 -k2,2n) | bgzip > jaspar_motifs.bed.bgz
tabix -p bed jaspar_motifs.bed.bgz # this generates jaspar_motifs.bed.bgz.tbi
````

3. Add the path to `jaspar_motifs.bed.bgz` to the selection options for the variable
`motifDB` in `region_motif_intersect.r` and `region_motif_compare.r`. To enable
the new database in Galaxy, you will have to edit the XML files for both tools.
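
The sorting part of Step 2 can be wrapped in a small function so it is easy to reuse for other motif files. This is a sketch under the same assumptions as the command in Step 2 ("#" header lines first, records sorted by chromosome and then numeric start); `sort_bed_keep_header` is a hypothetical name, and `my_motifs.bed` in the usage comment is a placeholder.

```shell
# sort_bed_keep_header FILE: print "#" header lines first, then the remaining
# records sorted by chromosome (column 1) and numeric start (column 2).
sort_bed_keep_header() {
    # A file without header lines is fine, so ignore grep's exit status here.
    grep '^#' "$1" || true
    grep -v '^#' "$1" | LC_ALL=C sort -k1,1 -k2,2n
}

# Usage, followed by the compression and indexing commands from Step 2:
#   sort_bed_keep_header my_motifs.bed | bgzip > my_motifs.bed.bgz
#   tabix -p bed my_motifs.bed.bgz
```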