annotate README @ 2:5eb99d21ef0d

Add trinityrnaseq_norm and transcriptsToOrfs tools
author Jim Johnson <jj@umn.edu>
date Thu, 05 Sep 2013 08:08:21 -0500
parents d4ce07eb63bd
children
rev   line source
0
d4ce07eb63bd Uploaded
jjohnson
parents:
diff changeset
1 http://trinityrnaseq.sourceforge.net/
2
3 Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules, Inchworm, Chrysalis, and Butterfly, which are applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Briefly, the process works as follows:
4
5 Inchworm assembles the RNA-seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.
6
7 Chrysalis groups the Inchworm contigs into clusters and constructs a complete de Bruijn graph for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or set of genes that share sequences in common). Chrysalis then partitions the full read set among these disjoint graphs.
8
9 Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within each graph, ultimately reporting full-length transcripts for alternatively spliced isoforms and teasing apart transcripts that correspond to paralogous genes.
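
As a rough illustration of the idea of splitting sequence data into disjoint de Bruijn graphs, here is a toy Python sketch. It is not Trinity's code and does not reproduce what Inchworm, Chrysalis, or Butterfly actually do. The function names (build_de_bruijn, partition_components), the example reads, and the choice of k=4 are all assumptions made for illustration. Nodes are (k-1)-mers, each k-mer in a read adds one edge, and connected components stand in for the per-gene clusters described above.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            prefix, suffix = kmer[:-1], kmer[1:]
            graph[prefix].add(suffix)
            graph[suffix]  # touch the key so nodes without outgoing edges exist too
    return dict(graph)


def partition_components(graph):
    """Split the graph into connected components, ignoring edge direction."""
    undirected = defaultdict(set)
    for node, successors in graph.items():
        undirected[node]  # register isolated nodes
        for nxt in successors:
            undirected[node].add(nxt)
            undirected[nxt].add(node)

    seen, components = set(), []
    for start in undirected:
        if start in seen:
            continue
        # Iterative depth-first search collecting one component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(undirected[node] - seen)
        components.append(component)
    return components


# Hypothetical reads: the first two overlap, the third shares no k-mers with them.
reads = ["ACGTAC", "GTACGG", "TTGGCCAA"]
graph = build_de_bruijn(reads, k=4)
components = partition_components(graph)
print(len(components), "disjoint graphs")
```

Each component could then be processed independently, which is the property that lets the per-graph step run in parallel. Real assemblers also handle sequencing errors, reverse complements, and read pairing, all of which this sketch leaves out.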
10
11
12 Trinity can be referenced as:
13
14 Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat Biotechnol. 2011 May 15;29(7):644-52. doi: 10.1038/nbt.1883. PubMed PMID: 21572440.
15
16