A3 mRNA Variation

From Geuvadis MediaWiki

2012-04-30 Gene Discovery Across A Population

by M.Sammeth


We used the Flux Capacitor [1] approach to estimate the fraction of genes expressed across a population as a function of sequencing depth and of the number of samples. Of the ~54,000 genes annotated in the GENCODE v12 annotation [2], we found 13,670 ± 721 genes per sample expressed above 0.5 RPKM (reads per kilobase per million mapped reads [3]); this number depends primarily on the sequencing depth, and the threshold was chosen ad hoc to filter out spurious quantifications. Employing this approach and threshold, we next assessed the behaviour of the gene discovery rate within each of the populations (i.e., YRI, CEU, FIN, GBR and TSI). The discovery rate exhibits a point of inflection at ~16,000 detected genes, which is reached with <1 billion reads of sequencing depth in each population. Beyond 1 billion sequenced reads, we observe a slow but steady gain of newly detected genes that corresponds to ~4 genes per 10,000,000 sequenced reads or, counted per sample, to ~20 genes per additional sample.

Gene Discovery Rates in the 5 Populations

Genes discovered by sequencing more individuals from the 5 populations
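
The thresholding and cumulative counting described above can be illustrated with a minimal Python sketch. It is not the Flux Capacitor implementation: the function names (`rpkm`, `cumulative_discovery`), the input layout (one dictionary of gene ID to RPKM per sample) and the toy values are assumptions made purely for illustration. Only the RPKM definition [3] and the 0.5 RPKM cutoff are taken from the text.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def cumulative_discovery(samples, threshold=0.5):
    """Count distinct genes detected after adding each sample in turn.

    samples   -- list of dicts mapping gene ID -> RPKM (hypothetical layout)
    threshold -- a gene counts as expressed if its RPKM is above this value
    """
    seen = set()
    curve = []
    for expression in samples:
        seen.update(g for g, value in expression.items() if value > threshold)
        curve.append(len(seen))
    return curve


# Toy data (invented): three samples, three genes.
toy_samples = [
    {"geneA": 12.0, "geneB": 0.2, "geneC": 0.0},
    {"geneA": 9.5, "geneB": 0.7, "geneC": 0.1},
    {"geneA": 11.0, "geneB": 0.4, "geneC": 3.0},
]

if __name__ == "__main__":
    print(rpkm(500, 2000, 10_000_000))        # 25.0
    print(cumulative_discovery(toy_samples))  # [1, 2, 3]
```

The slope of the resulting curve between consecutive samples is the per-sample gain discussed in the text; plotting the same counts against the cumulative number of reads instead of the sample index would give the per-read view.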

References

[1] Transcriptome genetics using second generation sequencing in a Caucasian population (2010) Montgomery SB, Sammeth M, Gutierrez-Arcelus M, Lach RP, Ingle C, Nisbett J, Guigo R, Dermitzakis ET. Nature
[2] GENCODE: producing a reference annotation for ENCODE (2006) Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D, Rossier C, Ucla C, Hubbard T, Antonarakis SE, Guigo R. Genome Biology
[3] Mapping and quantifying mammalian transcriptomes by RNA-Seq (2008) Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Nature Methods

2012-05-24 New Plots

by M.Sammeth

Permutation Test

To reduce the dependence of the slope on the order in which samples are added, we repeated the above analysis with the individuals randomly permuted (30 permutations in this test, on the CEU dataset) and show the cumulative gain contributed by each additional individual as a boxplot.

CEU p30 test.png
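
A minimal Python sketch of such a permutation procedure is given here, under stated assumptions: each individual is represented as a set of gene IDs already filtered for expression, and the function name `gains_per_position`, the fixed random seed and the toy data are hypothetical. It collects, for every position in the sample order, the number of newly detected genes across all permutations; each of these lists is what a single box in a boxplot would summarize.

```python
import random


def gains_per_position(samples, n_permutations=30, seed=0):
    """Collect the gain in newly detected genes at each position of the order.

    samples        -- list of sets of expressed gene IDs, one per individual
    n_permutations -- number of random orderings to evaluate
    seed           -- seed for reproducibility (an assumption of this sketch)

    Returns a list with one entry per position; entry i holds the gains
    observed at position i over all permutations.
    """
    rng = random.Random(seed)
    gains = [[] for _ in samples]
    for _ in range(n_permutations):
        order = list(samples)
        rng.shuffle(order)  # random order of individuals
        seen = set()
        for position, genes in enumerate(order):
            gains[position].append(len(genes - seen))  # genes not seen before
            seen |= genes
    return gains


# Toy data (invented): three individuals, four genes in total.
toy_individuals = [{"a", "b"}, {"b", "c"}, {"a", "b", "c", "d"}]

if __name__ == "__main__":
    for position, values in enumerate(gains_per_position(toy_individuals), 1):
        print(position, sorted(values))
```

The per-position lists can be handed to any boxplot routine, for example matplotlib's `boxplot`, which accepts a list of sequences. Within one permutation the gains always sum to the size of the union of all gene sets, which is a convenient sanity check.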
