A3 mRNA Variation
2012-04-30 Gene Discovery Across A Population
We used the Flux Capacitor approach to estimate the fraction of genes expressed across a population as a function of sequencing depth and of the number of samples. Of the ~54,000 genes annotated in the GENCODE v12 annotation, we found 13,670 ± 721 genes per sample expressed above 0.5 RPKM (reads per kilobase per million mapped reads), with the count depending primarily on sequencing depth; the threshold was chosen ad hoc to filter out spurious quantifications. Using this approach, we next assessed the behaviour of the gene discovery rate within each of the populations (i.e., YRI, CEU, FIN, GBR and TSI). The discovery curve exhibits an inflection point at ~16,000 detected genes, which is reached at a sequencing depth of <1 billion reads in each population. Beyond 1 billion sequenced reads, we observe a slow but steady gene discovery rate of ~4 genes per 10,000,000 sequenced reads or, expressed per sample, ~20 genes per additional sample.
Gene Discovery Rates in the 5 Populations
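The discovery curve described above can be read as a running count of distinct genes that pass the RPKM threshold as samples are added one by one. The following is a minimal sketch of that bookkeeping, not the pipeline actually used: the function names, the dictionary layout (gene ID mapped to RPKM per sample) and the toy values are assumptions made for illustration only.

```python
# Illustrative sketch only: names and data layout are hypothetical.

def expressed_genes(rpkm_by_gene, threshold=0.5):
    """Return the set of gene IDs whose RPKM is strictly above the threshold."""
    return {gene for gene, rpkm in rpkm_by_gene.items() if rpkm > threshold}


def cumulative_discovery(samples, threshold=0.5):
    """Count distinct expressed genes seen after adding each sample in order."""
    seen = set()
    curve = []
    for rpkm_by_gene in samples:
        seen |= expressed_genes(rpkm_by_gene, threshold)
        curve.append(len(seen))
    return curve


# Toy quantifications for three samples (made-up values, not real data).
toy_samples = [
    {"geneA": 3.2, "geneB": 0.1, "geneC": 0.7},
    {"geneA": 2.9, "geneB": 0.6, "geneC": 0.0},
    {"geneA": 4.0, "geneD": 1.5},
]

curve = cumulative_discovery(toy_samples)
print(curve)  # [2, 3, 4]
```

Plotting such a curve against the cumulative number of sequenced reads, rather than against the sample index, would give the depth-dependent view of the same quantity.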
Transcriptome genetics using second generation sequencing in a Caucasian population (2010) Montgomery SB, Sammeth M, Gutierrez-Arcelus M, Lach RP, Ingle C, Nisbett J, Guigo R, Dermitzakis ET. Nature
GENCODE: producing a reference annotation for ENCODE (2006) Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D, Rossier C, Ucla C, Hubbard T, Antonarakis SE, Guigo R. Genome Biology
Mapping and quantifying mammalian transcriptomes by RNA-Seq (2008) Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Nature Methods
2012-05-24 New Plots
To reduce the dependence of the slope on the order in which samples are added, we repeated the above analysis on permuted orderings of the individuals (30 permutations of the CEU dataset in this test) and show the cumulative gain contributed by each additional individual as a boxplot.
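One way to picture the permutation step is sketched below, under the assumption that each individual has already been reduced to the set of genes expressed above the threshold. The function name, the fixed random seed and the toy gene sets are hypothetical and only serve to show how the per-position gains for a boxplot could be collected.

```python
import random


def gains_per_position(gene_sets, n_permutations=30, seed=0):
    """For each position in the sample order, collect the number of newly
    discovered genes over several random permutations of the individuals."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(gene_sets)
    gains = [[] for _ in range(n)]
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        seen = set()
        for position, idx in enumerate(order):
            new_genes = gene_sets[idx] - seen
            gains[position].append(len(new_genes))
            seen |= new_genes
    return gains


# Toy per-individual sets of expressed genes (made-up, not real data).
toy_sets = [{"geneA", "geneC"}, {"geneA", "geneB"}, {"geneA", "geneD"}]

gains = gains_per_position(toy_sets, n_permutations=30)
# gains[k] holds 30 values: the gain from the (k+1)-th individual added.
print([len(g) for g in gains])  # [30, 30, 30]
```

Each inner list corresponds to one box: the distribution of gains at that position across permutations. The list of lists could be passed to a plotting routine such as matplotlib's `boxplot`, which accepts a sequence of value vectors.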