Percentage Splicing Index

From Geuvadis MediaWiki



In this section we explore the PSI (Percentage Splicing Index), which gives the inclusion level of each exon. This measure was computed following [1,2].


  • Calculating the PSI values:

To determine inclusion levels for internal exons, we used an approach similar to the one in [1,2], based on three types of read information per exon: A) the number of reads that map to the exon body; B) the number of split-mapped reads that map to splice junctions supporting the inclusion of the exon; C) the number of split-mapped reads that map to splice junctions supporting the exclusion of the exon (any junction in the gene that skips the exon).


Figure 1: Schematic representation of inclusion and exclusion reads. For the internal exon in this plot, the inclusion reads are shown in red: the reads that map to the exon body (A) and the reads that fall on the splice junctions supporting exon inclusion (B). The split-mapped reads that span the exon, and therefore support its exclusion (C), are shown in green.

According to the scheme in Figure 1, the internal exon has 7 reads of type A) mapping to the exon body, 2+3 split reads of type B) that fall on splice junctions supporting exon inclusion, and 4 reads of type C) that support exon exclusion.

We can then calculate the Percentage Splicing Index (PSI) as the ratio between the inclusion reads and the sum of inclusion and exclusion reads,

PSI = # inclusion_reads / (# inclusion_reads + # exclusion_reads) or

PSI = (A + B) / (A + B + C)

A PSI value of 1 means that the exon is fully included; at the other extreme, a value of 0 means that the exon is never included.
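As a minimal sketch of the formula above, the following Python function computes PSI from the three read counts and applies it to the counts read off Figure 1. The function name, the argument names and the choice to return None when no reads are available are assumptions made for illustration; they are not taken from the original pipeline.

```python
from typing import Optional


def psi(exon_body_reads: int,
        inclusion_junction_reads: int,
        exclusion_junction_reads: int) -> Optional[float]:
    """Return PSI = (A + B) / (A + B + C), or None if no reads are available."""
    inclusion = exon_body_reads + inclusion_junction_reads  # A + B
    total = inclusion + exclusion_junction_reads            # A + B + C
    if total == 0:
        # Assumption: an undefined PSI is reported as None here.
        return None
    return inclusion / total


# Counts from Figure 1: A = 7, B = 2 + 3, C = 4
figure1_psi = psi(7, 2 + 3, 4)
print(figure1_psi)  # 12 / 16 = 0.75
```

For the exon in Figure 1 this gives (7 + 5) / (7 + 5 + 4) = 0.75, i.e. the exon is included in three quarters of the supporting reads.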

NOTE: For some exons in some samples, exon read counts or junction read counts are not available. In those cases a label is added instead of a PSI value (no_gene, no_junction). Below we discuss some analyses in which those cases were filtered out.

  • Differentially Included Exons:

We also explored differential exon inclusion between the different populations by performing pair-wise population comparisons of the PSI values. For each exon we applied a Mann-Whitney test between the two sets of PSI values, followed by Benjamini-Hochberg (BH) correction. An exon was considered significantly differentially included if it met the following three conditions:

1) exon length > 150bp (with this we guarantee a minimum number of reads within the exon body);

2) adjusted p-value < 0.05 (after BH correction);

3) median diff ≥ 0.1 (absolute difference in the median of the two sets is at least 0.1);
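The three conditions above can be sketched in Python as follows. This is only an illustration under stated assumptions: the raw p-values are assumed to come from a Mann-Whitney test run elsewhere (for example with scipy.stats.mannwhitneyu), and the function names, argument names and input layout are hypothetical rather than those of the original analysis.

```python
from statistics import median


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differentially_included(exon_length, adjusted_p, psi_pop1, psi_pop2,
                               min_length=150, alpha=0.05, min_diff=0.1):
    """Apply conditions 1) to 3) to one exon and one population pair."""
    median_diff = abs(median(psi_pop1) - median(psi_pop2))
    return (exon_length > min_length      # 1) exon length > 150 bp
            and adjusted_p < alpha        # 2) adjusted p-value < 0.05
            and median_diff >= min_diff)  # 3) median difference >= 0.1
```

Here bh_adjust would be applied once to the p-values of all exons tested in a given population pair, and is_differentially_included then checked per exon with the corresponding adjusted p-value.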


Files with the PSI values can be found here:



From a total of 199798 internal exons, 175210 exons have at least one PSI value, 64120 exons have a PSI value for all samples (464 samples), and 62689 exons have a PSI value for all samples (667 samples).

  • 175210 exons with at least one PSI value

        CEU   FIN   GBR   TSI   YRI
CEU      #     15    33    15   165
FIN      #      #    21     8   158
GBR      #      #     #     8   154
TSI      #      #     #     #   132

  • 64120 exons with a PSI value for all samples

        CEU   FIN   GBR   TSI   YRI
CEU      #      6    13     7    68
FIN      #      #    10     3    59
GBR      #      #     #     4    57
TSI      #      #     #     #    57

We then selected exons with a minimum variability (314 exons with a PSI standard deviation > 0.15) and performed hierarchical clustering to investigate whether the samples show any particular inclusion/exclusion pattern.


Figure 2: Hierarchical clustering on the 314 exons with a minimum variability across all samples.
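The variability filter applied before clustering could look roughly like the sketch below. It assumes that each exon has a complete list of PSI values (one per sample) and uses the sample standard deviation; the text does not say whether the sample or population estimator was used, and the function name and input layout are hypothetical. The clustering step itself is not shown.

```python
from statistics import stdev


def select_variable_exons(psi_by_exon, min_sd=0.15):
    """Return the ids of exons whose PSI standard deviation exceeds min_sd.

    psi_by_exon maps an exon id to its list of PSI values across samples
    (assumed complete, with at least two samples).
    """
    return [exon_id
            for exon_id, values in psi_by_exon.items()
            if stdev(values) > min_sd]


# Hypothetical PSI values for two exons in four samples
toy_psi = {
    "exon_1": [0.10, 0.90, 0.50, 0.20],  # variable inclusion
    "exon_2": [0.90, 0.92, 0.95, 0.91],  # nearly constitutive
}
print(select_variable_exons(toy_psi))  # ['exon_1']
```

The resulting exon-by-sample PSI matrix, restricted to the selected exons, is what would be passed on to a hierarchical clustering routine.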

Brief Discussion

By analyzing the percentage splicing index of internal exons with sufficient read counts in the exon body and exon junctions, we do find certain inclusion/exclusion patterns. Pair-wise population comparisons show a relatively small number of significantly differentially included exons. In principle one would expect this, since all the samples come from the same tissue. On the other hand, the high number of tests may result in a stringent multiple-testing correction, which also reduces the number of exons that reach significance.


[1] Wang, E.T. et al., "Alternative isoform regulation in human tissue transcriptomes", Nature, 2008.

[2] Shapiro, I.M. et al., "An EMT-driven alternative splicing program occurs in human breast cancer and modulates cellular phenotype", PLoS Genetics 7, 2011.
