Genetic variation data
Of our 465 samples, 423 are part of the 1000 Genomes Phase 1 release, which provides phased haplotypes of SNPs, indels, and large deletions. The samples that are not in Phase 1 will be imputed from Omni 2.5 haplotypes (by Natalja; see the imputation page for details). The initial Geuvadis data available for use is from the January release, with additional filtering of low-quality indels. The final genotype files will be from the March v3 release, which will also be the official dataset for the 1000 Genomes Phase 1 paper.
We will produce a single VCF file for all the Geuvadis samples, and other formats as necessary. In addition, Tuuli will produce supplementary data on the variants, such as allele frequencies, conservation scores, and functional annotations. These will be uploaded to the FTP site.
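As a rough illustration of how an allele frequency can be derived from the genotype columns of a VCF file, the following minimal Python sketch counts non-reference alleles on a single data line. It is not the pipeline used to produce the Geuvadis files. The function name, the variant ID `var1`, and the example genotypes are hypothetical, and the sketch assumes a biallelic site with a `GT` entry in the FORMAT column.

```python
def alt_allele_frequency(vcf_line):
    """Return (variant ID, ALT allele frequency) for one VCF data line.

    Assumes a tab-separated line with a GT entry in the FORMAT column.
    Missing alleles (".") are skipped; any non-"0" allele counts as ALT.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    variant_id = fields[2]
    gt_index = fields[8].split(":").index("GT")

    alt_count = 0
    called = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Phased genotypes use "|", unphased ones use "/".
        for allele in genotype.replace("/", "|").split("|"):
            if allele == ".":
                continue
            called += 1
            if allele != "0":
                alt_count += 1

    frequency = alt_count / called if called else None
    return variant_id, frequency


# Hypothetical example line with four samples (made-up values).
example_line = "\t".join(
    ["1", "1000", "var1", "G", "A", "100", "PASS", ".", "GT",
     "0|0", "0|1", "1|1", "0|0"]
)
print(alt_allele_frequency(example_line))  # ('var1', 0.375)
```

Three of the eight called alleles in the example are non-reference, hence the frequency of 0.375.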
SNP genotype quality in the Phase 1 release is generally very good, especially in the exonic regions, which have been sequenced at high coverage (the 1000 Genomes exome target region covers 96% of CCDS bases). Some caution is advised regarding indels and structural variants, which contain more false positives.
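One simple way to act on this is to analyse SNPs separately from indels and structural variants. The sketch below classifies a variant from its REF and ALT alleles. It is only an illustration: the function name and class labels are hypothetical, and it assumes that large deletions appear as symbolic alleles in angle brackets (such as `<DEL>`), as the VCF format allows.

```python
def variant_class(ref, alt):
    """Classify a biallelic variant by its REF and ALT alleles.

    Returns "sv" for symbolic alleles such as <DEL>, "snp" for single-base
    substitutions, "indel" when the allele lengths differ, else "other".
    """
    if alt.startswith("<") and alt.endswith(">"):
        return "sv"
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "other"


# Made-up REF/ALT pairs, one per class.
for ref, alt in [("A", "G"), ("AT", "A"), ("A", "<DEL>"), ("AT", "GC")]:
    print(ref, alt, variant_class(ref, alt))
```

Variants labelled "indel" or "sv" could then be held to stricter filters, or checked in a sensitivity analysis, before being included in downstream results.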
Further information about the variants is available from the 1000 Genomes Project.