Project Detail
Biomedical anthropological study in Arabian Peninsula based on high throughput genomics.
Verónica Fernandes (PI), Farida Alshamali, Luísa Pereira, Marisa Oliveira
Dates and Lifetime
From: 2016-07-01 To: 2019-12-31
Duration: 42 months



Project nº: 016609


Project title: Biomedical anthropological study in Arabian Peninsula based on high throughput genomics


Intervention Region:







Principal Contractor: Instituto de Patologia e Imunologia Molecular (IPATIMUP/UP)



Approval Date: 04-08-2016

Start Date: 01-06-2016

End Date: 31-12-2019 



Budget: 194.026,00€

FEDER: € 164.922,10

OE: € 29.103,90






Project description:


Biomedical anthropology (BA) was coined to define the multidisciplinary study of disease processes and their impact on populations, integrating approaches from physical anthropology, human biology, genetics and ecology, molecular medicine, nutrition and medical care. The technological developments of the 21st century, which allow high-throughput characterisation of millions of polymorphisms and even whole-exome (WES) and whole-genome (WGS) sequences at the population level, are again enabling a leap in BA research by introducing a hypothesis-generating approach (HGA). A broad genetic scan of healthy individuals now makes it possible to identify variants conferring susceptibility or resistance to disease in populations. This approach takes into account demographic parameters (migration, expansion, bottleneck, admixture and panmixia), genetic ones (mutation, selection and linkage disequilibrium, LD) and cultural ones (language, subsistence system, mating behaviour and inbreeding).


The Arabian Peninsula (AP) is paradigmatic in the context of HGAs, particularly for the study of the past. It is the only region worldwide where a genetics-driven hypothesis motivated the first archaeological surveys. Genetics showed that AP was the first outpost of the successful out-of-Africa (OOA) migration around 60 thousand years ago (ka). Our team has published mitochondrial DNA (mtDNA) and Y-chromosome evidence of the main role played by AP at the crossroads between Africa, Asia and Europe, being the cradle of the structure defining these main human population groups and a continuing path for their admixture. Population-structuring events persist in present-day Arabia: nationals and recent immigrants form separate communities; Arabs have a strong and extensive family structure and practise consanguinity between close relatives; and a few population isolates persist in the region, such as the nomadic Bedouins. These characteristics raise the statistical power of high-throughput genomic HGA, rendering AP the best place to perform a BA study addressing the impact of the encounter between the 3 main population groups. Our approach will consist of 2 steps. The first is a genome-wide characterisation of 700,000 (0.7M) single nucleotide polymorphisms (SNPs) in 400 samples, aiming to: (1) evaluate the population structure at a fine-resolution scale across AP; (2) measure the proportions of population admixture between African, Asian and European pools; (3) evaluate the level of identity-by-descent (IBD), which informs about consanguinity; (4) search for signs of selection across the genome. The second step is to perform WES in 100 individuals selected from the 4 populations (25 each) based on the chip results. The WES will inform about the Arab genetic diversity associated with Mendelian and complex diseases, such as hemolytic disorders associated with infectious agents and metabolic disorders such as obesity, diabetes and hypertension.


The project will contribute insights into the history of this fundamental bridge between continents. Having been the first stepping stone on the OOA path, AP populations are the direct descendants of the ancestral non-African population, and studying them will help to understand the distribution of human genetic and cultural diversity across the world, especially in the Great Mediterranean region. This pool will reveal many SNPs of functional importance, leading to the discovery of new disease processes.




Project summary:


In this project, we will explore the statistical power conferred by highly structured and consanguineous populations to identify functionally important variants when performing an HGA. Populations from AP are the best candidates for such a BA study, since they are the descendants of the first non-African human population, enriched by thousands of years of admixture between the old African and the new European and Asian genomic pools. Variants easily detected in AP will shed light on disease mechanisms common in the Great Mediterranean, owing to shared ancestry and to similar infectious endemicity resulting from the continuous mobility of populations in this region since prehistoric times.


We will begin by performing a genome-wide characterisation of 0.7M SNPs in 100 individuals from each of the populations (Saudi Arabia, Yemen, Oman and UAE), for a total of 400 individuals. We will calculate Wright's inbreeding coefficient (f), IBD and runs of homozygosity (ROH) to evaluate consanguinity in the populations. Usually, in Arab populations, more than 10% of the sample have an inbreeding coefficient higher than that of offspring of first cousins (f = 0.125). Close relatives share large identity-by-descent blocks, or homozygous blocks, which have been demonstrated to be enriched for deleterious variation. Thus, it is easier to detect functionally important variants in highly consanguineous populations than in panmictic ones. We will compare the distributions of these consanguinity measures in Arabian populations versus panmictic ones.
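As an illustration of the ROH measure mentioned above, the following is a minimal sketch, not the project's actual pipeline. It assumes a hypothetical genotype coding (0 or 2 = homozygous, 1 = heterozygous) and a simple rule that a run is any stretch of at least `min_snps` consecutive homozygous SNPs; dedicated tools such as PLINK apply more elaborate window-based criteria. The function names and the toy data are assumptions made for the example.

```python
from typing import List, Tuple


def find_roh(genotypes: List[int], positions: List[int],
             min_snps: int = 3) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) for each run of consecutive homozygous SNPs.

    Assumed coding: 0 or 2 = homozygous, 1 = heterozygous.
    """
    runs = []
    start = None  # index of the first SNP in the current homozygous run
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i
        else:
            # a heterozygous call ends the current run
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1]))
            start = None
    # close a run that extends to the last SNP
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((positions[start], positions[-1]))
    return runs


def f_roh(runs: List[Tuple[int, int]], genome_length_bp: int) -> float:
    """Fraction of the analysed genome covered by ROH (a genomic inbreeding estimate)."""
    return sum(end - start for start, end in runs) / genome_length_bp


# Toy example: 10 SNPs spaced 100 bp apart.
toy_genotypes = [0, 2, 2, 1, 0, 0, 0, 2, 1, 2]
toy_positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
toy_runs = find_roh(toy_genotypes, toy_positions)  # [(100, 300), (500, 800)]
toy_f = f_roh(toy_runs, genome_length_bp=1000)     # 0.5
```

Comparing the distribution of such per-individual values between Arabian and panmictic samples is one simple way to express the contrast described in the paragraph above.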


Then we will evaluate the population structure at a fine-scale resolution across the Peninsula, and compare it with Near Eastern, African, Asian and European populations, through ADMIXTURE, PCA and Fst distances. This will provide: (1) definitive insights into our advanced Gulf Oasis model, as we will have a reasonable number of individuals screened all over AP; (2) a check of whether these Arabian populations also comprise 3 main groups, as observed in Qatar and Kuwait. We will run RFMix analysis, which infers the local ancestry along the mosaic admixed chromosomes from putative ancestral panels of haplotypes, and we will identify which genes are located in regions whose ancestry deviates significantly from the average mixed profile in the population. For instance, a known frequent haplotype in West Africa that protects against a pathogen could also be at increased frequency in the Yemeni population, testifying to a similar selective pressure acting in the two regions.
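To make the Fst distance mentioned above concrete, here is a minimal sketch for a single biallelic SNP in two populations, using Wright's definition Fst = (H_T - H_S) / H_T with equal population weights. This is a textbook simplification chosen for illustration; genome-wide analyses normally use estimators that correct for sample size and combine many SNPs. The function name and the example frequencies are assumptions.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Wright's Fst for one biallelic SNP given allele frequencies in two populations."""
    p_bar = (p1 + p2) / 2
    # expected heterozygosity in the pooled (total) population
    h_t = 2 * p_bar * (1 - p_bar)
    # mean expected heterozygosity within the two populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_t == 0:
        return 0.0  # SNP is monomorphic in both populations
    return (h_t - h_s) / h_t


same = wright_fst(0.5, 0.5)    # 0.0: identical frequencies, no differentiation
fixed = wright_fst(0.0, 1.0)   # 1.0: alternative alleles fixed in each population
middle = wright_fst(0.2, 0.8)  # approximately 0.36
```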


Another way of identifying putative functional regions is to search directly for signals of selection across the genome, identify the genes located in those regions and their functions, and compare the results with those observed in populations from other continents. For this, we will analyse the pattern of LD, which means that a particular SNP allele at one site tends to be found together with specific alleles at nearby variant sites. LD is known to vary along the genome, and it can help to identify natural selection in some genes: a decrease in diversity can be observed in regions linked to a beneficial mutation (selective sweep), while balancing selection can lead to high diversity (several haplotypes at relatively high frequency, as observed in the human major histocompatibility complex, which is related to infections). A different pattern of selection identified in our Arabian Peninsula populations could reveal local adaptation driven by climate or environment. Immunological genes are usually very prone to be under selection, as they are essential in the adaptation to new environments, as are genes in pathways important for the exploitation of food resources, such as those related to lactose intolerance and to sugar and salt processing (important for obesity, diabetes and hypertension).
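The notion of LD described above can be illustrated with the standard pairwise statistics D and r² for two biallelic SNPs. The sketch below assumes phased haplotypes coded as (allele at SNP A, allele at SNP B) with 0/1 values; the function name and toy haplotypes are hypothetical. Scans for selection rely on more elaborate haplotype-based statistics, so this only shows the underlying quantity.

```python
from typing import List, Tuple


def ld_stats(haplotypes: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (D, r2) for two biallelic SNPs from phased 0/1 haplotypes."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n                # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n                # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                                   # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0               # undefined if a SNP is monomorphic
    return d, r2


# Alleles always travel together: complete LD (D = 0.25, r2 = 1.0).
linked = ld_stats([(1, 1), (1, 1), (0, 0), (0, 0)])
# All four haplotypes equally frequent: no LD (D = 0.0, r2 = 0.0).
unlinked = ld_stats([(1, 1), (1, 0), (0, 1), (0, 0)])
```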


The results from the chip analyses will be used to inform the selection of 25 individuals from each of the 4 populations for WES. WES provides information on all exons, the parts of the genes that encode proteins and thus the most important coding regions of the genome, which contain most of the disease-causing variants. We expect to find many new variants, as most of the diversity in human populations is very recent and shared by only a few individuals.


We will check whether the identified coding variants were previously reported as variants causing Mendelian genetic disorders. We will map the allele frequencies of these variants across the Mediterranean region, based on data from our AP populations, the 1000 Genomes and HGDP databases and, if needed, our own screening of the Portuguese, Moroccan and Tunisian samples in our collection.


To check whether the AP populations bear known causal variants of complex diseases, we will apply a strategy similar to, by compiling SNPs from genome-wide association studies. Complex diseases are usually caused by several variants that individually make a small contribution to the disease, in contrast to Mendelian diseases, which are usually caused by a single variant that is easier to identify as the causal one. We will construct maps of risk for complex diseases, but enlarged to the Great Mediterranean.
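One simple way to turn compiled GWAS SNPs into a per-population value that could be placed on a map is an expected additive risk score. The sketch below is an assumption made for illustration and is not the method stated in the project: it sums 2 × (risk-allele frequency) × (effect size) over SNPs, where the effect size stands for a reported log odds ratio. The SNP identifiers, frequencies and effect sizes are invented placeholders.

```python
from typing import Dict


def expected_risk_score(risk_allele_freqs: Dict[str, float],
                        effect_sizes: Dict[str, float]) -> float:
    """Expected additive score for a population: sum over SNPs of 2 * p * beta.

    Only SNPs present in both dictionaries contribute to the score.
    """
    return sum(2 * risk_allele_freqs[snp] * beta
               for snp, beta in effect_sizes.items()
               if snp in risk_allele_freqs)


# Hypothetical effect sizes (log odds ratios) for two placeholder SNPs.
effects = {"snpA": 0.2, "snpB": 0.1}
# Hypothetical risk-allele frequencies in two placeholder populations.
pop_x = {"snpA": 0.5, "snpB": 0.25}
pop_y = {"snpA": 0.1, "snpB": 0.25}

score_x = expected_risk_score(pop_x, effects)  # approximately 0.25
score_y = expected_risk_score(pop_y, effects)  # approximately 0.09
```

Because the score depends only on allele frequencies, it can be computed for any population with frequency data, which is what makes a geographic comparison possible.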


Our populations also have the power to provide insights into new variants potentially linked with complex diseases. The use of genome-wide data to analyse admixed populations has been widely explored in American populations in the context of complex diseases. Selected blocks dominated by one main ancestry (for instance, African), instead of the expected random mix of ancestries, will be surveyed carefully for potentially pathogenic variants. We will also analyse all variants present within disease-causing genes, even when the function of the variants is not yet known. These variants will be evaluated with gene enrichment tools, which consider their involvement in pathways and give useful insights into disease mechanisms.
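The idea of flagging blocks whose ancestry departs from the expected mix can be sketched as a z-score over genomic windows. The example below assumes that the mean proportion of one ancestry (for instance, African) has already been computed per window across individuals, for example from local-ancestry output; the function name, the threshold of 2.5 and the toy values are assumptions made for illustration, and a real analysis would need an appropriate significance procedure.

```python
from statistics import mean, pstdev
from typing import List


def flag_ancestry_outliers(window_ancestry: List[float],
                           z_threshold: float = 2.5) -> List[int]:
    """Return indices of windows whose ancestry proportion deviates strongly
    from the genome-wide average (absolute z-score above the threshold)."""
    mu = mean(window_ancestry)
    sd = pstdev(window_ancestry)
    if sd == 0:
        return []  # no variation between windows, nothing to flag
    return [i for i, value in enumerate(window_ancestry)
            if abs(value - mu) / sd > z_threshold]


# Toy data: nine windows at 20% of the ancestry of interest, one at 80%.
toy_windows = [0.2] * 9 + [0.8]
outliers = flag_ancestry_outliers(toy_windows)  # [9]
```

Genes falling inside the flagged windows would then be the candidates to survey for potentially pathogenic variants.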


Furthermore, the African blocks identified in WES will be analysed phylogenetically, to evaluate whether it is possible to distinguish between the various migrations that may have introduced them into AP, and thereby to identify the “fossils” of the OOA.


