Domestication invariably reduces population size and genetic diversity, leaving domesticated animals more vulnerable to the accumulation of deleterious mutations. As these populations inbreed and mutations accumulate, they can cause disease, developmental disorders and infertility, which makes domesticated animals ideal models for rare human disorders. Most studies focus on the ~2% of the genome that encodes proteins, mainly because functional non-coding elements are difficult to identify. This leaves aside the non-coding regions, sometimes called ‘the dark genome’, which harbour functional regulatory sequences. The majority of disease- or trait-associated variants are found in these non-coding, often conserved, sequences.
The aim of the project is to use a unique resource of 252 mammalian genomes, along with population data, to assess the extent to which population sizes have been reduced and deleterious mutations have accumulated. You will apply the latest developments in machine learning to transfer models and information from well-studied organisms (human, mouse) to genomes with scarce resources, identifying functional elements and predicting the impact of mutations within them. Using recently developed protocols, you will be able to validate the computational predictions experimentally.
We offer a highly collaborative PhD project between three research groups (Haerty: bioinformatics, Immler: population genetics, Patron: synthetic biology) that brings in international expertise in machine learning to combine computational and molecular biology. You will gain expertise in computational biology, large-scale data analysis, machine learning, genetics and molecular biology.
The project will be conducted at the Earlham Institute (EI), a world-leading research centre for bioinformatics and sequencing technology development. You will have access to training and career development opportunities at EI and on the Norwich Research Park as part of the Norwich Biosciences Doctoral Training Partnership.