Deep learning assisted metagenomics to explore microbiomes

HILDEBRAND_Q23DTP2

Microbial communities such as those found in the human gut or soil can host hundreds to tens of thousands of microbial species, the vast majority of which remain unexplored.

Metagenomics, the random sequencing of all DNA in a sample, can be used to reconstruct microbial genomes from such communities, relying on advanced assembly and machine learning algorithms.

In this project you will investigate the application of different deep neural network algorithms to a) interrogate genes representing typical functions found in soil, ocean and human gut microbiomes, b) reconstruct genomes from metagenomes, and c) ultimately understand which gene functions typically occur together in genomes and communities, building a probabilistic pangenome graph for microbial communities in different environments and conditions.

For this project, training in machine learning, ecology, high-performance computing, computational biology, statistics, DNA sequencing and metagenomics will be provided.

The ideal candidate will have experience in computational biology (Linux environment; Python, R or C++).

Knowledge of topics such as metagenomics, genomics, numerical ecology and machine learning (deep learning) is a plus, but training in these areas will be provided during the PhD.

Attending international conferences and taking part in exchanges with international collaborators are part of the student's training.

The candidate will be supervised by Dr Hildebrand, Dr Leggett and Dr Quince (Quadram and Earlham Institutes) in Norwich. Both Institutes are part of the large, multinational Norwich Research Park, which hosts a vibrant and active research community adjacent to the University of East Anglia.

Norwich is a mid-sized medieval city with a large student community, situated near the Norfolk coast.

For further information visit:
https://falk.science
https://www.earlham.ac.uk/profile/richard-leggett
https://www.earlham.ac.uk/profile/chris-quince

References

1. Kubinski, R. et al. Benchmark of Data Processing Methods and Machine Learning Models for Gut Microbiome-Based Diagnosis of Inflammatory Bowel Disease. Frontiers in Genetics 13, 2021.05.03.442488 (2022).

2. Coelho, L. P. et al. Towards the biogeography of prokaryotic genes. Nature (2021) doi:10.1038/s41586-021-04233-4.

3. Bahram, M. et al. Metagenomic assessment of the global diversity and distribution of bacteria and fungi. Environmental Microbiology 23, 316–326 (2021).

4. Maistrenko, O. M. et al. Disentangling the impact of environmental and phylogenetic constraints on prokaryotic within-species diversity. The ISME Journal 14, 1247–1259 (2020).

5. Frioux, C., Singh, D., Korcsmaros, T. & Hildebrand, F. From bag-of-genes to bag-of-genomes: metabolic modelling of communities in the era of metagenome-assembled genomes. Computational and Structural Biotechnology Journal 18, 1722–1734 (2020).