Using machine learning to tell the time

HALLA_E22DTP

This PhD project aims to use machine learning to accurately predict complex traits in plants. Many of the phenotypes and traits that biologists measure are complex and time-consuming to assess, and they require a specialist skill set. One powerful way around this is to develop robust proxy measurements.

The project will first develop a model that uses transcriptomic data to predict internal biological time. The aim is to identify a proxy gene set whose expression accurately predicts the biological time at which a sample was taken. Over the last 10 years, transcriptomic analysis has become a simple, robust and relatively cheap assay. The proxy gene set will then be applied across public transcriptomic data sets to investigate how genotype and environment affect biological time.

The work will build on a paper published in 2021 (Gardiner et al. 2021), which applied machine learning to predict complex temporal circadian gene expression patterns in the model plant Arabidopsis thaliana.

The project will go on to develop proxy gene sets for other important traits, such as crop yield, resilience, disease resistance and nitrogen use efficiency. The PhD will provide training in machine learning and bioinformatics.

The student will be based at the Earlham Institute on Norwich Research Park, a centre of excellence for genomics and data-driven research with cutting-edge sequencing and computing facilities and excellent scientific training.

References

Gardiner L-J, Rusholme-Pilcher R, Colmer J, Rees H, Crescente JM, Carrieri AP, Duncan S, Pyzer-Knapp EO, Krishna R & Hall A (2021) Interpreting machine learning models to investigate circadian regulation and facilitate exploration of clock function. Proc Natl Acad Sci USA 118