Modelling gene expression at single-cell resolution from multi-omics data using Bayesian methods and machine learning

About this project

Project description

Gene expression regulation is multifaceted, involving intricate networks that operate at various levels: RNA, protein, and post-transcriptional regulation. High-throughput studies at bulk and single-cell resolution have generated datasets that predominantly focus on isolated aspects of gene expression, such as RNA abundance or protein levels, often neglecting the complex interplay between these layers. mRNA levels do not always correlate directly with protein abundance due to post-transcriptional modifications and regulatory mechanisms such as alternative splicing and RNA degradation.

While advances in high-throughput techniques have facilitated detailed examination of RNA expression and the characterisation of various transcript types, there remains a significant gap in understanding how these factors interact in the regulation of gene expression. Current approaches for multi-omics data focus on predicting gene expression levels from epigenomic inputs such as chromatin accessibility and DNA methylation. Developing methods to elucidate the interplay between protein levels and gene expression creates opportunities for identifying drug targets, biomarkers, and functional interactions such as ligand-receptor binding.

To address this gap, we will collect and generate paired and unpaired measurements that encompass all three levels of gene expression regulation: gene expression, proteogenomics, and proteomics data. We will use these data to develop predictive models that forecast gene expression from integrated data. We aim to apply machine learning (ML) and Bayesian algorithms to model these integrated datasets, using brain-related diseases as a use case to motivate the modelling approach. Notably, proteomic datasets are currently less abundant at single-cell resolution. The ability to predict one part of the dataset from another, such as forecasting protein expression from transcriptional data, will address a critical need in the field.

Outcomes

1. Predictive modelling: develop predictive models using public datasets such as the Allen Brain Atlas and in-house datasets.
2. Validation: validate the models using novel datasets that represent a range of biological and pathological conditions.

Information for applicants

Essential capabilities

Experience with cell culture and molecular biology. Can code in C/Java/Python/R. Proficient in mathematics/algorithms. Proficient in probability and statistics. At least 1 research publication.

Desirable capabilities

Comfortable with the Unix command line and experienced with HPC environments.

Expected qualifications (Course/Degrees etc.)

Undergraduate degree with CGPA > 8.0 or > 80% marks; Master's degree (from a CFTI) with CGPA > 8.0 in Life Sciences, Biotechnology, Computational Biology, or a similar life sciences degree.

Project supervisors

Principal supervisors

UQ Supervisor

Professor Jessica Mar

Australian Institute for Bioengineering and Nanotechnology (AIBN)

IITD Supervisor

Assistant Professor Ishaan Gupta

Department of Biochemical Engineering and Biotechnology