GenBio DTP
Admissions for the Generative Biology DTP are now open!
The Generative Biology Doctoral Training Programme (DTP) is a 4-year DPhil course, funded by the Ellison Institute of Technology (EIT), designed to train doctoral researchers in cutting-edge areas of generative biology. This programme is hosted by the Department of Chemistry with involvement from other departments in MPLS and the Medical Sciences Division, and has fully funded scholarships available for both home and overseas applicants.
The DTP forms part of the long-term strategic alliance between the University of Oxford and EIT, which aims to develop innovative solutions and foster leaders to tackle some of the greatest challenges facing humanity.
Key topics and themes will focus on fundamentals of generative biology, reflecting the breadth and depth of the research experience of the supervisory pool consisting of EIT and University of Oxford investigators. It is anticipated that research proposals will focus on the key challenges in making biology engineerable and therefore have the potential to impact, directly or indirectly, on:
1) The ability to write in the natural language of biology, and
2) The ability to understand which DNA sequences will generate biological systems that perform the desired functions.
Course structure
Students will start work on their research project from the outset, developing their research skills through hands-on learning in the laboratory of their supervisor(s). Supervisors will guide students to take full advantage of relevant lecture courses and training modules within the University that will contribute to the academic development of their research.
To supplement the academic learning objectives of the programme, cohort-level training throughout the programme will develop the transferable skills that are essential for both effective research and the world beyond academia. Students will also have the opportunity to undertake training in leadership and innovation, provided by EIT, which can be taken at intervals most appropriate for their specific research priorities.
The DTP will host regular lunchtime seminar sessions, as well as an annual symposium, where students will present their research in a high-trust environment among peers. The DTP will also organise funded retreats where students and supervisors spend a consolidated block of time together as a group, discussing science outside of the normal context and strengthening their working relationships.
How to apply
Applicants will be required to select preferred projects from two lists of projects below ('Teal' and 'Fuchsia', with at least one project chosen from each category). Projects will be listed below as they are confirmed, and by no later than 12th December 2025.
Students who are admitted to Teal Projects will primarily be based in the Generative Biology Institute (GBI) at the Ellison Institute of Technology, Oxford Science Park. Students admitted to Fuchsia Projects will primarily be based in the laboratory of their University of Oxford supervisor.
Students are encouraged to make informal contact with supervisors of the projects that they are interested in working on, before submitting their application. Students are also welcome to include a research proposal as part of their application, which can then be developed in collaboration with their supervisor(s). Areas of interest which align with the GBI’s vision and research aims are listed below.
We welcome and encourage original research proposals that are consistent with GBI’s vision and the scope of the DTP. Relevant areas include:
- Molecular and Cellular Design and Evolution
- Enzyme design
- Design of molecular assemblies and machines
- Experimental accelerated evolution
- Robotics, automation and autonomous labs
- Genome mining and informatics-based discovery
- Computational AI sequence to function
- Expanding chemistry in biology
- Scalable error-free DNA and genome synthesis
- Microbial Genome Synthesis and Design
- Gb-scale Genome Synthesis for plants, human cells and animals
- Combinatorial synthetic genomics
- DNA delivery
- Predictive models of DNA sequence to function, at the scale of genes and genomes
- Human health applications and delivery mechanisms
- Lead supervisor: Jason Chin (Principal Investigator, Generative Biology Institute, EIT & Professor of Chemistry and Chemical Biology, Department of Chemistry, University of Oxford) - chin@eit.org
- Co-supervisor: Ben Davis (Professor of Chemical Biology, Department of Pharmacology, University of Oxford) - ben.davis@pharm.ox.ac.uk
The Chin group pioneers 1) the development and application of genome design and synthesis methods and 2) the combination of these approaches with cellular engineering for the encoded cellular synthesis of new polymers and materials. Recent examples are given in the publications below. Within these broad areas, or other areas within the scope of GBI, students are encouraged to propose their own project ideas as part of their application.
- Dunkelmann, D.L., Piedrafita, C., Dickson, A. et al. Adding α,α-disubstituted and β-linked monomers to the genetic code of an organism. Nature 625, 603–610 (2024).
- Robertson, W. E. et al. Sense codon reassignment enables viral resistance and encoded polymer synthesis. Science 372, 1057–1062 (2021).
- Robertson, W. E., Rehm, F. B. H., Spinck, M. et al. Escherichia coli with a 57-codon genetic code. Science 390(6771) (2025).
- Zürcher, J. F. et al. Continuous synthesis of E. coli genome sections and Mb-scale human DNA assembly. Nature 619, 555–562 (2023).
- Fredens, J. et al. Total synthesis of Escherichia coli with a recoded genome. Nature 569, 514–518 (2019).
- Wang, K. et al. Defining synonymous codon compression schemes by genome recoding. Nature 539, 59–64 (2016).
- Lead supervisors:
- Jason Chin (Principal Investigator, Generative Biology Institute, EIT & Professor of Chemistry and Chemical Biology, Department of Chemistry, University of Oxford) - chin@eit.org
- Linda van Bijsterveldt (Principal Investigator, Generative Biology Institute, EIT) - linda.vanbijsterveldt@eit.org
- Co-supervisor: KJ Patel (Director, Weatherall Institute of Molecular Medicine, University of Oxford) - ketan.patel@imm.ox.ac.uk
Since the completion of the Human Genome Project at the start of the century, researchers have sought the ability to write our genome from scratch. Unlike genome editing, genome synthesis allows for changes at a greater scale and density, with more accuracy and efficiency, and will lead to the determination of causal relationships between the organisation of the human genome and how our body functions. To date, scientists have successfully developed synthetic genomes for microbes such as E. coli (Syn57). The genomes of higher organisms, such as mammals and plants, however, are on the scale of up to billions of bases (10³ times larger than microbial genomes synthesised to date), and there are currently no technologies for writing these gigabase-scale genomes.
We are developing the foundational and scalable tools, technologies and methods needed to build human chromosomes and genomes, and other gigabase-scale genomes. Genome synthesis may allow us to understand the rules underpinning natural human genomes. This will open entire new fields of research in human health and have profound impacts on biotechnology, which may include the development of safe, targeted, cell-based therapies. Methods developed for the synthesis of human genomes will also provide a foundation for the synthesis of the gigabase-scale genomes of crops, animals and other organisms with important applications in food security and sustainability, such as climate-resistant crops.
- Petris, G., Grazioli, S., van Bijsterveldt, L. et al. High-fidelity human chromosome transfer and elimination. Science (2025).
- Zürcher, J. F. et al. Continuous synthesis of E. coli genome sections and Mb-scale human DNA assembly. Nature 619, 555–562 (2023).
- Robertson, W. E., Rehm, F. B. H., Spinck, M. et al. Escherichia coli with a 57-codon genetic code. Science 390(6771) (2025).
- Lead supervisor: Jason Chin (Principal Investigator, Generative Biology Institute, EIT & Professor of Chemistry and Chemical Biology, Department of Chemistry, University of Oxford) - chin@eit.org
- Co-supervisor: Frank Burmann (Principal Investigator, Department of Biochemistry, University of Oxford) - frank.burmann@bioch.ox.ac.uk
Our ability to write DNA has recently expanded to the genomic scale. The possibility of defining every single base in the genome of a cell enables manipulation of the most fundamental cellular properties, such as the genetic code.
However, current genome synthesis methods are slow, narrow in scope, and limited in scale. To date, only a few versions of the genomes of just two bacteria have been successfully synthesized. This project aims to develop methodologies to make the synthesis of model organism genomes (i.e. E. coli) more rapid and enable the rapid testing of many genome designs.
The ability to routinely synthesize designed genomes will enable the discovery of new-to-nature functionalities. Ultimately, the combination of microbial genome synthesis and artificial intelligence will enable biological design at the organism scale with implications in bioproduction, human health, agriculture, and beyond.
- Robertson, W. E., Rehm, F. B. H., Spinck, M. et al. Escherichia coli with a 57-codon genetic code. Science 390(6771) (2025).
- Zürcher, J. F. et al. Continuous synthesis of E. coli genome sections and Mb-scale human DNA assembly. Nature 619, 555–562 (2023).
- Fredens, J. et al. Total synthesis of Escherichia coli with a recoded genome. Nature 569, 514–518 (2019).
- Gibson, D. G. et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56 (2010).
- Wang, K. et al. Defining synonymous codon compression schemes by genome recoding. Nature 539, 59–64 (2016).
- Lead supervisor: Jérôme Zürcher (Principal Investigator, Generative Biology Institute, EIT & Associate Professor of Biological Chemistry, Department of Chemistry, University of Oxford) - jeromez@eit.org
- Co-supervisor: Michael Bronstein (DeepMind Professor of Artificial Intelligence, Department of Computer Science, University of Oxford) - michael.bronstein@cs.ox.ac.uk
A direct consequence of the universality of the genetic code is the possibility for genetic information to be transferred between evolutionarily distant species. Such horizontal transfer of genetic information (as opposed to vertical genetic transfer, where information is passed on from an organism to its progeny) is common in nature and has shaped evolution over billions of years. In the context of genetic engineering, however, this type of genetic spillover is highly concerning. Prevention of interference of artificial genetic information with natural biology is critical to allow biotechnological progress to be both safe and ambitious.
Furthermore, biotechnology will play a central role in addressing pressing challenges in food security, pharmaceutical development, sustainable fuel sources, and efficient carbon fixation. Thus, essential parts of the economy will increasingly rely on bioproduction facilities harbouring tailor-made microbes. It is therefore critical that such facilities are extremely reliable. However, due to the universality of the genetic code, engineered organisms are just as susceptible to viral invasion as natural organisms. In fact, a single viral particle that finds its way into a bioproduction facility can force its operational shutdown.
Altering the genetic code of a cell provides an opportunity to render natural and synthetic genetic information incompatible. This breakthrough offers a means to protect the environment from genetically engineered organisms and, vice versa, engineered organisms critical for bioproduction from viral invasion. Through concerted efforts in genome recoding and translational engineering, it was possible to create the first organism with a synthetic genetic code. Since this organism “speaks a different language” than organisms found in nature, it is genetically isolated; it can neither give nor receive genetic information from the environment.
This project continues the development of altered genetic codes to increase the safety of biotechnology and aims to rewrite even the most complex biological systems in alternative synthetic genetic codes.
- Zürcher, J. F. et al. Refactored genetic codes enable bidirectional genetic isolation. Science 378, 516–523 (2022).
- Robertson, W. E. et al. Sense codon reassignment enables viral resistance and encoded polymer synthesis. Science 372, 1057–1062 (2021).
- Fredens, J. et al. Total synthesis of Escherichia coli with a recoded genome. Nature 569, 514–518 (2019).
- Zürcher, J. F. et al. Genetic codelocking confers stable virus resistance to a recoded organism. Biochemistry 64, 3093–3103 (2025).
- Lead supervisor: Jérôme Zürcher (Principal Investigator, Generative Biology Institute, EIT & Associate Professor of Biological Chemistry, Department of Chemistry, University of Oxford) - jeromez@eit.org
- Co-supervisor: Teresa Thurston (Group leader, Sir William Dunn School of Pathology, University of Oxford) - teresa.thurston@path.ox.ac.uk
Our ability to write DNA has recently expanded to the genomic scale. The possibility of defining every single base in the genome of a cell enables manipulation of the most fundamental cellular properties, such as the genetic code.
However, current genome synthesis methods are slow, narrow in scope, and limited in scale. To date, the genomes of only two bacteria have been successfully synthesized. This project aims to develop methodologies to make the synthesis of model organism genomes (i.e. E. coli) more rapid and enable the synthesis of the genomes of non-model bacteria to broaden the scope of genome synthesis.
The ability to routinely synthesize the genomes of a diverse set of organisms will not only allow reprogramming of the genetic code but also facilitate testing of generative genome designs. Ultimately, the combination of microbial genome synthesis and artificial intelligence will enable biological design at the organism scale with implications in bioproduction, human health, agriculture, and beyond.
- Robertson, W. E., Rehm, F. B. H., Spinck, M. et al. Escherichia coli with a 57-codon genetic code. Science 390(6771) (2025).
- Zürcher, J. F. et al. Continuous synthesis of E. coli genome sections and Mb-scale human DNA assembly. Nature 619, 555–562 (2023).
- Fredens, J. et al. Total synthesis of Escherichia coli with a recoded genome. Nature 569, 514–518 (2019).
- Gibson, D. G. et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56 (2010).
- Wang, K. et al. Defining synonymous codon compression schemes by genome recoding. Nature 539, 59–64 (2016).
- Lead supervisor: Rongzhen Tian (Principal Investigator, Generative Biology Institute, EIT & Associate Professor of Biological Chemistry, Department of Chemistry, University of Oxford) - rtian@eit.org
- Co-supervisor: Eunyoung Chae (Associate Professor of Plant Pathology, Department of Biology, University of Oxford) - eunyoung.chae@biology.ox.ac.uk
Darwinian evolution has shaped all life on earth. As scientists, we would like to harness these principles to create new molecules in real time - for example, to rapidly create therapeutic molecules to tackle pandemics. However, the timescales of Darwinian evolution in living organisms are long and thus incompatible with fast discovery of new functions in the laboratory.
To overcome this limitation, we have established a synthetic orthogonal replication system in E. coli (EcORep) that enables selective and rapid mutation exclusively on the DNA of interest without increasing the host genome's mutation rate. In combination with selection pressures, this technology allows us to create proteins with new functions in a few days.
Building upon these capabilities, this project aims to develop advanced approaches and platforms enabling robust, high-throughput evolution of proteins with diverse functional properties. The first objective is to establish a reliable and versatile selection platform that incorporates a broad spectrum of selection pressures, thereby allowing different pressures to be readily applied or combined for the evolution of distinct protein functions. The second objective is to develop and optimize novel evolutionary tools that are highly orthogonal and strongly mutagenic toward target genes, with improved mutational spectra and the ability to dynamically modulate selection pressures. These tools will be further integrated into the automated high-throughput cell culture platform to enable continuous, automated evolution.
This project will provide a unique opportunity to explore protein evolutionary landscapes, yield mechanistic insights into functional diversification, and generate large-scale, high-quality protein datasets to advance the training of next-generation protein design models.
- Molina, R. S. et al. In vivo hypermutation and continuous evolution. Nat. Rev. Methods Prim. 2, 36 (2022).
- Tian, R. et al. Establishing a synthetic orthogonal replication system enables accelerated evolution in E. coli. Science 383, 421–426 (2024).
- Rix, G. et al. Continuous evolution of user-defined genes at 1 million times the genomic mutation rate. Science 386, eadm9073 (2024).
- Lead supervisor: Linda van Bijsterveldt (Principal Investigator, Generative Biology Institute, EIT) - linda.vanbijsterveldt@eit.org
- Co-supervisor: Anjali Hinch (Group Leader, Sir Henry Dale and Wellcome-Beit Fellow, Dunn School of Pathology, University of Oxford) - anjali.hinch@path.ox.ac.uk
Synthetic chromosomes enable the engineering of entirely new biological functions - from nitrogen fixation in cereals to human antibody production in livestock - without disrupting host genomes. However, these artificial genetic elements are often lost when organisms reproduce. Current methods achieve inheritance rates as low as 11%, severely limiting applications in agriculture and medicine.
The challenge lies in meiosis – the specialised cell division producing eggs and sperm. Natural chromosomes pair up and exchange DNA segments, ensuring each reproductive cell receives exactly one copy. Without this pairing mechanism, synthetic chromosomes distribute randomly and are frequently lost.
We recently generated mouse embryonic stem cells carrying intact human chromosomes [1]. This system provides an ideal platform to engineer human-derived synthetic chromosomes with improved inheritance. This project aims to incorporate specialised DNA elements that enable artificial chromosomes to pair and recombine during meiosis.
Beyond improving inheritance, engineering controlled recombination opens fascinating biological questions: How do chromosomes recognise their partners? Can we program crossover locations to accelerate trait evolution? Could these mechanisms enable chromosome shuffling in non-reproductive cells for research or therapy? How might we further engineer synthetic chromosome architecture to ensure stability across different organisms and cell types? Related projects explore these questions – together reimagining how artificial chromosomes function across species.
- Petris, G., Grazioli, S., van Bijsterveldt, L. et al. High-fidelity human chromosome transfer and elimination. Science (2025).
- AG Hinch et al. Factors influencing meiotic recombination revealed by whole-genome sequencing of single sperm. Science (2019).
- B Davies et al. Re-engineering the zinc fingers of PRDM9 reverses hybrid sterility in mice. Nature (2016).
- Lead supervisor: Fabian Rehm (Principal Investigator, Generative Biology Institute, EIT) - fabian.rehm@eit.org
- Co-supervisor: Tobias Warnecke (Associate Professor, Department of Biochemistry, University of Oxford) - tobias.warnecke@bioch.ox.ac.uk
Evolution is nature’s optimisation algorithm. Over billions of years, it has repeatedly found solutions to challenges that human engineers still have not come close to designing from scratch. Like a neural network learning by trial and error, evolution explores vast landscapes of possibilities and gradually converges on high-performing solutions. However, natural evolution operates far too slowly to help us solve urgent modern-day problems.
We will build and utilise accelerated evolution systems that enable genes to mutate and improve continuously inside living cells. These in vivo evolution platforms dramatically speed up the evolutionary process, compressing what would normally take thousands or millions of years in nature into days or weeks in the lab (see references 1-3). So far, most continuous evolution technologies have focused on optimising single-gene phenotypes, such as improving an enzyme’s activity. But many of the most exciting biological capabilities (e.g. metabolism, signalling, motility, spatial organisation, communication, stress tolerance) emerge only when multiple genes, pathways, or cells work together. These complex phenotypes are exactly the kinds of problems that evolution is uniquely suited to optimise.
We aim to tackle the following questions:
- How can we evolve complex, multi-gene traits within a single experiment?
- How do we build selective pressures that reward the behaviours we want while minimising escape routes?
- How can we borrow principles from natural evolution, such as modularity, gene amplification, or cooperation, to help engineered organisms explore richer evolutionary pathways?
As biological problems become more complex, the limitations of rational design become obvious. Trying to manually engineer every component of a sophisticated trait is often impossible. Instead, we pursue ‘irrational design’ - building living evolutionary systems that allow the cell itself to search through immense genetic space and discover optimal solutions for user-defined problems.
We are looking for students who are excited about the idea of using evolution as a programmable problem-solving engine. You will help develop next-generation in vivo evolution technologies and apply them to both fundamental and applied questions.
- Molina, R. S. et al. In vivo hypermutation and continuous evolution. Nat. Rev. Methods Prim. (2022).
- Rix, G. et al. Continuous evolution of user-defined genes at 1 million times the genomic mutation rate. Science (2024).
- Tian, R. et al. Establishing a synthetic orthogonal replication system enables accelerated evolution in E. coli. Science (2024).
- Lead supervisor: Kiarash Jamali (Principal Investigator, Generative Biology Institute, EIT) - kiarash.jamali@eit.org
- Co-supervisor: Michael Bronstein (DeepMind Professor of Artificial Intelligence, Department of Computer Science, University of Oxford) - michael.bronstein@cs.ox.ac.uk
Large generative models trained on internet-scale data have transformed machine learning in the past few years. One of the key ingredients in their success has been the availability of high-quality and diverse data sources. While machine learning has also recently transformed biology for structure prediction (AlphaFold2) and design (RFdiffusion), the lack of large-scale structural data has hindered further progress.
So far, most structure prediction and design models have been trained on the Protein Data Bank, a database of experimentally resolved structures that contains a few hundred thousand structures, many of which are homologues of other entries in the database. Economically and therapeutically relevant classes of complexes such as protein-RNA and protein-ligand interactions are an even smaller subset of this data. This data sparsity has limited the application of generative models in areas such as enzyme design and drug discovery.
In contrast, sequence data is abundant. Large sequence datasets have already been used for training protein and genome language models at scale. Furthermore, it has been shown that protein and genomic language models encode structures even though they are only trained to predict sequences, with a computationally efficient training run afterwards capable of extracting unsupervised structural features learned by these networks.
The hypothesis of the PhD project is that training models by mixing large sequence data with sparse supervision from structure data will enable learning structure-aware representations that would improve generalisation compared to current structure-only approaches. Concretely, this PhD project will consist of designing semi-supervised and unsupervised approaches to jointly embed sequence and structure, focused on their applicability to modelling protein-ligand interactions. These representations will then be used for training new structure prediction and design models that will be more robust to overfitting on complexes with scarce structural data. Throughout this project, the student will not only develop expertise in representation learning and unsupervised learning approaches for biology, but will also learn how to train machine learning models at large scale on the Generative Biology Institute’s GPU cluster. Furthermore, they will gain in-depth knowledge about protein structures, their interactions with other molecules, and the relevance of this for biology.
- Jumper et al., 2021. Highly accurate protein structure prediction with AlphaFold. Nature.
- Abramson et al., 2024. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature.
- Lin et al., 2023. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science.
- Zhang et al., 2025. Predicting protein-protein interactions in the human proteome. Science.
- Watson et al., 2023. De novo design of protein structure and function with RFdiffusion. Nature.
- He et al., 2021. Masked Autoencoders Are Scalable Vision Learners. CVPR
- Lead supervisor: Kiarash Jamali (Principal Investigator, Generative Biology Institute, EIT) - kiarash.jamali@eit.org
- Co-supervisor: Charlotte Deane (Professor of Structural Bioinformatics, Department of Statistics, University of Oxford) - deane@stats.ox.ac.uk
Generative modelling has recently transformed protein design. Proteins are now being routinely designed and validated in the wet lab using the same core generative modelling tools used to generate new images, video, and text (e.g. transformer-based diffusion models). This has allowed a wide variety of new-to-nature applications, such as binder and enzyme design.
However, current methods for generative modelling of proteins are still at an early stage. The advent of diffusion modelling in proteins has not been followed by the same type of algorithmic improvements that have occurred in other parts of machine learning, such as image generation. Symptoms of this include reduced diversity of generated proteins compared to natural proteins, a relative lack of controllability of these designed proteins, and the need for sampling tricks (such as decreased temperatures) for acceptable performance. The pipeline of protein design using generative models typically follows a two-step process: (1) protein backbone design using a generative model (such as RFdiffusion) and (2) designing a sequence to fold to this structure using an inverse folding neural network (such as ProteinMPNN). In this pipeline, sequence and structure are predicted sequentially, which makes atom-level control and joint optimisation more difficult.
Recently, models have been proposed to generate the sequence and structure at the same time. While exciting, these models often struggle with generating sequences that correspond to the structure of the designed protein. Nevertheless, this is a promising line of research as it enables the introduction of atomic-level inductive biases, such as those learned from molecular dynamics simulations, into these generative models.
The lab is focused on designing and training state-of-the-art generative models that will co-design the sequence and structure of proteins for a wide variety of experimental applications such as binding target proteins, catalysing important reactions, or interacting with nucleic acids for gene editing. This will include rigorous machine learning algorithm development encompassing architecture design, data sampling strategies, the use of protein-inspired priors, and synthetic data generation. Successful designs in silico will then be tested in the wet lab. Throughout this project, the student will not only develop expertise in diffusion modelling approaches for biology, but will also learn to train machine learning models at scale on the Generative Biology Institute’s GPU cluster. Furthermore, they will gain strong intuition about proteins and how their interactions with partners shape their biological activity.
- Jumper et al., 2021. Highly accurate protein structure prediction with AlphaFold. Nature.
- Abramson et al., 2024. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature.
- Lin et al., 2023. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science.
- Zhang et al., 2025. Predicting protein-protein interactions in the human proteome. Science.
- Watson et al., 2023. De novo design of protein structure and function with RFdiffusion. Nature.
- Yi*, Jamali* & Scheres, 2025. All-atom inverse protein folding through discrete flow matching. ICML
- Lead supervisor: Leo Parts (Principal Investigator, Generative Biology Institute, EIT) - leopold.parts@eit.org
- Co-supervisor: Mira Kassouf (RDM Principal Investigator, Weatherall Institute of Molecular Medicine, University of Oxford) - mira.kassouf@imm.ox.ac.uk
Methods for designing protein sequences to a prescribed outcome are making strides, and it is straightforward to transfer the results into cellular genomes. It is far less obvious what additional DNA needs to be written to obtain a mammalian chromosome that performs to a design. The non-coding DNA has roles in regulation, structure, and protection of genetic information, but in our own genomes, these are sparsely scattered over an expanse of gigabases of sequence. Understanding how these functions are encoded requires experimenting on the non-coding DNA at the scale of billions of base pairs.
To efficiently interrogate all the gigabases of our genome, we use long, targeted genomic deletions. These alleles allow us to observe the consequences of removing kilobases of sequence at a time on gene expression, phenotype, and cellular functions. We have already validated the efficient generation of deletions of hundreds of kilobases in HEK293T and HAP1 cell lines using pooled prime editing screens [1]. We can generate deletions of up to 4 kb efficiently, and have demonstrated deletions over 1 Mb in size; we can also pool CRISPR/Cas3-based reagents that generate targeted but stochastic deletions of tens of kilobases. The next challenge is to apply this new ability to remove kilobases of sequence in massively parallel experiments, performing large-scale pooled screens of thousands of variants covering all human chromosomes, to ultimately understand human genome composition and the function of non-coding elements.
The aim of this PhD project is to interrogate human non-coding DNA at scale. To do so, you will design and execute systematic gene editing experiments for deleting sequences across the human genome at different resolutions in a haploid cell line. Computationally, you will learn to design CRISPR prime editing libraries for generating deletions, aiming to systematically remove all DNA in a region (e.g. one chromosome at a time) at 1 kb resolution, as well as to profile non-coding elements (parts of promoter, proximal enhancers, distal enhancers, splice regions, UTRs, open chromatin regions, conserved regions, etc.) at higher resolution. Experimentally, you will then make and use the designed library to generate data on human non-coding DNA function at unprecedented scale to understand the building principles of our genomes, and inform predictive and generative AI models.
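As a rough illustration of what tiling a region at 1 kb resolution could look like computationally, the sketch below splits a genomic interval into consecutive, non-overlapping deletion windows. It is only a minimal sketch under stated assumptions: the `Deletion` class, the `tile_deletions` function, and the example coordinates are hypothetical, and the code ignores everything a real prime editing library would need to account for, such as guide design and editing efficiency.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Deletion:
    """A planned deletion as a half-open interval [start, end) on one chromosome.

    Hypothetical helper type for illustration only.
    """
    chrom: str
    start: int
    end: int


def tile_deletions(chrom: str, region_start: int, region_end: int,
                   window: int = 1000) -> List[Deletion]:
    """Split [region_start, region_end) into consecutive, non-overlapping windows.

    The final window is shortened when the region length is not a
    multiple of `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if region_end < region_start:
        raise ValueError("region_end must not be smaller than region_start")
    return [
        Deletion(chrom, s, min(s + window, region_end))
        for s in range(region_start, region_end, window)
    ]


# Example with made-up coordinates: a 3.5 kb region tiled at 1 kb resolution.
tiles = tile_deletions("chrX", 10_000, 13_500)
for t in tiles:
    print(f"{t.chrom}:{t.start}-{t.end} ({t.end - t.start} bp)")
```

Profiling specific non-coding elements at higher resolution would correspond to calling the same function with a smaller `window` over the coordinates of each element.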
- Generating long deletions across the genome with pooled paired prime editing screens. Weller et al. (2025) bioRxiv. https://www.biorxiv.org/content/10.1101/2025.11.03.686307v1.abstract
- Lead supervisor: Leo Parts (Principal Investigator, Generative Biology Institute, EIT) - leopold.parts@eit.org
- Co-supervisor: Mira Kassouf (RDM Principal Investigator, Weatherall Institute of Molecular Medicine, University of Oxford) - mira.kassouf@imm.ox.ac.uk
Writing DNA to function requires choosing a chromosome structure, the genes within it along with their order and orientation, and the corresponding regulatory elements. The impact of these choices, which shape the content and organization of chromosomes, remains poorly understood. This gap is due to a lack of tools for combinatorially engineering configurations of DNA sequence at the scale needed to deduce the limits, rules, and biases that govern the outcomes of the design decisions.
To address this gap, we have generated a toolbox to create deletions, inversions, translocations, and duplications at scale by randomizing DNA sequence within the cells using recombinases [1,2]. Briefly, we use prime editing to write several recombinase recognition sequences into the genome, and then use the stochastically generated structural variants as a source of random sequence configurations. For example, enhancer clusters recruit transcription factors and activate genes from a distance of one million base pairs and more. However, how individual enhancers within a cluster interact, and how their spacing and relative orientations drive gene expression, is not well understood. In a single experiment, we used CRISPR prime editing to create over 100 variants of these enhancer cluster regions in their endogenous context and mapped the resulting gene expression levels, identifying the configurations that can drive wild-type gene expression, and the responsible enhancers [2].
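To make the idea of "random sequence configurations" concrete, here is a toy sketch of how inversions and deletions between recombinase sites reshuffle an ordered set of elements. It is an illustrative model only, under simplifying assumptions: a site is assumed to sit at every boundary between segments, site orientation (which in practice determines whether recombination deletes or inverts) is ignored, and translocations and duplications are not modelled. All names (`Segment`, `invert`, `delete`, `random_variant`, the enhancer labels) are hypothetical.

```python
import random
from typing import List, Tuple

Segment = Tuple[str, int]  # (name, orientation): +1 forward, -1 reversed


def invert(config: List[Segment], i: int, j: int) -> List[Segment]:
    """Invert segments i..j-1: reverse their order and flip each orientation."""
    flipped = [(name, -strand) for name, strand in reversed(config[i:j])]
    return config[:i] + flipped + config[j:]


def delete(config: List[Segment], i: int, j: int) -> List[Segment]:
    """Remove segments i..j-1."""
    return config[:i] + config[j:]


def random_variant(config: List[Segment], rng: random.Random) -> List[Segment]:
    """Apply one random inversion or deletion between two distinct sites."""
    # Assumption: sites sit at every boundary, i.e. positions 0..len(config).
    i, j = sorted(rng.sample(range(len(config) + 1), 2))
    operation = rng.choice([invert, delete])
    return operation(config, i, j)


# Hypothetical three-enhancer cluster, all elements in forward orientation.
wild_type = [("E1", 1), ("E2", 1), ("E3", 1)]
rng = random.Random(0)
variants = {tuple(random_variant(wild_type, rng)) for _ in range(200)}
print(len(variants))  # distinct single-step configurations sampled
```

In this toy model each variant could then be paired with a measured expression level, which is the kind of configuration-to-outcome table the experiments above aim to produce.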
The goal of this Ph.D project is to apply the randomization toolbox to different problems and at different length scales. On the chromosome scale, we want to understand the flexibility of configuring the broadest chromosomal domains spanning multiple megabases. On the gene cluster scale, we are interested in understanding the scenarios in which gene order and relative position affect the expression outcome. On the single gene scale, we want to understand enhancer action and cooperativity for regulating gene expression at a distance. For each of these projects, you will learn genome engineering using CRISPR prime editing and recombinases, combined with large-scale screening, genomic technologies, and statistical as well as machine learning modeling of the outcomes, all targeted to answer the question of how we should configure the genomes we write for them to function as intended.
- Engineering structural variants to interrogate genome function. Koeppel et al. (2024). Nature Genetics. https://doi.org/10.1038/s41588-024-01981-7
- Enhancer scrambling: systematic randomization of mammalian regulatory landscapes via CRISPR prime editing and recombinases. Murat, Koeppel, et al. (2025). bioRxiv. https://doi.org/10.1101/2025.01.14.632548
- Lead supervisor: Leo Parts (Principal Investigator, Generative Biology Institute, EIT) - leopold.parts@eit.org
- Co-supervisor: Mira Kassouf (RDM Principal Investigator, Weatherall Institute of Molecular Medicine, University of Oxford) - mira.kassouf@imm.ox.ac.uk
Before we develop the ability to write entire chromosomes, we can engineer the ones that already exist. The current CRISPR-Cas-based genome editing tools allow exploring the boundaries of genome function by radically changing genome composition. For example, constructing minimal chromosomes via iterative deletion of non-essential regions can reveal the requirements for chromosome segregation and replication, as well as inform whether most of human DNA really is “junk”, or required in some configuration. Alternatively, loading the genome with risk alleles from association studies enables testing hypotheses on mutational load for common diseases, while removing xenoantigens from the genomes of other species could make them safer transplant donors for humans.
It is already possible to apply biased mutational processes to steer chromosome composition towards desired regions that are not accessible to normal evolution. For example, we recently created the most engineered human genomes to date by iteratively installing recombinase recognition sites into repetitive sequences in human HEK293T and HAP1 cells [1]. By applying the same editing process for a year, we could generate over 1,600 sequence insertions into one cell line. We are looking to apply similar ideas and principles to create chromosomes with exceptional properties over the course of multiple years.
The goal of this combined wetlab and computational Ph.D project is to apply a mutation process on a human cell line with the aim of radically changing its genome towards a chosen target. To do so, you will pick an outcome (a natural one is to iteratively minimize a chromosome), design reagents to edit the genome towards it, develop processes to rapidly iterate maximally mutagenic editing rounds, implement it with the aid of automation and technical staff at the EIT, and thoroughly characterize the resulting genome. This ability to mutate genomes towards a desired outcome is a natural complement to fully rewriting the entire DNA, and can become a robust tool for writing new functions into the genome.
- Randomizing the human genome by engineering recombination between repeat elements. Koeppel, Ferreira, et al. (2025) Science. https://doi.org/10.1126/science.ado3979
- Lead supervisor: Leo Parts (Principal Investigator, Generative Biology Institute, EIT) - leopold.parts@eit.org
- Co-supervisor: Peter Minary (Associate Professor of Computer Science, Department of Computer Science, University of Oxford) - peter.minary@cs.ox.ac.uk
The ability to write DNA at scale, ultimately designing and synthesizing full chromosomes, will fundamentally change our ability to engineer biology. DNA synthesis has become cheaper and faster, and we are increasingly capable of writing long stretches of DNA into genomes. The next challenge is not just technical synthesis, but predictive design: given we are able to write DNA, how do we ensure that the sequences we compose will function as we intended inside the cells? As demonstrated in recent internal experiments, substantial departures from human genomes greatly lower the performance of predictive models. For new genomes, we need new computational methods that generalize beyond the standard set of chromosomes all models are currently trained on.
Substantial progress in predictive models has already been made. Large-scale data collection efforts such as ENCODE and GTEx have enabled deep learning models like Enformer and AlphaGenome, the latest crop of sequence-based predictors. These and similar models can predict DNA methylation, gene expression, chromatin states and accessibility, transcription factor binding, and chromatin conformation directly from DNA sequence for human and some model organism genomes. They are a valuable prediction baseline, but are trained on a fundamentally limited set of cellular contexts.
This computational project aims to use in-house data to develop models capable of forecasting the function of new chromosomes, in collaboration with the AI efforts at the EIT. Starting from pre-trained architectures like AlphaGenome, the work will explore few-shot learning and fine-tuning on internal datasets of engineered genomes, incorporating the synthetic biology experiments that probe sequence-function relationships. The goal is a model that can generalize beyond the human training genome to accurately predict expression patterns, chromatin structure, and viability for new sequence designs. Such a predictive framework would establish a foundation for rational chromosome engineering that we could build on to turn the rapidly evolving DNA synthesis technology into the ability to write biological function.
- AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model. Avsec et al. (2025). bioRxiv. https://doi.org/10.1101/2025.06.25.661532
Project information to follow
- Lead supervisor: To Be Announced (Principal Investigator, Generative Biology Institute, EIT)
- Co-supervisor: Harrison Steel (Associate Professor of Engineering Science, Department of Engineering Science, University of Oxford) - harrison.steel@eng.ox.ac.uk
Modern protein design is supported by the three pillars of data, algorithms, and compute. This project, which will be part of a multi-person collaborative effort, focuses on the use of continuous evolution systems to generate high volumes of labeled protein evolutionary data whose distributions are strategically shaped for training protein design models. Particular protein classes of interest include antibodies and enzymes. This project may also admit the inclusion of genetically encoded unnatural amino acids in continuous evolution experiments and will likely involve automation.
- Alcantar et al. Mapping the evolution of computationally designed protein binders. bioRxiv (2025). https://doi.org/10.1101/2025.10.04.680454
- Bennett et al. Atomically accurate de novo design of antibodies with RFdiffusion. Nature (2025). https://doi.org/10.1038/s41586-025-09721-5
- Rix et al. Continuous evolution of user-defined genes at 1-million-times the genomic mutation rate. Science 386, eadm9073 (2024).
- Lead supervisors:
- Prof Dirk Aarts (Professor of Chemistry, Department of Chemistry, University of Oxford) - dirk.aarts@chem.ox.ac.uk
- Prof John Frater (Professor of Infectious Diseases, Nuffield Department of Medicine, University of Oxford) - john.frater@ndm.ox.ac.uk
- Co-supervisor: Dr Kiarash Jamali (Principal Investigator, Generative Biology Institute, EIT) - kiarash.jamali@eit.org
Upon infection, vaccination or in autoimmune pathology, only a tiny fraction of our immune response becomes activated and subsequently expanded. Given the enormous diversity of the immune repertoire (e.g. ~ 10^15 B-cell receptors, or BCRs) and the unique sequences encoded in each individual, measuring and understanding the body’s response is a fundamental unsolved challenge impacting a wide range of therapies.
The Aarts and Frater groups have developed a microfluidics platform to perform paired BCR or T-cell receptor (TCR) single-cell sequencing. This technology is built on state-of-the-art microfluidic droplet/hydrogel barcoding. It has been validated for single-channel operation, achieving ~250,000 paired reads per sample, far beyond what is feasible with conventional droplet single-cell platforms. To understand and ultimately use the measured data sets, the most powerful way forward is to develop a Protein Language Model (PLM) for antibodies, which builds on the expertise of the Jamali group.
We propose to (i) measure extensive data sets of single cell, paired protein immune repertoire sequences for specific diseases, and (ii) build and train a PLM, able to perform tasks such as predicting protein stability, binding affinity, sequence recovery, expression, and paired antibody chain generation. This will ultimately lead to a general model of antibodies across health and different diseases, with the ability to generate disease-specific repertoires.
The first half of the DPhil project will focus on the measurement of large data sets. This requires training in soft lithography and microscopy techniques as well as familiarization with the biochemical assays used in the experimental pipeline. The Aarts and Frater labs have a dedicated microfluidics facility set up to process human samples. We will initially obtain sequences from healthy donors, and then work on viral infections caused by common pathogens and HIV, the latter being the core research area of the Frater lab. As data are obtained, we will train small PLMs to learn the scaling laws for these models. This will allow us to predict how much data to gather.
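One way to picture how scaling laws from small models could inform how much data to gather is sketched below. This is a minimal example under assumptions, not the planned analysis: it assumes validation loss follows a pure power law in dataset size, `loss = a * size**(-b)`, with no irreducible-loss offset, and it uses synthetic numbers. The function names `fit_power_law` and `size_for_target` are hypothetical.

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], losses: List[float]) -> Tuple[float, float]:
    """Least-squares fit of loss = a * size**(-b) in log-log space; returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def size_for_target(a: float, b: float, target_loss: float) -> float:
    """Invert loss = a * size**(-b) to get the dataset size reaching target_loss."""
    return (a / target_loss) ** (1.0 / b)


# Synthetic, illustrative numbers only (not measurements).
sizes = [1e4, 1e5, 1e6]
losses = [5.0 * n ** -0.1 for n in sizes]
a, b = fit_power_law(sizes, losses)
needed = size_for_target(a, b, target_loss=1.0)
print(f"a={a:.3f}, b={b:.3f}, sequences needed={needed:.3g}")
```

With real measurements the fit would be noisy, and extrapolating far beyond the observed dataset sizes should be treated with caution.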
In the second half of the project, we will develop more advanced PLMs that explicitly model the sequence and structure of paired antibody chains. This is lacking in existing PLMs but may be essential for properly capturing the hypervariable complementarity-determining regions (CDRs), which are currently hard to model. Moreover, we will apply this experimental pipeline to autoimmune diseases, where the target antigen is often unknown. Preliminary data show convergence between patient samples in GAD-associated autoimmune TCRs [2], which we hypothesize also exists in the antibody repertoire for certain autoimmune diseases. Here, we expect the PLM to be crucial in extracting this information and, ultimately, helping to find targets for immunotherapies.
- Lead supervisor: Prof KJ Patel (Director, Weatherall Institute of Molecular Medicine, University of Oxford) - ketan.patel@imm.ox.ac.uk
- Co-supervisor: Dr Linda Van Bijsterveld (Principal Investigator, Generative Biology Institute, EIT) - linda.vanbijsterveldt@eit.org
The human bone marrow is arguably the most productive factory in the body, maintaining life by generating an astonishing billion new blood cells every single day. This continuous and tightly regulated process, known as hematopoiesis, is driven by a small, rare population of Hematopoietic Stem and Progenitor Cells (HSPCs), which possess the unique and essential capacity for lifelong self-renewal and multilineage differentiation. The theoretical principle of indefinite blood production is clearly established in vivo, famously demonstrated by the ability to perform sequential bone marrow transplantation in mice across multiple generations without loss of function. This indicates that immortal blood production is biologically achievable. However, applying this principle to human cells has proven critically challenging. While murine HSPCs can be successfully expanded in culture, human HSPCs are fundamentally limited: recent breakthroughs allow for the brief expansion of human blood stem cells in culture, but they are unable to sustain self-renewal and quickly differentiate or undergo senescence. This inability to maintain primary human HSPCs ex vivo indefinitely prevents long-term mechanistic studies of blood disorders, prolonged drug screening campaigns, and scalable cell production for clinical therapies.
This proposal aims to overcome this translational roadblock by establishing a universally applicable strategy for the immortalization of human hematopoiesis. The overall objective is to generate stably immortalized, yet functionally competent, human hematopoietic progenitor cell lines that faithfully replicate the expansive potential of their murine counterparts. The specific aims are to identify and validate a minimal, safe, and potentially reversible set of genetic factors (e.g., specific transcription factors or telomerase activators) capable of inducing long-term self-renewal in primary human HSPCs; to characterize the resulting immortalized lines for genetic stability and progenitor marker expression; and finally, to demonstrate the functional competence of the immortalized lines by inducing in vitro multilineage differentiation (myeloid and lymphoid) under defined conditions. This project will employ state-of-the-art genetic engineering and cell culture techniques, prioritizing the use of inducible or non-integrating vector systems (e.g., Tet-On systems or episomal vectors) to introduce a focused library of immortalizing factors into human HSPCs. Stable, proliferating clones will be meticulously selected and subjected to rigorous validation, including karyotyping for chromosomal integrity and Colony Forming Unit (CFU) assays to confirm their multilineage potential. Success will yield an unlimited, genetically stable resource for high-throughput drug screening and personalized medicine. Crucially, the achievement of stable immortalization opens the door to creating a platform for advanced cellular engineering.
This potentially allows for the development of universal transplantable HSPCs by subsequently utilizing gene-editing tools to introduce individual self-immune (HLA) codes or render the cells immunologically inert. Such engineered cells could circumvent the need for tissue matching in hematopoietic stem cell transplantation (HSCT), fundamentally transforming the landscape of curative hematologic and immunologic therapies.
- Yamamoto R, Wilkinson AC, Ooehara J, Lan X, Lai CY, Nakauchi Y, Pritchard JK, Nakauchi H. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell. 2018 Apr 5;22(4):600-607.e4. doi: 10.1016/j.stem.2018.03.013. PMID: 29625072.
- Sakurai M, Ishitsuka K, Ito R, Wilkinson AC, Kimura T, Mizutani E, Nishikii H, Sudo K, Becker HJ, Takemoto H, Sano T, Kataoka K, Takahashi S, Nakamura Y, Kent DG, Iwama A, Chiba S, Okamoto S, Nakauchi H, Yamazaki S. Chemically defined cytokine-free expansion of human haematopoietic stem cells. Nature. 2023 Mar;615(7950):127-133. doi: 10.1038/s41586-023-05739-9. Epub 2023 Feb 22. PMID: 36813966.
- Wilkinson AC, Ishida R, Kikuchi M, Sudo K, Morita M, Crisostomo RV, Yamamoto R, Loh KM, Nakamura Y, Watanabe M, Nakauchi H, Yamazaki S. Long-term ex vivo haematopoietic-stem-cell expansion allows nonconditioned transplantation. Nature. 2019 Jul;571(7763):117-121. doi: 10.1038/s41586-019-1244-x. Epub 2019 May 29. PMID: 31142833.
- Lead Supervisor: Prof. Dame Molly Stevens (John Black Professor of Bionanoscience, Institute of Biomedical Engineering / Department of Physiology, Anatomy & Genetics, University of Oxford) - molly.stevens@dpag.ox.ac.uk
- Co-supervisor: Fabian Rehm (Principal Investigator, Generative Biology Institute, EIT) - fabian.rehm@eit.org
3D culture systems, particularly organoids, have revolutionised our ability to study human development and disease. Despite this progress, current organoid methodologies fall short in modelling complex, heterogeneous tissues such as the brain and heart. These limitations stem largely from the inability to establish stable morphogen axes and from size constraints that prevent full tissue-level organisation. Yet, the formation of multi-axis morphogen gradients is essential for guiding differentiation and achieving organ-wide patterning in vitro.
Existing approaches, including microfluidic gradient generators, have not delivered the precision, throughput, or multiplexing capacity required to build more advanced tissue models. To address this gap, this project will investigate the integration of biomaterials and synthetic biology to enable spatially and temporally controlled morphogen production.
The student will develop biomaterials (e.g. semi-permeable membranes or scaffolds) capable of encapsulating engineered bacteria that can remain viable in stem-cell culture conditions. A critical component of this work is ensuring that the encapsulated bacteria cannot proliferate and do not release any toxic by-products that could compromise sensitive stem-cell populations. The engineered bacteria will be designed to produce specific small molecules or proteins (e.g. inhibitors, morphogens) in response to external stimuli, ideally with both spatial and temporal control. If successful, this strategy would represent a step-change for in vitro developmental models. Precise spatial and temporal control over protein release would unlock experimental capabilities that are currently unattainable. In particular, it would allow generating controlled concentration gradients of growth factors across an organoid during culture, an essential feature of natural development. This remains extraordinarily challenging with existing technologies, and this project seeks to overcome that barrier.
This approach could be further developed into the orthogonal release of multiple growth factors (e.g. Wnt and Shh) or closed-loop systems which respond to natural fluctuations in extracellular components.
- Lead Supervisor: Michael Bonsall (Professor of Mathematical Biology, Department of Biology, University of Oxford) - michael.bonsall@biology.ox.ac.uk
- Co-supervisor: Rongzhen Tian (Principal Investigator, Generative Biology Institute, EIT & Associate Professor of Biological Chemistry, Department of Chemistry, University of Oxford) - rtian@eit.org
Virus diversity is remarkable. Understanding the dynamics of viruses has both applied and fundamental relevance to biology, science and society. While considerable advances have been made in understanding the epidemiology and spread of viruses, their evolutionary origin remains unclear. Discovering the origin of viruses is key to understanding their evolutionary history and the mechanisms driving their interactions with host cells. This knowledge could further unlock new potential for using viruses in therapeutic applications, such as developing novel antiviral strategies or creating therapeutic molecular delivery tools.
Several theories have been proposed regarding the origin of viruses, with one of the most widely discussed being the "escape hypothesis." This theory suggests that viruses may have evolved from bits of genetic material that broke away from living cells and gained the ability to survive on their own and infect other cells. While this hypothesis remains compelling, there is currently limited experimental evidence to support it.
This PhD project will broadly focus on how engineering evolution, dynamics of viruses and mathematical modelling can be combined to quantify the likely evolutionary origins of viruses.
With the development of directed evolution, significant advancements have been made in accelerating evolutionary processes within controlled laboratory settings. Among the most powerful tools is the orthogonal replication system, which allows for rapid, selective mutation and selection of specific DNA sequences without altering the overall mutation rate of the host genome. This system enables the rapid creation of vast libraries of genetic variations. Building on these capabilities, this project aims to combine experimental continuous evolution of E. coli and its bacteriophages with computational simulations and machine learning models of viral evolutionary dynamics, to observe how genetic elements evolve under specific selective pressures.
This approach will provide a unique opportunity to explore the evolutionary steps that could have led to the emergence of viruses, offering valuable insights into their origins and the forces that continue to shape viral evolution.
- Cobb, R.E., Chao, R. & Zhao, H. (2013) Directed Evolution: Past, Present and Future. AIChE J., 59, 1432-1440.
- O’Brien, J. T., George, A.M., & Bonsall, M.B. (2025). The origins of viruses: evolutionary dynamics of the escape hypothesis. Frontiers in Virology, 5, 1555137.
- Tian, R., Rehm, F.B.H., Czernecki, D., Gu, Y., Zürcher, J.F., Liu, K.C. & Chin, J.W. (2024) Establishing a synthetic orthogonal replication system enables accelerated evolution in E. coli. Science, 383, 421-426.
Project information to follow
Admissions are now open!
The deadline for applications will be 8th January 2026 and interviews are expected to take place during mid-February.
Enquiries
For enquiries about the programme, please contact genbio-dtp@chem.ox.ac.uk