publications | Micah Olivas

2026

uSort-M: scalable isolation of user-defined sequences from pooled gene libraries

Micah Olivas^*, Patrick Almhjell^*, Jack Shanahan, and Polly Fordyce

2026

Abstract

High-throughput sequencing and computational protein design have created a growing gap between the discovery of new proteins and their functional characterization. In many instances, functional characterization requires one-to-one measurements (such as when detailed biochemical insights are desired or pooled selections are not possible), necessitating that individual variants be isolated and assayed. A major barrier to closing this gap is the cost to directly synthesize individual genes, which remains prohibitively expensive ($10–100 per sequence) and restricts these studies to small subsets of relevant variants, leaving many sequences without functional annotation. To address this, we developed user-defined Sorted Mutants (uSort-M), which combines pooled DNA synthesis, automated cell sorting of transformed E. coli, and long-read sequencing to rapidly isolate and identify variants from diverse libraries. uSort-M can isolate, sequence, and validate individual variants from pooled libraries produced via diverse existing methods including multiplex assembly, error-prone PCR, or pooled site-directed mutagenesis. Sorting single bacterial clones into 384-well plates is efficient: eight plates (3,072 wells) can be filled in 1–2 hours, with up to 90% of wells yielding monoclonal cultures. Commercial long-read sequencing enables accessible, fast, and cost-effective identification of individual sequences from isolated clones while tolerating wide variation in fragment length and diversity across the library. Applying this workflow to a 328-member scanning mutagenesis library of a 300-bp gene recovered 96% of desired variants at fivefold lower cost than traditional synthesis. Numerical simulations identify key parameters governing library recovery and enable accurate prediction of the sampling effort required to achieve target coverage. As library size increases, this workflow offers substantial savings over traditional gene synthesis or cloning. Due to its generalizability, efficiency, and reliance on standard instrumentation, uSort-M removes a key barrier to large-scale protein functional characterization.
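The abstract's point about predicting sampling effort can be illustrated with a minimal sketch. It is not the simulation used in the paper: it assumes every variant in the pool is equally abundant and that each sequenced clone is an independent draw, which is the most optimistic case. All function names here are hypothetical.

```python
import math
import random


def expected_coverage(library_size: int, n_clones: int) -> float:
    """Expected fraction of variants seen at least once.

    Assumption (not from the paper): all variants are equally abundant,
    so each clone is a uniform draw from the library.
    """
    return 1.0 - (1.0 - 1.0 / library_size) ** n_clones


def clones_needed(library_size: int, target: float) -> int:
    """Smallest number of clones whose expected coverage reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    per_clone_miss = math.log(1.0 - 1.0 / library_size)
    return math.ceil(math.log(1.0 - target) / per_clone_miss)


def simulate_coverage(library_size: int, n_clones: int,
                      trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of mean coverage under the same uniform assumption."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Draw n_clones variants with replacement and count the distinct ones.
        seen = {rng.randrange(library_size) for _ in range(n_clones)}
        total += len(seen) / library_size
    return total / trials


if __name__ == "__main__":
    n = clones_needed(328, 0.96)
    print(n, expected_coverage(328, n), simulate_coverage(328, n))
```

Real pooled libraries are rarely uniform, so rare variants need more sampling than this model suggests; replacing the uniform draw with weighted sampling (for example `random.choices` with per-variant weights) is one way to explore that effect.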

2022

Designing active and thermostable enzymes with sequence-only predictive models

Clara Fannjiang^*, Micah Olivas^*, and others

In NeurIPS LMRL Workshop, 2022

Abstract

Data-driven models of protein fitness can be useful in designing novel proteins with improved properties, but many questions remain regarding how and in what settings they should be used. Here, we ask: How can we use predictive models of protein fitness, whose predictions we might not always trust, to design protein sequences enhanced for multiple fitness functions? We propose a general approach for doing so, and apply it to design novel variants of eight different acylphosphatase and lysozyme wild types, intended to be more thermostable and at least as catalytically active as the wild types. Our method does not require a structure, experimental measurements of activity, curation of homologous sequences, or family-specific thermostability data. Experimental characterizations of our designed sequences, as well as sequences designed by PROSS, a competitive baseline method for improving protein thermostability, are underway.

2020

CRISPR screens in cancer spheroids identify 3D growth-specific vulnerabilities

Kyuho Han, Sarah E. Pierce, Amy Li, Kaitlyn Spees, Gray R. Anderson, Jose A. Seoane, Yuan-Hung Lo, Michael Dubreuil, Micah Olivas, and others

Nature, 2020

Abstract

Cancer genomics studies have identified thousands of putative cancer driver genes^1. Development of high-throughput and accurate models to define the functions of these genes is a major challenge. Here we devised a scalable cancer-spheroid model and performed genome-wide CRISPR screens in 2D monolayers and 3D lung-cancer spheroids. CRISPR phenotypes in 3D more accurately recapitulated those of in vivo tumours, and genes with differential sensitivities between 2D and 3D conditions were highly enriched for genes that are mutated in lung cancers. These analyses also revealed drivers that are essential for cancer growth in 3D and in vivo, but not in 2D. Notably, we found that carboxypeptidase D is responsible for removal of a C-terminal RKRR motif^2 from the α-chain of the insulin-like growth factor 1 receptor that is critical for receptor activity. Carboxypeptidase D expression correlates with patient outcomes in patients with lung cancer, and loss of carboxypeptidase D reduced tumour growth. Our results reveal key differences between 2D and 3D cancer models, and establish a generalizable strategy for performing CRISPR screens in spheroids to reveal cancer vulnerabilities.