publications

Publications are listed in reverse chronological order. For an up-to-date list, see Google Scholar.

preprints

2026

uSort-M: Scalable isolation of user-defined sequences from diverse pooled libraries

Micah B. Olivas^*, Patrick J. Almhjell^*, Lillian K. Brixi, Jack D. Shanahan, and Polly M. Fordyce

bioRxiv, 2026

Abstract PDF GitHub Website Preprint

Advances in high-throughput sequencing and computational protein design are expanding the catalog of known protein sequences far more rapidly than they can be functionally characterized. Functional characterization through biochemical and biophysical assays often requires isolating variants to profile them individually, a process which is often laborious and expensive. To address this, we developed user-defined Sorted Mutants (uSort-M), which can rapidly isolate and identify individual variants from widely available pools of genes by leveraging automated cell sorting and long-read sequencing technologies. To develop the uSort-M pipeline, we first parsed a 328-member scanning mutagenesis library of a 300-bp gene. Direct comparison between short-read and long-read sequencing demonstrated that both methods resolve variants with high fidelity, enabling recovery of 96% of desired library members by sorting eight 384-well plates at fivefold lower cost than traditional synthesis. After optimizing for long-read sequencing, we demonstrated uSort-M’s generalizability to complex libraries by parsing a sequence- and length-diverse 500-member library, recovering 88% of variants from shallow oversampling (<3-fold). Library recoveries could be accurately predicted by simulating library uniformity, transformation number, sorting efficiency, and per-base error rates, and we extrapolated these simulations to predict the sampling depth needed to process large libraries containing thousands of members. To facilitate adoption, these simulations are packaged alongside data analysis and workflow management tools in an open-source Python toolkit with an interactive dashboard. By implementing standard instrumentation in a generalizable workflow, uSort-M provides an efficient and cost-effective solution for large library generation, thereby removing a key barrier to large-scale protein functional characterization.

2022

Designing active and thermostable enzymes with sequence-only predictive models

Clara Fannjiang^*, Micah Olivas^*, and others

In NeurIPS LMRL Workshop, 2022

Abstract PDF OpenReview

Data-driven models of protein fitness can be useful in designing novel proteins with improved properties, but many questions remain regarding how and in what settings they should be used. Here, we ask: How can we use predictive models of protein fitness, whose predictions we might not always trust, to design protein sequences enhanced for multiple fitness functions? We propose a general approach for doing so, and apply it to design novel variants of eight different acylphosphatase and lysozyme wild types, intended to be more thermostable and at least as catalytically active as the wild types. Our method does not require a structure, experimental measurements of activity, curation of homologous sequences, or family-specific thermostability data. Experimental characterizations of our designed sequences, as well as sequences designed by PROSS, a competitive baseline method for improving protein thermostability, are currently underway and forthcoming.

2020

CRISPR screens in cancer spheroids identify 3D growth-specific vulnerabilities

Kyuho Han, Sarah E. Pierce, Amy Li, Kaitlyn Spees, Gray R. Anderson, Jose A. Seoane, Yuan-Hung Lo, Michael Dubreuil, Micah Olivas, and others

Nature, 2020

Abstract PDF

Cancer genomics studies have identified thousands of putative cancer driver genes1. Development of high-throughput and accurate models to define the functions of these genes is a major challenge. Here we devised a scalable cancer-spheroid model and performed genome-wide CRISPR screens in 2D monolayers and 3D lung-cancer spheroids. CRISPR phenotypes in 3D more accurately recapitulated those of in vivo tumours, and genes with differential sensitivities between 2D and 3D conditions were highly enriched for genes that are mutated in lung cancers. These analyses also revealed drivers that are essential for cancer growth in 3D and in vivo, but not in 2D. Notably, we found that carboxypeptidase D is responsible for removal of a C-terminal RKRR motif2 from the α-chain of the insulin-like growth factor 1 receptor that is critical for receptor activity. Carboxypeptidase D expression correlates with patient outcomes in patients with lung cancer, and loss of carboxypeptidase D reduced tumour growth. Our results reveal key differences between 2D and 3D cancer models, and establish a generalizable strategy for performing CRISPR screens in spheroids to reveal cancer vulnerabilities.