Research

Overview

  • On this page, I highlight my research on how machine learning and computational models help us understand, map, and predict cellular biology.
  • Before 2018, as part of my master's thesis in an AI program, I developed Deep Packet, the first neural network architecture for network traffic classification. The paper has since become one of the seminal works in the field.
  • Papers selected as cover:

Generative AI for Modeling Single-Cell Perturbation

During my doctoral studies, I developed a series of generative AI algorithms to predict out-of-distribution cellular behaviors in response to perturbations (e.g., diseases, drugs, CRISPR KOs). Using these models, one can predict and answer counterfactual questions such as, "What would the gene expression of this cell have looked like if it had been treated differently?"

The first approach is called the single-cell generator (scGen). It models perturbation effects using simple arithmetic in the latent space. Later, with trVAE, we reformulated the problem as distribution matching, where we aim to move cells from a control distribution to a perturbed one.

Finally, during my time at Facebook AI, we developed the compositional perturbation autoencoder (CPA), which can predict combinatorial perturbations such as drug combinations or double CRISPR KOs. We also extended CPA to predict unseen drugs (chemCPA) and to support multiple modalities (multiCPA).

In addition to the above, we have written a perspective about the challenges and opportunities in this emerging field.
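The latent-space arithmetic behind scGen can be illustrated with a minimal sketch. It assumes the latent codes have already been produced by a trained encoder (the encoder and decoder are omitted here), and the function names and toy values are hypothetical, chosen only for illustration. The idea is to estimate a perturbation vector as the difference between the mean latent codes of perturbed and control cells, then add that vector to the latent codes of held-out control cells.

```python
from statistics import fmean


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length latent vectors."""
    return [fmean(component) for component in zip(*vectors)]


def perturbation_delta(z_control, z_perturbed):
    """Difference between the mean perturbed and mean control latent codes."""
    mu_control = mean_vector(z_control)
    mu_perturbed = mean_vector(z_perturbed)
    return [p - c for p, c in zip(mu_perturbed, mu_control)]


def predict_perturbed(z_cells, delta):
    """Shift each latent code by the perturbation vector."""
    return [[z + d for z, d in zip(cell, delta)] for cell in z_cells]


# Toy latent codes (hypothetical values, not real data).
z_control = [[0.0, 1.0], [2.0, 3.0]]
z_perturbed = [[1.0, 1.0], [3.0, 5.0]]

delta = perturbation_delta(z_control, z_perturbed)

# Latent codes of control cells from an unseen cell type or study.
z_new_control = [[10.0, 10.0]]
predicted = predict_perturbed(z_new_control, delta)
```

In a full pipeline, the shifted latent codes would be passed through the decoder to obtain predicted gene expression; that step is left out of this sketch.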

scGen predicts single-cell perturbation responses.

Lotfollahi, M., Wolf, F. A. & Theis, F. J.

[Nature Methods (2019)], [code], [press].


Conditional out-of-distribution generation for unpaired data using transfer VAE.

Lotfollahi, M., Naghipourfar, M., Theis, F. J. & Wolf, F. A.

[Bioinformatics (2020)], [code], [talk at ECCB 2020 (21.18% acceptance rate)].


Predicting cellular responses to complex perturbations in high-throughput screens.

Lotfollahi, M.+, Klimovskaia Susmelj, A.+, De Donno, C.+, Hetzel, L., Ji, Y., Ibarra, I. L., ... & Theis, F. J.

[Molecular Systems Biology (2023)], [code], [Facebook AI blogpost], [state of AI report 2021], [featured cover].


Predicting single-cell perturbation responses for unseen drugs.

Hetzel, L., Böhm, S., Kilbertus, N., Günnemann, S., Lotfollahi, M., & Theis, F.

[NeurIPS (2022)], [code].


Machine learning for perturbational single-cell omics.

Ji, Y., Lotfollahi, M., Wolf, F. A. & Theis, F. J.

[Cell Systems (2021)], [data resource].


MultiCPA: Multimodal Compositional Perturbation Autoencoder.

Inecik, K., Uhlmann, A., Lotfollahi, M.*, & Theis, F.*

[ICML Workshop on Computational Biology (WCB) 2022], [bioRxiv], [code].


Out-of-distribution prediction with disentangled representations for single-cell RNA sequencing data

Lotfollahi, M.+, Dony, L.+, Agarwala, H.+, & Theis, F. J.

[ICML Workshop on Computational Biology (WCB) 2020], [spotlight talk ICML WCB 2020], [code]

Generative AI for Modeling High-Content Microscopy Images

Advancements in high-throughput screening, particularly in high-content microscopy, have accelerated drug target identification and mode of action studies by allowing the exploration of complex phenotypic data. However, scaling these experiments to encompass a wide range of drug or genetic manipulations is challenging because only a limited number of compounds exhibit activity in screens.

Predicting Cell Morphological Responses to Perturbations Using Generative Modeling

To address this, we developed a generative model, the Image Perturbation Autoencoder (IMPA), which predicts cellular morphological effects of chemical and genetic perturbations using untreated cells as input.

Palma, A., Theis, F. J.*, Lotfollahi, M.*.

[bioRxiv (2023)], [code].

Modeling Tissue and Spatial Biology

Spatial omics holds great potential to elucidate tissue architecture by dissecting underlying cell niches and cellular interactions. However, we lack an end-to-end computational framework that can effectively integrate different spatial omics tissue samples, quantitatively characterize cell niches based on biological knowledge of cell-cell communication and transcriptional regulation pathways, and discover spatial molecular programs of cells. We present NicheCompass, a graph deep learning method designed based on the principles of cellular communication. It utilizes existing knowledge of inter- and intracellular interaction pathways to learn an interpretable latent space of cells across multiple tissue samples, enabling the construction and querying of spatial reference atlases.

Birk, S., Bonafonte-Pardàs, I., Feriz, A. M., ... & Lotfollahi, M.*.

[code], [bioRxiv (2024)].

Single-cell Reference Mapping

The availability of single-cell reference datasets and mapping algorithms transforms analytical workflows for single-cell sequencing datasets. These reference atlases are generated with the intention of helping individual labs in the field understand their own data. Single-cell reference mapping addresses the question of how this can be done efficiently and in a reusable fashion, enabling information accumulated from multiple prior experiments to help interpret new data. The ultimate goal is to transition from an expert-centric and tedious pipeline to a rapid, accessible, and accurate procedure for beginners and experts alike.

We introduced single-cell architecture surgery (scArches), the first deep learning algorithm to map single-cell datasets onto references built with pretrained models. scArches receives a pretrained model and a query dataset, and maps the query data to the reference without retraining the reference model. scArches is now widely used by the community to study disease, development, and in vivo/in vitro differences, to impute missing modalities, and to transfer cell-type annotations from reference to query by mapping query datasets onto a reference atlas. We later introduced treeArches to update not just the reference but also cell-type hierarchies.

We extended reference mapping to support multiple data modalities such as RNA/ATAC using Multigrate. Additionally, we developed expiMap to learn novel gene programs for query datasets. Furthermore, we improved technical aspects by leveraging continuous embeddings with scPoli and using continual learning strategies through continual surgery. The scArches repository now serves as a unified framework integrating many applications of single-cell reference mapping including the above.

Mapping Single-cell Data to Reference Atlases by Transfer Learning

Lotfollahi, M., Naghipourfar, M., Luecken, M. D., Khajavi, M., Büttner, M., Wagenstetter, M., Avsec, Ž., Gayoso, A., Yosef, N., Interlandi, M., & Others.

[Nature Biotechnology (2022)], [code], [MDSI best paper award], [featured cover in Nature Biotechnology].


Single-cell Reference Mapping to Construct and Extend Cell Type Hierarchies

Michielsen, L.+, Lotfollahi, M.+, Strobl, D., Sikkema, L., Reinders, M. J. T., Theis, F. J., Mahfouz, A.

[NAR Genomics (2024)], [code].

Multimodal Modeling of Single-Cell Data

The integration and simultaneous analysis of genomic, epigenomic, transcriptomic, proteomic, and metabolomic data at the single-cell level are revolutionizing our understanding of cell biology in both normal and diseased states.

We have developed two innovative generative models to facilitate this integration:

  1. Multigrate: This model enables the integration of partially overlapping single-cell modalities to construct a comprehensive multimodal reference atlas. It incorporates single-cell chromatin accessibility, transcriptomics, and surface protein abundance.

  2. mvTCR: This model is designed to integrate T-cell receptor sequences with single-cell RNA-seq data.

Multigrate: Single-Cell Multi-Omic Data Integration.

Lotfollahi, M.+, Litinetskaya, A.+, & Theis, F. J.

[Contributed talk Award at ICML Workshop on Computational Biology 2021], [code], [bioRxiv (2022)].


Integrating T-cell receptor and transcriptome for large-scale single-cell immune profiling analysis.

Drost, F., An, Y., Dratva, L. M., Lindeboom, R. G. H., Haniffa, M., Teichmann, S. A., Theis, F., Lotfollahi, M.*, Schubert, B.*

[ICML Workshop on Computational Biology 2021], [code], [bioRxiv (2021)].

Biologically Informed Deep Learning for Single-Cell Genomics

The availability of large-scale single-cell atlases has provided us with detailed insights into cell states. At the same time, advancements in deep learning have facilitated the rapid analysis of query datasets by mapping them into reference atlases. However, the existing data transformations learned by these methods lack interpretability in terms of biologically known concepts such as genes or pathways.

To address this limitation, we introduced two methods: expiMap and intercode. These methods embed single-cell data within a biologically meaningful space that captures the activity of gene programs. Additionally, we demonstrate the feasibility of learning novel gene programs using expiMap.

Biologically Informed Deep Learning to Query Gene Programs in Single-Cell Atlases

Lotfollahi, M.+, Rybakov, S.+, Hrovatin, K., Hediyeh-Zadeh, S., Talavera-López, C., Misharin, A. V., & Theis, F. J.

[Nature Cell Biology (2023)], [code].


Learning Interpretable Latent Autoencoder Representations with Annotations of Feature Sets

S. Rybakov, M. Lotfollahi, F.J. Theis*, F.A. Wolf*.

[Machine Learning in Computational Biology (2020)], [code].

Population-level Integration of Single-Cell Datasets

The increasing generation of population-level single-cell atlases with hundreds or thousands of samples has the potential to link demographic and technical metadata with high-resolution cellular and tissue data in homeostasis and disease. Constructing such comprehensive references requires large-scale integration of heterogeneous cohorts with varying metadata capturing demographic and technical information.

We introduced scPoli, which learns both sample and cell representations, is aware of cell-type annotations, and can integrate and annotate newly generated query datasets while providing an uncertainty mechanism to identify unknown populations.
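One simple way to picture annotation with an uncertainty mechanism is nearest-prototype labeling with a distance cutoff. The sketch below is an illustrative assumption, not scPoli's actual implementation or API: the prototype coordinates, the `max_distance` threshold, and the function name are all hypothetical. Each query cell is assigned the label of its closest cell-type prototype in latent space, and cells that are far from every prototype are flagged as unknown.

```python
import math


def nearest_prototype(z, prototypes, max_distance):
    """Label a latent code by its closest prototype.

    Returns (label, distance). The label is "unknown" when the closest
    prototype is farther away than max_distance (a hypothetical cutoff).
    """
    best_label, best_dist = None, math.inf
    for label, proto in prototypes.items():
        dist = math.dist(z, proto)  # Euclidean distance in latent space
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist > max_distance:
        return "unknown", best_dist
    return best_label, best_dist


# Toy cell-type prototypes in a 2-D latent space (hypothetical values).
prototypes = {"T cell": [0.0, 0.0], "B cell": [4.0, 0.0]}

near = nearest_prototype([0.0, 1.0], prototypes, max_distance=2.0)
far = nearest_prototype([0.0, 10.0], prototypes, max_distance=2.0)
```

Here the distance itself serves as the uncertainty score; in practice the cutoff would have to be tuned on data with known labels.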

De Donno, C., Hediyeh-Zadeh, S., Wagenstetter, M., Moinfar, A. A., Zappia, L., Lotfollahi, M.*, & Theis, F. J.*

[code], [Nature Methods (2023)].