- On this page, I highlight my research on how machine learning and computational models enable understanding, mapping, and predicting cellular biology.
- Before 2018, as part of my master's thesis in an AI program, I developed Deep Packet, the first neural network architecture for network traffic classification, which has become one of the seminal papers in the field.
- Papers selected as cover:
- See a full list of papers on Google Scholar.
Generative AI for Modeling Single-Cell Perturbation
During my doctoral studies, I developed a series of generative AI algorithms to predict out-of-distribution cellular behaviors in response to perturbations (e.g., diseases, drugs, CRISPR KOs). Using these models, one can predict and answer counterfactual questions such as, "What would the gene expression of this cell have looked like if it had been treated differently?"
The first approach is called the single-cell generator (scGen). It models perturbation effects using simple arithmetic in the latent space. Later, we formulated the problem as a distribution matching scenario, known as trVAE, where we aim to move cells from a control distribution to a perturbed condition.
Finally, during my time at Facebook AI, we developed the compositional perturbation autoencoder (CPA), which can predict combinatorial perturbations such as drug combinations or double CRISPR KOs. We also extended CPA to predict unseen drugs (chemCPA) and to support multiple modalities (multiCPA).
In addition to the above, we have written a perspective about the challenges and opportunities in this emerging field.
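The latent-space arithmetic behind scGen mentioned above can be sketched as follows. This is a minimal illustration, not the scGen API: the latent codes are random toy data standing in for the output of a trained encoder, and the function names `latent_shift` and `predict_perturbed` are hypothetical.

```python
import numpy as np


def latent_shift(z_control, z_perturbed):
    """Perturbation vector: difference between the mean latent codes of two conditions."""
    return z_perturbed.mean(axis=0) - z_control.mean(axis=0)


def predict_perturbed(z_new_control, delta):
    """Shift unseen control cells by the perturbation vector."""
    return z_new_control + delta


rng = np.random.default_rng(0)

# Toy latent codes (assumption: in practice these come from a trained encoder).
z_ctrl = rng.normal(0.0, 1.0, size=(200, 8))
z_pert = z_ctrl + np.array([2.0] + [0.0] * 7)  # the perturbation moves dimension 0 by +2
z_query = rng.normal(0.0, 1.0, size=(50, 8))   # control cells from a held-out population

delta = latent_shift(z_ctrl, z_pert)
z_pred = predict_perturbed(z_query, delta)  # would then be passed through a decoder
```

In a full model, `z_pred` would be decoded back to gene expression to obtain the counterfactual prediction for the held-out cells.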
scGen predicts single-cell perturbation responses. Lotfollahi, M., Wolf, F. A. & Theis, F. J.
Conditional out-of-distribution generation for unpaired data using transfer VAE. Lotfollahi, M., Naghipourfar, M., Theis, F. J. & Wolf, F. A.
Predicting cellular responses to complex perturbations in high-throughput screens. Lotfollahi, M.+, Klimovskaia Susmelj, A.+, De Donno, C.+, Hetzel, L., Ji, Y., Ibarra, I. L., ... & Theis, F. J.
Predicting single-cell perturbation responses for unseen drugs.
Hetzel, L., Böhm, S., Kilbertus, N., Günnemann, S., Lotfollahi, M., & Theis, F. (2022).
Machine learning for perturbational single-cell omics.
Ji, Y., Lotfollahi, M., Wolf, F. A. & Theis, F. J.
MultiCPA: Multimodal Compositional Perturbation Autoencoder. Inecik, K., Uhlmann, A., Lotfollahi, M.*, & Theis, F.*
Out-of-distribution prediction with disentangled representations for single-cell RNA sequencing data. Lotfollahi, M.+, Dony, L.+, Agarwala, H.+, & Theis, F. J.
Generative AI for Modeling High-Content Microscopy Images
Advancements in high-throughput screening, particularly in high-content microscopy, have accelerated drug target identification and mode of action studies by allowing the exploration of complex phenotypic data. However, scaling these experiments to encompass a wide range of drug or genetic manipulations is challenging because only a limited number of compounds exhibit activity in screenings.
Predicting cell morphological responses to perturbations using generative modeling
To address this, we developed a generative model, the IMage Perturbation Autoencoder (IMPA), which predicts cellular morphological effects of chemical and genetic perturbations using untreated cells as input.
Palma, A., Theis, F. J.*, Lotfollahi, M.*
Single-Cell Reference Mapping
The availability of single-cell reference datasets and mapping algorithms transforms analytical workflows for single-cell sequencing datasets. These reference atlases are generated with the intention of helping individual labs in the field understand their own data. Single-cell reference mapping addresses the question of how this can be done efficiently and in a reusable fashion, enabling information accumulated from multiple prior experiments to help interpret new data. The ultimate goal is to transition from an expert-centric and tedious pipeline to a rapid, accessible, and accurate procedure for beginners and experts alike.
We introduced single-cell architecture surgery (scArches), the first deep learning algorithm for mapping single-cell datasets onto references built with pretrained models. scArches takes a pretrained model and a query dataset and maps the query onto the reference without retraining the reference model. The community now widely uses scArches to study disease, development, and in vivo/in vitro differences, to impute missing modalities, and to transfer cell-type annotations from reference to query by mapping query datasets onto a reference atlas. We later introduced treeArches, which updates not only the reference but also its cell-type hierarchies.
We extended reference mapping to support multiple data modalities such as RNA/ATAC using Multigrate. Additionally, we developed expiMap to learn novel gene programs for query datasets. Furthermore, we improved technical aspects by leveraging continuous embeddings with scPoli and using continual learning strategies through continual surgery. The scArches repository now serves as a unified framework integrating many applications of single-cell reference mapping, including those above.
Mapping single-cell data to reference atlases by transfer learning. Lotfollahi, M., Naghipourfar, M., Luecken, M. D., Khajavi, M., Büttner, M., Wagenstetter, M., Avsec, Ž., Gayoso, A., Yosef, N., Interlandi, M. & others.
Single-cell reference mapping to construct and extend cell type hierarchies
Michielsen, L.+, Lotfollahi, M.+, Strobl, D., Sikkema, L., Reinders, M. J. T., Theis, F. J., Mahfouz, A.
Multimodal Modeling of Single-Cell Data
The integration and simultaneous analysis of genomic, epigenomic, transcriptomic, proteomic, and metabolomic data at the single-cell level are revolutionizing our understanding of cell biology in both normal and diseased states.
We have developed two innovative generative models to facilitate this integration:
Multigrate: This model enables the integration of partially overlapping single-cell modalities to construct a comprehensive multimodal reference atlas. It incorporates single-cell chromatin accessibility, transcriptomics, and surface protein abundance.
mvTCR: This model is designed to integrate T-cell receptor sequences with single-cell RNA-seq data.
Multigrate: Single-Cell Multi-Omic Data Integration. Lotfollahi, M.+, Litinetskaya, A.+ & Theis, F. J.
Integrating T-cell receptor and transcriptome for large-scale single-cell immune profiling analysis.
Drost, F., An, Y., Dratva, L. M., Lindeboom, R. G. H., Haniffa, M., Teichmann, S. A., Theis, F., Lotfollahi, M.*, Schubert, B*.
Biologically Informed Deep Learning for Single-Cell Genomics
The availability of large-scale single-cell atlases has provided us with detailed insights into cell states. At the same time, advancements in deep learning have facilitated the rapid analysis of query datasets by mapping them into reference atlases. However, the existing data transformations learned by these methods lack interpretability in terms of biologically known concepts such as genes or pathways.
To address this limitation, we introduced two methods: expiMap and intercode. These methods embed single-cell data within a biologically meaningful space that captures the activity of gene programs. Additionally, we demonstrate the feasibility of learning novel gene programs using expiMap.
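One common way to tie latent dimensions to gene programs is a linear decoder whose weights are masked by program membership, so each latent unit can only influence the genes in its program. The sketch below illustrates that idea under stated assumptions: the gene names, the two programs, and the `decode` function are hypothetical, and the text above does not specify the actual architecture of expiMap or intercode.

```python
import numpy as np

genes = ["G1", "G2", "G3", "G4", "G5"]
# Hypothetical gene programs (assumption: real ones would come from pathway databases).
programs = {
    "program_A": ["G1", "G2"],
    "program_B": ["G3", "G4", "G5"],
}

# mask[p, g] = 1 if gene g belongs to program p, else 0.
mask = np.array(
    [[1.0 if g in members else 0.0 for g in genes] for members in programs.values()]
)

rng = np.random.default_rng(0)
weights = rng.normal(size=mask.shape)  # stand-in for learned decoder weights


def decode(z, weights, mask):
    """Linear decoder whose weights are zeroed outside each gene program."""
    return z @ (weights * mask)


# A cell in which only program_A is active.
z = np.array([[1.0, 0.0]])
x_hat = decode(z, weights, mask)  # only G1 and G2 receive a non-zero contribution
```

Because of the mask, each latent coordinate can be read directly as the activity of one named program, which is what makes the embedding interpretable.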
Biologically Informed Deep Learning to Query Gene Programs in Single-Cell Atlases. Lotfollahi, M.+, Rybakov, S.+, Hrovatin, K., Hediyeh-Zadeh, S., Talavera-López, C., Misharin, A. V., & Theis, F. J.
Learning Interpretable Latent Autoencoder Representations with Annotations of Feature Sets
S. Rybakov, M. Lotfollahi, F.J. Theis, F.A. Wolf.
Population-level integration of single-cell datasets
The increasing generation of population-level single-cell atlases with hundreds or thousands of samples has the potential to link demographic and technical metadata with high-resolution cellular and tissue data in homeostasis and disease. Constructing such comprehensive references requires large-scale integration of heterogeneous cohorts with varying metadata capturing demographic and technical information.
We introduced scPoli, which learns both sample and cell representations, is aware of cell-type annotations, and can integrate and annotate newly generated query datasets while providing an uncertainty mechanism to identify unknown populations. Through its novel sample embeddings, it can explain sample-level biological and technical variation such as disease, anatomical location, and assay.
Population-level integration of single-cell datasets enables multi-scale analysis across samples.
De Donno, C., Hediyeh-Zadeh, S., Wagenstetter, M., Moinfar, A. A., Zappia, L., Lotfollahi, M.*, & Theis, F. J.*