Predicting gene expression from DNA sequence using deep learning models

Libbrecht, M. W. & Noble, W. S. Machine learning applications in genetics and genomics. Nat. Rev. Genet. 16, 321–332 (2015).

Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).

Ying, P. et al. Genome-wide enhancer-gene regulatory maps link causal variants to target genes underlying human cancer risk. Nat. Commun. 14, 5958 (2023).

Sokolova, K., Chen, K. M., Hao, Y., Zhou, J. & Troyanskaya, O. G. Deep learning sequence models for transcriptional regulation. Annu. Rev. Genomics Hum. Genet. 25, 105–122 (2024).

Novakovsky, G., Dexter, N., Libbrecht, M. W., Wasserman, W. W. & Mostafavi, S. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat. Rev. Genet. 24, 125–137 (2023). This Review provides a detailed description of interpretation methods for sequence-to-expression models.

Capitanchik, C., Wilkins, O. G., Wagner, N., Gagneur, J. & Ule, J. From computational models of the splicing code to regulatory mechanisms and therapeutic implications. Nat. Rev. Genet. https://doi.org/10.1038/s41576-024-00774-2 (2024).

La Fleur, A., Shi, Y. & Seelig, G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev. 38, 843–865 (2024).

van Helden, J., Andre, B. & Collado-Vides, J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281, 827–842 (1998).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).

Vaswani, A. et al. Attention is all you need. Preprint at https://arxiv.org/abs/1706.03762 (2017).

Eraslan, G., Avsec, Z., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).

Stormo, G. D. Modeling the specificity of protein–DNA interactions. Quant. Biol. 1, 115–130 (2013).

Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015). Describing DeepSEA, this pioneering paper uses a convolutional neural network to predict the effects of non-coding variants on epigenomic tracks using only DNA sequence as input.

Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).

Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).

Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018).

Avsec, Z. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021). Describing Enformer, this study was among the first to effectively use transformers to capture long-distance enhancer–promoter interactions, enabling the prediction of epigenomic and expression tracks across multiple cell types.

Karollus, A., Mauermeier, T. & Gagneur, J. Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers. Genome Biol. 24, 56 (2023). This article highlights some limitations of current transformer sequence-to-expression models.

He, A. Y., Palamuttam, N. P. & Danko, C. G. Training deep learning models on personalized genomic sequences improves variant effect prediction. Preprint at bioRxiv https://doi.org/10.1101/2024.10.15.618510 (2024).

Linder, J., Srivastava, D., Yuan, H., Agarwal, V. & Kelley, D. R. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Nat. Genet. https://doi.org/10.1038/s41588-024-02053-6 (2025). This paper describes Borzoi, the current state-of-the-art transformer-based model that predicts RNA-seq, among other tracks, by capturing transcription, splicing and poly-adenylation signals.

Toneyan, S. & Koo, P. K. Interpreting cis-regulatory interactions from large-scale deep neural networks. Nat. Genet. 56, 2517–2527 (2024).

Penzar, D. et al. LegNet: a best-in-class deep learning model for short DNA regulatory regions. Bioinformatics https://doi.org/10.1093/bioinformatics/btad457 (2023).

Rafi, A. M. et al. A community effort to optimize sequence-based deep learning models of gene regulation. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02414-w (2024).

Cochran, K. et al. Dissecting the cis-regulatory syntax of transcription initiation with deep learning. Preprint at bioRxiv https://doi.org/10.1101/2024.05.28.596138 (2024).

Dudnyk, K., Cai, D., Shi, C., Xu, J. & Zhou, J. Sequence basis of transcription initiation in the human genome. Science 384, eadj0116 (2024).

He, A. Y. & Danko, C. G. Dissection of core promoter syntax through single nucleotide resolution modeling of transcription initiation. Preprint at bioRxiv https://doi.org/10.1101/2024.03.13.583868 (2024).

Naqvi, S. et al. Transfer learning reveals sequence determinants of the quantitative response to transcription factor dosage. Cell Genom. 5, 100780 (2025).

Zrimec, J. et al. Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure. Nat. Commun. 11, 6141 (2020).

Agarwal, V. & Shendure, J. Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks. Cell Rep. 31, 107663 (2020).

Lee, B. H. & Rhie, S. K. Molecular and computational approaches to map regulatory elements in 3D chromatin structure. Epigenet. Chromat. 14, 14 (2021).

Zhang, Y. et al. MLSNet: a deep learning model for predicting transcription factor binding sites. Brief Bioinform. https://doi.org/10.1093/bib/bbae489 (2024).

Avsec, Z. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021).

Zhang, Q. et al. Base-resolution prediction of transcription factor binding signals by a deep learning framework. PLoS Comput. Biol. 18, e1009941 (2022).

Brennan, K. J.
