Libbrecht, M. W. & Noble, W. S. Machine learning applications in genetics and genomics. Nat. Rev. Genet. 16, 321–332 (2015).
Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
Ying, P. et al. Genome-wide enhancer-gene regulatory maps link causal variants to target genes underlying human cancer risk. Nat. Commun. 14, 5958 (2023).
Sokolova, K., Chen, K. M., Hao, Y., Zhou, J. & Troyanskaya, O. G. Deep learning sequence models for transcriptional regulation. Annu. Rev. Genomics Hum. Genet. 25, 105–122 (2024).
Novakovsky, G., Dexter, N., Libbrecht, M. W., Wasserman, W. W. & Mostafavi, S. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat. Rev. Genet. 24, 125–137 (2023). This Review provides a detailed description of interpretation methods for sequence-to-expression models.
Capitanchik, C., Wilkins, O. G., Wagner, N., Gagneur, J. & Ule, J. From computational models of the splicing code to regulatory mechanisms and therapeutic implications. Nat. Rev. Genet. https://doi.org/10.1038/s41576-024-00774-2 (2024).
La Fleur, A., Shi, Y. & Seelig, G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev. 38, 843–865 (2024).
van Helden, J., Andre, B. & Collado-Vides, J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281, 827–842 (1998).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
Vaswani, A. et al. Attention is all you need. Preprint at https://arxiv.org/abs/1706.03762 (2017).
Eraslan, G., Avsec, Z., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).
Stormo, G. D. Modeling the specificity of protein–DNA interactions. Quant. Biol. 1, 115–130 (2013).
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015). Describing DeepSEA, this pioneering paper uses a convolutional neural network to predict the effects of non-coding variants on epigenomic tracks using only DNA sequence as input.
Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).
Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).
Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018).
Avsec, Z. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021). Describing Enformer, this study was among the first to effectively use transformers to capture long-distance enhancer–promoter interactions, enabling the prediction of epigenomic and expression tracks across multiple cell types.
Karollus, A., Mauermeier, T. & Gagneur, J. Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers. Genome Biol. 24, 56 (2023). This article highlights some limitations of current transformer sequence-to-expression models.
He, A. Y., Palamuttam, N. P. & Danko, C. G. Training deep learning models on personalized genomic sequences improves variant effect prediction. Preprint at bioRxiv https://doi.org/10.1101/2024.10.15.618510 (2024).
Linder, J., Srivastava, D., Yuan, H., Agarwal, V. & Kelley, D. R. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Nat. Genet. https://doi.org/10.1038/s41588-024-02053-6 (2025). This paper describes Borzoi, the current state-of-the-art transformer-based model that predicts RNA-seq, among other tracks, by capturing transcription, splicing and poly-adenylation signals.
Toneyan, S. & Koo, P. K. Interpreting cis-regulatory interactions from large-scale deep neural networks. Nat. Genet. 56, 2517–2527 (2024).
Penzar, D. et al. LegNet: a best-in-class deep learning model for short DNA regulatory regions. Bioinformatics https://doi.org/10.1093/bioinformatics/btad457 (2023).
Rafi, A. M. et al. A community effort to optimize sequence-based deep learning models of gene regulation. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02414-w (2024).
Cochran, K. et al. Dissecting the cis-regulatory syntax of transcription initiation with deep learning. Preprint at bioRxiv https://doi.org/10.1101/2024.05.28.596138 (2024).
Dudnyk, K., Cai, D., Shi, C., Xu, J. & Zhou, J. Sequence basis of transcription initiation in the human genome. Science 384, eadj0116 (2024).
He, A. Y. & Danko, C. G. Dissection of core promoter syntax through single nucleotide resolution modeling of transcription initiation. Preprint at bioRxiv https://doi.org/10.1101/2024.03.13.583868 (2024).
Naqvi, S. et al. Transfer learning reveals sequence determinants of the quantitative response to transcription factor dosage. Cell Genom. 5, 100780 (2025).
Zrimec, J. et al. Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure. Nat. Commun. 11, 6141 (2020).
Agarwal, V. & Shendure, J. Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks. Cell Rep. 31, 107663 (2020).
Lee, B. H. & Rhie, S. K. Molecular and computational approaches to map regulatory elements in 3D chromatin structure. Epigenet. Chromat. 14, 14 (2021).
Zhang, Y. et al. MLSNet: a deep learning model for predicting transcription factor binding sites. Brief Bioinform. https://doi.org/10.1093/bib/bbae489 (2024).
Avsec, Z. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021).
Zhang, Q. et al. Base-resolution prediction of transcription factor binding signals by a deep learning framework. PLoS Comput. Biol. 18, e1009941 (2022).
Brennan, K. J.