Machine learning (ML) applications in diagnostic histopathology have been highly successful. While many successful models have been built on general-purpose models trained largely on everyday objects, there is a recent trend toward pathology-specific foundation models trained on histopathology images. Pathology foundation models show strong performance on cancer detection, subtyping, grading, and prediction of molecular diagnoses. However, we have noticed gaps in how foundation models are tested: nearly all the benchmarks used to evaluate them focus on cancer. Neoplasia is an important pathologic mechanism and a key concern in much of clinical pathology, but it is only one of many pathologic bases of disease. Non-neoplastic pathology dominates findings in the placenta, a critical organ in human development and a specimen commonly encountered in clinical practice. Little or none of the data used to train pathology foundation models comes from placenta. Placental pathology is therefore doubly out of distribution, being both largely non-neoplastic and nearly absent from training data, which makes it a useful challenge for foundation models. We developed benchmarks for estimation of gestational age, classification of normal tissue, identification of inflammation in the umbilical cord and membranes, and classification of macroscopic lesions including villous infarction, intervillous thrombus, and perivillous fibrin deposition. We tested 5 pathology foundation models and 4 non-pathology models on each benchmark in tasks including zero-shot K-nearest neighbor classification and regression, content-based image retrieval, supervised regression, and whole-slide attention-based multiple instance learning. In each task, the best-performing model was a pathology foundation model. However, the gap between pathology and non-pathology models narrowed in tasks related to inflammation and in tasks where a supervised model was trained on model embeddings. Performance was comparable among pathology foundation models. Among non-pathology models, ResNet consistently performed worse, while models from the present decade performed better. Future work could examine the impact of incorporating placental data into foundation model training.
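As a rough illustration of the zero-shot K-nearest neighbor and retrieval tasks named above, the sketch below classifies, regresses, and retrieves over a bank of frozen embeddings. It is a minimal sketch under stated assumptions, not the study's implementation: the two-dimensional vectors, labels, and gestational ages are invented toy values rather than real model outputs, and the cosine similarity metric, majority vote, unweighted mean, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, bank, k):
    """Indices of the k bank embeddings most similar to the query, best first.

    This ranking step on its own is a simple form of content-based retrieval.
    """
    ranked = sorted(
        range(len(bank)),
        key=lambda i: cosine_similarity(query, bank[i]),
        reverse=True,
    )
    return ranked[:k]


def knn_classify(query, bank, labels, k=3):
    """Majority vote over the labels of the k nearest bank embeddings."""
    votes = Counter(labels[i] for i in nearest_neighbors(query, bank, k))
    return votes.most_common(1)[0][0]


def knn_regress(query, bank, targets, k=3):
    """Unweighted mean of the targets of the k nearest bank embeddings."""
    idx = nearest_neighbors(query, bank, k)
    return sum(targets[i] for i in idx) / len(idx)


# Toy stand-ins for embeddings from a frozen encoder (hypothetical values).
bank = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.0], [0.1, 1.0], [0.2, 0.9], [0.0, 0.8]]
labels = ["normal", "normal", "normal", "infarction", "infarction", "infarction"]
weeks = [38.0, 39.0, 40.0, 24.0, 26.0, 28.0]  # invented gestational ages
query = [1.0, 0.0]

predicted_label = knn_classify(query, bank, labels, k=3)
predicted_weeks = knn_regress(query, bank, weeks, k=3)
retrieved = nearest_neighbors(query, bank, k=2)
```

No encoder weights are updated in this setup; only the labeled embedding bank changes between benchmarks, which is one reason such tasks are convenient for comparing models on equal footing.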
Competing Interest Statement
Cooper reports honoraria from Risk Appraisal Forum, Lynn Sage Breast Cancer Foundation, Jayne Koskinas Ted Giovanis Foundation for Health & Policy and Consultation for Tempus. The other authors have no relevant financial interest in the products or companies described in this article.
Funding Statement
Goldstein is supported by National Institutes of Health (NIH) R01EB030130 and UG3OD035546. Cooper is supported by NIH U24NS133949, R01LM013523, and U01CA220401. Infrastructure used by this work, including REDCap, is supported by UL1TR001422.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study was approved by the Institutional Review Board of Northwestern University Feinberg School of Medicine in Chicago, Illinois, as STU00214052.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Benchmarking data are available from Northwestern after execution of a data use agreement.