Multi-omics driven paradigm for construction of traditional Chinese Medicine Zheng (syndrome) diagnosis and treatment model, taking Shi Zheng (syndrome of dampness) as an example

Physical characteristics of the clinical subjects

We divided healthy controls and SZ patients into two cohorts. Cohort 1 was the discovery cohort and was used for biomarker screening, while cohort 2 was the validation cohort for internal validation of biomarkers. First, we compared the age, BMI of the healthy control (HC) and SZ groups before they were split into cohorts. The results revealed no significant difference in age between the HC and SZ groups (Fig. 1B). The BMI in the SZ group were significantly greater than those in the healthy group (P < 0.01) (Fig. 1C). After the data were split into two cohorts, no significant difference in age between the SZ and HC groups in cohorts 1 and 2 was found, and the BMI maintained their original trends, indicating that the grouping was relatively random and did not change the original distribution of the data (Fig. 1B–G). Additionally, studies have shown a strong correlation between obesity and SZ [13], and obesity can cause abnormal water metabolism. The increase in BMI in the SZ group also supported this finding.

Hexacosanoyl carnitine, leukotriene B4, prostaglandin E2, and other 30 substances were identified as differential metabolites of SZ

After UPLC-MS analysis and data preprocessing, 3302 negative ions and 5302 positive ions were obtained from cohort 1. We first conducted principal component analysis (PCA) on the metabolic data of the HC and SZ individuals in cohort 1. The PCA scatter plot (Fig. 2A, D) in the negative and positive ion modes showed good separation characteristics. Next, we conducted supervised pattern recognition OPLS-DA (NEG: R2 = 0.966, Q2 = 0.963; POS: R2 = 0.983, Q2 = 0.975). The OPLS-DA scatter plot (Fig. 2B, E) in the negative and positive ion modes showed good separation characteristics, and 200 response sorting tests also revealed that the models under the two ion modes (Fig. 2C, F) did not show overfitting. Next, VIP prediction values were obtained based on the OPLS-DA model, and VIP > 0.8 and P < 0.05 were used as screening conditions to obtain 4141 differential ions (1608 negative ions and 2533 positive ions).

Fig. 2figure 2

Screening and enrichment analysis of differential metabolite. AC PCA, OPLS-DA scatter plots, and permutation plots of negative ions showed that the metabolic data of healthy controls and SZ patients had good separation characteristics. DF PCA, OPLS-DA scatter plots and permutation plots of positive ions showed that the metabolic data of healthy controls and SZ patients had good separation characteristics. G Heat map of 30 endogenous differential metabolites. H KEGG enrichment analysis map of 30 differential metabolites

To avoid false positives in subsequent validation (although there is a significant difference in the increasing or decreasing trend of differential metabolites in the two cohorts,, which still shows good diagnostic value), we removed ions with inconsistent average increasing or decreasing trends between the HC and SZ groups and obtained 35 overlapping ions. The ion chemical formula and name were determined based on the retention time (Rt) and mass–charge ratio (m/z) of overlapping ions, as well as, secondary mass spectrometry (MS/MS) data, and the Human Metabolome Database (HMDB) was searched (http://www.hmdb.ca). By matching the metabolic products with the mass spectrometry database, 30 endogenous differential metabolites were identified (Fig. 2G) (Supplementary Table 3). Subsequently, KEGG enrichment analysis was performed on the 30 different metabolites. The results revealed that SZ was mainly involved in five categories, including metabolism, environmental information processing, cellular processes, organismal systems, and human diseases (Fig. 2H).

1-Octanesulfonic acid, N2-succinyl-l-ornithine, and hexacosanoyl carnitine were selected as potential diagnostic indicators of SZ

The ROC curve is a common tool used to evaluate the performance of classification models. It uses the AUC to predict diagnostic value, but this method does not change with the changes in the distribution of categories. The random forest is a classifier that contains many decision trees. It can accurately classify the data and reveal important features. However, due to its randomness, the prediction results often fluctuate. LASSO regression can exclude unimportant features for modeling, effectively address multicollinearity problems, and provide a more stable model, but the number of selected features may be lower. All three methods have certain advantages and disadvantages. To screen more accurate and stable biomarkers, we simultaneously used the three methods to screen important features. The 30 differential metabolites were screened following the conditions of AUC > 0.8, P < 0.05, and 10 important metabolic characteristics were obtained (Supplementary Fig. 2A), which included hexacosanoyl carnitine, N2-succinyl-l-ornithine, 5-(p-methylphenyl)-5-phenylhydantoin, leukotriene B4, (4R,5S,7R,11x)-11,12-dihydroxy-1(10)-spirovetiven-2-one 12-glucoside, 1-octanesulfonic acid, stampangine, 2-octenoic acid, 3-hydroxyquinine, and lithocholic acid glycine conjugate. The results of random forest analysis revealed seven important features (Supplementary Fig. 2B), which included 1,11-undecanedicarboxylic acid, 3-hydroxyquinine, 1-octanesulfonic acid, 5-(p-methylphenyl-phenylhydantoin, PA(18:0/22:6(5Z,8E,10Z,13Z,15E,19Z)-2OH(7S, 17S)), N2-succinyl-l-ornithine, and hexacosanoyl carnitine. LASSO regression analysis revealed 12 features (Supplementary Fig. 2C, D), which included 1-octanesulfonic acid, leukotriene B4, N2-succinyl-l-ornithine, stampangine, 2-hydroxyphenylacetic acid glucuronide, diethone, all trans-decaprenyl diphosphate, 1-(4-methoxyphenyl)-1-penten-3-one, LysoPA (20:3(5Z,8Z,11Z)/0:0), hexacosanoyl carnitine, 2-octenoic acid, and melatonin glucuronide. Three potential biomarkers, 1-octanesulfonic acid, N2-succinyl-l-ornithine, and hexacosanoyl carnitine, were identified by the intersection of important features identified using the three methods (Supplementary Fig. 2E). The diagnostic value of the three potential biomarkers was subsequently verified in the validation cohort. The results of the ROC analysis revealed that the AUC of 1-octanesulfonic acid, N2-succinyl-l-ornithine, and hexacosanoyl carnitine was 0.76, 0.79, and 0.72; among them, the AUC of N2-succinyl-l-ornithine was the largest (Supplementary Fig. 2F).

GAL3ST1, KPNB1, SNCA, and other 271 substances were identified as differential proteins of SZ

The proteomic analysis yielded 1066 protein features, all of which we imported into SIMCA to obtain a PCA 3D scatter plot (Fig. 3A), which revealed that the proteomic data of the HC and SZ groups had good separation characteristics. In total, 1066 proteins were screened based on the criteria of P < 0.05 and FC > 1.2/FC < 0.83 (Fig. 3B); 271 differential proteins were obtained, and the heat map of the protein content of the top 50 proteins (in ascending order of P-value) is shown in Fig. 3C. Differential proteins were analyzed for GO enrichment, and pathways in the three categories were sorted by P-value. The top 10 pathways are shown in Fig. 3D. Finally, KEGG enrichment analysis was performed on the differential proteins (Fig. 3E), and 252 pathways were obtained. The terms in each category are sorted according to the P-value. The top five terms of the organismal systems category include complement and coagulation cascades, protein digestion and absorption, platelet activation, pancreatic secretion, and vascular smooth muscle contraction. The top five terms in the metabolism category were riboflavin metabolism, biotin metabolism, the pentose phosphate pathway, starch and sucrose metabolism, and arginine and proline metabolism. The top five terms in the human diseases category were Salmonella infection, Staphylococcus aureus infection, systemic lupus erythematosus, pathogenic Escherichia coli infection, and bacterial invasion of epithelial cells. The top five terms in the genetic information processing category were ribosome, ribosome biogenesis in eukaryotes, aminoacyl-tRNA biosynthesis, nonhomologous end-joining, and spliceosome. The top five terms in the environmental information processing category were the Rap1 signaling pathway, Ras signaling pathway, Apelin signaling pathway, calcium signaling pathway, and cGMP-PKG signaling pathway. The top five terms in the cellular processes category were motor proteins, regulation of actin cytoskeleton, focal adhesion, endocytosis, and tight junction.

Fig. 3figure 3

Screening and enrichment analysis of differential protein. A PCA 3D scatter plot showed that the protein data of healthy controls and SZ patients had good separation characteristics. B Volcano map of 271 differential protein screening. C Heatmap of top 50 differential proteins (P value). D GO enrichment analysis (The top 10 pathways are presented for each category). E KEGG enrichment analysis of 271 differential proteins (The top five pathways are presented for each category)

HSPA5, KRAS, and RACK1 are the top three proteins in the mutual aid connectivity

We performed GSEA on the main terms (top-most term under each category) selected by KEGG. Among these pathways, the complement and coagulation cascades pathway was highly expressed in the SZ group (Fig. 4B), whereas the Salmonella infection pathway, ribosome pathway, Rap1 signaling pathway, and motor protein pathway were expressed at low levels in the SZ group (Fig. 4C–F). We could not perform GSEA of riboflavin metabolism because there were few related proteins in the pathway (Fig. 4A). Therefore, we compared the changes in the contents of ACP1 and ACP5 in this pathway. ACP1 in the SZ group was significantly lower than that in the HC group (P < 0.01), and ACP5 was significantly greater (P < 0.05).

Fig. 4figure 4

Analysis of GSEA in top-most term under each category and protein mutual aid network. A Comparison of protein content related to Riboflavin metabolism pathway in Metabolism category. B GSEA analysis revealed elevated expression of Complement and coagulation cascades pathway in Organismal Systems category of SZ patients. C GSEA analysis showed decreased expression of Salmonella infection pathway in Human Diseases of SZ patients. D GSEA analysis showed decreased expression of Ribosome pathway in Genetic Information Processing of SZ patients. E GSEA analysis showed decreased expression of Rap1 signaling pathway in Environmental Information Processing of SZ patients. F GSEA analysis showed decreased expression of Motor proteins pathway in Cellular Processes of SZ patients. G Chord diagram of top1 pathway in each category. H Top 25 connectivity protein interactions network map

Next, we used chordal diagrams to show the corresponding proteins of each major pathway (Fig. 4G), in which the pathways associated with complement and coagulation cascades corresponded to the proteins C1R, CFB, C1QA, C1QB, C1QC, SERPING1, CFI, F13B, C2, C1S, and C4A. The riboflavin metabolism pathway corresponded to the proteins ACP1 and ACP5. The Salmonella infection pathway corresponded to the proteins MYL12B, ROCK2, PFN1, HSP90AB2P, RAB5A, FLNA, RAB7A, CDC42, ACTR2, RHOA, ARF6, DYNLL1, TUBA4A, RHOG, TUBB3, CYFIP1, and NCKAP1. The ribosome pathway corresponded to the proteins RPL12, RPL4, RPS20, RPL15, RPS16, RPS14, RPL7A, RPS4X, RPL30, and RPS21. The Rap1 signaling pathway corresponded to the proteins FYB1, KRAS, EGF, GNAI2, PFN1, CDH1, CDC42, RHOA, SIPA1, and TLN1. The motor protein pathway corresponded to the proteins MYL12B, TPM3, TPM2, MYL4, MYH11, CAPZB, CAPZA1, MYL6, DYNLL1, TUBA4A, and TUBB3. Next, we analyzed the mutual aid relationships of different proteins and obtained a network diagram with the top 25 connected proteins (Fig. 4H).

PPAR signaling pathway, taste transduction, oxytocin signaling pathway and other 18 common pathways are associated with SZ symptoms by joint analysis

By analyzing the metabolomics and proteomics results, we identified the biological processes (KEGG pathways) related to the occurrence and development of SZ. To systematically delineate the regulatory process from proteins to metabolism and reveal the upstream and downstream regulatory pathways of key proteins and metabolites, we conducted a joint analysis of metabolomics and proteomics. The analysis was divided into two parts: joint analysis based on the expression level and joint analysis on the pathway level. At the expression level, we ranked the differential metabolites and differential proteins according to the P-value and selected the top 20 metabolites and proteins for correlation analysis (Supplementary Fig. 3). The results revealed a correlation between the differential metabolites and differential proteins. Also, at the content level, the metabolomics data and proteomics data were strongly correlated. At the pathway level, we screened for common pathways enriched by metabolomics and proteomics (Fig. 5A); among them, 44 pathways were enriched by metabolomics, 252 pathways were enriched by proteomics, and 29 pathways were common to both (Fig. 5B).

Fig. 5figure 5

Analysis of metabolomics and proteomics common pathways and their association with SZ symptoms. A Venn plot of 29 pathways common to metabolomics and proteomics. B Bubble plot of the 29 common pathways. C Protein/metabolite-KEGG pathway-symptom association map screened out 18 common pathways (See supplementary Table 4 for references)

In TCM, SZ is a complex black box. Its manifestation is a series of symptoms, but its intrinsic etiology may involve multiple pathways, tissues, and organs. Therefore, to explain the diverse pathogenesis of SZ, we obtained differential protein/metabolite-common pathway-symptom flow maps based on the biological significance of the pathways and their correlation with SZ symptoms (See supplementary Table 4 for references) and retained the 18 pathways common to the differential metabolites and differential proteins (Fig. 5C). The 18 common pathways were pathogenic Escherichia coli infection, rheumatoid arthritis, arachidonic acid metabolism, Fc gamma R-mediated phagocytosis, oxytocin signaling pathway, GnRH signaling pathway, the C-type lectin receptor signaling pathway, serotonergic synapses, thermogenesis, the regulation of lipolysis in adipocytes, renin secretion, the peroxisome proliferator-activated receptor (PPAR) signaling pathway, bile secretion, inflammatory mediator regulation of TRP channels, circadian entrainment, vitamin digestion and absorption, fat digestion and absorption, and taste transduction. The results of the flow map revealed the pathways associated with SZ symptoms and could be traced upstream to the metabolites/proteins causing changes in the pathways, which revealed the molecular mechanism underlying SZ.

Pathogenic Escherichia coli infection, rheumatoid arthritis, PPAR signaling pathway, bile secretion, GnRH signaling pathway, fat digestion and absorption are the key pathways associated with the main symptoms of SZ

The diagnostic symptoms of SZ are divided into the main symptoms and secondary symptoms (Table 1). The main symptoms are common characteristics of SZ, and the secondary symptoms are individualized performance indicators. We identified the common characteristics of SZ and screened potential biomarkers for its diagnosis. Therefore, this study focused on optimizing the common pathways based on the main symptoms and screening the key pathways that can represent the common characteristics of SZ. By ranking the enrichment scores of the 18 common pathways (Fig. 6A), we found that the top six pathways with enrichment scores were pathogenic Escherichia coli infection, rheumatoid arthritis, PPAR signaling pathway, bile secretion, GnRH signaling pathway, and fat digestion and absorption. Based on the association map (Fig. 5C), the corresponding relationship between the top six pathways and the main symptoms (Fig. 6B) was determined, and the results revealed that the top six pathways were associated with 100% (9/9) of the main symptoms.

Fig. 6figure 6

Screening of key pathways associated with SZ main symptoms. A Ranking map of enrichment scores of 18 common pathways got six key pathways. B Association map of the top six common pathways with main symptoms. C Differential metabolites and differential proteins associated with the top 6 common pathways

Next, the differential metabolites and differential proteins in the top six pathways were extracted (Fig. 6C). There were seven differential metabolites, which included LysoPA (20:3(5Z,8Z,11Z)/0:0), prostaglandin E2, leukotriene B4, lithocholate 3-O-glucuronide, 3-hydroxyquinine, lithocholic acid glycine conjugate, and PA(18:0/22:6(5Z,8E,10Z,13Z,15E,19Z)-2OH(7S, 17S)). Seven differential metabolites were associated with nine main symptoms through the top six pathways, and these metabolites were used to construct the diagnostic model. There were 23 differential proteins, including PRKACA, ATP1A1, FABP1, KRAS, CDC42, PRKCD, ROCK2, IGHV1-46, IGHV6-1, IGHV1-3, NCL, MYH11, ACTR2, RHOA, ARF6, TUBA4A, TUBB3, CYFIP1, NCKAP1, APOA2, ATP6V1G1, TNFSF13, and ACP5. The 23 differential proteins were associated with nine main symptoms through the top six pathways, and these proteins were used for key target screening.

PA(18:0/22:6(5Z,8E,10Z,13Z,15E,19Z)-2OH(7S, 17S), lithocholic acid glycine conjugate, LysoPA(20:3(5Z,8Z,11Z)/0:0), prostaglandin E2, leukotriene B4, 3-hydroxyquinine, and lithocholate 3-O-glucuronide are associated with the main symptoms and constitute a stable diagnostic model for SZ

Shi Zheng (SZ) is an aggregate of symptoms, and the biomarkers of a single indicator are not sufficient to represent the overall changes in SZ. Therefore, to study SZ, one or two diagnostic indicators were selected for each main symptom, and then, all diagnostic indicators were fitted to construct a joint diagnostic model, which could more accurately represent the overall changes in SZ. LysoPA (20:3(5Z,8Z,11Z)/0:0), prostaglandin E2, leukotriene B4, lithocholate 3-O-glucuronide, 3-hydroxyquinine, lithocholic acid glycine conjugate, and PA(18:0/22:6(5Z,8E,10Z,13Z,15E,19Z)-2OH(7S, 17S) were found to be associated with all the main symptoms through pathogenic Escherichia coli infection, rheumatoid arthritis, PPAR signaling pathway, bile secretion, GnRH signaling pathway, and fat digestion and absorption pathways. Thus, these seven indicators were selected to establish a combined diagnostic model. The ROC curve revealed that in the discovery cohort, the AUC of the combined diagnosis was 0.9, and the maximum AUC of a single indicator was 0.86 (Fig. 7A). In the validation cohort, the AUC for the combined diagnosis was 0.99, and the maximum AUC for a single indicator was 0.78 (Fig. 7B).

Fig. 7figure 7

ROC analysis of differential metabolites within the six key pathways. A LysoPA(20:3(5Z,8Z,11Z)/0:0), Prostaglandin E2, Leukotriene B4, Lithocholate 3-O-glucuronide, 3-Hydroxyquinine, Lithocholic acid glycine conjugate, PA(18:0/22:6(5Z,8E,10Z,13Z,15E,19Z)-2OH(7S, 17S) single indicator ROC (AUCMax = 0.86) and 7 indicators combined ROC (AUC = 0.90) in discovery cohort. B LysoPA(20:3(5Z,8Z,11Z)/0:0), Prostaglandin E2, Leukotriene B4, Lithocholate 3-O-glucuronide, 3-Hydroxyquinine, Lithocholic acid glycine conjugate, PA(18:0/22:6(5Z,8E,10Z,13Z,15E,19Z)-2OH(7S, 17S) single indicator ROC (AUCMax = 0.78) and 7 indicators combined ROC (AUC = 0.99) in validation cohort

RHOA, TNFSF13, PRKCD, APOA2, ATP1A1, and FABP1 are associated with the main symptoms and identified as potential therapeutic targets of SZ

The six key pathways (pathogenic Escherichia coli infection, rheumatoid arthritis, PPAR signaling pathway, bile secretion, GnRH signaling pathway, and fat digestion and absorption) contained 23 differentially expressed proteins. Among them, 14 were involved in the pathogenic Escherichia coli infection pathway (Fig. 8A), six in the rheumatoid arthritis pathway (Fig. 8B), two in the PPAR signaling pathway (Fig. 8D), two in the bile secretion pathway (Fig. 8E), four in the GnRH signaling pathway (Fig. 8C), and one in the fat digestion and absorption pathway (Fig. 8F). Differential proteins in each pathway were sorted according to the P-value from the smallest to the largest, and six key proteins were screened by combining the P-value and the position and biological significance of the proteins in the pathway (Supplementary Figs. 4–7).

Fig. 8figure 8

Screening of protein targets within six key pathways (n = 5). A Comparison of relative protein content within the Pathogenic Escherichia coli infection pathway. B Comparison of relative protein content within the Rheumatoid arthritis pathway. C Comparison of relative protein content within the GnRH signaling pathway. D Comparison of relative protein content within the PPAR signaling pathway. E Comparison of relative protein content within the Bile secretion pathway. F Comparison of relative protein content within the Fat digestion and absorption pathway. Data were analyzed by unpaired t test, and the significant difference is indicated by P < 0.05, *P < 0.05, **P < 0.01

When screening the target proteins according to the position in the pathway, the upstream proteins with a wider impact on the pathway should be selected. In addition, the change trend of protein content is consistent with the trend of pathway pathogenicity. For example, in the pathogenic Escherichia coli pathway, pathogenic bacteria can produce virulence factors that target RhOA protein and damage the activity of RhOA protein, thereby destroying the host epithelial/endothelial barrier, paralysing the migration and phagocytosis of immune cells, invading and spreading into epithelial cells, and inhibiting cell division [14, 15]. This suggests that RHOA plays an important role in the pathogenesis of Escherichia coli, and the development of drugs targeting RHOA protein for the treatment of Escherichia coli infection is of great potential. In addition, RHOA was decreased in the SZ group, which was in line with the trend of pathogenesis, so RHOA was selected as the target protein of pathogenic Escherichia coli infection pathway. The protein targets of the other 5 key pathways were screened by the same method, and the rheumatoid arthritis pathway was TNFSF13, the PPAR signaling pathway was APOA2, the bile secretion pathway was ATP1A1, the GnRH signaling pathway was PRKCD, the fat digestion and absorption pathway was FABP1. Six key targets were associated with symptoms through the corresponding pathways, which can be used as a reference for the precise treatment of SZ.

Comments (0)

No login
gif