A Neutrophil Extracellular Traps-Related Gene Trait Revealed the Prospective Therapy Strategy of Coronary Atherosclerosis

Introduction

As the population ages, the morbidity of coronary atherosclerosis (CA) is increasing annually and has become the leading cause of death worldwide.1–3 According to a report from the American Heart Association, approximately 10.9% of adults aged 45 years and older and 17.0% of adults aged 65 and older had CA, and approximately 800,000 Americans suffered a myocardial infarction (MI) each year.4 CA can cause coronary atherosclerotic heart disease, namely, coronary artery disease (CHD), by causing myocardial ischaemia and hypoxia. The main clinical manifestations of CHD contain angina pectoris, myocardial infarction, myocardial fibrosis and sudden coronary death, which is the main reason of death. The treatment of CHD mainly includes lifestyle changes, drug therapy, surgery, and drug treatment, which are the basis of coronary disease control.5–8 Drug therapy includes lipid-regulating drugs, antiplatelet drugs, anti-myocardial ischaemia drugs, drugs to improve cardiac remodelling and prognosis, as well as thrombolytic drugs and anticoagulants, all of which have their own side effects. Surgical treatment mainly involves coronary revascularization, including percutaneous coronary intervention (PCI) and coronary bypass grafting (CABG). Currently diagnostic tests for CHD include blood tests, electrocardiograms, exercise stress tests, coronary CT angiography and coronary angiography; however, there are no specific biomarkers for realize the early diagnosis and prognosis assessment of CHD. More studies are needed to identify specific biomarkers of CHD to enable the early diagnosis and prognosis assessment of CHD.

The inflammatory mechanism runs through the entire process of initiation, progression and complication formation in CA.9,10 When external stimuli, such as dyslipidemia, hypertension or proinflammation cytokines lead to disorders of the arterial endothelium, they can express more adhesion molecules to promote the adhesion of leukocytes and blood vessel walls. With the accumulation of chronic inflammation, endothelial cells become dysfunctional, allowing low-density lipoprotein and inflammatory cells to enter the endodermis, and smooth muscle cells (SMCs) migrate from the tunica media into the intima and promote the synthesis of extracellular matrix, finally leading to CA.10,11

Neutrophil extracellular trap (NETs) necrosis (NETosis) is an inflammatory cell death mode of neutrophils. Activated neutrophils trap and kill pathogens by releasing NETs, which consist of depolymerized chromatin and intracellular granular proteins. NET formation is accompanied by neutrophil death. This novel mode of death is different from apoptosis, cell necrosis and is called NETosis.12 A series of factors, including pathogens, lipopolysaccharides (LPS), immune complexes, antibodies, complement C3,13,14 cytokines14–16 and drugs such as phorbol myristate acetate (PMA) and calcium ionophores. These inducers can promote NETs key proteins neutrophil elastase and myeloperoxidase (MPO). Transcription and translation of MPO and peptidearginine deaminase 4 (PAD4), PAD4 into the nucleus, and nuclear histones are citrullinated by PAD4, resulting in changes in chromatin charge that promote chromatin decondensation. Subsequently, the nuclear and granular membranes dissolved, and a large number of antibacterial proteins in the cytoplasm attached to the depolymerized chromatin formed a network structure, which was finally released into the extracellular. Several studies have shown that NETs are involved in multiple cancers, such as lung, pancreatic, esophageal, gastric, breast and colorectal cancers. NETs may be involved in tumor progression through a number of pathways,17–21 including promoting tumor cell growth,22–24 migration, invasion,21,25,26 increasing angiogenesis27,28 and promoting intravascular adhesion.21 NETs promote tumor growth by increasing cell proliferation, inhibiting apoptosis and promoting activity of matrix metalloproteinase 9 (MMP9) and neutrophil elastase.22–24 Through the biological processes of MMP9 and extracellular matrix (ECM) interactions, cathepsin G mediated cellular aggregation, the disruption of E-cadherin, and NETs are involved in tumor migration and invasion.21,25,26 NETs could promote angiogenesis due to the VEGF liberation mediating by MMP9, neutrophil elastase and cathepsin G. Furthermore, NETs participated in the negative effects on distant organs of tumors, including increasing inflammation and damaging blood vessels.29,30 Some studies have shown NETs involved in autoimmune diseases.31–33 Some auto-antibodies in the body can activate neutrophils to release NETs, which carry the auto-antigens of systemic autoimmune diseases (such as myeloperoxidase, double-stranded DNA, etc)., and then produce auto-antibodies, that can promote the release of NETs. For example, systemic lupus erythematosus releases high mobility group protein 1 (HMGB1), and, by binding DNA and anti-DNA auto-antibodies through the receptor for advanced glycation end products (RAGE), neutrophils are stimulated to release NETs, resulting in kidney damage caused by anti-DNA auto-antibodies in lupus nephritis.34

The mechanism leading to NETosis is not fully understood, however, ROS production is required. NETs plays a vital role in the thrombogenesis.14,35 The formation of immune thrombus helps to clear the pathogen, however, excessive accumulation of thrombus and its interactions with endothelial cells could lead to the formation of atherosclerotic plaques.14 Various inflammatory factors are also risk factors for AS and cardiovascular disease. Therefore, NETs may play an important role in the AS pathogenesis. Bioinformatic analysis is a powerful tool for mining valuable information from existing genomic and proteomic data. This study aimed to elucidate the role of NETs in the pathogenesis mechanism of AS using bioinformatics analysis and to identify specific biomarkers of CHD to achieve early diagnosis and prognosis assessment of CHD.

In this study, NET-related differentially expressed genes (NETDEGs) associated with CA were identified in two GEO datasets. Gene prognostic traits were considered, based on which the disease samples were divided into high- and low-risk subgroups. The related biological characteristics, as well as the changes in immune cell infiltration, were analyzed, which provided the evidence that NETDEGs could be the potential biomarkers of CA progression and the therapeutic targets for this disease. A flow chart of the analysis is shown in Figure 1.

Figure 1 The flow chart of bioinformatic analyses.

Materials and MethodsData Download

CA datasets GSE20681,36,37 GSE4214838 were downloaded from GEO database39 (https://www.ncbi.nlm.nih.gov/geo/) by R package GEOquery.40 The samples of GSE20681 and GSE42148 were all from Homo sapiens and the tissue sources were whole blood tissue. The chip platform of dataset GSE20681 was GPL4133 and the chip platform of dataset GSE42148 was GPL13607. Among them, dataset GSE20681 contained 99 CA samples and 99 control samples and dataset GSE42148 contained 13 coronary CA samples and 11 control samples. All CA samples and control samples were included in this study and the detailed information is shown in Table 1.

Table 1 GEO Microarray Chip Information

GeneCards database41 (https://www.genecards.org/) is a collection of Neutrophil extracellular traps necrosis related genes (NETs-related Genes, NETRGs). The term “NETs” was as the search keyword and only “protein-coding” NETRGs were searched. A total of 78 NETRGs were obtained. In addition, the term “NETs” was as keywords to search in PubMed website (https://pubmed.ncbi.nlm.nih.gov/) and the NETs related gene sets were obtained from published literature.42 A total of 69 NETRGs were identify. After merging and removing duplications, a total of 128 NETRGs were obtained.

Datasets GSE20681 and GSE42148 were debatched using the R package sva43 and were normalized using the R package limma44 and the annotation probes were normalized. All the samples in dataset GSE20681 were included in this study as the test set and all the samples in dataset GSE42148 were used as the validation set for subsequent analysis.

Analysis of Differentially Expressed Genes

The samples of GSE20681 and GSE42148 were divided into coronary atherosclerosis (CA) group and control (Control) group, respectively. The R package limma was used to analyze the differences in genes between the CA group and the Control group. |logFC| > 0 and P-value < 0.05 were set as threshold of the differentially expressed genes (DEGs). Of them, DEGs with logFC > 0 and P-value < 0.05 were defined as up–regulated genes, DEGs with logFC < 0 and P-value < 0.05 were defined as down–regulated genes. The results of DEGs analysis were plotted using the R package ggplot2 to create volcano plot.

To obtain NETDEGs associated with CA, the DEGs, contained in the CA samples, and NETRGs were intersected and were mapped Venn using the R package pheatmap. The R package RCircos45 was used to draw the chromosomal localization map.

Verification of Differential Expression of NETDEGs

To further explores the expression differences of NETDEGs in the CA group and the Control group in the GSE20681 and GSE42148 datasets respectively, a group comparison map was drawn based on the expression levels of NETDEGs. Then, the R package pROC was used to draw the ROC Curve of NETDEGs and calculate the area under the curve (AUC) value to evaluate the diagnostic effect of NETDEGs expression level on CA. The AUC was generally between 0.5 and 1. The closer the AUC is to 1, the better the diagnostic performance. When AUC was between 0.5 and 0.7, the accuracy was low; when AUC was between 0.7 and 0.9, the accuracy was moderate; and when AUC was above 0.9, the accuracy was high.

Subsequently, in order to explore the correlation between NETDEGs, the Spearman algorithm was used to analyze the correlation between the expression levels of NETDEGs in GSE20681 and GSE42148 datasets. The results of correlation analysis were visualized using the R package pheatmap.46 The absolute value of correlation coefficient below 0.3 was weak or no correlation, 0.3–0.5 was weak correlation, 0.5–0.8 was moderate correlation, and above 0.8 was strong correlation.

Gene Ontology (GO) and Pathway (KEGG) Enrichment Analysis

Gene Ontology (GO) analysis47 is a common method for large-scale functional enrichment studies, including cell component (CC), biological process (BP) and molecular function (MF). Kyoto Encyclopedia of Genes and Genomes (KEGG)48 is a widely used database storing information on genomes, biological pathways, diseases and drugs. Gene ontology (GO) and pathway (KEGG) enrichment analysis of NETDEGs were performed using the R package clusterProfiler.46 Item screening criteria of P-value < 0.05 and FDR value (Q-value) < 0.25 were considered statistically significant. Finally, the R package Pathview49 was used to visualize the related pathway map of KEGG enrichment analysis results.

Gene Set Enrichment Analysis (GSEA)

Gene set enrichment analysis (GSEA)50 is used to evaluate the distribution trend of genes in a predefined gene set in a gene table ranked by their correlation with phenotype, and thus determine their contribution to phenotype. In this study, the dataset GSE20681 was first ranked according to logFC value, and then the R package clusterProfiler was used to perform GSEA on all genes in GSE20681 dataset. The parameters used in GSEA were as follows: the seed is 2020, the number of computations is 1000, the minimum number of genes contained in each gene set is 10, and the maximum number of genes contained in each gene set is 500. Gene set c2.cp.all.v2022.1.Hs.symbols.gmt [All Canonical Pathways](3050) was obtained from the Molecular signatures Database (MSigDB) Database51 to conduct GSEA. The screening criteria of GSEA were adj.p < 0.05 and FDR value (Q-value) < 0.25, and the P-value correction was Benjamini-Hochberg (BH).

The CA samples in dataset GSE20681 were divided into High-risk and Low-risk groups according to the median value of LASSO risk score. The R package limma was used for differential analysis and genes (| logFC | > 0 and P-value < 0.05) the samples of High- and Low-risk groups in CA group were selected. Finally, The GSEA was performed for all genes in CA group with the R-pack cluster Profiler using following the parameters: the seed is 2020, the minimum number of genes contained in each gene set is 10, and the maximum number of genes contained in each gene set is 500. Gene set c2.cp.all.v2022.1.Hs.symbols.gmt [All Canonical Pathways](3050) was obtained from Molecular Signatures Database (MSigDB) to conduct GSEA. The screening criteria of GSEA were adj.p < 0.05 and FDR value (Q-value) < 0.25, and the P-value correction was Benjamini-Hochberg (BH).

Gene Set Variation Analysis (GSVA)

Gene Set Variation Analysis (GSVA)52 is a nonparametric unsupervised analysis method that evaluates gene set enrichment results of microarray nuclear transcriptome by converting gene expression matrix between different samples into gene set expression matrix between samples to evaluate whether different pathways are enriched in different samples. The gene set h.all.v7.4.symbols.gmt was obtained from the Molecular signatures Database51 (MSigDB) and GSVA was performed on all genes in dataset GSE20681. The difference in functional enrichment between the CA group and the Control group was calculated and the screening criterion of GSVA was P-value < 0.05.

The gene set h.all.v7.4.symbols.gmt was obtained from the Molecular traits Database51 (MSigDB) and GSVA was performed on all genes in dataset GSE20681. The difference of functional enrichment between the High-risk group and Low-risk group was calculated and the screening criterion of GSVA was P-value < 0.05.

Construction of Coronary Atherosclerosis Diagnostic Model

To obtains the coronary atherosclerosis diagnostic model of dataset GSE20681, the Logistic regression analysis was performed on NETDEGs. When the dependent variable was a binary variable, namely CA groups and Control groups, the Logistic regression analysis was used to analyze the association between independent and dependent variables. The P-value < 0.05 was used as the criterion to screen NETDEGs and the Logistic regression model was constructed. The group expression of NETDEGs included in the Logistic regression model was displayed by Forest Plot.

Then, Support Vector Machine (SVM)53 algorithm was used to construct the SVM model based on NETDEGs included in the Logistic regression model. NETDEGs were screened based on the number of genes with the highest accuracy and the lowest error rate.

Finally, The R package glmnet54 was used to perform Least Absolute Shrinkage and Selection Operator (LASSO) based on the NETDEGs included in the SVM model with parameters of set.seed (500) and family= “binomial”. LASSO regression analysis reduces the overfitting of the model by adding a penalty term (lambda × absolute value of slope) and improves the generalization ability of the model based on the linear regression analysis. The results of LASSO regression analysis were visualized using diagnostic model and variable trajectory diagrams. The results of LASSO regression analysis were the diagnostic model of CA, and the NETDEGs included in the model were the Key Genes. Finally, the LASSO Risk Score was calculated based on the risk coefficient of LASSO regression analysis, and the risk score was calculated using the following formula:

Validation of the Diagnostic Model for Coronary Atherosclerosis

A Nomogram55 is a graph that uses a cluster of disjoint line segments to represent the functional relationship between multiple independent variables in the rectangular coordinate system of the plane. The R package rms is used to draw a Nomogram based on the results of Logistic regression analysis to show the mutual relationship of Key Genes. Calibration Curve was drawn by Calibration Analysis to evaluate the accuracy and discrimination of CA diagnostic model based on LASSO regression analysis. The R package ggDCA56 was used to draw the decision curve analysis (DCA) map based on the Key Genes in the dataset GSE20681. DCA is a simple method to evaluate clinical prediction models, diagnostic tests and molecular markers. Next, in order to evaluate the diagnostic effect of expression level of LASSO risk score (risk score) on CA, the R package pROC was used to plot the ROC curves and calculate the AUC values in the GSE20681 and GSE42148 datasets.

In addition, the CA group was divided into High-risk group and Low-risk group according to the median expression value of risk score (risk score) of the CA diagnostic model. In order to further explore the expression differences of Key Genes between High-risk group and Low-risk group of CA group, group comparison map was drawn based on the expression levels of Key Genes.

Immune Infiltration Analysis of Key Genes Based on ssGSEA Algorithm

Single-Sample Gene-Set Enrichment Analysis (ssGSEA)57 could quantify the relative abundance of each immune cell infiltrate. First, each infiltrating immune cell type was labeled, such as Activated CD8+ T cell, Activated dendritic cell, Gamma-delta T cell, Natural killer cell, Regulatory T cells (Treg) and other human immune cell subtypes. Secondly, the enrichment scores calculated by ssGSEA analysis were used to represent the relative abundance of each immune cell infiltration in each sample to obtain the immune cell infiltration matrix. Then, the immune cells with significant differences in the two groups were screened for subsequent analysis. The correlation between immune cells was calculated based on Spearman algorithm, and the R package pheatmap was used to draw the correlation heatmap to show the correlation analysis results of immune cells themselves. The correlation between Key Genes and immune cells was calculated based on Spearman algorithm, and the results with P-value < 0.05 were retained. The R package ggplot2 was used to draw correlation bubble plot to show the correlation analysis results between Key Genes and immune cells.

Immune Infiltration Analysis of High and Low Risk Groups (CIBERSORT)

CIBERSORT58 deconvolutes the transcriptome expression matrix based on linear support vector regression to estimate the composition and abundance of immune cells in a mixture of cells. The CIBERSORT algorithm was used to combine the immune cell signature gene matrix and filter out the data with immune cell enrichment score greater than zero, and the specific results of immune cell infiltration matrix in CA samples of dataset GSE20681 were finally obtained. Finally, the R package ggplot2 was used to draw group comparison maps to show the difference in the expression of immune cells between the High-Risk and the Low-Risk groups in the CA samples of dataset GSE20681. Subsequently, the immune cells with significant differences in the two groups were screened for subsequent analysis, and the correlation between immune cells was calculated based on Spearman’s algorithm. The correlation between Key Genes and immune cells was calculated based on Spearman algorithm, and the results with P-value < 0.05 were retained. The R package ggplot2 was used to draw correlation bubble plot to show the correlation analysis results between Key Genes and immune cells.

mRNA-TF, mRNA-miRNA, mRNA-RBP, mRNA-Drug Interaction Network

CHIPBase database (version 3.0)59 (https://rna.sysu.edu.cn/chipbase/) identified thousands of combined base sequence matrix and its binding site from the DNA binding protein ChIP - seq data and predicted the transcriptional regulatory relationships between millions of Transcription Factors (TFs) and genes. The CHIPBase database (version 3.0) were searched for TFS that bind to Key Genes and the sums of Number of samples found (upstream) and Number of samples found (downstream), greater than 6, was used as the screening criteria to screen mRNA-TF interaction pairs. Cytoscape was used to visualize the mRNA-TF Interaction Network.

ENCORI database59 is StarBase version 3.0. The interaction between miRNA-ncRNA, miRNA-mRNA, ncRNA-RNA, RNA-RNA, RBP-ncRNA and RBP-mRNA in ENCORI database is based on CLIP-seq and decompose sequencing (for plants) data mining. ENCORI database was used to predict miRNAs that interact with Key Genes. The pancancerNum > 3 was used as the screening criterion to screen the mRNA-miRNA interaction pairs, and Cytoscape software was used to visualize the mRNA-miRNA Interaction Network.

RNA-binding Protein (RBP)60 plays a key role in the process of gene regulation, such as RNA synthesis, alternative splicing, modification, transport and translation. StarBase v3.0 database61 (https://starbase.sysu.edu.cn/) was used to forecast the Key Genes target RBP, and clusterNum > 2 was used as the selection criterion for mRNA-RBP interaction. Cytoscape software was used to visualize the mRNA-RBP Interaction Network.

The Comparative Toxicogenomics Database43 (https://ctdbase.org/) was used to predict the direct and indirect drug targets of Key Genes and explore the interaction between Key Genes and drugs. The mRNA-Drug Interaction pairs were screened with the screening criterion of “Reference Count” > 2, and Cytoscape software was used to visualize the mRNA-Drug Interaction Network. 1.12 Prediction of protein domain.

Proteins are essential for life and knowing their structures could facilitate to understand their function. The AlphaFoldDB62 database (https://alphafold.com) contains about 350,000 protein structures in humans as well as 20 model organisms commonly used in biological research, such as Escherichia coli, fruit fly, zebrafish and mouse. AlphaFoldDB was used to predict and visualize the protein structures of Key Genes. AlphaFoldDB generates a Predicted Local Distance Difference Test (pLDDT) between 0 and 100 for each residue. When pLDDT < 50, the predicted structure had low confidence; when 50 < pLDDT < 70, the predicted structure had moderate confidence; when 70 < pLDDT < 90, the predicted structure had high confidence; when pLDDT > 90, the predicted structure has higher confidence.

Total RNA Extraction and Quantitative Real Time PCR (qRT-PCR)

A total of 10 peripheral blood samples were obtained from CA patients, and 10 peripheral blood samples were donated from the normal people as the control in Cardiothoracic Surgery Department of Qingdao Traditional Chinese Medicine Affiliated of Qingdao University, who were informed of the use of the samples. All samples were approved by the Ethics Committee of Qingdao Traditional Chinese Medicine Hospital and informed consent was obtained from the patient (2022HC05LS001). Total RNA was extracted from the peripheral blood lymphocytes using the Direct-zol kit (Zymo Research) according to manufacturer’s instructions. The cDNA library was constructed with High-Capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific) according to manufacturer’s instructions, and the qRT-PCR was done using SsoAdvanced™ Universal SYBR Green Supermix real-time PCR kit (Bio-Rad). The specific primers used in this assay were listed as follows: MMP9-forward: GGACAAGCTCTTCGGCTTCT; MMP9-reverse: TCGCTGGTACAGGTCGAGTA; ERN1-forward: AAAACTACGCCTCCCCTGTG; ERN1-reverse: GTCAGATAGCGCAGGGTCTC; G0S2-forward: CCAAGGAGATGATGGCCCAG; G0S2-reverse: GCTGCACACAGTCTCCATCA.

Western Blotting

Leukocytes were obtained from peripheral blood lymphocytes of CA patients, which were then lysed by RIPA containing 1 mm PMSF on ice. Total protein was harvested from supernatant after the cell lysate was centrifuged at 15,000 rpm for 20 min at 4 °C. Protein concentrations were measured using the bicinchoninic acid (BCA) assay (23,225, Pierce). Equal amounts of protein (20 μg/lane) were loaded onto 4–15% precast protein gels (Bio-Rad) and transferred to polyvinylidene difluoride (PVDF, Bio-Rad) membranes. Non-specific binding was blocked with 5% milk in tris-buffered saline with Tween-20 (TBST) for 1 h at room temperature. The membranes were then incubated with primary antibodies including rabbit anti-MMP9 (30592-1-AP, Proteintech) at 1:1000, rabbit anti-ERN1 (28164-1-AP, Proteintech) at 1:1000, and rabbit anti-G0S2 (12091-1-AP, Proteintech) at 1:1000 overnight at 4 °C. Once excess primary antibody was washed off by TBST, membranes were incubated with horseradish peroxidase (HRP)-linked anti-rabbit secondary antibody (SA00001-2, Proteintech) at 1:1000 for 1 h at room temperature. Rabbit anti-β-actin antibody (20536-1-AP, Proteintech) at 1:1000 was used as a reference (Bu et al, 2018).

Statistical Analysis

All data processing and analysis in this article were based on R software (Version 4.2.2). If not otherwise specified, statistical significance of normally distributed variables was estimated by independent Student’s T-Test for comparisons of continuous variables between two groups. Mann–Whitney U-Test (Wilcoxon Rank Sum Test) was used to analyze the differences between variables that were not normally distributed. Kruskal–Wallis test was used for comparison of three or more groups. Spearman correlation analysis was used to calculate the correlation coefficients between different variables. All statistical P-values were two-sided if not specified, and a P-value of less than 0.05 was considered to indicate statistical significance.

ResultsIdentification of NETDEGs in CA

To identify the DEGs between CA and norma individuals, GEO database was searched, and GSE20681 and GSE42148 datasets were processed (Table 1). First, the R package sva was used to remove batch effect on datasets GSE20681 and GSE42148 (Supplementary Figure 1). The R package limma was used for differential analysis of GSE20681 and GSE42148 datasets to obtain DEGs between the two groups, and the results were as follows: GSE20681 altogether 886 meet | logFC | > 0 and P-value < 0.05 threshold of DEGs; Under the threshold, 543 genes were identified to be raised expressed genes (logFC > 0 and P-value < 0.05), and 343 genes were identified to be lower expressed genes (logFC < 0 and P-value < 0.05) (Figure 2A and Supplementary Table 1). For GSE42148 datasets, a total of 3725 meet | logFC | > 0 and P-value < 0.05 threshold of DEGs; Under the threshold, 2024 gene were found to be raised expressed genes (logFC > 0 and P-value < 0.05), and 1701 genes were found to be lower expressed genes (logFC < 0 and P-value < 0.05) (Figure 2B and Supplementary Table 2).

Figure 2 The differential genes expression analysis between samples with CA and normal samples in GEO database. (A and B) Volcano map of differential expression genes between the two groups in datasets GSE20681 (A) and GSE42148 (B). NETDEGs were labeled in the figure. (C) Venn diagram of genes and NETRGs in all CA samples in the data sets GSE20681 and GSE42148. (D and E) Heat maps of NETDEGs in datasets GSE20681 (D) and GSE42148 (E). (F) Chromosomal mapping of NETDEGs.

Abbreviations: CA, coronary atherosclerosis; DEGs, differentially expressed genes; NETRGs, NETs-related genes; NETDEGs, NETs-related differentially expressed genes.

To obtain NETDEGs, Venn drawing was performed between all DEGs for the CA and NETRGs samples, and six genes (MMP9, G0S2, MGAM, PLAUR, VNN3, and ERN1) were identified as NETDEGs (Figure 2C). Based on the intersection results, the expression differences of the six NETDEGs between different sample groups in the GSE20681 and GSE42148 datasets were analyzed, and the R package pheatmap was used to draw a heatmap to display the analysis results (Figure 2D and E). Finally, the locations of the six NETDEGs on the human chromosome were analyzed using the R package RCircos, and the chromosome localization map was drawn. The chromosome mapping showed that the six NETDEGs were located on chromosomes 1, 6, 7, 17, 19, and 20, respectively, which were: G0S2 was located on chromosome 1, VNN3 on chromosome 6, MGAM on chromosome 7, ERN1 on chromosome 17, PLAUR on chromosome 19, and MMP9 on chromosome 20 (Figure 2F).

The Prediction Function of the Six NETDEGs Analysis

To explores the expression differences of the six NETDEGs in GSE20681 dataset, the group comparison figure was used to show the difference analysis results of the expression levels of 6 NETDEGs in CA samples and Control samples in GSE20681 dataset. The results showed that five NETDEGs (ERN1, G0S2, PLAUR, MGAM, and MMP9) were highly significantly expressed in CA samples compared to the controls (Figure 3A). A heatmap was drawn to analyze the correlation of the six NETDEGs in GSE20681 dataset, and the results showed that most genes had a significant correlation, among which, MGAM and MMP9 had the strongest positive correlation (R-value = 0.71, P-value < 0.05) (Figure 3B). The R package pROC was used to analyze the ROC curve based on the expression levels of the six NETDEGs in dataset GSE20681, and the results showed that the expression levels of ERN1, G0S2, VNN3, PLAUR, MGAM, and MMP9 in the six NETDEGs showed high accuracy (0.5 < AUC < 0.7) in the classification of CA samples and controls (Figure 3C–E).

Figure 3 The prediction function of the six NETDEGs analysis. (A and B) The comparison diagram of NETDEGs in CA samples and Control samples in dataset GSE20681. (B) Correlation heat map between NETDEGs in dataset GSE20681. (C–E) ROC curves of ERN1 and G0S2 (C) VNN3 and PLAUR (D) MGAM and MMP9 (E) in NETDEGs in dataset GSE20681. (F) The comparison diagram of NETDEGs in CA samples and Control samples in dataset GSE42148. (G) Correlation heat map between NETDEGs in dataset GSE42148. (H–J) ROC curves of ERN1 and G0S2 (H) VNN3 and PLAUR (I) MGAM and MMP9 (J) in NETDEGs in dataset GSE42148. ns represented P-value ≥ 0.05, which had no statistical significance. * means P-value < 0.05, which is statistically significant. ** means P-value < 0.01, which is highly statistically significant. When AUC > 0.5, it indicates that the expression of molecules is the trend to promote the occurrence of events, and the closer the AUC is to 1, the better the diagnostic effect is. AUC has low accuracy at 0.5–0.7, certain accuracy at 0.7–0.9, and high accuracy at above 0.9.

Abbreviations: CA, coronary atherosclerosis; ROC, receiver operating characteristic; AUC, area under the curve; TPR, true positive rate; FPR, false positive rate; Cor, correlation.

For the six NETDEGs in dataset GSE42148, the differential results showed that the expression of ERN1 in the CA samples and the controls in dataset GSE42148 was highly statistically significant (P-value < 0.01) (Figure 3F). Correlation analysis showed that some of the six NETDEGs had a significant correlation, among which G0S2 and PLAUR had the strongest positive correlation (R-value = 0.83, P-value < 0.05) (Figure 3G). The ROC curve showed that the expression levels of six NETDEGs showed certain accuracy (0.7 < AUC < 0.9) in the classification of CA samples and controls (Figure 3H–J).

Biological Characteristics of the CA Samples

The biological characteristics of the six NETDEGs were analyzed using GO and pathway (KEGG) enrichment. The results showed that the six NETDEGs were mainly enriched in the BPs for cell apoptotic regulation, which included the following terms: intrinsic apoptotic signaling pathway, regulation of apoptotic signaling pathway, regulation of cysteine-type endopeptidase activity involved in apoptotic signaling pathway, positive regulation of release of cytochrome c from mitochondria and positive regulation of epidermal growth factor receptor signaling pathway (Figure 4A, Table 2 and Supplementary Table 3). For CCs, the six NETDEGs were significantly enriched in the terms of tertiary granules, ficolin-1-rich granule, secretory granules membranes, serine-type peptidase and endoribonuclease complexes (Figure 4A, Table 2 and Supplementary Table 3). In MFs, six NETDEGs were enriched in the terms of glucosidase activity, platelet-derived growth factor receptor binding, Hsp90 protein binding, and Hsp70 protein binding (Figure 4A, Table 2 and Supplementary Table 3). The KEGG signaling pathway enrichment results showed that the six NETDEGs were involved in the Proteoglycans in cancer and Lipid and atherosclerosis biological pathways (Figure 4A, Table 2, Supplementary Table 3 and Supplementary Figure 2). Meanwhile, the network maps of BPs, CCs, MFs and KEGG enriched terms were drawn according to GO and KEGG enrichment analysis. The results showed that MMP9 and ERN1 were significantly correlated to the bioprocess of cell apoptosis and the pathway of Lipid and atherosclerosis (Figure 4B–E).

Table 2 Results of GO and KEGG Enrichment Analysis for the NETRDEGs

Figure 4 GO and KEGG enrichment analysis of the six NETDEGs analysis. (A) Enrichment analysis of GO and pathway (KEGG) of NETDEGs. Bubble map showing BP, CC, MF, and biological pathways (KEGG). The horizontal coordinates are GO terms and KEGG terms. (B–E) The results of GO and pathway (KEGG) enrichment analysis of NETDEGs are shown in the network diagram. Pink nodes represent entries, blue nodes represent molecules, and lines represent relationships between entries and molecules. In the bubble diagram, the size of the bubble represents the number of genes, and the color of the bubble represents the size of the P-value. The redder the color, the smaller the P-value, and the bluer the color, the larger the P-value.

Abbreviations: NETDEGs, NETs-related differentially expressed genes; GO, gene ontology; KEGG, Kyoto encyclopedia of genes and genomes; BP, biological process; CC, cellular component; MF, molecular function.

To further explore the biological characteristics of CA, GSEA was used to investigate the relationship between the expression levels of all genes in dataset GSE20681 and the biological processes, cellular components and molecular functions. The results showed that all genes in dataset GSE20681 were significantly enriched in inflammatory factor-related signaling pathways (Figure 5A, Table 3 and Supplementary Table 4), such as Neutrophil Degranulation (Figure 5B), Interleukin 10 Signaling (Figure 5C), interleukin 6 7 Pathway (Figure 5D), interleukin 6 7 pathway (Figure 5E).

Table 3 Results of GSEA Analysis of CA Samples in GSE20681

Figure 5 GSEA and GSVA analysis of GSE20681. (A) GSEA of data set GSE20681 shows four biological function mountain maps. (B–E) GSEA showed that all genes were significantly enriched in Neutrophil Degranulation (B), Interleukin 10 Signaling (C), Il67 Pathway (D), Interleukin 4 and Interleukin 13 (E). (F) Group comparison maps of the results of GSVA between CA and Control groups in dataset GSE20681. The screening criteria for gene set enrichment analysis (GSEA) were adj.P-value < 0.05 and FDR value (Q-value) < 0.25, and the P-value correction method was Benjamini-Hochberg (BH). The screening criteria for GSVA was P-value < 0.05. * means P-value < 0.05; ns represented P-value ≥ 0.05, which had no statistical significance.

Abbreviations: GSEA, gene set enrichment analysis; CA, coronary atherosclerosis; GSVA, gene set variation analysis.

Moreover, GSVA was further performed on all genes in dataset GSE20681, the logFC ranked Top 10 positive enrichment and negative enrichment pathways with P-value < 0.05 were screened. The results showed that the four pathways glycolysis, complement, and hypoxia were statistically and significantly enriched in CA samples, while myc targets v2 were enriched in the controls (Figure 5F, Table 4, Supplementary Table 5 and Supplementary Figure 3A).

Table 4 Results of GSVA Analysis of CA Samples in GSE20681

Identification of NETDEGs Traits Connected to CA

To determine the diagnostic value of the NETDEGs, logistic regression was performed, and a logistic regression model was constructed based on the expression of the six genes in GSE20681 data set. The results showed that the six NETDEGs were statistically significant in the Logistic regression model (P-value < 0.05). In addition, the SVM model was constructed based on the expression of the six NETDEGs and SVM algorithm, the number of genes with the lowest error rate (Figure 6A) and the highest accuracy (Figure 6B) analysis result showed that the SVM model had the highest accuracy when the number of genes was 4 (MMP9, ERN1, MGAM and G0S2)., The four genes included in the SVM model were selected, and the LASSO Operator regression analysis was used to construct the diagnostic model of CA. The results showed that three of the four NETDEGs, including MMP9, ERN1, and G0S2, were identified as the prognostic traits (Figure 6C and D, Supplementary Table 6).

Figure 6 Construction of diagnostic model of CA. (A and B) The number of genes with the lowest error rate (A) and the number of genes with the highest accuracy (B) obtained by SVM algorithm are visualized. (C and D) Diagnostic model diagram (C) and variable locus diagram (D) of LASSO regression model. (E) Nomogram of Key Genes in dataset GSE20681 in a diagnostic model of CA. (F) The CA diagnostic model is based on the DCA map of Key Genes in the dataset GSE20681. (G and H) risk score ROC curve in data sets GSE20681 and GSE42148. (I) Grouping comparison diagram of Key Genes in the High- and Low-risk groups of the CA group. *** represents P-value < 0.001, which is highly statistically significant. When AUC > 0.5, it indicates that the expression of molecules is the trend to promote the occurrence of events, and the closer the AUC is to 1, the better the diagnostic effect is. AUC has low accuracy at 0.5–0.7, and AUC has certain accuracy at 0.7–0.9. Light red represents the High-risk group and light blue represents the Low-risk group.

Abbreviations: SVM, support vector machine; LASSO, least absolute shrinkage and selection operator; CA, coronary atherosclerosis; DCA, decision curve analysis; ROC, receiver operating characteristic; AUC, area under the curve; TPR, true positive rate; FPR, false positive rate.

A Nomogram based on the expression of the three prognostic traits was used to further verify the value of CA diagnostic model. The results showed that the expression level of ERN1 had significantly higher utility in the diagnostic model of CA than other variables (Figure 6E). The utility of G0S2 expression in the diagnostic model was significantly lower than that of other variables (Figure 6E). Decision curve analysis (DCA) was used to evaluate the clinical utility of the diagnostic model in dataset GSE20681, and the results showed that the line of the model was stably higher than that all positive and all negative in a certain range, and the net benefit of the model was greater, and the effect of the model was better (Figure 6F). In addition, the R package pROC was used to draw the ROC curve based on the risk score in the GSE20681, and the results showed that the expression level of risk score in dataset GSE20681 showed a low accuracy among different groups (0.5 < AUC < 0.7). As the validation set, we chose GSE42148 and used the three genes to construct the CA diagnostic model, and the results showed that GSE42148 data set showed a certain accuracy with different groups (0.7 < AUC < 0.9) (Figure 6G and H). The CA samples in GSE20681 dataset were then divided into High-risk and Low-risk groups according to the median expression value of risk score of the CA diagnostic model. The differential results showed that the expression levels of the three prognostic traits between the two subgroups were extremely statistically significant (P-value < 0.001) (Figure 6I).

The Biological Pathway Alteration Linked to the Risk Diagnostic Model

To determine the difference of the biological pathways between the two risk groups in dataset GSE20681, GSEA was used to investigate the association between the expression of all 19,749 genes and the biological processes, cellular components, and molecular functions involved in two different risk groups. The results showed that that all genes in the High-risk group were significantly enriched in Neutrophil Degranulation, Il-17 Signaling Pathway, Erythropoietin and Phosphoinositide 3 Kinase Pi3K, Il5 Signaling Pathway and other inflammatory response-related signaling pathways (Figure 7A–E, Table 5 and Supplementary Table 7).

Table 5 Results of GSEA Analysis Between Low- and High-Risk Subgroups in GSE20681

Figure 7 GSEA and GSVA enrichment analysis based on high or low risk scores. (A) GSEA of data set GSE20681 shows four biological function mountain maps. (B–E) GSEA showed that all genes were significantly enriched in Neutrophil Degranulation (B) Il-17 Signaling Pathway (C) Erythropoietin Activates Phosphoinositide 3 Kinase Pi3K (D) Il5 Signaling Pathway (E). (F) The group comparison maps of the results of GSVA between CA and Control groups in dataset GSE20681. ns represented P-value ≥ 0.05, which had no statistical significance. * means P-value < 0.05, which is statistically significant; ** means P-value < 0.01, which is highly statistically significant; *** represents P-value < 0.001, which is highly statistically significant. The heat map shows low concentrations in blue and high concentrations in red. The screening criteria for GSVA was P-value < 0.05. The screening criteria for gene set enrichment analysis (GSEA) were adj.P-value < 0.05 and FDR value (Q-value) < 0.25, and the P-value correction method was Benjamini-Hochberg (BH).

Abbreviations: GSEA, gene set enrichment analysis; CA, coronary atherosclerosis; GSVA, gene set variation analysis.

To explore the difference of h.all.v7.4.symbols.gmt gene set between High-risk and Low-risk groups, GSVA was performed on all genes in dataset GSE20681. Positive enrichment pathways with P-value < 0.05 and logFC ranking Top 10 and negative enrichment pathways with Top 10 were screened (Table 6, Supplementary Figure 3B and Supplementary Table 8). Subsequently, the difference was verified based on the Mann–Whitney U-test, and then the group comparison diagram showed that 15 pathways, including hypoxia, coagulation, inflammatory response, TNFa signaling via NFkb, complement, il6 jak stat3 signaling, E2F targets, androgen response, protein secretion, reactive oxygen species pathway, epithelial mesenchymal transition, MYC targets v2, bile acid metabolism, unfolded protein response, and KRAS signaling up, were statistically significant between the two risk groups (P-value < 0.05) (Figure 7F).

Table 6 Results of GSVA Analysis Between Low- and High-Risk Subgroups in GSE20681

Immune Infiltration Comparison Between the Two Risk-Group

The immune infiltration of CA samples in the two risk-groups was firstly analyzed using ssGSEA algorithm, and the expression matrix of the samples in dataset GSE20681 was used to calculate the immune infiltration abundance of 28 immune cells. The group comparison plot showed that 13 of the 28 immune cells had statistically and significant differences (P-value < 0.05) (Figure 8A and Supplementary Table 9). Among them, the infiltration proportion of Activated B cell, Activated CD4+ T cell, Activated CD8+ T cell, Effector memory CD8+ T cell, Immature B cell, Type 1 T helper cell, CD56dim natural killer cell were higher in the samples of Low-risk group than them in the samples of High-risk group, while Eosinophil, Immature dendritic cell, Macrophage, Monocyte, Neutrophil, Plasmacytoid dendritic cell were more infiltrated in the samples of the High-risk group (Figure 8A). The correlation results showed that the three prognostic traits were strongly correlated with immune cells infiltration (Figure 8B and C). MMP9 had the strongest positive correlation with Neutrophil both risk groups (R-value = 0.568, P-value < 0.05) (Figure 8B and C). G0S2 also had a strong and positive correlation with the infiltration of Neutrophil in the Low-risk group (Figure 8B).

Figure 8 Immune infiltration analysis by ssGSEA algorithm and CIBERSORT of Key Genes. (A) The grouping comparison diagram of immune cells in the Low Risk and High Risk groups of dataset GSE20681. (B and C) Correlation bubble maps of the abundance of immune cell infiltration and Key Genes in the Low-risk (B) and High-risk (C) groups of CA. (D) The proportion of immune cells in a CA sample. (E) Grouping of immune cells in a CA sample. (F an G) Bubble map of correlation between the abundance of immune cell infiltration and Key Genes in a CA sample. ns represented P-value ≥ 0.05, which had no statistical significance. * means P-value < 0.05, which is statistically significant. ** means P-value < 0.01, which is highly statistically significant. *** represents P-value < 0.001, which is highly statistically significant. The absolute value of the correlation coefficient (R-value) is a weak correlation between 0.3 and 0.5. Light red is the High-risk group and light blue is the Low-risk group. Red is a positive correlation, blue is a negative correlation, and the depth of the color indicates the strength of the correlation.

Abbreviations: CA, coronary atherosclerosis; ssGSEA, single-sample gene-set enrichment analysis.

CIBERSORT algorithm was further used to analyze the difference in the immune infiltration abundance of 22 immune cells in the two risk groups. The results showed that 20 types of immune cells were enriched in CA samples, respectively, including Naive B cells, Plasma cells, CD8+ T cells, Naive CD4+ T cells, Resting Memory CD4+ T cells, Activated memory CD4+ T cells, Helper follicular T cells, Tregs, Gamma delta T cells, Resting NK cells, Activated NK cells, Monocytes, M0 Macrophages, M1 Macrophages, M2 Macrophages, Activated dendritic cells, Resting mast cells, Activated mast cells, Eosinophils, Neutrophils (Figure 8D and Supplementary Table 10). Among them, Resting mast cells, Neutrophils, and activated NK cells were highly infiltrated in high risk samples compared with the low risk samples, and Naive B cells, CD8+ T cells, Activated memory CD4+ T cells, Resting NK cells, Monocytes were highly filtrated in the Low-risk group (Figure 8E). The results of bubble correlation plots showed that among the CA samples, the strongest positive correlation was found between MMP9 and Neutrophils (R-value = 0.456, P-value < 0.05) (Figure 8F and G). In the High Risk group, ERN1 was showed to be strongest and positively correlated with Activated NK cells (R-value = 0.462, P-value < 0.05) (Figure 8G and Supplementary Table 11). G0S2 in whole blood samples was also positively correlated with systemic neutrophil activity, which may reflect the infiltrated proportion of neutrophils in CA samples in the Low-risk group (Figure 8F and Supplementary Table 12), which was consistent with the results of ssGSEA analysis.

The Potential Regulatory Mechanism of the Three Prognostic Traits

Subsequently, the potential regulatory mechanisms which regulate the expression of these three prognostic traits were analyzed. The TFs that bind to MMP9, ERN1, and G0S2 were obtained from ChIPBase database, and the mRNA-TF Interaction Network was constructed and visualized using Cytoscape software (Figure 9A and Supplementary Table 13), which contained two key genes (MMP9, ERN1) and 33 TFs. The miRNAs related to MMP9, ERN1, and G0S2 were obtained from ENCORI database, and the mRNA-miRNA Interaction Network was constructed and visualized using Cytoscape software (Figure 9B and Supplementary Table 14), which contained two Key Genes (ERN1 and G0S2) and 33 miRNAs. Moreover, the RNA-binding proteins (RBP) related to MMP9, ERN1, and G0S2 were predicted by StarBase database, and the mRNA-RBP Interaction Network was constructed and visualized by Cytoscape software (Figure 9C and Supplementary Table 15), which contained two Key Genes (MMP9, ERN1) and 41 RBPs.

Figure 9 Analysis of mRNA-miRNA, mRNA-TF, mRNA-RBP, mRNA-Drug interaction network of Key Genes. (A) The mRNA-TF Interaction Network of key genes. (B) mRNA-miRNA Interaction Network of the key genes. (C) mRNA-RBP Interaction Network of the key genes. (D) The mRNA-Drug Interaction Network of the key genes.

Abbreviations: TF, transcription factor; RBP, RNA-binding protein.

The Potential Regulatory Mechanism of the Three Prognostic Traits

According to the previou

Comments (0)

No login
gif