Implicating type 2 diabetes effector genes in relevant metabolic cellular models using promoter-focused Capture-C

Epigenetic landscape of cell line models of metabolism

To gain insights into key aspects of the epigenetic landscape of type 2 diabetes-relevant cell models, we analysed high-resolution promoter-focused Capture-C and ATAC-seq data in triplicate for the EndoC-BH1 and HepG2 cell lines, along with SGBS cells both at the pre-adipocyte stage and the in vitro differentiated adipocyte-like state (SGBS_undiff, SGBS_diff). We also leveraged the gene expression in each cell line using bulk RNA-seq [14].

For analysing the promoter-centric chromatin architecture, we identified Capture-C-defined contacts with gene promoters both at the level of individual restriction fragments (1frag) and by in silico binning four consecutive fragments (4frag) in order to increase the power to call distant interactions, as previously performed [14, 20, 24], yielding 53,689 1frag and 92,152 4frag promoter contacts for EndoC-BH1, respectively, and ranging from 80,449 to 229,880 1frag and 155,569 to 293,618 4frag promoter contacts for the other cell lines (Fig. 1a, ESM Table 1).

Fig. 1figure 1

Construction of promoter interaction maps for EndoC-BH1, HepG2 and SGBS cells. (a) Summary of number of promoter contacts called using promoter-focused Capture-C per cell type. Fragment indicates the resolution at which the loops were called: 1frag, single fragment resolution; 4frag, binning of four consecutive fragments. (b) Summary of the number of OCRs called per cell type and whether the OCR intersects a PIR (PIR-OCR) or does not (nonPIR-OCR). (c) Cumulative distribution of distances between ends of the promoter contacts called by promoter-focused Capture-C for each cell type. (d) Pairwise Jaccard index of loops between promoter and OCR. Jaccard index represents the ratio of intersect to union of genomic regions in each cell type annotation

We leveraged ATAC-seq to identify OCRs, which potentially act as cREs to influence gene expression. We called OCRs as the set of reproducible peaks in at least two of three replicates. We identified 68,596 OCRs in EndoC-BH1 cells, 73,296 in HepG2, 74,296 in SGBS_undiff and 52,609 in SGBS_diff.

We then annotated OCRs to genes if they either overlapped with a promoter (−1500/+500 bp) or intersected the other end called by promoter-focused Capture-C, which we termed promoter interacting regions (PIRs). We found that 28% of OCRs contacted at least one gene promoter (Fig. 1b, ESM Table 2), which is in line with previous reports [14, 20]. We next compared the distribution of distances between the putative enhancer and promoter-connected region fragments and found that they were comparable (mean: EndoC-BH1, 133,320.9 bp; HepG2, 109,703.7 bp; SGBS_undiff, 155,686.6 bp; SGBS_diff, 207,733.6 bp) (Fig. 1c).

Next, we determined the degree of similarity across promoter–OCR connections between cell types by comparing the magnitude of overlap between the PIR-OCR-connected regions using the Jaccard index, i.e. the ratio of nucleotides located in both annotations to the union present in either set of annotations. We found that the four cell types displayed less than 30% overlap (Fig. 1d), suggesting that the promoter landscapes have distinct cell type-specific epigenetic and chromatin conformational features that contribute to transcriptional differences across these cell lines. Also consistent with previous reports, we observed bait-to-bait contacts ranging from 10% to 29% of called loops (ESM Table 3).

Comparison of promoter-focused defined maps with existing enhancer atlases

To further validate our OCR–gene contacts, we compared the expression of genes with a promoter contacting at least one OCR vs that of genes not in contact with any OCR. Consistent with our prior promoter-focused Capture-C datasets [16, 17, 19], genes with a promoter in contact with OCRs were generally expressed at significantly higher levels (unpaired Mann–Whitney U test, two-sided p<2.2 × 10−16) than those without OCR contacts across cell types (Fig. 2a). Similarly, we assessed the degree of similarity to GTEx expression profiles of equivalent tissue types and found that the genes contacted in our cell lines were expressed at higher levels in the corresponding tissue in that public domain dataset [25] (Fig. 2b). These results suggest that promoter contacts with OCRs are associated with increased gene expression.

Fig. 2figure 2

Validation of promoter-connected cREs. (a) The distribution of gene expression from matching cell line data (log2 TPM+1) between genes with promoters with at least one PIR-OCR contact (red) and those genes without PIR-OCRs (green). (b) The distribution of gene expression from GTEx tissue (log2 TPM+1) from the cells between genes with promoters with at least one PIR-OCR contact (red) and those genes without PIR-OCRs (green). Pancreas expression with EndoC-BH1 contacts; visceral/subcutaneous adipose tissue expression with SGBS_diff and SGBS_undiff contacts; and liver expression with HepG2 contacts. In boxplots, the central line represents the median, box edges represent 25th–75th percentiles, whiskers represent 1.5 times the IQR and outliers are depicted as points. (c) log2-fold enrichment of PIR-OCRs to ChromHMM-defined annotations for the epigenome roadmap for the indicated cell type/tissue type pairs. (d) Genomic location of a verified pancreas enhancer in beta cells from the VISTA regulatory element browser with chromatin contacts to ABCC8. (e) lacZ staining from the VISTA browser showing the expression pattern for the contact between ABCC8 and hs1977 detailed in (d). (f) Genomic location of a liver-expressed regulatory element from VISTA with chromatin contacts to PCYOX1L in HepG2 cells (g) lacZ staining from the VISTA browser showing the expression pattern for the contact between PCYOX1L and hs1752

We then compared the chromatin states identified by the epigenome roadmap with the open connected regions defined by ATAC-seq and promoter-focused Capture-C. As expected, we found at least fourfold enrichment for active and bivalent TSSs and enhancers, given that OCRs can represent either active or ‘poised’ regulatory elements (Fig. 2c). The shared TSS/promoter signature represents the number of bait-to-bait interactions present, suggesting either that some genes are co-regulated or that promoters of genes not expressed in a particular cell type can act as enhancers, as previously reported by others [26].

We subsequently examined the VISTA database for curated validated enhancers in different embryological tissues [27]. However, there were only six experimentally validated pancreatic enhancers within the database. Despite this limitation, we observed in our data that a known human enhancer, hs1977, was in contact with the ABCC8 promoter (Fig. 2d, e). ABCC8 encodes a component of an ATP-sensitive potassium channel expressed in beta cells and modulates glucose-dependent insulin secretion [28]. This enhancer region was accessible only in EndoC-BH1 cells, while expression was observed in the pancreas in the VISTA database, although several of the in situs display staining in neural tissues in addition to the pancreas. We also identified a liver enhancer, hs1752, expressed in the liver, heart and other abdominal tissues with contacts to PCYOX1L, which encodes prenylcysteine oxidase 1 like (Fig. 2f, g). Prenylcysteine oxidases are enzymes that scavenge free cysteines from a metabolic pathway involved in the degradation of prenylated proteins [29]. This cRE was also connected to the promoter of several non-coding RNAs: miR-143, miR-145 and AC131025.2. Taken together, these results support the use of chromatin features in these cell lines as a valid model to investigate cREs for type 2 diabetes-related traits.

Variant-to-gene mapping in metabolic-relevant cells and prioritisation of type 2 diabetes-associated GWAS signals

Next, we sought to investigate the degree that cellular models are enriched for heritability associated with metabolic traits. To this end, we performed partitioned LD score regression [30] to test for enrichment of putative cREs (promoter OCRs + PIR-OCRs) (Fig. 3, ESM Table 4). We observed significant enrichment of EndoC-BH1 cREs for type 2 diabetes, fasting glucose, BMI and fetal body weight. HepG2 cREs were enriched for coronary artery disease and fasting glucose levels. SGBS_diff cREs were enriched for WHR, fasting insulin, plasma triglyceride and HDL-cholesterol levels, while SGBS_undiff cREs were enriched for WHR. cREs across all cell types were enriched for height. In addition, we included recent GWAS studies from Alzheimer’s disease and systemic lupus erythematosus as negative controls, and as expected we did not observe enrichment for these neural and immune GWAS traits in these principally metabolic cell types [30,31,32]. Therefore, these results supported the utility of these cell models in investigating the cis-regulatory architecture for the type 2 diabetes-related traits.

Fig. 3figure 3

Enrichment of metabolic traits in EndoC-BH1 and relevant cells. A summary of partitioned LD score regression showing the z score of enrichment for indicated traits. Colour indicates the z-transformed p values, with traits significantly enriched marked with asterisks

To implicate type 2 diabetes causal variants impacting cREs, we curated the set of lead sentinel SNPs from the two most recent European and trans-ancestral GWAS reports for the disease [33, 34]. We identified proxies in high LD with the reported lead variants for each signal (r2>0.8) and intersected their genomic coordinates with OCRs and PIRs (Table 1). The subsequent 810 implicated genes from this variant-to-gene (V2G) mapping strategy, corresponding to 370 type 2 diabetes sentinels, revealed that the majority of putative causal variants did not contact the gene nearest to the sentinel, and in most cases where the nearest gene was implicated other gene promoters were also additionally contacted (Fig. 4a, ESM Table 5). Of these 810 genes implicated by this V2G mapping approach, 79.9% were specific to one cell line (130 EndoC-BH1, 311 HepG2, 123 SGBS_diff, 83 SGBS_undiff) while 163 (20.1%) were shared in at least two cell types and 53 (6.5%) across all cell types (Fig. 4b). We compared the list of genes with phenotypes identified in mice, and found significant enrichment for nervous system, haematopoietic system, growth/size body region and mortality/ageing (ESM Fig. 2, ESM Table 6).

Table 1 Summary table of the input sentinels and proxies used in this study and how many in at least one cell typeFig. 4figure 4

V2G mapping of type 2 diabetes loci across cell types. (a) The counts of sentinels implicated to the nearest gene to the sentinel, to multiple genes including the nearest gene or to gene(s) not including the gene closest to sentinel for each cell type. (b) Overlap of genes implicated in type 2 diabetes in each cell type; the top bar graph indicates the number of genes in each intersection set (intersection size). Red indicates the subset of genes found in one cell type, blue indicates two cell types, black indicates three cell types and green indicates the genes implicated in all four cell types. The side bar graph indicates the total number of genes per cell type (set size). (c) TFs predicted to be impacted by proxy SNPs. The x-axis indicates whether the mean effect of type 2 diabetes SNPs is predicted to be increasing or decreasing stability and the y-axis indicates the expression of the TF. Colour is scaled to indicate the mean expression of the predicted target genes and size indicates the number of proxies predicted to disrupt a given TF motif. (d) Network showing predicted proxies (blue) connected by promoter-focused Capture-C and ATAC-seq to target genes (coloured by expression: blue, low; yellow, higher expression). (e) Intersection of our list of implicated genes in EndoC-BH1s with several known databases. T2D_Mahajan_2022_finemapped_nominations [36]; Open_Targets_T2D [46]; T2D_effector_index [37]; GWAS_catalogue [47]; Miguel-Escalada_et_al_Islet_Promoter_Capture_HiC [3]; Thomsen_2016_EndoC-BH1_screen_hits [8]. Ref-Alt, reference allele minus alternative allele; T2D, type 2 diabetes

One obvious mechanism by which non-coding variants can influence gene expression is through disruption of TF binding sites. To investigate this possibility, we predicted whether any of the V2G-implicated open proxies that contacted genes contained known TF binding motifs. This approach led to the identification of 288 proxies predicted to alter binding affinity (Fig. 4c, ESM Table 7). Most notably, we observed that several binding sites for zinc finger TF KLF and GLI families were disrupted by variants identified in EndoC-BH1. Seventeen of the TFs with motifs impacted by type 2 diabetes variants are predicted effector transcripts in at least one cell type (HES2, ZNF384, STAT6, NR2C1, ONECUT1, TCF12, SOX15, TP53, HNF1B, TCF4, TCF3, OSR1, PPARG, NFKB1, CREB3, ZBTB6, ZBTB26). We compared the motifs with publicly available chromatin immunoprecipitation (ChIP) data from ENCODE, to determine if there is evidence of TF binding at the genomic location in various cellular contexts (ESM Fig. 3, ESM Table 8). We also checked the expression of implicated TFs and observed 280 of 447 (62.6%) of the predicted TFs with disrupted motifs had transcripts per million mapped reads (TPM) >1 and 187 had TPM >10 in EndoC-BH1 cells, suggesting that a majority of predicted TFs are expressed (ESM Fig. 4).

We then examined the genes contacting proxies regardless of the status of predicted disruption of TF binding sites. While some signals were only predicted to have one proxy interacting with one gene, others yielded multiple proxies with contacts to multiple genes (Fig. 4d). While this could suggest a subset of variants acting through multiple genes, further work is necessary to functionally validate these predictions. We note that we did not observe a trend towards those multi-gene signals corresponding to more statistically significant type 2 diabetes GWAS loci.

Next, we compared our analysis with previous type 2 diabetes functional prioritisation approaches, yielding moderate agreement but highlighting a potentially higher confidence gene set for further investigation [3, 8, 35,36,37] (Fig. 4e, ESM Table 9). Additionally, we compared our findings with a recent CRISPR interference (CRISPRi) screen identifying potential type 2 diabetes effectors, noting eight overlapping genes, seven associated with decreased insulin content and one with increased content (decreased: CHD4, PRPF18, GMEB1, CREB3, PITPNM2, SIN3A and ATP6 V1C1; increased: FADS1). However, this overlap did not reach strict statistical significance (Fisher’s test, one-sided, p=0.09). Moreover, we compared our list with one generated using chromatin conformation in pancreatic cells, finding 100 genes annotated as type 2 diabetes effectors in beta cell Hi-C also present in our dataset (ESM Table 10) [9].

Subsequently, we compared our results with eQTL associations of nearby genes [5, 25]. Among 221 eQTL-linked genes associated with type 2 diabetes risk loci in pancreatic/islet tissues, only 12 had open proxies in chromatin contact with their promoters in EndoC-BH1 cells: SMCO4, DOC2A, OPRL1, TUFM, YBEY, MTMR11, UBE2D3, PLEKHA1, GPSM1, RNF6, ABCD9 and AGFG2. Six genes were implicated by both liver eQTLs and our promoter-focused Capture-C in HepG2 cells, while 25 genes were linked to adipose tissue eQTLs and our Capture-C in at least one adipose model (SGBS_undiff, SGBS_diff) (ESM Table 11). Colocalisation analyses between GTEx v7 and DIAMANTE type 2 diabetes GWAS signals [33] revealed 22 genes colocalised (posterior inclusion probability (PIP) >0.85) in the pancreas, with only DOC2A also implicated by our Capture-C approach. We detected 11 genes with eQTL colocalisation in the liver, of which MAN2C1, AP3S2 and CEP68 were also implicated by promoter contacts in HepG2. Additionally, 54 genes were identified in subcutaneous or visceral adipose tissue, with AP3S2, CALR and NDUFAF6 also implicated by our V2G approach in SGBS_diff cells, while PABPC4, PLEKHA1, AP3S2 and DCAF16 were implicated in the SGBS_undiff dataset (ESM Table 12). Although there is limited overlap, the intersection of these methods can better prioritise genes for functional validation.

Inhibition of SMCO4 expression increases glucose-stimulated insulin secretion in EndoC-BH1

In our analysis pipeline, two genes, SMCO4 and FXR2, were among those identified as potential type 2 diabetes candidate genes (Fig. 5a, b) in both EndoC-BH1 and prior primary beta cell chromatin maps [9]. Additionally, an eQTL associated with SMCO4 colocalises with a type 2 diabetes signal marked by rs57235767 in islets [5]. To investigate the impact of these two genes on insulin secretion in EndoC-BH1 cells (Fig. 5c), we performed targeted knockdown experiments of SMCO4 and FXR2. Confirmation of SMCO4 and FXR2 knockdown was achieved using RT-qPCR (Fig. 5d). Our analyses revealed a significant increase in insulin secretion in SMCO4 knockdown cells relative to non-targeting siRNA controls (~18.0% increase, paired two-tailed t test p=0.0243) (Fig. 5e). Notably, there was no significant difference when IBMX was added to the stimulation medium (ESM Fig. 5). IBMX raises cellular levels of cAMP to stimulate secretion of insulin. These results suggest that that SMCO4, but not FXR2, inhibits glucose-stimulated insulin secretion in EndoC-BH1 cells, while having no significant impact on IBMX-stimulated secretion.

Fig. 5figure 5

Assaying effects of SMCO4 and FXR2 knockdown in Endo-BH1 cells on glucose-stimulated insulin secretion. (a, b) Genomic locations of SMCO4 (a) and FXR2 (b) relative to implicated proxy variants. The top tracks represent Hi-C contact matrices from EndoC-BH1 Hi-C data, while the subsequent tracks depict significant contacts for Capture-C (Chicago score ≥5) and ATAC-seq (normalised using the reads-per-genomic-content approach). The locations of implicated SNPs are drawn as vertical lines. (c) Diagram summarising the time course of insulin treatment. (d) RT-qPCR results of the knockdown of either FXR2 or SMCO4. The data were adjusted using the ΔΔCt method and all conditions were subsequently normalised to basal non-targeting control. Two datapoints are depicted and the top of the bar graph depicts the mean expression and error bars depict the SD. (e) Boxplots depict the results of the insulin secretion assay, the central line represents the median, box edges represent 25th–75th percentiles, whiskers represent 1.5 times the IQR and outliers are depicted as points. Insulin content and secretion were calculated from ELISA plates with standard curves (see Methods). Measurements were taken for either basal (0 mmol/l glucose) or 20 mmol/l glucose. The mean of six technical replicates was taken for four biological replicates. Paired two-tailed t tests were used to assess statistical significance across the four biological replicates. *p<0.05. Stim, stimulation

Comments (0)

No login
gif