Effects of brain microRNAs in cognitive trajectory and Alzheimer’s disease

ROS/MAP cohort

The Religious Orders Study (ROS) and Rush Memory and Aging Project (MAP), together referred to as ROS/MAP, are longitudinal clinical-pathologic community-based cohort studies focused on cognitive decline, dementia, and aging [3]. ROS recruits priests, monks, and nuns from across the United States, while MAP recruits lay people from retirement communities, social service agencies, and church groups in the greater Chicago area. Both studies enroll individuals without known dementia at baseline. ROS/MAP participants receive annual cognitive and clinical evaluations, and all participants are organ donors, provide informed consent, and sign an Anatomical Gift Act and a repository consent to allow their data and biospecimens to be repurposed. An Institutional Review Board of Rush University Medical Center approved the studies.

Phenotype dataClinical diagnoses

The final clinical diagnosis was made based on the recommendations of the National Institute of Neurological and Communicative Disorders and Stroke and the AD and Related Disorders Association by a neurologist using select clinical data but blinded to postmortem data [26]. It was treated as a binary outcome of cognitive impairment (MCI or Alzheimer’s dementia) versus no cognitive impairment (NCI), as reported in [2, 4, 5].

Cognitive trajectory

Cognitive trajectory is a person-specific rate of change of cognitive performance over time estimated for each of five cognitive domains and for global cognition [29]. Cognitive function in each domain (working memory, semantic memory, episodic memory, perceptual orientation, and perceptual speed) was measured at each timepoint using between 2 and 7 cognitive tests. Then, a single composite score at each timepoint was defined by converting the raw scores from the individual tests to Z-scores (using the mean and standard deviation of the cohort), and then averaging the Z-scores. Global cognition was summarized at each timepoint as the average of the Z scores from the full set of tests (19 total from the five domains). Cognitive trajectory for each domain and for global cognition was estimated using a linear mixed effects model with the longitudinal cognitive measure as the outcome. The models controlled for age at baseline, sex, and years of education. A positive trajectory value indicates improvement of cognitive performance over time, while a negative trajectory value indicates decline over time. The cognitive trajectories were treated as continuous outcomes.

Pathologies

Ten age-related pathologies were considered as outcomes and/or covariates. Beta-amyloid, neurofibrillary tangles, arteriolosclerosis, cerebral atherosclerosis, and cerebral amyloid angiopathy were treated as continuous variables, and Lewy body disease, TDP-43 pathology, gross infarcts, microinfarcts, and hippocampal sclerosis were treated as binary variables (absent v. present). Below, we briefly summarize each pathology measure.

The area occupied by amyloid beta protein was measured by immunohistochemistry and quantified by image analysis. The value is the percent area occupied by amyloid beta averaged over eight brain regions [39]. Neurofibrillary tangle pathology was measured as paired helical filament (PHF) tau density using immunohistochemistry and cortical density (per mm2) and was determined using systematic sampling. The value is the mean density averaged over eight brain regions [39]. For beta-amyloid and neurofibrillary tangles, the square root of the values was used to improve the normality of the variable distributions.

Arteriolosclerosis was graded by evaluation of the small vessels of the anterior basal ganglia for signs of deterioration or of arteriolar thickening resulting in narrowing of the vascular lumen [9]. Cerebral atherosclerosis was assessed by visual inspection of the vessels of the Circle of Willis, and severity was scored based on the number of affected arteries and the extent of involvement of each artery [1]. Cerebral amyloid angiopathy was described by assessing amyloid deposition using immunostaining for beta-amyloid in paraffin-embedded sections [8]. Arteriolosclerosis, cerebral atherosclerosis, and cerebral amyloid angiopathy were each summarized using a semi-quantitative score corresponding to four levels (none, mild, moderate, and severe).

Lewy body pathology was diagnosed based on algorithmic analysis and neuropathologist’s opinion of distribution of alpha-synuclein measured using immunohistochemistry. Lewy body disease was classified into four categories: not present, nigral-predominant, limbic-type, and neocortical-type [32]. TDP-43 pathology was measured using immunohistochemistry in six brain regions. TDP-43 pathology was coded as four stages: none, stage 1 (amygdala only), stage 2 (limbic [TDP-43 in hippocampus]), and stage 3 (neocortical) [28]. Presence of gross infarcts and microinfarcts was determined by pathologic assessment, blinded to clinical data, and reviewed by a board-certified neuropathologist [33, 34]. Gross infarcts included chronic, acute, and subacute infarcts and were detected by visual inspection of fixed slabs using the naked eye followed by histological confirmation via dissection. Presence of typical hippocampal sclerosis was assessed by severe neuronal loss and gliosis in CA1 and/or subiculum; evaluation was performed unilaterally in a coronal section of the mid-hippocampus at the level of the lateral geniculate body [27].

Sex

Female or male sex was based on self-report.

Race

Self-identified race was defined based on the participant’s response to the question ‘What is your race?’ with possible answers White; Black or African American; American Indian or Alaska Native; Native Hawaiian or Other Pacific Islander; Asian; or Other.

MiRNA data

miRNA profiling has been described in detail previously [38]. Frozen post-mortem dorsolateral prefrontal cortex (dlPFC) samples were obtained from Rush University and 672 samples were selected for library preparation using New England Biolabs’ (NEB) NEBNext Multiplex Small RNA Library Prep Kit (NEB, Ipswich, MA, USA) and sequencing on an Illumina HiSeq 3000 (Illumina, San Diego, CA, USA) at the Emory Yerkes Genomics Core (Atlanta, GA). Sequencing depth ranged from 13.2 to 37.4 million reads per sample, with a median of 27.4 million reads per sample. Adaptor sequences were trimmed using Trimmomatic [6] (version 0.36), and miRNA counts were generated using mirDeep2 [13]. Reads were mapped to known human precursor and mature miRNA sequences obtained from MiRBase [20] (Release 22.1) and then filtered to only the 504 precursors and 857 mature miRNAs that have been included in the curated miRNA gene database MirGeneDB 2.1 [15] to minimize falsely detected miRNAs. Reads were allowed to map up to two nucleotides upstream of the mature sequence and up to five nucleotides downstream of the mature sequence, and with up to one mismatch. We removed reads for miRNAs with low abundance (< 1 RPM for > 50% of samples) and miRNA-precursor pairings absent from MirGeneDB 2.1. For miRNAs mapped to multiple precursors, we kept the entry with the highest total count across samples.

Sample outliers were identified using both raw counts and normalized counts. First, we iteratively removed samples that were greater than five standard deviations (SD) from the mean within the respective batch for total read count, trimmed read rate, and mapped read rate. We then filtered 11 samples from participants who did not self-identify their race as White. The analysis was limited to these participants because of small sample size for participants from other race groups. Next, the raw counts were normalized for library size and transformed to log2 counts using EdgeR. We then estimated the top 10 principal components (PCs) using the normalized data and removed 6 samples that were greater than 4 SD from the mean of either of the first two PCs. Finally, we used KING [24] with genotype data available from these samples to estimate and remove second degree and closer relatives. This final QC step filtered one sample. After quality control, data from 648 participants for 528 curated miRNAs present in the MirGeneDB database remained for analysis. Mapped read count for these samples ranged from 7.7 to 27.3 million reads per sample, with a median of 18.7 million reads per sample.

RT-PCR validationmiRNA and sample selection

The validation experiments were focused on a subset of miRNAs and samples to maximize the value of the experiments with the available resources. From the putative causal miRNAs identified by SMR, we selected the three miRNAs associated with beta-amyloid or neurofibrillary tangles before adjusting for pathologies since these are outcomes of high interest. The selected miRNAs were miR-31-5p, miR-214-3p, and miR-3622a-3p. For selecting samples, we reasoned that we could increase the power of the analysis if, instead of randomly drawing individuals from the discovery sample, we used a case–control sampling scheme and selected samples in the extremes of the distributions for beta-amyloid and neurofibrillary tangles. Specifically, we ranked samples by their beta-amyloid and neurofibrillary tangles scores and got an averaged rank per sample, and then, contingent on tissue availability, selected samples with the lowest and highest ranks. We selected 45 cases (high beta-amyloid and high neurofibrillary tangles) and 45 controls (low beta-amyloid and low neurofibrillary tangles).

RNA extraction

RNA was extracted from dlPFC tissue samples using the Maxwell RSC simplyRNA Tissue Kit (Promega, catalog number AS1340). In detail, frozen brain tissue samples were dissected to 30 mg each. Each tissue sample was expeditiously homogenized in 200 uL of chilled 1-thioglycerol/homogenization solution using an ultrasonic homogenizer for 30–60 s, until complete homogenization indicated by no visible tissue fragments. If foaming occurred, sample was set on ice for up to 5 min for foam to settle. Homogenization solution was added to bring homogenate to a final volume of 200 μl, prior to addition to the cartridge.

Cartridges from the Maxwell RSC simplyRNA kit were loaded in in the deck trays of a Maxwell RSC 48 instrument, with well #1 facing away from the elution tubes. All sealing tape and any residual adhesive were then removed from the loaded cartridges. A sterile plunger was placed in well #8 of each cartridge, followed by the placement of an empty 0.5 mL elution tube into the elution tube position for each cartridge in the deck trays. 50 μl of nuclease-free water was pipetted to the bottom of each elution tube.

The sample lysates were loaded into well #1 followed by the addition of DNase I solution into well #4 of each cartridge. The Maxprep software was used to run the ‘simplyRNA Tissue’ method preloaded in the Maxwell RSC 48 instrument. Method-specific variables, such as the sample ID and elution volume, were entered into the system when prompted. Sample eluates were stored at  – 80 °C in the Maxwell elution tubes for downstream use.

Reverse-transcriptase PCR (RT-PCR)

RT-PCR was performed at the Emory Integrated Genomics Core. The 90 samples were randomized onto three plates by beta-amyloid, neurofibrillary tangles, sex, age, and RIN. All samples had RIN of at least 3.3, and median RIN was 6.7. Reverse transcription was performed using the TaqMan MicroRNA Reverse Transcription Kit (ThermoFisher, catalogue number 4366596) and qPCR reactions were performed using TaqMan MicroRNA Assays (ThermoFisher assay IDs 2279, 461768_mat, 464955_mat). Reactions were performed in triplicate for the three miRNAs of interest and U6 snRNA as a reference gene.

Data filtering

We first filtered outlier samples based on visual inspection of CT plots. One sample with high CT for the reference gene was dropped from all analyses, and one sample with high CT for miR-214-3p was dropped from analyses for that miRNA only. We next filtered samples with failure of more than one out of the three triplicate reactions. This filter removed two samples from the analyses for miR-3622a-3p. Finally, we removed samples with CT standard deviation (within triplicate) greater than 0.25. This filtered removed 8 samples for miR-214-3p, 15 samples for miR-31-5p, and 50 samples for miR-3622a-3p. The large standard deviation values for miR-3622a-3p are consistent with its generally lower abundance.

Statistical analysis

All statistical analyses were done using R version 3.6.0.

Estimation of miRNA surrogate variables

miRNA surrogate variables (SVs) were estimated using the normalized log2 CPM abundance data with R package SVA. The miRNA SVs capture unmeasured sources of variation in the expression data, including cell type heterogeneity across samples, and are used as covariates in regression modeling to account for these potentially confounding factors. SVs were estimated for each model separately. The estimation produced 14 or 15 SVs to be included as covariates, depending on the model.

miRNA differential expression analysis

Differential expression was tested with voom-limma. Each of the nine outcomes was tested using two models, resulting in 18 models in total (Table 2). The first model for each outcome included batch, PMI, RIN, age at death, sex, education and miRNA surrogate variables as covariates. We refer to these models, which control only for technical and demographic covariates, as the minimally adjusted models. The second model for each outcome additionally included co-existing age-related pathologies as covariates. We refer to these as the fully adjusted models. Specifically, for the 6 cognitive trajectory models, ten pathologies (beta-amyloid, neurofibrillary tangles, arteriolosclerosis, cerebral atherosclerosis, cerebral amyloid angiopathy, Lewy body pathology, TDP-43, gross infarcts, microinfarcts, hippocampal sclerosis) were included as covariates in addition to the covariates included in the minimally adjusted models. For the beta-amyloid and neurofibrillary tangles models, the other nine pathologies were included as additional covariates. Finally, for the clinical diagnosis model, beta-amyloid and neurofibrillary tangles were included as additional covariates. The fully adjusted models were tested to identify miRNAs that are associated with each outcome independently of age-related pathologies.

False discovery rate (FDR) correction to control for multiple testing was performed using the Benjamini–Hochberg procedure and was applied to results of each model separately. In other words, FDR correction was based on the number of miRNAs tested. Differentially expressed miRNAs were defined as those with FDR < 0.05.

Pathway analysis

Pathway analysis was performed for each set of models (minimally adjusted and fully adjusted) in parallel using the predicted targets of the DE miRNAs for each trait. Predicted gene targets for each DE miRNA were extracted from TargetScan 8.0 [25] and low confidence predictions (cumulative weighted context++ score ≥ -0.2) were removed. A more negative TargetScan cumulative weighted context++ score indicates a stronger gene suppression effect between the miRNA and target gene. For each trait, we generated an initial list of genes by taking the union of the predicted target genes of the DE miRNAs. For the six cognitive trajectory traits, a single initial list was generated by taking the union of target genes across all six outcomes. This resulted in three lists of target genes for beta-amyloid, neurofibrillary tangles, and cognitive trajectories, respectively. The number of genes per list varied widely because the number of target genes is highly correlated with the number of DE miRNAs, which also differed across traits. Since the number of test genes substantially impacts pathway analysis results, we used a filtering procedure to select a comparable number of top target genes for each trait. First, we ranked the genes in each of the three lists by the number of DE miRNAs (for the relevant trait) that targeted them. We refer to the number of targeting DE miRNAs per gene as Nt. We then counted the number of genes per list with Nt ≥ 1, Nt ≥ 2, and so on. Using these counts, we determined trait-specific cut-off values for Nt that would produce input lists per trait containing roughly the same number of genes. Pathway analysis was performed for each input list using DAVID [36] with the Reactome and WikiPathways databases in addition to the default databases. The background gene set was defined as the 15,582 genes present in a previously-created brain transcriptomic dataset generated from 637 post-mortem brains, which represented the brain-expressed genes that could be potentially targeted by brain miRNAs. Significance was defined as FDR < 0.05 using Benjamini–Hochberg adjustment.

Sex-biased differential expression analysis

Sex-specific differential expression analyses were conducted by splitting the dataset into male-only and female-only datasets and then applying the 18 models described previously except with the sex term removed from each model. The purpose of conducting the differential expression analyses in the sex-specific datasets was to improve sensitivity for miRNAs that are differentially expressed in only one sex but not the other, or that have opposite direction of effect in the two sexes. The significant associations from these sex-specific analyses were combined with the significant associations from the analysis using the ‘joint’ (males and females combined) dataset to make up the set of associations to test for sex-biased differential expression.

For each association of interest, we tested for sex-biased differential expression by adding a sex-by-trait interaction term to the relevant model and fitting this model in the joint dataset. Instead of Benjamini–Hochberg correction for multiple testing, we applied the stricter Bonferroni correction since there were fewer tests. The correction was applied to results of each model separately. Significant sex-biased differential expression was defined as sex interaction term Bonferroni p-value < 0.05.

Mendelian randomization (MR)

The summary data-based Mendelian randomization (SMR) and HEIDI tests [42] were used to integrate miRNA-QTL (miR-QTL) statistics with the SNP-trait association statistics to identify DE miRNAs whose association with the outcome trait is consistent with causality or pleiotropy. Only DE miRNAs with significant miR-QTLs within 500 Kb of the miRNA precursor were considered. The miR-QTL association statistics were estimated previously [38]. That analysis used the same miRNA data we used here for the miRNA differential analysis but was limited to a subset of 604 individuals based on the availability of both miRNA and genotype data. The association between SNPs and AD and its endophenotypes was estimated using PLINK2 using both minimally adjusted and fully adjusted models analogous to the miRNA differential expression analysis. The technical and demographic covariates for these models were age at death, sex, study, education, batch, PMI, RIN, and 10 genetic PCs. We defined miRNAs associated with the outcome trait through causality, pleiotropy, or linkage as those with SMR p-value < 0.05. Associations due to linkage were distinguished as those with HEIDI p-value < 0.05.

RT-qPCR Validation

For each sample for each miRNA of interest, the normalized expression value ΔCT was calculated as the difference in CT between the miRNA of interest and the reference gene. The association of case/control status (high beta-amyloid and high neurofibrillary tangles/low beta-amyloid and low neurofibrillary tangles) with miRNA expression was tested for each miRNA independently using linear regression with ΔCT as the outcome, case/control status as the main predictor, and sex, age, education, PMI, RIN, and qPCR plate as covariates. Significant association was defined as p-value < 0.05.

Comments (0)

No login
gif