This study used summary statistics from multiple public genome-wide association studies (GWAS) to perform a bidirectional Mendelian randomization (MR) analysis of the causal relationships between 731 immune cell phenotypes and lung cancer development. To support the robustness of the causal inferences, quality control measures were applied, including heterogeneity assessments and tests for horizontal pleiotropy.
Public Databases and Lung Cancer Genome Research Consortium (LCGRC) Datasets:
To conduct the MR analysis, we used data from publicly available databases and the Lung Cancer Genome Research Consortium (LCGRC) datasets. These resources provided the genetic and clinical data required for the study.
Public databases:
UK Biobank (UKB):
URL: https://www.ukbiobank.ac.uk
Reference: Sudlow C, Gallacher J, Allen N, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015 Mar 31;12(3):e1001779.
Genetic Epidemiology Research on Adult Health and Aging (GERA):
URL: https://www.kaiserpermanente.org/research/gera-cohort-study
Reference: Grossman et al. (2016). "The Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort: Study Design and Characteristics." BMC Medical Genomics, 9(1), 47.
Lung Cancer Genome Research Consortium (LCGRC) Datasets:
URL: https://www.lungcancerconsortium.org
Reference: Russo J, Giri VN. Germline testing and genetic counselling in prostate cancer. Nat Rev Urol. 2022 Jun;19(6):331–343.
Inclusion and Exclusion Criteria:
UK Biobank (UKB):
Inclusion criteria:
Participants aged 40–69 years at recruitment.
Availability of complete genetic and health data.
Exclusion criteria:
Individuals with incomplete or missing genetic data.
Individuals with known severe comorbidities unrelated to lung cancer (e.g., autoimmune disorders).
Lung Cancer Genome Research Consortium (LCGRC):
Inclusion criteria:
Participants diagnosed with lung cancer.
Availability of complete genetic and clinical data.
Exclusion criteria:
Individuals with incomplete or missing genetic data.
Individuals with a history of other primary cancers.
Selection of Genetic Loci as Instrumental Variables:
To select genetic loci as instrumental variables (IVs), we applied the following preset threshold points:
Significance Threshold:
GWAS P-value: ≤ 5 × 10^−8.
F-statistic: ≥ 10 to ensure strong instruments.
Linkage Disequilibrium (LD) Clumping:
LD Clumping Distance: ≤ 100 kb.
LD Clumping r2 Threshold: SNPs with r2 ≥ 0.8 were treated as highly correlated and removed.
Functional relevance:
SNPs must be functionally relevant to the immune cell phenotypes of interest.
By using these stringent criteria, we ensured that the selected genetic loci were robust and appropriate for use as instrumental variables in our MR analysis.
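The significance and instrument-strength thresholds listed above can be illustrated with a minimal Python sketch. It is not the study's pipeline: the `Snp` record, the SNP identifiers, and the numeric values are hypothetical, and the per-SNP F-statistic is approximated as (beta / se)^2, which is one common convention and an assumption here.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str    # hypothetical identifier, for illustration only
    beta: float  # SNP-exposure effect estimate
    se: float    # standard error of beta
    pval: float  # GWAS p-value for the SNP-exposure association


def f_statistic(snp: Snp) -> float:
    """Approximate per-SNP F-statistic as (beta / se)^2 (an assumed convention)."""
    return (snp.beta / snp.se) ** 2


def select_instruments(snps, p_max=5e-8, f_min=10.0):
    """Keep SNPs that pass both the p-value and the F-statistic thresholds."""
    return [s for s in snps if s.pval <= p_max and f_statistic(s) >= f_min]


# Hypothetical candidate SNPs; values are made up for the example.
candidates = [
    Snp("rs_demo_1", beta=0.12, se=0.02, pval=1e-9),    # passes both filters
    Snp("rs_demo_2", beta=0.05, se=0.02, pval=1.2e-2),  # fails the p-value filter
    Snp("rs_demo_3", beta=0.03, se=0.01, pval=2.7e-3),  # fails both filters
]

selected = select_instruments(candidates)
print([s.rsid for s in selected])
```

Under this approximation, any SNP with a two-sided p-value at or below 5 × 10^−8 already has an F-statistic of roughly 30, so the F ≥ 10 rule mainly acts as a safeguard when F is computed from a different formula or sample size.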
2.2 Genome-wide association study (GWAS) data sources for lung cancer
The genetic summary statistics for lung cancer (LC) were derived from a curated GWAS dataset of the European population. The dataset comprises 3,791 lung cancer cases and a control group of 489,012 healthy individuals.
A rigorous quality control (QC) protocol was implemented during the genotyping and imputation processes to ensure data integrity. This entailed stringent filtering criteria to eliminate potential errors, such as sample contamination, genotyping inconsistencies, and population stratification. Following these QC measures, a total of 24,188,684 single nucleotide polymorphisms (SNPs) were incorporated into the analysis, providing a comprehensive genomic landscape for identifying associations between genetic variations and lung cancer susceptibility.
2.3 GWAS data sources of immune cells
The present study draws on summary statistics for 731 immunological traits obtained from the GWAS Catalog, with accession numbers ranging from GCST90001391 to GCST90002121. This dataset covers a broad range of immune phenotypes and their genetic associations.
From these 731 immune characteristics, a diverse array of phenotypes is represented. Approximately 26% of these traits correspond to relative cell count measurements, capturing the proportional abundance of different immune cell populations. Another 16% consist of absolute cell counts, quantifying the exact number of cells within a specific compartment. Furthermore, approximately 4% of the traits are dedicated to morphological parameters, delving into the structural aspects of immune cells. The remaining majority, comprising nearly 53% of the dataset, involves median fluorescence intensity (MFI) measurements, which provide insights into the functional expression levels of immune-related proteins or receptors.
2.4 Selection of instrumental variables
Instrumental variables (IVs) were selected following established practice in the field. A significance threshold of P < 5 × 10^−8 was adopted, as in prior studies, to ensure the reliability of the genetic associations [21]. This threshold reduces false positives and helps ensure that the chosen single nucleotide polymorphisms (SNPs) are strongly associated with the respective immune cell traits.
To maintain the genetic independence of the IVs, the European subset of the 1000 Genomes Project (1KG-EUR) was used as the reference panel [24]. A linkage disequilibrium (LD) threshold of r2 < 0.001 was applied within a clumping window of 10,000 kb, so that SNPs in LD with a more strongly associated SNP in the same window were excluded and only independent markers were retained [25].
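The clumping step described above can be sketched as a greedy procedure: SNPs are visited from the smallest p-value upward, and a SNP is dropped if it lies within the window of an already retained SNP and is in LD with it at or above the r2 threshold. The Python sketch below only illustrates that logic. The SNP records and the `r2_lookup` table are hypothetical stand-ins for values that would come from the reference panel, and the code does not reproduce the software actually used in the study.

```python
def clump(snps, r2_lookup, r2_max=0.001, window_kb=10_000):
    """Greedy LD clumping sketch.

    snps: list of dicts with keys rsid, chrom, pos (base pairs), pval.
    r2_lookup: dict mapping frozenset({rsid_a, rsid_b}) to pairwise r2
               (hypothetical; missing pairs are treated as r2 = 0).
    """
    window_bp = window_kb * 1000
    kept = []
    for snp in sorted(snps, key=lambda s: s["pval"]):
        in_ld_with_lead = False
        for lead in kept:
            same_window = (
                snp["chrom"] == lead["chrom"]
                and abs(snp["pos"] - lead["pos"]) <= window_bp
            )
            r2 = r2_lookup.get(frozenset((snp["rsid"], lead["rsid"])), 0.0)
            if same_window and r2 >= r2_max:
                in_ld_with_lead = True
                break
        if not in_ld_with_lead:
            kept.append(snp)
    return kept


# Hypothetical SNPs and LD values, for illustration only.
demo_snps = [
    {"rsid": "rs_a", "chrom": 1, "pos": 1_000_000, "pval": 1e-12},
    {"rsid": "rs_b", "chrom": 1, "pos": 1_050_000, "pval": 1e-9},
    {"rsid": "rs_c", "chrom": 1, "pos": 30_000_000, "pval": 1e-10},
    {"rsid": "rs_d", "chrom": 2, "pos": 1_000_000, "pval": 1e-8},
]
demo_r2 = {frozenset(("rs_a", "rs_b")): 0.6}

independent = clump(demo_snps, demo_r2)
print([s["rsid"] for s in independent])
```

In this toy input, rs_b is removed because it sits 50 kb from the more significant rs_a and is in LD with it, while rs_c (outside the window) and rs_d (different chromosome) are retained.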
Subsequently, SNPs significantly associated with immune cell traits were looked up in the lung cancer GWAS summary data, and the corresponding statistical parameters were extracted and harmonized across datasets. A crucial step was to align the effect alleles of these SNPs with those in the lung cancer GWAS results. This alignment ensures that the exposure and outcome effect estimates refer to the same allele of the same genetic variant, providing a consistent basis for the Mendelian randomization analysis.
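The allele alignment step can be illustrated with a short Python sketch. It assumes each SNP record carries an effect allele, a non-effect allele, and a beta, and it flips the sign of the outcome beta when the two datasets report opposite effect alleles. The field names and example values are hypothetical, and handling of palindromic or strand-ambiguous SNPs is deliberately left out.

```python
def harmonise(exposure, outcome):
    """Return the outcome beta oriented to the exposure effect allele.

    Returns None when the allele pairs cannot be matched, in which case
    the SNP would be dropped. Strand flips and palindromic SNPs are not
    handled in this sketch.
    """
    exp_pair = (exposure["effect_allele"], exposure["other_allele"])
    out_pair = (outcome["effect_allele"], outcome["other_allele"])
    if out_pair == exp_pair:
        return outcome["beta"]       # already aligned
    if out_pair == exp_pair[::-1]:
        return -outcome["beta"]      # alleles swapped: flip the sign
    return None                      # incompatible alleles


# Hypothetical records for one SNP in the exposure and outcome datasets.
exposure_snp = {"effect_allele": "A", "other_allele": "G", "beta": 0.10}
outcome_same = {"effect_allele": "A", "other_allele": "G", "beta": 0.03}
outcome_swapped = {"effect_allele": "G", "other_allele": "A", "beta": 0.03}
outcome_mismatch = {"effect_allele": "C", "other_allele": "T", "beta": 0.03}

print(harmonise(exposure_snp, outcome_same))
print(harmonise(exposure_snp, outcome_swapped))
print(harmonise(exposure_snp, outcome_mismatch))
```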
2.5 Statistical analysis
Linkage disequilibrium analysis, MR analysis, and quality control were conducted using the "TwoSampleMR" package in R software (version 4.0.3) [19, 20]. Five methods were used to estimate causal effects: the inverse variance weighted (IVW) method, the simple mode method, the MR-Egger method, the weighted median method, and the weighted mode method [23, 26]. The IVW method served as the primary method for testing the causal relationship between immune cell phenotypes and lung cancer risk, because it is widely used in MR studies and has greater statistical power than the other four methods [27]. To aid interpretation, the resulting beta values were converted into odds ratios (ORs) with 95% confidence intervals.
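As an illustration of the IVW estimate and the conversion of a beta value into an odds ratio, the following Python sketch uses the fixed-effect IVW form, in which each SNP's ratio estimate is weighted by the squared exposure effect over the squared outcome standard error. The function names and the input numbers are hypothetical. The R package used in the study may apply a different variance model, so this is a sketch of the general technique rather than a reproduction of the reported analysis.

```python
import math


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate and its standard error.

    beta_exp: SNP-exposure effects; beta_out: SNP-outcome effects;
    se_out: standard errors of the SNP-outcome effects.
    """
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exp, se_out)]
    numerator = sum(
        bx * by / (se * se) for bx, by, se in zip(beta_exp, beta_out, se_out)
    )
    total_weight = sum(weights)
    return numerator / total_weight, math.sqrt(1.0 / total_weight)


def to_odds_ratio(beta, se, z=1.96):
    """Convert a log-odds estimate into an OR with a 95% confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical harmonized effects for three instruments.
bx = [0.10, 0.20, 0.15]
by = [0.02, 0.05, 0.03]
se_y = [0.01, 0.02, 0.01]

beta_ivw, se_ivw = ivw_estimate(bx, by, se_y)
odds_ratio, ci_low, ci_high = to_odds_ratio(beta_ivw, se_ivw)
print(f"beta={beta_ivw:.4f}, OR={odds_ratio:.3f} ({ci_low:.3f}, {ci_high:.3f})")
```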
To assess the stability and reliability of the results, a leave-one-out sensitivity analysis was conducted, in which the causal estimate was recomputed after removing each SNP in turn. Heterogeneity was assessed using Cochran's Q statistic and the corresponding p-values. To account for potential horizontal pleiotropy, the MR-Egger method was employed; a statistically significant intercept in this model indicated the presence of horizontal pleiotropy [28].
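The three sensitivity checks above can be sketched in plain Python. The sketch assumes a fixed-effect IVW estimate, computes Cochran's Q as the weighted sum of squared deviations of per-SNP ratio estimates from that estimate, fits MR-Egger as a weighted regression of outcome effects on exposure effects with weights 1/se^2, and assumes SNPs are already oriented so that exposure effects are positive. Function names and inputs are hypothetical, and p-values are omitted because they require a chi-square or t distribution function that is not in the Python standard library.

```python
def ivw(bx, by, se_y):
    """Fixed-effect IVW point estimate."""
    num = sum(b * y / (s * s) for b, y, s in zip(bx, by, se_y))
    den = sum(b * b / (s * s) for b, s in zip(bx, se_y))
    return num / den


def cochran_q(bx, by, se_y):
    """Cochran's Q around the IVW estimate, with its degrees of freedom."""
    beta = ivw(bx, by, se_y)
    q = sum(
        (b * b / (s * s)) * (y / b - beta) ** 2
        for b, y, s in zip(bx, by, se_y)
    )
    return q, len(bx) - 1


def egger_intercept(bx, by, se_y):
    """Weighted least squares of by on bx; returns (intercept, slope)."""
    w = [1.0 / (s * s) for s in se_y]
    sw = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, bx)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, by)) / sw
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, bx))
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, bx, by))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_out(bx, by, se_y):
    """IVW estimate recomputed with each SNP removed in turn."""
    return [
        ivw(bx[:i] + bx[i + 1:], by[:i] + by[i + 1:], se_y[:i] + se_y[i + 1:])
        for i in range(len(bx))
    ]


# Hypothetical harmonized effects for four instruments.
bx = [0.10, 0.20, 0.15, 0.30]
by = [0.02, 0.05, 0.03, 0.07]
se_y = [0.01, 0.02, 0.01, 0.02]

q_stat, dof = cochran_q(bx, by, se_y)
intercept, slope = egger_intercept(bx, by, se_y)
print(f"Q={q_stat:.3f} on {dof} df, Egger intercept={intercept:.4f}")
print([round(est, 4) for est in leave_one_out(bx, by, se_y)])
```

A large Q relative to its degrees of freedom suggests heterogeneity among instruments, an intercept far from zero suggests directional pleiotropy, and a leave-one-out estimate that shifts markedly when one SNP is removed points to an influential instrument.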