The Stockholm early detection of cancer study (STEADY-CAN): rationale, design, data collection, and baseline characteristics for 2.7 million participants

Construction and content

To address our research questions, we established a population-based cohort, the Stockholm Early Detection of Cancer Study (STEADY-CAN) cohort, of individuals aged ≥ 18 years who resided in or had access to healthcare in Stockholm County. The data extraction period spanned from January 1, 2011, to December 31, 2021. For those who turned 18 during the study period, data were included only from the date of their 18th birthday.
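The age-based entry rule above can be sketched as follows. This is a minimal illustration rather than the study's actual code: the function names are hypothetical, and the handling of birthdays on 29 February (moved to 1 March) is an assumption not stated in the text.

```python
from datetime import date
from typing import Optional

# Extraction window as stated in the text.
EXTRACTION_START = date(2011, 1, 1)
EXTRACTION_END = date(2021, 12, 31)


def eighteenth_birthday(dob: date) -> date:
    """Return the date on which a person born on `dob` turns 18."""
    try:
        return dob.replace(year=dob.year + 18)
    except ValueError:
        # Born on 29 February and the target year is not a leap year.
        # Assumption: treat 1 March as the 18th birthday.
        return date(dob.year + 18, 3, 1)


def observation_start(dob: date) -> Optional[date]:
    """First date from which a person's data are included.

    Returns None if the person does not turn 18 before the
    extraction window closes.
    """
    start = max(EXTRACTION_START, eighteenth_birthday(dob))
    return start if start <= EXTRACTION_END else None
```

A person who was already an adult on January 1, 2011 contributes data from the start of the window, while someone born in June 2000 contributes data only from June 2018 onward.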

A main outcome of interest was incident cancer during the 10-year period from January 1, 2012, to December 31, 2021, defined as a cancer diagnosed in that period in an individual with no previous cancer recorded earlier in the period or at any time back to January 1, 1992.
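One way to read this definition is that a person's earliest recorded cancer diagnosis since January 1, 1992 must fall within the outcome period. The sketch below encodes that reading; the function name and the representation of the input as a list of diagnosis dates are assumptions made for illustration.

```python
from datetime import date
from typing import Iterable, Optional

LOOKBACK_START = date(1992, 1, 1)   # start of the cancer-free look-back
OUTCOME_START = date(2012, 1, 1)    # start of the outcome period
OUTCOME_END = date(2021, 12, 31)    # end of the outcome period


def incident_cancer_date(diagnosis_dates: Iterable[date]) -> Optional[date]:
    """Return the date of incident cancer, or None if there is none.

    A cancer counts as incident when the earliest diagnosis recorded
    between LOOKBACK_START and OUTCOME_END lies in the outcome period.
    """
    in_window = sorted(
        d for d in diagnosis_dates if LOOKBACK_START <= d <= OUTCOME_END
    )
    if not in_window:
        return None
    first = in_window[0]
    return first if first >= OUTCOME_START else None
```

Under this reading, a person diagnosed in 2005 and again in 2015 has no incident cancer in the outcome period, because the 2015 diagnosis is preceded by an earlier one within the look-back window.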

Data were obtained from various prospectively collected data sources and linked using the participants’ unique Swedish personal identification numbers (PINs) [23] to enable large-scale population-based analyses. Cancer data were obtained from the Swedish Cancer Registry (SCR) [22], while data on diagnoses and symptom codes from PHC, specialized outpatient and inpatient care, socio-economic data based on MOSAIC areas [24, 25], and visitor statistics were collected from Region Stockholm’s healthcare administration’s databases (VAL) [26]. Additionally, data on dispensed medications relevant to anaemia were collected from the National Prescribed Drug Register [27]. Finally, cancer-related laboratory data were obtained from the active clinical laboratories in Region Stockholm (Karolinska University Laboratory, Unilabs, and SYNLAB Medilab).

All data were submitted to and linked by the Swedish National Board of Health and Welfare. Once the requested data had been linked, they were pseudonymized by replacing the PINs with randomly generated serial numbers to protect patient privacy and confidentiality. The key linking the serial numbers to the PINs was retained by the National Board of Health and Welfare, allowing for potential modifications, follow-ups, new linkages, and updates to the STEADY-CAN database. An overview of the data collection procedure is given in Fig. 1.
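The pseudonymization step can be illustrated with a short sketch. The text states only that PINs were replaced with randomly generated serial numbers and that the key was kept by the linking authority. The function name, the record layout, the field names, and the size of the serial number space below are all assumptions.

```python
import secrets
from typing import Dict, List, Tuple


def pseudonymize(
    records: List[dict], pin_field: str = "pin"
) -> Tuple[List[dict], Dict[str, int]]:
    """Replace each PIN with a random serial number.

    Returns the pseudonymized records and the key (PIN -> serial).
    In the setting described, only the linking authority would keep
    the key; researchers would receive the records alone.
    """
    key: Dict[str, int] = {}
    used = set()
    pseudonymized = []
    for record in records:
        pin = record[pin_field]
        if pin not in key:
            # Draw a random serial and redraw on the rare collision.
            serial = secrets.randbelow(10**12)
            while serial in used:
                serial = secrets.randbelow(10**12)
            used.add(serial)
            key[pin] = serial
        # Copy every field except the PIN, then attach the serial.
        clean = {k: v for k, v in record.items() if k != pin_field}
        clean["serial"] = key[pin]
        pseudonymized.append(clean)
    return pseudonymized, key
```

Because the same PIN always maps to the same serial number, records from different sources remain linkable after the PIN is removed, and later data deliveries can be added by reusing the stored key.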

Fig. 1 Data sources and cohort construction in the Stockholm early detection of cancer study (STEADY-CAN)

The entire STEADY-CAN project data set was transferred to the responsible researchers after pseudonymization. Data were stored on encrypted and secure servers within the Stockholm Region. Data analyses adhered to university regulations, and the project was reviewed and approved by the Swedish Ethical Review Authority (Dnr. 2021-05069 and 2023-00704-02).

Management of healthcare utilization data

Data on healthcare interactions in Stockholm County are automatically compiled and stored in the region’s healthcare administration’s databases (VAL) [26]. The intended uses of VAL include tasks such as healthcare planning, determining compensation for healthcare providers, and evaluating the quality of care. All residents of Region Stockholm with documented healthcare interactions are registered in VAL, which also contains information on dates of death and on migration into and out of the region. Both public and private healthcare providers within the region report to VAL.

Data on each healthcare visit were accompanied by the date of the visit (and, if applicable, the discharge date), the visited unit or medical department, and established diagnoses according to the ICD 10th revision coding system (ICD-10). All ICD-10 codes were retrieved for each individual from January 1, 2011, to December 31, 2021. Additionally, ICD-10 codes for anaemia diagnoses (D50.x − D64.x and Y44.x) were retrieved for the period from January 1996 (or from the earliest possible date) until December 31, 2021, to allow for exclusion of previously known anaemia or haemoglobinopathy. To account for comorbidity, all data needed to calculate the Charlson Comorbidity Index (CCI) [28] were collected. In addition, comprehensive data on chronic disorders and diagnostic codes from nearly all healthcare contacts were collected, allowing adjustment for relevant comorbidities according to the specific study design. Demographic data included sex, date of birth, migration into and out of the region, and the PHC centre with which an individual was registered on December 31 of each year during the extraction window. Additionally, socio-economic data based on MOSAIC areas [24] were obtained per individual at each healthcare event. For privacy reasons, the date of birth provided to the researchers after pseudonymization was truncated to year and month of birth, and the date of birth used in the analyses was set to the 15th of the birth month.
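Two of the rules in this paragraph lend themselves to a short sketch: selecting anaemia codes in the range D50.x to D64.x plus Y44.x, and setting the analysis date of birth to the 15th of the birth month. The function names, the `"YYYY-MM"` input format, and the normalisation of codes (upper case, dots removed) are assumptions for illustration.

```python
import re
from datetime import date

# D50-D59, D60-D64, and Y44, with any sub-code after the category.
_ANAEMIA_PATTERN = re.compile(r"^(D5[0-9]|D6[0-4]|Y44)")


def is_anaemia_code(icd10: str) -> bool:
    """True if the ICD-10 code falls in D50.x-D64.x or Y44.x."""
    # Normalise so that "d50.9", "D50.9", and "D509" are treated alike.
    code = icd10.strip().upper().replace(".", "")
    return bool(_ANAEMIA_PATTERN.match(code))


def analysis_birth_date(year_month: str) -> date:
    """Map a truncated birth date such as '1975-03' to the 15th
    of that month, as described for the analyses."""
    year, month = year_month.split("-")
    return date(int(year), int(month), 15)
```

Placing every birthday at mid-month bounds the error in any computed age at roughly half a month in either direction.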

Table 1 Demographic and clinical characteristics of the STEADY-CAN cohort

Management of prescribed medications

Data on medications relevant to anaemia were included in the STEADY-CAN cohort using their generic names and active-substance ATC codes, as outlined in Table 1, along with the quantity dispensed in defined daily doses (DDD) [29] and the dispensing dates. This information was obtained from the National Prescribed Drug Register [27], a nationwide registry compiling data on all prescribed medications dispensed by Swedish pharmacies. The register had almost complete coverage (> 99.7%) of all dispensed medications but did not include information on over-the-counter medications or on medications administered within healthcare facilities in outpatient or hospital care (Table 2). Data on medication dispensing were collected for each individual from January 1, 2011, to December 31, 2021.

Table 2 Overview of selected ATC codes, their corresponding drug classes, and descriptions

Laboratory data extraction

The second key component of the STEADY-CAN project is the integration of laboratory data. Three main laboratories (SYNLAB Sverige AB, part of the SYNLAB group; Unilabs AB; and Karolinska University Laboratory) performed nearly all clinical laboratory tests in the region throughout the study period. Each laboratory provider retrieved the specified laboratory tests for the entire region over the study period. The main interest of the present study was haemoglobin measurements in primary, secondary, or tertiary care during 2011–2021. All haemoglobin test results from this period, along with additional laboratory measurements routinely associated with cancer investigation, were included to support the analyses, continuous patient assessment, and identification of relevant outcomes, as outlined in Table 5. In addition to the test specifics, each laboratory test record included a unique patient identifier [23], the test date, the ordering unit, and the analysis method. Only venous haemoglobin measurements were considered. Both inter- and intra-laboratory variability were considered negligible, as routine quality checks and standardization assessments of the laboratories are regularly performed by the national external quality assessment (EQA) provider Equalis AB (www.equalis.se).

Management of the Swedish Cancer Registry

The SCR, established in 1958, is one of the oldest disease registers in the world and has high validity [22]. All physicians and pathology laboratories in Sweden are obliged by law to report all incident cases of cancer to the SCR. The registry covers all types of cancer except non-melanocytic skin cancer (ICD-10 code C44) and includes a wide range of variables. For the STEADY-CAN cohort, data were collected on all cancer diagnoses with a diagnosis date from January 1, 1992, to December 31, 2021. Besides the date of diagnosis, data were obtained on cancer stage at diagnosis according to the TNM (Tumour, Node, Metastasis) Classification of Malignant Tumours and on date of death.
