Development and validation of an assessment tool for public health emergency management program

Study setting, period, and design

The study was conducted in Ethiopia, a country in the Horn of Africa with a population of about 130 million. It is the second most populous nation in Africa and is bordered by Eritrea to the north, Djibouti and Somalia to the east, Sudan and South Sudan to the west, and Kenya to the south. The study purposefully engaged diverse national, regional, and local emergency management practitioners across Ethiopia's heterogeneous contexts. The study period was from July 2023 to June 2024.

During this period, the literature and desk review informing the initial tool design was conducted over approximately six months, starting from July 2023. An initial draft questionnaire was developed within one month following this preliminary work. Stakeholder feedback was incorporated through consultation workshops over one month. Face and content validation was completed in one month, followed by nearly two months for pilot testing. In total, twelve months were dedicated to instrument development, expert input, and finalizing the tool following validity testing.

Public health models: Donabedian framework

The literature review addressed the structural dimension of the Donabedian framework by identifying critical gaps in resources, infrastructure, and supply chains essential for PHEM, thereby informing the design of our assessment tool. Our methodology included consultative workshops with key stakeholders to uncover procedural challenges such as ineffective communication and limited inter-sectoral coordination, which helped establish best practices and collaborative approaches for effective PHEM. The outcome dimension was validated through exploratory factor analysis, face validity, and content validity assessments, ensuring the tool accurately captures key PHEM components, thereby measuring the effectiveness of emergency management programs. We also demonstrated how these theoretical components apply in practice, with elements like resource allocation and multi-sectoral coordination directly enhancing emergency response times and community resilience, ultimately improving public health outcomes.

Instrument development procedures

Instrument development followed a comprehensive process to establish a robust theoretical and empirical foundation. A literature review synthesized relevant frameworks, defining core domains and indicators. Subject matter experts conducted a focused evidence review, evaluating conceptual coherence and extracting measurement priorities. An expert consultative workshop employed structured techniques optimizing content validity and applicability. The conceptual model iteratively evolved by integrating stakeholder viewpoints and triangulating evidence sources into a contextually grounded, multidimensional framework. Items were systematically developed based on psychometric principles to assess hypothesized theoretical constructs and associations across proposed subscales. Finally, rigorous translation protocols involving independent forward and back translation addressed linguistic equivalence prior to pilot testing.

Additionally, we have incorporated Donabedian's Structure-Process-Outcome Model as a theoretical framework to guide the development and validation of our tool. This model allows us to assess the resources and workforce (structure), the multi-sector coordination and resource allocation processes (process), and the effectiveness of responses and community resilience (outcome) [25]. By employing this framework, we ensure that our tool not only assesses the implementation status of public health emergency management programs but also elucidates the relationships between structures, processes, and outcomes. This theoretical underpinning enhances the comprehensiveness of our tool and its effectiveness in both local and global contexts.

Literature review and desk review

The study followed methods described by Zamanzadeh et al. [26] to develop the questionnaire. To establish a theoretical foundation, an extensive literature review was conducted by a team of eight subject matter experts [27]. This review systematically identified, evaluated, and synthesized relevant empirical studies and theoretical frameworks from the published literature on public health emergency management and program assessment, following best practices [28, 29]. Specifically, the literature review explored existing frameworks and models for public health emergency management programs; core components and domains of emergency preparedness and response systems; items and factors influencing program implementation status; validated assessment tools used to evaluate emergency management programs; and relevant population, implementation, and outcome-related variables. This review process defined pertinent domains and key items, and identified important variables, populations, and validated domains for capturing multi-level determinants of public health emergency management program implementation status to inform development of the assessment tool. While resources and processes are critical, the outcomes associated with effective PHEM programs remain underexplored. This study aims to bridge this gap by integrating outcome measures aligned with Donabedian's framework.

A focused desk review was conducted in Adama town by subject matter experts [30] to critically appraise and consolidate the evidence synthesized during the comprehensive literature review phase. This process involved evaluating the conceptual coherence, empirical grounding, and contextual relevance of identified domains and indicators [31], cross-analyzing themes to ensure theoretical saturation [32], and extracting key measurement priorities and substantive focal areas for tool development [33]. The desk review facilitated systematic consolidation of the accumulated knowledge and distillation of the evidence into a coherent preliminary framework. This framework synthesized diverse perspectives to advance the nascent measurement model by comprehensively integrating empirical findings and conceptual underpinnings. Through an iterative process, the output was refined to subsequently inform consultative workshop discussions. These discussions centered on examining the proposed items' conceptual relationships, the underlying constructs being assessed, and strategies for optimizing the tool's applicability across various contexts.

Consultative workshop

A two-day consultative workshop was then held in Hawassa, the southern capital of Ethiopia, where we brought together 12 subject matter experts from all regions of the country. Participants included public health officials who provided local contextual insights, as well as national experts from organizations such as the Ethiopian Public Health Institute and the Ministry of Health, who contributed a broader perspective. The workshop employed various techniques to gather stakeholder input aimed at enhancing the content validity of the measurement tool. An iterative framework development phase systematically integrated these diverse Ethiopian viewpoints. Guided by this optimized theoretical model, we developed initial items for pilot testing, taking into account evidence from previous steps that engaged national and regional Ethiopian experts. Overall, this multi-step instrument development process embedded perspectives specifically from within the Ethiopian public health context, ensuring the cultural relevance and local applicability of the tool for use in Ethiopia.

Item review and refinement utilized a mixed deductive-inductive approach [34]; Q-sort procedures assessed convergent and discriminant validity [35]; focus groups examined item clarity, relevance, and comprehensiveness [36]; and expert review with content validity indexing was conducted [37]. Through iterative consensus-building, experts critically evaluated the conceptual relationships between proposed items and underlying constructs, probing construct validity, semantic appropriateness, and theoretical congruence of the initial item pool. Contextualization strategies were devised to enhance cultural relevance, community acceptability, and local applicability, recommending systematic translation procedures incorporating back-translation, cognitive debriefing, and pilot testing to ensure conceptual and semantic equivalence [38]. This structured consultative process leveraged complementary qualitative and quantitative approaches to critically examine the emerging instrument's psychometric properties, consolidating multidisciplinary inputs to optimize content validity, construct validity, and cross-cultural applicability prior to pilot testing and psychometric evaluation.

Iterative framework development

The iterative framework development process systematically integrated diverse stakeholder viewpoints and empirical evidence from multiple sources to advance the nascent measurement model. It involved triangulation across findings from the literature review, desk review, and consultative workshop [39], critical reflection and theory-based refinement after each development stage [40], constant comparative analysis to identify convergent and divergent perspectives [41], and member checking to verify interpretive congruence with participant intents [42]. This iterative approach facilitated critical examination of the consolidated conceptual, empirical, and methodological evidence, with feedback informing revisions to optimize alignment between the evolving theoretical model and empirical data [43]. By synthesizing inputs from a multidisciplinary team of five core researchers engaging 10 external experts/stakeholders across phases like expert reviews, the comprehensive process enhanced the construct validity and content validity of the final integrated measurement framework [31].

Item development

Item generation was guided by established scale development procedures and best practices from the psychometric literature. Measurement items were drafted to logically assess the identified theoretical factors and their hypothesized relationships [31, 34]. This involved grouping items into proposed subscales corresponding to the key construct domains [44]. Item formulation carefully considered evidence from the literature review, expert consultations with a panel of 10 subject matter experts [30], and stakeholder inputs to ensure content representation and relevance to the Ethiopian context. The pool of items drew upon diverse sources and viewpoints, anchoring the measurement approach in both empirical evidence and practical considerations from the field [45, 46]. This integrated process of systematically deriving items based on theory and formative research involving literature reviews, expert consultations, and stakeholder inputs helps establish content validity [26, 37].

Language translation

The questionnaire was translated from English to Amharic using a forward translation process. Four independent translators, proficient in both Amharic and English, completed the forward translations. Subsequently, two additional language experts conducted back-translations of each forward translation into English. The back-translations were then compared with the English version to ensure the consistency of conceptual meaning and interpretive accuracy. This translation process comprehensively addressed translatability concerns before further refining the questionnaire through pilot testing and analysis. Following preliminary field testing and analysis, the translated version was empirically evaluated for reliability, validity, and cultural appropriateness within the target population and setting [38].

Validity tests

Validity testing is crucial for ensuring that research tools accurately measure their intended constructs [47]. The development of reliable and valid tools involves several steps, including item generation, reliability assessment, and various forms of validity testing [48]. This process is particularly important in fields like health economics, where model validation tools can enhance the consistency and reproducibility of economic evaluations [49]. The need for robust tool development and validation extends to management sciences, where practitioners and theoreticians alike recognize the importance of developing standardized approaches for selecting appropriate research methods and techniques [50]. Such tools can improve the quality and reliability of research processes, addressing the challenges of creating and verifying new theories in management sciences. Face validity involved a panel of public health emergency management experts assessing the tool for relevance and clarity, resulting in refined item wording. Content validity was established using a Content Validity Index (CVI), which ensured all key aspects of public health emergency management were represented. This multi-faceted validation process helps ensure that the tool accurately reflects critical constructs for reliable assessment and application.

Face validity

A face validity study was conducted to evaluate the translated questionnaire prior to data collection. A total of 30 subject matter experts in PHEM working in the Ethiopian context were invited to participate. This number of experts falls within the recommended range of 25–75 respondents for face validity evaluation [51]. The questionnaire and study objectives were provided to the experts, with clear instructions to critically review each item. After one week, a panel discussion was held, during which the experts evaluated each item line-by-line and provided feedback and clarity/comprehension ratings on a 4-point scale.

The discussion focused on assessing whether the questionnaire items appropriately measured the research objectives in terms of relevance, representativeness, and comprehensiveness. All feedback was documented, and necessary revisions were made to the questionnaire. Items achieving a Face Validity Index (FVI) of at least 0.80 were retained in the final questionnaire. This 0.8 cutoff for the FVI was chosen based on widely accepted psychometric conventions and recommendations from the literature on instrument development and validation [37, 46, 52, 53]. Specifically, an FVI of 0.8 or higher is commonly considered an acceptable level, indicating that items are judged as clear and comprehensible by a sufficient proportion of expert raters [52, 53]. Using this 0.8 cutoff aligns with seminal guidance from Polit and Beck [37, 46].

Utilizing a 4-point relevance rating scale with multiple experts, as in the present study, together with the conventional 0.8 standard, ensures consistency with best practices, facilitates comparability with prior instrument validation studies across different fields, and represents an accepted balance between achieving adequate face validity and retaining sufficient content coverage [37].

Two methods were employed to determine the scale-level face validity index (S-FVI). The S-FVI/Ave took the mean of the Item-level Face Validity Index (I-FVI) scores across all items on the scale, alternatively calculated as the average clarity and comprehension proportions across raters. The S-FVI/UA represented the proportion of items that received full agreement (i.e., a rating of 3 or 4) from all raters. A universal agreement (UA) score of one was assigned if 100% agreement was achieved, otherwise 0. The S-FVI/UA was then calculated as the sum of UA scores divided by the total number of items.

In general, the I-FVI and the two S-FVI calculation methods—the average and universal agreement methods—provided quantitative indices to systematically evaluate face validity at both the item and scale levels during instrument development and validation.
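To make these scoring rules concrete, the following minimal sketch shows how the I-FVI, S-FVI/Ave, and S-FVI/UA could be computed from a matrix of expert ratings. The simulated ratings, matrix dimensions, and variable names are illustrative assumptions; the study's actual indices were calculated from the 30 experts' real ratings rather than with this code.

```python
import numpy as np

# Hypothetical clarity/comprehension ratings on a 4-point scale
# (rows = 30 expert raters, columns = questionnaire items); simulated values.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 5, size=(30, 10))

# Dichotomize: ratings of 3 or 4 count as "clear and comprehensible".
endorsed = (ratings >= 3).astype(int)

# Item-level Face Validity Index: proportion of raters endorsing each item.
i_fvi = endorsed.mean(axis=0)

# Scale-level FVI, averaging method: mean of the I-FVI values across items.
s_fvi_ave = i_fvi.mean()

# Scale-level FVI, universal agreement method: proportion of items rated
# 3 or 4 by every rater.
s_fvi_ua = endorsed.all(axis=0).mean()

retained = i_fvi >= 0.80  # items meeting the 0.80 cutoff are kept
print(i_fvi.round(2), round(s_fvi_ave, 2), round(s_fvi_ua, 2), retained)
```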

Content validity

Content validity of the questionnaire was established through an expert review involving eight PHEM experts with relevant educational qualifications. These experts assessed whether each item accurately reflected the constructs being measured, independently rating each item on a 4-point Likert scale of relevance to the designated construct. The scale was defined as: (1) irrelevant, (2) somewhat relevant, (3) quite relevant, and (4) highly relevant. Scores of 1–2 were coded as '0' and scores of 3–4 as '1'. After calculating the item-level content validity index (I-CVI), each item was judged appropriate if its I-CVI was higher than 0.83 and eliminated if it was less than 0.83 [54, 55]. This cutoff value of 0.83 for an acceptable I-CVI was selected based on evidence-based recommendations from seminal content validity literature. With eight subject matter experts providing relevance ratings, Polit et al. [37] recommend a minimum I-CVI of 0.83 as demonstrating excellent content validity for an item to be retained. Utilizing this evidence-based 0.83 threshold follows validated guidelines and ensures that only items achieving a sufficiently high degree of agreement among the expert panel regarding their relevance are included in the final instrument. A lower cutoff risked retaining items lacking adequate content validity support, while a higher value might have been overly stringent given the relatively small panel of eight raters, potentially excluding too many items [54, 55].
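The I-CVI follows the same dichotomize-and-average logic as the face validity indices. A minimal sketch is shown below, assuming a hypothetical matrix of simulated relevance ratings from eight raters; the study's real computations used the experts' actual ratings.

```python
import numpy as np

# Hypothetical relevance ratings from 8 experts for 12 items (simulated values).
rng = np.random.default_rng(1)
ratings = rng.integers(1, 5, size=(8, 12))

# Recode: 1-2 -> 0 (not relevant), 3-4 -> 1 (relevant).
relevant = (ratings >= 3).astype(int)

# Item-level Content Validity Index: proportion of experts rating each item relevant.
i_cvi = relevant.mean(axis=0)

# With eight experts, only items reaching I-CVI >= 0.83
# (i.e., at least seven of the eight experts agreeing) are retained.
retained_items = np.where(i_cvi >= 0.83)[0]
print(i_cvi.round(2), retained_items)
```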

Factorial validity

Exploratory factor analysis (EFA) was selected for its strength in identifying underlying dimensions within complex constructs, making it especially suitable for capturing the multifaceted nature of PHEM [56]. This approach allowed us to empirically validate the tool's structure by uncovering latent factors that represent critical components of emergency management, such as coordination, resource allocation, and system readiness, enhancing the tool's credibility and applicability in diverse settings. Prior to factor extraction, the Kaiser–Meyer–Olkin measure verified the sampling adequacy of the data for a valid analysis [57, 58]. Initially, principal component analysis with Varimax rotation was performed to extract factors based on standard criteria, including eigenvalues exceeding 1 [29, 59], inspection of the scree plot, and factor loadings above 0.4 [56, 60]. Further, parallel analysis was utilized as an empirical means of determining the optimal number of factors to retain, accounting for inter-item correlations and strengthening validation of the revealed factor structure, following recommended statistical practices [29, 59, 60]. A minimum of 60% variance explained has been shown to provide good empirical support for confirming a measurement tool's construct validity, as referenced in seminal methodology texts [56].
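Because parallel analysis is less widely known than the eigenvalue-greater-than-1 and scree-plot rules, the sketch below illustrates Horn's procedure: eigenvalues observed in the item correlation matrix are retained only if they exceed a benchmark (here the 95th percentile, an illustrative choice) of eigenvalues obtained from uncorrelated random data of the same dimensions. The function, data, and variable names are hypothetical; the study's analyses were run in SPSS.

```python
import numpy as np

def parallel_analysis(data: np.ndarray, n_iter: int = 1000,
                      percentile: float = 95.0, seed: int = 0) -> int:
    """Horn's parallel analysis: retain factors whose observed eigenvalues
    exceed the chosen percentile of eigenvalues from random data with the
    same number of respondents and items."""
    n, p = data.shape
    rng = np.random.default_rng(seed)

    # Observed eigenvalues of the item correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    # Eigenvalues from uncorrelated random data of identical dimensions.
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        rand = rng.standard_normal((n, p))
        random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(rand, rowvar=False)))[::-1]

    threshold = np.percentile(random_eigs, percentile, axis=0)
    return int(np.sum(observed > threshold))

# Illustrative call on simulated ratings (260 respondents x 20 items).
simulated = np.random.default_rng(1).integers(1, 5, size=(260, 20)).astype(float)
print(parallel_analysis(simulated))
```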

Internal consistency reliability test

The study used the pre-final version of the instrument to test internal consistency reliability. Experts at selected national, regional, district, and facility levels participated, and their responses were analyzed to assess reliability.

Participants and sample size

The study population consisted of public health emergency management (PHEM) experts working at the national PHEM level and across various regions, zones, and woredas (districts) of Ethiopia. The participants were selected from areas with different levels of experience in implementing PHEM programs, ensuring a diverse representation of perspectives and contexts.

During the tool development process, we conducted two regional consultation workshops: one in Adama during the desk review phase to gather stakeholder input and another in Hawassa-Sidama during instrument development to obtain feedback on the draft tool. For validity assessment, we purposively selected 30 face validity experts and 8 content validation experts from various regions, including Sidama, to ensure representation of diverse emergency management roles at both national and sub-national levels. This approach helped ensure that the content, structure, and questions of the tool were relevant and comprehensive.

For reliability testing, 260 professionals completed the survey instrument. These respondents were recruited across multiple regions in Ethiopia, with support from regional health bureaus, reflecting the intended national and regional user population involved in public health emergency management.

Data collection procedure

The developed tool was distributed to the study sample for data collection. The tool was disseminated online via email using a secure survey platform. Participants received an email invitation with a brief introduction to the study aims and a link to access and complete the anonymous self-administered questionnaire electronically. This online distribution method enabled efficient data collection while adhering to social distancing protocols.

Data analysis methods

Data analysis validated the psychometric properties of the developed tool through exploratory factor analysis with principal component analysis (PCA) for construct validity, internal consistency reliability testing using Cronbach's alpha, and descriptive statistics, all performed with the Statistical Package for the Social Sciences (SPSS) version 25. Together, these analyses assessed whether the tool accurately measures the intended constructs related to PHEM implementation status and exhibits reliable, consistent measurement properties across diverse settings in Ethiopia, as described below.

Exploratory factor analysis (EFA) and principal component analysis (PCA)

Exploratory factor analysis (EFA) was employed to examine the underlying factorial structure of the questionnaire and validate its ability to measure the intended theoretical constructs consistently. The data's suitability for factor analysis was first assessed by examining the correlation matrix for coefficients exceeding 0.3, item communalities, and the results of Bartlett's Test of Sphericity and the Kaiser–Meyer–Olkin [61] Measure of Sampling Adequacy.

Principal Component Analysis (PCA) with Varimax rotation was then performed to extract factors, utilizing the criteria of eigenvalues greater than 1, inspection of the scree plot, and factor loadings above 0.4. Additionally, parallel analysis was conducted as an empirical method to determine the optimal number of factors to retain, accounting for inter-item correlations.
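As an illustrative counterpart to this workflow, the sketch below uses the Python factor_analyzer package as a stand-in for the SPSS analysis: it applies Bartlett's test and the KMO measure as suitability checks and then extracts Varimax-rotated factors, flagging loadings above 0.4. The simulated data, column names, and choice of five factors are assumptions (the retained number would come from the eigenvalue, scree-plot, and parallel-analysis criteria), and the package's principal-factor extraction approximates rather than reproduces SPSS's principal component extraction.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Hypothetical stand-in for the 260-respondent item matrix (simulated values).
rng = np.random.default_rng(2)
responses = pd.DataFrame(rng.integers(1, 5, size=(260, 20)),
                         columns=[f"item_{i + 1}" for i in range(20)])

# Suitability checks: Bartlett's test should be significant and KMO adequate.
chi_square, p_value = calculate_bartlett_sphericity(responses)
kmo_per_item, kmo_overall = calculate_kmo(responses)
print(f"Bartlett p = {p_value:.3f}, overall KMO = {kmo_overall:.2f}")

# Extraction with Varimax rotation; n_factors = 5 is a placeholder that would
# be set from the eigenvalue, scree-plot, and parallel-analysis criteria.
fa = FactorAnalyzer(n_factors=5, rotation="varimax", method="principal")
fa.fit(responses)

loadings = pd.DataFrame(fa.loadings_, index=responses.columns)
salient_items = loadings[loadings.abs().max(axis=1) >= 0.4]  # loadings above 0.4
print(salient_items.round(2))
```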

Internal consistency reliability

The internal consistency reliability of the instrument was evaluated using Cronbach's alpha. This statistic measures the homogeneity of items within each scale, with values of 0.70 or higher indicating adequate internal consistency [62].
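For reference, Cronbach's alpha for a set of k items equals k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below is a minimal illustration on simulated data; the study's reliability estimates were produced in SPSS.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative call on a hypothetical subscale (260 respondents x 8 items);
# values of 0.70 or higher indicate adequate internal consistency.
rng = np.random.default_rng(3)
subscale = rng.integers(1, 5, size=(260, 8)).astype(float)
print(round(cronbach_alpha(subscale), 2))
```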

Descriptive statistics

Descriptive statistics, including frequencies and percentages, were calculated to summarize the socio-demographic characteristics of the study participants.

Ethical clearance

Ethical approval for this tool development and validation study was granted by the Institutional Review Board of the Ethiopian Public Health Institute (EPHI IRB) under reference number EPHI 6.13/68 on 19 July 2023. All participants provided informed consent after being informed about the study objectives to develop and validate an assessment tool and the procedures, potential risks, and benefits of participation. They were assured of the confidentiality of their data and responses, their right to withdraw at any time without reprisal, and that anonymity would be maintained.
