General Aptitude Test Battery (GATB): Use of Obsolete Tests, Data, and Beliefs Causes Harm

Our main objective was to examine the performance of university students on the GATB CDN (Nelson, 1986) intelligence, verbal, numerical, and spatial aptitudes today, nearly 40 years after the test was normed on an unspecified Canadian General Working Population (GWP) sample. Our results show that undergraduate university students today scored, on average, about 1 SD (15 IQ points equivalent) below the GATB CDN 1985 GWP norms and about 2 to 3 SD (30 to 45 IQ points equivalent) below what various psychological textbooks claim is the average IQ of undergraduate students (Uttl et al., 2024a). In contrast, these same students scored, on average, 103 IQ points on the Shipley-2 (Shipley et al., 2009), which was normed in 2008 on a US population sample, some 15 years ago. Thus, our sample's mean GATB CDN aptitude G (intelligence) is about 18 IQ points equivalent below its mean Shipley-2 IQ, and the gap is even larger on aptitude N (numerical), where our sample scored a whopping 26 IQ points below its mean Shipley-2 IQ of 103.
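To make the metric behind these comparisons explicit, the following minimal sketch (illustrative only, not our analysis code) converts a mean z-score computed against a test's normative sample into the familiar IQ metric with a mean of 100 and an SD of 15, reproducing the approximate gaps reported above.

```python
# Illustrative sketch only: mapping a mean z-score (computed relative to a
# test's normative sample) onto the IQ metric (mean 100, SD 15).

def z_to_iq_equivalent(z: float) -> float:
    """Map a z-score relative to a norm sample onto the IQ metric."""
    return 100 + 15 * z

# Our sample's mean aptitude G fell about 1 SD below the 1985 GWP norms:
gatb_g_iq_equivalent = z_to_iq_equivalent(-1.0)   # 85.0
shipley2_mean_iq = 103                            # same students, 2008 norms

print(gatb_g_iq_equivalent)                       # 85.0
print(shipley2_mean_iq - gatb_g_iq_equivalent)    # 18.0-point discrepancy
```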

Obsolescence of the GATB CDN and the Career Handbook Data

Our results are consistent with Yeasting's (1996) findings but indicate that in the intervening three decades, there have been further substantial declines in university students' performance on the GATB CDN aptitudes G (intelligence), V (verbal), and N (numerical) but not on aptitude S (spatial). In combination, Yeasting's (1996) and our results demonstrate that the GATB CDN norms have been obsolete and plainly wrong for at least three decades, since at least 1995. In turn, opinions issued during the last three decades by vocational counselors, vocational psychologists, neuropsychologists, and forensic psychologists based on the obsolete GATB CDN norms are invalid, pseudoscientific misinformation, cloaked in an aura of scientific validity by the academic titles (e.g., Ph.D.) and professional designations (e.g., Registered Psychologist) of the (misinformed) experts who relied on these outdated and obsolete norms. Consequently, thousands of examinees, employers, insurance companies, government disability benefits agencies, courts, tribunals, and other adjudicative bodies have likely been misled by professionals who relied on obsolete norms to offer their invalid and plainly wrong opinions.

Moreover, as detailed in the introduction, the GATB CDN is often used with the Career Handbook's (Human Resources Development Canada, 1996, 2003, 2016) even more obsolete, non-experimental, non-normative aptitude level rating data, ratings that have previously been misrepresented as indicating "minimum aptitude levels required" for various occupations (e.g., the example in the introduction; Vespa v. Dynes, 2002 ABQB 25, CanLII.org; Decision No. 1593/23, 2024 ONWSIAT 45, CanLII.org). Using the Career Handbook data, Fig. 2 makes it obvious that our students (with a few exceptions) scored so low that, according to various professionals using the GATB CDN and the Career Handbook, these university students are not eligible for any occupation requiring a university degree, and that their aptitude G (intelligence) or general mental ability (GMA) makes them qualified for either no occupation at all or, at best, aptitude level 4 occupations such as janitors, general farm workers, and construction trades helpers (see Table 2). This conclusion holds even after one adds 1 SEM to their scores as per the GATB CDN manual, as the sketch below shows.
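For readers unfamiliar with the 1 SEM adjustment, the following minimal sketch illustrates the classical test theory formula, SEM = SD × sqrt(1 − reliability). The reliability value used here is hypothetical (the GATB CDN manual reports its own reliability estimates per aptitude); it is chosen only to show that adding 1 SEM cannot rescue scores this far below the Career Handbook's rating levels.

```python
import math

# Minimal sketch of the classical test theory standard error of measurement,
# SEM = SD * sqrt(1 - r). The reliability below is hypothetical, for
# illustration only; consult the test manual for actual values.

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    return sd * math.sqrt(1 - reliability)

one_sem = standard_error_of_measurement(sd=15, reliability=0.85)  # ~5.8 points

# Even crediting an examinee with +1 SEM moves an IQ-equivalent score of 85
# to only about 91, still far below what the Career Handbook ratings demand.
print(85 + one_sem)  # ~90.8
```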

Table 2 Example occupations’ G, V, N, and S aptitude levels as listed in the Career Handbook (Human Resources Development Canada, 2016)

To make the situation worse, when assessing individuals with university degrees, vocational psychologists often believe, falsely, that university students and university graduates have, on average, above-average intelligence, ranging between 115 and 130 IQ points (i.e., 1 to 2 SDs above the population average). This false belief in the brilliance of undergraduate students is itself based on outdated and obsolete IQ data collected in the 1940s and 1950s and on the similarly obsolete aptitude level ratings in the Career Handbook. The magnitude of the fictitious impairments resulting from the combination of two adverse effects, the obsolete GATB CDN GWP norms and the equally obsolete beliefs in the brilliance of undergraduate students and university graduates, is nothing short of astonishing, easily amounting to 30 to 45 IQ points. As detailed in Uttl et al. (2024a), although the myth of brilliant undergraduate students and university graduates is propagated in the scientific literature, assessment textbooks, and the popular press, the average IQ of university students today is a mere 102 IQ points and has been far below the fairy tale belief of an average IQ of 115 to 130 for decades. For example, Longman et al. (2007) analyzed both the US and Canadian normative samples for the Wechsler Adult Intelligence Scale–III (Wechsler, 1997) and found that the middle 95% of the Canadian normative sample with 16 or more years of education scored between 78 and 142 IQ points, with a mean of 108.7 and an SD of 14.3, whereas the middle 95% of the US normative sample with 16 or more years of education scored between 86 and 140, with a mean of 111.6 and an SD of 13.2. Several years later, Holdnack et al. (2013) analyzed the US normative sample for the Wechsler Adult Intelligence Scale–IV (Wechsler, 2008) and found that the mean IQ of the educated US normative sample dropped still further between the WAIS-III and WAIS-IV, down to a mean of 107.4 and an SD of 13.9. Accordingly, when a professional uses obsolete test norms together with obsolete beliefs in the superior IQ of university students and graduates, the professional easily cuts off some 30 to 45 IQ points from examinees with university education and makes them appear far less smart, instantly eligible for disability benefits, and instantly far below the "required" IQ for specific jobs.
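The compounding of the two errors is simple arithmetic; the sketch below (illustrative only, using the round figures cited above) shows how the 30 to 45 point range arises.

```python
# Illustrative arithmetic only, using the round figures cited above: the
# apparent "deficit" produced by combining obsolete GATB CDN norms with the
# false belief that undergraduates average 115 to 130 IQ points.

actual_mean_iq_of_students = 102           # Uttl et al. (2024a)
obsolete_norms_penalty = 15                # ~1 SD lost to the 1985 GWP norms
believed_mean_low, believed_mean_high = 115, 130

apparent_score = actual_mean_iq_of_students - obsolete_norms_penalty  # 87
print(believed_mean_low - apparent_score)   # 28
print(believed_mean_high - apparent_score)  # 43 -> roughly the 30-45 range
```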

Accordingly, a psychologist using the GATB CDN (Nelson, 1986) with the Career Handbook (Human Resources Development Canada, 2016) to make judgments about an examinee of average intelligence will likely make them appear to have intellectual disabilities, cognitive deficits, and cognitive impairments, often in multiple domains. In some assessment situations, this may be what an examinee is seeking. For example, when an examinee seeks to qualify for Assured Income for the Severely Handicapped (AISH), a psychologist relying on the GATB CDN and the Career Handbook would make the examinee appear about 15 to 45 IQ points less intelligent: 15 to 25 IQ points because of the obsolete GATB CDN norms and another 15 to 30 IQ points because of the obsolete, speculative aptitude ratings in the Career Handbook. An ignorant or unscrupulous psychologist would easily conclude that a person of average intelligence and cognitive abilities was mentally disabled, unable to perform any jobs (just like almost all of our university students), and qualified for AISH benefits, just as in Almedom v. Wawanesa. The examinee is likely to be satisfied with this assessment because they would likely obtain the financial and other benefits they were seeking. In contrast, taxpayers may not be satisfied to learn that other citizens are drawing disability benefits to which they are not entitled, simply because some psychologists used 40- to 80-year-old GATB data and even more obsolete aptitude rating data in the Career Handbook. In other words, the psychologist using obsolete tests and data to make an examinee look mentally disabled may be helping the examinee while at the same time improperly draining government assistance programs and taxpayers; that is, the psychologist is helping the examinee to defraud government assistance programs even if the examinee does not know it.

In other assessment situations, an examinee may be forced to undergo a fitness-for-duty evaluation by an employer who may or may not have legitimate concerns about the examinee. Psychologists using the obsolete GATB CDN with the Career Handbook (and artificially making the examinee appear 30 to 45 IQ points less smart) are likely to opine that the examinee is unable to perform their job, the opinion the employer is seeking and paying for. In turn, the employer will rely on the opinion of the registered psychologist in an attempt to justify firing the employee, just as the employer did in Ms. T's case. In this latter situation, the psychologists cause harm directly to the individuals assessed while, knowingly or unknowingly, assisting the employers.

Clearly, vocational counselors and psychologists using these obsolete tests, norms, and the Career Handbook data have failed in their duty to keep up with science. Importantly, the Career Handbook clearly states that it is to be used for low-stakes career exploration and counseling only and specifies that it is not to be used for determining insurance benefits or in any high-stakes, criterion-referenced employment decisions. Any professional who treats the Career Handbook's Aptitude Profiles as minimum requirements or uses them as a criterion in high-stakes assessments has done so contrary to the express limitations stated in the Career Handbook.

Implications for Clinical Practice

The question arises as to whose responsibility it is to ensure that counselors and psychologists use current, reliable, and valid tests for their intended purposes, whether counseling/job exploration, eligibility for government benefits, or fitness-for-duty assessments. Some have argued that it is the duty of the psychologist to ensure that the test and its norms are current; this duty is in fact explicitly stated in many standards and ethics codes, including the Standards for Educational and Psychological Testing (AERA et al., 2014) and the APA Ethical Principles (American Psychological Association, 2017). In contrast, many other standards and ethics codes are silent, mentioning nothing about any duty of psychologists to rely on current tests and current norms, and may not even mention that psychologists ought to use reliable, valid, and current tests (e.g., College of Alberta Psychologists Standards of Practice, 2019, 2023; Canadian Code of Ethics for Psychologists, 2017). Accordingly, others have argued that it is the test publisher's or someone else's responsibility, that the test is current until the publisher issues a new version and new norms, and that the test is valid until someone invalidates it (Russell, 2010). This latter view is misguided, clearly wrong, and directly causes harm.

The Standards for Educational and Psychological Testing (AERA et al., 2014) have been clear that it is the responsibility of the user to verify, before a test is used, that the test and its norms are appropriate for the purposes for which they are to be used. Standard 9.7 states:

Test users should verify periodically that their interpretations of test data continue to be appropriate given any significant changes in the population of test takers, the mode(s) of test administration, and their purposes in testing. Comment: Over time, a gradual change in the demographic characteristics of an examinee population may significantly affect the accuracy of inferences drawn from the group averages.

Obviously, if the only research on the reliability and validity of some test that psychologists can find is 30 or even 80 years old, the psychologists have not "[verified] periodically that their … test data continue to be appropriate" given significant changes, including changes in the educational attainment and structure of populations.

The APA Ethical Principles of Psychologists and Code of Conduct (American Psychological Association, 2017) is similarly clear that obsolete tests and outdated test results are not to be used to support psychologists' opinions. Standard 9.08, Obsolete Tests and Outdated Test Results, states:

(a) Psychologists do not base their assessment or intervention decisions or recommendations on data or test results that are outdated for the current purpose.

(b) Psychologists do not base such decisions or recommendations on tests and measures that are obsolete and not useful for the current purpose.

The Canadian Code of Ethics for Psychologists (Canadian Psychological Association, 2017) includes nothing about the reliability of tests, the validity of tests, the currency of tests, or the currency of test norms, or indeed anything related to testing and assessment, except very broad principles requiring psychologists, for example, to "Keep themselves up to date with a broad range of relevant knowledge, research methods, techniques, and technologies…" (Standard II.9) (Canadian Psychological Association, 2017, p. 20).

The situation is similar in the standards of practice published by various provincial and state regulatory bodies. For example, the College of Alberta Psychologists' Standards of Practice (College of Alberta Psychologists, 2023) mentions nothing about tests at all except that psychologists need to maintain "test results" and "the basic test data" in the section titled "Maintaining Client Records." In contrast, the Standards of Professional Conduct of the College of Psychologists and Behaviour Analysts of Ontario (College of Psychologists and Behaviour Analysts of Ontario, 2024) imposes explicit duties on psychologists to be familiar with tests and techniques, including the reliability and validity of tests, the appropriate use of tests, and even the avoidance of "outdated norm-based data." Verbatim, Standard 10.1 states:

Familiarity with Tests and Techniques

Registrants must understand and adhere to the standardized norms, reliability, validity, and/or appropriate application of tests and techniques. Registrants must also avoid using outdated, obsolete, or invalid tests. In cases where no appropriate tools are available, they may use individual test items or stimuli for clinical assessment purposes, avoiding use of outdated norm-based data, or the otherwise inappropriate use of such data. Any departure from proper use should be documented with a clear rationale.

Unfortunately, none of the standards and ethics codes provide any guidance as to when tests and test norms become "outdated" and "obsolete" or when the use of test data becomes "inappropriate."

The International Test Commission's Guidelines for Practitioner Use of Test Revisions, Obsolete Tests, and Test Disposal (International Test Commission, 2015, p. 15) define generally when a test is obsolete, but they also assume that psychologists and other professionals conducting assessments have at least some minimal knowledge of the basic principles of psychometrics and psychological testing and understand, for example, what "underlying theory," "norms," "item content," and "technical adequacy" are:

... Generally speaking, a test is obsolete when its underlying theory, item content, norms, or technical adequacy no longer meet the needs for its intended purpose, professional standards, or when its continued use would lead to inappropriate or inaccurate decisions or diagnoses.

Unfortunately, some psychologists lack a basic understanding of essential psychometric concepts, such as how test validity is affected by normative samples, changes in populations of interest, and other factors. They incorrectly believe that once someone has published a test and norms, the test remains valid for whatever use they want to put it to, forever or until someone demonstrates that the test and its norms are invalid. For example, Russell (2010, p. 66) states:

... neuropsychology should recognize that the research, including norming, based on a validated test or battery remains valid until it has been demonstrated to be invalid. Regardless of how long ago a validated test was published, its results are still sound unless research has demonstrated its lack of validity.

The idea that tests and norms remain valid until someone demonstrates them to be invalid is clearly wrong and misguided. When a population changes over time (e.g., in population composition, educational curriculum and experience, educational attainment, familiarity with technology, and lived experiences), norms derived from testing a representative sample of the original population as it once was, decades ago, become invalid for the current population because they no longer describe how a representative sample of the current population would perform on the test. For example, there is no need to conduct any research studies to invalidate test norms from the 1940s or 1950s to appreciate that the average IQ of university students and university graduates declined as a greater proportion of the population attained university education and university degrees, and therefore, that any norms from the 1940s and 1950s became invalid (Uttl et al., 2024c). In fact, it is a mathematical fact requiring no research at all that when all members of a population attain university degrees, the average IQ of those members with university degrees will be 100 with an SD of 15, by definition. Accordingly, periodic renorming of psychological tests is necessary to verify that "interpretations of test data continue to be appropriate." When renorming or verification of a test's currency is not done, the test needs to be abandoned and must not be used, to prevent harm to examinees and society.
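This selection effect is easy to demonstrate. The following simulation sketch, under the deliberately favorable assumption that degrees go to the highest-IQ fraction of the population (the best possible case for the "brilliant graduates" belief), shows the mean IQ of degree holders converging to 100 as attainment approaches 100%.

```python
import numpy as np

# Simulation sketch of the selection effect described above: as the fraction
# of the population attaining a university degree grows, the mean IQ of
# degree holders converges to the population mean of 100. For illustration,
# we assume degrees go to the top fraction of the IQ distribution.

rng = np.random.default_rng(seed=0)
population_iq = rng.normal(loc=100, scale=15, size=1_000_000)

for fraction_with_degree in (0.05, 0.25, 0.50, 1.00):
    cutoff = np.quantile(population_iq, 1 - fraction_with_degree)
    degree_holders = population_iq[population_iq >= cutoff]
    print(f"{fraction_with_degree:4.0%} attain degrees -> "
          f"mean IQ of degree holders: {degree_holders.mean():5.1f}")

# Approximate output: 5% -> 131, 25% -> 119, 50% -> 112, 100% -> 100.
```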

The problem of psychologists and other professionals relying on decades-obsolete tests as if they were current, reliable, and valid is not limited to the GATB, nor to intelligence tests only. For example, Wonderlic (1992, p. 2) reported in 1992, three decades ago, that there had been a large decline in the performance of undergraduate students and graduates on the Wonderlic Personnel Test (WPT), another intelligence or general mental ability test widely used in occupational counseling and employment selection, which also forms the basis for many now outdated and obsolete research claims about intelligence, education, and occupational success (Gottfredson, 1997, 2002). Despite the large declines observed on the WPT, it has not been renormed since 1992, and counselors and psychologists continue to rely on norms and interpretive guidelines published in the Wonderlic (1992) manual that are now more than four decades obsolete. Similarly, vocational counselors and psychologists use outdated and obsolete personality tests. For example, the Personality Assessment Inventory (PAI) (Morey, 2007), a widely used personality test, was normed in 1991 or earlier on normal adults, clinical samples, and college samples, rendering the norms about 35 years old today. Consequently, a number of studies have shown that the PAI norms (at least for college students) are wrong by as much as one to two standard deviations on various clinical scales and no longer describe current populations of college students in the USA or Canada (Nails et al., 2023; Uttl et al., 2024b). In turn, many examinees whose scores are determined using the original norms will have numerous elevated clinical scale scores suggesting various psychopathologies when, compared to current samples of college students, they are in fact perfectly normal and average (Jeffay et al., 2021; Nails et al., 2023; Uttl et al., 2024b).
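The distortion such normative drift produces is mechanical. The sketch below uses hypothetical raw-score norms (the PAI reports scores as T-scores, with a normative mean of 50 and an SD of 10) and a drift value within the 1 to 2 SD range reported in the studies cited above to show how a currently average college student appears elevated under out-of-date norms.

```python
# Sketch of how normative drift distorts T-scores (normative mean 50, SD 10).
# The raw-score norms below are hypothetical; the 1.5 SD drift sits within
# the 1 to 2 SD range reported for PAI college norms in the cited studies.

def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    return 50 + 10 * (raw - norm_mean) / norm_sd

old_norm_mean, old_norm_sd = 10.0, 4.0   # hypothetical 1990s raw-score norms
drift_in_sd_units = 1.5                  # assumed drift of current population

current_average_raw = old_norm_mean + drift_in_sd_units * old_norm_sd
print(t_score(current_average_raw, old_norm_mean, old_norm_sd))  # 65.0

# A T-score of 65 is routinely read as an elevation, yet this hypothetical
# examinee is exactly average relative to today's college students.
```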

Implications for Legal Professions

Our findings have numerous implications for the legal professions, including lawyers, judges, and others involved in legal proceedings. First, lawyers, judges, and others need to determine the basic information about the tests administered to examinees, including, at minimum, the date the test was published, the date the norms were established, the characteristics of the normative sample, the population the normative sample was recruited from, and the availability of any subsequent independent research on the psychometric properties of the test. With respect to the GATB CDN, the answers to these questions today would raise numerous red flags: the test was published 38 years ago; the norms for two-thirds of the tests were established in 1985 on the Canadian sample, and the norms for one-third of the tests were established in the 1940s on a US sample (about 85 years ago); the Canadian sample included nearly 1000 examinees, but nothing is known about this sample except that it was meant to be a General Working Population sample; and critically, subsequent research by Yeasting (1996) indicated the GATB CDN norms were already invalid by 1995.

Second, lawyers, judges, and others involved in legal proceedings need to realize that psychological tests and norms become more invalid and more hazardous to use as more time elapses from the test's publication and/or the norms' creation date. While the invalidity of tests and norms does not accrue from the simple passage of time itself, it accrues due to changes in a wide variety of relevant factors, including changes in the composition of populations, the abilities and personality traits of populations, technology, language, and even changes due to extraordinary events such as the COVID-19 pandemic.

Third, lawyers faced with psychological assessment reports may be best advised to seek the services of scientists, for example, university professors specializing in psychometrics, psychological testing, and assessment, who thus have the ability to evaluate the scientific underpinnings of the tests, the adequacy of the test norms, and the limitations on the conclusions that can be drawn from the test scores. Scientists, as opposed to practitioners, are also far more likely to have access to scientific databases and to be able to locate up-to-date scientific research on any specific test. In turn, they are far more likely to provide lawyers with current science on any given test than practitioners whose relevant knowledge may date to their graduate school training, obtained perhaps decades ago.

Fourth, lawyers, judges, and others cannot assume that practitioners' credentials and titles, the existence of standards of practice, and codes of ethics are evidence of the relevant expertise and ethical conduct of such practitioners. As detailed above, some standards of practice and ethics codes do not explicitly require that practitioners be familiar with tests or with the reliability and validity of tests, nor do they explicitly require practitioners to be up to date on the science underlying various tests. Moreover, as demonstrated by Ms. T's case, some regulatory bodies, for example, the College of Alberta Psychologists, consider using even 80-year-old outdated test norms to be at least minimally competent conduct (Uttl et al., 2024c).

Finally, it has been widely acknowledged that when psychologists' economic well-being depends on a steady stream of referrals for disability support and/or fitness-for-duty assessments and subsequent expert testimony, the situation often produces bias, conflicts of interest, and even "hired guns" or "soldiers" for the retaining parties, that is, psychologists hired to arrive at the specific, ethically dubious opinions that the retaining party seeks and wants to rely on (Harrison & Sparks, 2022), rather than "scouts" on a mission to determine the truth (Lovett, 2022). A retaining party may even explicitly approve or disapprove of the psychologist's findings and opinions before the psychologist writes the assessment report. Frequently, following the assessment, the psychologist calls the retaining party and presents the results of the assessment and the resulting opinions; if the retaining party does not like the results and opinions, it informs the psychologist that no report is needed and thanks the psychologist for their work to date. In contrast, if the retaining party likes the results, the party asks the psychologist to prepare the report and may later ask the psychologist to testify as an expert in legal proceedings. In this situation, the psychologist will be strongly motivated to produce the opinions that the retaining party seeks, to maximize billable hours, charged at $300 to $500 per hour, and to ensure a steady stream of referrals seeking the psychologist's expertise and friendly opinions.

In these situations, outdated and obsolete tests provide unique business opportunities for unscrupulous vocational and forensic psychologists. For example, a vocational/forensic psychologist may assess a Canadian examinee using the WAIS-IV CDN (Wechsler, 2008) and then compare the examinee's Full Scale IQ (FSIQ) to the mean FSIQ of examinees employed in a specific occupation decades ago and assessed with the WAIS (Wechsler, 1955). In this comparison process, an examinee scoring in the "average" range loses between 13 (Uttl et al., 2024c) and 16 IQ points to the Flynn Effect (at 0.3 IQ points/year) and another 5 to 8 IQ points to the difference between scoring the examinee's WAIS-IV using Canadian vs. US norms (Harrison et al., 2014; Uttl et al., 2024c). The psychologist may then conclude that the examinee scored at the bottom of their vocational peers and may be eligible for disability benefits simply because the psychologist compared the examinee's WAIS-IV CDN IQ scores to the norms for a different, outdated, and obsolete test normed over a half-century prior to the WAIS-IV CDN on a different population. An ignorant or unscrupulous psychologist may further bolster their opinion that the examinee is intellectually disabled by simply using very liberal criteria for "impaired" or "low" scores (Guilmette et al., 2020; Suhr & Johnson, 2022) and by cherry-picking low scores from the half a thousand scores produced by a typical battery of 15 neuropsychological and personality tests (Harrison & Sparks, 2022). If the examinee is seeking disability benefits, they will be happy with the psychologist's opinion. Similarly, if an employer is seeking to dismiss an employee, they will also be happy with the psychologist's opinion, although the employee might not be. In all of these situations, the psychologists maximize their steady stream of referrals, high billable hours, and high incomes. In most cases, the losing parties, the examinees or the taxpayers, will never know what really happened.
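The arithmetic of this score-shaving is straightforward; the sketch below (illustrative only, with the norm-date gap chosen to match the WAIS/WAIS-IV example above) combines the Flynn Effect rate and the Canadian vs. US norms difference cited in the text.

```python
# Back-of-the-envelope sketch of the score-shaving described above, using
# the rates cited in the text: the Flynn Effect at 0.3 IQ points per year,
# plus 5 to 8 points from scoring with Canadian rather than US norms.

flynn_rate_per_year = 0.3
years_between_norms = 2008 - 1955          # WAIS (1955) vs. WAIS-IV (2008)

flynn_loss = flynn_rate_per_year * years_between_norms   # ~15.9 points
cdn_vs_us_loss_low, cdn_vs_us_loss_high = 5, 8           # Harrison et al. (2014)

print(f"Flynn Effect loss: ~{flynn_loss:.0f} points")
print(f"Total apparent loss: ~{flynn_loss + cdn_vs_us_loss_low:.0f} "
      f"to ~{flynn_loss + cdn_vs_us_loss_high:.0f} IQ points")
```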

Limitations

Our study examined how undergraduate university students perform on the G (intelligence), V (verbal aptitude), N (numerical aptitude), and S (spatial aptitude) measures. However, our data constitute neither new, updated GWP norms nor new general population norms for the GATB CDN.

Although our data demonstrate conclusively that the GATB CDN (Nelson, 1986) norms for aptitudes G, V, N, and S are so outdated that their use amounts to the practice of pseudoscience, we have not examined performance on GATB CDN parts 1, 5, and 7 to 12 and accordingly have no data on how today's university students would perform on aptitudes Q (clerical perception), P (form perception), K (motor coordination), F (finger dexterity), and M (manual dexterity). Today, the GATB CDN GWP norms for aptitudes P and Q are 40 years old, and the GWP norms for aptitudes K, F, and M are over 80 years old, having been merely copied from the USES GATB GWP norms.

Some readers may question whether vocational psychologists, forensic psychologists, and other practitioners relying on obsolete and outdated tests, norms, and non-experimental data misrepresented as norms (e.g., the aptitude level rating data in the Career Handbook) constitutes junk science and pseudoscience. They may not be aware that "junk science" and "pseudoscience" are terms widely used in the popular, scientific, and legal literature and in legal cases. Junk science is defined as bad science (flawed, biased, and unreliable research); pseudoscience includes junk science as well as claims supported by no scientific evidence at all, for example, claims relying on authority, a reversed burden of proof (e.g., that psychological tests and norms are valid until proven otherwise), and other tricks (Lilienfeld et al., 2014). To illustrate, the courts have recognized that experts often proffer junk science and that juries are ill-equipped to distinguish science from junk science; accordingly, the courts have set up procedures for vetting scientific evidence proffered to the court to keep junk science and pseudoscience out of the courtroom (Daubert v. Merrell Dow Pharmaceuticals, Inc., 1993; R v. Mohan, 1994; White Burgess Langille Inman v. Abbott and Haliburton Co., 2015). Unfortunately, numerous academic as well as judicial reviews indicate that the courts' efforts to keep pseudoscience out of the courtroom have been largely unsuccessful (Goudge, 2008; Young & Goodman-Delahunty, 2021).
