Traditional IQ Tests, Their Subtests and Game-Based Assessments

Jiaying Law

People Scientist

Let’s be honest: we all want “smart” people on our teams, because we expect them to be competent and to learn things fast. But how do we actually know whether the person we are about to hire is “smart”?

One of the most popular and scientifically validated methods for predicting future job performance is the General Mental Ability (GMA) test, a.k.a. the traditional IQ test. However, at Equalture we have chosen, on scientific grounds, to develop game-based assessments (GBAs) instead of a traditional IQ test.

  • A deep dive into GMA tests
  • Should we look at scores from subtests separately?
  • What can GBAs bring to the table?

What exactly do the GMA tests measure?

In most research on the validity of recruitment methods, GMA has the highest power for predicting future job performance (Schmidt & Hunter, 1998, 2004; Schmidt et al., 2016; Kuncel et al., 2004, 2014).

Broadly speaking, GMA tests assess individual differences in cognitive ability, which are commonly summarized as the g factor (i.e., general intelligence). The dominant model of g in intelligence research is the Cattell-Horn-Carroll (CHC) theory of cognitive abilities (Schneider & McGrew, 2012). Simply put, CHC theory proposes that the g factor (Stratum III) reflects what the specific broad abilities (Stratum II, e.g., fluid reasoning, visual processing and processing speed) have in common (Landers et al., 2021).
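
To make the idea of g as shared variance more concrete, here is a small toy simulation. It assumes a simplified higher-order model with made-up loadings (not estimates from CHC research or any published test): one g factor drives two Stratum II abilities, and each ability drives two subtests. Because all four subtests inherit part of g, every pair of subtests correlates positively, and pairs that also share a Stratum II ability correlate more strongly.

```python
import random
import statistics

# Hypothetical loadings chosen for illustration only; not CHC or WAIS estimates.
G_TO_BROAD = 0.8      # loading of each Stratum II ability on g
BROAD_TO_TEST = 0.7   # loading of each subtest on its Stratum II ability


def simulate(n, seed=42):
    """Simulate standardized scores for 4 subtests (2 per broad ability)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = rng.gauss(0, 1)
        row = []
        for _ability in range(2):
            # Broad ability = part shared with g + part unique to this ability.
            broad = G_TO_BROAD * g + (1 - G_TO_BROAD ** 2) ** 0.5 * rng.gauss(0, 1)
            for _test in range(2):
                # Subtest = part driven by its broad ability + subtest-specific noise.
                row.append(BROAD_TO_TEST * broad
                           + (1 - BROAD_TO_TEST ** 2) ** 0.5 * rng.gauss(0, 1))
        rows.append(row)
    return rows


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Columns 0-1 share one broad ability, columns 2-3 share the other.
columns = list(zip(*simulate(5000)))
corr = [[pearson(columns[i], columns[j]) for j in range(4)] for i in range(4)]
for row in corr:
    print(" ".join(f"{r:5.2f}" for r in row))
```

Under these assumed loadings, the model implies a correlation of 0.7 × 0.7 = 0.49 between subtests that share a broad ability and 0.7 × 0.8 × 0.8 × 0.7 ≈ 0.31 between subtests that only share g, so the simulated matrix should land close to those values.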

Because GMA covers a broad range of cognitive abilities, different test publishers include different aspects of cognitive ability in their GMA assessments. For example, the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) is commonly used by trained professionals to measure a wide range of cognitive abilities and to determine the IQ scores of individuals aged 16 to 90 (see the graph below; Climie & Rostad, 2011). According to the test manual (Wechsler, 2008), the average time needed to complete all the core subtests has been reduced, compared with the WAIS-III, to “only” 67 minutes.

Should we look at scores from subtests separately?

As mentioned above, most GMAs consist of several subtests that together aim to cover the whole range of our cognitive abilities efficiently. Each subtest assesses a very specific, narrow ability, such as vocabulary. In this section, we discuss how (in)efficient it is to measure and interpret each of these specific cognitive abilities separately.

In the hierarchical structure of intelligence proposed by CHC theory, g sits at the highest and most general level. This means it acts as a common factor across all specific cognitive abilities, regardless of which subtests are used to measure them. The graph below shows an example, based on the Wechsler Intelligence Scale for Children-Fifth Edition (WISC-V), of how much of the variance in each subtest is accounted for by g (McGill et al., 2018; Canivez et al., 2017).

So, what does this result try to tell us? 

Most importantly, we need to be aware of the pervasive influence of g across all subtests in GMAs and be cautious about drawing conclusions from individual Stratum II scales and Stratum I subtests (McGill et al., 2018).

As the graph above shows, g explains a large portion of the variance in every subtest except Coding. Because each Stratum II index combines several of these subtests, the index scores are likely to be saturated with g as well. This creates a risk of overinterpreting or misinterpreting the Stratum II indices and of overestimating their “reliabilities”.
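
One common way to quantify this concern, assuming a bifactor model in which every subtest loads on g and on one orthogonal group (index) factor, is to compare an index's subscale omega (ωs, the reliability-like value of the index composite) with its omega hierarchical subscale (ωhs, the share of composite variance due to the index factor alone). The sketch below uses hypothetical standardized loadings for two three-subtest indices; they are not the WISC-V values from the studies cited above.

```python
# Hypothetical standardized bifactor loadings (illustrative, not published estimates).
# Each subtest: (loading on g, loading on its own index factor).
INDICES = {
    "Index A": [(0.75, 0.30), (0.70, 0.25), (0.72, 0.20)],
    "Index B": [(0.65, 0.35), (0.70, 0.30), (0.60, 0.25)],
}


def uniqueness(g, s):
    """Residual variance of a standardized subtest: 1 - g^2 - s^2."""
    return 1 - g ** 2 - s ** 2


def subscale_omegas(loadings):
    """Return (omega_s, omega_hs) for a unit-weighted index composite."""
    sum_g = sum(g for g, _ in loadings)
    sum_s = sum(s for _, s in loadings)
    resid = sum(uniqueness(g, s) for g, s in loadings)
    total = sum_g ** 2 + sum_s ** 2 + resid       # model-implied composite variance
    omega_s = (sum_g ** 2 + sum_s ** 2) / total   # all reliable variance (g + index)
    omega_hs = sum_s ** 2 / total                 # variance due to the index factor only
    return omega_s, omega_hs


def omega_hierarchical(indices):
    """Share of the full-scale composite variance attributable to g alone."""
    all_loadings = [pair for loadings in indices.values() for pair in loadings]
    sum_g = sum(g for g, _ in all_loadings)
    group = sum(sum(s for _, s in loadings) ** 2 for loadings in indices.values())
    resid = sum(uniqueness(g, s) for g, s in all_loadings)
    return sum_g ** 2 / (sum_g ** 2 + group + resid)


for name, loadings in INDICES.items():
    omega_s, omega_hs = subscale_omegas(loadings)
    print(f"{name}: omega_s = {omega_s:.2f}, omega_hs = {omega_hs:.2f}")
print(f"Full scale: omega_h = {omega_hierarchical(INDICES):.2f}")
```

With these made-up numbers, Index A reaches an ωs of about 0.81, yet only about 0.09 of its composite variance belongs to the index factor itself; most of the rest comes from g. An index can therefore look reliable while saying little about the specific ability it is named after.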

In addition, McGill and colleagues (2018) pointed out that there is little evidence supporting the structure of IQ tests with more than four Stratum II indices (e.g., WJ-IV-Cognitive). They warn that interpreting scores of specific cognitive abilities can backfire when a test has more than four Stratum II indices, because of the risk of over-factoring. Thus, a four-factor structure like that of the WAIS-IV could be a good basis for designing a GBA that measures intelligence (Dombrowski et al., 2018).

In short, we should interpret an IQ test, which measures the g factor, as a whole rather than interpreting the score of each subtest individually, to avoid misjudging a person’s aptitudes.

What can GBAs bring to the table?

The terms “game-based assessment” and “gamified assessment” are broad: they can range from merely changing the look of a traditional assessment to deliberately designing theory-driven games that capture players’ behaviours.

These decisions lie in the hands of GBA publishers. Even when different games are intended to measure the same traits, the effort invested in them can differ, leading to differences in quality and reliability. It is like assigning homework to high school students: some complete the tasks thoroughly, while others just want to get them done. It is therefore important to look into a GBA’s development and validation process to judge its quality.

Develop a theory-driven g-GBA

Based on CHC theory and the discussion above, g represents the variance shared across all cognitively loaded tests, and a four-factor structure could serve as an evidence-based foundation for designing a GBA that measures g (a g-GBA).

Embretson (1994) provided guidelines for constructing ability tests based on cognitive theory, which include a) matching specific behaviours to the cognitive theory underlying the construct being measured, and b) specifying which relationships between behaviours are expected, to confirm that the first step was successful (i.e., the expected pattern of significant and non-significant relations among scales that measure the same or different constructs; Landers et al., 2021).
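
Step b) can be made concrete as a check on a correlation matrix. The sketch below is a simplified convergent/discriminant heuristic, not Embretson's full procedure: assuming each scale is labelled with the construct it is meant to measure, it checks that every correlation between scales of the same construct is higher than every correlation between scales of different constructs. The function name, the scale labels and the correlations are all hypothetical.

```python
from itertools import combinations


def expected_pattern_holds(corr, constructs):
    """True if all same-construct correlations exceed all cross-construct ones.

    corr: square, symmetric correlation matrix (list of lists).
    constructs: construct label for each row/column of corr.
    """
    same, different = [], []
    for i, j in combinations(range(len(constructs)), 2):
        (same if constructs[i] == constructs[j] else different).append(corr[i][j])
    if not same or not different:
        raise ValueError("need at least one same-construct and one cross-construct pair")
    return min(same) > max(different)


# Hypothetical: two game levels meant to tap planning, two meant to tap processing speed.
constructs = ["planning", "planning", "speed", "speed"]
corr = [
    [1.00, 0.55, 0.30, 0.25],
    [0.55, 1.00, 0.28, 0.27],
    [0.30, 0.28, 1.00, 0.60],
    [0.25, 0.27, 0.60, 1.00],
]
print(expected_pattern_holds(corr, constructs))
```

Because all cognitive tasks share some g, cross-construct correlations are expected to be positive as well; the heuristic only asks that they be clearly lower than the same-construct ones.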

When creating a g-GBA, one should also take specific item design features into account (e.g., the amount of information shown to test-takers or the types of rules; Primi, 2014; Landers et al., 2021). This provides a strong conceptual framework for designing a g-GBA, since the same techniques used to create general cognitive tests are applied, only within the context of game design.

An example: validating our targeted construct — problem-solving ability

There are plenty of ways to validate a g-GBA. One popular method is to compare it with an established assessment that has a rich theoretical foundation and has been shown to measure the desired construct successfully. In this section, we show how we selected such a comparison assessment when we wanted to assess problem-solving ability with a GBA.

Executive functioning (EF) is another strand of cognitive theory, describing the multiple higher-order cognitive control processes that allow us to handle daily tasks and solve problems (van Aken et al., 2019). Examples of EFs include planning, attention, emotion regulation, inhibitory control and working memory. EFs have been studied extensively and have been shown to correlate with fluid intelligence, which includes reasoning and problem-solving ability (e.g., Diamond, 2013; Saggino et al., 2006; Santarnecchi et al., 2021).

With this theoretical background, we can then choose an assessment with solid psychometric properties that measures the same target construct as our g-GBA. One option is the Tower of London test (ToL; Shallice, 1982), which primarily assesses planning and problem-solving ability. It reflects a person’s capability to plan and carry out strategies for approaching external tasks or problems (Phillips et al., 2001). On top of that, it has been widely validated and is used in clinical practice to assess planning skills in problem solving (Köstering et al., 2015; Unterrainer et al., 2004; Paula et al., 2012).

Thus, if the scores from our g-GBA, which aims to evaluate problem-solving ability, are associated with the results of the ToL test, this gives us evidence that our GBA and the ToL test measure the same construct (planning and problem-solving ability).
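
The comparison described above boils down to correlating the two sets of scores. A minimal sketch, assuming each candidate has both a GBA problem-solving score and a ToL score (the numbers below are invented for illustration), computes Pearson's r with an approximate 95% confidence interval from the Fisher z-transformation:

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 4:
        raise ValueError("need two samples of equal length, n >= 4")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for r via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical scores for 10 candidates: GBA problem-solving score and ToL score.
gba = [52, 61, 47, 70, 58, 65, 43, 74, 55, 68]
tol = [18, 22, 16, 25, 19, 23, 15, 27, 20, 24]

r = pearson_r(gba, tol)
low, high = fisher_ci(r, len(gba))
print(f"r = {r:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

An interval that stays well above zero indicates that the two measures overlap; the size of r then indicates how closely they track each other. Real validation studies rely on far larger samples than this toy example.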

All in all, we suggest reading through a g-GBA’s validation reports to gain more insight into its reliability and construct validity, to make sure the GBA you are using is reliable.

Humans are complex beings, and our intelligence is just as complex

Despite the long history of intelligence research, the ways we define, measure and interpret intelligence are still improving alongside technology. We should therefore keep the potential downsides of each testing method in mind and keep evaluating our hiring tools from time to time.

Happy hiring! 🙂


Canivez, G. L., Watkins, M. W., & Dombrowski, S. C. (2017). Structural validity of the Wechsler Intelligence Scale for Children–Fifth Edition: Confirmatory factor analyses with the 16 primary and secondary subtests. Psychological Assessment, 29, 458. 

Climie, E. A., & Rostad, K. (2011). Test review: Wechsler Adult Intelligence Scale.

Diamond, A. (2013). Executive functions. Annual Review of Psychology, 64, 135.

Dombrowski, S. C., McGill, R. J., & Canivez, G. L. (2018). An alternative conceptualization of the theoretical structure of the Woodcock-Johnson IV Tests of Cognitive Abilities at school age: A confirmatory factor analytic investigation. Archives of Scientific Psychology, 6, 1. 

Embretson, S. (1994). Applications of cognitive design systems to test development. In Cognitive assessment (pp. 107-135). Springer, Boston, MA. 

Köstering, L., Schmidt, C. S., Egger, K., Amtage, F., Peter, J., Klöppel, S., … & Kaller, C. P. (2015). Assessment of planning performance in clinical samples: Reliability and validity of the Tower of London task (TOL-F). Neuropsychologia, 75, 646-655. 

Kuncel, N. R., Rose, M., Ejiogu, K., & Yang, Z. (2014). Cognitive ability and socio-economic status relations with job performance. Intelligence, 46, 203-208. 

Kuncel, N. R., Hezlett, S. A., & Ones, D. S. (2004). Academic performance, career potential, creativity, and job performance: Can one construct predict them all? Journal of Personality and Social Psychology, 86, 148.

Landers, R. N., Armstrong, M. B., Collmus, A. B., Mujcic, S., & Blaik, J. (2021). Theory-driven game-based assessment of general cognitive ability: Design theory, measurement, prediction of performance, and test fairness. Journal of Applied Psychology. 

McGill, R. J., Dombrowski, S. C., & Canivez, G. L. (2018). Cognitive profile analysis in school psychology: History, issues, and continued concerns. Journal of School Psychology, 71, 108-121.

Paula, J. J. D., Neves, F., Levy, Â., Nassif, E., & Malloy-Diniz, L. F. (2012). Assessing planning skills and executive functions in the elderly: Preliminary normative data for the Tower of London Test. Arquivos de Neuro-Psiquiatria, 70, 828-830.

Phillips, L. H., Wynn, V. E., McPherson, S., & Gilhooly, K. J. (2001). Mental planning and the Tower of London task. The Quarterly Journal of Experimental Psychology Section A, 54, 579-597. 

Primi, R. (2014). Developing a fluid intelligence scale through a combination of Rasch modeling and cognitive psychology. Psychological Assessment, 26, 774.

Saggino, A., Perfetti, B., Spitoni, G., & Galati, G. (2006). Fluid Intelligence and Executive Functions: New Perspectives. In L. V. Wesley (Ed.), Intelligence: New research (pp. 1–22). Nova Science Publishers.

Santarnecchi, E., Momi, D., Mencarelli, L., Plessow, F., Saxena, S., Rossi, S., … & Pascual-Leone, A. (2021). Overlapping and dissociable brain activations for fluid intelligence and executive functions. Cognitive, Affective, & Behavioral Neuroscience, 21, 327-346. 

Schmidt, F. L., Oh, I. S., & Shaffer, J. A. (2016). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 100 years. Fox School of Business Research Paper, 1-74. 

Schmidt, F. L., & Hunter, J. E. (2004). General mental ability in the world of work: Occupational attainment and job performance. Journal of Personality and Social Psychology, 86, 162.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262.

Schneider, W. J., & McGrew, K. S. (2012). The Cattell-Horn-Carroll model of intelligence. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 99–144). The Guilford Press.

Shallice, T. (1982). Specific impairments of planning. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 298, 199-209.

Unterrainer, J. M., Rahm, B., Kaller, C. P., Leonhart, R., Quiske, K., Hoppe-Seyler, K., … & Halsband, U. (2004). Planning abilities and the Tower of London: Is this task measuring a discrete cognitive function? Journal of Clinical and Experimental Neuropsychology, 26, 846-856.

van Aken, L., van der Heijden, P. T., Oomens, W., Kessels, R. P., & Egger, J. I. (2019). Predictive value of traditional measures of executive function on broad abilities of the Cattell–Horn–Carroll theory of cognitive abilities. Assessment, 26, 1375-1385. 

Wechsler, D. (2008). WAIS-IV administration and scoring manual. San Antonio, TX: Psychological Corporation. 
