In today’s workplace, many companies operate in team-based settings, requiring employees to collaborate with one or more colleagues. This highlights the importance of interpersonal skills, which means possessing only hard skills (job-specific knowledge) may no longer guarantee success in a job.
One popular way to assess these skills is to incorporate personality tests into the recruitment process to gauge future behaviour. However, most existing personality tests assess personality in a broad sense and may not adequately evaluate behaviour in specific job contexts. Consequently, it is recommended to go beyond general personality assessments and consider the behaviour that arises when personality traits are activated by job-related circumstances (Lievens, 2017).
In this article, we will introduce a method for more accurately evaluating candidates’ behaviour — situational judgement tests (SJTs). We’ll delve into the following topics:
- Introduction to situational judgement tests
- Traditional SJTs vs Construct-driven SJTs
- The relationship between SJTs and job performance
Introduction to Situational Judgement Tests (SJTs)
What is a situational judgement test?
You might have already guessed it from its name.
Situational judgement tests involve presenting a series of scenarios or problems along with a list of possible responses. Test takers are then asked to assess what they should do or would do in response to these portrayed scenarios (Whetzel et al., 2020).
Test developers generally have considerable freedom in choosing the formats of SJTs. For instance, they can decide on the response format to employ. The three most common approaches are: a) asking test takers to select either the best or worst response options; b) ranking the response options from most to least effective; or c) rating the response options on scales, such as 1-5 (Corstjens et al., 2017).
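As a rough illustration of how the three response formats above could be scored, here is a minimal Python sketch. The expert key, the option labels, and the scoring rules are assumptions made up for this example; real SJTs use scoring keys set by their developers, and the sources cited here do not prescribe these exact rules.

```python
from typing import Dict, List

# Hypothetical expert key for one scenario: mean effectiveness rating (1-5) per option.
EXPERT_KEY: Dict[str, float] = {"A": 4.5, "B": 2.0, "C": 1.0, "D": 3.5}


def score_best_worst(best: str, worst: str, key: Dict[str, float] = EXPERT_KEY) -> int:
    """Format a): one point each for matching the keyed best and keyed worst option."""
    keyed_best = max(key, key=key.get)
    keyed_worst = min(key, key=key.get)
    return int(best == keyed_best) + int(worst == keyed_worst)


def score_ranking(ranking: List[str], key: Dict[str, float] = EXPERT_KEY) -> float:
    """Format b): Spearman rank correlation with the keyed order (assumes no ties)."""
    keyed_order = sorted(key, key=key.get, reverse=True)
    n = len(keyed_order)
    d_squared = sum((ranking.index(o) - keyed_order.index(o)) ** 2 for o in keyed_order)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def score_ratings(ratings: Dict[str, int], key: Dict[str, float] = EXPERT_KEY) -> float:
    """Format c): negative mean absolute distance from the expert ratings (0 is best)."""
    return -sum(abs(ratings[o] - key[o]) for o in key) / len(key)
```

Under this sketch, a candidate who ranks the options in exactly the keyed order gets a ranking score of 1, and one who ranks them in the reverse order gets -1. Distance-based rating scores reward answers that are close to the expert judgement rather than only exact matches.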
Moreover, advancements in technology have expanded the options for administering SJTs. In addition to the traditional paper-and-pencil format, video-based versions (sometimes referred to as gamified versions) have recently emerged as a medium for administering SJTs in various selection settings (e.g., for police officers, De Meijer et al., 2010; for medical school applicants, Fröhlich et al., 2017).
Figure 1: An example of the questions from Dependability SJTs from Olaru and colleagues (2019).
The scientific basis of SJTs
In contrast to personality tests, which often have broad contexts, Situational Judgement Tests (SJTs) aim to assess behavioural skills within simulations of targeted contexts. These simulations are designed to be psychologically or physically similar to key aspects of the future working environment (Wernimont & Campbell, 1968; Schmitt & Ostroff, 1986; Lievens & De Soete, 2012). The underlying assumption is that candidates’ performance in these simulations will be consistent with their performance in future job positions (behavioural consistency).
For instance, if we’re hiring a Customer Success Manager, a practical method to evaluate their skills is through a simulation that describes a scenario involving customer interactions. Specifically, the candidate may be asked to work together with other candidates to solve a customer problem or be tested individually on how they would handle a customer problem. The behaviour observed in this simulation is likely to mirror how they will handle customers in the actual job role.
Approaches based on behavioural consistency can be broadly categorised into high-fidelity and low-fidelity methods depending on their similarity to targeted contexts (Lievens & Patterson, 2011).
- High-fidelity tests provide candidates with a platform to demonstrate their knowledge and skills by exhibiting actual behaviour in job contexts. Assessment centres (a combination of activities to test one’s suitability for a job) or work sample tests (job tryout day or case study), for example, closely resemble future work environments. Candidates might be invited to the company for a job try-out day, where managers evaluate their performance in real-time.
- Low-fidelity approaches assess candidates’ procedural knowledge of effective and ineffective courses of action in job-related situations. SJTs fall into this category, testing participants’ responses to various situations encountered in daily work life. For instance, candidates might be asked to evaluate the effectiveness of being rational or empathetic when dealing with a dissatisfied customer.
Assessing candidates’ behavioural tendencies or skills in specific work situations, rather than relying solely on personality, is crucial. These work situations determine whether candidates have the opportunity to “activate” their innate traits and express them through distinct behaviour triggered by specific social cues (trait-activation theory; Tett et al., 2021).
For example, someone with high creativity might only express this trait when performing tasks that require creativity, like designing a training program or writing a LinkedIn post, but not in more general tasks like administrative duties or organising customers’ data.
In essence, the more relevant the simulation is to the actual work environment, the more likely it is to provide the same situational cues. The test thereby increases the likelihood of observing behaviour during the test that will also manifest in the future work environment.
So, which should you choose for your recruitment process: high- or low-fidelity?
I’d say it largely depends on your recruiting budget, available resources, and the nature of the job openings. There’s a significant difference in the costs associated with developing and implementing high- versus low-fidelity recruiting methods.
High-fidelity methods, like assessment centres, offer a closer resemblance to the actual job environment but typically require more resources: engaging multiple hiring managers or experts simultaneously to evaluate candidates’ behaviour, tailoring test content to different roles, and managing the risk of administrative bias. While popular for advanced-level positions, they might not be practical for entry-level positions due to the low return on investment in assessing required skills (Lievens & Patterson, 2011).
On the flip side, low-fidelity methods, such as SJTs, provide a more cost-effective and efficient approach that yields similar results without compromising validity (Christian et al., 2010). They can be standardised and easily distributed, allowing them to be implemented for all candidates across different roles with the same objectives. While effectively assessing procedural knowledge for advanced-level positions, they are also suitable for entry-level roles where candidates may lack opportunities to demonstrate their abilities through past work achievements.
In short, SJTs offer a more cost-effective solution that can be applied across various roles and seniority levels while maintaining promising validity in predicting future work performance (e.g., Lievens & Patterson, 2011; Chan & Schmitt, 2002).
Traditional SJTs vs Construct-driven SJTs
Traditional SJTs
Traditional SJTs are typically developed by generating potential workplace scenarios with input from subject matter experts (SMEs), who are professionals in the relevant fields. These scenarios are followed by a range of behavioural responses across different skill levels, based on example responses gathered from SMEs and from people new to the workplace. Scoring algorithms are then determined by another group of SMEs (Tiffin et al., 2020).
Unlike cognitive ability tests or personality tests, which state plainly that they measure cognitive abilities and personality, traditional SJTs often fail to specify which kinds of behaviour or ability (i.e., which constructs) they aim to measure. In particular, traditional SJTs are generally claimed to measure the behaviour or skills needed to succeed in the workplace, but what exactly is that behaviour?
That is also why even though they may demonstrate sufficient criterion-related validity (stay tuned for the next section), they are often criticised for lacking construct-related validity.
But why does construct-related validity matter in assessment?
To ensure fairness, candidates with similar abilities should achieve similar scores regardless of the test format they are assigned. For example, individuals with the same level of creativity should score similarly on both online creativity questionnaires and hands-on creativity assessments.
Hence, it is crucial to know which constructs a test aims to measure, so that test developers can compare its scores with those of other tests measuring similar constructs.
Without clarity on what is being measured, comparing traditional SJTs to similar tests becomes extremely difficult.
Solutions: Construct-driven SJTs
Unlike traditional SJTs, construct-driven SJTs are more selective in the situations they include. In the development process, test developers first define the constructs they intend to measure based on existing theories. One advantage of predetermining constructs is the flexibility to select traits proven to be highly predictive of future job performance, or behavioural skills that specific job roles require. For instance, test developers could choose to develop an SJT that specifically assesses teamwork, since the ability to work effectively in a team is in high demand in today’s workplace.
The next step involves psychologists acting as “gatekeepers” and screening for situations that are highly related to the constructs. For instance, when an SJT aims to measure resilience, psychologists carefully select situations that might induce various degrees of interpersonal responses involving resilience. One example situation could be receiving criticism from a colleague the test taker does not get along with.
Following this, behavioural responses are generated to represent different degrees of the desired constructs based on existing theory. For instance, responses for 4 degrees of resilience in the “receiving criticism” scenario could be a) taking a deep breath to calm oneself down; b) having a chat with colleagues to release some stress; c) trying to convince oneself of the bright side; and d) having a workout to release some stress. Test takers choose the approach they usually take in real-life situations, and their level of resilience is determined from the responses they choose.
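To make the scoring idea above concrete, here is a minimal Python sketch in which each response option is keyed to a construct level and a test taker’s score is the average level of the options they chose. The scenario names, the key, and the averaging rule are all assumptions for illustration. In particular, the levels are arbitrary placeholders and are not a claim about which of the responses listed above reflects more resilience.

```python
from statistics import mean
from typing import Dict

# Hypothetical key: scenario -> response option -> construct level (1 = lowest, 4 = highest).
# The levels are placeholders; a real key would come from theory and expert review.
RESILIENCE_KEY: Dict[str, Dict[str, int]] = {
    "receiving_criticism": {"a": 2, "b": 3, "c": 4, "d": 1},
    "missed_deadline": {"a": 4, "b": 1, "c": 2, "d": 3},
}


def construct_score(choices: Dict[str, str],
                    key: Dict[str, Dict[str, int]] = RESILIENCE_KEY) -> float:
    """Average the keyed construct level of the option chosen in each scenario."""
    if not choices:
        raise ValueError("at least one answered scenario is required")
    return mean(key[scenario][option] for scenario, option in choices.items())


# One hypothetical test taker's answers across two scenarios.
answers = {"receiving_criticism": "c", "missed_deadline": "d"}
print(construct_score(answers))
```

Because every option maps to a level of a single named construct, scores from this kind of key can be compared with other measures of the same construct, which is the point of the construct-driven approach.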
Apart from being easier to equate across different test forms, construct-driven SJTs are, according to emerging evidence, more valid predictors of specific aspects of job performance, which we will discuss in the next section.
How good are SJTs at predicting job performance?
As mentioned earlier, SJTs are often described as a measurement method rather than in terms of what they measure, and this lack of clarity has led to mixed results in academic research. We will therefore present current evidence on how well SJTs predict job performance from two angles: general SJTs and construct-driven SJTs.
From analyses of general SJTs in various formats
We begin by presenting results from meta-analyses that encompass all types of SJTs. For example, Webster and colleagues (2020) reported a pooled correlation between SJTs and job performance of 0.32 across 26 studies in the medical field. Similarly, McDaniel and colleagues (2001) conducted a meta-analysis of 102 correlation coefficients from published studies up to the year 2000 and estimated the population validity of SJTs to be 0.34 (ranging from 0.21 to 0.41).
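For readers unfamiliar with how a meta-analysis arrives at a single pooled correlation, the following Python sketch shows one simple pooling rule: a sample-size-weighted mean of the individual study correlations. This is an illustration only. The function name and the study values are hypothetical, and the cited meta-analyses may have used different procedures (for example, corrections for measurement error or random-effects models).

```python
from typing import List, Tuple


def pooled_correlation(studies: List[Tuple[float, int]]) -> float:
    """Sample-size-weighted mean of study correlations.

    Each study is a pair (r, n): its observed correlation and its sample size.
    """
    total_n = sum(n for _, n in studies)
    if total_n <= 0:
        raise ValueError("at least one study with a positive sample size is required")
    return sum(r * n for r, n in studies) / total_n


# Hypothetical studies (r, n); these are not values from the cited meta-analyses.
example_studies = [(0.20, 100), (0.40, 300), (0.30, 200)]
print(round(pooled_correlation(example_studies), 3))
```

Weighting by sample size means larger studies pull the pooled estimate towards their own result, which is why a pooled value can differ from the plain average of the reported correlations.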
However, in a later investigation of the validity of SJTs, McDaniel and colleagues (2007) found an estimated population correlation of 0.26, somewhat lower than the previously reported coefficient of 0.34. Similarly, in their review of selection methods, Schmidt and colleagues (2016) also reported a correlation of 0.26 for SJTs. Both studies showed that SJTs provided incremental value (explaining an additional 2% of variance) in predicting job performance beyond cognitive ability measures.
Another effort compared the effectiveness of methods that differ in fidelity (the degree of similarity to targeted contexts). Specifically, Lievens & Patterson (2011) found a significantly higher correlation between SJTs and knowledge tests (r = 0.50) than between assessment centres and knowledge tests (r = 0.30). Additionally, SJTs correlated more strongly with overall job performance (r = 0.37) than assessment centres did (r = 0.30).
You and I must acknowledge that job performance is complex and may have different dimensions. Fortunately, we are not the first to consider this. For example, Chan & Schmitt (2002) demonstrated that SJTs have substantial validity in predicting various dimensions of job performance and measure a stable attribute not associated with job experience. They reported the following correlations:
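To show what “incremental value” means numerically, here is a small Python sketch using the standard formula for the squared multiple correlation with two predictors. The SJT validity of 0.26 comes from the text above. The cognitive ability validity (0.50) and the correlation between the two predictors (0.30) are placeholder assumptions chosen only for illustration, so the result is not meant to reproduce the figures reported in the cited studies.

```python
def incremental_r2(r_y1: float, r_y2: float, r_12: float) -> float:
    """Gain in R^2 from adding predictor 2 to a model that already has predictor 1.

    r_y1: correlation of predictor 1 (e.g. cognitive ability) with the criterion
    r_y2: correlation of predictor 2 (e.g. an SJT) with the criterion
    r_12: correlation between the two predictors
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    # Squared multiple correlation for two standardised predictors.
    r2_both = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return r2_both - r_y1**2


# 0.26 is the SJT validity quoted in the text; 0.50 and 0.30 are assumed placeholders.
gain = incremental_r2(r_y1=0.50, r_y2=0.26, r_12=0.30)
print(f"Additional variance explained: {gain:.1%}")
```

The more an SJT overlaps with a predictor that is already in use, the smaller its incremental contribution, even if its standalone correlation with job performance is unchanged.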
- Task performance (r = 0.30): Performance on fundamental job-specific responsibilities, e.g., day-to-day work tasks.
- Contextual performance: Performance on voluntary actions by employees that benefit the company, including job dedication (e.g., persevering through long hours or detailed tasks and striving to achieve high performance; r = 0.38) and interpersonal facilitation (e.g., working together with others and resolving interpersonal conflicts; r = 0.27).
- Overall performance (r = 0.30): How well a person performs in their job in general, usually evaluated by upper management.
From analyses of construct-driven SJTs
Although there is less research investigating construct-driven SJTs and job performance, there is promising evidence suggesting that construct-driven SJTs may have the ability to predict closely-related constructs (Tiffin et al., 2020). For instance, SJTs assessing teamwork skills could predict teamwork behaviour in the actual future work environment.
Christian and colleagues (2010) demonstrated that SJTs assessing leadership (r = 0.29) and interpersonal skills (r = 0.26) were significantly more strongly associated with managerial performance than traditional SJTs (r = 0.12). Furthermore, SJTs assessing leadership skills (r = 0.24), interpersonal skills (r = 0.21), and teamwork skills (r = 0.35) showed higher, though not significantly higher, correlations with contextual performance than traditional SJTs.
Additionally, Bledow and Frese (2009) found that scores from an SJT evaluating personal initiative had a strong correlation of 0.48 with supervisors’ ratings of individual initiative and overall performance.
In the context of medical school admissions, Mielke and colleagues (2022) demonstrated that SJTs assessing communal skills (one aspect of social skills) are associated with self-reported communal personality and past communal behaviour.
Lastly, for assessing teachers’ competencies, Koschmieder and Neubauer (2021) found that SJTs assessing interpersonal emotion regulation could predict higher selfless professional motives, while intrapersonal emotion regulation could predict higher teacher self-efficacy.
Overall, these studies provide encouraging evidence that the newly emerging construct-driven SJTs can predict future job performance.
To recap
Situational judgement tests are designed to assess behavioural skills within simulations that mirror future work environments. They operate on the premise that behaviour remains consistent across similar situations (behavioural consistency). Because of this, SJTs offer a cost-effective method for evaluating candidates without compromising validity. Construct-driven SJTs are a newer variation on traditional SJTs that aims to address criticisms about the lack of construct validity. Numerous studies have yielded promising results supporting the validity and predictive power of both traditional and construct-driven SJTs. Happy hiring! 🙂
References
Bledow, R., & Frese, M. (2009). A situational judgment test of personal initiative and its relationship to performance. Personnel Psychology, 62(2), 229-258. https://doi.org/10.1111/j.1744-6570.2009.01137.x
Chan, D., & Schmitt, N. (2002). Situational judgment and job performance. Human Performance, 15(3), 233-254. https://doi.org/10.1207/S15327043HUP1503_01
Christian, M. S., Edwards, B. D., & Bradley, J. C. (2010). Situational judgment tests: Constructs assessed and a meta‐analysis of their criterion‐related validities. Personnel Psychology, 63(1), 83-117. https://doi.org/10.1111/j.1744-6570.2009.01163.x
Corstjens, J., Lievens, F., & Krumm, S. (2017). Situational judgement tests for selection. The Wiley Blackwell handbook of the psychology of recruitment, selection and employee retention, 226-246. https://doi.org/10.1002/9781118972472.ch11
De Meijer, L. A., Born, M. P., Van Zielst, J., & Van Der Molen, H. T. (2010). Construct-driven development of a video-based situational judgment test for integrity. European Psychologist. https://doi.org/10.1027/1016-9040/a000027
Fröhlich, M., Kahmann, J., & Kadmon, M. (2017). Development and psychometric examination of a German video‐based situational judgment test for social competencies in medical school applicants. International Journal of Selection and Assessment, 25(1), 94-110. https://doi.org/10.1111/ijsa.12163
Koschmieder, C., & Neubauer, A. C. (2021). Measuring emotion regulation for preservice teacher selection: A theory-driven development of a situational judgment test. Personality and Individual Differences, 168, 110363. https://doi.org/10.1016/j.paid.2020.110363
Lievens, F. (2017). Assessing personality–situation interplay in personnel selection: Toward more integration into personality research. European Journal of Personality, 31(5), 424-440. https://doi.org/10.1002/per.2111
Lievens, F., & De Soete, B. (2012). Simulations. In N. Schmitt (Ed.), The Oxford handbook of personnel assessment and selection (pp. 383-410). New York, NY, US: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199732579.013.0017
Lievens, F., & Patterson, F. (2011). The validity and incremental validity of knowledge tests, low-fidelity simulations, and high-fidelity simulations for predicting job performance in advanced-level high-stakes selection. Journal of Applied Psychology, 96(5), 927. https://doi.org/10.1037/a0023496
McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb III, W. L. (2007). Situational judgment tests, response instructions, and validity: A meta‐analysis. Personnel Psychology, 60(1), 63-91. https://doi.org/10.1111/j.1744-6570.2007.00065.x
McDaniel, M. A., Morgeson, F. P., Finnegan, E. B., Campion, M. A., & Braverman, E. P. (2001). Use of situational judgment tests to predict job performance: a clarification of the literature. Journal of Applied Psychology, 86(4), 730. https://doi.org/10.1037/0021-9010.86.4.730
Mielke, I., Breil, S. M., Amelung, D., Espe, L., & Knorr, M. (2022). Assessing distinguishable social skills in medical admission: does construct-driven development solve validity issues of situational judgment tests?. BMC Medical Education, 22(1), 293. https://doi.org/10.1186/s12909-022-03305-x
Olaru, G., Burrus, J., MacCann, C., Zaromb, F. M., Wilhelm, O., & Roberts, R. D. (2019). Situational judgment tests as a method for measuring personality: Development and validity evidence for a test of dependability. PLoS ONE, 14(2), e0211884. https://doi.org/10.1371/journal.pone.0211884
Schmidt, F. L., Oh, I. S., & Shaffer, J. A. (2016). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 100 years. Fox School of Business Research Paper, 1-74.
Schmitt, N., & Ostroff, C. (1986). Operationalizing the “behavioral consistency” approach: Selection test development based on a content‐oriented strategy. Personnel Psychology, 39(1), 91-108. https://doi.org/10.1111/j.1744-6570.1986.tb00576.x
Tett, R. P., Toich, M. J., & Ozkum, S. B. (2021). Trait activation theory: A review of the literature and applications to five lines of personality dynamics research. Annual Review of Organizational Psychology and Organizational Behavior, 8, 199-233. https://doi.org/10.1146/annurev-orgpsych-012420-062228
Tiffin, P. A., Paton, L. W., O’Mara, D., MacCann, C., & WB, J. (2020). The cross-cutting edge: situational judgment tests for selection: traditional versus construct-driven approaches (Doctoral dissertation, School of Business, Singapore Management University, Singapore).
Webster, E. S., Paton, L. W., Crampton, P. E., & Tiffin, P. A. (2020). Situational judgement test validity for selection: A systematic review and meta‐analysis. Medical Education, 54(10), 888-902. https://doi.org/10.1111/medu.14201
Wernimont, P. F., & Campbell, J. P. (1968). Signs, samples, and criteria. Journal of Applied Psychology, 52(5), 372. https://doi.org/10.1037/h0026244
Whetzel, D. L., Sullivan, T. S., & McCloy, R. A. (2020). Situational judgment tests: An overview of development practices and psychometric characteristics. Personnel Assessment and Decisions, 6(1), 1. https://doi.org/10.25035/pad.2020.01.001