Can minigames and open-ended questions replace personality questionnaires? A master's thesis exploring gamified personality assessment through behavioural data and LLM analysis.
Personality assessment has relied on questionnaires for decades. They work reasonably well, but they have real problems: people answer based on how they see themselves rather than how they actually behave, long surveys lead to fatigue and careless answers, and the format itself can push people toward socially desirable responses.
This thesis asks a simple question: what if we assessed personality through what people do instead of what they say about themselves?
The answer took the form of three minigames and three open-ended questions, all targeting the Big Five personality trait of conscientiousness. Fifty-one participants completed the full experiment, which combined the custom-built tool with a traditional personality assessment (the IPIP-NEO-120) for comparison. Behavioural data was collected silently during the minigames, and the open-ended answers were analysed with the LLM Claude.
The short answer: the tool is not yet ready to replace traditional assessments. But the correlations found along the way are interesting and could inform further development of serious games for personality assessment.
The experiment had three parts. First, participants answered three open-ended questions framed as everyday scenarios, such as what they would do if money disappeared from their bank account, or whether they would still go for a walk if it might rain. The scenarios were designed to draw out behaviours tied to specific facets of conscientiousness without participants realising they were being assessed for it.
Next, they played three minigames, each designed around a different facet of conscientiousness. The minigames collected behavioural data in the background: click timestamps, hover durations, paths taken, books sorted, buttons pressed. None of this was visible to the participant.
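The thesis doesn't describe the tool's implementation here, but as a rough sketch of this kind of silent telemetry, a background event logger could look something like the following Python snippet. The event names and fields are hypothetical, not the study's actual schema:

```python
import time
from dataclasses import dataclass, field, asdict

@dataclass
class GameEvent:
    """One behavioural data point captured during a minigame."""
    kind: str            # e.g. "click", "hover_end", "book_sorted" (illustrative names)
    timestamp: float     # seconds since the session started
    detail: dict = field(default_factory=dict)

class SilentLogger:
    """Records events in the background; nothing is shown to the participant."""

    def __init__(self) -> None:
        self._start = time.monotonic()
        self._events: list[GameEvent] = []

    def log(self, kind: str, **detail) -> None:
        elapsed = time.monotonic() - self._start
        self._events.append(GameEvent(kind, elapsed, detail))

    def export(self) -> list[dict]:
        """Dump all events, e.g. for the later Python/Excel processing step."""
        return [asdict(e) for e in self._events]

# Hypothetical usage inside a minigame's input handlers:
logger = SilentLogger()
logger.log("click", target="sort_button")
logger.log("hover_end", target="bookshelf", duration=1.8)
```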
Finally, they completed a standard 32-item conscientiousness questionnaire. This served as the baseline that everything else was compared against.
The open-ended answers were run through Claude, which scored each response across content, writing style, and terminology. The minigame data was processed with Python and Excel to extract behavioural variables. Then everything was correlated with the traditional assessment results.
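As an illustration of what that pipeline might look like, here is a minimal Python sketch using the Anthropic SDK and SciPy. The rubric text, model ID, file names, and column names are all assumptions for illustration; the study's actual prompts and data layout are in the OSF materials:

```python
import json
import anthropic                      # official Anthropic SDK
import pandas as pd
from scipy.stats import pearsonr

client = anthropic.Anthropic()        # reads ANTHROPIC_API_KEY from the environment

RUBRIC = (
    "Score the following answer for conscientiousness on a 1-5 scale "
    "along three dimensions: content, writing style, and terminology. "
    'Reply with JSON like {"content": 3, "style": 4, "terminology": 2}.'
)

def score_answer(answer: str) -> dict:
    """Ask Claude to rate one open-ended response. The rubric above is illustrative."""
    msg = client.messages.create(
        model="claude-3-5-sonnet-20241022",   # assumed model; the thesis just says "Claude"
        max_tokens=200,
        messages=[{"role": "user", "content": f"{RUBRIC}\n\nAnswer:\n{answer}"}],
    )
    return json.loads(msg.content[0].text)

# Hypothetical per-participant tables: behavioural variables from the minigames
# and conscientiousness scores from the IPIP-NEO-120 baseline.
behaviour = pd.read_csv("minigame_variables.csv")   # e.g. mean_hover_s, clicks_per_min
baseline = pd.read_csv("ipip_neo_scores.csv")       # column: conscientiousness

# Correlate each behavioural variable with the questionnaire score.
for col in behaviour.columns.drop("participant_id"):
    r, p = pearsonr(behaviour[col], baseline["conscientiousness"])
    print(f"{col}: r = {r:.2f}, p = {p:.3f}")
```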
The tool built in this study is not yet reliable enough to replace traditional personality assessments. But that was somewhat expected for a first attempt. What it does show is that behaviour in minigames and the way people write carry real signals about personality, signals that questionnaires miss because they capture what people say about themselves rather than what they do.
The most promising direction from here is refining the minigames, especially improving the theoretical link between specific game mechanics and the personality facets they are meant to measure. The LLM analysis also has room to improve with better prompting and more targeted questions.
The full thesis, data, code, and pre-registration are all available on the OSF page.