Welcome
Overview
This half-day tutorial provides participants with a framework and hands-on experience for conducting controlled experiments on model search behaviour using an open-source toolkit. Participants will learn how to design, run, and analyse experiments to investigate behavioural differences across large language models. By integrating techniques from human user studies with LLM experimentation, the tutorial strengthens CHIIR's methodological foundations and broadens its scope to include behavioural analysis of generative and agentic systems.
Target Audience and Learning Outcomes
Target Audience
Students, researchers, and practitioners who are familiar with LLMs but have limited experience with user studies and behavioural analysis.
Students, researchers, and practitioners who have a background in behavioural analysis or user studies but are less familiar with LLMs and retrieval-augmented applications.
Learning Outcomes
Conceptual understanding: Explain the notion of model search behaviour and its relationship to traditional user-centred studies in IR.
Experimental design: Design and execute controlled experiments with generative models, defining variables, tasks, and evaluation measures suitable for model search behaviour (a minimal sketch follows this list).
Analytical skills: Apply analytical tools to extract behavioural statistics, compare model outputs, and conduct significance testing on experimental results.
Reproducibility and community contribution: Use an open-source toolkit to design, share, and reproduce experiments and analyses for transparency and open science.
Future research directions: Identify opportunities for applying these methods to emerging CHIIR research.
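To make the experimental-design outcome concrete, the following minimal Python sketch shows one way such an experiment could be specified: the model is the independent variable, a fixed set of search topics forms the tasks, and the prompt is held constant across conditions. This is an illustration only; it does not use the geniie-lab API, and the ask_model function, file layout, and model names are hypothetical placeholders.

# A minimal sketch, assuming nothing about the geniie-lab API, of how a
# controlled experiment on model search behaviour might be specified.
from dataclasses import dataclass

@dataclass
class Experiment:
    models: list[str]        # independent variable: which LLM responds
    topics: list[str]        # fixed tasks given to every model
    prompt_template: str     # held constant across conditions

def ask_model(model: str, prompt: str) -> str:
    # Hypothetical placeholder: call whichever LLM client your setup provides.
    raise NotImplementedError("plug in your LLM client here")

def run(exp: Experiment) -> dict[str, list[str]]:
    # One response per (model, topic) pair; identical prompt for every model.
    results: dict[str, list[str]] = {m: [] for m in exp.models}
    for model in exp.models:
        for topic in exp.topics:
            results[model].append(ask_model(model, exp.prompt_template.format(topic=topic)))
    return results

exp = Experiment(
    models=["model-a", "model-b"],   # hypothetical model identifiers
    topics=["solar panel maintenance", "visa requirements for Japan"],
    prompt_template="Formulate a web search query for: {topic}",
)
# responses = run(exp)  # would collect one query per model and topic

Keeping everything but the model fixed is what makes the comparison controlled; the responses can then be scored with whatever behavioural measure the study defines.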
Syllabus
Session 1: Foundations and First Experiments (90 min)
Introduction and motivation: model search behaviour and its relevance to IR
Conceptual foundations: behavioural analysis in IR and controlled experiments with test collections
Overview of geniie-lab: architecture, setup, and supported experiment types
Guided hands-on exercise: running a simple experiment (e.g., comparing query formulation across models; see the sketch after this session outline)
Group discussion: interpreting and comparing initial results
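As a preview of the first exercise, the sketch below shows the kind of comparison involved in studying query formulation: given queries already collected from two models for the same topics, it reports average query length and per-topic term overlap. The example queries and variable names are illustrative and are not output of geniie-lab or any particular model.

# A minimal sketch of a query-formulation comparison between two models.
from statistics import mean

queries_a = ["solar panel maintenance schedule", "clean solar panels how often"]
queries_b = ["how to maintain solar panels", "solar panel cleaning frequency guide"]

def terms(query: str) -> set[str]:
    return set(query.lower().split())

def jaccard(q1: str, q2: str) -> float:
    a, b = terms(q1), terms(q2)
    return len(a & b) / len(a | b) if a | b else 0.0

print("mean query length A:", mean(len(q.split()) for q in queries_a))
print("mean query length B:", mean(len(q.split()) for q in queries_b))
# Pairwise term overlap for queries addressing the same topic (aligned by index).
print("term overlap per topic:",
      [round(jaccard(a, b), 2) for a, b in zip(queries_a, queries_b)])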
Session 2: Advanced Hands-On Experiments and Analysis (90 min)
Advanced hands-on exercise: multi-stage experiments involving click simulation, relevance judgement, and query reformulation (see the sketch after this session outline)
Using analytical tools: extracting behavioural statistics, computing descriptive measures, and applying significance tests
Case studies: examples of model search behaviour research in practice
Advanced topics: reproducibility studies and agentic information retrieval experiments
Wrap-up: open discussion, Q&A, and identifying future directions for the community
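The following sketch combines two ideas from this session: a simple position-based click simulation over ranked result lists and a significance test on the resulting behavioural statistic. It is illustrative only; the position-bias values, relevance labels, and the two model conditions are invented, and the code does not reflect geniie-lab's implementation of either step.

# A minimal sketch: position-based click simulation plus a significance test.
import random
from scipy.stats import mannwhitneyu

# Probability that a user examines each rank position (assumed position bias).
EXAMINE_PROB = [0.9, 0.7, 0.5, 0.3, 0.2]

def simulate_clicks(relevance: list[int], rng: random.Random) -> int:
    # A click occurs when a position is examined and its document is relevant.
    clicks = 0
    for prob, rel in zip(EXAMINE_PROB, relevance):
        if rel and rng.random() < prob:
            clicks += 1
    return clicks

rng = random.Random(42)
# Invented top-5 relevance labels per topic for two model conditions.
rankings_a = [[1, 1, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]] * 10
rankings_b = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 0, 1, 0, 0]] * 10

clicks_a = [simulate_clicks(r, rng) for r in rankings_a]
clicks_b = [simulate_clicks(r, rng) for r in rankings_b]

# Non-parametric test, since simulated click counts are small integers.
stat, p = mannwhitneyu(clicks_a, clicks_b, alternative="two-sided")
print(f"mean clicks A={sum(clicks_a)/len(clicks_a):.2f}, "
      f"B={sum(clicks_b)/len(clicks_b):.2f}, U={stat:.1f}, p={p:.4f}")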