Welcome
Overview
This half-day tutorial provides participants with a framework and hands-on experience for conducting controlled experiments on model search behaviour using an open-source toolkit. Participants will learn how to design, run, and analyse experiments to investigate behavioural differences across large language models. By integrating techniques from human user studies with LLM experimentation, the tutorial strengthens CHIIR's methodological foundations and broadens its scope to include behavioural analysis of generative and agentic systems.
Target Audience and Learning Outcomes
Target Audience
Students, researchers, and practitioners who are familiar with LLMs but have limited experience with user studies and behavioural analysis.
Students, researchers, and practitioners who have a background in behavioural analysis or user studies but are less familiar with LLMs and retrieval-augmented applications.
Learning Outcomes
Conceptual understanding: Explain the notion of model search behaviour and its relationship to traditional user-centred studies in IR.
Experimental design: Design and execute controlled experiments with generative models, defining variables, tasks, and evaluation measures suitable for model search behaviour (a minimal sketch follows this list).
Analytical skills: Apply analytical tools to extract behavioural statistics, compare model outputs, and conduct significance testing on experimental results.
Reproducibility and community contribution: Use an open-source toolkit to design, share, and reproduce experiments and analyses for transparency and open science.
Future research directions: Identify opportunities for applying these methods to emerging CHIIR research.
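To make the experimental-design outcome concrete, the following minimal Python sketch shows one way such an experiment could be specified: the model is the independent variable, a fixed set of search topics forms the tasks, and the prompt is held constant across conditions. This is an illustration only; it does not use the geniie-lab API, and the ask_model function, file layout, and model names are hypothetical placeholders.

# A minimal sketch, assuming nothing about the geniie-lab API, of how a
# controlled experiment on model search behaviour might be specified.
from dataclasses import dataclass

@dataclass
class Experiment:
    models: list[str]        # independent variable: which LLM responds
    topics: list[str]        # fixed tasks given to every model
    prompt_template: str     # held constant across conditions

def ask_model(model: str, prompt: str) -> str:
    # Hypothetical placeholder: call whichever LLM client your setup provides.
    raise NotImplementedError("plug in your LLM client here")

def run(exp: Experiment) -> dict[str, list[str]]:
    # One response per (model, topic) pair; identical prompt for every model.
    results: dict[str, list[str]] = {m: [] for m in exp.models}
    for model in exp.models:
        for topic in exp.topics:
            results[model].append(ask_model(model, exp.prompt_template.format(topic=topic)))
    return results

exp = Experiment(
    models=["model-a", "model-b"],   # hypothetical model identifiers
    topics=["solar panel maintenance", "visa requirements for Japan"],
    prompt_template="Formulate a web search query for: {topic}",
)
# responses = run(exp)  # would collect one query per model and topic

Keeping everything but the model fixed is what makes the comparison controlled; the responses can then be scored with whatever behavioural measure the study defines.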
Syllabus
Session 1: Foundations and First Experiments (90 min)
Introduction and motivation: model search behaviour and its relevance to IR
Conceptual foundations: behavioural analysis in IR and controlled experiments with test collections
Overview of geniie-lab: architecture, setup, and supported experiment types
Guided hands-on exercise: running a simple experiment (e.g., comparing query formulation across models; see the sketch after this session outline)
Group discussion: interpreting and comparing initial results
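As a preview of the first exercise, the sketch below shows the kind of comparison involved in studying query formulation: given queries already collected from two models for the same topics, it reports average query length and per-topic term overlap. The example queries and variable names are illustrative and are not output of geniie-lab or any particular model.

# A minimal sketch of a query-formulation comparison between two models.
from statistics import mean

queries_a = ["solar panel maintenance schedule", "clean solar panels how often"]
queries_b = ["how to maintain solar panels", "solar panel cleaning frequency guide"]

def terms(query: str) -> set[str]:
    return set(query.lower().split())

def jaccard(q1: str, q2: str) -> float:
    a, b = terms(q1), terms(q2)
    return len(a & b) / len(a | b) if a | b else 0.0

print("mean query length A:", mean(len(q.split()) for q in queries_a))
print("mean query length B:", mean(len(q.split()) for q in queries_b))
# Pairwise term overlap for queries addressing the same topic (aligned by index).
print("term overlap per topic:",
      [round(jaccard(a, b), 2) for a, b in zip(queries_a, queries_b)])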
Session 2: Advanced Hands-On Experiments and Analysis (90 min)
Advanced hands-on exercise: multi-stage experiments involving click simulation, relevance judgement, and query reformulation (see the sketch after this session outline)
Using analytical tools: extracting behavioural statistics, computing descriptive measures, and applying significance tests
Case studies: examples of model search behaviour research in practice
Advanced topics: reproducibility studies and agentic information retrieval experiments
Wrap-up: open discussion, Q&A, and identifying future directions for the community
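The following sketch combines two ideas from this session: a simple position-based click simulation over ranked result lists and a significance test on the resulting behavioural statistic. It is illustrative only; the position-bias values, relevance labels, and the two model conditions are invented, and the code does not reflect geniie-lab's implementation of either step.

# A minimal sketch: position-based click simulation plus a significance test.
import random
from scipy.stats import mannwhitneyu

# Probability that a user examines each rank position (assumed position bias).
EXAMINE_PROB = [0.9, 0.7, 0.5, 0.3, 0.2]

def simulate_clicks(relevance: list[int], rng: random.Random) -> int:
    # A click occurs when a position is examined and its document is relevant.
    clicks = 0
    for prob, rel in zip(EXAMINE_PROB, relevance):
        if rel and rng.random() < prob:
            clicks += 1
    return clicks

rng = random.Random(42)
# Invented top-5 relevance labels per topic for two model conditions.
rankings_a = [[1, 1, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]] * 10
rankings_b = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 0, 1, 0, 0]] * 10

clicks_a = [simulate_clicks(r, rng) for r in rankings_a]
clicks_b = [simulate_clicks(r, rng) for r in rankings_b]

# Non-parametric test, since simulated click counts are small integers.
stat, p = mannwhitneyu(clicks_a, clicks_b, alternative="two-sided")
print(f"mean clicks A={sum(clicks_a)/len(clicks_a):.2f}, "
      f"B={sum(clicks_b)/len(clicks_b):.2f}, U={stat:.1f}, p={p:.4f}")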