CLEF 2025 Workshop
Madrid, 9-12 September 2025
eRisk explores the evaluation methodology, effectiveness metrics and practical applications (particularly those related to health and safety) of early risk detection on the Internet. For instance, early alerts could be sent when a predator starts interacting with a child for sexual purposes, or when a potential offender starts publishing antisocial threats on a blog, forum or social network. Our main goal is to pioneer a new interdisciplinary research area that would be potentially applicable to a wide variety of situations and to many different personal profiles. Examples include potential paedophiles, stalkers, individuals who could fall into the hands of criminal organisations, people with suicidal inclinations, or people susceptible to depression.
This is the ninth year of eRisk and the lab plans to organize three tasks:
Task 1 is a continuation of eRisk 2024's Task 1.
The task consists of ranking sentences from a collection of user writings according to their relevance to a depression symptom. The participants will have to provide rankings for the 21 symptoms of depression from the BDI Questionnaire. A sentence is deemed relevant if it provides information about the user's condition regarding a particular symptom. That is, it may be relevant even when it indicates that the user is okay with the symptom.
We will release a TREC-formatted, sentence-tagged dataset (based on past eRisk data) together with the BDI questionnaire. Participants are free to decide on the best strategy to derive queries from the descriptions of the BDI symptoms in the questionnaire.
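As a rough illustration of one possible query-based strategy (not an official baseline), the sketch below treats a BDI symptom description as a query and ranks candidate sentences with BM25. It assumes the rank_bm25 Python package; the symptom text and sentences are placeholders.

```python
# Illustrative sketch: rank candidate sentences against a BDI symptom
# description using BM25 (assumes the rank_bm25 package is installed).
from rank_bm25 import BM25Okapi

# Placeholder candidate sentences extracted from user writings
sentences = [
    "I have not been able to sleep properly for days",
    "Went for a run this morning and felt great",
    "I feel sad most of the time and nothing cheers me up",
]
tokenized_sentences = [s.lower().split() for s in sentences]
bm25 = BM25Okapi(tokenized_sentences)

# Placeholder query derived from a BDI item (e.g., the "Sadness" symptom)
symptom_query = "sadness feeling sad unhappy".split()
scores = bm25.get_scores(symptom_query)   # one BM25 score per sentence

# Rank sentence indices by descending score
ranking = sorted(range(len(sentences)), key=lambda i: -scores[i])
for rank, i in enumerate(ranking, start=1):
    print(rank, round(float(scores[i]), 3), sentences[i])
```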
After receiving the runs from the participating teams, we will create the relevance judgments using pooling, with the help of human assessors. The resulting qrels will be used to evaluate the systems with classical ranking metrics (e.g., MAP, nDCG). This new corpus of annotated sentences will be a valuable resource with multiple applications beyond eRisk.
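For reference, once the qrels are available a run can be scored with any TREC-style evaluation tool. The sketch below assumes the pytrec_eval package and uses made-up sentence identifiers, scores and relevance values purely for illustration.

```python
# Illustrative scoring of a run against qrels with pytrec_eval
# (symptom numbers act as query ids; all values below are made up).
import pytrec_eval

qrels = {  # symptom id -> {sentence id -> relevance (1 = relevant)}
    "1": {"sentence-id-121": 1, "sentence-id-234": 0},
    "2": {"sentence-id-345": 1},
}
run = {    # symptom id -> {sentence id -> system score}
    "1": {"sentence-id-121": 10.0, "sentence-id-234": 9.5},
    "2": {"sentence-id-345": 9.0},
}

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"map", "ndcg"})
per_symptom = evaluator.evaluate(run)   # metric values per symptom
for qid in sorted(per_symptom):
    print(qid, per_symptom[qid]["map"], per_symptom[qid]["ndcg"])
```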
The task is organized into two different stages: a training stage and a test stage.
Participants should submit up to 1000 results per symptom, sorted by estimated relevance, for each of the 21 symptoms of the BDI-II questionnaire. Each line of a run must follow this format:

symptom_number Q0 sentence-id position_in_ranking score system_name

An example of the format of your runs is as follows:

1 Q0 sentence-id-121 0001 10 myGroupNameMyMethodName
1 Q0 sentence-id-234 0002 9.5 myGroupNameMyMethodName
1 Q0 sentence-id-345 0003 9 myGroupNameMyMethodName
...
21 Q0 sentence-id-456 0998 1.25 myGroupNameMyMethodName
21 Q0 sentence-id-242 0999 1 myGroupNameMyMethodName
21 Q0 sentence-id-347 1000 0.9 myGroupNameMyMethodName
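As an illustration of how such a run file could be produced, a small helper along the following lines would write the format above; the function name and the example scores are hypothetical.

```python
# Hypothetical helper that writes a run in the format described above.
def write_run(results_per_symptom, system_name, path, max_results=1000):
    """results_per_symptom: {symptom_number: [(sentence_id, score), ...]}"""
    with open(path, "w") as out:
        for symptom in sorted(results_per_symptom):
            ranked = sorted(results_per_symptom[symptom], key=lambda x: -x[1])
            for rank, (sentence_id, score) in enumerate(ranked[:max_results], start=1):
                out.write(f"{symptom} Q0 {sentence_id} {rank:04d} {score} {system_name}\n")

# Example usage with made-up scores
write_run({1: [("sentence-id-121", 10.0), ("sentence-id-234", 9.5)]},
          "myGroupNameMyMethodName", "task1_run.txt")
```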
The proceedings of the lab will be published in the online CEUR-WS Proceedings and on the conference website.
To have access to the collection, all participants have to fill in, sign, and send a user agreement form (follow the instructions provided here). Once you have submitted the signed copyright form, you can proceed to register for the lab at the CLEF 2025 Labs Registration site.
Task 2 is a new task introduced in eRisk 2025. It focuses on detecting early signs of depression by analyzing full conversational contexts. Unlike previous tasks, which focused on isolated user posts, this challenge considers the broader dynamics of interactions by incorporating writings from all individuals involved in a conversation. Participants must process user interactions sequentially, analyze natural dialogues, and detect signs of depression within these rich contexts. Texts will be processed chronologically to simulate real-world conditions, making the task applicable to monitoring user interactions in blogs, social networks, and other types of online media.
The test collection for this task follows the format described in Losada & Crestani, 2016 and is derived from the same data sources as previous eRisk tasks. The dataset includes two categories of users: individuals suffering from depression and control users. For each user, the collection contains a chronologically ordered sequence of writings from that user, along with the writings of the other users that participated in the conversation. This approach allows systems to monitor ongoing interactions and make timely decisions based on the evolution of the conversation.
The task is organized into two different stages: a training stage and a test stage.
Participants have to connect to the eRisk server and process the user writings iteratively, round by round. More information on connecting to the eRisk server is provided here.
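To give a feel for this iterative setting, the sketch below shows a round-by-round client loop. The server URL, endpoints, token and payload fields are hypothetical placeholders, not the actual eRisk server API; please follow the official connection instructions linked above.

```python
# Purely illustrative client loop for sequential processing; endpoint names,
# token and payload fields are hypothetical, not the real eRisk server API.
import requests

SERVER = "https://erisk.example.org"   # placeholder URL
TOKEN = "YOUR_TEAM_TOKEN"              # placeholder team token

while True:
    # Hypothetical endpoint: fetch the next round of writings for all users
    writings = requests.get(f"{SERVER}/getwritings/{TOKEN}").json()
    if not writings:
        break  # no more rounds to process

    decisions = []
    for item in writings:
        # Placeholder logic: a real system would score the accumulated
        # conversation history for each user before deciding
        decisions.append({"nick": item["nick"], "decision": 0, "score": 0.0})

    # Hypothetical endpoint: submit one decision per user before the next round
    requests.post(f"{SERVER}/submit/{TOKEN}", json=decisions)
```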
Evaluation: The evaluation will consider not only the correctness of the system's output (i.e., whether or not the user is depressed) but also the delay taken by the system to emit its decision. To this end, we will use the ERDE metric proposed in Losada & Crestani, 2016, along with other alternative evaluation measures. A full description of the evaluation metrics can be found in the 2021 eRisk overview.
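For orientation, ERDE_o assigns a fixed cost to false positives and false negatives and a delay-dependent cost to true positives, so that correct alerts emitted only after observing many writings are penalised. A sketch following the definition in Losada & Crestani, 2016 (with illustrative cost constants) is shown below.

```python
# Sketch of the per-user ERDE_o cost from Losada & Crestani, 2016.
# The cost constants are illustrative; the official values are set by the organisers.
import math

def erde(decision, truth, k, o, c_fp=0.1, c_fn=1.0, c_tp=1.0):
    """decision/truth: 1 = depression, 0 = control; k: writings seen; o: deadline."""
    if decision == 1 and truth == 0:
        return c_fp                               # false positive
    if decision == 0 and truth == 1:
        return c_fn                               # missed positive case
    if decision == 1 and truth == 1:
        latency_cost = 1.0 - 1.0 / (1.0 + math.exp(k - o))
        return latency_cost * c_tp                # correct but possibly late alert
    return 0.0                                    # true negative

# ERDE_o for a collection is the mean of the per-user costs
print(erde(decision=1, truth=1, k=3, o=50))    # early alert -> cost near 0
print(erde(decision=1, truth=1, k=80, o=50))   # late alert  -> cost near c_tp
```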
The proceedings of the lab will be published in the online CEUR-WS Proceedings and on the conference website.
To have access to the collection, all participants must fill in, sign, and send a user agreement form (follow the instructions provided here). Once you have submitted the signed copyright form, you can proceed to register for the lab at the CLEF 2025 Labs Registration site.
Task 3 is a pilot task that introduces a unique challenge: detecting depression through conversational agents. Participants will interact with a large language model (LLM) persona that has been fine-tuned on user writings, simulating real-world conversational exchanges. The challenge lies in determining whether the LLM persona exhibits signs of depression and providing an explanation of the main symptoms that informed that decision. This task pushes participants to develop more interactive, dynamic models that can engage with users and assess their mental state through dialogue.
Key Details:
Evaluation: Coming soon.