eRisk 2020:

Early risk prediction on the Internet

CLEF 2020 Workshop

Thessaloniki, 22-25 September 2020

eRisk explores the evaluation methodology, effectiveness metrics and practical applications of early risk detection on the Internet. Early detection technologies can be employed in different areas, particularly those related to health and safety. For instance, early alerts could be sent when a predator starts interacting with a child for sexual purposes, or when a potential offender starts publishing antisocial threats on a blog, forum or social network. Our main goal is to pioneer a new interdisciplinary research area that would be potentially applicable to a wide variety of situations and to many different personal profiles. Examples include potential paedophiles, stalkers, individuals who could fall into the hands of criminal organisations, people with suicidal inclinations, or people susceptible to depression.


This is the fourth year of eRisk and the lab plans to organize two tasks:

Task 1: Early Detection of Signs of Self-Harm

This is a continuation of eRisk 2019's T2 task.

The challenge consists of sequentially processing pieces of evidence and detecting early traces of self-harm as soon as possible. The task is mainly concerned with evaluating Text Mining solutions and thus concentrates on texts written in Social Media. Texts should be processed in the order they were created. In this way, systems that effectively perform this task could be applied to sequentially monitor user interactions in blogs, social networks, or other types of online media.

The test collection for this task has the same format as the collection described in [Losada & Crestani 2016]. The source of data is also the same as the one used in previous editions of eRisk. It is a collection of writings (posts or comments) from a set of Social Media users. There are two categories of users, self-harm and non-self-harm, and, for each user, the collection contains a sequence of writings (in chronological order).

In 2019, we moved from a chunk-based release of data (used in 2017 and 2018) to an item-by-item release of data. We set up a server that iteratively gives user writings to the participating teams. More information about the server is given here. In 2020, the server will be used to provide the users' writings during the test stage.

The task is organized into two different stages:

  • Training stage. Initially, the teams that participate in this task will have access to a training stage where we will release the whole history of writings for a set of training users (we will provide all writings of all training users), and we will indicate which users have explicitly mentioned that they have self-harmed. The participants can therefore tune their systems with the training data. In 2020, the training data for Task 1 is composed of all 2019's T2 users (T2 2019 training users + T2 2019 test users).
  • Test stage. The test stage will consist of a period of time where the participants have to connect to our server and iteratively get user writings and send responses. More information on the eRisk server that will be used at test time is available here.
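
The item-by-item protocol of the test stage can be illustrated with a minimal Python sketch. It does not talk to the real eRisk server: the list `rounds` is a hypothetical in-memory stand-in for the writings released in each round, `process_rounds` and `toy_classifier` are illustrative names, and treating an alert (decision 1) as final is an assumption made here for simplicity.

```python
from typing import Callable, Dict, List, Set

# Hypothetical stand-in for the server: one dict per round, mapping each
# user to the single new writing released for that user in that round.
Round = Dict[str, str]


def process_rounds(rounds: List[Round],
                   classify: Callable[[List[str]], int]) -> List[Dict[str, int]]:
    """Consume writings round by round and emit one decision per user per round."""
    history: Dict[str, List[str]] = {}   # writings seen so far, in order
    alerted: Set[str] = set()            # users already flagged (assumed final)
    responses: List[Dict[str, int]] = []
    for new_writings in rounds:
        decisions: Dict[str, int] = {}
        for user, text in new_writings.items():
            history.setdefault(user, []).append(text)
            if user in alerted:
                decisions[user] = 1
            else:
                decisions[user] = classify(history[user])
                if decisions[user] == 1:
                    alerted.add(user)
        responses.append(decisions)
    return responses


def toy_classifier(writings: List[str]) -> int:
    """Toy rule, for illustration only: flag a user once a marker phrase appears."""
    return int(any("flagged phrase" in w.lower() for w in writings))


rounds = [
    {"subject1": "Had a long day at work.", "subject2": "Anyone watching the game?"},
    {"subject1": "This post has a flagged phrase.", "subject2": "Great match."},
    {"subject1": "Feeling a bit better today.", "subject2": "See you tomorrow."},
]
responses = process_rounds(rounds, toy_classifier)
```

Because the system only ever sees the writings released so far, the round at which it first emits an alert determines the delay that the evaluation takes into account.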

Evaluation: The evaluation will take into account not only the correctness of the system's output (i.e. whether or not the user shows signs of self-harm) but also the delay taken to emit its decision. To meet this aim, we will consider the ERDE metric proposed in [Losada & Crestani 2016] and other alternative evaluation measures. A full description of the evaluation metrics can be found in the eRisk 2019 overview.
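
As a rough illustration of how a delay-aware measure works, the following Python sketch implements ERDE as we understand its definition in [Losada & Crestani 2016]: false positives and false negatives incur fixed costs, and a true positive incurs a cost that grows with the number of writings `k` seen before the alert, following a sigmoid centred at the cutoff `o`. The function names and the default costs `c_fn = c_tp = 1` are assumptions of this sketch, so please check the overview paper for the official settings.

```python
import math
from typing import List, Tuple


def latency_cost(k: int, o: int) -> float:
    """lc_o(k) = 1 - 1 / (1 + exp(k - o)), written to avoid overflow for large k."""
    x = k - o
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def erde(decision: int, truth: int, k: int, o: int,
         c_fp: float, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Cost of one user's decision, emitted after seeing k writings."""
    if decision == 1 and truth == 0:
        return c_fp                          # false positive
    if decision == 0 and truth == 1:
        return c_fn                          # false negative
    if decision == 1 and truth == 1:
        return latency_cost(k, o) * c_tp     # true positive, penalised by delay
    return 0.0                               # true negative


def mean_erde(results: List[Tuple[int, int, int]], o: int, c_fp: float) -> float:
    """Average ERDE over users; each result is (decision, truth, k)."""
    return sum(erde(d, t, k, o, c_fp) for d, t, k in results) / len(results)


# A correct alert exactly at the cutoff (k == o) already costs half of c_tp.
cost_at_cutoff = erde(decision=1, truth=1, k=5, o=5, c_fp=0.1)
```

Lower values are better: an alert raised well before the cutoff costs almost nothing, while a correct alert raised long after the cutoff costs nearly as much as a miss.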

The proceedings of the lab will be published in the online CEUR-WS Proceedings and on the conference website.

To have access to the collection, all participants have to fill in, sign and send a user agreement form (follow the instructions provided here). Once you have submitted the signed form, you can proceed to register for the lab at the CLEF 2020 Labs Registration site.

Task 2: Measuring the severity of the signs of depression

This is a continuation of eRisk 2019's T3 task. The task consists of estimating the level of depression from a thread of user submissions. For each user, the participants will be given a history of postings and will have to fill in a standard depression questionnaire (based on the evidence found in the history of postings).

The questionnaires are defined from Beck's Depression Inventory (BDI), which assesses the presence of feelings like sadness, pessimism, loss of energy, etc. The questionnaire has the following 21 questions:


This questionnaire consists of 21 groups of statements. Please read each group of statements
carefully, and then pick out the one statement in each group that best describes the way you feel.
If several statements in the group seem to apply equally well, choose the highest
number for that group.

1. Sadness
0. I do not feel sad.
1. I feel sad much of the time.
2. I am sad all the time.
3. I am so sad or unhappy that I can't stand it.

2. Pessimism
0. I am not discouraged about my future.
1. I feel more discouraged about my future than I used to be.
2. I do not expect things to work out for me.
3. I feel my future is hopeless and will only get worse.

3. Past Failure
0. I do not feel like a failure.
1. I have failed more than I should have.
2. As I look back, I see a lot of failures.
3. I feel I am a total failure as a person.

4. Loss of Pleasure
0. I get as much pleasure as I ever did from the things I enjoy.
1. I don't enjoy things as much as I used to.
2. I get very little pleasure from the things I used to enjoy.
3. I can't get any pleasure from the things I used to enjoy.

5. Guilty Feelings
0. I don't feel particularly guilty.
1. I feel guilty over many things I have done or should have done.
2. I feel quite guilty most of the time.
3. I feel guilty all of the time.

6. Punishment Feelings
0. I don't feel I am being punished.
1. I feel I may be punished.
2. I expect to be punished.
3. I feel I am being punished.

7. Self-Dislike
0. I feel the same about myself as ever.
1. I have lost confidence in myself.
2. I am disappointed in myself.
3. I dislike myself.

8. Self-Criticalness
0. I don't criticize or blame myself more than usual.
1. I am more critical of myself than I used to be.
2. I criticize myself for all of my faults.
3. I blame myself for everything bad that happens.

9. Suicidal Thoughts or Wishes
0. I don't have any thoughts of killing myself.
1. I have thoughts of killing myself, but I would not carry them out.
2. I would like to kill myself.
3. I would kill myself if I had the chance.

10. Crying
0. I don't cry anymore than I used to.
1. I cry more than I used to.
2. I cry over every little thing.
3. I feel like crying, but I can't.

11. Agitation
0. I am no more restless or wound up than usual.
1. I feel more restless or wound up than usual.
2. I am so restless or agitated that it's hard to stay still.
3. I am so restless or agitated that I have to keep moving or doing something.

12. Loss of Interest
0. I have not lost interest in other people or activities.
1. I am less interested in other people or things than before.
2. I have lost most of my interest in other people or things.
3. It's hard to get interested in anything.

13. Indecisiveness
0. I make decisions about as well as ever.
1. I find it more difficult to make decisions than usual.
2. I have much greater difficulty in making decisions than I used to.
3. I have trouble making any decisions.

14. Worthlessness
0. I do not feel I am worthless.
1. I don't consider myself as worthwhile and useful as I used to.
2. I feel more worthless as compared to other people.
3. I feel utterly worthless.

15. Loss of Energy
0. I have as much energy as ever.
1. I have less energy than I used to have.
2. I don't have enough energy to do very much.
3. I don't have enough energy to do anything.

16. Changes in Sleeping Pattern
0. I have not experienced any change in my sleeping pattern.
1a. I sleep somewhat more than usual.
1b. I sleep somewhat less than usual.
2a. I sleep a lot more than usual.
2b. I sleep a lot less than usual.
3a. I sleep most of the day.
3b. I wake up 1-2 hours early and can't get back to sleep.

17. Irritability
0. I am no more irritable than usual.
1. I am more irritable than usual.
2. I am much more irritable than usual.
3. I am irritable all the time.

18. Changes in Appetite
0. I have not experienced any change in my appetite.
1a. My appetite is somewhat less than usual.
1b. My appetite is somewhat greater than usual.
2a. My appetite is much less than before.
2b. My appetite is much greater than usual.
3a. I have no appetite at all.
3b. I crave food all the time.

19. Concentration Difficulty
0. I can concentrate as well as ever.
1. I can't concentrate as well as usual.
2. It's hard to keep my mind on anything for very long.
3. I find I can't concentrate on anything.

20. Tiredness or Fatigue
0. I am no more tired or fatigued than usual.
1. I get more tired or fatigued more easily than usual.
2. I am too tired or fatigued to do a lot of the things I used to do.
3. I am too tired or fatigued to do most of the things I used to do.

21. Loss of Interest in Sex
0. I have not noticed any recent change in my interest in sex.
1. I am less interested in sex than I used to be.
2. I am much less interested in sex now.
3. I have lost interest in sex completely.


This task therefore aims at exploring the viability of automatically estimating the severity of multiple symptoms associated with depression. Given the user's history of writings, the algorithms have to estimate the user's response to each individual question. We collected questionnaires filled in by Social Media users together with their history of writings (we extracted each history of writings right after the user provided us with the filled-in questionnaire). The questionnaires filled in by the users (ground truth) will be used to assess the quality of the responses provided by the participating systems.

The participants will be given a dataset with multiple users (for each user, their history of writings is provided) and they will be asked to produce a file with the following structure:

username1 answer1 answer2 .... answer21
username2 ....

Each line has the username and 21 values. These values correspond to the responses to the questions above. The possible values are 0, 1a, 1b, 2a, 2b, 3a, 3b for questions 16 and 18, and 0, 1, 2, 3 for the rest of the questions.
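
A small Python sketch of how one line of this file could be built and checked before submission is given below. The helper name `format_line` is hypothetical, and the use of a single space as separator is an assumption based on the example above.

```python
from typing import List

# Allowed responses, as listed in the task description.
VALID_DEFAULT = {"0", "1", "2", "3"}
VALID_AB = {"0", "1a", "1b", "2a", "2b", "3a", "3b"}
AB_QUESTIONS = {16, 18}   # sleeping pattern and appetite


def format_line(username: str, answers: List[str]) -> str:
    """Validate the 21 answers of one user and return the output line."""
    if len(answers) != 21:
        raise ValueError(f"expected 21 answers, got {len(answers)}")
    for question, answer in enumerate(answers, start=1):
        allowed = VALID_AB if question in AB_QUESTIONS else VALID_DEFAULT
        if answer not in allowed:
            raise ValueError(f"invalid answer {answer!r} for question {question}")
    # Assumption: fields are separated by a single space.
    return " ".join([username, *answers])


answers = ["0"] * 21
answers[15] = "1a"   # question 16
answers[17] = "2b"   # question 18
line = format_line("username1", answers)
```

Validating each answer against the set allowed for its question catches common slips, such as writing a plain 1 for question 16 or 1a for any other question.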

The 2020 participants will be given 2019's questionnaires and the ground-truth responses and, thus, 2019 users can be used for training purposes.

Evaluation will be based on:

  • the overlap between the questionnaire filled in by the real user and the questionnaire filled in by the system (number of correct responses).

  • the absolute difference between the levels of depression obtained from both questionnaires (level of depression obtained from the real questionnaire vs level of depression obtained from the estimated questionnaire). The level of depression is simply obtained by summing the numeric values of the responses to the individual questions. This gives an integer value in the range 0-63.

  • the number of users that are correctly categorized (automatic questionnaire vs real questionnaire). The depression level obtained from this questionnaire is regularly used to categorize users as: minimal depression (0-9), mild depression (10-18), moderate depression (19-29), and severe depression (30-63).
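
The three criteria above can be sketched in Python for a single user as follows. This is only an illustration, not the official evaluation script: the function names are hypothetical, and counting a response as correct only on an exact match (so that 1a and 1b differ) is an assumption.

```python
from typing import List


def numeric_value(answer: str) -> int:
    """Numeric part of a response: '2b' -> 2, '3' -> 3."""
    return int(answer[0])


def depression_level(answers: List[str]) -> int:
    """Sum of the numeric values of the 21 responses (range 0-63)."""
    return sum(numeric_value(a) for a in answers)


def category(level: int) -> str:
    """Map a depression level to the category ranges given above."""
    if level <= 9:
        return "minimal"
    if level <= 18:
        return "mild"
    if level <= 29:
        return "moderate"
    return "severe"


def correct_responses(real: List[str], estimated: List[str]) -> int:
    """Number of questions where both questionnaires agree (exact match assumed)."""
    return sum(r == e for r, e in zip(real, estimated))


real = ["1"] * 21
real[15], real[17] = "1a", "1b"            # questions 16 and 18
estimated = ["0"] * 21
estimated[15], estimated[17] = "1a", "2a"

hits = correct_responses(real, estimated)
level_gap = abs(depression_level(real) - depression_level(estimated))
same_category = category(depression_level(real)) == category(depression_level(estimated))
```

In this toy case the real questionnaire sums to 21 (moderate) and the estimated one to 3 (minimal), so the system gets one correct response, a level difference of 18, and a wrong category.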

Important Dates

  • 20/11/2019: Registration for lab opens
  • 30/11/2019: Release of the training data (T1, T2)
  • 01/02/2020: Release of the data (T2)
  • 02/03/2020: T1: Beginning of test stage (server opens)
  • 24/05/2020 (extended): T1: End of test stage (server closes). T2: deadline for submitting participants' results
  • 05/06/2020 (extended): Release of evaluation results to all participants
  • 30/06/2020 (extended): Task participant papers due
  • 15/07/2020 (extended): Notification of acceptance
  • 25/07/2020 (extended): Camera ready. Task participant papers

Preliminary results

Task 1

[Figure: Task 1 preliminary results]

[Figure: Task 1 preliminary results, ranking metrics]

Task 2

[Figure: Task 2 preliminary results]

Accepted papers

Thursday Sep 24th

Overview of eRisk. David E. Losada, Fabio Crestani, Javier Parapar

Elena Campillo Ageitos, Juan Martinez-Romo and Lourdes Araujo. NLP-UNED at eRisk 2020: self-harm early risk detection with sentiment analysis and linguistic features.

Diego Maupome, Maxime D. Armstrong, Raouf Moncef Belbahar, Josselin Alezot, Rhon Balassiano, Marc Queudot, Sébastien Mosser and Marie-Jean Meurs. Early Mental Health Risk Assessment through Writing Styles, Topics and Neural Models.

Friday Sep 25th

Hojjat Bagherzadeh Hosseinabad, Ehsan Fazl-Ersi and Abedin Vahedian. Detection of early sign of self-harm on Reddit using multi-level machine.

Alina Trifan, Pedro Salgado and José Luis Oliveira. BioInfo@UAVR at eRisk 2020: on the use of psycholinguistics features and machine learning for the classification and quantification of mental diseases.

Linda Achilles, Max Kisselew, Johannes Schäfer and Ralph Koelle. Using Surface and Semantic Features for Detecting Early Signs of Self-Harm in Social Media Postings.

Ana Sabina Uban and Paolo Rosso. Deep learning architectures and strategies for early detection of self-harm and depression level prediction.

Amina Madani, Fatima Boumahdi, Anfel Boukenaoui, Mohamed Chaouki Kritli and Hamza Hentabli. USDB at eRisk 2020: Deep learning models to measure the Severity of the Signs of Depression using Reddit Posts.

Rodrigo Martínez-Castaño, Amal Htait, Leif Azzopardi and Yashar Moshfeghi. Early Risk Detection of Self-Harm and Depression Severity using BERT-based Transformers.

Mario Ezra Aragon, A. Pastor López-Monroy and Manuel Montes-Y-Gómez. INAOE-CIMAT at eRisk 2020: Detecting Signs of Self-Harm using Sub-Emotions and Words.

eRisk Wrap-up Session


More information

+34 881 816 451
