eRisk 2018

Results

(April 24th, 2018) The tasks have finished! We received 45 contributions from 11 different institutions for Task 1 and 35 contributions from 9 institutions for Task 2. The list of participants is shown below:

Task 1: Depression

Participating institutions and their submitted files:

  • FH Dortmund, Germany: FHDO-BCSGA, FHDO-BCSGB, FHDO-BCSGC, FHDO-BCSGD, FHDO-BCSGE
  • IRIT, France: LIIRA, LIIRB, LIIRC, LIIRD, LIIRE
  • LIRMM, University of Montpellier, France: LIRMMA, LIRMMB, LIRMMC, LIRMMD, LIRMME
  • Instituto Tecnológico Superior del Oriente del Estado de Hidalgo, Mexico; Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico; Universidad de Houston, USA; Universidad Autónoma del Estado de Hidalgo, Mexico: PEIMEXA, PEIMEXB, PEIMEXC, PEIMEXD, PEIMEXE
  • Ramakrishna Mission Vivekananda Educational and Research Institute, Belur Math, West Bengal, India: RKMVERIA, RKMVERIB, RKMVERIC, RKMVERID, RKMVERIE
  • University of A Coruña, Spain: UDCA, UDCB, UDCC, UDCD, UDCE
  • Universidad Nacional de San Luis, Argentina: UNSLA, UNSLB, UNSLC, UNSLD, UNSLE
  • Universitat Pompeu Fabra, Spain: UPFA, UPFB, UPFC, UPFD
  • Université du Québec à Montréal, Canada: UQAMA
  • The Black Swan, Taiwan: TBSA
  • Tokushima University, Japan: TUA1A, TUA1B, TUA1C, TUA1D

We evaluated the contributed runs with Early Risk Detection Error (ERDE), an error measure that takes into account both the accuracy of the decisions and the delay in making them. More information about ERDE can be found in [Losada & Crestani 2016], and an illustrative sketch of the measure is given after the table. The following table reports the performance results, together with standard classification metrics: F1, Precision (P), and Recall (R). We look forward to learning the specifics of each early detection algorithm!

Run ERDE5 ERDE50 F1 P R
FHDO-BCSGA 9.21% 6.68% 0.61 0.56 0.67
FHDO-BCSGB 9.50% 6.44% 0.64 0.64 0.65
FHDO-BCSGC 9.58% 6.96% 0.51 0.42 0.66
FHDO-BCSGD 9.46% 7.08% 0.54 0.64 0.47
FHDO-BCSGE 9.52% 6.49% 0.53 0.42 0.72
LIIRA 9.46% 7.56% 0.50 0.61 0.42
LIIRB 10.03% 7.09% 0.48 0.38 0.67
LIIRC 10.51% 7.71% 0.42 0.31 0.66
LIIRD 10.52% 7.84% 0.42 0.31 0.66
LIIRE 9.78% 7.91% 0.55 0.66 0.47
LIRMMA 10.66% 9.16% 0.49 0.38 0.68
LIRMMB 11.81% 9.20% 0.36 0.24 0.73
LIRMMC 11.78% 9.02% 0.35 0.23 0.71
LIRMMD 11.32% 8.08% 0.32 0.22 0.57
LIRMME 10.71% 8.38% 0.37 0.29 0.52
PEIMEXA 10.30% 7.22% 0.38 0.28 0.62
PEIMEXB 10.30% 7.61% 0.45 0.37 0.57
PEIMEXC 10.07% 7.35% 0.37 0.29 0.51
PEIMEXD 10.11% 7.70% 0.39 0.35 0.44
PEIMEXE 10.77% 7.32% 0.35 0.25 0.57
RKMVERIA 10.14% 8.68% 0.52 0.49 0.54
RKMVERIB 10.66% 9.07% 0.47 0.37 0.65
RKMVERIC 9.81% 9.08% 0.48 0.67 0.38
RKMVERID 9.97% 8.63% 0.58 0.60 0.56
RKMVERIE 9.89% 9.28% 0.21 0.35 0.15
UDCA 10.93% 8.27% 0.26 0.17 0.53
UDCB 15.79% 11.95% 0.18 0.10 0.95
UDCC 9.47% 8.65% 0.18 0.13 0.29
UDCD 12.38% 8.54% 0.18 0.11 0.61
UDCE 9.51% 8.70% 0.18 0.13 0.29
UNSLA 8.78% 7.39% 0.38 0.48 0.32
UNSLB 8.94% 7.24% 0.40 0.35 0.46
UNSLC 8.82% 6.95% 0.43 0.38 0.49
UNSLD 10.68% 7.84% 0.45 0.31 0.85
UNSLE 9.86% 7.60% 0.60 0.53 0.70
UPFA 10.01% 8.28% 0.55 0.56 0.54
UPFB 10.71% 8.60% 0.48 0.37 0.70
UPFC 10.26% 9.16% 0.53 0.48 0.61
UPFD 10.16% 9.79% 0.42 0.42 0.42
UQAMA 10.04% 7.85% 0.42 0.32 0.62
TBSA 10.81% 9.22% 0.37 0.29 0.52
TUA1A 10.19% 9.70% 0.29 0.31 0.27
TUA1B 10.40% 9.54% 0.27 0.25 0.28
TUA1C 10.86% 9.51% 0.47 0.35 0.71
TUA1D - - 0.00 0.00 0.00
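
For reference, here is a minimal Python sketch of how ERDE_o can be computed, based on the definition in [Losada & Crestani 2016] (the cost settings, in particular c_fp, are assumptions; the erisk_eval.py script distributed with the training data is the reference implementation):

    import math

    # decisions: dict subject -> (code, k), where code is 1 (risk) or 2 (non-risk)
    #            and k is the number of writings seen before deciding
    # truth:     dict subject -> 1 (risk case) or 0 (non-risk case)
    def erde(decisions, truth, o=5, c_fp=0.1):
        costs = []
        for subject, (code, k) in decisions.items():
            is_risk = truth[subject] == 1
            said_risk = code == 1
            if said_risk and not is_risk:
                costs.append(c_fp)                 # false positive
            elif not said_risk and is_risk:
                costs.append(1.0)                  # false negative (c_fn = 1)
            elif said_risk and is_risk:
                # correct but late decisions cost more (c_tp = 1, sigmoid latency
                # factor); the exponent is clamped to avoid overflow for very
                # late decisions
                costs.append(1.0 - 1.0 / (1.0 + math.exp(min(k - o, 700.0))))
            else:
                costs.append(0.0)                  # true negative
        return 100.0 * sum(costs) / len(costs)     # reported as a percentage

With o = 5, for example, a correct risk decision after 3 writings incurs only a small cost, while the same correct decision after 50 writings costs almost as much as missing the subject entirely.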

Task 2: Anorexia

Participating institutions and their submitted files:

  • FH Dortmund, Germany: FHDO-BCSGA, FHDO-BCSGB, FHDO-BCSGC, FHDO-BCSGD, FHDO-BCSGE
  • IRIT, France: LIIRA, LIIRB, LIIRC, LIIRD, LIIRE
  • LIRMM, University of Montpellier, France: LIRMMA, LIRMMB, LIRMMC, LIRMMD, LIRMME
  • Instituto Tecnológico Superior del Oriente del Estado de Hidalgo, Mexico; Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico; Universidad de Houston, USA; Universidad Autónoma del Estado de Hidalgo, Mexico: PEIMEXA, PEIMEXB, PEIMEXC, PEIMEXD, PEIMEXE
  • Ramakrishna Mission Vivekananda Educational and Research Institute, Belur Math, West Bengal, India: RKMVERIA, RKMVERIB, RKMVERIC, RKMVERID, RKMVERIE
  • Universidad Nacional de San Luis, Argentina: UNSLA, UNSLB, UNSLC, UNSLD, UNSLE
  • Universitat Pompeu Fabra, Spain: UPFA, UPFB, UPFC, UPFD
  • The Black Swan, Taiwan: TBSA
  • Tokushima University, Japan: TUA1A, TUA1B, TUA1C, TUA1D

As in Task 1, we evaluated the contributed runs with ERDE (see the sketch above) and also report standard classification metrics: F1, Precision (P), and Recall (R).

Run ERDE5 ERDE50 F1 P R
FHDO-BCSGA 12.17% 7.98% 0.71 0.67 0.76
FHDO-BCSGB 11.75% 6.84% 0.81 0.84 0.78
FHDO-BCSGC 13.63% 9.64% 0.55 0.47 0.66
FHDO-BCSGD 12.15% 5.96% 0.81 0.75 0.88
FHDO-BCSGE 11.98% 6.61% 0.85 0.87 0.83
LIIRA 12.78% 10.47% 0.71 0.81 0.63
LIIRB 13.05% 10.33% 0.76 0.79 0.73
LIRMMA 13.65% 13.04% 0.54 0.52 0.56
LIRMMB 14.45% 12.62% 0.52 0.41 0.71
LIRMMC 16.06% 15.02% 0.42 0.28 0.78
LIRMMD 17.14% 14.31% 0.34 0.22 0.76
LIRMME 14.89% 12.69% 0.41 0.32 0.59
PEIMEXA 12.70% 9.25% 0.46 0.39 0.56
PEIMEXB 12.41% 7.79% 0.64 0.57 0.73
PEIMEXC 13.42% 10.50% 0.43 0.37 0.51
PEIMEXD 12.94% 9.86% 0.67 0.61 0.73
PEIMEXE 12.84% 10.82% 0.31 0.28 0.34
RKMVERIA 12.17% 8.63% 0.67 0.82 0.56
RKMVERIB 12.93% 12.31% 0.46 0.81 0.32
RKMVERIC 12.85% 12.85% 0.25 0.86 0.15
RKMVERID 12.89% 12.89% 0.31 0.80 0.20
RKMVERIE 12.93% 12.31% 0.46 0.81 0.32
UNSLA 12.48% 12.00% 0.17 0.57 0.10
UNSLB 11.40% 7.82% 0.61 0.75 0.51
UNSLC 11.61% 7.82% 0.61 0.75 0.51
UNSLD 12.93% 9.85% 0.79 0.91 0.71
UNSLE 12.93% 10.13% 0.74 0.90 0.63
UPFA 13.18% 11.34% 0.72 0.74 0.71
UPFB 13.01% 11.76% 0.65 0.81 0.54
UPFC 13.17% 11.60% 0.73 0.76 0.71
UPFD 12.93% 12.30% 0.60 0.86 0.46
TBSA 13.65% 11.14% 0.67 0.60 0.76
TUA1A - - 0.00 0.00 0.00
TUA1B 19.90% 19.27% 0.25 0.15 0.76
TUA1C 13.53% 12.57% 0.36 0.42 0.32

Training data

The training data will be sent to all registered participants on Nov 30th, 2017 (tentative date).

To have access to the collection, all participants have to fill in, sign, and send a user agreement form (follow the instructions provided here). Once you have submitted the signed copyright form, you can proceed to register for the lab at the CLEF 2018 Labs Registration site.


The training data contain the following components:

  • risk_golden_truth.txt: this file contains the ground truth (one line per subject). For task 1, the code 1 means that the subject is a risk case of depression, while 0 means that the subject is a non-risk case. For task 2, the code 1 means that the subject is a risk case of anorexia, while 0 means that the subject is a non-risk case.
  • positive_examples_anonymous_chunks: this folder, which stores all the posts of the risk cases, contains 10 subfolders. Each subfolder corresponds to one chunk. Chunk 1 contains the oldest writings of all users (first 10% of submitted posts or comments), chunk 2 contains the second oldest writings, and so forth. The names of the files follow the convention: subjectname_chunknumber.xml
  • negative_examples_anonymous_chunks: this folder, which stores all the posts of the non-risk cases, contains 10 subfolders. Each subfolder corresponds to one chunk. Chunk 1 contains the oldest writings of all users (first 10% of submitted posts or comments), chunk 2 contains the second oldest writings, and so forth. The names of the files follow the convention: subjectname_chunknumber.xml
  • scripts evaluation (see below)

This is the training data and, therefore, you get all chunks now. But you should adapt your algorithms so that the chunks are processed in sequence (for example, do not process chunk3 if you have not processed chunk1 and chunk2), as in the sketch below.
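
A minimal sketch of this sequential processing, assuming subfolders named chunk1 ... chunk10 inside the folders above and the subjectname_chunknumber.xml naming convention (adjust the folder names to the actual layout of the collection):

    import os
    import xml.etree.ElementTree as ET

    ROOT = "positive_examples_anonymous_chunks"    # same idea for the negative folder

    for chunk in range(1, 11):                     # process the chunks strictly in order
        chunk_dir = os.path.join(ROOT, "chunk" + str(chunk))
        for fname in sorted(os.listdir(chunk_dir)):
            # files are named subjectname_chunknumber.xml
            subject = fname.rsplit("_", 1)[0]
            tree = ET.parse(os.path.join(chunk_dir, fname))
            # feed this chunk's writings for `subject` to your early detection model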

SCRIPTS FOR EVALUATION:

To facilitate your experiments, we provide two scripts that could be of help during the training stage. These scripts are in the scripts evaluation folder.

We recommend that you follow these steps:

  • Use your early detection algorithm to process the chunk1 files and produce your first output file (e.g. usc_1.txt). This file should follow the format described in the test instructions (see the "Test data" section below: 0/1/2 for each subject).

    Do the same for all the chunki files (i = 2, ..., 10). When you process the chunki files, it is OK to use information from the chunkj files (for j <= i). Note that the chunkj files (j = 1, ..., i) contain all posts/comments that you have seen after the ith release of data.

  • You now have your 10 output files (e.g. usc_1.txt ... usc_10.txt). As argued above, you need to make a decision on every subject (you cannot say 0 all the time), so every subject needs to have 1/2 assigned in some of your output files.

    Use the aggregate_results.py script to combine your output files into a global output file. This aggregation script has two inputs: 1) the folder where you have your 10 output files, and 2) the path to the writings_per_subject_all_train.txt file, which stores the number of writings per subject. This is required because we need to know how many writings were needed to make each decision. For instance, if subject_k has a total of 500 writings in the collection, then every chunk has 50 writings from subject_k; if your team needed 2 chunks to make a decision on subject_k, then we will store 100 as the number of writings that you needed to make this decision. An illustrative sketch of this aggregation logic is given after these steps.

    Example of usage: $ python aggregate_results.py -path <folder with your 10 output files> -wsource <path to the writings_per_subject_all_train.txt file>

    This script creates a file, e.g. usc_global.txt, which stores your final decision on every subject and the number of writings that you saw before making that decision.

  • Get the final performance results from the erisk_eval.py script. It has three inputs: a) the path to the golden truth file (risk_golden_truth.txt), b) the path to the overall output file, and c) the value of o (the delay parameter of the ERDE metric).

    Example: $ python erisk_eval.py -gpath <path to the risk_golden_truth.txt file> -ppath <path to the overall output file> -o <value of the ERDE delay parameter>

    Example: $ python erisk_eval.py -gpath ../risk_golden_truth.txt -ppath ../folder/usc_global.txt -o 5
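
An illustrative sketch of the aggregation logic described above (a reimplementation for clarity, not the official script; the provided aggregate_results.py is the reference implementation, and the file-parsing details here are assumptions):

    # per_chunk_files: your 10 output files in order (usc_1.txt ... usc_10.txt)
    # writings_per_subject: dict subject -> total number of writings in the collection
    def aggregate(per_chunk_files, writings_per_subject):
        final = {}                                 # subject -> (code, writings seen)
        for chunk, path in enumerate(per_chunk_files, start=1):
            with open(path) as f:
                for line in f:
                    subject, code = line.split()   # subject id, then 0/1/2
                    if subject in final or int(code) == 0:
                        continue                   # first emitted decision wins
                    # after chunk i you have seen i/10 of the subject's writings
                    seen = chunk * writings_per_subject[subject] // 10
                    final[subject] = (int(code), seen)
        return final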

Test data

At test time, we will first release chunk1 for the test subjects and ask you for your output. A few days later, we will release chunk2, and so forth. The format required for the output file to be sent after each release of test data will be the following:

  • 2-column text file. The name of the file should be ORG_n.txt (where ORG is an acronym for your organization and n is the chunk number; e.g. usc_1.txt). The file should contain one line per user in the test collection:

    test_subject_id1 CODE
    test_subject_id2 CODE
    .......................


    IMPORTANT NOTE: you have to put exactly two tabs between the subject name and the CODE (otherwise, the Python evaluation script does not work!). A minimal sketch of writing such a file is given below, after the list of codes.

    test_subject_idn is the id of the nth test subject (the ID field in the XML files)

    CODE is your decision about the subject, three possible values:

    • CODE=0 means that you don't want to emit a decision on this subject (you want to wait and see more evidence)
    • For task 1, CODE=1 means that you want to emit a decision on this subject, and your decision is that he/she is a risk case of depression. For task 2, CODE=1 means that you want to emit a decision on this subject, and your decision is that he/she is a risk case of anorexia.
    • For task 1, CODE=2 means that you want to emit a decision on this subject, and your decision is that he/she is NOT a risk case of depression. For task 2, CODE=2 means that you want to emit a decision on this subject, and your decision is that he/she is NOT a risk case of anorexia.
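
    A minimal sketch of writing a valid output file, using a hypothetical team acronym usc and hypothetical subject ids (note the two tabs between the subject id and the CODE):

        # decisions made after chunk 1; 0 = no decision yet (hypothetical data)
        decisions = {"test_subject_id1": 0, "test_subject_id2": 1}

        with open("usc_1.txt", "w") as out:        # ORG_n.txt with ORG=usc, n=1
            for subject, code in decisions.items():
                out.write(subject + "\t\t" + str(code) + "\n")   # exactly two tabs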

If you emit a decision on a subject then any future decision on the same subject will be ignored. For simplicity, you can include all subjects in all your submitted files but, for each user, your algorithm will be evaluated based on the first file that contains a decision on the subject. And you cannot say 0 all the time: at some point you need to make a decision on every subject (i.e. at the latest, after the 10th chunk, you need to emit your decision).

If a team does not submit the required file before the deadline then we'll take the previous file from the same team and assume that all things stay the same (no new emissions for this round).

If a team does not submit the file for the first round, then we'll assume that the team does not make any decisions (all subjects set to 0, i.e. no decision).

Each team can experiment with several models for this task and submit up to 5 files for each round. If you test different models then the files should be named: ORGA_n.txt (decisions after the nth chunk by model A), ORGB_n.txt (decisions after the nth chunk by model B), etc.

More info: [Losada & Crestani 2016]
