The process works as follows:
Each team that registers for the task gets a token identifier and is assigned a number of runs (variants), up to 5 per team. Please be aware that all your runs have to send all decisions (for all users) before receiving the next round of user writings. Otherwise, one run would have access to information that does not belong to its round.
Each team has to connect to our REST API server (using its token) and the server will iteratively provide user writings (each request will be followed by a server response containing one writing per subject). More specifically,
send a GET request to:
URL for the unofficial server, for testing purposes only (it will be switched off once the test phase starts):
https://erisk.irlab.org/challenge-service/getdiscussions/<team token>
URL for the official T2 server (it will be switched on when the test phase starts):
https://erisk.irlab.org/challenge-t2/getdiscussions/<team token>
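The GET step can be sketched in Python with only the standard library; the base URL follows the official T2 endpoint above, and the token value is a placeholder for the one your team receives:

```python
import json
import urllib.request

BASE_URL = "https://erisk.irlab.org/challenge-t2"  # official T2 server
TEAM_TOKEN = "YOUR_TEAM_TOKEN"  # placeholder: use the token assigned to your team

def get_round_url(base_url, token):
    """Build the URL that returns one new writing per active subject."""
    return f"{base_url}/getdiscussions/{token}"

def fetch_round(base_url, token):
    """Request the next round of user writings and decode the JSON body."""
    with urllib.request.urlopen(get_round_url(base_url, token)) as response:
        return json.loads(response.read().decode("utf-8"))

# Example (requires a valid token and network access):
# writings = fetch_round(BASE_URL, TEAM_TOKEN)
```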
If it is the first request, the server outputs the first discussion of each user. Keep your own record of all the users from the first round, because you will need that list in every round. In each round, you will receive one thread where the targetUser participated. In that thread, the targetUser can have one or more comments. The format of the output is:
[
  {
    "submissionId": "mdB60ef",
    "author": "subject_lEQN6dA",
    "date": "2023-03-08T17:26:33.000+00:00",
    "body": "...",
    "title": "...",
    "number": 3,
    "targetSubject": "subject_6wEJkcb",
    "comments": [
      {
        "commentId": "UspY8Bg",
        "author": "subject_6wEJkcb",
        "date": "2023-03-08T17:51:42.000+00:00",
        "body": "...",
        "parent": "mdB60ef"
      },
      ...
      {
        "commentId": "nsnT1GB",
        "author": "subject_ifthvcc",
        "date": "2023-03-22T19:15:33.000+00:00",
        "body": "...",
        "parent": "bmC4ctO"
      }
    ]
  },
  {
    "submissionId": "0F6QmWR",
    "author": "subject_Wotqigb",
    "date": "2024-11-02T20:53:53.000+00:00",
    "body": "...",
    "title": "...",
    "number": 3,
    "targetSubject": "subject_pypfjky",
    "comments": [
      {
        "commentId": "Oeas2Wu",
        "author": "subject_pypfjky",
        "date": "2024-11-02T21:55:41.000+00:00",
        "body": "...",
        "parent": "K3Z1yt8"
      },
      {
        "commentId": "5CTC18p",
        "author": "subject_2DDad7j",
        "date": "2024-11-02T21:03:09.000+00:00",
        "body": "...",
        "parent": "0F6QmWR"
      },
      ...
      {
        "commentId": "ZqEqil6",
        "author": "subject_pypfjky",
        "date": "2024-11-02T21:09:50.000+00:00",
        "body": "...",
        "parent": "0F6QmWR"
      }
    ]
  }
]
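A sketch of extracting, from one round's response, the text written by each target subject. Field names are taken from the sample above; the inline sample data below is abbreviated and illustrative:

```python
def texts_by_target(round_data):
    """Map each targetSubject to the bodies of its own posts in the thread.

    The submission itself may be authored by the target subject, in which
    case its body is included as well.
    """
    texts = {}
    for thread in round_data:
        target = thread["targetSubject"]
        bodies = texts.setdefault(target, [])
        if thread["author"] == target:
            bodies.append(thread["body"])
        for comment in thread["comments"]:
            if comment["author"] == target:
                bodies.append(comment["body"])
    return texts

# Abbreviated sample in the format shown above:
sample = [
    {
        "submissionId": "mdB60ef",
        "author": "subject_lEQN6dA",
        "body": "submission text",
        "targetSubject": "subject_6wEJkcb",
        "comments": [
            {"commentId": "UspY8Bg", "author": "subject_6wEJkcb",
             "body": "target's reply", "parent": "mdB60ef"},
            {"commentId": "nsnT1GB", "author": "subject_ifthvcc",
             "body": "another user's reply", "parent": "mdB60ef"},
        ],
    },
]

# texts_by_target(sample) -> {"subject_6wEJkcb": ["target's reply"]}
```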
In the dataset, there are two types of instances: Submissions and Comments.
Submissions represent the primary posts created by users. These are the main content entries, often containing a title, a body, and additional metadata such as the author and date.
Comments are responses or replies made by users to a submission or to other comments, forming a hierarchical structure. Each comment includes information about the author, its content, and its parent (which could be another comment or a submission).
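Since each comment carries a parent field, the reply hierarchy can be rebuilt from a flat comment list; a minimal sketch (field names as in the sample response above, sample ids are illustrative):

```python
def children_by_parent(thread):
    """Group a thread's comments by their parent id (a submission or a comment)."""
    tree = {}
    for comment in thread["comments"]:
        tree.setdefault(comment["parent"], []).append(comment["commentId"])
    return tree

# Illustrative thread: c1 and c3 reply to the submission, c2 replies to c1.
thread = {
    "submissionId": "s1",
    "comments": [
        {"commentId": "c1", "parent": "s1"},
        {"commentId": "c2", "parent": "c1"},
        {"commentId": "c3", "parent": "s1"},
    ],
}
# children_by_parent(thread) -> {"s1": ["c1", "c3"], "c1": ["c2"]}
```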
After each request, the participants' code has to run its own processing/prediction pipeline (e.g. text preprocessing, vectorization, ...) and send back to the server its prediction about each individual. The server will only provide the next round of writings after receiving the responses (all users and all runs) for the current round. To send the responses, each team has to send a POST request to:
Unofficial server (switched off once the test phase starts):
https://erisk.irlab.org/challenge-service/submit/<team token>/<run number>
Official T2 server (switched on when the test phase starts):
https://erisk.irlab.org/challenge-t2/submit/<team token>/<run number>
For example, a team with 5 runs has to send 5 POST requests to:
https://erisk.irlab.org/<challenge>/submit/<team token>/0
https://erisk.irlab.org/<challenge>/submit/<team token>/1
https://erisk.irlab.org/<challenge>/submit/<team token>/2
https://erisk.irlab.org/<challenge>/submit/<team token>/3
https://erisk.irlab.org/<challenge>/submit/<team token>/4
where <challenge> is challenge-service (unofficial server) or challenge-t2 (official T2 server). The content of the response has to follow this format:
[
  { "nick": "subject4170", "decision": 1, "score": 3.1 },
  { "nick": "subject4171", "decision": 0, "score": 1.2 },
  ...
]
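A sketch of building this payload and POSTing it for one run, again with only the standard library; the field names follow the format above, while the token and run number are placeholders:

```python
import json
import urllib.request

def build_payload(predictions):
    """Serialize a list of (nick, decision, score) tuples to the JSON format above."""
    return json.dumps(
        [{"nick": n, "decision": d, "score": s} for n, d, s in predictions]
    )

def submit_run(base_url, token, run_number, payload):
    """POST one run's decisions; the server advances only after all runs submit."""
    request = urllib.request.Request(
        f"{base_url}/submit/{token}/{run_number}",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()

# Example (requires a valid token and network access):
# payload = build_payload([("subject4170", 1, 3.1), ("subject4171", 0, 1.2)])
# submit_run("https://erisk.irlab.org/challenge-t2", "YOUR_TEAM_TOKEN", 0, payload)
```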
score is a numeric estimation of the level of self-harm.
decision=1 means "alert"; decisions equal to 1 are considered final (further decisions about this individual will be ignored).
decision=0 means "no alert"; these are not final (i.e. you can later submit an alert for this user if you see signs of risk).
We also want to explore ranking-based measures to evaluate the performance of the systems and, therefore, we also ask you to provide an estimated score of the level of self-harm (i.e. give us two things for each individual: the decision (0/1) and your estimation of the level of self-harm).
Once you have emitted an alert (decision=1), please keep processing the user's new writings and keep sending us the decision and score (the decision will be ignored, because only the first 1 counts, but the score will be used for evaluation purposes).
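Because only the first decision=1 counts, a simple way to keep your output consistent is to remember which subjects you have already alerted on; a minimal sketch (the class name is illustrative, not part of the API):

```python
class AlertTracker:
    """Remember which subjects already received a final alert (decision=1)."""

    def __init__(self):
        self.alerted = set()

    def decide(self, nick, decision, score):
        """Return the (nick, decision, score) triple to send this round.

        After the first alert for a subject, the server ignores later
        decisions, so we keep emitting 1 for that subject; only the score
        still changes round to round.
        """
        if decision == 1:
            self.alerted.add(nick)
        final_decision = 1 if nick in self.alerted else decision
        return (nick, final_decision, score)

tracker = AlertTracker()
tracker.decide("subject4170", 1, 3.1)  # -> ("subject4170", 1, 3.1)
tracker.decide("subject4170", 0, 0.4)  # -> ("subject4170", 1, 0.4)
```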
Once you have submitted all decisions for all users and runs, you can send a GET request again (see above) to obtain the next sequence of writings. The server checks that the team has submitted all entries for all runs; otherwise, it does not give you the next round of texts and simply gives you the current round again.
NOTE 1: the first round contains all users in the collection (because all users have at least one writing). However, after a few rounds, some users will disappear from the server's response. For example, a user with 25 writings will only appear in the first 25 rounds.
NOTE 2: to make things realistic, the server does not inform the participants that a given user message is the last one in the user's thread (i.e. we do not tell the participants "this is the final message of this user"). For example, given a subject with 25 writings, the server will send you the 1st,...,25th message but you will only know that this subject had a total of 25 writings after noticing that this subject does not appear at the 26th round. You can fire an alert (decision=1) at any point (even after the 26th round) but your system's performance will be evaluated based on the delay (measured from the number of rounds).
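Notes 1 and 2 suggest keeping your own per-user count of appearances, since the server never flags a user's last writing; a minimal sketch (the function name is illustrative):

```python
def update_user_rounds(seen_rounds, round_data):
    """Track how many rounds each target subject has appeared in so far.

    The first round defines the full user list; a user simply stops appearing
    once their writings are exhausted, with no explicit end-of-thread signal.
    """
    for thread in round_data:
        target = thread["targetSubject"]
        seen_rounds[target] = seen_rounds.get(target, 0) + 1
    return seen_rounds
```

Comparing the keys of this dictionary against the current round's subjects tells you which users have run out of writings.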
NOTE 3: the client will receive an empty list from the server when all writings are processed.
In your response, you always have to include all users; otherwise, the system will not give you the next round of data. If you participate with multiple runs, then all your runs have to send lists containing all users (again, otherwise the system will not give you the next round of data). This means that your runs need to be synchronized: we cannot allow one run to get the next round of data while the other runs are still processing the previous round because, technically, the faster run could leak information to the other runs from the same team.
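Putting the protocol together, the client loop alternates GET and POST until the server returns an empty list. This sketch takes the fetch/submit/predict steps as injected functions so it can be exercised without network access; the function names are illustrative, not part of the API:

```python
def run_client(fetch_round, submit_all_runs, predict):
    """Drive the GET/POST cycle described above; return the number of rounds.

    fetch_round() returns one round of writings (a list; empty when finished).
    predict(round_data) returns the decision list covering all users.
    submit_all_runs(decisions) POSTs every run's decisions before the next GET.
    """
    rounds = 0
    while True:
        round_data = fetch_round()
        if not round_data:  # empty list: all writings processed (NOTE 3)
            return rounds
        submit_all_runs(predict(round_data))
        rounds += 1
```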
To facilitate participation in eRisk, we have prepared an example of a client application that communicates with the server. The application is available at https://gitlab.irlab.org/anxo.pvila/erisk25-t2-dummy-client
This example is written in Python and automates the process of retrieving discussions and submitting decisions to the server. Of course, you can build your own application using other programming languages (as long as you fulfill the GET/POST request requirements described above).
To help you get started, you can first test your code with some dummy data on the unofficial server. We injected three dummy users into the server, and we can give you a token so that you can start preparing your software. Contact Anxo Perez (anxo.pvila AT udc DOT es) to get your access token.