CLEF eRisk: Early risk prediction on the Internet

The process works as follows:

Each team that registers for the task gets a token identifier and is assigned a number of runs (variants), up to 5 per team. Please be aware that all your runs have to send all decisions (for all users) before you can get the next round of user writings, so choose your number of runs wisely.

Each team has to connect to our REST API server (using its token), and the server will iteratively provide user writings (each request is followed by a server response containing one writing per subject). More specifically,
send a GET request to:
- URL for the unofficial server. It is just for testing purposes and will be off after the test phase starts: https://erisk.irlab.org/challenge-service/getwritings/<team token>
- URL for the T2 server. It will be on after the test phase starts: https://erisk.irlab.org/challenge-t2/getwritings/<team token>

On the first request, the server just outputs the first writing of each user. Keep a record of all the users from this first round, because you will need that list in every round. The format of the output is:

[
    {
        "id": 18752,
        "number": 0,
        "nick": "subject3798",
        "redditor": 18702,
        "title": "...",
        "content": "...",
        "date": "..."
    },
    {
        "id": 18772,
        "number": 0,
        "nick": "subject7495",
        "redditor": 18703,
        "title": "...",
        "content": "...",
        "date": "..."    
    },
	...
]

(number stores the round number, number=0 means that this is the first writing of the subjects, id is an internal identifier of the writing, and redditor is an internal identifier of the subject)
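As an illustration, the GET step and the bookkeeping of first-round users could be sketched in Python as follows. This is only a sketch under assumptions: the function names (writings_url, fetch_writings, first_round_users) are hypothetical, and the HTTP details (a UTF-8 JSON body, a 60-second timeout) are our guesses rather than documented server behaviour. The sample data is copied from the format shown above.

```python
import json
import urllib.request

BASE_URL = "https://erisk.irlab.org"


def writings_url(team_token, challenge="challenge-service"):
    """Build the getwritings URL; challenge is challenge-service or challenge-t2."""
    return f"{BASE_URL}/{challenge}/getwritings/{team_token}"


def fetch_writings(team_token, challenge="challenge-service", timeout=60):
    """Send the GET request and decode the JSON list of writings.

    Assumption: the body is UTF-8 encoded JSON.
    """
    url = writings_url(team_token, challenge)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def first_round_users(writings):
    """Return the nicks of all writings with number == 0, in server order."""
    return [w["nick"] for w in writings if w["number"] == 0]


# Two writings in the documented output format.
sample = [
    {"id": 18752, "number": 0, "nick": "subject3798", "redditor": 18702,
     "title": "...", "content": "...", "date": "..."},
    {"id": 18772, "number": 0, "nick": "subject7495", "redditor": 18703,
     "title": "...", "content": "...", "date": "..."},
]
all_users = first_round_users(sample)
```

Here all_users is the list that has to be kept for the rest of the run; fetch_writings is defined but not called, so the snippet does not contact the server.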

After each request, the participants’ code has to run its own processing/prediction pipeline (e.g. text preprocessing, vectorization, ...) and send back to the server its prediction about each individual. The server will only provide the next round of writings after receiving the responses (all users and all runs) for the current round. To send the responses, each team has to send a POST request to:

Unofficial server (off once the test phase starts): https://erisk.irlab.org/challenge-service/submit/<team token>/<run number>

T2 server (on once the test phase starts): https://erisk.irlab.org/challenge-t2/submit/<team token>/<run number>

For example, a team with 5 runs has to send 5 POST requests to:

https://erisk.irlab.org/<challenge>/submit/<team token>/0

https://erisk.irlab.org/<challenge>/submit/<team token>/1

https://erisk.irlab.org/<challenge>/submit/<team token>/2

https://erisk.irlab.org/<challenge>/submit/<team token>/3

https://erisk.irlab.org/<challenge>/submit/<team token>/4

where <challenge> is either challenge-service (unofficial server) or challenge-t2 (T2). The content of the response has to follow this format:

[
    {
        "nick": "subject4170",
        "decision": 1,
        "score": 3.1
    },
    {
        "nick": "subject4171",
        "decision": 0,
        "score": 1.2
    },
    ...
]

score is a numeric estimation of the level of self-harm

decision=1 means "alert" and decisions equal to 1 will be considered as final (further decisions about this individual will be ignored).

decision=0 means "no alert" and is not final (i.e. you can later submit an alert for this user if you see signs of risk).
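The submission step could look roughly like the Python sketch below. It builds the per-run submit URL and a payload in the format shown above. The helper names (submit_url, build_payload, post_decisions) are hypothetical, and sending the body as JSON with a Content-Type: application/json header is an assumption on our side, not something stated in these instructions.

```python
import json
import urllib.request


def submit_url(team_token, run, challenge="challenge-service"):
    """Build the submit URL for one run (runs are numbered from 0)."""
    return f"https://erisk.irlab.org/{challenge}/submit/{team_token}/{run}"


def build_payload(predictions):
    """Turn {nick: (decision, score)} into the list of objects the server expects."""
    return [
        {"nick": nick, "decision": int(decision), "score": float(score)}
        for nick, (decision, score) in predictions.items()
    ]


def post_decisions(team_token, run, payload, challenge="challenge-service", timeout=60):
    """POST one run's decisions and return the HTTP status code.

    Assumption: the server accepts a JSON body with this Content-Type header.
    """
    request = urllib.request.Request(
        submit_url(team_token, run, challenge),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status


payload = build_payload({"subject4170": (1, 3.1), "subject4171": (0, 1.2)})
urls = [submit_url("TOKEN", run, "challenge-t2") for run in range(5)]
```

A team with 5 runs would call post_decisions once per run (0 to 4), each time with a payload that covers every user.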

We also want to explore ranking-based measures to evaluate the performance of the systems and, therefore, we also ask you to provide an estimated score of the level of self-harm (i.e. give us two things for each individual: the decision (0/1) and your estimation of the level of self-harm).

Once you have emitted an alert (decision=1), please keep processing the user’s new writings and keep sending us the decision and score (the decision will be ignored because the first 1 is the only one that counts, but the score will be used for evaluation purposes).
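One possible way to handle this on the client side is to remember which users have already been alerted while still passing through fresh scores. The sketch below is only an illustration: the class AlertLatch is hypothetical, and repeating decision=1 after the first alert is a design choice of this example (the server ignores later decisions for that user anyway).

```python
class AlertLatch:
    """Remember which users already received decision=1."""

    def __init__(self):
        self.alerted = set()

    def apply(self, nick, decision, score):
        """Return the entry to submit for this user in the current round.

        The score is always the latest one; the decision stays at 1 once
        an alert has been emitted for this user.
        """
        if decision == 1:
            self.alerted.add(nick)
        final_decision = 1 if nick in self.alerted else 0
        return {"nick": nick, "decision": final_decision, "score": score}


latch = AlertLatch()
round_a = latch.apply("subject4170", 0, 0.4)  # no alert yet
round_b = latch.apply("subject4170", 1, 3.1)  # alert emitted here
round_c = latch.apply("subject4170", 0, 2.7)  # decision stays 1, score updates
```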

Once you have submitted all decisions for all users and runs, you can send another GET request (see above) to obtain the next round of writings. The server checks that the team has submitted all entries for all runs; otherwise, it does not give you the next round of texts and just returns the current round again.
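Because an incomplete submission makes the server return the same round again, a client may want to check the number field before processing a response. The following Python sketch shows one way to do this. The function names are hypothetical, and it assumes that all writings in one response carry the same number and that an empty list is the end-of-data signal (see NOTE 3).

```python
def round_number(writings):
    """Return the round number shared by all writings, or None for an empty list."""
    if not writings:
        return None
    numbers = {w["number"] for w in writings}
    if len(numbers) != 1:
        # Assumption: one response never mixes rounds.
        raise ValueError(f"mixed round numbers in one response: {sorted(numbers)}")
    return numbers.pop()


def classify_response(writings, last_round):
    """Classify a response as 'finished', 'new' or 'repeat'.

    last_round is the number of the last round already processed,
    or None before the first request.
    """
    current = round_number(writings)
    if current is None:
        return "finished"
    if last_round is None or current > last_round:
        return "new"
    return "repeat"
```

A "repeat" result suggests that at least one run has not yet submitted its full list for the current round.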

NOTE 1: the first round contains all users in the collection (because all users have at least one writing). However, after a few rounds, some users will disappear from the server's response. For example, a user with 25 writings will only appear in the first 25 rounds.

NOTE 2: to make things realistic, the server does not inform the participants that a given user message is the last one in the user's thread (i.e. we do not tell the participants "this is the final message of this user"). For example, given a subject with 25 writings, the server will send you the 1st,...,25th message but you will only know that this subject had a total of 25 writings after noticing that this subject does not appear at the 26th round. You can fire an alert (decision=1) at any point (even after the 26th round) but your system's performance will be evaluated based on the delay (measured from the number of rounds).

NOTE 3: the client will receive an empty list from the server when all writings are processed.

In your response, you always have to include all users (otherwise the system will not give you the next round of data). If you participate with multiple runs, then all your runs have to send their lists containing all users (otherwise the system will not give you the next round of data). This means that your runs need to be synchronized (we cannot allow a run to get the next round of data while the other runs are still processing the previous rounds because, technically, "the run that goes faster could inform :-) the other runs from the same team").
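Since some users stop appearing after a few rounds but every response still has to list all users, a client needs some policy for users without a new writing. The Python sketch below repeats the last known decision and score for them. The function complete_response, its parameters and the default (0, 0.0) for never-seen users are all assumptions of this example, and it reads "all users" as the user list recorded in the first round.

```python
def complete_response(all_users, round_predictions, last_known, default=(0, 0.0)):
    """Build a response that covers every user from the first round.

    all_users: nicks recorded in round 0.
    round_predictions: {nick: (decision, score)} for users with a writing this round.
    last_known: dict carried across rounds; updated in place.
    """
    last_known.update(round_predictions)
    response = []
    for nick in all_users:
        decision, score = last_known.get(nick, default)
        response.append({"nick": nick, "decision": decision, "score": score})
    return response


users = ["subject4170", "subject4171"]
memory = {}
first = complete_response(users, {"subject4170": (0, 0.1), "subject4171": (1, 2.0)}, memory)
# In the next round only subject4170 has a new writing.
second = complete_response(users, {"subject4170": (0, 0.4)}, memory)
```

In this example, second still contains subject4171 with the decision and score from the previous round.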

Client Application

To facilitate participation in eRisk, we have prepared an example of a client application that communicates with the server. The application is available at http://gitlab.irlab.org/javier/erisk-dummy-client/

This example is written in Java and Python and sends random decisions to the server. Of course, you can build your own application using other programming languages (as long as you comply with the GET/POST request requirements described above).

The main features of this client application are:

team token and runs: these two variables store the team token that you get from the eRisk organizers and the number of runs that you are sending to the server, respectively.
getUserWritings: this method communicates with the server in order to get the next round of writings
the main method iterates until the server has sent all writings. For each round (while loop), the inner for loop sends the decisions associated with each run
submitDecisions: this method communicates with the server in order to send the decisions associated with the current round. The for loop generates random decisions and scores. This is the part that you would need to modify in order to send decisions according to your predictive algorithm.
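The overall control flow described in this list can be sketched in Python as below. This is not the code of the actual dummy client: run_client and random_decisions are hypothetical names, the two server calls are passed in as plain functions so that the loop can be tried without network access, and the score range is arbitrary.

```python
import random


def random_decisions(users, rng):
    """Generate a random decision and score for every user (placeholder model)."""
    return [
        {"nick": nick, "decision": rng.randint(0, 1), "score": rng.uniform(0.0, 10.0)}
        for nick in users
    ]


def run_client(get_writings, submit_decisions, num_runs, rng=None):
    """Loop until the server returns an empty list; return the number of rounds.

    get_writings(): returns the list of writings for the current round.
    submit_decisions(run, payload): sends one run's decisions.
    """
    rng = rng or random.Random()
    users = None
    rounds = 0
    writings = get_writings()
    while writings:
        if users is None:
            # The first round contains every user; keep that list.
            users = [w["nick"] for w in writings]
        for run in range(num_runs):
            # Replace random_decisions with your own predictive algorithm.
            submit_decisions(run, random_decisions(users, rng))
        rounds += 1
        writings = get_writings()
    return rounds


# Usage with a fake in-memory server: two rounds, then an empty list.
fake_rounds = iter([
    [{"nick": "a", "number": 0}, {"nick": "b", "number": 0}],
    [{"nick": "a", "number": 1}],
    [],
])
sent = []
total = run_client(lambda: next(fake_rounds),
                   lambda run, payload: sent.append((run, payload)),
                   num_runs=2, rng=random.Random(0))
```

Note that every payload lists both users, even in the second round where only one of them has a new writing.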

Test the server now!!!

To facilitate your participation in eRisk, you can start testing your code with old data (unofficial server). We injected the 2018 test depression cases (T1 2018) into the server and we can give you a token so that you can start preparing your software. Contact Javier Parapar (javierparapar AT udc DOT es) to get your access token.

eRisk 2024 Server (Task 2)

More information

CLEF 2023 Conference & CLEF initiative: