This document provides all the details needed to access the eRisk 2025 research collection.

Any scientific publication derived from the use of this collection should explicitly refer to the following publication:

Crestani, F., Losada, D. E., & Parapar, J. (2022). Early Detection of Mental Health Disorders by Social Media Monitoring. Springer, Cham.

The eRisk 2025 collections are available for research purposes under proper user agreements.

Data

Task 1

The collection contains sentences written by redditors. It is formatted in TREC style, with each document holding one sentence together with the sentences that precede and follow it:
<DOC>
  <DOCNO> SENTENCE_ID </DOCNO>
  <PRE> previous sentence text </PRE>
  <TEXT> sentence text </TEXT>
  <POST> next sentence text </POST>
</DOC>
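
A minimal sketch of how records in this layout could be read in Python. It assumes that a file is a plain sequence of <DOC> blocks with the four tags shown above, and it uses regular expressions rather than an XML parser because TREC-style files often have no single root element. The function name parse_trec and the identifier s_1 are illustrative and do not come from the collection.

```python
import re
from typing import Dict, Iterator

# One <DOC>...</DOC> block; DOTALL lets "." span line breaks.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
# Any of the four fields shown in the layout above.
FIELD_RE = re.compile(r"<(DOCNO|PRE|TEXT|POST)>(.*?)</\1>", re.DOTALL)


def parse_trec(raw: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per <DOC> block, with surrounding whitespace stripped."""
    for doc in DOC_RE.finditer(raw):
        yield {tag: value.strip() for tag, value in FIELD_RE.findall(doc.group(1))}


# Hypothetical sample; "s_1" stands in for a real SENTENCE_ID.
sample = """<DOC>
  <DOCNO> s_1 </DOCNO>
  <PRE> previous sentence text </PRE>
  <TEXT> sentence text </TEXT>
  <POST> next sentence text </POST>
</DOC>"""

records = list(parse_trec(sample))
print(records[0]["DOCNO"], "|", records[0]["TEXT"])
```

For a real file, the string passed to parse_trec would be the file contents read as text; the encoding to use is not stated in this document.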

Task 2

In the dataset, there are two types of instances: submissions and comments. Submissions represent the primary posts created by users. They are the main content entries, often containing a title, a body, and additional metadata such as the author and date. Comments are the responses or replies made by users to a submission or to other comments, forming a hierarchical structure. Each comment includes information about the author, content, and its parent (which could be another comment or a submission).

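Submission Fields: submissionId, author, date, body, title, number, targetSubject, comments (the list of comments attached to the submission).

Comment Fields: commentId, author, date, body, parent (the ID of the submission or comment being replied to).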

The files are in JSON format:
        [
        {
            "submissionId": "mdB60ef",
            "author": "subject_lEQN6dA",
            "date": "2023-03-08T17:26:33.000+00:00",
            "body": "...",
            "title": "...",
            "number": 3,
            "targetSubject": "subject_6wEJkcb",
            "comments": [
                {
                    "commentId": "UspY8Bg",
                    "author": "subject_6wEJkcb",
                    "date": "2023-03-08T17:51:42.000+00:00",
                    "body": "...",
                    "parent": "mdB60ef"
                },
                ...
                {
                    "commentId": "nsnT1GB",
                    "author": "subject_ifthvcc",
                    "date": "2023-03-22T19:15:33.000+00:00",
                    "body": "...",
                    "parent": "bmC4ctO"
                }
            ]
        },
        {
            "submissionId": "0F6QmWR",
            "author": "subject_Wotqigb",
            "date": "2024-11-02T20:53:53.000+00:00",
            "body": "...",
            "title": "...",
            "number": 3,
            "targetSubject": "subject_pypfjky",
            "comments": [
                {
                    "commentId": "Oeas2Wu",
                    "author": "subject_pypfjky",
                    "date": "2024-11-02T21:55:41.000+00:00",
                    "body": "...",
                    "parent": "K3Z1yt8"
                },
                {
                    "commentId": "5CTC18p",
                    "author": "subject_2DDad7j",
                    "date": "2024-11-02T21:03:09.000+00:00",
                    "body": "...",
                    "parent": "0F6QmWR"
                },
                ...
                {
                    "commentId": "ZqEqil6",
                    "author": "subject_pypfjky",
                    "date": "2024-11-02T21:09:50.000+00:00",
                    "body": "...",
                    "parent": "0F6QmWR"
                }
            ]
        }
        ]
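
A minimal sketch of how such a file could be loaded and its reply hierarchy recovered in Python. The sample is a trimmed copy of the first submission above, with the elided "..." entry dropped so that it parses as valid JSON. The helper names replies_by_parent and comments_by are illustrative. Filtering by targetSubject assumes that this field names the user whose writings are of interest, which the example suggests but this document does not state.

```python
import json
from collections import defaultdict
from datetime import datetime

# Trimmed, valid-JSON version of the first submission in the example above.
sample = """
[
  {
    "submissionId": "mdB60ef",
    "author": "subject_lEQN6dA",
    "date": "2023-03-08T17:26:33.000+00:00",
    "body": "...",
    "title": "...",
    "number": 3,
    "targetSubject": "subject_6wEJkcb",
    "comments": [
      {"commentId": "UspY8Bg", "author": "subject_6wEJkcb",
       "date": "2023-03-08T17:51:42.000+00:00", "body": "...",
       "parent": "mdB60ef"},
      {"commentId": "nsnT1GB", "author": "subject_ifthvcc",
       "date": "2023-03-22T19:15:33.000+00:00", "body": "...",
       "parent": "bmC4ctO"}
    ]
  }
]
"""


def replies_by_parent(submission):
    """Map each parent ID (submission or comment) to the IDs of its direct replies."""
    replies = defaultdict(list)
    for comment in submission["comments"]:
        replies[comment["parent"]].append(comment["commentId"])
    return dict(replies)


def comments_by(submission, author):
    """Return the comments written by `author`, oldest first."""
    own = [c for c in submission["comments"] if c["author"] == author]
    return sorted(own, key=lambda c: datetime.fromisoformat(c["date"]))


submissions = json.loads(sample)
first = submissions[0]

tree = replies_by_parent(first)
target_comments = comments_by(first, first["targetSubject"])

print(tree)
print([c["commentId"] for c in target_comments])
```

A comment whose parent equals the submissionId is a top-level reply; any other parent value refers to another comment in the same thread.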
  

User agreement

This collection can only be used for research purposes. If you are interested in accessing this data, please fill in the following user agreement and send it to anxo.pvila@udc.es.