FRENK-STYRIA-24sata is a dataset of moderated newspaper comments from the website 24sata.hr with metadata on the time of publishing, user identifier, thread identifier and whether the comment was deleted by the moderators or not. The full text of each comment is encrypted via a character-replacement method so that the comments are not readable by humans. Basic punctuation is not encrypted in order to enable tokenization. The main use of this dataset are experiments on automating comment moderation. For real-world usage, a fastText classification model trained on non-encrypted data is made available as well.
ARRS (Slovenian Research Agency)J7-8280"FRENK: Resources, methods, and tools for the understanding, identification, and classification of various forms of socially unacceptable discourse in the information society"