Show simple item record

 
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Fišer, Darja
dc.date.accessioned 2018-10-27T13:53:27Z
dc.date.available 2018-10-27T13:53:27Z
dc.date.issued 2018-10-27
dc.identifier.uri http://hdl.handle.net/11356/1202
dc.description FRENK-STYRIA-24sata is a dataset of moderated newspaper comments from the website 24sata.hr, with metadata on the time of publishing, the user identifier, the thread identifier, and whether the moderators deleted the comment. The full text of each comment is encrypted via a character-replacement method so that the comments are not readable by humans. Basic punctuation is not encrypted, in order to enable tokenization. The main use of this dataset is in experiments on automating comment moderation. For real-world usage, a fastText classification model trained on non-encrypted data is made available as well.
dc.language.iso hrv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby https://drive.google.com/file/d/13m7PFn49_tnEfFjcbqk8cugG4ZTy2A5I/view
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri http://nl.ijs.si/frenk/
dc.subject computer-mediated communication
dc.subject news comments
dc.subject content moderation
dc.title Dataset and baseline model of moderated content FRENK-STYRIA-24sata 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) J7-8280 FRENK: Resources, methods, and tools for the understanding, identification, and classification of various forms of socially unacceptable discourse in the information society nationalFunds
size.info 17042965 texts
size.info 407549127 words
files.count 2
files.size 8186195223
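
The description above says that comment text is encrypted by character replacement while basic punctuation is left intact so that tokenization still works. The record does not publish the actual mapping, so the following is only a minimal sketch of the general idea, assuming a random one-to-one substitution over letters and digits. The alphabet, the seed, and all function names are hypothetical and not taken from the dataset.

```python
import random
import re
import string


def make_substitution_table(alphabet, seed):
    """Build a random one-to-one character mapping over `alphabet`.

    Hypothetical key generation; the dataset's real mapping is not published.
    """
    shuffled = list(alphabet)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(alphabet, shuffled))


def encrypt(text, table):
    """Replace every character found in the table; leave the rest untouched.

    Punctuation and whitespace are absent from the table, so they pass through.
    """
    return "".join(table.get(ch, ch) for ch in text)


def tokenize(text):
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


# Assumed alphabet: ASCII letters and digits plus Croatian diacritics.
ALPHABET = string.ascii_letters + string.digits + "čćđšžČĆĐŠŽ"
TABLE = make_substitution_table(ALPHABET, seed=42)

sample = "Dobar dan, kako ste?"
encrypted = encrypt(sample, TABLE)

# Token boundaries survive because punctuation and spaces are unchanged.
print(tokenize(sample))
print(tokenize(encrypted))
```

Because the mapping is one-to-one and applied consistently, identical words always encrypt to identical strings, which is what lets a classifier be trained on the encrypted text without the comments being readable.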


 Files in this item

Name frenk-sty.tbl.enc.zip
Size 1.45 GB
Format application/zip
Description TSV dataset with encrypted texts
MD5 aafb5a1e58790722bbf75bc50ea3f2dc
File preview
    • frenk-sty.tbl.enc (5 GB)
    • frenk-sty.tbl.enc.readme (810 B)
Name frenk-sty.tbl.model.zip
Size 6.18 GB
Format application/zip
Description fastText model
MD5 a8e991ccbfa9444d97e5a2f3542d029f
File preview
    • frenk-sty.tbl.model.readme (561 B)
    • frenk-sty.tbl.model.bin (6 GB)
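
Both archives are several gigabytes, so it may be worth checking a download against the MD5 values listed for each file. The following is a minimal sketch using only the Python standard library; the function names and the assumption that the archives sit in one local directory under their listed names are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "frenk-sty.tbl.enc.zip": "aafb5a1e58790722bbf75bc50ea3f2dc",
    "frenk-sty.tbl.model.zip": "a8e991ccbfa9444d97e5a2f3542d029f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives are never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Return {file name: True/False} for each listed archive found in `directory`.

    Archives that are not present are skipped rather than reported as failures.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.exists():
            results[name] = md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(name, "OK" if ok else "MD5 MISMATCH")
```

A mismatch usually points to a truncated or corrupted download, in which case the file should be fetched again before unpacking.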
