dc.contributor.author Mochtak, Michal
dc.contributor.author Rupnik, Peter
dc.contributor.author Ljubešić, Nikola
dc.date.accessioned 2022-06-08T08:33:44Z
dc.date.available 2022-06-08T08:33:44Z
dc.date.issued 2022-06-08
dc.identifier.uri http://hdl.handle.net/11356/1585
dc.description The dataset consists of mid-length sentences from the Bosnian, Croatian and Serbian parliamentary proceedings, annotated with a 6-level sentiment schema (defined below). The first 1,300 instances were annotated by two annotators, and a reconciliation procedure was performed whenever they disagreed on the simplified 3-level schema (Positive, Negative, Neutral). The remaining 1,300 instances were annotated by the second annotator only. In addition to the annotations of the two annotators and any reconciliation annotations, a 3-level label is available for all instances. Each sentence can be traced back to the original datasets (https://doi.org/10.5281/zenodo.6517697, https://doi.org/10.5281/zenodo.6521372, https://doi.org/10.5281/zenodo.6521648) via a document and sentence identifier. The date of the speech and the speaker's name are given as well. If the speaker is an MP, information on party, gender and year of birth is also available. The dataset is split into training (2,150 instances), development (150 instances) and testing (300 instances) subsets.
The full 6-level annotation schema is the following:
- Positive for sentences that are entirely or predominantly positive
- Negative for sentences that are entirely or predominantly negative
- M_Positive for sentences that convey an ambiguous sentiment or a mixture of sentiments, but lean more towards the positive sentiment in a strict binary classification
- M_Negative for sentences that convey an ambiguous sentiment or a mixture of sentiments, but lean more towards the negative sentiment in a strict binary classification
- P_Neutral for sentences that only contain non-sentiment-related statements, but still lean more towards the positive sentiment in a strict binary classification
- N_Neutral for sentences that only contain non-sentiment-related statements, but still lean more towards the negative sentiment in a strict binary classification
dc.language.iso bos
dc.language.iso hrv
dc.language.iso srp
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.clarin.eu/parlamint
dc.subject sentiment classification
dc.subject parliamentary debates
dc.title The sentiment corpus of parliamentary debates ParlaSent-BCS v1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://huggingface.co/classla/bcms-bertic-parlasent-bcs-ter
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor CLARIN ERIC - ParlaMint: Towards Comparable Parliamentary Corpora Other
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) N6-0099 LiLaH: Linguistic Landscape of Hate Speech nationalFunds
size.info 2600 sentences
files.count 1
files.size 1187107
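
The 6-level schema in dc.description states, for every label, which side it falls on "in a strict binary classification". The sketch below spells out that two-way collapse and checks that the stated split sizes add up to size.info. It assumes the label strings are spelled exactly as in the description; the names `SIX_TO_BINARY`, `to_binary` and `SPLIT_SIZES` are illustrative only, and the field names used inside the JSONL file are not given in this record, so no file parsing is attempted.

```python
# Binary polarity implied by the 6-level schema description: each label is
# either predominantly positive/negative or is said to lean that way.
SIX_TO_BINARY = {
    "Positive": "Positive",
    "M_Positive": "Positive",
    "P_Neutral": "Positive",
    "Negative": "Negative",
    "M_Negative": "Negative",
    "N_Neutral": "Negative",
}


def to_binary(label: str) -> str:
    """Collapse a 6-level label to its binary polarity (hypothetical helper)."""
    try:
        return SIX_TO_BINARY[label]
    except KeyError:
        raise ValueError(f"unknown 6-level label: {label!r}") from None


# Split sizes as stated in the description (training, development, testing).
SPLIT_SIZES = {"train": 2150, "dev": 150, "test": 300}

# The three subsets should cover all 2600 sentences listed under size.info.
assert sum(SPLIT_SIZES.values()) == 2600
```

The mapping to the simplified 3-level schema is deliberately left out, since the record names that schema without stating how the six labels collapse into it.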


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name ParlaSent-BCS.jsonl
Size 1.13 MB
Format Unknown
Description JSONL dataset
MD5 8617eac2b69bf9198e6566b379d80833
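
The MD5 checksum and the byte count under files.size make it possible to verify a downloaded copy. A minimal sketch, assuming the file has been saved locally under its listed name; the helper names `md5_of` and `matches_record` are hypothetical and not part of any CLARIN.SI tooling.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "8617eac2b69bf9198e6566b379d80833"  # MD5 listed in this record
EXPECTED_SIZE = 1187107  # bytes, from files.size


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Check a local file against the size and checksum given in the record."""
    path = Path(path)
    return path.stat().st_size == expected_size and md5_of(path) == expected_md5
```

For example, `matches_record("ParlaSent-BCS.jsonl")` should return `True` for an intact download, provided the file is in the current directory.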