
 
dc.contributor.author Mochtak, Michal
dc.contributor.author Rupnik, Peter
dc.contributor.author Meden, Katja
dc.contributor.author Ljubešić, Nikola
dc.date.accessioned 2023-09-19T16:19:23Z
dc.date.available 2023-09-19T16:19:23Z
dc.date.issued 2023-09-18
dc.identifier.uri http://hdl.handle.net/11356/1868
dc.description The dataset consists of mid-length sentences from the parliamentary proceedings of Bosnia and Herzegovina, Croatia, Czechia, Serbia, Slovakia, Slovenia, and the United Kingdom, annotated with a 6-level sentiment schema (defined below). The data from the parliaments of Bosnia and Herzegovina, Croatia and Serbia are organised as a single parliament group, named "BCS", due to the similarity of the official languages of these countries. For each of the five parliaments / parliament groups, 2,600 training instances were annotated by two annotators, with one additional conflict resolution step. While these training instances were sampled via sentiment lexicons to contain more sentiment-loaded sentences, two test sets were randomly sampled from selected parliaments: one from the BCS parliament group and one from the parliament of the United Kingdom. Each test set consists of 2,600 sentences, annotated by one highly trained annotator. The training datasets were internally split into "train", "dev" and "test" portions for performing language-specific experiments.
The 6-level annotation schema is the following:
- Positive for sentences that are entirely or predominantly positive
- Negative for sentences that are entirely or predominantly negative
- M_Positive for sentences that convey an ambiguous sentiment or a mixture of sentiments, but lean more towards the positive sentiment
- M_Negative for sentences that convey an ambiguous sentiment or a mixture of sentiments, but lean more towards the negative sentiment
- P_Neutral for sentences that only contain non-sentiment-related statements, but still lean more towards the positive sentiment
- N_Neutral for sentences that only contain non-sentiment-related statements, but still lean more towards the negative sentiment
dc.language.iso bos
dc.language.iso hrv
dc.language.iso ces
dc.language.iso eng
dc.language.iso srp
dc.language.iso slk
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby http://arxiv.org/abs/2309.09783
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.clarin.eu/parlamint
dc.subject sentiment classification
dc.subject sentiment analysis
dc.subject parliamentary debates
dc.subject Bosnian Parliament
dc.subject Croatian Parliament
dc.subject Czech Parliament
dc.subject English Parliament
dc.subject Serbian Parliament
dc.subject Slovak Parliament
dc.subject Slovenian Parliament
dc.title The multilingual sentiment dataset of parliamentary debates ParlaSent 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://huggingface.co/classla/xlm-r-parlasent
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor CLARIN ERIC - ParlaMint: Towards Comparable Parliamentary Corpora Other
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) N6-0099 LiLaH: Linguistic Landscape of Hate Speech nationalFunds
sponsor ARRS (Slovenian Research Agency) J7-4642 MEZZANINE nationalFunds
size.info 18200 sentences
size.info 7 files
files.count 8
files.size 7793411


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name                      Size       Format     Description                         MD5
ParlaSent_BCS.jsonl       1.13 MB    Unknown    BCS train file                      c8b59c84c476b031cc553bc3c768e627
ParlaSent_CZ.jsonl        1.15 MB    Unknown    Czech train file                    ff633c11f3d0e1e8fc544db0732e8104
ParlaSent_EN.jsonl        1.1 MB     Unknown    English train file                  9c011abd994c14dc53afb37013fdac05
ParlaSent_SK.jsonl        1.13 MB    Unknown    Slovak train file                   2e2944d8edaa2021b361e3ec3d23a5ee
ParlaSent_BCS_test.jsonl  948.03 KB  Unknown    BCS test file                       ee8699a4a7b1a834f79fe74b8ebdfaf1
ParlaSent_EN_test.jsonl   940.29 KB  Unknown    English test file                   003f0aeded7001574e79c49b09401e83
ParlaSent_SL.jsonl        1.07 MB    Unknown    Slovenian train file                1117ec542bd1812681a2fff7f0eae1e2
README.txt                2.15 KB    Text file  README with attribute descriptions  583856c8d470334e5638f6a078f727d5
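The MD5 values listed for each file can be used to check that a download arrived intact. The following is a minimal sketch, assuming the files have been saved under their listed names in one local directory. The helper names `md5_of_file` and `verify_downloads` are hypothetical and are not part of the dataset or the repository.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "ParlaSent_BCS.jsonl": "c8b59c84c476b031cc553bc3c768e627",
    "ParlaSent_CZ.jsonl": "ff633c11f3d0e1e8fc544db0732e8104",
    "ParlaSent_EN.jsonl": "9c011abd994c14dc53afb37013fdac05",
    "ParlaSent_SK.jsonl": "2e2944d8edaa2021b361e3ec3d23a5ee",
    "ParlaSent_BCS_test.jsonl": "ee8699a4a7b1a834f79fe74b8ebdfaf1",
    "ParlaSent_EN_test.jsonl": "003f0aeded7001574e79c49b09401e83",
    "ParlaSent_SL.jsonl": "1117ec542bd1812681a2fff7f0eae1e2",
    "README.txt": "583856c8d470334e5638f6a078f727d5",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each listed file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.exists() else None
    return results
```

Calling `verify_downloads(".")` in the download directory returns one entry per listed file. A `False` entry suggests a corrupted or modified file, and `None` means the file was not found.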
 File Preview  
The multilingual sentiment dataset of parliamentary debates ParlaSent 1.0
http://hdl.handle.net/11356/1868

The dataset consists of five training datasets and two test sets. The test sets have a _test.jsonl suffix.

The attributes in training data are the following:
- sentence - the sentence labeled for sentiment
- country - the country of the parliament the sentence comes from
- annotator1 - first annotator's annotation
- annotator2 - second annotator's annotation
- reconciliation - the final label agreed upon after reconciliation
- label - three-level (positive, negative, neutral) label based on the reconciliation label
- document_id - internal identifier of the document the sentence comes from
- sentence_id - internal identifier of the sentence inside the document
- term - the term of the parliament the sentence comes from
- date - the date the sentence was uttered as part of a speech in the parliament
- name - name of the MP giving the speech
- party - the party of the MP
- gender . . .
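
The attributes listed above can be read with a few lines of Python. The sketch below assumes that each line of a training file is one JSON object whose keys match the attribute names in this README. The mapping from the six-level reconciliation label to the three-level label is an assumption inferred from the label names, since the README does not spell it out, and the helper names are hypothetical.

```python
import json

# ASSUMPTION: this six-to-three mapping is inferred from the label names;
# the README only states that "label" is based on "reconciliation".
SIX_TO_THREE = {
    "Positive": "positive",
    "M_Positive": "positive",
    "P_Neutral": "neutral",
    "N_Neutral": "neutral",
    "M_Negative": "negative",
    "Negative": "negative",
}


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def collapse(six_level_label):
    """Collapse a six-level label into positive / neutral / negative."""
    return SIX_TO_THREE[six_level_label]


def annotator_agreement(records):
    """Share of records on which annotator1 and annotator2 chose the same label."""
    records = list(records)
    if not records:
        return 0.0
    same = sum(1 for r in records if r["annotator1"] == r["annotator2"])
    return same / len(records)
```

For example, `annotator_agreement(read_jsonl("ParlaSent_EN.jsonl"))` would give the raw agreement rate for the English training file. The README describes these attributes for the training data only, so the `annotator1` and `annotator2` keys may be absent from the test files.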
                                            
