dc.contributor.author | Vintar, Špela |
dc.date.accessioned | 2025-10-05T11:16:58Z |
dc.date.available | 2025-10-05T11:16:58Z |
dc.date.issued | 2025-09-18 |
dc.identifier.uri | http://hdl.handle.net/11356/2049 |
dc.description | The EEC-SL dataset is a localised and adapted version of the Equity Evaluation Corpus (EEC, Kiritchenko and Mohammad, 2018, https://aclanthology.org/S18-2005/). It consists of 8,640 sentences that were automatically generated to evaluate social bias in sentiment analysis systems. The sentences are created from 22 templates. Each template contains a <person> slot, which is filled either by a name (female or male, Slovenian or non-Slovenian) or by a generic noun phrase (e.g., moja sestra [my sister], ta moški [this man], moj oče [my dad]). Two further variables, <emotional state word> and <emotional situation word>, occur in 7 of the 11 original templates. They are filled by words expressing one of four basic emotions: Anger, Fear, Joy and Sadness. Template example: Zaradi te situacije se <person_F_1> počuti <emotional_state_word_S_4>. [Because of this situation, <person> feels <emotional state word>.] The names were chosen to reflect the current social reality in Slovenia. The non-Slovenian names match the demographic composition of the country while still being perceived as non-Slovenian. Accordingly, we selected 10 female and 10 male Slovenian names, 6 female and 6 male names from former Yugoslavia, 2 female and 2 male names from EU countries, and 2 female and 2 male names from non-EU countries (40 names in total). All names were taken from the registry of names available from the Statistical Office of Slovenia. The emotional state and emotional situation words were chosen to represent various intensities of the basic emotions, and their emotional valence was taken from SloEmoLex (http://hdl.handle.net/11356/1875). The templates, names, generic noun phrases and adjectives have been linguistically adapted to Slovenian, a highly inflected language with agreement in number, gender and case.
Thus, instead of the 11 original English templates, EEC-SL uses 22, because each English template was translated into a female and a male version depending on the gender of the <person> variable. Similarly, each variable can appear in different cases and numbers, which is reflected in the sentence templates. More details are given in the README file. The dataset was originally designed to detect bias in sentiment analysis systems. It makes it possible to test the hypothesis that a system should rate the emotional intensity of two sentences equally when they differ only in the gender or nationality of the person mentioned (e.g., "Anja je jezna." vs. "Snježana je jezna." [Anja is angry. vs. Snježana is angry.]). |
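The minimal-pair test described in the record can be sketched in Python as follows. This is an illustrative sketch, not part of the dataset or its tooling. The template `<person_F_1> je <emotional_state_word_F_1>.` is hypothetical and modelled on the example pair "Anja je jezna." / "Snježana je jezna.". `toy_scorer` is a stand-in for whatever sentiment or emotion-intensity model is being evaluated. Slot names are treated as opaque strings, and their suffix codes are documented in the README.

```python
import re
from typing import Callable, Mapping

# Matches slots such as <person_F_1> or <emotional_state_word_S_4>.
# The slot name is kept verbatim; its suffix codes are explained in the README.
SLOT = re.compile(r"<([^<>]+)>")


def fill(template: str, values: Mapping[str, str]) -> str:
    """Replace every <slot> in the template with values[slot]."""
    return SLOT.sub(lambda m: values[m.group(1)], template)


def minimal_pair(template: str, fixed: Mapping[str, str],
                 slot: str, a: str, b: str) -> tuple[str, str]:
    """Build two sentences that differ only in the value of one slot."""
    return fill(template, {**fixed, slot: a}), fill(template, {**fixed, slot: b})


def score_gap(scorer: Callable[[str], float], pair: tuple[str, str]) -> float:
    """Difference in predicted intensity; 0.0 means both sentences are rated equally."""
    first, second = pair
    return scorer(first) - scorer(second)


# Hypothetical template, for illustration only (not taken from EEC-SL).
TEMPLATE = "<person_F_1> je <emotional_state_word_F_1>."


def toy_scorer(sentence: str) -> float:
    # Stand-in for a real model: it ignores the name, so it is unbiased by construction.
    return 0.8 if "jezna" in sentence else 0.0


pair = minimal_pair(TEMPLATE, {"emotional_state_word_F_1": "jezna"},
                    "person_F_1", "Anja", "Snježana")
print(pair)                         # ('Anja je jezna.', 'Snježana je jezna.')
print(score_gap(toy_scorer, pair))  # 0.0
```

A non-zero gap, especially one that is consistent across many pairs of name groups, is the kind of signal the corpus is meant to surface.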
dc.language.iso | slv |
dc.publisher | Jožef Stefan Institute |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.subject | sentiment analysis |
dc.subject | social bias |
dc.subject | gender bias |
dc.title | Slovenian Equity Evaluation Corpus EEC-SL 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Špela Vintar spela.vintar@ijs.si Jožef Stefan Institute |
size.info | 8640 sentences |
files.count | 1 |
files.size | 151034 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)



Name | EEC_SL_corpus_v1.zip |
Size | 147.49 KB |
Format | application/zip |
Description | CSV dataset and README |
MD5 | 1721a32367982975665fa43309a4e3f1 |
- EEC_SL_corpus_v1
  - EEC_SL_corpus_v1.csv
  - README.txt
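The following minimal Python sketch checks a downloaded copy of the archive against the MD5 checksum listed for EEC_SL_corpus_v1.zip. It uses only the standard-library `hashlib` and `pathlib` modules. It assumes the archive was saved under its original filename in the current working directory.

```python
import hashlib
from pathlib import Path

# Checksum published in this record for EEC_SL_corpus_v1.zip.
EXPECTED_MD5 = "1721a32367982975665fa43309a4e3f1"


def md5_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches the expected (case-insensitive) digest."""
    return md5_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("EEC_SL_corpus_v1.zip")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use constant. That matters little for a 147 KB file, but it lets the same helper work for larger downloads.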