
 
dc.contributor.author Pollak, Senja
dc.contributor.author Vulić, Ivan
dc.contributor.author Pelicon, Andraž
dc.contributor.author Repar, Andraž
dc.contributor.author Armendariz, Carlos
dc.contributor.author Purver, Matthew
dc.contributor.author Ljubešić, Nikola
dc.date.accessioned 2021-03-09T08:38:40Z
dc.date.available 2021-03-09T08:38:40Z
dc.date.issued 2020-05-15
dc.identifier.uri http://hdl.handle.net/11356/1309
dc.description The resource contains the English SimLex-999 word pairs (Hill et al. 2015) and their Slovene translations. The word pairs were first translated independently by two translators; where the translations differed, the final translation was chosen in a consensus meeting. The translators also had access to the Croatian SimLex-999 translations (Mrkšić et al. 2017) and received translation guidelines (see guidelines_translators.txt) inspired by the guidelines of Multi-SimLex (Vulić et al. 2020). The resource was used for building the CoSimLex resource (Armendariz et al. 2020).

The list contains the original English pair of words (Word1 and Word2) and their part of speech, followed by the Slovene translations (Trans1 and Trans2). The last column, Comment, marks special cases:

- "multiword_translation" -> translators were asked to opt for single-word equivalents, but in some cases the only appropriate translation was a multi-word expression (for example, "birthday" -> "rojstni dan").
- "no_translation" -> pairs without a proper translation, i.e. the translated pair contains two identical words. Although the translators were asked to find two different translations for the two words, in a few cases that was not possible. For example, for the English pair "taxi" and "cab", only "taksi" was considered a good Slovene equivalent.
- "duplicated_translation" -> where the same translated pair occurs for two different original English pairs, both occurrences are marked as duplicate translations.
- "duplicated_original" -> in one case, the original word pair was itself a duplicate, which is also marked.

Cite: If you use the dataset, please cite the CLARIN handle and Armendariz et al. (2020); the full entry is given under References.

References:

Armendariz, Carlos Santos, Purver, Matthew, Ulčar, Matej, Pollak, Senja, Ljubešić, Nikola, Granroth-Wilding, Mark, and Vaik, Kristiina (2020). CoSimLex: A Resource for Evaluating Graded Word Similarity in Context. In Proceedings of the 12th Language Resources and Evaluation Conference, p. 5878–5886. https://www.aclweb.org/anthology/2020.lrec-1.720/

Hill, Felix, Reichart, Roi, and Korhonen, Anna (2015). SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics, 41(4):665–695. https://www.aclweb.org/anthology/J15-4004/

Mrkšić, Nikola, Vulić, Ivan, Ó Séaghdha, Diarmuid, Leviant, Ira, Reichart, Roi, Gašić, Milica, Korhonen, Anna, and Young, Steve (2017). Semantic specialisation of distributional word vector spaces using monolingual and cross-lingual constraints. Transactions of the ACL, 5:309–324. https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00063

Vulić, Ivan, Baker, Simon, Ponti, Edoardo Maria, Petti, Ulla, Leviant, Ira, Wing, Kelly, Majewska, Olga, Bar, Eden, Malone, Matt, Poibeau, Thierry, Reichart, Roi, and Korhonen, Anna (2020). Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity. Computational Linguistics. https://doi.org/10.1162/coli_a_00391
dc.language.iso slv
dc.language.iso eng
dc.publisher University of Ljubljana
dc.relation info:eu-repo/grantAgreement/EC/H2020/825153
dc.relation.isreferencedby https://www.aclweb.org/anthology/2020.lrec-1.720/
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri http://embeddia.eu/
dc.subject similarity
dc.subject word embeddings
dc.subject evaluation
dc.title SimLex-999 Slovenian translation SimLex-999-sl 1.0
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType other
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Senja Pollak senja.pollak@ijs.s University of Ljubljana
sponsor European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153
size.info 999 entries
files.count 3
files.size 38193


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Name SimLex-999_Slovene.csv
Size 31.57 KB
Format CSV file
Description Slovene translation of the pairs of words in SimLex-999
MD5 2ceccfa2f3847c7f9941a53b907989aa
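
The Comment flags described for this file can be recomputed from the translations themselves. The following is a minimal sketch, not the authors' procedure. It assumes a comma-delimited file whose header uses the column names Word1, Word2, Trans1, Trans2 and Comment mentioned in the description; the "POS" header name is a guess, and the sample rows are illustrative rather than copied from SimLex-999_Slovene.csv.

```python
import csv
import io
from collections import Counter

# Illustrative rows only; they are NOT copied from SimLex-999_Slovene.csv.
# The header names and the comma delimiter are assumptions based on the
# record's description (the "POS" column name in particular is a guess).
SAMPLE = """Word1,Word2,POS,Trans1,Trans2,Comment
taxi,cab,N,taksi,taksi,no_translation
birthday,year,N,rojstni dan,leto,multiword_translation
old,new,A,star,nov,
"""


def derive_flags(rows):
    """Recompute the special-case flags from the translations alone."""
    # Count how often each translated pair occurs across all rows.
    pair_counts = Counter((r["Trans1"], r["Trans2"]) for r in rows)
    flags = []
    for r in rows:
        found = set()
        # Both words received the same Slovene translation.
        if r["Trans1"] == r["Trans2"]:
            found.add("no_translation")
        # At least one translation consists of several words.
        if " " in r["Trans1"] or " " in r["Trans2"]:
            found.add("multiword_translation")
        # The same translated pair appears in more than one row.
        if pair_counts[(r["Trans1"], r["Trans2"])] > 1:
            found.add("duplicated_translation")
        flags.append(found)
    return flags


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
flags = derive_flags(rows)
for row, found in zip(rows, flags):
    print(row["Word1"], row["Word2"], sorted(found))
```

To run this against the downloaded file, replace `io.StringIO(SAMPLE)` with an open file handle and adjust the delimiter and header names to what the file actually contains. The sketch cannot tell "duplicated_original" apart from "duplicated_translation", because a repeated English pair also yields a repeated translated pair.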
Name description_resource.txt
Size 3.43 KB
Format Text file
Description Text file with the description of the resource
MD5 1230123664a7654fbb5d89a4e988e57c
Name guidelines_translators.txt
Size 2.29 KB
Format Text file
Description Text file with the instructions to the translators
MD5 95745f91bd0ef3cb5fbe87f1dc0a03dd
Guidelines for translators:

a) Guidelines for individual translations:

For the first set of translations performed by each translator separately, the guidelines were as follows: 
- It is not obligatory to use the same target translation for the same source word in different word pairs.
- Flag any difficult translations, and comment in general if you have doubts or any remarks.
- If you cannot find two distinct translations for the two words in a pair, please flag the pair as difficult and add a comment.
- If you find yourself using an identical translation for two pairs, please add a comment.
- You can use the provided Croatian translation as an additional source.
- If gender is not marked in English (e.g. cat), try to pick the most natural one in Slovene; if there is no clear gender interpretation, follow the Croatian one.
- Translate word pairs (and not single words): constituent words in a pair act as a disambiguation signal. Regarding multiple senses, try to pick the one that is most true to the . . .
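
The MD5 values listed for the three files in this item can be used to check a download. Below is a minimal sketch using Python's standard hashlib module; the helper names `md5_of_file` and `verify` are hypothetical, and the local paths passed to them are up to the user.

```python
import hashlib

# MD5 values as listed in this record for each file.
EXPECTED_MD5 = {
    "SimLex-999_Slovene.csv": "2ceccfa2f3847c7f9941a53b907989aa",
    "description_resource.txt": "1230123664a7654fbb5d89a4e988e57c",
    "guidelines_translators.txt": "95745f91bd0ef3cb5fbe87f1dc0a03dd",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Compare a local file against the MD5 listed for `name` in the record."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify("SimLex-999_Slovene.csv", "SimLex-999_Slovene.csv")` returns True only if the local copy matches the listed checksum. MD5 is adequate for detecting a corrupted download but not for guarding against deliberate tampering.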
                                            
