dc.contributor.author Ljubešić, Nikola
dc.contributor.author Koloski, Boshko
dc.contributor.author Zdravkovska, Kristina
dc.contributor.author Kuzman, Taja
dc.date.accessioned 2022-10-22T12:13:05Z
dc.date.available 2022-10-22T12:13:05Z
dc.date.issued 2022-09-26
dc.identifier.uri http://hdl.handle.net/11356/1687
dc.description The COPA-MK dataset (Choice of plausible alternatives in Macedonian) is a translation of the English COPA dataset (https://people.ict.usc.edu/~gordon/copa.html), produced following the XCOPA dataset translation methodology (https://arxiv.org/abs/2005.00333). The dataset consists of 1,000 premises (My body cast a shadow over the grass), each paired with a question (What is the cause? / What happened as a result?) and two choices (The sun was rising; The grass was cut), with a label encoding which of the two choices is more plausible according to the annotator or translator (The sun was rising). The dataset follows the same format as the Croatian COPA-HR dataset (http://hdl.handle.net/11356/1404). It is split into training (400 instances), validation (100 instances) and test (500 instances) JSONL files. Translation quality was ensured with the help of the ReLDI Centre Belgrade (https://reldi.spur.uzh.ch).
dc.language.iso mkd
dc.publisher Jožef Stefan Institute
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.clarin.si/info/k-centre/
dc.subject commonsense reasoning
dc.subject manual annotation
dc.subject manual translation
dc.title Choice of plausible alternatives dataset in Macedonian COPA-MK
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor Connecting Europe Facility (CEF) Telecom INEA/CEF/ICT/A2020/2278341 MaCoCu - Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages Other
size.info 3 files
size.info 1000 items
size.info 258350 bytes
files.count 3
files.size 259292
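
The description above says the data ships as three JSONL files with 400, 100 and 500 instances. A minimal sketch of reading a split and checking those counts follows. The record itself does not list the field names, so the keys `choice1`, `choice2` and `label` (with a 0/1 label) are an assumption based on the XCOPA-style format the description refers to; the helper names are hypothetical.

```python
import json
from pathlib import Path

# Split sizes as stated in the record's description.
EXPECTED_SIZES = {"train.jsonl": 400, "val.jsonl": 100, "test.jsonl": 500}


def read_jsonl(path):
    """Read a JSONL file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def more_plausible_choice(instance):
    """Return the text of the labelled choice.

    ASSUMPTION: XCOPA-style keys 'choice1' and 'choice2', and a 'label'
    of 0 (first choice) or 1 (second choice). Check the actual files.
    """
    return instance["choice1"] if instance["label"] == 0 else instance["choice2"]


def check_split_sizes(directory):
    """Map each file name to (actual, expected) instance counts.

    'actual' is None when the file is not present in the directory.
    """
    report = {}
    for name, expected in EXPECTED_SIZES.items():
        path = Path(directory) / name
        actual = len(read_jsonl(path)) if path.exists() else None
        report[name] = (actual, expected)
    return report
```

Calling `check_split_sizes(".")` in the directory holding the downloaded files should report matching actual and expected counts if the files agree with the description.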


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name         Size       Format   Description         MD5
train.jsonl  103.31 KB  Unknown  Training dataset    d7577c3804a32edf7169f5f060afa6e4
val.jsonl    25.47 KB   Unknown  Validation dataset  dcfcdad1cabb3e2ee08415e4d460d62e
test.jsonl   124.43 KB  Unknown  Test dataset        cc6011a17a24c1e8f233aeeb797620d5
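
The MD5 checksums listed for each file can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib`; the checksum values are copied from this record, while the function names and the assumption that the files sit in the current directory are illustrative.

```python
import hashlib

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "train.jsonl": "d7577c3804a32edf7169f5f060afa6e4",
    "val.jsonl": "dcfcdad1cabb3e2ee08415e4d460d62e",
    "test.jsonl": "cc6011a17a24c1e8f233aeeb797620d5",
}


def md5_of_file(path, chunk_size=65536):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `matches_checksum("train.jsonl", EXPECTED_MD5["train.jsonl"])` should return `True` for an unmodified copy of the training file. MD5 is adequate here for detecting accidental corruption, not for guarding against deliberate tampering.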