
 
dc.contributor.author Žagar, Aleš
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Munda, Tina
dc.contributor.author Brglez, Mojca
dc.contributor.author Robnik-Šikonja, Marko
dc.date.accessioned 2024-11-15T09:33:43Z
dc.date.available 2024-11-15T09:33:43Z
dc.date.issued 2024-11-15
dc.identifier.uri http://hdl.handle.net/11356/1988
dc.description Knowledge-Enhanced Winograd Schema Challenge (KE-WSC) is an upgraded version of the original WSC dataset. It includes the following extensions:
- Annotation of semantically or syntactically solvable examples: Because Slovene is morphologically rich, some samples from the original dataset can be solved without deeper semantic processing. For example, the sentence “Riba je pojedla črva. Bila je lačna.” (“The fish ate the worm. It was hungry.”) requires only knowledge of grammatical gender to infer that the fish, not the worm, was hungry: the feminine form “bila” agrees with the feminine “riba” (fish) and not with the masculine “črv” (worm). To obtain a representative set of such syntactically solvable samples, we created 197 new examples by modifying existing ones.
- Two-level knowledge ontology: We developed a hierarchical scheme to categorize the knowledge required to solve each problem. Our analysis identified 9 high-level knowledge categories (e.g., social knowledge, psychological knowledge) and 37 more fine-grained lower-level categories (e.g., physical laws/laws of nature, social roles, causal relationships).
- Semi-automatic explanation generation: Textual explanations were generated with GPT-4, then verified and corrected by human annotators to ensure accuracy and clarity. For instance, the explanation for the sentence “Pokal ne gre v rjav kovček, ker je prevelik.” (“The trophy does not fit into the brown suitcase because it is too big.”) is “Če je nekaj preveliko, se ne prilega v manjši prostor.” (“If something is too big, it does not fit into a smaller space.”).
- Translation to English: The finalized explanations were translated into English by a trained translator, broadening the dataset's applicability.
- SPO triplet generation: Subject-Predicate-Object triplets were extracted with GPT-4 to highlight the key semantic relationships within each example.
The dataset can be used to study knowledge explanation in models and supports knowledge-enhanced machine learning. It can be used to train classification or generative models. It comprises 601 training samples, 200 validation samples, and 200 test samples, released in tab-separated (TSV) format.
The README.md file describes the attributes. The test set labels are private because the dataset is integrated into the SloBENCH evaluation framework (https://slobench.cjvt.si/). If you use the dataset to train your models, please consider submitting your test set predictions to SloBENCH to obtain an evaluation score and compare it with other submissions.
References: Levesque, H., Davis, E., & Morgenstern, L. (2012, May). The Winograd schema challenge. In Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning.
dc.language.iso slv
dc.language.iso eng
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.subject coreference resolution
dc.subject explanations
dc.title Knowledge-Enhanced Winograd Schema Challenge KE-WSC 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Aleš Žagar ales.zagar@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
size.info 1001 entries
files.count 5
files.size 816034
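The description above states that the data comes as three TSV splits (601 training, 200 validation, 200 test). Below is a minimal Python sketch for loading those splits and checking them against the stated sizes. It relies on several assumptions that this record does not confirm:
- Each file is UTF-8 encoded.
- Each file has a single header row naming the attributes described in README.md.
- Each file uses standard CSV quoting.

No specific column names are assumed. The test labels are private, so the test file may lack a label column. The directory name `KE-WSC` and the helper names `load_split`, `load_all` and `check_sizes` are hypothetical.

```python
import csv
from pathlib import Path

# Split sizes as stated in the dataset description.
EXPECTED_SIZES = {"train": 601, "val": 200, "test": 200}


def load_split(path: Path) -> list[dict[str, str]]:
    """Read one split into a list of row dicts keyed by the header row.

    Assumption: the first line is a header and fields are tab-separated.
    """
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def load_all(directory: Path) -> dict[str, list[dict[str, str]]]:
    """Load every split from files named KE-WSC-<split>.tsv in `directory`."""
    return {
        split: load_split(directory / f"KE-WSC-{split}.tsv")
        for split in EXPECTED_SIZES
    }


def check_sizes(splits: dict[str, list[dict[str, str]]]) -> dict[str, bool]:
    """Report, per split, whether the row count matches the stated size."""
    return {name: len(rows) == EXPECTED_SIZES[name] for name, rows in splits.items()}


if __name__ == "__main__":
    data_dir = Path("KE-WSC")  # hypothetical download location
    if data_dir.is_dir():
        splits = load_all(data_dir)
        print({name: len(rows) for name, rows in splits.items()})
        print(check_sizes(splits))
```

If some fields contain bare double quotes, the default quoting may merge rows. In that case, passing `quoting=csv.QUOTE_NONE` to `csv.DictReader` may be needed.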


Files in this item

Total size of all files: 796.91 KB
This item is Publicly Available and licensed under:
Creative Commons - Attribution 4.0 International (CC BY 4.0)
Distributed under Creative Commons Attribution Required
Name                    Size       Format   Description  MD5
KE-WSC-train.tsv        388.36 KB  Unknown  Unknown      7d79bd460a60555d538def4e205c7771
KE-WSC-test.tsv         82.32 KB   Unknown  Unknown      e5933193d4c2236e0c785a48820967f2
KE-WSC-val.tsv          124.39 KB  Unknown  Unknown      4247f4c480bc9bb402d9abe9ca50dea5
Knowledge-ontology.pdf  200.5 KB   PDF      Unknown      b2996ebec821c359a8a29b72fcf48e90
README.md               1.35 KB    Unknown  Unknown      a8cff5b6f013b6ac90b45dd7b913c51e
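The file list above gives an MD5 checksum for each file. Here is a minimal Python sketch for verifying downloaded copies against those checksums, using only the standard-library `hashlib` module. The checksums are copied from this record. The download directory `KE-WSC` and the function names `md5_of_file` and `verify_downloads` are hypothetical.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record's file table.
EXPECTED_MD5 = {
    "KE-WSC-train.tsv": "7d79bd460a60555d538def4e205c7771",
    "KE-WSC-test.tsv": "e5933193d4c2236e0c785a48820967f2",
    "KE-WSC-val.tsv": "4247f4c480bc9bb402d9abe9ca50dea5",
    "Knowledge-ontology.pdf": "b2996ebec821c359a8a29b72fcf48e90",
    "README.md": "a8cff5b6f013b6ac90b45dd7b913c51e",
}


def md5_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Path, expected: dict[str, str] = EXPECTED_MD5) -> dict[str, str]:
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    status = {}
    for name, checksum in expected.items():
        path = directory / name
        if not path.is_file():
            status[name] = "missing"
        elif md5_of_file(path) == checksum:
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status


if __name__ == "__main__":
    # Hypothetical download location; adjust to wherever the files were saved.
    for name, result in verify_downloads(Path("KE-WSC")).items():
        print(f"{name}: {result}")
```

Reading in chunks keeps memory use constant regardless of file size. That matters little for files of this size, but it is the usual pattern for checksum verification.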
