Files in this item

Total size of all files: 1.74 MB

Name: coref149_corefud_train.conllu
Size: 1.19 MB
Format: Unknown
Description: Labeled coref149 training set in CoNLL-U format.
MD5: 52637114f35442028eb178647c71a872

Name: coref149_corefud_test_unlabeled.conllu
Size: 563.83 KB
Format: Unknown
Description: Unlabeled coref149 test set in CoNLL-U format.
MD5: f72542c02f9149250a19010d0e835428

Name: README.txt
Size: 1.91 KB
Format: Text file
Description: Description of the resource.
MD5: bf4de0b48e5d082dac5e4cb71121b209

File preview: README.txt
CorefUD conversion of Slovene coreference resolution corpus coref149
v1.0
http://hdl.handle.net/11356/1989
CC BY-NC-SA 4.0

This corpus is the CorefUD conversion of the coref149 corpus for coreference resolution in Slovene (http://hdl.handle.net/11356/1182). It contains 149 documents annotated with coreference information: 100 training documents and 49 test documents. The test documents were selected according to the underlying cluster distribution: most documents contain a small to medium number of clusters, while a few contain a large number of clusters.

Coreference in Universal Dependencies (CorefUD) is an initiative to collect coreference corpora in various languages and harmonize them to the same scheme and data format (CoNLL-U).
The coreference information is stored in the MISC column. More concretely, the start and end of each coreference mention are marked with the "Entity=" attribute. For example, "Entity=(e0" marks the start of the entity e0 at the current token while "Entity=e0)" marks . . .
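
As an illustration of the bracket notation described above, the following Python sketch reads "Entity=" values from a list of MISC column values and pairs each opening bracket with its closing bracket. It is a minimal sketch, not part of the corpus tooling. The function names entity_events and mention_spans are hypothetical. It assumes that MISC attributes are separated by "|" (as in CoNLL-U), that several brackets may be concatenated in one value, that a single-token mention is written as "(e0)", and that the text inside a bracket is just the entity id; any further fields inside a bracket are not handled.

```python
import re

# One bracket event: "(e0" opens a mention, "e0)" closes one,
# and "(e0)" opens and closes a mention on the same token (assumed form).
_EVENT = re.compile(r"(\()?([^()]+)(\))?")


def entity_events(misc):
    """Return (entity_id, opens, closes) tuples found in one MISC value."""
    for attr in misc.split("|"):
        if attr.startswith("Entity="):
            value = attr[len("Entity="):]
            return [
                (m.group(2), bool(m.group(1)), bool(m.group(3)))
                for m in _EVENT.finditer(value)
            ]
    return []  # no Entity attribute on this token


def mention_spans(misc_values):
    """Pair opening and closing brackets into (entity_id, start, end) spans.

    Indices are 0-based positions in misc_values; end is inclusive.
    """
    open_starts = {}  # entity id -> stack of start indices still open
    spans = []
    for i, misc in enumerate(misc_values):
        for eid, opens, closes in entity_events(misc):
            if opens:
                open_starts.setdefault(eid, []).append(i)
            if closes:
                start = open_starts[eid].pop()
                spans.append((eid, start, i))
    return spans


# Hypothetical MISC values for a four-token sentence (not taken from the corpus).
example = ["Entity=(e0", "_", "Entity=e0)|SpaceAfter=No", "Entity=(e1)"]
print(mention_spans(example))  # [('e0', 0, 2), ('e1', 3, 3)]
```

Here the mention of e0 spans the first three tokens and the mention of e1 covers only the last token. Mentions that share an entity id belong to the same coreference cluster.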