CorefUD conversion of Slovene coreference resolution corpus coref149 v1.0

CC BY-NC-SA 4.0

This corpus is the CorefUD conversion of the coref149 corpus for coreference resolution in Slovene (

It contains 149 documents annotated with coreference information: 100 training and 49 test documents. The test documents were selected according to the underlying cluster distribution: most documents contain a small to medium number of clusters, while a few contain a large number of clusters.

Coreference in Universal Dependencies (CorefUD) is an initiative to collect coreference corpora in various languages and harmonize them to the same scheme and data format (CoNLL-U). The coreference information is stored in the MISC column. More concretely, the start and end of each coreference mention are marked with the "Entity=" attribute. For example, "Entity=(e0" marks the start of a mention of the entity e0 at the current token, while "Entity=e0)" marks the end of a mention of the entity e0 at the current token. For full details on the format, please see

To ensure compliance with the CoNLL-U format, corpus annotations were automatically obtained with trankit v1.1.2.

CoNLL-U column information:
- ID: Word index, integer starting at 1 for each new sentence.
- FORM: Word form.
- LEMMA: Lemma of word form.
- UPOS: Universal part-of-speech tag.
- XPOS: Slovene language-specific (MULTEXT-East V6) morphosyntactic tag.
- FEATS: Morphological features.
- HEAD: Dependency head of the current word.
- DEPREL: Universal dependency relation to the head of the current word.
- DEPS: Enhanced dependency graph information; not present, always "_".
- MISC: Coreference information in the CorefUD format (see above for an example); copied and converted from the original corpus.

For more information, please see the CoNLL-U format specification:
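
As an illustration of the "Entity=" attribute described above, the following Python sketch reads the MISC column of one sentence and collects mention spans. It is not part of the corpus tooling. It assumes tab-separated CoNLL-U lines with MISC as the tenth column and only the simple bracket form shown above ("(e0" opens, "e0)" closes, "(e0)" is a single-token mention); any richer bracket content would need extra handling. The function names and the sample sentence are invented for the example.

```python
import re

# One bracket piece of an Entity value: "(e0" opens, "e0)" closes, "(e0)" does both.
ENTITY_PART = re.compile(r"\((?P<open>[^()]+)(?P<single>\))?|(?P<close>[^()]+)\)")


def entity_value(misc):
    """Return the value of the Entity attribute in a MISC field, or None."""
    if misc == "_":
        return None
    for item in misc.split("|"):
        key, _, value = item.partition("=")
        if key == "Entity":
            return value
    return None


def mention_spans(sentence_lines):
    """Collect (entity_id, start_token_id, end_token_id) tuples for one sentence."""
    open_starts = {}  # entity id -> stack of token IDs where a mention was opened
    spans = []
    for line in sentence_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        columns = line.rstrip("\n").split("\t")
        token_id, misc = columns[0], columns[9]
        value = entity_value(misc)
        if value is None:
            continue
        for part in ENTITY_PART.finditer(value):
            if part.group("open"):
                entity = part.group("open")
                if part.group("single"):
                    spans.append((entity, token_id, token_id))
                else:
                    open_starts.setdefault(entity, []).append(token_id)
            else:
                entity = part.group("close")
                stack = open_starts.get(entity)
                if stack:  # ignore a close bracket with no matching open
                    spans.append((entity, stack.pop(), token_id))
    return spans


# Invented toy sentence; all columns except ID, FORM and MISC are left as "_".
sample = [
    "# text = Janez Novak je prišel .",
    "1\tJanez\t_\t_\t_\t_\t_\t_\t_\tEntity=(e0",
    "2\tNovak\t_\t_\t_\t_\t_\t_\t_\tEntity=e0)",
    "3\tje\t_\t_\t_\t_\t_\t_\t_\t_",
    "4\tprišel\t_\t_\t_\t_\t_\t_\t_\t_",
    "5\t.\t_\t_\t_\t_\t_\t_\t_\t_",
]
print(mention_spans(sample))
```

Token IDs are kept as strings, as they appear in the ID column. The sketch works sentence by sentence, so a mention that opens in one sentence and closes in another would not be paired.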