What's New
corpus

Description:
This corpus collects and annotates the extensive and highly valuable diachronic collection of 37,390 Slovenian proverbs, more than 50 years in the making at the ZRC SAZU Institute of Slovenian Ethnology. Each proverb is ...
This item contains 3 files (22.19 MB).
Publicly Available


toolService

Description:
This is a retrained Slovenian standard model for the Trankit v1.1.1 library (https://pypi.org/project/trankit/). It can predict sentence segmentation, tokenization, lemmatization, language-specific morphological annotation ...
This item contains 1 file (142.95 MB).
Publicly Available
corpus

Description:
The dataset consists of mid-length sentences from the parliamentary proceedings of Bosnia and Herzegovina, Croatia, Czechia, Serbia, Slovakia, Slovenia, and the United Kingdom, annotated with a 6-level sentiment schema ...
This item contains 8 files (7.43 MB).
Publicly Available



Most Viewed Items
Top Last Week
corpus

Description:
The SETimes.SR training corpus contains 86 726 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, and named entities.
The ...
This item contains 3 files (10.91 MB).
Publicly Available



corpus

Description:
The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and named entities. About half of the corpus is also ...
This item contains 3 files (91.53 MB).
Publicly Available



corpus

Description:
ParlaMint 3.0 is a multilingual set of 26 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2022, with the individual corpora being between 9 and 125 million words in size.
The ...
This item contains 27 files (5.22 GB).
Publicly Available

