What's New
corpus

Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 77 publishers. Trendi 2025-02 covers the period from January 2019 to February 2025, complementing the ...
This item contains no files.
corpus

Description:
Gos 2.1 is the reference speech corpus of the Slovenian language. This edition contains about 300 hours of speech, or 2.4 million words, 127 thousand utterances and 1,500 texts. It is composed from three different ...
This item contains 4 files (100.88
GB).
Restricted Use


toolService

Description:
This model for lemmatisation of spoken Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SST treebank of spoken Slovenian (https://github.com/UniversalDependencies/ ...
This item contains 1 file (2.09
MB).
Publicly Available



Most Viewed Items
Top Last Week
lexicalConceptualResource

Description:
Словарь современного русского литературного языка. This is the largest completed Academic dictionary of the Russian language. It spans 17 volumes and covers the vocabulary from the end of the 18th century until the ...
This item contains no files.
corpus

Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 30 files (5.87
GB).
Publicly Available


corpus

Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This item contains 18 files (23.37
GB).
Publicly Available

