What's New
corpus

Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 57 publishers. Trendi 2025-05 covers the period from January 2019 to May 2025, complementing the Gigafida ...
This item contains no files.
lexicalConceptualResource

Description:
This dataset contains lists of delexicalized dependency trees and subtrees extracted from the English UD GUM corpus, version 2.15 (http://hdl.handle.net/11234/1-5787), using the STARK tool (https://github.com/clarinsi/STARK). ...
This item contains 6 files (42.39 MB).
Publicly Available


lexicalConceptualResource

Description:
This dataset contains lists of delexicalized dependency trees and subtrees extracted from the Slovenian UD corpora SSJ (written) and SST (spoken), version 2.15 (http://hdl.handle.net/11234/1-5787), using the STARK tool ...
This item contains 6 files (74.12 MB).
Publicly Available


Most Viewed
In the past week
lexicalConceptualResource

Description:
A list of headwords from the collection "Besede slovenskega jezika" (Words of Slovenian Language).
This item contains 1 file (997.48 KB).
Publicly Available



corpus

Description:
This dataset is an archive of reader comments on the Ekspress Meedia news site from 2009 to 2019, containing approximately 31M comments, mostly in Estonian, with some in Russian.
Description of the Datasets.
There ...
This item contains 12 files (9.95 GB).
Publicly Available




corpus

Description:
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel, sentence-aligned corpus contains the novel in the English original (about 100,000 words in length) and its translations ...
This item contains 1 file (14.12 MB).
Academic Use

