What's New
corpus

Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 57 publishers. Trendi 2025-05 covers the period from January 2019 to May 2025, complementing the Gigafida ...
Ta vnos ne vsebuje datotek.
lexicalConceptualResource

Description:
This dataset contains lists of delexicalized dependency trees and subtrees extracted from the English UD GUM corpus, version 2.15 (http://hdl.handle.net/11234/1-5787), using the STARK tool (https://github.com/clarinsi/STARK). ...
Ta vnos vsebuje 6 datotek(e) (42.39
MB).
Publicly Available


lexicalConceptualResource

Description:
This dataset contains lists of delexicalized dependency trees and subtrees extracted from the Slovenian UD corpora SSJ (written) and SST (spoken), version 2.15 (http://hdl.handle.net/11234/1-5787), using the STARK tool ...
Ta vnos vsebuje 6 datotek(e) (74.12
MB).
Publicly Available


Največ ogledov
V preteklem tednu
corpus

Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
Ta vnos vsebuje 30 datotek(e) (5.87
GB).
Publicly Available


corpus

Description:
This dataset is an archive of reader comments on the Ekspress Meedia news site from 2009-2019, containing approximately 31M comments, mostly in the Estonian language, with some in Russian.
Description of the Datasets.
There ...
Ta vnos vsebuje 12 datotek(e) (9.95
GB).
Publicly Available




lexicalConceptualResource

Description:
A lexicon of 751 emoji characters with automatically assigned sentiment.
The sentiment is computed from 70,000 tweets, labeled by 83 human annotators
in 13 European languages.
The process and analysis of emoji sentiment ...
Ta vnos vsebuje 3 datotek(e) (93.95
KB).
Publicly Available


