What's New
corpus
Description:
The multilingual IPTC Media Topic dataset EMMediaTopic 1.0 is a collection of news articles in Catalan, Croatian, Greek, and Slovenian, automatically annotated with the 17 top-level topic labels from the IPTC NewsCodes ...
This item contains 1 file (71.3 MB).
Publicly Available
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-11 covers the period from January 2019 to November 2024, complementing the ...
This item contains no files.
lexicalConceptualResource
Description:
ArboSloleks is a dataset containing Slovene word formation trees that have been automatically constructed from word relations (http://hdl.handle.net/11356/1986) extracted from Sloleks 2.0 (http://hdl.handle.net/11356/1230). ...
This item contains 1 file (2.53 MB).
Publicly Available
Most Viewed
In the Past Week
corpus
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 30 files (5.87 GB).
Publicly Available
corpus
Description:
The Serbian web corpus srWaC was built by crawling the .rs top-level domain in 2014. The corpus was near-deduplicated on paragraph level, normalised via diacritic restoration, morphosyntactically annotated and lemmatised. ...
This item contains 6 files (3.51 GB).
Publicly Available
corpus
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 31 files (65.97 GB).
Publicly Available