What's New
corpus

Description:
The SI-IUS collection of older law texts is meant to be used both as a digital library and as a language corpus. For the former, each text has been carefully annotated in TEI, preserving, e.g., different types of divisions ...
This item contains 3 files (931.56 MB).
Publicly Available


corpus

Description:
The EEC-SL dataset is a localised and adapted version of the Equity Evaluation Corpus (EEC, Kiritchenko and Mohammad, 2018, https://aclanthology.org/S18-2005/). It consists of 8,640 sentences which were automatically ...
This item contains 1 file (147.49 KB).
Publicly Available


corpus

Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-09 covers the period from January 2019 to September 2025, complementing the ...
This item contains no files.
Most Viewed Items
Top Last Week
corpus

Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 31 files (5.94 GB).
Publicly Available


corpus

Description:
A dataset of user comments, extracted from the user-comment database of the 24sata.hr news portal and provided for research purposes to EMBEDDIA, a Horizon 2020 project. 24sata.hr is the largest-circulation ...
This item contains 3 files (1.89 GB).
Publicly Available


corpus

Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 31 files (69.17 GB).
Publicly Available

