What's New
corpus
Description:
The Berta Spoken Corpus contains six hours of recorded speech across a variety of interactional settings. These settings include 57 different speech events, with some captured on video and others, such as telephone or ...
This item contains 4 files (5.62 GB).
Publicly Available
corpus
Description:
This is a corpus of 1915 "jokes of the day" ("šala dneva") published by the Slovenian news portal 24ur.com. The jokes were scraped from their archive on September 18th, 2024. The initial list is lightly curated: shorter ...
This item contains 1 file (1.7 MB).
Publicly Available
corpus
Description:
The genre-enriched MaCoCu-Genre corpus collection comprises web corpora that have been automatically annotated with genre labels. The corpora can be very useful for genre-based creation of subcorpora that can be used for ...
This item contains 14 files (101.43 GB).
Publicly Available
Most Viewed
In the past week
corpus
Description:
DGT-UD is a 2 billion word 23-language parallel syntactically parsed corpus, which consists of the JRC DGT translation memory of European law, automatically annotated with UD-Pipe 1.2 (http://ufal.mff.cuni.cz/udpipe) using ...
This item contains 24 files (24.42 GB).
Publicly Available
corpus
Description:
The Montenegrin web corpus MaCoCu-cnr 1.0 was built by crawling the ".me" internet top-level domain in 2021 and 2022, extending the crawl dynamically to other domains as well. The crawler is available at https://github.c ...
This item contains 2 files (500.14 MB).
Publicly Available
corpus
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 30 files (5.87 GB).
Publicly Available