CLARIN.SI repository

What's New

corpus

Monitor corpus of Slovene Trendi 2025-12

Author(s):

Kosem, Iztok ; Čibej, Jaka ; Dobrovoljc, Kaja ; Erjavec, Tomaž ; Ljubešić, Nikola ; Ponikvar, Primož ; Šinkec, Mihael ; Krek, Simon

Description:

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 59 publishers. Trendi 2025-12 covers the period from January 2019 to December 2025, complementing the ...

This item contains no files.

lexicalConceptualResource

CLARIN.SI data & tools

Terminological dictionary of papermaking

Author(s):

Humar, Marjeta ; Kokalj, Zdenko ; Bonač, Stane ; Čekada, Andrej ; Drev, Vladimir ; Iglič, Božo ; Svetlin, Miha ; Učakar, Breda

Description:

This digital dictionary of papermaking was made on the basis of the printed edition, i.e. Marjeta Humar (ed.) Papirniški terminološki slovar. 1996. ZRC SAZU (https://doi.org/10.3986/961618220X). It is an explanatory, ...

This item contains 3 files (3.84 MB).

Publicly Available. Distributed under Creative Commons.

lexicalConceptualResource

CLARIN.SI data & tools

Dataset of annotated collocation-distractor pairs COLLDIST

Author(s):

Kosem, Iztok ; Arhar Holdt, Špela ; Zgaga, Karolina ; Arčon, Tjaša

Description:

The dataset contains 59,598 collocation-distractor pairs for 2,856 headwords. A distractor is defined as an incorrect answer or alternative to a collocation, which can be similar to the collocation in meaning and/or form. Headwords and ...

This item contains 1 file (1.46 MB).

Publicly Available. Distributed under Creative Commons.

Most Viewed Items

Top Last Week

corpus

CLARIN.SI data & tools

Map task corpus of heritage BCMS 1.0

Author(s):

Lemmenmeier-Batinić, Dolores

Description:

The Map task corpus of heritage Bosnian/Croatian/Montenegrin/Serbian (BCMS) consists of elicited conversations (map tasks) by 29 second-generation BCMS speakers originating from different regions of former Yugoslavia and ...

This item contains 2 files (751.91 KB).

Publicly Available. Distributed under Creative Commons.

corpus

CLARIN.SI data & tools

ASR training dataset for Serbian JuzneVesti-SR v1.0

Author(s):

Rupnik, Peter ; Ljubešić, Nikola

Description:

The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta' (https://www.juznevesti.com/Tagovi/Intervju-15-minuta.sr.html). The processing ...

This item contains 7 files (4.64 GB).

Publicly Available. Distributed under Creative Commons.

corpus

CLARIN.SI data & tools

Multilingual comparable corpora of parliamentary debates ParlaMint 2.1

Author(s):

Erjavec, Tomaž ; Ogrodniczuk, Maciej ; Osenova, Petya ; Ljubešić, Nikola ; Simov, Kiril ; Grigorova, Vladislava ; Rudolf, Michał ; Pančur, Andrej ; Kopp, Matyáš ; Barkarson, Starkaður ; Steingrímsson, Steinþór ; van der Pol, Henk ; Depoorter, Griet ; de Does, Jesse ; Jongejan, Bart ; Haltrup Hansen, Dorte ; Navarretta, Costanza ; Calzada Pérez, María ; de Macedo, Luciana D. ; van Heusden, Ruben ; Marx, Maarten ; Çöltekin, Çağrı ; Coole, Matthew ; Agnoloni, Tommaso ; Frontini, Francesca ; Montemagni, Simonetta ; Quochi, Valeria ; Venturi, Giulia ; Ruisi, Manuela ; Marchetti, Carlo ; Battistoni, Roberto ; Sebők, Miklós ; Ring, Orsolya ; Darģis, Roberts ; Utka, Andrius ; Petkevičius, Mindaugas ; Briedienė, Monika ; Krilavičius, Tomas ; Morkevičius, Vaidas ; Diwersy, Sascha ; Luxardo, Giancarlo ; Rayson, Paul

Description:

ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...

This item contains 18 files (2.17 GB).

Publicly Available. Distributed under Creative Commons.

Linguistic Data and NLP Tools

Find

Citation Support (with Persistent IDs)

Deposit Free and Safe

License of your Choice (Open licenses encouraged)

Easy to Find

Easy to Cite
