What's New
corpus

Description:
Gos 2.1 is the reference speech corpus of the Slovenian language. This edition contains about 300 hours of speech, or 2.4 million words, 127 thousand utterances and 1,500 texts. It is composed of three different ...
This item contains 4 files (100.88 GB).
Restricted Use


toolService

Description:
This model for lemmatisation of spoken Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SST treebank of spoken Slovenian (https://github.com/UniversalDependencies/ ...
This item contains 1 file (2.09 MB).
Publicly Available



toolService

Description:
This model for morphosyntactic annotation of spoken Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SST treebank of spoken Slovenian (https://github.com/Universal ...
This item contains 2 files (514.74 MB).
Publicly Available



Most Viewed
In the Past Week
corpus

Description:
The FRENK dataset consists of comments on Facebook posts (news articles) of mainstream media outlets from Croatia, Great Britain, and Slovenia, on the topics of migrants and LGBT. The dataset contains whole discussion ...
This item contains 1 file (4.17 MB).
Academic Use



corpus

Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 30 files (5.87 GB).
Publicly Available


corpus

Description:
The LiLaH-HAG dataset (HAG is short for hate-age-gender) consists of metadata on Facebook comments on Facebook posts of mainstream media in Great Britain, Flanders, Slovenia and Croatia. The metadata available in the dataset ...
This item contains 1 file (128.23 KB).
Publicly Available
