What's New
corpus

Description:
Gos 2.1 is the reference speech corpus of the Slovenian language. This edition contains about 300 hours of speech, or 2.4 million words, 127 thousand utterances and 1,500 texts. It is composed from three different ...
This item contains 4 files (100.88
GB).
Restricted Use


toolService

Description:
This model for lemmatisation of spoken Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SST treebank of spoken Slovenian (https://github.com/UniversalDependencies/ ...
This item contains 1 file (2.09
MB).
Publicly Available



toolService

Description:
This model for morphosyntactic annotation of spoken Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SST treebank of spoken Slovenian (https://github.com/Universal ...
This item contains 2 files (514.74
MB).
Publicly Available



Most Viewed Items
Top Last Week
corpus

Description:
The FRENK dataset consists of comments to Facebook posts (news articles) of mainstream media outlets from Croatia, Great Britain, and Slovenia, on the topics of migrants and LGBT. The dataset contains whole discussion ...
This item contains 1 file (4.17
MB).
Academic Use



corpus

Description:
The corpus contains 256,567 documents from the Slovenian news portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and financial content. The submission contains 7 files: ...
This item contains 8 files (616.88
MB).
Publicly Available



corpus

Description:
The LiLaH-HAG dataset (HAG is short for hate-age-gender) consists of metadata on Facebook comments to Facebook posts of mainstream media in Great Britain, Flanders, Slovenia and Croatia. The metadata available in the dataset ...
This item contains 1 file (128.23
KB).
Publicly Available



