What's New
Tool / Service

Description:
Docker image with an ASR evaluation tool that supports WER calculation on punctuated and capitalised transcripts. The UI allows uploading the reference and predicted transcripts, and offers a choice of performing WER calculation ...
This item contains 1 file (185.58 MB).
Publicly Available
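The entry above mentions WER calculation on punctuated and capitalised transcripts. As a rough illustration only, and not the tool's actual implementation (which this listing does not show), the sketch below computes word error rate as the word-level Levenshtein distance divided by the number of reference tokens. The switches `keep_case` and `keep_punct` are hypothetical names chosen for this example, and treating each punctuation mark as a separate token is an assumption.

```python
import re


def normalise(text: str, keep_case: bool, keep_punct: bool) -> list[str]:
    """Split text into word and punctuation tokens, optionally dropping case/punctuation."""
    # Assumption: each punctuation mark is its own token; \w is Unicode-aware, so č, š, ž stay in words.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    if not keep_case:
        tokens = [t.lower() for t in tokens]
    if not keep_punct:
        # Keep only tokens that start with a word character (i.e. drop punctuation tokens).
        tokens = [t for t in tokens if re.match(r"\w", t)]
    return tokens


def wer(reference: str, hypothesis: str, keep_case: bool = True, keep_punct: bool = True) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = normalise(reference, keep_case, keep_punct)
    hyp = normalise(hypothesis, keep_case, keep_punct)
    if not ref:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `wer("Dober dan, kako ste?", "dober dan kako ste")` gives 0.5 with case and punctuation kept (one substitution and two deleted punctuation marks over six reference tokens), and 0.0 when both `keep_case` and `keep_punct` are set to `False`.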
corpus

Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-08 covers the period from January 2019 to August 2025, complementing the Gigafida ...
This item contains no files.
corpus

Description:
GaMS-Instruct-MED is an instruction-following dataset designed to fine-tune Slovene large language models to follow instructions in the medical domain. It consists of units of prompts, instructions and responses from the ...
This item contains 1 file (41.6 MB).
Publicly Available


Most Viewed
In the past week
corpus

Description:
The dataset of user comments provided for research purposes for EMBEDDIA, a Horizon 2020 project, extracted from the database of user comments from the 24sata.hr news portal. The 24sata.hr is the largest-circulation ...
This item contains 3 files (1.89 GB).
Publicly Available

corpus

Description:
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 31 files (5.94 GB).
Publicly Available


Tool / Service

Description:
The model for lemmatisation of non-standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200), ...
This item contains 1 file (90.05 MB).
Publicly Available


