What's New
corpus
Description:
GaMS-Instruct-DH is an instruction-following dataset designed to fine-tune Slovene large language models to follow instructions. It consists of pairs of prompts and responses, some of which contain an additional context ...
This item contains 1 file (888.96
KB).
Publicly Available
corpus
Description:
GaMS-Instruct-GEN is an instruction-following dataset designed to fine-tune Slovene large language models to follow instructions. It consists of pairs of prompts and responses, some of which contain an additional input ...
This item contains 1 file (3.12
MB).
Publicly Available
toolService
Description:
The X-GENRE classifier is a text classification model that can be used for automatic genre identification. The model classifies texts to one of 9 genre labels: Information/Explanation, News, Instruction, Opinion/Argumentation, ...
This item contains 1 file (779.93
MB).
Publicly Available
Most Viewed Items
Top Last Week
corpus
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 30 files (5.87
GB).
Publicly Available
corpus
Description:
The CVET corpus contains 230 texts (around 175 thousand words) of varying length, published in the religious journal "Cvetje z vertov sv. Frančiška" between 1887 and 1916, when the magazine was edited by the linguist Fr. ...
This item contains 4 files (15.02
MB).
Publicly Available
corpus
Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
This item contains 18 files (23.37
GB).
Publicly Available