What's New
toolService
Description:
The LIST corpus extraction tool is a Java program for extracting lists from text corpora on the levels of characters, word parts, words, and word sets. It supports VERT and TEI P5 XML formats and outputs .CSV files that ...
Ta vnos vsebuje 1 datoteko (231.07
MB).
Publicly Available
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 107 media websites, published by 77 publishers. Trendi 2024-08 covers the period from January 2019 to August 2024, complementing the Gigafida ...
Ta vnos ne vsebuje datotek.
toolService
Description:
This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the concatenation of the SSJ UD treebank of written Slovenian ...
Ta vnos vsebuje 1 datoteko (145.44
MB).
Publicly Available
Največ ogledov
V preteklem tednu
corpus
Description:
The corpus contains 256,567 documents from the Slovenian news portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and financial content. The submission contains 7 files: ...
Ta vnos vsebuje 8 datotek(e) (616.88
MB).
Publicly Available
corpus
Description:
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the ...
Ta vnos vsebuje 18 datotek(e) (2.17
GB).
Publicly Available
corpus
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
Ta vnos vsebuje 30 datotek(e) (5.87
GB).
Publicly Available