What's New
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-12 covers the period from January 2019 to December 2024, complementing the ...
This item contains no files.
corpus
Description:
The training corpus of spoken Slovenian ROG 1.0 is the main resource for training and evaluating Slovenian-language technologies that process speech or speech transcripts, such as part-of-speech taggers, parsers, prosodic ...
This item contains 2 files (1.33 GB).
Publicly Available
![Distributed under Creative Commons](/repository/xmlui/themes/UFAL/images/licenses/cc.png)
![Attribution Required](/repository/xmlui/themes/UFAL/images/licenses/by.png)
![Share Alike](/repository/xmlui/themes/UFAL/images/licenses/sa.png)
lexicalConceptualResource
Description:
The Western South Slavic verbal database (WeSoSlaV) contains the 3000 most frequent Slovenian and the 5300 most frequent BCMS verbs, all coded for a number of properties ranging from their phonology and morphology to their ...
This item contains 3 files (11.43 MB).
Publicly Available
![Distributed under Creative Commons](/repository/xmlui/themes/UFAL/images/licenses/cc.png)
![Attribution Required](/repository/xmlui/themes/UFAL/images/licenses/by.png)
Most Viewed Items
Top Last Week
lexicalConceptualResource
Description:
A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages. The process and analysis of emoji sentiment ...
This item contains 3 files (93.95 KB).
Publicly Available
![Distributed under Creative Commons](/repository/xmlui/themes/UFAL/images/licenses/cc.png)
![Attribution Required](/repository/xmlui/themes/UFAL/images/licenses/by.png)
![Share Alike](/repository/xmlui/themes/UFAL/images/licenses/sa.png)
corpus
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 30 files (5.87 GB).
Publicly Available
![Distributed under Creative Commons](/repository/xmlui/themes/UFAL/images/licenses/cc.png)
![Attribution Required](/repository/xmlui/themes/UFAL/images/licenses/by.png)
corpus
Description:
The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA and 1,600 PhD theses (82 thousand texts, 5 million pages or 1.7 billion tokens) written between 2000 and 2018 and gathered from the digital ...
This item contains 6 files (42.11 GB).
Academic Use
![Inform Before Use](/repository/xmlui/themes/UFAL/images/licenses/inf.png)
![Attribution Required](/repository/xmlui/themes/UFAL/images/licenses/by.png)
![Noncommercial](/repository/xmlui/themes/UFAL/images/licenses/nc.png)