What's New
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-12 covers the period from January 2019 to December 2024, complementing the ...
This entry contains no files.
corpus
Description:
The ROG 1.0 training corpus of spoken Slovenian is the main Slovenian-language resource for training and evaluating technologies for processing speech or speech transcripts, such as part-of-speech taggers, parsers, prosodic ...
This entry contains 2 files (1.33 GB).
Publicly Available
![Distributed under Creative Commons](/repository/xmlui/themes/UFAL/images/licenses/cc.png)
![Attribution Required](/repository/xmlui/themes/UFAL/images/licenses/by.png)
![Share Alike](/repository/xmlui/themes/UFAL/images/licenses/sa.png)
lexicalConceptualResource
Description:
The Western South Slavic verbal database (WeSoSlaV) contains the 3000 most frequent Slovenian and the 5300 most frequent BCMS verbs, all coded for a number of properties ranging from their phonology and morphology to their ...
This entry contains 3 files (11.43 MB).
Publicly Available
![Distributed under Creative Commons](/repository/xmlui/themes/UFAL/images/licenses/cc.png)
![Attribution Required](/repository/xmlui/themes/UFAL/images/licenses/by.png)
Most Viewed
In the Past Week
corpus
Description:
This dataset of user comments, provided for research purposes for EMBEDDIA, a Horizon 2020 project, was extracted from the database of user comments of the 24sata.hr news portal. 24sata.hr is the largest-circulation ...
This entry contains 3 files (1.89 GB).
Publicly Available
![Distributed under Creative Commons](/repository/xmlui/themes/UFAL/images/licenses/cc.png)
![Attribution Required](/repository/xmlui/themes/UFAL/images/licenses/by.png)
![Noncommercial](/repository/xmlui/themes/UFAL/images/licenses/nc.png)
![No Derivative Works](/repository/xmlui/themes/UFAL/images/licenses/nd.png)
corpus
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This entry contains 30 files (5.87 GB).
Publicly Available
![Distributed under Creative Commons](/repository/xmlui/themes/UFAL/images/licenses/cc.png)
![Attribution Required](/repository/xmlui/themes/UFAL/images/licenses/by.png)
lexicalConceptualResource
Description:
A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages. The process and analysis of emoji sentiment ...
This entry contains 3 files (93.95 KB).
Publicly Available
![Distributed under Creative Commons](/repository/xmlui/themes/UFAL/images/licenses/cc.png)
![Attribution Required](/repository/xmlui/themes/UFAL/images/licenses/by.png)
![Share Alike](/repository/xmlui/themes/UFAL/images/licenses/sa.png)