What's New
corpus
Description:
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-11 covers the period from January 2019 to November 2024, complementing the ...
This item contains no files.
corpus
Description:
The training corpus of spoken Slovenian, ROG 1.0, is the main Slovenian-language resource for training and evaluating technologies aimed at processing speech or speech transcripts, such as part-of-speech taggers, parsers, prosodic ...
This item contains 2 files (1.33 GB).
Publicly Available
Creative Commons license: Attribution Required, Share Alike
lexicalConceptualResource
Description:
The Western South Slavic verbal database (WeSoSlaV) contains the 3,000 most frequent Slovenian verbs and the 5,300 most frequent BCMS verbs, all coded for a number of properties ranging from their phonology and morphology to their ...
This item contains 3 files (11.43 MB).
Publicly Available
Creative Commons license: Attribution Required
Most Viewed Items
Top Last Week
lexicalConceptualResource
Description:
A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages. The process and analysis of emoji sentiment ...
This item contains 3 files (93.95 KB).
Publicly Available
Creative Commons license: Attribution Required, Share Alike
corpus
Description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora ...
This item contains 30 files (5.87 GB).
Publicly Available
Creative Commons license: Attribution Required
corpus
Description:
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel, sentence-aligned corpus contains the novel in the English original (about 100,000 words in length) and its translations ...
This item contains 1 file (14.12 MB).
Academic Use
License terms: Attribution Required, Noncommercial