dc.contributor.author Čibej, Jaka
dc.contributor.author Gantar, Kaja
dc.contributor.author Dobrovoljc, Kaja
dc.contributor.author Krek, Simon
dc.contributor.author Holozan, Peter
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Romih, Miro
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Krsnik, Luka
dc.contributor.author Robnik-Šikonja, Marko
dc.date.accessioned 2022-12-05T13:07:01Z
dc.date.available 2022-12-05T13:07:01Z
dc.date.issued 2022-12-05
dc.identifier.uri http://hdl.handle.net/11356/1745
dc.description Sloleks is a reference morphological lexicon of Slovene developed for use in various NLP applications and language manuals. It contains Slovene lemmas, their inflected or derivative word forms, and the corresponding grammatical descriptions. In addition to the approximately 100,000 entries already available in Sloleks 2.0 (http://hdl.handle.net/11356/1230), Sloleks 3.0 contains approximately 265,000 newly generated entries from the most frequent lemmas in Gigafida 2.0 (http://hdl.handle.net/11356/1320) that were not included in previous versions of Sloleks. For verbs, adjectives, adverbs, and common nouns, the lemmas were checked manually by three annotators and included in Sloleks only if at least one annotator confirmed them as legitimate. Proper nouns were not checked manually. Lemmatization rules, part-of-speech categorization, and the set of feature-value pairs follow the MULTEXT-East morphosyntactic specifications for Slovenian (https://nl.ijs.si/ME/V6/msd/html/msd-sl.html). In addition to grammatical information, each word form is annotated with its absolute corpus frequency and its compliance with the reference language standard. Most entries also contain information on their morphological patterns (see http://hdl.handle.net/11356/1411 for more on morphological patterns). The lexicon also includes accentuated word forms generated automatically with neural networks (Krsnik 2017). For the 100,000 entries from Sloleks 2.0, the accentuated forms were manually corrected, whereas the accentuated forms for the other 265,000 entries are fully automatic. IPA and SAMPA phonetic transcriptions were generated automatically using an improved G2P system for Slovene developed within the RSDO project (see https://github.com/clarinsi/slovene_g2p).
Version 3.0 is encoded in XML. Unlike version 2.0, which used the LMF format, it uses a custom XML format developed for the morphological lexicon by the Centre for Language Resources and Technologies of the University of Ljubljana (see the included .xsd files and "00README.txt" for details). Reference: Krsnik, Luka. Napovedovanje naglasa slovenskih besed z metodami strojnega učenja: magistrsko delo: magistrski program druge stopnje Računalništvo in informatika. Ljubljana: [L. Krsnik], 2017. http://eprints.fri.uni-lj.si/3978/
dc.language.iso slv
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation.replaces http://hdl.handle.net/11356/1230
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://rsdo.slovenscina.eu/en/language-resources
dc.subject morphology
dc.subject inflection
dc.subject word forms
dc.subject derivation
dc.subject lemmatisation
dc.subject word accents
dc.subject IPA
dc.subject SAMPA
dc.subject morphological patterns
dc.title Morphological lexicon Sloleks 3.0
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType lexicon
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Jaka Čibej jaka.cibej@cjvt.si Centre for Language Resources and Technologies, University of Ljubljana
sponsor Ministry of Education, Science and Sport 3311-08-986003 Communication in Slovene Other
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
size.info 365340 entries
files.count 1
files.size 251391521


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name Sloleks.3.0.zip
Size 239.75 MB
Format application/zip
Description Lexicon in XML
MD5 650d90e780c454f523490fac0ae84487
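
The size and MD5 values listed for the archive can be used to check a download before unpacking it. The following is a minimal sketch, not part of the Sloleks distribution: the function names `md5_of_file` and `verify_download` are illustrative, and the expected values are copied from the `files.size` and MD5 fields of this record.

```python
import hashlib
from pathlib import Path

# Values copied from this item record (files.size and MD5).
EXPECTED_MD5 = "650d90e780c454f523490fac0ae84487"
EXPECTED_SIZE = 251391521  # bytes

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True only if both the byte size and the MD5 digest match."""
    if Path(path).stat().st_size != expected_size:
        return False  # cheap size check first, avoids hashing a truncated file
    return md5_of_file(path) == expected_md5
```

Calling `verify_download("Sloleks.3.0.zip")` on the downloaded archive should return `True` if the file is intact; the local path is an assumption about where the file was saved.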
 File Preview  
  • Sloleks.3.0
    • 00README.txt 9 kB
    • xml_schemas
      • morphological_lexicon.xsd 1 kB
      • inventory.xsd 29 kB
    • sloleks_3.0_001.xml 181 MB
    • sloleks_3.0_002.xml 138 MB
    • sloleks_3.0_003.xml 178 MB
    • sloleks_3.0_004.xml 141 MB
    • sloleks_3.0_005.xml 149 MB
    • sloleks_3.0_006.xml 177 MB
    • sloleks_3.0_007.xml 133 MB
    • sloleks_3.0_008.xml 182 MB
    • sloleks_3.0_009.xml 135 MB
    • sloleks_3.0_010.xml 174 MB
    • sloleks_3.0_011.xml 188 MB
    • sloleks_3.0_012.xml 152 MB
    • sloleks_3.0_013.xml 191 MB
    • sloleks_3.0_014.xml 105 MB
    • sloleks_3.0_015.xml 196 MB
    • sloleks_3.0_016.xml 263 MB
    • sloleks_3.0_017.xml 88 MB
    • sloleks_3.0_018.xml 93 MB
    • sloleks_3.0_019.xml 174 MB
    • sloleks_3.0_020.xml 196 MB
    • sloleks_3.0_021.xml 134 MB
    • sloleks_3.0_022.xml 156 MB
    • sloleks_3.0_023.xml 125 MB
    • sloleks_3.0_024.xml 163 MB
    • sloleks_3.0_025.xml 172 MB
    • sloleks_3.0_026.xml 147 MB
    • sloleks_3.0_027.xml 81 MB
    • sloleks_3.0_028.xml 88 MB
    • sloleks_3.0_029.xml 98 MB
    • sloleks_3.0_030.xml 97 MB
    • sloleks_3.0_031.xml 96 MB
    • sloleks_3.0_032.xml 97 MB
    • sloleks_3.0_033.xml 100 MB
    • sloleks_3.0_034.xml 98 MB
    • sloleks_3.0_035.xml 97 MB
    • sloleks_3.0_036.xml 97 MB
    • sloleks_3.0_037.xml 100 MB
    • sloleks_3.0_038.xml 99 MB
    • sloleks_3.0_039.xml 96 MB
    • sloleks_3.0_040.xml 96 MB
    • sloleks_3.0_041.xml 98 MB
    • sloleks_3.0_042.xml 100 MB
    • sloleks_3.0_043.xml 103 MB
    • sloleks_3.0_044.xml 114 MB
    • sloleks_3.0_045.xml 96 MB
    • sloleks_3.0_046.xml 101 MB
    • sloleks_3.0_047.xml 104 MB
    • sloleks_3.0_048.xml 96 MB
    • sloleks_3.0_049.xml 97 MB
    • sloleks_3.0_050.xml 100 MB
    • sloleks_3.0_051.xml 96 MB
    • sloleks_3.0_052.xml 108 MB
    • sloleks_3.0_053.xml 141 MB
    • sloleks_3.0_054.xml 97 MB
    • sloleks_3.0_055.xml 137 MB
    • sloleks_3.0_056.xml 110 MB
    • sloleks_3.0_057.xml 99 MB
    • sloleks_3.0_058.xml 140 MB
    • sloleks_3.0_059.xml 128 MB
    • sloleks_3.0_060.xml 127 MB
    • sloleks_3.0_061.xml 118 MB
    • sloleks_3.0_062.xml 124 MB
    • sloleks_3.0_063.xml 155 MB
    • sloleks_3.0_064.xml 105 MB
    • sloleks_3.0_065.xml 118 MB
    • sloleks_3.0_066.xml 153 MB
    • sloleks_3.0_067.xml 97 MB
    • sloleks_3.0_068.xml 187 MB
    • sloleks_3.0_069.xml 135 MB
    • sloleks_3.0_070.xml 254 MB
    • sloleks_3.0_071.xml 76 MB
    • sloleks_3.0_072.xml 131 MB
    • sloleks_3.0_073.xml 133 MB
    • sloleks_3.0_074.xml 138 MB
    • sloleks_3.0_075.xml 96 MB
    • sloleks_3.0_076.xml 137 MB
    • sloleks_3.0_077.xml 155 MB
    • sloleks_3.0_078.xml 100 MB
    • sloleks_3.0_079.xml 137 MB
    • sloleks_3.0_080.xml 122 MB
    • sloleks_3.0_081.xml 97 MB
    • sloleks_3.0_082.xml 98 MB
    • sloleks_3.0_083.xml 97 MB
    • sloleks_3.0_084.xml 97 MB
    • sloleks_3.0_085.xml 96 MB
    • sloleks_3.0_086.xml 97 MB
    • sloleks_3.0_087.xml 96 MB
    • sloleks_3.0_088.xml 96 MB
    • sloleks_3.0_089.xml 97 MB
    • sloleks_3.0_090.xml 97 MB
    • sloleks_3.0_091.xml 97 MB
    • sloleks_3.0_092.xml 97 MB
    • sloleks_3.0_093.xml 96 MB
    • sloleks_3.0_094.xml 97 MB
    • sloleks_3.0_095.xml 96 MB
    • sloleks_3.0_096.xml 97 MB
    • sloleks_3.0_097.xml 97 MB
    • sloleks_3.0_098.xml 96 MB
    • sloleks_3.0_099.xml 97 MB
    • sloleks_3.0_100.xml 97 MB
    • sloleks_3.0_101.xml 97 MB
    • sloleks_3.0_102.xml 47 MB
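
Since the individual XML files in the archive range from tens to hundreds of megabytes, loading one fully into memory may be impractical. The sketch below shows one possible way to take a first look at a file by streaming it with Python's standard `xml.etree.ElementTree.iterparse` and counting element tags. It deliberately assumes nothing about the element names of the custom Sloleks format, which are documented in the included .xsd files and "00README.txt"; the function name `count_elements` is illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_elements(xml_source):
    """Stream an XML file (path or binary file object) and count element tags.

    No Sloleks-specific element names are assumed; the result simply
    reports whatever tags occur in the file.
    """
    counts = Counter()
    for _event, element in ET.iterparse(xml_source, events=("end",)):
        counts[element.tag] += 1
        # clear() drops the element's text, attributes and children.
        # The emptied elements stay attached to their parent, so memory
        # use still grows, but far more slowly than with a full parse.
        element.clear()
    return counts
```

For example, `count_elements("Sloleks.3.0/sloleks_3.0_102.xml").most_common(10)` would list the ten most frequent tags in the smallest file of the listing; the path assumes the archive was unpacked into the current directory. If the files declare an XML namespace, the tags are reported in `{namespace}name` form.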
