dc.contributor.author Klemen, Matej
dc.contributor.author Kosem, Iztok
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Pollak, Senja
dc.contributor.author Huber, Damjan
dc.contributor.author Lutar, Mateja
dc.date.accessioned 2023-10-20T16:17:51Z
dc.date.available 2023-10-20T16:17:51Z
dc.date.issued 2023-10-19
dc.identifier.uri http://hdl.handle.net/11356/1877
dc.description The KUUS corpus comprises 17 textbooks and 7 workbooks (over 700,000 words) for Slovenian as a second and foreign language. Published between 2002 and 2023 at the Centre for Slovene as a Second and Foreign Language (Faculty of Arts, University of Ljubljana), these textbooks were widely used in the teaching of Slovenian as a second and foreign language to children, adolescents and adults in Slovenia and abroad at the time of the creation of the corpus. The metadata for each text includes its title, subtitle, authors, year of publication, publisher, CEFR level, target group and, for the textbooks, the estimated number of lesson hours. The corpus is linguistically annotated with the CLASSLA pipeline (https://github.com/clarinsi/classla/) at the levels of tokenization, sentence segmentation, lemmatization, MULTEXT-East v6 MSD-tags (https://nl.ijs.si/ME/V6/msd/html/msd-sl.html), JOS dependency syntax (https://nl.ijs.si/jos/bib/jos-skladnja-navodila.pdf), and named entities (https://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf). Compared with the previous 1.0 version of the corpus, the 2.0 version has been enlarged by 7 workbooks from sets whose textbooks were already part of KUUS 1.0. It is available not only in CoNLL-U format but also in TEI XML and in vertical encoding. The corpus KUUS 1.0 is presented in more detail in: KLEMEN, Matej, ARHAR HOLDT, Špela, POLLAK, Senja, KOSEM, Iztok, HUBER, Damjan, LUTAR, Mateja, 2022: Korpus učbenikov za učenje slovenščine kot drugega in tujega jezika. Nataša Pirih Svetina, Ina Ferbežar (eds.): Na stičišču svetov: slovenščina kot drugi in tuji jezik. Obdobja 41. Ljubljana: Založba Univerze v Ljubljani. 165–174. DOI: https://doi.org/10.4312/Obdobja.41.2784-7152 Note that a sample of the KUUS corpus, ccKUUS (http://hdl.handle.net/11356/1878), is available under a more permissive licence than KUUS and is also searchable via the CLARIN.SI concordancers.
dc.language.iso slv
dc.publisher Centre for Slovene as a Second and Foreign Language, University of Ljubljana
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation.isreferencedby https://doi.org/10.4312/Obdobja.41.2784-7152
dc.relation.replaces http://hdl.handle.net/11356/1696
dc.rights CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0
dc.rights.uri https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0
dc.rights.label ACA
dc.source.uri https://centerslo.si/KUUS
dc.subject textbook corpus
dc.subject L2
dc.subject language learning
dc.title Corpus of textbooks for learning Slovenian as L2 KUUS 2.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Matej Klemen matej.klemen@ff.uni-lj.si Centre for Slovene as a Second and Foreign Language, University of Ljubljana
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS J7-3159 Empirical foundations for digitally-supported development of writing skills nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
sponsor Centre for Slovene as a Second and Foreign Language, University of Ljubljana - KUUS ownFunds
sponsor Ministry of Culture - Upgrade of KOST and KUUS corpora for Slovenian as second or foreign language nationalFunds
size.info 24 texts
size.info 727393 words
files.count 3
files.size 40841193


 Files in this item

 Download all files in item (38.95 MB)
This item is Academic Use and licensed under: CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0
Licence conditions: Inform Before Use, Attribution Required, Noncommercial
Name KUUS.TEI.ana.zip
Size 18.8 MB
Format application/zip
Description Corpus in source TEI format with linguistic annotations
MD5 bee69f8fe3531e9e40ed62a17e951c1f

Name KUUS.conll.zip
Size 9.39 MB
Format application/zip
Description Corpus in derived CoNLL-U format with TSV metadata
MD5 f9a8ef5bd34796bb3f61255ee12d697e

Name KUUS.vert.zip
Size 10.77 MB
Format application/zip
Description Corpus in derived vertical format with registry file
MD5 c15b0ff6f99c9faf3cab45febb946a82
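
The MD5 values listed for the three archives can be used to check that a download arrived intact. The following is a minimal sketch, not part of the record itself: it assumes the archives were saved under their listed names in a single local directory, and the helper names md5_of and verify_downloads are hypothetical, introduced only for illustration.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing in this record.
EXPECTED_MD5 = {
    "KUUS.TEI.ana.zip": "bee69f8fe3531e9e40ed62a17e951c1f",
    "KUUS.conll.zip": "f9a8ef5bd34796bb3f61255ee12d697e",
    "KUUS.vert.zip": "c15b0ff6f99c9faf3cab45febb946a82",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each archive name to True if it exists and matches its listed MD5.

    Assumption: the archives keep their original file names and all sit in
    `directory`. A missing file is reported as False, same as a mismatch.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A mismatch most likely indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy. MD5 is used here only because it is the checksum the record publishes; it guards against accidental corruption, not deliberate tampering.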