dc.contributor.author Klemen, Matej
dc.contributor.author Kosem, Iztok
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Pollak, Senja
dc.contributor.author Huber, Damjan
dc.contributor.author Lutar, Mateja
dc.date.accessioned 2022-11-14T09:50:02Z
dc.date.available 2022-11-14T09:50:02Z
dc.date.issued 2022-11-14
dc.identifier.uri http://hdl.handle.net/11356/1696
dc.description The KUUS corpus comprises 17 textbooks for Slovenian as a second and foreign language published between 2002 and 2022 at the Centre for Slovene as a Second and Foreign Language (Faculty of Arts, University of Ljubljana). When the corpus was created, these textbooks were widely used to teach Slovenian as a second and foreign language to children, adolescents and adults in Slovenia and abroad. KUUS consists of 520,796 words. It was linguistically annotated with the CLASSLA v1.1.1 pipeline (https://github.com/clarinsi/classla/) at the levels of tokenization, sentence segmentation, lemmatization, MULTEXT-East v6 MSD-tags (https://nl.ijs.si/ME/V6/msd/html/msd-sl.html), JOS dependency syntax (https://nl.ijs.si/jos/bib/jos-skladnja-navodila.pdf), and named entities (https://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf). The metadata for each textbook records the title, subtitle, authors, year of publication, publisher, CEFR level, target audience, and estimated number of lessons. The corpus is presented in more detail in: KLEMEN, Matej, ARHAR HOLDT, Špela, POLLAK, Senja, KOSEM, Iztok, HUBER, Damjan, LUTAR, Mateja, 2022: Korpus učbenikov za učenje slovenščine kot drugega in tujega jezika. Nataša Pirih Svetina, Ina Ferbežar (eds.): Na stičišču svetov: slovenščina kot drugi in tuji jezik. Obdobja 41. Ljubljana: Založba Univerze v Ljubljani. 165–174. DOI: https://doi.org/10.4312/Obdobja.41.2784-7152
dc.language.iso slv
dc.publisher Centre for Slovene as a Second and Foreign Language, University of Ljubljana
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation.isreferencedby https://doi.org/10.4312/Obdobja.41.2784-7152
dc.relation.isreplacedby http://hdl.handle.net/11356/1877
dc.rights CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0
dc.rights.uri https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0
dc.rights.label ACA
dc.source.uri https://centerslo.si/KUUS
dc.subject textbook corpus
dc.subject L2
dc.subject language learning
dc.title Corpus of textbooks for learning Slovenian as L2 KUUS 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Matej Klemen matej.klemen@ff.uni-lj.si Centre for Slovene as a Second and Foreign Language, University of Ljubljana
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS J7-3159 Empirical foundations for digitally-supported development of writing skills nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
sponsor Centre for Slovene as a Second and Foreign Language, University of Ljubljana - KUUS ownFunds
size.info 520796 words
files.count 1
files.size 56636904
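
The dc.description field above says the corpus is tokenized, lemmatized and tagged, and the file is distributed in CoNLL-U format. The following is a minimal sketch of how such a file could be read with the Python standard library alone. It assumes the file follows the standard ten-column, tab-separated CoNLL-U layout; the function names read_sentences and lemma_frequencies are illustrative and not part of any released tooling, and which column holds each annotation layer in this particular corpus should be checked against the file itself.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def read_sentences(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts keyed by column name."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Comment lines carry sentence-level metadata; skipped here.
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            # Assumption: rows without exactly ten columns are skipped.
            continue
        token = dict(zip(CONLLU_FIELDS, columns))
        if "-" in token["id"] or "." in token["id"]:
            # Multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
            # are not counted as words.
            continue
        sentence.append(token)
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


def lemma_frequencies(lines: Iterable[str]) -> Counter:
    """Count how often each lemma occurs across all sentences."""
    counts: Counter = Counter()
    for sentence in read_sentences(lines):
        counts.update(token["lemma"] for token in sentence)
    return counts


# Possible usage, assuming the file has been downloaded locally:
# with open("L2SLO-textbook-korpus.conllu", encoding="utf-8") as handle:
#     print(lemma_frequencies(handle).most_common(20))
```

Reading line by line keeps memory use low, which matters for a file of this size (files.size reports about 54 MB).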


 Files in this item

This item is for Academic Use and licensed under CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0.
Licence conditions: Inform Before Use, Attribution Required, Noncommercial
Name L2SLO-textbook-korpus.conllu
Size 54.01 MB
Format Unknown
Description Corpus in CoNLL-U format
MD5 1c599bf73549ad4b99c8e965fed4056c
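
The MD5 value listed for the file can be used to check that a download is complete and uncorrupted. A minimal sketch with Python's hashlib follows; the helper name md5_of_file and the local file path in the usage comment are assumptions for illustration.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksum as published in this record.
EXPECTED_MD5 = "1c599bf73549ad4b99c8e965fed4056c"

# Possible usage, assuming the file was saved under its listed name:
# if md5_of_file("L2SLO-textbook-korpus.conllu") != EXPECTED_MD5:
#     raise SystemExit("Checksum mismatch: download may be incomplete")
```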