
 
dc.contributor.author Kosem, Iztok
dc.contributor.author Arhar Holdt, Špela
dc.contributor.author Stritar Kučuk, Mojca
dc.contributor.author Krek, Simon
dc.contributor.author Krapš Vodopivec, Irena
dc.contributor.author Stabej, Marko
dc.contributor.author Kocjančič, Polonca
dc.contributor.author Laskowski, Cyprian
dc.contributor.author Klemenc, Bojan
dc.contributor.author Pori, Eva
dc.contributor.author Rozman, Tadeja
dc.date.accessioned 2019-11-08T07:58:49Z
dc.date.available 2019-11-08T07:58:49Z
dc.date.issued 2019-07-08
dc.identifier.uri http://hdl.handle.net/11356/1219
dc.description Šolar 2.0 Clear is an adapted version of the Šolar 2.0 corpus (http://hdl.handle.net/11356/1214). It consists of texts written by students in Slovene primary and secondary schools. School essays form the majority of the corpus; other material includes texts created during lessons, such as text recapitulations or descriptions, examples of formal applications, etc. For each text, information is provided on the school (elementary or secondary), subject, level (grade or year), type of text, region, and date of production. Unlike the original Šolar 2.0 corpus, Šolar 2.0 Clear includes student texts only: error annotations and other types of feedback from the teachers have been removed. The corpus can thus be used for processing tasks where the inclusion of corrections hinders or complicates the procedures (e.g. comparative data extraction or training of language models).
dc.language.iso slv
dc.publisher Trojina, Institute for Applied Slovene Studies
dc.publisher Centre for Language Resources and Technologies, University of Ljubljana
dc.relation.replaces http://hdl.handle.net/11356/1150
dc.relation.isreplacedby http://hdl.handle.net/11356/1589
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.label PUB
dc.source.uri https://www.cjvt.si/raziskovalno-delo/projekti-cjvt/korpus-solar/
dc.subject student writing
dc.subject developmental corpus
dc.title Developmental corpus (without language corrections) Šolar 2.0 Clear
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Iztok Kosem iztok.kosem@trojina.si Trojina, Institute for Applied Slovene Studies
sponsor ARRS (Slovenian Research Agency) I0-0051 Centre for Applied Linguistics (CUJ) nationalFunds
sponsor Ministry of Culture 3340-15-141006 Upgrade of Šolar Corpus nationalFunds
sponsor University of Ljubljana I0-0022 Network of Research Infrastructure Centres (MRIC) nationalFunds
size.info 5485 texts
size.info 1638229 words
size.info 1907731 tokens
files.count 2
files.size 30636315


 Files in this item

 Download all files in item (29.22 MB)
Name
Solar2.0-Clear.zip
Size
20.01 MB
Format
application/zip
Description
Corpus in TEI format
MD5
d64dcf5c3ddbb851771f435a5d2af58a
 File Preview
  • Solar2.0-Clear
    • solar2-clear.xml (156 MB)
    • schema
      • tei_clarin_schema.xml (3 kB)
      • tei_clarin.rnc (305 kB)
      • tei_clarin.dtd (239 kB)
      • tei_clarin.sch (496 B)
      • tei_clarin.xsd (667 kB)
      • tei_clarin.rng (612 kB)
      • dcr.tmp (1 kB)
    • 00README.txt (237 B)
Name
Solar2.0-Clear.vert.zip
Size
9.21 MB
Format
application/zip
Description
Corpus in derived vertical (Sketch Engine / CQP) format
MD5
025edfded5d2e17c58697ea5a55d7d09
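
The MD5 values listed for the two archives can be used to check that a download is intact. The following is a minimal sketch, not part of the original record. It assumes the archives have been saved under their listed names in the current directory, and the helper names `md5_of_file` and `verify_download` are hypothetical.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing in this record.
EXPECTED_MD5 = {
    "Solar2.0-Clear.zip": "d64dcf5c3ddbb851771f435a5d2af58a",
    "Solar2.0-Clear.vert.zip": "025edfded5d2e17c58697ea5a55d7d09",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Compare a downloaded archive with the MD5 listed for its file name."""
    path = Path(path)
    expected = EXPECTED_MD5[path.name]  # KeyError for names not in the record
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Assumption: the archives sit in the current working directory.
    for name in EXPECTED_MD5:
        if Path(name).exists():
            print(name, "OK" if verify_download(name) else "MISMATCH")
        else:
            print(name, "not found in current directory")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.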
