dc.contributor.author Erjavec, Tomaž
dc.contributor.author Fišer, Darja
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Ferme, Marko
dc.contributor.author Borovič, Mladen
dc.contributor.author Boškovič, Borko
dc.contributor.author Ojsteršek, Milan
dc.contributor.author Hrovat, Goran
dc.date.accessioned 2019-12-24T14:42:48Z
dc.date.available 2019-12-24T14:42:48Z
dc.date.issued 2019-11-28
dc.identifier.uri http://hdl.handle.net/11356/1265
dc.description The KAS-dr corpus of Slovene PhD theses consists of almost 1,600 texts (266 thousand pages or 100 million tokens) written between 2000 and 2018 and gathered from the digital libraries of Slovene higher education institutions via the Slovene Open Science portal (http://openscience.si). The theses are accompanied by substantial metadata, and each thesis in the corpus contains only its main body, i.e. without the front and back matter. The body is divided into pages, the pages into paragraphs, and the paragraphs into sentences. The sentence tokens are morphosyntactically annotated, words are lemmatised, and English-Slovene pairs of term candidates are marked up and linked; Slovene monolingual term candidates are also marked up. The corpus is distributed in the canonical TEI encoding, in the so-called vertical format used by the (no)Sketch Engine and CWB concordancers, and as plain text files. Each format distribution also contains a file with thesis metadata. This repository entry contains the corpus of PhD theses only; separate entries are available for MSc/MA theses (KAS-mag: http://hdl.handle.net/11356/1266), BSc/BA theses (KAS-dipl: http://hdl.handle.net/11356/1267) and the complete KAS corpus with all three (KAS: http://hdl.handle.net/11356/1244).
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.publisher Faculty of Electrical Engineering and Computer Science, University of Maribor
dc.relation.isreferencedby https://rdcu.be/b7GrB
dc.rights CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0
dc.rights.uri https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0
dc.rights.label ACA
dc.source.uri http://nl.ijs.si/kas/
dc.subject PhD theses
dc.subject academic writing
dc.subject terminology
dc.subject TEI
dc.title Corpus of Academic Slovene (PhD theses) KAS-dr 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) J6-7094 Slovene scientific texts: resources and description nationalFunds
size.info 1569 texts
size.info 266423 pages
size.info 101473395 tokens
files.count 3
files.size 2706506388
featuredService.kontext search|https://www.clarin.si/kontext/first_form?corpname=kas_dr
featuredService.noske search|https://www.clarin.si/ske/#dashboard?corpname=kas&struct_attr_stats=1&subcorpora=1
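The description mentions the vertical format used by (no)Sketch Engine and CWB. As a rough sketch only, the Python below reads such data sentence by sentence. It assumes the usual CWB conventions (one token per line, tab-separated positional attributes, structural markup such as <s> and </s> on lines of its own, a literal "<" in tokens escaped) and UTF-8 encoding. The column order in the sample (word, lemma, morphosyntactic tag), the member names inside kasDr.vert.tar.gz, and the helper names read_sentences and iter_archive_texts are illustrative assumptions, not documented properties of this distribution.

```python
import io
import re
import tarfile
from typing import Iterable, Iterator, List, Tuple

# An opening or closing structural tag on its own line, e.g. <s>, <p id="x">, </s>, <g/>
TAG = re.compile(r"^<(/?)([A-Za-z_][\w.-]*)")


def read_sentences(lines: Iterable[str]) -> Iterator[List[Tuple[str, ...]]]:
    """Yield each <s>...</s> sentence as a list of token tuples.

    Assumed layout: one token per line, positional attributes separated by tabs.
    Tokens outside <s>...</s> are ignored; other structures (doc, p, g, ...) are skipped.
    """
    sentence: List[Tuple[str, ...]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        match = TAG.match(line)
        if match:
            closing, name = match.groups()
            if name == "s":
                if closing:
                    yield sentence
                sentence = []  # reset on both <s> and </s>
            continue
        sentence.append(tuple(line.split("\t")))


def iter_archive_texts(path: str) -> Iterator[Tuple[str, io.TextIOWrapper]]:
    """Yield (member name, UTF-8 text stream) for each regular file in a .tar.gz.

    Member names inside the archive are not documented in this record (hypothetical
    helper), so callers should filter them, e.g. to skip the thesis metadata file.
    """
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            if member.isfile():
                raw = tar.extractfile(member)
                yield member.name, io.TextIOWrapper(raw, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical three-column sample; the real column layout may differ.
    sample = "<doc>\n<p>\n<s>\nKorpus\tkorpus\tNcmsn\nje\tbiti\tVa-r3s-n\n<g/>\n.\t.\tZ\n</s>\n</p>\n</doc>\n"
    for sent in read_sentences(sample.splitlines()):
        print([tok[1] for tok in sent])  # prints ['korpus', 'biti', '.']
```

The sentence reader works on any iterable of lines, so the same function can consume the streams yielded by iter_archive_texts or a plain opened file without loading a whole thesis into memory.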


 Files in this item

This item is Academic Use and licensed under CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0 (Inform Before Use, Attribution Required, Noncommercial).
Name                Size        Format              Description                          MD5
kasDr.tei.tar.gz    1.09 GB     application/gzip    Corpus in TEI format                 f8a47a10d144fd40bcf8b35bc72c8dc9
kasDr.vert.tar.gz   1.08 GB     application/gzip    Corpus in derived vertical format    328f7b9ff7c7c7184fe1ba074c550785
kasDr.txt.tar.gz    351.73 MB   application/gzip    Corpus in plain text format          4f37b0d4c1436bd841c7ace62e74517a
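Each archive is listed with an MD5 checksum. A minimal sketch for checking the downloads, assuming all three archives were saved under their listed names in a single directory; the function names md5_of and verify are hypothetical, and the expected digests are copied from the listing above.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

# Checksums copied from the file listing of this record
EXPECTED_MD5 = {
    "kasDr.tei.tar.gz": "f8a47a10d144fd40bcf8b35bc72c8dc9",
    "kasDr.vert.tar.gz": "328f7b9ff7c7c7184fe1ba074c550785",
    "kasDr.txt.tar.gz": "4f37b0d4c1436bd841c7ace62e74517a",
}


def md5_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so ~1 GB archives need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Union[str, Path] = ".") -> Dict[str, bool]:
    """Map each expected file name to True if it is present with a matching checksum."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in fixed-size chunks keeps memory use constant regardless of archive size; a missing file is reported the same way as a mismatch, so a False entry simply means "download again".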

Show simple item record