dc.contributor.author Erjavec, Tomaž
dc.contributor.author Fišer, Darja
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Ferme, Marko
dc.contributor.author Borovič, Mladen
dc.contributor.author Boškovič, Borko
dc.contributor.author Ojsteršek, Milan
dc.contributor.author Hrovat, Goran
dc.date.accessioned 2019-12-24T14:42:09Z
dc.date.available 2019-12-24T14:42:09Z
dc.date.issued 2019-11-28
dc.identifier.uri http://hdl.handle.net/11356/1266
dc.description The KAS-mag corpus of Slovene MSc/MA theses consists of almost 16,000 texts (about 1.36 million pages or roughly 500 million tokens) written between 2000 and 2018 and gathered from the digital libraries of Slovene higher education institutions via the Slovene Open Science portal (http://openscience.si). Each thesis is accompanied by substantial metadata, and the corpus contains only its textual body, i.e. without front and back matter. The body is divided into pages, pages into paragraphs, and paragraphs into sentences. The sentences are tokenised, the tokens are morphosyntactically annotated, words are lemmatised, and English-Slovene pairs of term candidates are marked up and linked. The corpus is distributed in the canonical TEI encoding, in the so-called vertical format used by the (no)Sketch Engine and CWB concordancers, and as plain text files. Each distribution format also includes a file with the thesis metadata. This repository entry contains the corpus of MSc/MA theses only; separate entries contain the PhD theses (KAS-dr: http://hdl.handle.net/11356/1265), the BSc/BA theses (KAS-dipl: http://hdl.handle.net/11356/1267) and the complete KAS corpus with all three (KAS: http://hdl.handle.net/11356/1244).
dc.language.iso slv
dc.publisher Jožef Stefan Institute
dc.publisher Faculty of Electrical Engineering and Computer Science, University of Maribor
dc.relation.isreferencedby https://rdcu.be/b7GrB
dc.rights CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0
dc.rights.uri https://clarin.si/repository/xmlui/page/licence-aca-id-by-nc-inf-nored-1.0
dc.rights.label ACA
dc.source.uri http://nl.ijs.si/kas/
dc.subject MSc/MA theses
dc.subject academic writing
dc.subject terminology
dc.title Corpus of Academic Slovene (MSc/MA theses) KAS-mag 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) J6-7094 Slovene scientific texts: resources and description nationalFunds
size.info 15996 texts
size.info 1361606 pages
size.info 495827656 tokens
files.count 3
files.size 12849715044
featuredService.kontext search|https://www.clarin.si/kontext/first_form?corpname=kas_mag
featuredService.noske search|https://www.clarin.si/ske/#dashboard?corpname=kas_mag&struct_attr_stats=1&subcorpora=1
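
The description above mentions the vertical format used by the (no)Sketch Engine and CWB. As a rough illustration only, the following sketch reads such a file sentence by sentence. It assumes the usual vertical conventions (one token per line, tab-separated attributes, structural tags on their own lines) and further assumes that sentences are delimited by `<s>` ... `</s>` and that the columns are word, lemma and tag; the function name `read_vertical_sentences` and the column names are hypothetical, and the actual KAS-mag column layout should be checked against the distributed files.

```python
def read_vertical_sentences(lines, columns=("word", "lemma", "tag")):
    """Yield sentences as lists of token dicts from vertical-format lines.

    Assumptions (not confirmed by this record): sentences close with a
    line that is exactly "</s>", and token attributes appear in the
    order given by `columns`.
    """
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # Structural tags sit on their own line and contain no tabs.
        if line.startswith("<") and line.endswith(">") and "\t" not in line:
            if line == "</s>" and sentence:
                yield sentence
                sentence = []
            continue
        # Token line: pair each tab-separated field with a column name.
        fields = line.split("\t")
        sentence.append(dict(zip(columns, fields)))
    # Emit a trailing sentence if the input ended without "</s>".
    if sentence:
        yield sentence
```

Because the function accepts any iterable of lines, it can be fed an open file handle directly, so a multi-gigabyte archive member does not need to be loaded into memory at once.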


 Files in this item

This item is for Academic Use and is licensed under: CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0
Conditions: Inform Before Use, Attribution Required, Noncommercial
Name                Size     Format            Description                        MD5
kasMag.tei.tar.gz   5.28 GB  application/gzip  Corpus in TEI format               1173eb49e4f34d726bbeff439d2e68e9
kasMag.vert.tar.gz  4.93 GB  application/gzip  Corpus in derived vertical format  9a6c0d431de062ae710b5ca15ae1fbda
kasMag.txt.tar.gz   1.75 GB  application/gzip  Corpus in plain text format        e5ecf8726e0d9b05e28b71f4323dba25
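
The MD5 checksums listed for the three archives can be used to check that a download completed intact. The following is a minimal sketch using Python's standard `hashlib`; the file names and checksums are copied from this record, while the helper names `md5_of_file` and `verify_download` are hypothetical and the archives are assumed to have been saved locally under their original names.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for the files in this record.
EXPECTED_MD5 = {
    "kasMag.tei.tar.gz": "1173eb49e4f34d726bbeff439d2e68e9",
    "kasMag.vert.tar.gz": "9a6c0d431de062ae710b5ca15ae1fbda",
    "kasMag.txt.tar.gz": "e5ecf8726e0d9b05e28b71f4323dba25",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use flat for multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's MD5 matches the value listed in the record."""
    path = Path(path)
    if path.name not in EXPECTED_MD5:
        raise KeyError(f"no checksum listed for {path.name}")
    return md5_of_file(path) == EXPECTED_MD5[path.name]
```

For example, `verify_download("kasMag.txt.tar.gz")` returns `True` only when the local copy hashes to the listed value; a `False` result suggests a truncated or corrupted download.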
