dc.contributor.author Jemec Tomazin, Mateja
dc.contributor.author Trojar, Mitja
dc.contributor.author Atelšek, Simon
dc.contributor.author Fajfar, Tanja
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Žagar Karer, Mojca
dc.date.accessioned 2021-12-07T16:51:49Z
dc.date.available 2021-12-07T16:51:49Z
dc.date.issued 2021-12-07
dc.identifier.uri http://hdl.handle.net/11356/1470
dc.description The RSDO5 corpus was compiled to serve as a training set for automatic term identification. It consists of 12 texts with about 257,000 words and almost 38,000 manually annotated terms, each marked as either in-domain or out-of-domain. The texts were published between 2000 and 2019 and comprise PhD theses (3), a scientific book based on a PhD thesis (1), graduate-level textbooks (4), and journal articles (4); they belong to the fields of biomechanics (3), linguistics (3), chemistry (3), and veterinary science (3). In addition to the manual term annotation, the corpus was automatically annotated following Universal Dependencies, i.e. with tokenisation, sentence segmentation, lemmatisation, morphological features, and dependency syntax. Compared to the previous version, this version adds in-domain/out-of-domain marking of terms in the TEI and vertical files.
dc.language.iso slv
dc.publisher ZRC SAZU
dc.relation.replaces http://hdl.handle.net/11356/1400
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://rsdo.slovenscina.eu/terminoloski-portal
dc.subject terminology
dc.subject manual annotation
dc.subject TEI
dc.title Corpus of term-annotated texts RSDO5 1.1
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Mateja Jemec Tomazin mjt@zrc-sazu.si ZRC SAZU
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
size.info 12 texts
size.info 37985 terms
size.info 257029 words
size.info 310588 tokens
files.count 4
files.size 16376588
featuredService.kontext search|https://www.clarin.si/kontext/first_form?corpname=rsdo5
featuredService.noske search|https://www.clarin.si/ske/#dashboard?corpname=rsdo5&struct_attr_stats=1&subcorpora=1
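The per-text file names in each archive (e.g. rsdo5kemucb.xml) appear to combine a field code and a genre code. This record does not explain the abbreviations; reading them as Slovene (bim = biomehanika, jez = jezikoslovje, kem = kemija, vet = veterina; dis = disertacija, ucb = učbenik, cla = članek) gives four fields with three texts each and four texts per genre, which matches the counts in the description. The sketch below decodes a file name under that assumption; the mapping tables are hypothetical, not an official part of the corpus.

```python
import re
from typing import Tuple

# Hypothetical reading of the file-name abbreviations as Slovene words;
# this record does not document them.
FIELDS = {
    "bim": "biomechanics",        # biomehanika
    "jez": "linguistics",         # jezikoslovje
    "kem": "chemistry",           # kemija
    "vet": "veterinary science",  # veterina
}
GENRES = {
    "dis": "PhD thesis or book based on one",  # disertacija
    "ucb": "graduate-level textbook",          # učbenik
    "cla": "journal article",                  # članek
}

# rsdo5 + field + genre + extension, e.g. rsdo5kemucb.xml or rsdo5kemucb.conllu
NAME_RE = re.compile(r"^rsdo5(bim|jez|kem|vet)(dis|ucb|cla)\.\w+$")


def decode(filename: str) -> Tuple[str, str]:
    """Return (field, genre) for a per-text file name."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not a per-text RSDO5 file name: {filename!r}")
    return FIELDS[m.group(1)], GENRES[m.group(2)]


if __name__ == "__main__":
    print(decode("rsdo5kemucb.xml"))  # ('chemistry', 'graduate-level textbook')
```

Files that are not per-text documents (00README.txt, rsdo5.xml, rsdo5-meta.tsv, rsdo5.regi) raise ValueError, which makes the function usable as a filter when iterating over an unpacked archive.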


 Files in this item

All files in item: 15.62 MB
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0): attribution is required and derivatives must be shared alike.
Name: rsdo5.TEI.zip
Size: 7.61 MB
Format: application/zip
Description: Corpus in source TEI format
MD5: b03fcfb68ccb30a6a9ce874d8b182732
  • rsdo5.TEI
    • rsdo5kemucb.xml (911 kB)
    • rsdo5kemcla.xml (361 kB)
    • rsdo5bimucb.xml (2 MB)
    • rsdo5bimdis.xml (8 MB)
    • schema
      • tei_clarin.rng (662 kB)
      • tei_clarin.sch (504 B)
      • dcr.tmp (1 kB)
      • tei_clarin.dtd (248 kB)
      • tei_clarin_doc.xml (8 MB)
      • tei_clarin_doc.html (8 MB)
      • tei_clarin.rnc (316 kB)
      • tei_clarin_example.xml (31 kB)
      • xml.tmp (2 kB)
      • tei_clarin.xsd (741 kB)
      • tei_clarin_schema.xml (3 kB)
    • rsdo5bimcla.xml (839 kB)
    • rsdo5vetucb.xml (7 MB)
    • rsdo5jezucb.xml (3 MB)
    • rsdo5kemdis.xml (11 MB)
    • rsdo5vetdis.xml (6 MB)
    • rsdo5jezdis.xml (17 MB)
    • rsdo5vetcla.xml (668 kB)
    • rsdo5jezcla.xml (1021 kB)
    • 00README.txt (287 B)
    • rsdo5.xml (18 kB)
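The TEI archive holds one XML file per text, the TEI CLARIN schema files, and a small rsdo5.xml that is presumably the corpus root file. A minimal sketch for counting tokens in a per-text file with Python's standard library, assuming (this record does not say so) that words and punctuation are encoded as TEI <w> and <pc> elements in the standard TEI namespace. The element used for term markup is not described here, so it is left out; note also that ElementTree does not resolve XInclude, so the root file on its own will show no tokens.

```python
import sys
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace, in ElementTree's {uri}tag notation.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def count_tokens(root: ET.Element) -> dict:
    """Count word (<w>) and punctuation (<pc>) elements under a TEI root.

    Assumption: tokens are encoded as TEI <w> and <pc> elements; check the
    files (or 00README.txt) and adjust the tag names if they differ.
    """
    return {
        tag: sum(1 for _ in root.iter(TEI_NS + tag))
        for tag in ("w", "pc")
    }


def count_tokens_in_file(path: str) -> dict:
    """Parse one per-text TEI file (e.g. rsdo5kemucb.xml) and count its tokens."""
    return count_tokens(ET.parse(path).getroot())


if __name__ == "__main__":
    # Usage: python count_tei.py rsdo5.TEI/rsdo5kemucb.xml rsdo5.TEI/rsdo5kemcla.xml
    for path in sys.argv[1:]:
        print(path, count_tokens_in_file(path))
```

Summing the two counts over all twelve per-text files should come close to the 310,588 tokens given in size.info if the assumption about <w> and <pc> holds.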
Name: rsdo5.conllu.zip
Size: 3.46 MB
Format: application/zip
Description: Corpus in CoNLL-U format
MD5: db275613749265e48451d50e4b1984b3
  • rsdo5.conllu
    • rsdo5vetcla.conllu (250 kB)
    • rsdo5vetucb.conllu (2 MB)
    • rsdo5-meta.tsv (3 kB)
    • rsdo5kemucb.conllu (342 kB)
    • rsdo5jezucb.conllu (1 MB)
    • rsdo5bimdis.conllu (3 MB)
    • rsdo5bimcla.conllu (313 kB)
    • rsdo5bimucb.conllu (1 MB)
    • rsdo5kemdis.conllu (4 MB)
    • 00README.txt (419 B)
    • rsdo5jezdis.conllu (6 MB)
    • rsdo5vetdis.conllu (2 MB)
    • rsdo5kemcla.conllu (135 kB)
    • rsdo5jezcla.conllu (391 kB)
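CoNLL-U is the Universal Dependencies exchange format: one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with #, and a blank line after each sentence. A minimal reader, assuming the files are standard CoNLL-U; this record does not say whether or where term annotations appear in them (00README.txt may), so the sketch only exposes the ten columns.

```python
import sys
from collections import Counter
from typing import Dict, Iterable, Iterator, List

# The ten CoNLL-U columns, in the order fixed by the format specification.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one list of token dicts per sentence."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comments, e.g. "# sent_id = ..."
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        # Skip multiword-token ranges ("3-4") and empty nodes ("5.1")
        # so that only syntactic words are returned.
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:  # the file may not end with a blank line
        yield sentence


if __name__ == "__main__":
    # Usage: python read_conllu.py rsdo5.conllu/rsdo5kemcla.conllu
    for path in sys.argv[1:]:
        upos = Counter()
        with open(path, encoding="utf-8") as f:
            for sent in read_conllu(f):
                upos.update(tok["upos"] for tok in sent)
        print(path, upos.most_common(5))
```

Skipping range and empty-node lines keeps the per-sentence lists aligned with the HEAD column, which refers to word IDs only.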
Name: rsdo5.vert.zip
Size: 3.97 MB
Format: application/zip
Description: Corpus in vertical format
MD5: 6547f91f4aaabac1b426823335b8d7ca
  • rsdo5.vert
    • rsdo5jezdis.vert (14 MB)
    • rsdo5bimdis.vert (7 MB)
    • rsdo5kemdis.vert (9 MB)
    • rsdo5jezcla.vert (887 kB)
    • rsdo5vetdis.vert (5 MB)
    • rsdo5kemcla.vert (307 kB)
    • rsdo5jezucb.vert (2 MB)
    • rsdo5bimucb.vert (2 MB)
    • rsdo5vetcla.vert (562 kB)
    • rsdo5kemucb.vert (770 kB)
    • 00README.txt (571 B)
    • rsdo5bimcla.vert (695 kB)
    • rsdo5.regi (2 kB)
    • rsdo5vetucb.vert (6 MB)
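In the vertical format used by (No)Sketch Engine and KonText, each token sits on its own line with tab-separated positional attributes, structures such as documents and sentences are XML-like tags on lines of their own, and a <g/> tag typically marks a missing space between tokens. The column names and the structure used for terms are declared in rsdo5.regi, which is not reproduced in this record. The sketch below collects the token lines enclosed by one named structure; the structure name is a parameter, and "term" in the usage and test is a hypothetical guess to be checked against rsdo5.regi or 00README.txt.

```python
import re
import sys
from typing import Iterable, Iterator, List, Tuple

# A structure tag on its own line: <s>, </s>, <doc id="x">, or a self-closing <g/>.
TAG_RE = re.compile(r"^<(/?)([A-Za-z_][\w.-]*)([^>]*?)(/?)>$")


def spans(lines: Iterable[str], struct: str) -> Iterator[Tuple[str, List[List[str]]]]:
    """Yield (attribute text, token rows) for every <struct ...>...</struct> span.

    Each token row is the list of tab-separated columns of one token line.
    Spans of the same structure may nest; inner tokens count for both.
    """
    open_spans: List[Tuple[str, List[List[str]]]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        m = TAG_RE.match(line)
        if m:
            closing, name, attrs, self_closing = m.groups()
            if name != struct or self_closing:
                continue  # other structures and glue tags such as <g/>
            if closing:
                if open_spans:
                    yield open_spans.pop()
            else:
                open_spans.append((attrs.strip(), []))
            continue
        row = line.split("\t")
        for _attrs, tokens in open_spans:
            tokens.append(row)


if __name__ == "__main__":
    # Usage: python vert_spans.py rsdo5jezcla.vert term   ("term" is a guess)
    if len(sys.argv) == 3:
        with open(sys.argv[1], encoding="utf-8") as f:
            for attrs, tokens in spans(f, sys.argv[2]):
                print(attrs, " ".join(row[0] for row in tokens))
```

Returning the raw attribute text of each span, rather than parsing it, leaves room for whatever attributes carry the in-domain/out-of-domain marking mentioned in the description.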
Name: rsdo5.txt.zip
Size: 597.15 KB
Format: application/zip
Description: Corpus in plain text format
MD5: a045c94d6f4dd9db67ba11ff978b6c4b
  • rsdo5.txt
    • rsdo5kemcla.txt (11 kB)
    • rsdo5bimucb.txt (87 kB)
    • rsdo5-meta.tsv (3 kB)
    • rsdo5bimdis.txt (269 kB)
    • rsdo5bimcla.txt (25 kB)
    • rsdo5jezucb.txt (111 kB)
    • rsdo5vetucb.txt (252 kB)
    • rsdo5kemdis.txt (383 kB)
    • rsdo5jezdis.txt (557 kB)
    • rsdo5vetdis.txt (215 kB)
    • rsdo5vetcla.txt (21 kB)
    • rsdo5jezcla.txt (34 kB)
    • 00README.txt (536 B)
    • rsdo5kemucb.txt (28 kB)
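Each archive is listed with an MD5 checksum. A small sketch for checking downloaded copies against those values with Python's hashlib, assuming the archives keep their original file names after download.

```python
import hashlib
import os
import sys
from typing import Iterable

# MD5 checksums as listed in this record for each archive.
EXPECTED_MD5 = {
    "rsdo5.TEI.zip": "b03fcfb68ccb30a6a9ce874d8b182732",
    "rsdo5.conllu.zip": "db275613749265e48451d50e4b1984b3",
    "rsdo5.vert.zip": "6547f91f4aaabac1b426823335b8d7ca",
    "rsdo5.txt.zip": "a045c94d6f4dd9db67ba11ff978b6c4b",
}


def md5_hex(chunks: Iterable[bytes]) -> str:
    """Return the hex MD5 digest of a stream of byte chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def verify(path: str, chunk_size: int = 1 << 20) -> bool:
    """Check a downloaded archive against the checksum listed for its file name."""
    expected = EXPECTED_MD5[os.path.basename(path)]
    with open(path, "rb") as f:
        # Read in 1 MiB chunks rather than loading the whole archive at once.
        actual = md5_hex(iter(lambda: f.read(chunk_size), b""))
    return actual == expected


if __name__ == "__main__":
    # Usage: python verify_md5.py rsdo5.TEI.zip rsdo5.conllu.zip
    for path in sys.argv[1:]:
        print(path, "OK" if verify(path) else "MISMATCH")
```

MD5 is adequate here for detecting corrupted or truncated downloads, which is what the listed checksums are for; it is not meant as protection against deliberate tampering.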
