
 
dc.contributor.author Batanović, Vuk
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Samardžić, Tanja
dc.contributor.author Erjavec, Tomaž
dc.date.accessioned 2018-08-25T15:02:15Z
dc.date.available 2018-08-25T15:02:15Z
dc.date.issued 2018-08-20
dc.identifier.uri http://hdl.handle.net/11356/1200
dc.description The SETimes.SR training corpus contains 86 726 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, and named entities. The annotations (and other aspects) of the corpus are documented in the teiHeader and back element of the TEI encoded corpus. In short, they follow (1) the MULTEXT-East V5 morphosyntactic specifications, http://nl.ijs.si/ME/V5/msd/, (2) the UDv2 Guidelines, http://universaldependencies.org/guidelines.html, and (3) the Janes annotation guidelines for named entities, http://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf.
dc.language.iso srp
dc.publisher Regional Linguistic Data Initiative Centre ReLDI
dc.relation.isreferencedby http://www.aclweb.org/anthology/W17-1407
dc.relation.isreplacedby http://hdl.handle.net/11356/1843
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/vukbatanovic/SETimes.SR
dc.subject part-of-speech tagging
dc.subject dependency treebank
dc.subject parsing
dc.subject named entities
dc.subject tokenisation
dc.subject manual annotation
dc.subject TEI
dc.title Training corpus SETimes.SR 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor Swiss National Science Foundation 160501 ReLDI Other
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
size.info 163 texts
size.info 3891 sentences
size.info 86726 tokens
files.count 3
files.size 11443341
featuredService.kontext Search|https://www.clarin.si/kontext/first_form?corpname=setimes_sr
featuredService.noske Search|https://www.clarin.si/ske/#dashboard?corpname=setimes_sr


 Files in this item

 Download all files in item (10.91 MB)
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name: setimes-sr.TEI.zip
Size: 8.35 MB
Format: application/zip
Description: Corpus in TEI format
MD5: f4046d35d69e50abe5d52611d6cc5af9
File Preview
  • setimes-sr.TEI
    • setimes-sr.back.xml (82 kB)
    • setimes-sr.xml (12 kB)
    • setimes-sr.body.xml (14 MB)
    • TEI-schema
      • tei_clarin_schema.xml (2 kB)
      • tei_clarin_example.xml (31 kB)
      • README.md (442 B)
      • schema
        • tei_clarin.zip (47 kB)
        • tei_clarin.rnc (206 kB)
        • tei_clarin.dtd (167 kB)
        • tei_clarin.rng (424 kB)
      • doc
        • tei_clarin_doc.xml (3 MB)
        • tei_clarin_doc.html (2 MB)
        • tei_clarin_doc.docx (698 kB)
        • tei_clarin_doc.pdf (5 MB)
    • 00README.txt (204 B)

Name: setimes-sr.vert.zip
Size: 1.2 MB
Format: application/zip
Description: Corpus in derived vertical format
MD5: 8837a291416db22a50bc76e44631e4a7

Name: setimes-sr.conll.zip
Size: 1.36 MB
Format: application/zip
Description: Source CoNLL-like format from GitHub (commit f682c1b)
MD5: 7c58d701908a5d92ac1547fef4f347a9
File Preview
  • setimes-sr.conll
    • set.sr.conll (11 MB)
    • README.md (1 kB)
    • msd_mapper.py (6 kB)
    • 00README.txt (204 B)
    • mte5-udv2.mapping (122 kB)
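
The MD5 values listed for the three archives can be used to check a download for corruption. The following is a minimal sketch, assuming the archives were saved to one directory under the file names shown in this record; the helper names `md5_of` and `verify_downloads` are illustrative and not part of the corpus distribution.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing in this record.
EXPECTED_MD5 = {
    "setimes-sr.TEI.zip": "f4046d35d69e50abe5d52611d6cc5af9",
    "setimes-sr.vert.zip": "8837a291416db22a50bc76e44631e4a7",
    "setimes-sr.conll.zip": "7c58d701908a5d92ac1547fef4f347a9",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each archive found in `directory` to True (match) or False (mismatch).

    Archives that are not present in the directory are skipped.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```

Run from the download directory, the script prints one line per archive it finds. A mismatch most likely indicates an incomplete or corrupted download.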
