dc.contributor.author | Batanović, Vuk |
dc.contributor.author | Ljubešić, Nikola |
dc.contributor.author | Samardžić, Tanja |
dc.contributor.author | Erjavec, Tomaž |
dc.date.accessioned | 2018-08-25T15:02:15Z |
dc.date.available | 2018-08-25T15:02:15Z |
dc.date.issued | 2018-08-20 |
dc.identifier.uri | http://hdl.handle.net/11356/1200 |
dc.description | The SETimes.SR training corpus contains 86 726 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, and named entities. The annotations (and other aspects) of the corpus are documented in the teiHeader and the back element of the TEI-encoded corpus. In short, they follow (1) the MULTEXT-East V5 morphosyntactic specifications, http://nl.ijs.si/ME/V5/msd/, (2) the UDv2 Guidelines, http://universaldependencies.org/guidelines.html, and (3) the Janes annotation guidelines for named entities, http://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf. |
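MULTEXT-East morphosyntactic descriptions (MSDs) are positional tags: the first letter gives the part of speech and each following character fills the next attribute slot for that category. The sketch below decodes noun tags only, as an illustration of the positional scheme. The attribute table is an assumption covering just the noun category (Type, Gender, Number, Case, Animate). The full and authoritative inventory is in the specifications at http://nl.ijs.si/ME/V5/msd/. `decode_noun_msd` is a hypothetical helper and is not part of the corpus distribution.

```python
# ASSUMPTION: partial MULTEXT-East V5 table for Serbian nouns (category "N") only;
# check the official specifications for the complete set of categories and values.
NOUN_ATTRIBUTES = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "p": "plural"}),
    ("Case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "v": "vocative", "l": "locative",
              "i": "instrumental"}),
    ("Animate", {"n": "no", "y": "yes"}),
]


def decode_noun_msd(msd: str) -> dict[str, str]:
    """Decode a noun MSD such as 'Ncmsn' into attribute-value pairs (hypothetical helper)."""
    if not msd.startswith("N"):
        raise ValueError(f"not a noun MSD: {msd!r}")
    features = {"Category": "Noun"}
    # Each character after the category letter fills the next attribute slot.
    # '-' marks a slot that does not apply; trailing slots may simply be omitted.
    # An unknown value code raises KeyError.
    for (attribute, values), code in zip(NOUN_ATTRIBUTES, msd[1:]):
        if code == "-":
            continue
        features[attribute] = values[code]
    return features


print(decode_noun_msd("Ncmsn"))
# {'Category': 'Noun', 'Type': 'common', 'Gender': 'masculine', 'Number': 'singular', 'Case': 'nominative'}
```

Only the slots actually present in the tag are returned. This is why `Ncmsn` yields no `Animate` value, while a tag such as `Ncmsay` would.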
dc.language.iso | srp |
dc.publisher | Regional Linguistic Data Initiative Centre ReLDI |
dc.relation.isreferencedby | http://www.aclweb.org/anthology/W17-1407 |
dc.relation.isreplacedby | http://hdl.handle.net/11356/1843 |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://github.com/vukbatanovic/SETimes.SR |
dc.subject | part-of-speech tagging |
dc.subject | dependency treebank |
dc.subject | parsing |
dc.subject | named entities |
dc.subject | tokenisation |
dc.subject | manual annotation |
dc.subject | TEI |
dc.title | Training corpus SETimes.SR 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute |
sponsor | Swiss National Science Foundation 160501 ReLDI Other |
sponsor | Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds |
size.info | 163 texts |
size.info | 3891 sentences |
size.info | 86726 tokens |
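The sentence and token counts can in principle be cross-checked against the CoNLL-like file `set.sr.conll` in `setimes-sr.conll.zip`. The sketch below makes several assumptions about that file: one token per line, tab-separated columns with a token ID in the first column, blank lines between sentences, and comment lines starting with `#`. It also skips CoNLL-U style multiword ranges (`3-4`) and empty nodes (`5.1`) in case they occur. `count_sentences_and_tokens` is a hypothetical helper, and the sample input is invented for illustration.

```python
from collections.abc import Iterable


def count_sentences_and_tokens(lines: Iterable[str]) -> tuple[int, int]:
    """Count sentences and tokens in CoNLL-like lines (hypothetical helper)."""
    sentences = 0
    tokens = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comments or metadata
        token_id = line.split("\t", 1)[0]
        # Skip multiword-token ranges ("3-4") and empty nodes ("5.1").
        if "-" in token_id or "." in token_id:
            continue
        tokens += 1
        in_sentence = True
    if in_sentence:  # the input may not end with a blank line
        sentences += 1
    return sentences, tokens


# Toy input (not taken from the corpus): two sentences, eight tokens.
sample = "# sent_id = 1\n1\tBeograd\n2\tje\n3\tglavni\n4\tgrad\n5\t.\n\n1\tDobar\n2\tdan\n3\t.\n"
print(count_sentences_and_tokens(sample.splitlines()))  # (2, 8)
```

To run it on the corpus itself, one could pass an open file handle, e.g. `open("set.sr.conll", encoding="utf-8")`, and compare the result with the figures above.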
files.count | 3 |
files.size | 11443341 |
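The `files.size` value appears to be given in bytes. Dividing by 1024² (binary megabytes) reproduces the 10.91 MB total of the three zip archives listed below. Dividing by 10⁶ would give 11.44 instead. The binary-unit interpretation is an inference from these numbers, not something the record states.

```python
files_size_bytes = 11443341           # value of files.size
mib = files_size_bytes / 1024**2      # binary megabytes (MiB)
print(f"{mib:.2f} MB")                # 10.91 MB

zip_sizes_mb = [8.35, 1.2, 1.36]      # sizes listed for the TEI, vertical and CoNLL zips
print(f"{sum(zip_sizes_mb):.2f} MB")  # 10.91 MB
```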
featuredService.kontext | Search|https://www.clarin.si/kontext/first_form?corpname=setimes_sr |
featuredService.noske | Search|https://www.clarin.si/ske/#dashboard?corpname=setimes_sr |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Name: setimes-sr.TEI.zip
- Size: 8.35 MB
- Format: application/zip
- Description: Corpus in TEI format
- MD5: f4046d35d69e50abe5d52611d6cc5af9
- Contents (setimes-sr.TEI):
  - setimes-sr.back.xml (82 kB)
  - setimes-sr.xml (12 kB)
  - setimes-sr.body.xml (14 MB)
  - TEI-schema
  - 00README.txt (204 B)

- Name: setimes-sr.vert.zip
- Size: 1.2 MB
- Format: application/zip
- Description: Corpus in derived vertical format
- MD5: 8837a291416db22a50bc76e44631e4a7
- Contents (setimes-sr.vert):
  - setimes_sr.regi (2 kB)
  - setimes-sr.vert (8 MB)
  - 00README.txt (204 B)

- Name: setimes-sr.conll.zip
- Size: 1.36 MB
- Format: application/zip
- Description: Source CoNLL-like format from GitHub (commit f682c1b)
- MD5: 7c58d701908a5d92ac1547fef4f347a9
- Contents (setimes-sr.conll):
  - set.sr.conll (11 MB)
  - README.md (1 kB)
  - msd_mapper.py (6 kB)
  - 00README.txt (204 B)
  - mte5-udv2.mapping (122 kB)
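Each archive is listed with an MD5 checksum. Here is a minimal sketch for checking downloaded copies with Python's standard `hashlib` module. It assumes the three archives were saved under their listed names in a single directory. `md5_of_file` and `verify_downloads` are hypothetical helper names, and the checksums are copied from the listing above.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "setimes-sr.TEI.zip": "f4046d35d69e50abe5d52611d6cc5af9",
    "setimes-sr.vert.zip": "8837a291416db22a50bc76e44631e4a7",
    "setimes-sr.conll.zip": "7c58d701908a5d92ac1547fef4f347a9",
}


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Chunked reading keeps memory use flat regardless of archive size.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Path) -> dict[str, bool]:
    """Map each expected archive name to True if present and its checksum matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results
```

Calling `verify_downloads(Path("."))` from the download directory returns one boolean per archive. A missing file and a corrupted file both show up as `False`.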