dc.contributor.author Erjavec, Tomaž
dc.contributor.author Barbu, Ana-Maria
dc.contributor.author Derzhanski, Ivan
dc.contributor.author Dimitrova, Ludmila
dc.contributor.author Garabík, Radovan
dc.contributor.author Ide, Nancy
dc.contributor.author Kaalep, Heiki-Jaan
dc.contributor.author Kotsyba, Natalia
dc.contributor.author Krstev, Cvetana
dc.contributor.author Oravecz, Csaba
dc.contributor.author Petkevič, Vladimír
dc.contributor.author Priest-Dorman, Greg
dc.contributor.author QasemiZadeh, Behrang
dc.contributor.author Radziszewski, Adam
dc.contributor.author Simov, Kiril
dc.contributor.author Tufiş, Dan
dc.contributor.author Zdravkova, Katerina
dc.date.accessioned 2015-06-15T08:51:55Z
dc.date.available 2015-06-15T08:51:55Z
dc.date.issued 2010-05-14
dc.identifier.uri http://hdl.handle.net/11356/1043
dc.description The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel, sentence-aligned corpus contains the novel in the English original (about 100,000 words in length) and its translations into a number of languages. This version of the corpus contains the linguistically annotated texts, with each word tagged with its lemma and its MULTEXT(-East) morphosyntactic description (MSD, i.e., a fine-grained, feature-structure-based PoS tag). The structurally annotated texts are a separate submission (http://hdl.handle.net/11356/1044), which covers a somewhat different set of languages.
dc.language.iso bul
dc.language.iso ces
dc.language.iso eng
dc.language.iso est
dc.language.iso fas
dc.language.iso hun
dc.language.iso mkd
dc.language.iso pol
dc.language.iso ron
dc.language.iso slk
dc.language.iso slv
dc.language.iso srp
dc.publisher Jožef Stefan Institute
dc.relation info:eu-repo/grantAgreement/EC/FP7/211938
dc.relation.isreferencedby https://doi.org/10.1007/s10579-011-9174-8
dc.rights MULTEXT-East licence
dc.rights.uri https://nl.ijs.si/ME/mte-licence.txt
dc.rights.label ACA
dc.source.uri http://nl.ijs.si/ME/Vault/V4/
dc.subject parallel corpus
dc.subject part-of-speech tagging
dc.subject multilingual
dc.subject Slavic languages
dc.subject manual annotation
dc.subject TEI
dc.title MULTEXT-East "1984" annotated corpus 4.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri http://nl.ijs.si/ME/Vault/V4/doc/#sec-orwell
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor EU Copernicus COP-106 MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages Other
sponsor EU Copernicus TELRI Trans-European Language Resources Infrastructure Other
sponsor EU Copernicus CONCEDE Consortium for Central European Dictionary Encoding Other
sponsor FP7 Capacities MONDILEX Conceptual Modelling of Networking of Centres for High-Quality Research in Slavic Lexicography and Their Digital Resources euFunds info:eu-repo/grantAgreement/EC/FP7/211938
size.info 12 texts
size.info 79718 sentences
size.info 1064424 words
files.count 1
files.size 14800805


 Files in this item

This item is Academic Use and licensed under: MULTEXT-East licence (Attribution Required, Noncommercial)
Name MTE1984-ana.zip
Size 14.12 MB
Format application/zip
Description TEI encoded texts and sentence alignments
MD5 16a2fefbda7763d07531d2ba052e390b
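
The MD5 checksum and byte size listed in this record can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch, assuming the file has been saved locally as MTE1984-ana.zip; the function names `md5_of_file` and `verify_download` are hypothetical helpers introduced for illustration and are not part of the corpus distribution.

```python
import hashlib
from pathlib import Path

# Values copied from this record (MD5 and files.size); the helper
# names below are illustrative, not part of the corpus distribution.
EXPECTED_MD5 = "16a2fefbda7763d07531d2ba052e390b"
EXPECTED_SIZE = 14800805  # bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file matches both the expected size and MD5."""
    path = Path(path)
    # Cheap size check first; skip hashing if the size already differs.
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5
```

Calling `verify_download("MTE1984-ana.zip")` returns `True` only if both the size and the checksum match the values in this record. MD5 is suitable here for detecting a corrupted or truncated download, not as a security guarantee.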