dc.contributor.author |
Erjavec, Tomaž |
dc.contributor.author |
Barbu, Ana-Maria |
dc.contributor.author |
Derzhanski, Ivan |
dc.contributor.author |
Dimitrova, Ludmila |
dc.contributor.author |
Garabík, Radovan |
dc.contributor.author |
Ide, Nancy |
dc.contributor.author |
Kaalep, Heiki-Jaan |
dc.contributor.author |
Kotsyba, Natalia |
dc.contributor.author |
Krstev, Cvetana |
dc.contributor.author |
Oravecz, Csaba |
dc.contributor.author |
Petkevič, Vladimír |
dc.contributor.author |
Priest-Dorman, Greg |
dc.contributor.author |
QasemiZadeh, Behrang |
dc.contributor.author |
Radziszewski, Adam |
dc.contributor.author |
Simov, Kiril |
dc.contributor.author |
Tufiş, Dan |
dc.contributor.author |
Zdravkova, Katerina |
dc.date.accessioned |
2015-06-15T08:51:55Z |
dc.date.available |
2015-06-15T08:51:55Z |
dc.date.issued |
2010-05-14 |
dc.identifier.uri |
http://hdl.handle.net/11356/1043 |
dc.description |
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel, sentence-aligned corpus contains the novel in the English original (about 100,000 words) and its translations into a number of languages.
This version of the corpus contains the linguistically annotated texts, with each word tagged with its lemma and its MULTEXT(-East) morphosyntactic description (MSD, i.e., a fine-grained, feature-structure-based PoS tag).
The structurally annotated texts are a separate submission (http://hdl.handle.net/11356/1044), which covers a somewhat different set of languages. |
dc.language.iso |
bul |
dc.language.iso |
ces |
dc.language.iso |
eng |
dc.language.iso |
est |
dc.language.iso |
fas |
dc.language.iso |
hun |
dc.language.iso |
mkd |
dc.language.iso |
pol |
dc.language.iso |
ron |
dc.language.iso |
slk |
dc.language.iso |
slv |
dc.language.iso |
srp |
dc.publisher |
Jožef Stefan Institute |
dc.relation |
info:eu-repo/grantAgreement/EC/FP7/211938 |
dc.relation.isreferencedby |
https://doi.org/10.1007/s10579-011-9174-8 |
dc.rights |
MULTEXT-East licence |
dc.rights.uri |
https://nl.ijs.si/ME/mte-licence.txt |
dc.rights.label |
ACA |
dc.source.uri |
http://nl.ijs.si/ME/Vault/V4/ |
dc.subject |
parallel corpus |
dc.subject |
part-of-speech tagging |
dc.subject |
multilingual |
dc.subject |
Slavic languages |
dc.subject |
manual annotation |
dc.subject |
TEI |
dc.title |
MULTEXT-East "1984" annotated corpus 4.0 |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
CLARIN.SI data & tools |
demo.uri |
http://nl.ijs.si/ME/Vault/V4/doc/#sec-orwell |
contact.person |
Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute |
sponsor |
EU Copernicus COP-106 MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages Other |
sponsor |
EU Copernicus TELRI Trans-European Language Resources Infrastructure Other |
sponsor |
EU Copernicus CONCEDE Consortium for Central European Dictionary Encoding Other |
sponsor |
FP7 Capacities MONDILEX Conceptual Modelling of Networking of Centres for High-Quality Research in Slavic Lexicography and Their Digital Resources euFunds info:eu-repo/grantAgreement/EC/FP7/211938 |
size.info |
12 texts |
size.info |
79718 sentences |
size.info |
1064424 words |
files.count |
1 |
files.size |
14800805 |