dc.contributor.author
Rei, Luis
dc.contributor.author
Krek, Simon
dc.contributor.author
Mladenić, Dunja
dc.date.accessioned
2016-11-28T13:47:36Z
dc.date.available
2016-11-28T13:47:36Z
dc.date.issued
2016-11-28
dc.identifier.uri
http://hdl.handle.net/11356/1078
dc.description
The xLiMe Twitter Corpus contains tweets in German, Italian and Spanish manually annotated with part-of-speech, named entities, and message-level sentiment polarity. In total, the corpus contains almost 20K annotated messages and more than 350K tokens.
The corpus is described in
Luis Rei, Dunja Mladenić, Simon Krek. A Multilingual Social Media Linguistic Corpus. Proceedings of the 4th Conference on CMC and Social Media Corpora for the Humanities. 27–28 September 2016, Ljubljana, Slovenia. https://nl.ijs.si/janes/cmc-corpora2016/proceedings/
dc.language.iso
spa
dc.language.iso
ita
dc.language.iso
deu
dc.publisher
Jožef Stefan Institute
dc.relation
info:eu-repo/grantAgreement/EC/FP7/611346
dc.rights
The MIT License (MIT)
dc.rights.uri
https://opensource.org/licenses/mit-license.php
dc.rights.label
PUB
dc.source.uri
https://github.com/lrei/xlime_twitter_corpus
dc.subject
social media
dc.subject
computer-mediated communication
dc.subject
Twitter
dc.subject
part-of-speech tagging
dc.subject
named entities
dc.subject
sentiment classification
dc.subject
multilingual
dc.subject
manual annotation
dc.title
xLiMe Twitter Corpus XTC 1.0.1
dc.type
corpus
metashare.ResourceInfo#ContentInfo.mediaType
text
hidden
false
hasMetadata
false
has.files
yes
branding
CLARIN.SI data & tools
contact.person
Luis Rei luis.rei@ijs.si Jožef Stefan Institute
sponsor
ICT Programme FP7-ICT-611346 xLiMe euFunds info:eu-repo/grantAgreement/EC/FP7/611346
size.info
363994 tokens
size.info
19669 texts
files.count
2
files.size
6592396
Files in this item
This item is Publicly Available and licensed under The MIT License (MIT).
Name
xlime_twitter_corpus-master.zip
Size
6.14 MB
Format
application/zip
Description
The full xLiMe Twitter Corpus data and code
MD5
a65651e185d92b7aa76de9f52a6aa442
Preview (archive contents in the order listed by the repository; directories are marked with a trailing /):
xlime_twitter_corpus-master/
corpus_task/
spanish_pos.txt (1 MB)
german_ner.txt (454 kB)
german_pos.txt (565 kB)
german_sentiment.tsv (422 kB)
spanish_sentiment.tsv (890 kB)
italian_sentiment.tsv (1 MB)
italian_ner.txt (1 MB)
italian_pos.txt (1 MB)
spanish_ner.txt (964 kB)
README.md (13 kB)
code/
__init__.py (0 B)
twokenize.py (11 kB)
stats_task.py (1 kB)
xlime2conll.py (2 kB)
extract_sentiment.py (1 kB)
data.py (1 kB)
agreement.py (4 kB)
xlime2iaa.py (2 kB)
stats.py (2 kB)
experiment.py (4 kB)
seq.py (4 kB)
pretag.py (5 kB)
requirements.txt (25 B)
data/
italian_task_1442142987.tsv (10 MB)
german_task_1442142996.tsv (4 MB)
spanish_task_1440847551.tsv (9 MB)
agreement/
german_sent.iaa (1 kB)
italian_ner.iaa (9 kB)
italian_pos.iaa (22 kB)
spanish_ner.iaa (6 kB)
german_ner.iaa (8 kB)
spanish_pos.iaa (15 kB)
german_pos.iaa (17 kB)
spanish_sent.iaa (1 kB)
italian_sent.iaa (1 kB)
guidelines.md (8 kB)
LICENSE.md (1 kB)
Name
CMC-2016_Rei_et_al_Multilingual-Social-Media-Linguistic-Corpus.pdf
Size
151.65 KB
Format
PDF
Description
Paper describing the corpus
MD5
d2e2a0b00a4d389f40b55f4972901a37
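Both files in this item are listed with an MD5 checksum. The following is a minimal sketch for checking a downloaded copy against those values using Python's standard `hashlib`. It assumes the files were saved under the names listed above in the current working directory; the helper names `md5_of_file` and `verify` are hypothetical and not part of the corpus code.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this repository record.
EXPECTED_MD5 = {
    "xlime_twitter_corpus-master.zip": "a65651e185d92b7aa76de9f52a6aa442",
    "CMC-2016_Rei_et_al_Multilingual-Social-Media-Linguistic-Corpus.pdf": "d2e2a0b00a4d389f40b55f4972901a37",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's digest matches the checksum listed for its name.

    Raises KeyError if the file name is not one of the listed files.
    """
    name = Path(path).name
    return md5_of_file(path) == EXPECTED_MD5[name]


if __name__ == "__main__":
    # Report each listed file as OK, MISMATCH, or not found.
    for name in EXPECTED_MD5:
        if Path(name).exists():
            status = "OK" if verify(name) else "MISMATCH"
        else:
            status = "not found"
        print(f"{name}: {status}")
```

Run it from the directory holding the downloads. Reading in chunks keeps memory use constant regardless of file size.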