dc.contributor.author | Mozetič, Igor |
dc.contributor.author | Grčar, Miha |
dc.contributor.author | Smailović, Jasmina |
dc.date.accessioned | 2016-02-23T10:08:53Z |
dc.date.available | 2016-04-25T21:45:18Z |
dc.date.issued | 2016-02-23 |
dc.identifier.uri | http://hdl.handle.net/11356/1054 |
dc.description | The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora for the corresponding 15 European languages. The data can be used to train and evaluate Twitter sentiment classifiers, to compute annotator agreement, or to study differences in language usage on Twitter. The data analysis is described in the following papers: I. Mozetič, M. Grčar, J. Smailović. Multilingual Twitter sentiment classification: The role of human annotators, PLoS ONE 11(5): e0155036, doi: 10.1371/journal.pone.0155036, 2016. (http://dx.doi.org/10.1371/journal.pone.0155036) I. Mozetič, L. Torgo, V. Cerqueira, J. Smailović. How to evaluate sentiment classifiers for Twitter time-ordered data?, PLoS ONE 13(3): e0194317, doi: 10.1371/journal.pone.0194317, 2018. (https://dx.doi.org/10.1371/journal.pone.0194317) |
dc.language.iso | sqi |
dc.language.iso | bos |
dc.language.iso | bul |
dc.language.iso | hrv |
dc.language.iso | eng |
dc.language.iso | deu |
dc.language.iso | hun |
dc.language.iso | pol |
dc.language.iso | por |
dc.language.iso | srp |
dc.language.iso | rus |
dc.language.iso | slk |
dc.language.iso | slv |
dc.language.iso | spa |
dc.language.iso | swe |
dc.publisher | Jožef Stefan Institute |
dc.relation | info:eu-repo/grantAgreement/EC/FP7/610704 |
dc.relation | info:eu-repo/grantAgreement/EC/FP7/317532 |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/640772 |
dc.relation.isreferencedby | https://dx.doi.org/10.1371/journal.pone.0155036 |
dc.relation.isreferencedby | https://dx.doi.org/10.1371/journal.pone.0194317 |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.subject | sentiment classification |
dc.subject | |
dc.subject | inter-annotator agreement |
dc.subject | annotator self-agreement |
dc.subject | multilingual |
dc.title | Twitter sentiment for 15 European languages |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN.SI data & tools |
contact.person | Igor Mozetic igor.mozetic@ijs.si Jožef Stefan Institute |
sponsor | EC 610704 SIMPOL euFunds info:eu-repo/grantAgreement/EC/FP7/610704 |
sponsor | EC 317532 MULTIPLEX euFunds info:eu-repo/grantAgreement/EC/FP7/317532 |
sponsor | EC 640772 DOLFINS euFunds info:eu-repo/grantAgreement/EC/H2020/640772 |
sponsor | ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds |
size.info | 1643735 items |
files.count | 16 |
files.size | 51781021 |
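The description cites work on evaluating sentiment classifiers for time-ordered Twitter data, but the corpora ship only tweet IDs. One way to order annotations in time without re-fetching the tweets is to decode the timestamp that Twitter's Snowflake scheme packs into each ID. The sketch below assumes every ID is a Snowflake ID (issued from late 2010 on) and uses Twitter's published epoch of 1288834974657 ms. The helper names `tweet_time` and `time_ordered_split` are hypothetical, and the single chronological holdout split is only one illustrative setup, not necessarily the evaluation procedure used in the cited paper.

```python
from datetime import datetime, timedelta, timezone

# Twitter's published Snowflake epoch in ms since the Unix epoch
# (2010-11-04T01:42:54.657Z). Assumption: all IDs are Snowflake IDs.
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tweet_time(tweet_id: int) -> datetime:
    """Decode the UTC creation time packed into a Snowflake tweet ID.

    The low 22 bits hold worker and sequence numbers; the bits above
    them count milliseconds since TWITTER_EPOCH_MS.
    """
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=ms)


def time_ordered_split(tweet_ids, train_fraction=0.8):
    """Hypothetical helper: chronological holdout split.

    Because the timestamp occupies the high bits, sorting by ID orders
    tweets by creation time, so no test tweet is older than a training tweet.
    """
    ordered = sorted(set(tweet_ids))
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]
```

For example, `tweet_time(tweet_id).isoformat()` gives an approximate posting time for any ID in the CSV files, which can be used to sort annotations before splitting.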
Files in this item
Total size of all files in this item: 49.38 MB
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name | Size | Format | Description | MD5 |
README.txt | 665 bytes | Text file | Unknown | 8b2d34f643b73d8a44557dc8d2ba6d2f |
There are 15 files for the corresponding 15 European languages: Albanian, Bosnian, Bulgarian, Croatian, English, German, Hungarian, Polish, Portuguese, Russian, Serbian, Slovak, Slovenian, Spanish, and Swedish. Files are in the standard CSV format; each line has the following form: TweetID,HandLabel,AnnotatorID. TweetID is assigned by Twitter and can be used to retrieve the tweet. HandLabel is the sentiment label assigned by the human annotator (Negative, Neutral, or Positive). AnnotatorID is a 3-digit integer assigned to anonymous annotators and can be used to identify tweets annotated several times by the same or by different annotators. ...
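README.txt describes each row as TweetID,HandLabel,AnnotatorID, and the description mentions computing annotator agreement. The following is a minimal sketch for loading one language file, counting labels, and separating self-agreement (same AnnotatorID) from inter-annotator agreement (different AnnotatorIDs) on tweets labeled more than once. It assumes UTF-8 files and tolerates an optional header row. The function names are hypothetical, and the measure is plain percentage agreement over label pairs, a simpler stand-in for the agreement measures reported in the cited paper.

```python
import csv
from collections import Counter, defaultdict
from itertools import combinations


def load_annotations(path):
    """Read (tweet_id, label, annotator_id) tuples from one language file.

    Assumes the TweetID,HandLabel,AnnotatorID layout from README.txt and
    skips a header row if present.
    """
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for rec in csv.reader(f):
            if not rec or rec[0].strip() == "TweetID":
                continue  # blank line or header row
            tweet_id, label, annotator = (field.strip() for field in rec[:3])
            rows.append((int(tweet_id), label, annotator))
    return rows


def label_counts(rows):
    """Number of annotations per HandLabel (Negative, Neutral, Positive)."""
    return Counter(label for _, label, _ in rows)


def pairwise_agreement(rows):
    """Share of agreeing label pairs among tweets annotated more than once.

    Returns (self_agreement, inter_agreement); each is None when there
    are no pairs of that kind.
    """
    by_tweet = defaultdict(list)
    for tweet_id, label, annotator in rows:
        by_tweet[tweet_id].append((annotator, label))

    tallies = {"self": [0, 0], "inter": [0, 0]}  # [agreeing pairs, all pairs]
    for annotations in by_tweet.values():
        for (a1, l1), (a2, l2) in combinations(annotations, 2):
            tally = tallies["self" if a1 == a2 else "inter"]
            tally[0] += l1 == l2
            tally[1] += 1

    def share(tally):
        return tally[0] / tally[1] if tally[1] else None

    return share(tallies["self"]), share(tallies["inter"])
```

For example, `pairwise_agreement(load_annotations("Slovenian_Twitter_sentiment.csv"))` returns the two agreement shares for the Slovenian corpus.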
Name | Size | Format | Description | MD5 |
German_Twitter_sentiment.csv | 3.27 MB | Unknown | CSV file | b6b766a80454a928ce0b90211dd60bab |
English_Twitter_sentiment.csv | 3.1 MB | Unknown | CSV file | 8407a2302f20336a8809ba74f6d0112a |
Croatian_Twitter_sentiment.csv | 2.95 MB | Unknown | CSV file | d28be685aa56adf237a3d59e6043ddd7 |
Bulgarian_Twitter_sentiment.csv | 2.02 MB | Unknown | CSV file | c1248f10e9130b70036a994a11c44018 |
Albanian_Twitter_sentiment.csv | 1.65 MB | Unknown | CSV file | aaebf885a823be2e941a7bf58e1aeb5b |
Russian_Twitter_sentiment.csv | 3.14 MB | Unknown | CSV file | 199a3dd666abbb193d5541aac35eb9d6 |
Bosnian_Twitter_sentiment.csv | 1.35 MB | Unknown | CSV file | 8dcfbeb77c8ae28b1f3211831705e14a |
Portuguese_Twitter_sentiment.csv | 4.6 MB | Unknown | CSV file | 446fe4c9be94b69b419cc8c81aea284e |
Polish_Twitter_sentiment.csv | 6.77 MB | Unknown | CSV file | 647396610cce71dda658924111a3833b |
Hungarian_Twitter_sentiment.csv | 2.07 MB | Unknown | CSV file | cb7301ef7a528cf4360c8a7303d5d723 |
Swedish_Twitter_sentiment.csv | 1.77 MB | Unknown | CSV file | 7e7f6885784b4e195bcf02a25d4881dd |
Spanish_Twitter_sentiment.csv | 8.31 MB | Unknown | CSV file | 8b0a56106e3764d17787eecc502484f3 |
Slovenian_Twitter_sentiment.csv | 4.03 MB | Unknown | CSV file | 5717253537f2241c80be514ed135c612 |
Slovak_Twitter_sentiment.csv | 2.14 MB | Unknown | CSV file | 8889f24b7fbf7662324612440fa8d723 |
Serbian_Twitter_sentiment.csv | 2.22 MB | Unknown | CSV file | faeefe277de414e78e27ef679864c0e9 |
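The MD5 values listed above can be used to check a download for corruption. The following is a small sketch using Python's standard `hashlib`. It assumes the 16 files were saved under their listed names in a single directory; `md5_of` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed on the repository page (16 files).
EXPECTED_MD5 = {
    "README.txt": "8b2d34f643b73d8a44557dc8d2ba6d2f",
    "German_Twitter_sentiment.csv": "b6b766a80454a928ce0b90211dd60bab",
    "English_Twitter_sentiment.csv": "8407a2302f20336a8809ba74f6d0112a",
    "Croatian_Twitter_sentiment.csv": "d28be685aa56adf237a3d59e6043ddd7",
    "Bulgarian_Twitter_sentiment.csv": "c1248f10e9130b70036a994a11c44018",
    "Albanian_Twitter_sentiment.csv": "aaebf885a823be2e941a7bf58e1aeb5b",
    "Russian_Twitter_sentiment.csv": "199a3dd666abbb193d5541aac35eb9d6",
    "Bosnian_Twitter_sentiment.csv": "8dcfbeb77c8ae28b1f3211831705e14a",
    "Portuguese_Twitter_sentiment.csv": "446fe4c9be94b69b419cc8c81aea284e",
    "Polish_Twitter_sentiment.csv": "647396610cce71dda658924111a3833b",
    "Hungarian_Twitter_sentiment.csv": "cb7301ef7a528cf4360c8a7303d5d723",
    "Swedish_Twitter_sentiment.csv": "7e7f6885784b4e195bcf02a25d4881dd",
    "Spanish_Twitter_sentiment.csv": "8b0a56106e3764d17787eecc502484f3",
    "Slovenian_Twitter_sentiment.csv": "5717253537f2241c80be514ed135c612",
    "Slovak_Twitter_sentiment.csv": "8889f24b7fbf7662324612440fa8d723",
    "Serbian_Twitter_sentiment.csv": "faeefe277de414e78e27ef679864c0e9",
}


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(directory="."):
    """Return names of listed files that are missing or fail the MD5 check."""
    bad = []
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file() or md5_of(path) != expected:
            bad.append(name)
    return bad
```

Calling `verify("path/to/download")` returns an empty list when every file is present and matches its listed checksum.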