Show simple item record

 
dc.contributor.author Ulčar, Matej
dc.contributor.author Robnik-Šikonja, Marko
dc.date.accessioned 2020-07-09T12:32:41Z
dc.date.available 2020-07-09T12:32:41Z
dc.date.issued 2020-07-09
dc.identifier.uri http://hdl.handle.net/11356/1330
dc.description Trilingual BERT (Bidirectional Encoder Representations from Transformers) model, trained on Croatian, Slovenian, and English data. A state-of-the-art tool that represents words/tokens as contextually dependent word embeddings, used for various NLP classification tasks by fine-tuning the model end-to-end. CroSloEngual BERT consists of neural network weights and configuration files in PyTorch format (i.e. to be used with the PyTorch library). Changes in version 1.1: fixed the vocab.txt file, as the previous version had an error that caused very poor results during fine-tuning and/or evaluation.
dc.language.iso hrv
dc.language.iso slv
dc.language.iso eng
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.relation info:eu-repo/grantAgreement/EC/H2020/825153
dc.relation.isreferencedby https://arxiv.org/abs/2006.07890
dc.relation.replaces http://hdl.handle.net/11356/1317
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri http://embeddia.eu
dc.subject word embeddings
dc.subject multilingual
dc.subject contextual embeddings
dc.subject BERT
dc.subject language model
dc.title CroSloEngual BERT 1.1
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding CLARIN.SI data & tools
contact.person Matej Ulčar matej.ulcar@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
sponsor European Union EC/H2020/825153 EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media euFunds info:eu-repo/grantAgreement/EC/H2020/825153
files.count 3
files.size 499491056


Files in this item

All files in this item: 476.35 MB
This item is Publicly Available and licensed under:
Creative Commons - Attribution 4.0 International (CC BY 4.0)
Distributed under Creative Commons, Attribution Required
Name: config.json
Size: 520 bytes
Format: Unknown
Description: Configuration file describing the model's architecture
MD5: db3bdd5c4db6ffffa9bf3edab2e7be70

Name: pytorch_model.bin
Size: 476.04 MB
Format: Unknown
Description: CroSloEngual BERT model
MD5: 6b26401118943bf61b66a70d2ae68b9d

Name: vocab.txt
Size: 321.42 KB
Format: Text file
Description: Subword token (WordPiece) vocabulary
MD5: 08ab5bc48cb5a041611ed062eb368790
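
The MD5 checksums listed above can be used to confirm that a download is complete and uncorrupted, which matters here because the previous version shipped a faulty vocab.txt. Below is a minimal Python sketch using the standard-library `hashlib` module. The function names `md5_of_file` and `verify_download` are illustrative only (not part of any distributed tooling), and the three files are assumed to sit in one local directory under their original names.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

# MD5 checksums as listed on this record page.
EXPECTED_MD5 = {
    "config.json": "db3bdd5c4db6ffffa9bf3edab2e7be70",
    "pytorch_model.bin": "6b26401118943bf61b66a70d2ae68b9d",
    "vocab.txt": "08ab5bc48cb5a041611ed062eb368790",
}


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(directory: str) -> Dict[str, Optional[bool]]:
    """Map each expected file name to True (match), False (mismatch) or None (missing)."""
    results: Dict[str, Optional[bool]] = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results
```

For example, `verify_download("crosloengual-bert")` (a hypothetical directory name) returns one entry per file. Reading in 1 MiB chunks keeps memory use flat even for the 476 MB weights file.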
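
config.json describes the model's architecture, and for a standard BERT encoder these hyperparameters fully determine the number of weights. The sketch below assumes the usual Hugging Face `BertConfig` keys (`vocab_size`, `hidden_size`, `num_hidden_layers`, `intermediate_size`, `max_position_embeddings`, `type_vocab_size`). The actual values in this item's config.json are not shown on this page, so the example uses the published BERT-base (uncased) settings instead. It counts only the embeddings, the encoder layers, and the pooler; a checkpoint that also stores pre-training head weights will be somewhat larger.

```python
import json

# Published BERT-base (uncased) hyperparameters, used here only as an example input;
# they are NOT the values from this item's config.json.
BERT_BASE_UNCASED = {
    "vocab_size": 30522,
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "num_attention_heads": 12,  # does not change the count: heads split hidden_size
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
}


def bert_param_count(cfg: dict) -> int:
    """Count parameters of a BERT encoder (embeddings + layers + pooler), no task heads."""
    h = cfg["hidden_size"]
    ff = cfg["intermediate_size"]

    def dense(n_in: int, n_out: int) -> int:
        return n_in * n_out + n_out  # weight matrix + bias vector

    layer_norm = 2 * h  # gain + bias

    embeddings = (
        cfg["vocab_size"] * h                 # token embeddings
        + cfg["max_position_embeddings"] * h  # position embeddings
        + cfg["type_vocab_size"] * h          # segment embeddings
        + layer_norm
    )
    per_layer = (
        3 * dense(h, h)   # query, key, value projections
        + dense(h, h)     # attention output projection
        + layer_norm
        + dense(h, ff)    # feed-forward up-projection
        + dense(ff, h)    # feed-forward down-projection
        + layer_norm
    )
    pooler = dense(h, h)
    return embeddings + cfg["num_hidden_layers"] * per_layer + pooler


def bert_param_count_from_file(path: str) -> int:
    """Parse a config.json file and count the encoder parameters it describes."""
    with open(path, encoding="utf-8") as fh:
        return bert_param_count(json.load(fh))
```

With the BERT-base values this gives 109,482,240 parameters. If pytorch_model.bin stores 32-bit floats (4 bytes each) and "MB" means 2^20 bytes, its 476.04 MB corresponds to roughly 125 million stored values, which can serve as a rough cross-check against `bert_param_count_from_file("config.json")`.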
File preview (vocab.txt):
[PAD]
[EOS]
[unused00]
[unused0]
[unused1]
[unused2]
[unused3]
[unused4]
[unused5]
[unused6]
[unused7]
[unused8]
[unused9]
[unused10]
[unused11]
[unused12]
[unused13]
[unused14]
[unused15]
[unused16]
[unused17]
[unused18]
[unused19]
[unused20]
[unused21]
[unused22]
[unused23]
[unused24]
[unused25]
[unused26]
[unused27]
[unused28]
[unused29]
[unused30]
[unused31]
[unused32]
[unused33]
[unused34]
[unused35]
[unused36]
[unused37]
[unused38]
[unused39]
[unused40]
[unused41]
[unused42]
[unused43]
[unused44]
[unused45]
[unused46]
[unused47]
[unused48]
[unused49]
[unused50]
[unused51]
[unused52]
[unused53]
[unused54]
[unused55]
[unused56]
[unused57]
[unused58]
[unused59]
[unused60]
[unused61]
[unused62]
[unused63]
[unused64]
[unused65]
[unused66]
[unused67]
[unused68]
[unused69]
[unused70]
[unused71]
[unused72]
[unused73]
[unused74]
[unused75]
[unused76]
[unused77]
[unused78]
[unused79]
[unused80]
[unused81]
[unused82]
[unused83]
[unused84]
[unused85]
[unused86]
[unused87]
[unused88]
[unused8 . . .
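
In a BERT-style vocab.txt each line holds one subword token, and a token's ID is its zero-based line number (under that convention, [PAD] in the preview above is ID 0 and [EOS] is ID 1). The sketch below shows how such a WordPiece vocabulary is typically applied: greedy longest-match-first splitting of a single word, as in the original BERT tokenizer. Assumptions: continuation pieces use the standard BERT `##` prefix and unmatched words map to `[UNK]`; neither is visible in the truncated preview. The lowercasing, accent handling, and punctuation splitting that a full BERT tokenizer performs before this step are omitted. In practice, a library tokenizer such as Hugging Face's `BertTokenizer` can be pointed at this vocab.txt instead.

```python
from typing import Dict, List


def load_vocab(path: str) -> Dict[str, int]:
    """Read a BERT-style vocab.txt: one token per line, token ID = line index."""
    vocab: Dict[str, int] = {}
    with open(path, encoding="utf-8") as fh:
        for index, line in enumerate(fh):
            vocab[line.rstrip("\n")] = index
    return vocab


def wordpiece(word: str, vocab: Dict[str, int], unk: str = "[UNK]") -> List[str]:
    """Split one word into WordPiece tokens by greedy longest-match-first search."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # assumed continuation-piece prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no prefix matches: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces
```

With a toy vocabulary containing `un`, `##aff`, and `##able`, the word "unaffable" splits into `["un", "##aff", "##able"]`; greedy matching always takes the longest available piece first, which keeps token sequences short.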
