dc.contributor.author Lebar Bajec, Iztok
dc.contributor.author Repar, Andraž
dc.contributor.author Demšar, Jure
dc.contributor.author Bajec, Žan
dc.contributor.author Rizvič, Mitja
dc.contributor.author Kumperščak, Borut
dc.contributor.author Bajec, Marko
dc.date.accessioned 2022-12-02T10:46:08Z
dc.date.available 2022-12-02T10:46:08Z
dc.date.issued 2022-12-01
dc.identifier.uri http://hdl.handle.net/11356/1736
dc.description This Neural Machine Translation model for the Slovene-English language pair was trained following the NVIDIA NeMo NMT AAYN recipe (for details, see the official NVIDIA NeMo NMT documentation, https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/machine_translation/machine_translation.html, and the NVIDIA NeMo GitHub repository, https://github.com/NVIDIA/NeMo). It translates text written in Slovene to English and vice versa. The training corpus was built from publicly available datasets, including Parallel corpus EN-SL RSDO4 1.0 (https://www.clarin.si/repository/xmlui/handle/11356/1457), as well as a small portion of proprietary data. In total, the training corpus consisted of 32,638,758 translation pairs and the validation corpus of 8,163 translation pairs. The model was trained on 64 GPUs. On the validation corpus it reached a SacreBLEU score of 48.3191 (at epoch 37) for translation from Slovene to English and 53.8191 (at epoch 47) for translation from English to Slovene.
dc.language.iso slv
dc.language.iso eng
dc.publisher Faculty of Computer and Information Science, University of Ljubljana
dc.relation.isreferencedby https://github.com/clarinsi/Slovene_NMT
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/licenses/Apache-2.0
dc.rights.label PUB
dc.source.uri https://rsdo.slovenscina.eu/en/machine-translation
dc.subject machine translation
dc.subject NeMo
dc.subject model
dc.title Neural Machine Translation model for Slovene-English language pair RSDO-DS4-NMT 1.2.6
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding CLARIN.SI data & tools
demo.uri https://www.slovenscina.eu/en/prevajalnik
contact.person Iztok Lebar Bajec ilb@fri.uni-lj.si Faculty of Computer and Information Science, University of Ljubljana
sponsor Ministry of Culture C3340-20-278001 Development of Slovene in a Digital Environment Other
files.count 2
files.size 3968004854


 Files in this item

This item is Publicly Available and licensed under: Apache License 2.0
Name slen_GEN_nemo-1.2.6.tar.zst
Size 1.85 GB
Format Unknown
Description RSDO DS4 NMT SLEN 1.2.6
MD5 e8ccb661e27aa3469b7b943a928282f2

Name ensl_GEN_nemo-1.2.6.tar.zst
Size 1.85 GB
Format Unknown
Description RSDO DS4 NMT ENSL 1.2.6
MD5 ea697f3fbc2f8ccb22c594c74b4a1cfe
