Gos 2.1 is the reference speech corpus of the Slovenian language. This edition contains about 300 hours of speech, comprising 2.4 million words, 127,000 utterances, and 1,500 texts. It is compiled from three sources:
(1) Spoken corpus Gos 1.1 (http://hdl.handle.net/11356/1438), 112 hours, 1 million words
(2) Spoken corpus Gos VideoLectures 4.2 (http://hdl.handle.net/11356/1222), 22 hours, 179,000 words
(3) A selection from the ASR database ARTUR 1.0 (http://hdl.handle.net/11356/1776), 185 hours, 1.2 million words, including:
(3a) Artur-J-Splosni, 62 hours, 422,000 words: media recordings, online recordings of conferences, workshops, educational videos, etc.
(3b) Artur-N-Prosti, 61 hours, 324,000 words: monologues and two-person dialogues recorded for the purposes of the Artur database. Speakers were asked to converse freely or to speak freely about casual topics.
(3c) Artur-P-SejeDZ, 62 hours, 450,000 words: a selection of speeches from the Slovene National Assembly. The maximum length of a single speaker's speech is 4,000 words.
This entry includes audio files for all recordings and video files for the television recordings only. The audio files are in WAV format (PCM, 16-bit, mono, 44.1 kHz). The video files are in MP4 format. Transcript files are available at http://hdl.handle.net/11356/1863.
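The stated audio format can be checked on a downloaded file with a short script. The following is a minimal sketch in Python using only the standard-library `wave` module. The function names and the `EXPECTED` dictionary are illustrative assumptions and not part of the corpus distribution, and the file path is supplied by the user.

```python
import sys
import wave

# Format stated in the corpus description: PCM, 16-bit, mono, 44.1 kHz.
# 16-bit samples correspond to a sample width of 2 bytes.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "frame_rate_hz": 44100}


def read_wav_format(path):
    """Return the channel count, sample width, and frame rate of a WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate_hz": wav.getframerate(),
        }


def matches_expected(path):
    """True if the file has the format described for the corpus audio."""
    return read_wav_format(path) == EXPECTED


def duration_seconds(path):
    """Length of the recording in seconds (frames divided by frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    # Usage: python check_wav.py <path-to-audio-file.wav>
    wav_path = sys.argv[1]
    print(read_wav_format(wav_path))
    print("matches expected format:", matches_expected(wav_path))
    print("duration (s):", round(duration_seconds(wav_path), 2))
```

Summing `duration_seconds` over all files of a subcorpus gives a rough way to compare the downloaded audio against the hour counts listed above.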
Funding:
Ministry of Education, Science and Sport, 3311-08-986003, "Communication in Slovene"
Ministry of Culture, C3340-20-278001, "Development of Slovene in a Digital Environment"
Republic of Slovenia, Ministry of Culture, 3340-15-141005, "Project Gos Videolectures"
Jožef Stefan Institute, CLARIN, "CLARIN.SI"
ARRS (Slovenian Research Agency), J7-4642, "MEZZANINE"