ASR training dataset for Croatian ParlaSpeech-HR v2.0

http://hdl.handle.net/11356/1914

The ParlaSpeech-HR.v2.0.jsonl (JSON lines) file consists of entries with the following attributes:

- id: ParlaMint utterance ID with zero-based character offsets pointing to the specific part of the utterance
- words: list of character and millisecond offsets to specific words in the transcript, especially useful for further segmentation of each entry
- audio: path to the FLAC file (available from the part*.tgz files); the folder name corresponds to the YouTube video ID
- audio_length: length of the recording in seconds
- text: transcript of the audio
- text_start: starting character position in the original ParlaMint 4.0 utterance
- text_end: ending character position in the original ParlaMint 4.0 utterance
- audio_start: starting millisecond position in the original YouTube video
- audio_end: ending millisecond position in the original YouTube video
- speaker_info: full information on the speaker (and speech) from the ParlaMint 4.0 corpus
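The attributes listed above can be read one JSON object per line. The following is a minimal sketch in Python, not an official loader: the helper names (iter_entries, shorter_than, source_span_ms) are hypothetical, and the two sample records are made up purely to keep the example self-contained (their id and audio values are placeholders, not real identifiers or paths). The words and speaker_info attributes are left out because their internal layout is not described here.

```python
import io
import json


def iter_entries(lines):
    """Yield one dict per non-empty line of a JSON lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def shorter_than(entries, max_seconds):
    """Keep entries whose audio_length (in seconds) is below max_seconds."""
    return [e for e in entries if e["audio_length"] < max_seconds]


def source_span_ms(entry):
    """Span covered in the original video, in milliseconds."""
    return entry["audio_end"] - entry["audio_start"]


# ASSUMPTION: made-up records for illustration only; the values below are
# placeholders and do not come from the actual dataset.
SAMPLE = "\n".join(
    json.dumps(record)
    for record in [
        {
            "id": "placeholder-utterance-a",
            "audio": "VIDEO_ID_A/placeholder-a.flac",
            "audio_length": 12.5,
            "text": "placeholder transcript a",
            "text_start": 0,
            "text_end": 24,
            "audio_start": 1000,
            "audio_end": 13500,
        },
        {
            "id": "placeholder-utterance-b",
            "audio": "VIDEO_ID_B/placeholder-b.flac",
            "audio_length": 7.5,
            "text": "placeholder transcript b",
            "text_start": 24,
            "text_end": 48,
            "audio_start": 20000,
            "audio_end": 27500,
        },
    ]
)

entries = list(iter_entries(io.StringIO(SAMPLE)))
short_entries = shorter_than(entries, max_seconds=10.0)
total_seconds = sum(e["audio_length"] for e in entries)
print(len(entries), len(short_entries), total_seconds)
```

For the real file, the same iter_entries helper can be given an open file handle instead of the in-memory sample, e.g. open("ParlaSpeech-HR.v2.0.jsonl", encoding="utf-8"), so that entries are parsed lazily rather than loaded all at once.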