
 
dc.contributor.author Babič, Saša
dc.contributor.author Peče, Miha
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Ivančič Kutin, Barbara
dc.contributor.author Šrimpf Vendramin, Katarina
dc.contributor.author Kropej Telban, Monika
dc.contributor.author Jakop, Nataša
dc.contributor.author Stanonik, Marija
dc.date.accessioned 2023-10-03T10:15:40Z
dc.date.available 2023-10-03T10:15:40Z
dc.date.issued 2023-09-30
dc.identifier.uri http://hdl.handle.net/11356/1853
dc.description This corpus collects and annotates an extensive and highly valuable diachronic collection of 37,390 Slovenian proverbs, compiled over more than 50 years at the ZRC SAZU Institute of Slovenian Ethnology. Each proverb is linked to its source; the sources comprise 2,630 bibliographical items (1578-2010): printed books, journals, calendars, collecting campaigns in various journals, folklore fieldwork, personal notes, etc. Each proverb is represented in two ways: as a diplomatic transcription faithful to its source (the transcription of older texts is inconsistent owing to technical difficulties faced by the transcribers and to human error) and as a critical transcription that modernises the alphabet used. The words of the critical transcriptions have also been automatically modernised to contemporary spelling using cSMTiser (https://github.com/clarinsi/csmtiser) trained on the goo300k corpus of historical Slovenian (http://hdl.handle.net/11356/1025), and these words have been further annotated with lemmas, MULTEXT-East morphosyntactic descriptions (https://nl.ijs.si/ME/V6/msd/html/msd-sl.html) and Universal Dependencies (https://universaldependencies.org/) using the CLASSLA toolchain (https://github.com/clarinsi/classla). The canonical encoding of the corpus is TEI, but the corpus is also distributed in two derived encodings: the proverbs and the bibliography as two TSV files, and a vertical file with the proverbs, as used by CQP-type concordancers such as Sketch Engine. Compared with the previous version 1.0, this version includes 1,183 more proverbs and 115 more bibliographical items, and corrects some errors.
dc.language.iso slv
dc.publisher ZRC SAZU
dc.publisher Jožef Stefan Institute
dc.relation.replaces http://hdl.handle.net/11356/1455
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://isn2.zrc-sazu.si/en/programi-in-projekti/traditional-paremiological-units-in-dialogue-with-contemporary-use
dc.subject paremiology
dc.subject folk sayings
dc.subject TEI
dc.subject proverbs
dc.title Collection of Slovenian paremiological units Pregovori 1.1
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Saša Babič sasa.babic@zrc-sazu.si ZRC SAZU
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) J6-2579 Traditional Paremiological Units in Dialogue with Contemporary Use nationalFunds
size.info 37390 idiomaticExpressions
files.count 3
files.size 23263598
featuredService.kontext search|https://www.clarin.si/kontext/query?corpname=pregovori
featuredService.noske search|https://www.clarin.si/ske/#dashboard?corpname=pregovori
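
The vertical encoding mentioned in the description follows the general convention of CQP-type concordancers: one token per line with tab-separated attributes, interleaved with XML-like structure tags. The sketch below groups token rows by an enclosing structure. The structure name `s`, the number and order of columns, and the sample lines are assumptions made for illustration only; the actual layout should be checked against the distributed file.

```python
import re

# Matches a structure line such as <s>, </s>, <text id="t1"> or <g/>.
TAG = re.compile(r"^<(/?)([A-Za-z_][\w.-]*)[^>]*>$")


def iter_units(lines, unit="s"):
    """Yield one list of token rows for each <unit> ... </unit> span.

    `unit` is a hypothetical structure name; adjust it to whatever
    structure wraps a proverb or sentence in the real file.
    """
    rows = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        match = TAG.match(line)
        if match:
            closing, name = match.group(1), match.group(2)
            if name == unit:
                if closing:
                    if rows is not None:
                        yield rows
                    rows = None
                else:
                    rows = []
            continue  # other structure tags are skipped
        if rows is not None:
            # Token line: tab-separated positional attributes.
            rows.append(line.split("\t"))


# Made-up sample in vertical layout (not taken from the corpus;
# the third column holds placeholder tags, not real MSDs).
SAMPLE = [
    '<text id="t1">',
    "<s>",
    "Kdor\tkdo\tP",
    "čaka\tčakati\tV",
    "<g/>",
    ".\t.\tZ",
    "</s>",
    "</text>",
]

units = list(iter_units(SAMPLE))
```

For a real file, the same function can be fed an open file handle, for example `iter_units(open(path, encoding="utf-8"))`, since it only iterates over lines.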


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Name Pregovori.TEI.zip
Size 13.02 MB
Format application/zip
Description Corpus in source TEI format
MD5 50637e1839241d07219ed9474a7f9f58

Name Pregovori.tsv.zip
Size 1.38 MB
Format application/zip
Description Corpus in derived tabular format
MD5 eba90c0e9c797298a9ae9f2a79bb2d7e

Name Pregovori.vert.zip
Size 7.78 MB
Format application/zip
Description Corpus in derived vertical format
MD5 de5240446cbda69bb5b468d72be250de
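
The MD5 values listed for the three files can be used to check that a download arrived intact. The following is a minimal sketch using only the Python standard library. The function names `md5_of` and `verify_downloads` are hypothetical, and the files are assumed to have been saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "Pregovori.TEI.zip": "50637e1839241d07219ed9474a7f9f58",
    "Pregovori.tsv.zip": "eba90c0e9c797298a9ae9f2a79bb2d7e",
    "Pregovori.vert.zip": "de5240446cbda69bb5b468d72be250de",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results
```

Calling `verify_downloads("downloads")` returns a dictionary keyed by file name; a `False` value means the file is either missing or does not match the listed checksum. MD5 is adequate here for detecting transfer corruption, but it is not a guarantee against deliberate tampering.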
