dc.contributor.author Erjavec, Tomaž
dc.contributor.author Ogrodniczuk, Maciej
dc.contributor.author Osenova, Petya
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Simov, Kiril
dc.contributor.author Grigorova, Vladislava
dc.contributor.author Rudolf, Michał
dc.contributor.author Pančur, Andrej
dc.contributor.author Kopp, Matyáš
dc.contributor.author Barkarson, Starkaður
dc.contributor.author Steingrímsson, Steinþór
dc.contributor.author van der Pol, Henk
dc.contributor.author Depoorter, Griet
dc.contributor.author de Does, Jesse
dc.contributor.author Jongejan, Bart
dc.contributor.author Haltrup Hansen, Dorte
dc.contributor.author Navarretta, Costanza
dc.contributor.author Calzada Pérez, María
dc.contributor.author de Macedo, Luciana D.
dc.contributor.author van Heusden, Ruben
dc.contributor.author Marx, Maarten
dc.contributor.author Çöltekin, Çağrı
dc.contributor.author Coole, Matthew
dc.contributor.author Agnoloni, Tommaso
dc.contributor.author Frontini, Francesca
dc.contributor.author Montemagni, Simonetta
dc.contributor.author Quochi, Valeria
dc.contributor.author Venturi, Giulia
dc.contributor.author Ruisi, Manuela
dc.contributor.author Marchetti, Carlo
dc.contributor.author Battistoni, Roberto
dc.contributor.author Sebők, Miklós
dc.contributor.author Ring, Orsolya
dc.contributor.author Darģis, Roberts
dc.contributor.author Utka, Andrius
dc.contributor.author Petkevičius, Mindaugas
dc.contributor.author Briedienė, Monika
dc.contributor.author Krilavičius, Tomas
dc.contributor.author Morkevičius, Vaidas
dc.contributor.author Diwersy, Sascha
dc.contributor.author Luxardo, Giancarlo
dc.contributor.author Rayson, Paul
dc.date.accessioned 2021-06-18T09:25:39Z
dc.date.available 2021-06-18T09:25:39Z
dc.date.issued 2021-06-18
dc.identifier.uri http://hdl.handle.net/11356/1432
dc.description ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (after November 1st 2019), or being "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament; the speakers (name, gender, MP status, party affiliation, party coalition/opposition); are structured into time-stamped terms, sessions and meetings; with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible, but much stricter ParlaMint schemas. This entry contains the ParlaMint TEI-encoded corpora with the derived plain text version of the corpus along with TSV metadata on the speeches. Also included is the 2.1 release of the data and scripts available at the GitHub repository of the ParlaMint project. Note that a linguistically marked-up version of the corpus also exists and is available at http://hdl.handle.net/11356/1431.
dc.language.iso bul
dc.language.iso hrv
dc.language.iso pol
dc.language.iso slv
dc.language.iso ces
dc.language.iso isl
dc.language.iso fra
dc.language.iso nld
dc.language.iso dan
dc.language.iso spa
dc.language.iso tur
dc.language.iso eng
dc.language.iso ita
dc.language.iso hun
dc.language.iso lav
dc.language.iso lit
dc.publisher CLARIN ERIC
dc.relation.isreferencedby https://doi.org/10.1007/s10579-021-09574-0
dc.relation.replaces http://hdl.handle.net/11356/1388
dc.relation.isreplacedby http://hdl.handle.net/11356/1486
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://www.clarin.eu/parlamint
dc.subject parliamentary debates
dc.subject COVID-19
dc.subject TEI
dc.subject Parla-CLARIN
dc.subject Czech Parliament
dc.subject Icelandic Parliament
dc.subject Belgian Parliament
dc.subject Danish Parliament
dc.subject Spanish Parliament
dc.subject Dutch Parliament
dc.subject Turkish Parliament
dc.subject Italian Parliament
dc.subject Hungarian Parliament
dc.subject Latvian Parliament
dc.subject Lithuanian Parliament
dc.subject British Parliament
dc.subject Bulgarian Parliament
dc.subject Croatian Parliament
dc.subject Polish Parliament
dc.subject Slovenian Parliament
dc.subject French Parliament
dc.title Multilingual comparable corpora of parliamentary debates ParlaMint 2.1
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://github.com/clarin-eric/ParlaMint/
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
sponsor CLARIN ERIC - ParlaMint: Towards Comparable Parliamentary Corpora Other
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
sponsor Ministry of Education and Science Republic of Bulgaria DO01-272/16.12.2019 Bulgarian National Interdisciplinary Research e-Infrastructure for Resources and Technologies CLaDA-BG nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor LINDAT/CLARIAH-CZ LM2018101 Digital Research Infrastructure for Language Technologies, Arts and Humanities nationalFunds
sponsor Spanish Ministry of Science and Innovation PID2019-108866RB-I0 / AEI / 10.13039/501100011033 Original, translated and interpreted representations of the refugee cris(e)s: methodological triangulation within corpus-based discourse studies nationalFunds
sponsor The Research Council of Lithuania P-MIP-20-373 Policy Agenda of the Lithuanian Seimas and its Framing: The Analysis of the Seimas Debates in 1990-2020 nationalFunds
sponsor CLARIN-LV, European Regional Development Fund project 1.1.1.5/18/I/016 University of Latvia and institutes in the European Research Area - Excellency, activity, mobility, capacity Other
size.info 3774204 utterances
size.info 494949904 words
files.count 18
files.size 2331584637
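
The dc.description field above splits sessions into a COVID-19 period and a "reference" period around November 1st 2019. The rule can be sketched as follows. This is a minimal illustration, not the ParlaMint project's own code: the function name and the returned labels are hypothetical, and treating November 1st itself as part of the COVID-19 period is an assumption, since the description does not say which side the boundary day falls on.

```python
from datetime import date

# Boundary date taken from the dc.description field.
COVID_START = date(2019, 11, 1)


def session_period(session_date: date) -> str:
    """Return a period label for a session date.

    Assumption: the boundary day itself counts as part of the
    COVID-19 period. The labels "covid" and "reference" are
    hypothetical names chosen for this sketch.
    """
    return "covid" if session_date >= COVID_START else "reference"


if __name__ == "__main__":
    for d in (date(2018, 5, 14), date(2020, 3, 20)):
        print(d.isoformat(), session_period(d))
```

Keeping the cutoff in a single constant makes it easy to change if the corpus documentation turns out to define the boundary differently.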


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name               Size       Format   MD5                               Description
ParlaMint-BE.tgz   144.77 MB  Unknown  d3cc1f59db6d11c39abd3b0b460e115f  Belgian corpus
ParlaMint-BG.tgz   103.13 MB  Unknown  ccf6248f4f3cdf4b5e23d84699c5ee37  Bulgarian corpus
ParlaMint-CZ.tgz   121.52 MB  Unknown  e083de169010a4b8e6254a66eb82901a  Czech corpus
ParlaMint-DK.tgz   112.84 MB  Unknown  81da1d5dca50b82aad77650d09f83f50  Danish corpus
ParlaMint-ES.tgz   53.45 MB   Unknown  9258871617a6ca66e274642109f22c96  Spanish corpus
ParlaMint-FR.tgz   142.6 MB   Unknown  c9416174bcc777312ccb73ffd5d97e84  French corpus
ParlaMint-GB.tgz   406.8 MB   Unknown  106344757a9e5b0e270aed1db1d4a680  British corpus
ParlaMint-HR.tgz   91.69 MB   Unknown  cf2ce85cb3d61df368d8bbfe2635988a  Croatian corpus
ParlaMint-HU.tgz   5.7 MB     Unknown  377f91851dfc30d85443299b8ee1006a  Hungarian corpus
ParlaMint-IS.tgz   103.29 MB  Unknown  7ca529ad12ff67922928d80d55ccc53b  Icelandic corpus
ParlaMint-IT.tgz   117.1 MB   Unknown  2593365d4f3b28803ae5d9741e4ec7e2  Italian corpus
ParlaMint-LT.tgz   83.2 MB    Unknown  80ca9205a0a6fe3024cd629d5945b3ce  Lithuanian corpus
ParlaMint-LV.tgz   33.8 MB    Unknown  2e974ae82b53a1e395100431ac6560ea  Latvian corpus
ParlaMint-NL.tgz   222.88 MB  Unknown  0eb0f036118e9a37e901e35b7b10fb48  Dutch corpus
ParlaMint-PL.tgz   141.84 MB  Unknown  ede585e0b3e74e5c333532a64b8f404b  Polish corpus
ParlaMint-SI.tgz   89.89 MB   Unknown  3f33000a5ec42d867526ac9b64d6ab5d  Slovenian corpus
ParlaMint-TR.tgz   244.35 MB  Unknown  87981a989762f806db26448d2eabb385  Turkish corpus
ParlaMint-2.1.tgz  4.72 MB    Unknown  32280fec61af1baff34bc4c84d31461b  https://github.com/clarin-eric/ParlaMint/releases/tag/v2.1 (samples, schemas, scripts)
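
Each file in the list above is published with an MD5 digest, which can be used to check that a download arrived intact. The following is a minimal sketch of such a check using only the Python standard library. The helper names (`md5_of_file`, `verify`) and the assumption that the archives sit in the current working directory are illustrative, not part of the repository's tooling; only two of the published digests are copied in, and the rest can be added the same way.

```python
import hashlib
from pathlib import Path

# Two of the MD5 digests published in the file list of this record.
EXPECTED_MD5 = {
    "ParlaMint-HU.tgz": "377f91851dfc30d85443299b8ee1006a",
    "ParlaMint-2.1.tgz": "32280fec61af1baff34bc4c84d31461b",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's MD5 digest matches the expected one."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: downloaded archives are in the current directory.
    for name, expected in EXPECTED_MD5.items():
        if Path(name).exists():
            status = "OK" if verify(name, expected) else "MISMATCH"
            print(f"{name}: {status}")
        else:
            print(f"{name}: not found, skipped")
```

MD5 is adequate here for detecting corrupted or truncated downloads; it should not be relied on as protection against deliberate tampering.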