dc.contributor.author Erjavec, Tomaž
dc.contributor.author Kopp, Matyáš
dc.contributor.author Ogrodniczuk, Maciej
dc.contributor.author Osenova, Petya
dc.contributor.author Agerri, Rodrigo
dc.contributor.author Agirrezabal, Manex
dc.contributor.author Agnoloni, Tommaso
dc.contributor.author Aires, José
dc.contributor.author Albini, Monica
dc.contributor.author Alkorta, Jon
dc.contributor.author Antiba-Cartazo, Iván
dc.contributor.author Arrieta, Ekain
dc.contributor.author Barcala, Mario
dc.contributor.author Bardanca, Daniel
dc.contributor.author Barkarson, Starkaður
dc.contributor.author Bartolini, Roberto
dc.contributor.author Battistoni, Roberto
dc.contributor.author Bel, Nuria
dc.contributor.author Bonet Ramos, Maria del Mar
dc.contributor.author Calzada Pérez, María
dc.contributor.author Cardoso, Aida
dc.contributor.author Çöltekin, Çağrı
dc.contributor.author Coole, Matthew
dc.contributor.author Darģis, Roberts
dc.contributor.author de Does, Jesse
dc.contributor.author de Libano, Ruben
dc.contributor.author Depoorter, Griet
dc.contributor.author Depuydt, Katrien
dc.contributor.author Diwersy, Sascha
dc.contributor.author Dodé, Réka
dc.contributor.author Fernandez, Kike
dc.contributor.author Fernández Rei, Elisa
dc.contributor.author Frontini, Francesca
dc.contributor.author Garcia, Marcos
dc.contributor.author García Díaz, Noelia
dc.contributor.author García Louzao, Pedro
dc.contributor.author Gavriilidou, Maria
dc.contributor.author Gkoumas, Dimitris
dc.contributor.author Grigorov, Ilko
dc.contributor.author Grigorova, Vladislava
dc.contributor.author Haltrup Hansen, Dorte
dc.contributor.author Iruskieta, Mikel
dc.contributor.author Jarlbrink, Johan
dc.contributor.author Jelencsik-Mátyus, Kinga
dc.contributor.author Jongejan, Bart
dc.contributor.author Kahusk, Neeme
dc.contributor.author Kirnbauer, Martin
dc.contributor.author Kryvenko, Anna
dc.contributor.author Ligeti-Nagy, Noémi
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Luxardo, Giancarlo
dc.contributor.author Magariños, Carmen
dc.contributor.author Magnusson, Måns
dc.contributor.author Marchetti, Carlo
dc.contributor.author Marx, Maarten
dc.contributor.author Meden, Katja
dc.contributor.author Mendes, Amália
dc.contributor.author Mochtak, Michal
dc.contributor.author Mölder, Martin
dc.contributor.author Montemagni, Simonetta
dc.contributor.author Navarretta, Costanza
dc.contributor.author Nitoń, Bartłomiej
dc.contributor.author Norén, Fredrik Mohammadi
dc.contributor.author Nwadukwe, Amanda
dc.contributor.author Ojsteršek, Mihael
dc.contributor.author Pančur, Andrej
dc.contributor.author Papavassiliou, Vassilis
dc.contributor.author Pereira, Rui
dc.contributor.author Pérez Lago, María
dc.contributor.author Piperidis, Stelios
dc.contributor.author Pirker, Hannes
dc.contributor.author Pisani, Marilina
dc.contributor.author Pol, Henk van der
dc.contributor.author Prokopidis, Prokopis
dc.contributor.author Quochi, Valeria
dc.contributor.author Rayson, Paul
dc.contributor.author Regueira, Xosé Luís
dc.contributor.author Rii, Andriana
dc.contributor.author Rudolf, Michał
dc.contributor.author Ruisi, Manuela
dc.contributor.author Rupnik, Peter
dc.contributor.author Schopper, Daniel
dc.contributor.author Simov, Kiril
dc.contributor.author Sinikallio, Laura
dc.contributor.author Skubic, Jure
dc.contributor.author Tamper, Minna
dc.contributor.author Tungland, Lars Magne
dc.contributor.author Tuominen, Jouni
dc.contributor.author van Heusden, Ruben
dc.contributor.author Varga, Zsófia
dc.contributor.author Vázquez Abuín, Marta
dc.contributor.author Venturi, Giulia
dc.contributor.author Vidal Miguéns, Adrián
dc.contributor.author Vider, Kadri
dc.contributor.author Vivel Couso, Ainhoa
dc.contributor.author Vladu, Adina Ioana
dc.contributor.author Wissik, Tanja
dc.contributor.author Yrjänäinen, Väinö
dc.contributor.author Zevallos, Rodolfo
dc.contributor.author Fišer, Darja
dc.date.accessioned 2024-06-04T18:47:44Z
dc.date.available 2024-06-04T18:47:44Z
dc.date.issued 2024-06-03
dc.identifier.uri http://hdl.handle.net/11356/1911
dc.description ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora comprise between 9 and 126 million words and the complete set contains over 1.2 billion words. The transcriptions are divided by days with information on the term, session and meeting, and contain speeches marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. The corpora have extensive metadata, most importantly on speakers (name, gender, MP and minister status, party affiliation), on their political parties and parliamentary groups (name, coalition/opposition status, Wikipedia-sourced left-to-right political orientation, and CHES variables, https://www.chesdata.eu/). Note that some corpora have further metadata, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The transcriptions are also marked with the subcorpora they belong to ("reference", until 2020-01-30, "covid", from 2020-01-31, and "war", from 2022-02-24). An overview of the statistics of the corpora is available on GitHub in the folder Build/Metadata, in particular for the release 4.1 at https://github.com/clarin-eric/ParlaMint/tree/v4.1/Build/Metadata. The corpora are encoded according to the ParlaMint encoding guidelines (https://clarin-eric.github.io/ParlaMint/) and schemas (included in the distribution). The ParlaMint.ana linguistic annotation includes tokenization; sentence segmentation; lemmatisation; Universal Dependencies part-of-speech, morphological features, and syntactic dependencies; and the 4-class CoNLL-2003 named entities.
Some corpora also have further linguistic annotations, in particular PoS tagging according to a language-specific scheme, with their corpus TEI headers giving further details on the annotation vocabularies and tools used. This entry contains the ParlaMint.ana TEI-encoded linguistically annotated corpora; the derived CoNLL-U files along with TSV metadata of the speeches; and the derived vertical files (with their registry file), suitable for use with CQP-based concordancers, such as CWB, noSketch Engine or KonText. Also included is the 4.1 release of the sample data and scripts available at the GitHub repository of the ParlaMint project at https://github.com/clarin-eric/ParlaMint and the log files produced in the process of building the corpora for this release. The log files show e.g. known errors in the corpora, while more information about known problems is available in the open issues at the GitHub repository of the project. This entry contains the linguistically marked-up version of the corpus, while the text version, i.e. without the linguistic annotation, is also available at http://hdl.handle.net/11356/1912. Another related resource, namely the ParlaMint corpora machine translated to English, ParlaMint-en.ana 4.1, can be found at http://hdl.handle.net/11356/1910. As opposed to the previous version 4.0, this version fixes a number of bugs and restructures the ParlaMint GitHub repository. The DK corpus has been linguistically re-annotated to remove bugs, while its speeches are now also marked with topics. The PT corpus has been extended to 2024-03 and the UA corpus to 2023-11; the UA corpus also has improved language marking (uk vs. ru) on segments.
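The subcorpus boundaries given in the description can be illustrated with a small date check. This is a minimal sketch, not the project's own code: the function name `subcorpora_for` is hypothetical, and it assumes that the "covid" range stays open after the "war" range begins, since the description gives no end date for it. The markup in the corpora themselves remains the authority.

```python
from datetime import date

# Start dates copied from the record description.
COVID_START = date(2020, 1, 31)
WAR_START = date(2022, 2, 24)


def subcorpora_for(day: date) -> list[str]:
    """Return the subcorpus labels whose stated date range contains `day`.

    Assumption: "covid" has no end date, so days from 2022-02-24 onwards
    receive both "covid" and "war".
    """
    if day < COVID_START:
        return ["reference"]
    labels = ["covid"]
    if day >= WAR_START:
        labels.append("war")
    return labels


if __name__ == "__main__":
    for sample in (date(2019, 6, 1), date(2021, 3, 15), date(2022, 3, 1)):
        print(sample.isoformat(), subcorpora_for(sample))
```

Returning a list rather than a single label keeps the sketch neutral on whether the later ranges overlap; a caller that needs exactly one label can take the last element.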
dc.language.iso bos
dc.language.iso bul
dc.language.iso cat
dc.language.iso hrv
dc.language.iso ces
dc.language.iso dan
dc.language.iso nld
dc.language.iso eng
dc.language.iso est
dc.language.iso fra
dc.language.iso glg
dc.language.iso deu
dc.language.iso hun
dc.language.iso isl
dc.language.iso ita
dc.language.iso lav
dc.language.iso ell
dc.language.iso nor
dc.language.iso pol
dc.language.iso por
dc.language.iso rus
dc.language.iso srp
dc.language.iso slv
dc.language.iso spa
dc.language.iso swe
dc.language.iso tur
dc.language.iso ukr
dc.language.iso fin
dc.language.iso eus
dc.publisher CLARIN ERIC
dc.relation.isreferencedby https://doi.org/10.21203/rs.3.rs-4176128/v1
dc.relation.replaces http://hdl.handle.net/11356/1860
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://www.clarin.eu/content/parlamint
dc.subject Parla-CLARIN
dc.subject parliamentary debates
dc.subject COVID-19
dc.subject TEI
dc.subject Bulgarian Parliament
dc.subject Croatian Parliament
dc.subject Polish Parliament
dc.subject Slovenian Parliament
dc.subject Czech Parliament
dc.subject Icelandic Parliament
dc.subject Belgian Parliament
dc.subject Danish Parliament
dc.subject Spanish Parliament
dc.subject Dutch Parliament
dc.subject Turkish Parliament
dc.subject Italian Parliament
dc.subject Hungarian Parliament
dc.subject Latvian Parliament
dc.subject French Parliament
dc.subject Bosnian Parliament
dc.subject Catalonian Parliament
dc.subject Galician Parliament
dc.subject Greek Parliament
dc.subject Norwegian Parliament
dc.subject Serbian Parliament
dc.subject Swedish Parliament
dc.subject Ukrainian Parliament
dc.subject Finnish Parliament
dc.subject Estonian Parliament
dc.subject Basque Parliament
dc.subject Portuguese Parliament
dc.subject Austrian Parliament
dc.subject UK Parliament
dc.title Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 4.1
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
demo.uri https://github.com/clarin-eric/ParlaMint/
contact.person Tomaž Erjavec tomaz.erjavec@ijs.si Jožef Stefan Institute
contact.person Matyáš Kopp kopp@ufal.mff.cuni.cz Charles University
sponsor CLARIN ERIC - ParlaMint: Towards Comparable Parliamentary Corpora Other
sponsor Austrian Academy of Sciences - ÖAW nationalFunds
sponsor European Commission POIR.04.02.00-00C002/19 European Regional Development Fund as a part of the 2014-2020 Smart Growth Operational Programme, CLARIN – Common Language Resources and Technology Infrastructure Other
sponsor Dutch Language Institute - - nationalFunds
sponsor Ministry of Education, Youth and Sports of the Czech Republic LM2023062 LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities nationalFunds
sponsor Department of Nordic Studies and Linguistics (NorS), University of Copenhagen CLARIN-DK CLARIN-DK nationalFunds
sponsor Galician Language Institute, University of Santiago de Compostela - - ownFunds
sponsor Xunta de Galicia - University of Santiago de Compostela 2021-CP080 Nós: Galician in the society and economy of artificial intelligence (2021-CP080), agreement between Xunta de Galicia and the University of Santiago de Compostela nationalFunds
sponsor Hungarian Research Centre for Linguistics - - nationalFunds
sponsor National Library of Norway - - nationalFunds
sponsor Institute of Computer Science, Polish Academy of Sciences - - nationalFunds
sponsor Polish Ministry of Education and Science 2022/WK/09 National contribution to CLARIN ERIC – European Research Infrastructure Consortium: Common Language Resources and Technology Infrastructure 2022–2023 (CLARIN Q) nationalFunds
sponsor Fundação para a Ciência e a Tecnologia UIDP/00214/2020 - nationalFunds
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
sponsor ARRS (Slovenian Research Agency) P6-0411 Language Resources and Technologies for Slovene nationalFunds
sponsor Nederlandse Organisatie voor Wetenschappelijk Onderwijs CISC.CC.016 Access to City Councils using Exploratory Search Systems nationalFunds
sponsor Bulgarian Ministry of Education and Science DO1-301/17.12.21 Bulgarian National Interdisciplinary Research e-Infrastructure for Resources and Technologies in favor of the Bulgarian Language and Cultural Heritage, part of the EU infrastructures CLARIN and DARIAH nationalFunds
sponsor Institute for Language and Speech Processing / ATHENA RC - - nationalFunds
sponsor ARRS (Slovenian Research Agency) J7-4642 MEZZANINE nationalFunds
sponsor The Árni Magnússon Institute for Icelandic Studies - - ownFunds
sponsor Slovenian Research Agency (ARRS) P6-0436 Basic national research program 'Digital Humanities' (2022-2027) nationalFunds
sponsor ARRS (Slovenian Research Agency) N6-0099 Flemish-Slovenian bilateral basic research project ‘Linguistic landscape of hate speech online’ (2019-2023) nationalFunds
sponsor ARRS (Slovenian Research Agency) N6-0288 the MSCA Seal of Excellence postdoctoral project 'The Changing Discursive Semantics of EU Representations' (2022-2024) nationalFunds
sponsor Ministry of Science and Innovation of Spain - - nationalFunds
sponsor HiTZ - Ixa Group (UPV/EHU) - - Other
size.info 8132022 utterances
size.info 1231036093 words
files.count 31
files.size 70832756094
featuredService.noske Joint 4.1 corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_xx
featuredService.noske Austrian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_at
featuredService.noske Bosnian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_ba
featuredService.noske Belgian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_be
featuredService.noske Bulgarian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_bg
featuredService.noske Czech corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_cz
featuredService.noske Danish corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_dk
featuredService.noske Estonian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_ee
featuredService.noske Spanish corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_es
featuredService.noske Catalan corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_es_ct
featuredService.noske Galician corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_es_ga
featuredService.noske Basque corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_es_pv
featuredService.noske Finnish corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_fi
featuredService.noske French corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_fr
featuredService.noske British corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_gb
featuredService.noske Greek corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_gr
featuredService.noske Croatian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_hr
featuredService.noske Hungarian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_hu
featuredService.noske Icelandic corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_is
featuredService.noske Italian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_it
featuredService.noske Latvian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_lv
featuredService.noske Dutch corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_nl
featuredService.noske Norwegian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_no
featuredService.noske Polish corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_pl
featuredService.noske Portuguese corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_pt
featuredService.noske Serbian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_rs
featuredService.noske Swedish corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_se
featuredService.noske Slovenian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_si
featuredService.noske Turkish corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_tr
featuredService.noske Ukrainian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_ua
featuredService.teitok ParlaMint 4.1|https://lindat.mff.cuni.cz/services/teitok/parlamint-41/
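Each featuredService value above pairs a display label with a URL, separated by `|`, and the noSketch Engine URLs carry the corpus name in a `corpname` parameter placed after the `#` fragment. The following sketch shows one way to split such a value; the function name `parse_service` is hypothetical and the handling of the fragment is inferred only from the URLs listed in this record.

```python
from urllib.parse import parse_qs, urlsplit


def parse_service(value: str) -> tuple[str, str, str]:
    """Split 'label|url' and extract the corpname parameter, if any.

    The query string sits inside the URL fragment (after '#'), so it is
    taken from the fragment rather than from the regular query component.
    Returns (label, url, corpname); corpname is '' when absent.
    """
    label, url = value.split("|", 1)
    fragment = urlsplit(url).fragment  # e.g. "dashboard?corpname=parlamint41_at"
    _, _, query = fragment.partition("?")
    corpname = parse_qs(query).get("corpname", [""])[0]
    return label, url, corpname


if __name__ == "__main__":
    entry = "Austrian corpus|https://www.clarin.si/ske/#dashboard?corpname=parlamint41_at"
    print(parse_service(entry))
```

For the TEITOK entry, which has no `corpname` parameter, the third element of the result is an empty string.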


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name | Size | Format | Description | MD5
ParlaMint-AT.ana.tgz | 3.27 GB | Unknown | Austrian corpus | 78f9130e33733d805e75da9049c5b863
ParlaMint-BA.ana.tgz | 996.43 MB | Unknown | Bosnian corpus | 1f05074a95668ff0b96acd43b7283f74
ParlaMint-BE.ana.tgz | 2.78 GB | Unknown | Belgian corpus | 9b8ee3913b35d0dca9a20934d55b93ed
ParlaMint-BG.ana.tgz | 1.5 GB | Unknown | Bulgarian corpus | ae44c0153df36efabbfcc783717dd37a
ParlaMint-CZ.ana.tgz | 1.85 GB | Unknown | Czech corpus | c2784ebdafd94b188f4eae73af8223fc
ParlaMint-DK.ana.tgz | 1.85 GB | Unknown | Danish corpus | 1abbd96121881f18af9bf45a426b2689
ParlaMint-EE.ana.tgz | 1.25 GB | Unknown | Estonian corpus | 6d71a849b3d395f988f7300db04c6fc7
ParlaMint-ES.ana.tgz | 966.93 MB | Unknown | Spanish corpus | 4f6972487be30a83c1fe6105870752c3
ParlaMint-ES-CT.ana.tgz | 967.05 MB | Unknown | Catalan corpus | 40a5e16dd61d6cda1d572249f7c41d16
ParlaMint-ES-GA.ana.tgz | 916.22 MB | Unknown | Galician corpus | 8f4c727d94705143f51b5fad52e769b9
ParlaMint-ES-PV.ana.tgz | 859.77 MB | Unknown | Basque corpus | 7c2a100171a99640656fca936b13cecf
ParlaMint-FI.ana.tgz | 845.5 MB | Unknown | Finnish corpus | 61921c5cb7283f1af91739b05fe5ee2b
ParlaMint-FR.ana.tgz | 2.41 GB | Unknown | French corpus | 0e65bd7a72b31b0c39ecfac66043e200
ParlaMint-GB.ana.tgz | 5.59 GB | Unknown | British corpus | c48c4e0481ddc5a5715723193bf909e1
ParlaMint-GR.ana.tgz | 2.94 GB | Unknown | Greek corpus | c008600bca2cca1e268a64f36223b1b4
ParlaMint-HR.ana.tgz | 4.57 GB | Unknown | Croatian corpus | c66e01eb321ae299f28d383416a64025
ParlaMint-HU.ana.tgz | 1.68 GB | Unknown | Hungarian corpus | 59e72bc21f4f24b84daa24b02a69eed5
ParlaMint-IS.ana.tgz | 1.48 GB | Unknown | Icelandic corpus | 4aa2bae3200dccfe49e9d3772b30a3eb
ParlaMint-IT.ana.tgz | 1.65 GB | Unknown | Italian corpus | b8a25860b9018ab9083a727f66819778
ParlaMint-LV.ana.tgz | 562.82 MB | Unknown | Latvian corpus | 71d5b15cc94a1b2997833a12c655b423
ParlaMint-NL.ana.tgz | 3.04 GB | Unknown | Dutch corpus | 3653aa4375200958e8e886ec57dc99f1
ParlaMint-NO.ana.tgz | 4.08 GB | Unknown | Norwegian corpus | b6232de904b63db366863f890446f28f
ParlaMint-PL.ana.tgz | 2.07 GB | Unknown | Polish corpus | c5c3e8fd5a15308fb63facc8a04e6b6f
ParlaMint-PT.ana.tgz | 1.15 GB | Unknown | Portuguese corpus | 6e6e45bc6ace03fd6b7d67292e0df6a8
ParlaMint-RS.ana.tgz | 4.39 GB | Unknown | Serbian corpus | 805610f0126213cf92811be4bd128077
ParlaMint-SE.ana.tgz | 2.32 GB | Unknown | Swedish corpus | 5fd1647243c23e078bc6c6b8f0204dbb
ParlaMint-SI.ana.tgz | 3.86 GB | Unknown | Slovenian corpus | bb483408e241c364e5f0d975cc2bc587
ParlaMint-TR.ana.tgz | 2.82 GB | Unknown | Turkish corpus | 16a86ab87bb9762f87c565b11d54c70f
ParlaMint-UA.ana.tgz | 3.39 GB | Unknown | Ukrainian corpus | d5fe45f11bbebf05883a66c60e690f89
ParlaMint-4.1.tgz | 18.77 MB | Unknown | https://github.com/clarin-eric/ParlaMint/releases/tag/v4.1 (samples, schemas, scripts) | 91929b37c965a5c6591b1cf2eda271ea
ParlaMint-4.1-Logs.tgz | 23.36 MB | Unknown | Build log files of the corpora | 4c2f2b7d5394eceab9f7dbf5a217b55a
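The MD5 values listed for each archive can be compared against a downloaded file before unpacking it. The sketch below is one possible way to do this with the Python standard library; the helper names `md5_of` and `matches_listed_md5` are hypothetical, and the local file path in the usage example is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks so that
    multi-gigabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path: Path, expected: str) -> bool:
    """Compare a file's digest with the value listed in the record."""
    return md5_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed local path; the expected value is the one listed for the Latvian corpus.
    archive = Path("ParlaMint-LV.ana.tgz")
    if archive.exists():
        print(matches_listed_md5(archive, "71d5b15cc94a1b2997833a12c655b423"))
```

A mismatch most likely indicates an incomplete or corrupted download, in which case the archive should be fetched again.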