dc.contributor.author Božović, Petar
dc.contributor.author Erjavec, Tomaž
dc.contributor.author Tiedemann, Jörg
dc.contributor.author Ljubešić, Nikola
dc.contributor.author Gorjanc, Vojko
dc.date.accessioned 2018-03-20T10:53:17Z
dc.date.available 2018-03-20T10:53:17Z
dc.date.issued 2018-03-20
dc.identifier.uri http://hdl.handle.net/11356/1176
dc.description This corpus contains parallel English-Montenegrin subtitles collected for linguistic and translatological research by Petar Božović for his PhD thesis "Audiovisual Translation and Elements of Culture: A Comparative Analysis of Transfer with Reception Study in Montenegro". The data and the permission to redistribute them were obtained from Radio and Television of Montenegro (http://www.rtcg.me), the public service broadcaster of Montenegro. The corpus consists of English and Montenegrin subtitles of three TV series: House of Cards (686 minutes), Damages (2,878 minutes), and Tudors (1,999 minutes). In total it covers 10 seasons, 110 episodes, and 5,563 minutes. Sentence alignment and basic encoding were performed within the OPUS project (http://opus.nlpl.eu/MontenegrinSubs.php), while MSD tagging, lemmatisation, and TEI conversion were performed by the CLARIN.SI infrastructure. The English texts were tagged with TreeTagger (http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/) and the Montenegrin texts with ReLDI Tagger (https://github.com/clarinsi/reldi-tagger) using the Serbian language model. The TreeTagger (Penn Treebank) tagset was mapped to the SPOOK MSD tagset for English (https://nl.ijs.si/spook/msd/html-en/msd-en.html). The corpus is available in TEI format and in a derived vertical format used by CQP and Manatee (Sketch Engine). The alignments for the vertical files are given separately as tables linking the alignment elements of the two languages.
dc.language.iso cnr
dc.language.iso eng
dc.publisher Jožef Stefan Institute
dc.relation.isreferencedby http://www.sdjt.si/wp/wp-content/uploads/2018/09/JTDH-2018_Bozovic-et-al_Opus-MontenegrinSubs-1-0-First-electronic-corpus-of-the-Montenegrin-language.pdf
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.source.uri https://opus.nlpl.eu/MontenegrinSubs/corpus/version/MontenegrinSubs
dc.subject parallel corpus
dc.subject subtitles
dc.subject multilingual
dc.title English-Montenegrin parallel corpus of subtitles Opus-MontenegrinSubs 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Nikola Ljubešić nikola.ljubesic@ijs.si Jožef Stefan Institute
sponsor ARRS (Slovenian Research Agency) P2-103 Knowledge Technologies nationalFunds
sponsor Jožef Stefan Institute CLARIN CLARIN.SI nationalFunds
size.info 133547 units
size.info 853165 words
files.count 2
files.size 13482052
featuredService.kontext search|https://www.clarin.si/kontext/first_form?corpname=opusmonte_cnr
featuredService.noske search|https://www.clarin.si/ske/#dashboard?corpname=opusmonte_cnr
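
The description mentions a derived vertical format as used by CQP and Manatee. In that format each token normally sits on its own line with tab-separated attributes, and structural elements appear as XML-like tag lines. The following is a minimal sketch of a reader for such files, not the tooling used to build this corpus. The column layout in the sample (word, lemma, tag) and the tag names are assumptions made for illustration; the actual attributes are presumably the ones declared in the .regi files shipped with the corpus.

```python
from typing import Iterable, Iterator, List, Tuple


def read_vertical(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield ("tag", [line]) for structural lines and ("token", columns) for tokens.

    Assumption: structural lines start with "<" and end with ">", and
    token lines carry tab-separated attributes.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("<") and line.endswith(">"):
            yield ("tag", [line])
        else:
            yield ("token", line.split("\t"))


# Hypothetical sample; the real files may use other columns and tag names.
sample = [
    '<s id="s1">\n',
    "I\tI\tPp\n",
    "agree\tagree\tVm\n",
    "</s>\n",
]

if __name__ == "__main__":
    for kind, fields in read_vertical(sample):
        print(kind, fields)
```

To process a real file, the same function could be given an open file handle, for example open("opusmonte_en.vert", encoding="utf-8"), assuming the archive has been unpacked into the working directory.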


 Files in this item

This item is publicly available and licensed under:
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name: OpusMonte.TEI.zip
Size: 8.31 MB
Format: application/zip
Description: Corpus in TEI format
MD5: 82405f90314ed95950f2577d9fc5939f
File preview:
  • OpusMonte.TEI
    • opusmonte_cnr.ana.xml (46 MB)
    • opusmonte_alg.xml (6 MB)
    • opus2vert.xsl (2 kB)
    • schema
      • tei_clarin.zip (42 kB)
      • tei_clarin_schema.xml (2 kB)
      • tei_clarin.rnc (184 kB)
      • tei_clarin.dtd (146 kB)
      • README.md (396 B)
      • tei_clarin_doc.html (1 MB)
      • tei_clarin.rng (377 kB)
    • opusmonte_en.ana.xml (65 MB)
    • 00README.txt (227 B)
    • opusmonte.xml (6 kB)
Name: OpusMonte.vert.zip
Size: 4.54 MB
Format: application/zip
Description: Corpus in derived vertical format
MD5: 24596dab9f565c615929b492720538df
File preview:
  • OpusMonte.vert
    • opusmonte_en-cnr.tbl (770 kB)
    • opusmonte_en.vert (11 MB)
    • opusmonte_en.regi (2 kB)
    • opusmonte_cnr-en.tbl (770 kB)
    • opus2vert.xsl (2 kB)
    • opusmonte_cnr.vert (10 MB)
    • 00README.txt (227 B)
    • opusmonte_cnr.regi (2 kB)
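
The record lists an MD5 checksum for each archive, which can be used to check that a download is complete and unmodified. The following is a minimal sketch using Python's standard hashlib module. The checksum values are copied from this record; the function names and the assumption that the archives are saved under their original file names are illustrative only.

```python
import hashlib

# MD5 values as listed in this record.
EXPECTED_MD5 = {
    "OpusMonte.TEI.zip": "82405f90314ed95950f2577d9fc5939f",
    "OpusMonte.vert.zip": "24596dab9f565c615929b492720538df",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, name: str) -> bool:
    """Compare a local file against the checksum listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, matches_record("OpusMonte.vert.zip", "OpusMonte.vert.zip") should return True for an intact download saved in the current directory.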
