
 
dc.contributor.author Wasserscheidt, Philipp
dc.contributor.author Bulić, Halid
dc.contributor.author Durmišević, Elma
dc.contributor.author Hodžić-Čavkić, Azra
dc.contributor.author Bajraktarević, Enisa
dc.contributor.author Ahmetspahić-Peljto, Azra
dc.contributor.author Šabić, Belmin
dc.date.accessioned 2024-04-18T09:52:51Z
dc.date.available 2024-04-18T09:52:51Z
dc.date.issued 2024-04-17
dc.identifier.uri http://hdl.handle.net/11356/1913
dc.description This corpus is specialized, static (i.e., no future growth is planned) and diachronic, covering the period from 2002 to 2022. The SMS messages included in this corpus were obtained from voluntary donors (informants). Both the senders and the recipients of the messages are Bosnian speakers who vary in age, education, occupation, place of origin and country of long-term residence. The Sarajevo Corpus of SMS Messages in Bosnian was originally published by the University of Sarajevo – Faculty of Philosophy as an electronic book. In the second phase of the work, the SMS messages were compiled into a corpus and linguistically annotated using the CLASSLA package (https://github.com/clarinsi/classla), version 2.1, with language = Serbian and type = nonstandard, for tokenization, lemmatization and morphosyntactic tagging (both MULTEXT-East and Universal Dependencies).
dc.language.iso bos
dc.publisher University of Sarajevo – Faculty of Philosophy
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://www.ff.unsa.ba/index.php/bs/projekti-centra-za-b-h-s-jezik/18335-sarajevski-korpus-sms-poruka-na-bosanskom-jeziku
dc.subject SMS
dc.subject specialised corpus
dc.title The Sarajevo Corpus of SMS Messages in Bosnian
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN.SI data & tools
contact.person Philipp Wasserscheidt philipp.wasserscheidt@hu-berlin.de Humboldt-Universität zu Berlin
contact.person Halid Bulić halid.bulic@ff.unsa.ba University of Sarajevo
size.info 10000 texts
size.info 15330 sentences
size.info 122843 tokens
files.count 1
files.size 1770084
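The description states that the corpus was tokenized, lemmatized and tagged with both MULTEXT-East and Universal Dependencies labels, and the file is distributed in CoNLL-U. As a minimal sketch of where those layers would sit in a CoNLL-U token line, the Python below splits a line into its ten standard columns. It assumes, without confirmation from this record, that the MULTEXT-East tag is stored in the XPOS column and the UD tag and features in UPOS and FEATS (the usual CLASSLA convention). The example line is hypothetical and constructed for illustration; HEAD and DEPREL are left as `_` because the record mentions no dependency parsing. The names `Token`, `parse_token_line` and `parse_feats` are illustrative only.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """The ten CoNLL-U columns, in the order fixed by the format."""
    id: str
    form: str
    lemma: str
    upos: str    # Universal Dependencies part-of-speech tag
    xpos: str    # language-specific tag; ASSUMED to hold the MULTEXT-East tag here
    feats: str   # UD morphological features, e.g. "Case=Nom|Number=Sing"
    head: str
    deprel: str
    deps: str
    misc: str


def parse_token_line(line: str) -> Token:
    """Split one tab-separated CoNLL-U token line into its ten fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}")
    return Token(*fields)


def parse_feats(feats: str) -> dict[str, str]:
    """Turn a FEATS value such as 'Case=Nom|Number=Sing' into a dict; '_' means empty."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


# HYPOTHETICAL line constructed for illustration; it is not taken from the corpus.
example = "1\tporuka\tporuka\tNOUN\tNcfsn\tCase=Nom|Gender=Fem|Number=Sing\t_\t_\t_\t_"
tok = parse_token_line(example)
print(tok.form, tok.upos, tok.xpos, parse_feats(tok.feats))
# prints: poruka NOUN Ncfsn {'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing'}
```

Keeping FEATS as a raw string in `Token` and parsing it on demand keeps the record a faithful copy of the line, so it can be written back out unchanged.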


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0); attribution is required.
Name SCSMS.zip
Size 1.69 MB
Format application/zip
Description Corpus in CoNLL-U format
MD5 ef2e8f9b358161817e238aa4bc927876
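Before unpacking, a downloaded copy can be checked against the MD5 checksum above and the byte size given in files.size (1770084). The following is a minimal Python sketch using only the standard library; the helper names are hypothetical, and the local filename in the usage note is an assumption.

```python
import hashlib
from pathlib import Path

# Values published in this item record for SCSMS.zip.
EXPECTED_MD5 = "ef2e8f9b358161817e238aa4bc927876"
EXPECTED_SIZE = 1770084  # bytes, from files.size


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path,
                    expected_md5: str = EXPECTED_MD5,
                    expected_size: int = EXPECTED_SIZE) -> bool:
    """True if both the byte size and the MD5 digest match the published values."""
    if path.stat().st_size != expected_size:
        return False  # cheap check first; a truncated download fails here
    return md5_of_file(path) == expected_md5.lower()
```

For example, `verify_download(Path("SCSMS.zip"))` should return `True` for an intact copy saved under that (assumed) name. The size comparison runs first so that an incomplete download is rejected without hashing it.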
 File Preview  
  • SCSMS
    • 001.conllu 150 kB
    • 002.conllu 129 kB
    • 003.conllu 133 kB
    • 004.conllu 122 kB
    • 005.conllu 128 kB
    • 006.conllu 103 kB
    • 007.conllu 119 kB
    • 008.conllu 121 kB
    • 009.conllu 155 kB
    • 010.conllu 106 kB
    • 011.conllu 117 kB
    • 012.conllu 117 kB
    • 013.conllu 126 kB
    • 014.conllu 121 kB
    • 015.conllu 128 kB
    • 016.conllu 111 kB
    • 017.conllu 129 kB
    • 018.conllu 135 kB
    • 019.conllu 130 kB
    • 020.conllu 125 kB
    • 021.conllu 141 kB
    • 022.conllu 128 kB
    • 023.conllu 125 kB
    • 024.conllu 133 kB
    • 025.conllu 126 kB
    • 026.conllu 115 kB
    • 027.conllu 120 kB
    • 028.conllu 126 kB
    • 029.conllu 135 kB
    • 030.conllu 132 kB
    • 031.conllu 140 kB
    • 032.conllu 116 kB
    • 033.conllu 138 kB
    • 034.conllu 123 kB
    • 035.conllu 103 kB
    • 036.conllu 124 kB
    • 037.conllu 119 kB
    • 038.conllu 132 kB
    • 039.conllu 126 kB
    • 040.conllu 137 kB
    • 041.conllu 118 kB
    • 042.conllu 107 kB
    • 043.conllu 114 kB
    • 044.conllu 134 kB
    • 045.conllu 120 kB
    • 046.conllu 122 kB
    • 047.conllu 115 kB
    • 048.conllu 110 kB
    • 049.conllu 98 kB
    • 050.conllu 93 kB
    • 051.conllu 118 kB
    • 052.conllu 117 kB
    • 053.conllu 119 kB
    • 054.conllu 150 kB
    • 055.conllu 161 kB
    • 056.conllu 122 kB
    • 057.conllu 123 kB
    • 058.conllu 151 kB
    • 059.conllu 119 kB
    • 060.conllu 114 kB
    • 061.conllu 117 kB
    • 062.conllu 101 kB
    • 063.conllu 100 kB
    • 064.conllu 96 kB
    • 065.conllu 100 kB
    • 066.conllu 91 kB
    • 067.conllu 91 kB
    • 068.conllu 105 kB
    • 069.conllu 93 kB
    • 070.conllu 95 kB
    • 071.conllu 109 kB
    • 072.conllu 104 kB
    • 073.conllu 111 kB
    • 074.conllu 96 kB
    • 075.conllu 102 kB
    • 076.conllu 121 kB
    • 077.conllu 150 kB
    • 078.conllu 146 kB
    • 079.conllu 136 kB
    • 080.conllu 125 kB
    • 081.conllu 143 kB
    • 082.conllu 131 kB
    • 083.conllu 125 kB
    • 084.conllu 130 kB
    • 085.conllu 121 kB
    • 086.conllu 121 kB
    • 087.conllu 120 kB
    • 088.conllu 134 kB
    • 089.conllu 121 kB
    • 090.conllu 96 kB
    • 091.conllu 149 kB
    • 092.conllu 110 kB
    • 093.conllu 110 kB
    • 094.conllu 147 kB
    • 095.conllu 142 kB
    • 096.conllu 146 kB
    • 097.conllu 141 kB
    • 098.conllu 156 kB
    • 099.conllu 156 kB
    • 100.conllu 161 kB
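To compare a local copy with the size.info figures (15330 sentences, 122843 tokens), sentences and tokens can be counted directly inside the archive without extracting it. This Python sketch treats a blank line as a sentence boundary and counts only lines whose ID column is a plain integer, skipping multiword-token ranges such as `3-4` and empty nodes such as `5.1`. Whether the published token count follows the same convention is an assumption, so a small discrepancy would not by itself indicate a damaged file. The function names are hypothetical.

```python
import zipfile
from pathlib import Path


def count_conllu(lines) -> tuple[int, int]:
    """Count sentences and integer-ID token lines in an iterable of CoNLL-U lines.

    ASSUMPTION: a sentence ends at a blank line (or at end of input), and only
    lines whose first column is a plain integer are counted as tokens.
    """
    sentences = tokens = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# sent_id = ..."
        in_sentence = True
        if line.split("\t", 1)[0].isdigit():
            tokens += 1
    if in_sentence:  # the last sentence may lack a trailing blank line
        sentences += 1
    return sentences, tokens


def count_archive(zip_path: Path) -> tuple[int, int]:
    """Sum sentence and token counts over every .conllu member of the archive."""
    total_sentences = total_tokens = 0
    with zipfile.ZipFile(zip_path) as zf:
        for name in sorted(zf.namelist()):
            if not name.endswith(".conllu"):
                continue
            text = zf.read(name).decode("utf-8")
            s, t = count_conllu(text.splitlines())
            total_sentences += s
            total_tokens += t
    return total_sentences, total_tokens
```

Calling `count_archive(Path("SCSMS.zip"))` returns a `(sentences, tokens)` pair that can be set against the size.info values; reading members with `ZipFile.read` avoids writing the 100 files to disk.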
