Files in this item

 Download all files in item (213.86 KB)
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name: readme_web_crawlers.txt
Size: 1.3 KB
Format: Text file
Description: README file
MD5: 4c42c0b4f2097f31cecde483d937ca29

File preview (readme_web_crawlers.txt):
Author: Jože Bučar, Faculty of Information Studies in Novo mesto (contact: joze.bucar@gmail.com)

Abstract:
Five web crawlers written in the R language for retrieving Slovenian news texts from the portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and financial content.

Keywords:
Web-crawling, Slovene

Web resources:
- Slovenian news texts with political, business, economic and financial content, published between 1 September 2007 and 31 January 2016 on five Slovenian web media: www.24ur.com, www.dnevnik.si, www.finance.si, www.rtvslo.si, www.zurnal24.si

Type and size:
- .R (web-crawlers); size: 213 KB

Encoding: ANSI

Year: Last update 2016-02-14

Attributes (retrieved news):
URL main - Uniform Resource Locator (URL) of the resource (web medium) [string; www.24ur.com, www.dnevnik.si, www.finance.si, www.rtvslo.si, www.zurnal24.si]
URL - URL of . . .
                                            
Name: web_crawler_24UR.r
Size: 61.36 KB
Format: Unknown
Description: Web crawler for 24ur
MD5: 683b72b1265c9bc64464cbe3e8df278b

Name: web_crawler_Dnevnik.r
Size: 16.5 KB
Format: Unknown
Description: Web crawler for Dnevnik
MD5: 704bbf20657684968a4260cc0d76fb5e

Name: web_crawler_Finance.r
Size: 50.96 KB
Format: Unknown
Description: Web crawler for Finance
MD5: 436b1103bb5ddabf4aadd5aaee652584

Name: web_crawler_RTVSLO.r
Size: 41.08 KB
Format: Unknown
Description: Web crawler for Rtvslo
MD5: 201e4b0ad829c52a06f2c0a8cf027643

Name: web_crawler_Zurnal24.r
Size: 42.66 KB
Format: Unknown
Description: Web crawler for Zurnal24
MD5: 694b2ba88f6aa905563f58218e7b3b38
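
The MD5 values in the listing can be used to check that the downloaded files arrived intact. The following is a minimal sketch in Python, not part of the item itself. The helper names `md5_of_file` and `verify_downloads` are hypothetical, and it assumes the six files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing on this page.
EXPECTED_MD5 = {
    "readme_web_crawlers.txt": "4c42c0b4f2097f31cecde483d937ca29",
    "web_crawler_24UR.r": "683b72b1265c9bc64464cbe3e8df278b",
    "web_crawler_Dnevnik.r": "704bbf20657684968a4260cc0d76fb5e",
    "web_crawler_Finance.r": "436b1103bb5ddabf4aadd5aaee652584",
    "web_crawler_RTVSLO.r": "201e4b0ad829c52a06f2c0a8cf027643",
    "web_crawler_Zurnal24.r": "694b2ba88f6aa905563f58218e7b3b38",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each listed file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    # Assumes the files were downloaded into the current directory.
    for name, ok in verify_downloads(".").items():
        print(f"{name}: {'OK' if ok else 'MISSING OR MISMATCH'}")
```

A file reported as missing or mismatched should be downloaded again before use. MD5 is suitable here only as a check against accidental corruption, not against tampering.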