On Github wudukers / Taishin_R_Crawler
To use these packages, we first have to install them.
install.packages("XML")
##
## The downloaded binary packages are in
##  /var/folders/5c/0p5zr2_n4xvbt2j6hkqczhph0000gn/T//Rtmpf9Hxhx/downloaded_packages
library("XML")
## Warning: package 'XML' was built under R version 3.1.2
That's it! You are now ready to build a simple web crawler.
Seeing is believing, so let's demonstrate with a small example.
This way to TWSE
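# Listing-application page on the TWSE site
MOPS_URL.TWSE_ALL <- "http://www.twse.com.tw/en/listed/listed_company/apply_listing.php?page=1"
# Fetch and parse the page, decoding it as Big5
web_page <- htmlParse(MOPS_URL.TWSE_ALL, encoding = "big5")
# Read the sixth <table> in the document as a data frame of strings
data <- readHTMLTable(web_page, which = 6, stringsAsFactors = FALSE, header = TRUE)
names(data) <- c("Application Date", "Code", "Company", "Chairman", "Amount of Capital", "Underwriter")
# Drop the first row, then preview the first three remaining rows
data <- data[-1, ]
head(data, n = 3)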
##   Application Date Code    Company Chairman Amount of Capital Underwriter
## 2       2014.10.16 3416    WinMate                    610,664
## 3       2014.10.07 8341         SF                  1,000,000
## 4       2014.09.25 1558 ZENG HSING                    605,526
Magic!
Don't ask... it's scary = =+
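For anyone who does want to ask: the two arguments doing most of the work in the TWSE example are `which` and `header`. Here is a minimal offline sketch of what they do. The inline HTML page is hypothetical and made up for illustration, with two of its rows borrowed from the output above. The numeric conversion at the end is an optional extra step, not part of the original example.

```r
library("XML")

# Hypothetical page with two tables; only the second one holds the data.
html <- '
<html><body>
<table><tr><td>navigation</td></tr></table>
<table>
  <tr><th>Code</th><th>Company</th><th>Amount of Capital</th></tr>
  <tr><td>3416</td><td>WinMate</td><td>610,664</td></tr>
  <tr><td>8341</td><td>SF</td><td>1,000,000</td></tr>
</table>
</body></html>'

# asText = TRUE tells htmlParse that the input is HTML content, not a URL or file name.
doc <- htmlParse(html, asText = TRUE)

# which = 2 selects the second <table> in document order;
# header = TRUE uses its first row as the column names.
listing <- readHTMLTable(doc, which = 2, header = TRUE, stringsAsFactors = FALSE)

# Cells come back as text, so strip the thousands separators before converting.
listing$Capital <- as.numeric(gsub(",", "", listing[[3]]))
print(listing)
```

On a real page the right value for `which` usually has to be found by trial: calling `readHTMLTable(doc)` without `which` returns a list of every table, which can be inspected one by one.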