2016-04-23 108 views

Web scrape Non-Farm Payrolls dates in R

I want to scrape the dates of past Non-Farm Payrolls (NFP) releases from http://www.bls.gov/bls/archived_sched.htm (archive) and http://www.bls.gov/schedule/news_release/empsit.htm (current year).

Peter Chan did something similar for FOMC dates: https://github.com/returnandrisk/r-code/blob/master/FOMC%20Dates%20-%20Scraping%20Data%20From%20Web%20Pages.R. Here is his code:

install.packages(c("httr", "XML"), repos = "http://cran.us.r-project.org") 

library(httr) 
library(XML) 

# get and parse web page content            
webpage <- content(GET("http://www.federalreserve.gov/monetarypolicy/fomccalendars.htm"), as="text") 
xhtmldoc <- htmlParse(webpage) 
# get statement urls and sort them 
statements <- xpathSApply(xhtmldoc, "//td[@class='statement2']/a", xmlGetAttr, "href") 
statements <- sort(statements) 
# get dates from statement urls 
fomcdates <- sapply(statements, function(x) substr(x, 28, 35)) 
fomcdates <- as.Date(fomcdates, format="%Y%m%d") 
# save results in working directory 
save(list = c("statements", "fomcdates"), file = "fomcdates.RData") 

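I would like to replicate this for NFP. Just as fomcdates contains all FOMC dates, I want to create NFPdates containing all NFP release dates.

Would anyone know how to do this for just one year? (I am asking about the current year because it looks like the simplest case.) Thanks.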
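A note on the date-extraction step in the FOMC code above: substr(x, 28, 35) only works while every statement URL has the date at exactly those character positions. A minimal sketch of a less position-dependent variant is below. It pulls the first run of eight digits out of each URL instead. The two URLs are illustrative samples, not scraped output, and the assumption is that each statement URL contains exactly one YYYYMMDD stamp.

```r
# Illustrative sample URLs (assumed shape; not scraped from the live page)
statements <- c("/newsevents/press/monetary/20160316a.htm",
                "/newsevents/press/monetary/20160127a.htm")

# Extract the first run of 8 digits from each URL
date_strings <- regmatches(statements, regexpr("[0-9]{8}", statements))

# Convert to Date and sort chronologically
fomcdates <- sort(as.Date(date_strings, format = "%Y%m%d"))
print(fomcdates)
```

Note that regmatches() drops URLs with no match, so the result can be shorter than the input if some links carry no date.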


BLS has an API and a [corresponding R package](https://cran.r-project.org/web/packages/blsAPI/index.html). It may give you the data you need in a less brute-force way. – hrbrmstr


Very interesting, thank you very much! Since this approach can be applied to other data sources, @jaats-by-jake's answer is still very useful. – Krug

Answer


This works for the current year.

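library(rvest)

url <- 'http://www.bls.gov/schedule/news_release/empsit.htm'
# open the page (rvest 1.0.0 and later rename html_session() to session())
ses <- html_session(url)
# parse every HTML table on the page into a list of data frames
tbl <- html_table(ses, fill = TRUE)
# the release schedule was the second table when this was written
nfpdates <- tbl[[2]]$`Release Date`
# strip the periods so abbreviated month names match %b
nfpdates <- gsub('\\.', '', nfpdates)
nfpdates <- as.Date(nfpdates, '%b %d, %Y')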
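One caveat on the last step: %b depends on the session locale and only matches the standard three-letter abbreviations, so strings with a longer month name or a non-English locale would come back as NA. Below is a minimal sketch of a locale-independent alternative in base R. The helper name parse_release_date and the sample strings are hypothetical, made up for illustration, since the exact formatting on the BLS page is an assumption here.

```r
# Hypothetical sample strings; the real page's formatting may differ
raw <- c("Jan. 08, 2016", "May 06, 2016", "Sept. 02, 2016", "December 02, 2016")

# Hypothetical helper: parse "Month DD, YYYY" with abbreviated or full month names
parse_release_date <- function(x) {
  pattern <- "^([A-Za-z]+)\\.?[[:space:]]+([0-9]{1,2}),[[:space:]]*([0-9]{4})$"
  parts <- regmatches(x, regexec(pattern, x))
  iso <- vapply(parts, function(p) {
    if (length(p) != 4) return(NA_character_)     # string did not match
    month <- match(substr(p[2], 1, 3), month.abb) # English lookup, locale-independent
    if (is.na(month)) return(NA_character_)       # unknown month name
    sprintf("%s-%02d-%02d", p[4], month, as.integer(p[3]))
  }, character(1))
  as.Date(iso, format = "%Y-%m-%d")
}

nfpdates <- parse_release_date(raw)
print(nfpdates)
```

Strings that do not fit the pattern come back as NA rather than raising an error, which makes layout changes on the page easier to spot.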