2016-03-29 17 views
0

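How to use regular expressions in R: I want to extract some financial data in R using regular expressions

I used a regex tester, http://regexr.com/, to build a regular expression that should capture the specific string information I need. The only problem is that it doesn't...

I have pulled the data from this URL: http://finance.yahoo.com/q/cp?s=%5EOMXC20+Components

I want to match the company names (DANSKE.CO, DSV.CO, etc.), and I created the following regular expression, which matches them on regexr.com: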

.q\?s=(\S*\\) 

But it doesn't work in R. Can someone help me figure out how to do this?

+1

Use double backslashes in R strings when defining shorthand character classes, e.g. '\s' -> "\\s".

+0

You may need to escape the special characters first, e.g. escape \ with another \.
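
To illustrate the two comments above, here is a minimal base-R sketch of the doubled-backslash rule. The HTML fragment is a made-up stand-in for the downloaded page (the real Yahoo markup may differ), and the pattern is an illustrative one rather than the regex from the question.

```r
# Hypothetical HTML fragment standing in for the downloaded page;
# the real markup may look different.
html <- '<b><a href="/q?s=DANSKE.CO">DANSKE.CO</a></b> <b><a href="/q?s=DSV.CO">DSV.CO</a></b>'

# Inside an R string every regex backslash must be doubled:
# the regex q\?s=([^"]+)" is written as "q\\?s=([^\"]+)\"".
pattern <- "q\\?s=([^\"]+)\""

# gregexpr() locates every match; regmatches() extracts the matched text.
hits <- regmatches(html, gregexpr(pattern, html))[[1]]

# sub() with the backreference \\1 keeps only the captured ticker symbol.
tickers <- sub(pattern, "\\1", hits)
print(tickers)
```

The same pattern pasted straight from a regex tester (with single backslashes) would fail in R with an "unrecognized escape" error before the regex engine ever sees it.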

+0

Someone has to post the obligatory answer about regex and HTML... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – cory

Answers

2

Instead of messing around with regular expressions, I would use XPath to pull out the HTML content, something like this:

library("XML") 
f <- tempfile() 
download.file("https://finance.yahoo.com/q/cp?s=^OMXC20+Components", f) 
doc <- htmlParse(f) 
xpathSApply(doc, "//b/a", xmlValue) 
# [1] "CARL-B.CO" "CHR.CO"  "COLO-B.CO" "DANSKE.CO" "DSV.CO"  
# [6] "FLS.CO"  "GEN.CO"  "GN.CO"  "ISS.CO"  "JYSK.CO"  
# [11] "MAERSK-A.CO" "MAERSK-B.CO" "NDA-DKK.CO" "NOVO-B.CO" "NZYM-B.CO" 
# [16] "PNDORA.CO" "TDC.CO"  "TRYG.CO"  "VWS.CO"  "WDH.CO"  
0

Does this help? If not, reply back and I'll offer another suggestion.

library(XML) 

stocks <- c("AXP","BA","CAT","CSCO") 

for (s in stocks) { 
     url <- paste0("http://finviz.com/quote.ashx?t=", s) 
     webpage <- readLines(url) 
     html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE) 
     tableNodes <- getNodeSet(html, "//table") 

     # ASSIGN TO STOCK NAMED DFS 
     assign(s, readHTMLTable(tableNodes[[9]], 
       header= c("data1", "data2", "data3", "data4", "data5", "data6", 
          "data7", "data8", "data9", "data10", "data11", "data12"))) 

     # ADD COLUMN TO IDENTIFY STOCK 
     df <- get(s) 
     df['stock'] <- s 
     assign(s, df) 
} 

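# COMBINE ALL STOCK DATA (mget() already returns a named list of data frames)
stockdatalist <- mget(stocks) 
stockdata <- do.call(rbind, stockdatalist) 
# MOVE STOCK ID TO FIRST COLUMN 
# (parentheses matter: 1:n-1 is parsed as (1:n)-1, i.e. 0:(n-1))
stockdata <- stockdata[, c(ncol(stockdata), 1:(ncol(stockdata) - 1))] 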

# SAVE TO CSV 
write.table(stockdata, "C:/Users/rshuell001/Desktop/MyData.csv", sep=",", 
      row.names=FALSE, col.names=FALSE) 

# REMOVE TEMP OBJECTS 
rm(df, stockdatalist)