r - Trying to get a function to write into the same data frame

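I am new to R and am struggling with the problem described below. I found posts here about the same error message, but none of the solutions applied to my problem. I hope someone here can help. I have a function that returns a data frame (always 3 columns (pencil name, price, date), usually about 90 rows). This is a follow-up to the blog post at http://pencil.land/?p=2677. The function takes a URL as its argument.

I have a vector containing about 15 URLs. I would like to pass all of them to my function and get back one big data frame with all the results.

I want R to call the function for every URL in the vector and record the results. I thought this could be done in a simple way, such as: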

result <- lapply(absoluteaddress, scrapearchive)

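but I get

Error in data.frame(pnames, price, archivedate) : arguments imply differing number of rows: 0, 1

absoluteaddress is the vector with the URLs. scrapearchive is the function that creates the data frame.

...so this does not work.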
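A minimal sketch of how this error message can arise, assuming (this is a guess, not something confirmed above) that one archived page yields no product names while the date is still extracted from the URL. The values below are made up and no network access is needed:

```r
# Stand-ins for the scraped vectors (made-up values)
pnames <- character(0)       # assumption: no product names matched on this page
archivedate <- "20140417"    # one date extracted from the URL

# data.frame() cannot recycle a zero-length column to match a length-1 column
msg <- tryCatch(
  {
    data.frame(pnames, archivedate)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this is what happens, a single page without matches is enough to stop the whole lapply call.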

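I think I may need rbind so that all the data frames returned by the function end up in one data frame, but I could not find a way to make that work.

Another solution might be for scrapearchive to always append to the same data frame, but again I could not find a way to do this.

I would be very glad if someone could help.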

scrapearchive <- function(address) { 
    #exampleuse 
    # test4 <- scrapearchive("https://web.archive.org/web/20140417064443/http://www.cultpens.com/acatalog/Pencils.html") 

    library(rvest) 

    #get date out, ignore time as unlikely for price to have changed so not worth recording this info 
    archivedate <- regmatches(address, gregexpr("20[0-9]{6}",address)) 

    #pencils <- html("https://web.archive.org/web/20130730001143/http://www.cultpens.com/acatalog/Pencils.html") 
    pencils <- html(address) 

    # pencil product names 
    pnames <- 
    pencils %>% 
    html_nodes("p a") %>% 
    html_text() 

    # web page formatting will result in empty lines 
    # remove empty lines 
    pnames <- grep ("[a-z]", pnames, value=TRUE) 

    # product names into vector pnames 
    paragraphs <- 
    pencils %>% 
    html_nodes("p") %>% 
    html_text() 

    #remove all entries without a pound sign 
    paragraphs <- grep ("£", paragraphs, value=TRUE) 


    # only keep prices 
    t1 <- regmatches(paragraphs, gregexpr("£([0-9])+.[0-9][0-9]",paragraphs)) 

    # only keep first price 
    price = do.call("rbind", lapply(t1, "[[", 1)) 


    # both vectors into a dataframe
    df <- data.frame(pnames,price, archivedate) 
    # names for the columns 
    names(df) <- c("name", "price", "date") 

    #spell it out so that it gets returned 
    df 
}

and

library(rvest) 
overview <- html("https://web.archive.org/web/*/http://www.cultpens.com/acatalog/Pencils.html") 

# archive.org doesn't use css, so can't use rvest? 

urls <- 
    overview %>% 
    html_nodes("date captures") %>% 
    html_text() 



overview <- readLines("http://web.archive.org/web/*/http://www.cultpens.com/acatalog/Pencils.html") 

# Get lines with links 
htmllines <- overview[grep("<a href=\"/web/20", overview)] 


# \/web.*\.html 
# \/ matches the character/literally 
# . matches any character (except newline) 
# Quantifier: * Between zero and unlimited times 
# can check at https://regex101.com 

# get address out 
# R needs escape characters escaped!!!!! 
relativeaddress <- regmatches(htmllines, gregexpr("\\/web.*\\.html",htmllines)) 



absoluteaddress <- paste0 ("https://web.archive.org", relativeaddress) 

#http://nicercode.github.io/guides/repeating-things/ 
result <- lapply(absoluteaddress, scrapearchive)

Source

2015-06-23 memm

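Once you have your list of data.frames, you can use 'do.call(rbind, listOfData)' to join them into one. However, it sounds like there is a problem with your 'scrapearchive' function. – jenesaisquoi

You could try using a for loop to call all the URLs and bind the responses into one data frame. – Keon

Legalizelt, thank you. The problem is that, as far as I understand, I cannot do it that way. Keon, thank you. I had thought about doing this, and it feels more natural to me because I am used to PHP rather than R, but it seems the simpler R way is to use lapply. I will look into how to write loops in R. – memm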
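For reference, the for loop suggested in the comments could look roughly like the sketch below. fake_scrape is a hypothetical stand-in for scrapearchive so that the example runs without network access, and the addresses are placeholders rather than real URLs:

```r
# Hypothetical stand-in for scrapearchive(): returns a small data frame per address
fake_scrape <- function(address) {
  data.frame(name = c("pencil A", "pencil B"),
             price = c(1.00, 2.00),
             date = address,
             stringsAsFactors = FALSE)
}

addresses <- c("20140417", "20140518", "20140619")  # placeholders, not real URLs

# Pre-allocate a list, fill it inside the loop, and bind once at the end
pieces <- vector("list", length(addresses))
for (i in seq_along(addresses)) {
  pieces[[i]] <- fake_scrape(addresses[i])
}
all_results <- do.call(rbind, pieces)
print(dim(all_results))
```

Collecting the pieces in a list and binding once avoids growing a data frame with rbind on every iteration, which gets slow as the number of pages increases.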

Use do.call to bind the list of data.frames into one:

f <- function(l) data.frame(l=l, x=sample(100, 10), y=sample(100, 10)) 
urls <- sample(letters, 10) 
do.call(rbind, lapply(urls, f))
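To make the do.call step less mysterious: it calls rbind with the list elements as separate arguments. A small sketch with made-up data frames (the names and values are illustrative only):

```r
# Three small data frames with identical column names (made-up values)
parts <- list(
  data.frame(name = "a", price = 1),
  data.frame(name = "b", price = 2),
  data.frame(name = "c", price = 3)
)

# do.call(rbind, parts) is the same call as rbind(parts[[1]], parts[[2]], parts[[3]])
combined <- do.call(rbind, parts)
same <- rbind(parts[[1]], parts[[2]], parts[[3]])
print(identical(combined, same))
```

This works for a list of any length, which is why it pairs well with lapply.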

Edit: one way to deal with missing data would be to wrap the data.frame construction in a tryCatch block.

Replace:

# both vectors into a dataframe
    df <- data.frame(pnames,price, archivedate) 
    # names for the columns 
    names(df) <- c("name", "price", "date") 

    #spell it out so that it gets returned 
    df

With:

# both vectors into a dataframe
    tryCatch({ 
     data.frame(name=pnames, price=price, date=unlist(archivedate)) 
    }, error=function(e) data.frame(name=character(0), 
            price=numeric(0), 
            date=character(0)))

Then,

result <- lapply(absoluteaddress, scrapearchive) 
head(do.call(rbind, result)) 
#           name price  date 
# 1 Caran d'Ache Bicolor Pencil 999 Blue + Red £2.00 20140518 
# 2   Caran d'Ache JASS Chalk Pencil 2152 £1.91 20140518 
# 3     Caran d'Ache Pencil Extender £2.84 20140518 
# 4    Caran d'Ache Technalo Pencil 779 £2.60 20140518 
# 5   Caran d'Ache Technograph Pencil 777 £2.60 20140518 
# 6 Caran d'Ache Technograph Pencil 777 Set of 12 £27.85 20140518
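The fallback behaviour can be checked without scraping anything. The sketch below uses a hypothetical helper, safe_frame, that mirrors the tryCatch pattern above; the inputs are made-up values, with the second call imitating a page where nothing matched:

```r
# Hypothetical helper mirroring the tryCatch pattern from the answer
safe_frame <- function(pnames, price, archivedate) {
  tryCatch(
    data.frame(name = pnames, price = price, date = unlist(archivedate),
               stringsAsFactors = FALSE),
    # On any error, fall back to an empty data frame with the same columns
    error = function(e) data.frame(name = character(0),
                                   price = numeric(0),
                                   date = character(0),
                                   stringsAsFactors = FALSE)
  )
}

good <- safe_frame(c("pencil A", "pencil B"), c(2.00, 1.91), list("20140518"))
bad <- safe_frame(character(0), numeric(0), list("20140417"))  # would fail without tryCatch

# The empty result contributes no rows when everything is bound together
combined <- do.call(rbind, list(good, bad))
print(nrow(bad))
print(nrow(combined))
```

One trade-off of this approach is that failed pages disappear silently; logging the address inside the error handler would make them easier to spot.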

Source

2015-06-23 19:05:16 jenesaisquoi

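Thank you. I don't know why it does not work with my function. I will post my code at the end of the original question (it is too long to post here). – memm

Your edited answer is exactly what I needed. Brilliant! Thank you very much. – memm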
