1
在下面的代碼中,每次嘗試下載PDF時,handle
部分似乎都會給我一個錯誤。在R中下載PDF所需的幫助
url <- "http://brocktonpolice.com/wp-content/uploads/"
filename <- paste0(format(AllDays, '%Y/%m/%m%d%Y'), '.pdf')
filenames_list <- str_extract_all(filenames, 'uploads.+pdf')
downloadPDF <- function(filename, baseurl, folder, handle){
dir.create(folder, showWarnings = FALSE)
fileurl <- str_c(baseurl, filename)
if (!file.exists(str_c(folder,"/",filename))) {
content <- getBinaryURL(fileurl, curl = handle)
writeBin(content, str_c(folder,"/",filename))
Sys.sleep(1)
}
}
handle <- getCurlHandle(useragent = str_c(R.version$platform,
R.version.string, sep = ", "),
httpheader = c(from = "[email protected]"))
l_ply(filenames_list, downloadPDF,
baseurl = "http://brocktonpolice.com/wp-content/uploads/",
folder = "Police_logs")
我對如何下載這些PDF的想法已經用完了。 以下是我如何生成指向所有PDF的鏈接。
prefix <- "http://brocktonpolice.com/wp-content/uploads/"
AllDays <- seq.Date(from = as.Date('2015-01-01'), to = Sys.Date(), by = "day")
links <- paste0(prefix, format(AllDays, '%Y/%m/%m%d%Y'), '.pdf')
print(links)
ps:如果您可以想出其他方式下載PDF文件,請分享您的代碼。
注意某些URL可能會給出一個錯誤,因爲有時日和月不加零的時候都小於10