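Your question is unclear.
If I understand correctly, you want to extract the address details that follow "SMALL BUSINESS CONCERN: (Firm Name, Mail Address, City/State/Zip, Phone)", right? If so, then: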
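url <- "https://sbir.nasa.gov/SBIR/abstracts/17-1.html"
# Read the page as a character vector, one element per line
abstracts_page <- readLines(url)
# Strip HTML tags (non-greedy match) and tab characters
abstracts_page <- gsub("<.*?>", "", abstracts_page)
abstracts_page <- gsub("\\t+", "", abstracts_page)
# Locate every line that contains the address header
address_header_index <- grep("SMALL BUSINESS CONCERN:", abstracts_page)
# The five address fields sit 2 to 6 lines below each header
address_list <- lapply(address_header_index, function(i) {
  abstracts_page[(i + 2):(i + 6)]
})
# Bind the per-header vectors into one data frame, one row per firm
address_list <- data.frame(do.call("rbind", address_list))
head(address_list)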
#   X1                                        X2                                    X3
# 1 Transition45 Technologies, Inc.           1739 North Case Street                Orange, CA
# 2 ATSP Innovations                          60 Hazelwood Drive                    Champaign, IL
# 3 Cornerstone Research Group, Inc.          2750 Indian Ripple Road               Dayton, OH
# 4 Interdisciplinary Consulting Corporation  5745 Southwest 75th Street, #364      Gainesville, FL
# 5 CFD Research Corporation                  701 McMillian Way Northwest, Suite D  Huntsville, AL
# 6 LaunchPoint Technologies, Inc.            5735 Hollister Avenue, Suite B        Goleta, CA
#   X4          X5
# 1 92865-4211  (714) 283-2118
# 2 61820-7460  (217) 417-2374
# 3 45440-3638  (937) 320-1877
# 4 32608-5504  (352) 283-8110
# 5 35806-2923  (256) 726-4800
# 6 93117-6410  (805) 683-9659
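The header-plus-offset approach above can also be tried offline. The following is only a minimal sketch: `sample_lines` is a hypothetical stand-in for the cleaned page (the real layout may differ), and the column names are my own labels, not anything the page provides. It also drops any header that lacks five full lines after it, so a truncated page does not produce `NA` rows.

```r
# Hypothetical stand-in for the page after tags and tabs are stripped.
# Assumed layout: header line, field-description line, then five fields.
sample_lines <- c(
  "SMALL BUSINESS CONCERN:",
  "(Firm Name, Mail Address, City/State/Zip, Phone)",
  "Transition45 Technologies, Inc.",
  "1739 North Case Street",
  "Orange, CA",
  "92865-4211",
  "(714) 283-2118",
  "RESEARCH INSTITUTION:",
  "SMALL BUSINESS CONCERN:",
  "(Firm Name, Mail Address, City/State/Zip, Phone)",
  "ATSP Innovations",
  "60 Hazelwood Drive",
  "Champaign, IL",
  "61820-7460",
  "(217) 417-2374"
)

# Find the header lines; fixed = TRUE treats the pattern as a literal string
header_idx <- grep("SMALL BUSINESS CONCERN:", sample_lines, fixed = TRUE)

# Keep only headers that are followed by all five expected field lines
header_idx <- header_idx[header_idx + 6 <= length(sample_lines)]

# Take lines i+2 .. i+6 for each header and trim stray whitespace
rows <- lapply(header_idx, function(i) trimws(sample_lines[(i + 2):(i + 6)]))

# One row per firm; column names are illustrative labels only
addresses <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(addresses) <- c("firm", "street", "city_state", "zip", "phone")
print(addresses)
```

If the real page inserts blank lines between fields, the `(i + 2):(i + 6)` offsets would need adjusting, so it is worth inspecting a few lines around one header first.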
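Please _edit_ your question and give us a minimal sample of what you are trying to do. Your source file is messy, and I am not sure whether your current logic can work. Also, I probably would not use R for this; I would use something like Java or Perl.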