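I wrote code that scrapes and parses this website => www.africanbookscollective.com/browse/african-literature/fiction, using a while loop with nokogiri to navigate through the pages:
require 'rubygems'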
require 'nokogiri'
require 'open-uri'
require 'ap'
require 'debugger'
require 'csv'
#collect all the authors, books, ISBN, publisher info
#====================================================
url = 'http://www.africanbookscollective.com/browse/african-literature/fiction'
page = Nokogiri::HTML(open(url))
# create an array for every book content on each page that has element of form
# [<ISBN Number>, <Book Pages>, <Book Dimensions>, <First Published>, <Publisher>,<CoverType>]
# save array into a csv file with the columns of:
# <ISBN Number> <Book Pages> <Book Dimensions> <First Published> <Publisher> <CoverType>
# opens a csv file and shovels column titles into the first row
CSV.open("bookinfo.csv", "w+") do |csv|
csv << ["ISBN Number", "Book Pages", "Book Dimensions", "First Published", "Publisher", "CoverType"]
end
# initializes the page_num variable (the result offset used in the paginated URL)
page_num = 0
# the while loop runs as long as the statement below evaluates to true
#while page_num < 390
new_page = Nokogiri::HTML(open("http://www.africanbookscollective.com/browse/african-studies?b_start:int=#{page_num+10}&-C="))
# search for the context-details of each book
books = page.css('p.context-details').map do |book|
book.text.gsub(/\s{2,}/, "").chomp.split(" |")
end
#appends context-details onto the csv we already created
CSV.open("bookinfo.csv", "a+") do |csv|
books.each do |book|
csv << book
end
end
page_num += 10
#end
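This code only gives me the information from the first page; it does not grab the rest of the pages (1 - 38). I think this has to do with the structure of my while loop, right?
Why isn't it moving on to the next page using the string interpolation in new_page?
Thanks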
It should be `"http://www.africanbookscollective.com/browse/african-studies?b_start:int=#{page_num}&-C="` –
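To make the suggested URL concrete, here is a minimal sketch of the offsets the loop is meant to visit. It assumes, as the question's `page_num < 390` and `page_num += 10` imply, that the site pages in steps of 10 starting at offset 0; `BASE_URL` and `page_urls` are names made up for this illustration.

```ruby
# Hypothetical helper: lists the paginated URLs the scraper should request.
# Assumes 10 results per page and offsets starting at 0 (taken from the question).
BASE_URL = 'http://www.africanbookscollective.com/browse/african-studies'

def page_urls(limit, per_page = 10)
  # 0, 10, 20, ... up to (but not including) limit
  (0...limit).step(per_page).map do |offset|
    "#{BASE_URL}?b_start:int=#{offset}&-C="
  end
end

urls = page_urls(390)
puts urls.first
puts urls.last
puts urls.length
```

With a limit of 390 this yields one URL for the first page plus one for each of the remaining 38 pages. Interpolating `#{page_num + 10}` instead would skip offset 0 on the first pass.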
I tried that suggestion before posting, and it didn't work. – Uzzar
Change `books = page.css('p.context-details')` to `books = new_page.css('p.context-details')` –