2014-01-29
<form method="post" action="/M740/Biography/History/Drama/12+Years+a+Slave"> 
    <input type="image" src="/public_site/webroot/cache/imdb/2024544_100.jpg" width="100" style="float:right;margin-left:2px;"> 
    <strong><span style="color: rgb(255, 69, 0);">12 Years a Slave</span></strong> 
    <br> 
    In the antebellum United States, Solomon Northup, a free black man from upstate New York, is abducted and sold into slavery.<br> 

    <br><strong>Century Cinemax - Junction</strong><br> 

    <a href="tel:0774136246">0774136246</a> 

     <a href="tel:0208022073">0208022073</a> 

    <br> 
    12:10, 19:10, 21:40<br> 

     <br><strong>Fox Cineplex Sarit</strong><br> 

    <a href="tel:0203753025">0203753025</a> 

    <a href="tel:0720366208">0720366208</a> 

    <br> 
     11:00, 14:00, 18:00, 20:40<br> 

    <br><strong>Planet Media - Kisumu </strong><br> 

    <a href="tel:0731999100">0731999100</a> 

     <a href="tel:0724999100 &amp; 0202629388">0724999100 &amp; 0202629388</a> 

    <br> 
    12:00, 14:30, 20:30<br> 

    <br> 
    <input type="hidden" name="cinema" value="0"> 
    <input type="hidden" name="searchMovie" value="0"> 
     <input type="hidden" name="movie" value="740"> 
    <input type="hidden" name="date" value="0"> 
    <input type="hidden" name="groupId" value="0"> 
    <input type="submit" name="ok" value="Further Details"> 
</form> 

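OK, this is just part of the HTML I'm trying to parse with Nokogiri. The HTML has very little semantic structure, and I'm having trouble getting the content I want with Nokogiri, because some of it isn't inside any HTML tag. For reference, this is the site I'm trying to scrape: http://flix.co.ke/Frontpage/Listings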

So far I'm able to get the movie title, one cinema and two phone numbers, but with my approach I can't get all the content I need.

This is what I'm using:

require 'rubygems' 
require 'nokogiri' 
require 'open-uri' 

url = "http://flix.co.ke/Frontpage/Listings" 
doc = Nokogiri::HTML(URI.open(url)) # Kernel#open no longer opens URLs in Ruby 3.0+

doc.css(".min-width div form").each do |entry| 
    title = entry.at_css("span").text 
    puts title 

    cinema = entry.at_css("br+ strong").text 
    puts cinema 

    phone = entry.at_css("a").text 
    puts phone 

    puts entry.at_css("a").next_element.text 
end 

With my current script I can only get the movie's title, one cinema and two contact numbers, so my sample output looks like this:

12 Years a Slave 
Century Cinemax - Junction 
0774136246 
0208022073 

47 Ronin 3D 
Century Cinemax - Junction 
0774136246 
0208022073 

Delivery Man 
Century Cinemax - Junction 
0774136246 
0208022073 

Frozen 
Century Cinemax - Junction 
0774136246 
0208022073 

(continued...) 

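There is also the description that comes after the title, following the <br> tag, which I can't get. And how do I loop through all the cinemas inside the <form> tag, along with their phone numbers and the individual, comma-separated show times?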

I just don't know where to start. For this case I'd like to get a result like this:

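  • 12 Years a Slave

  • In the antebellum United States, Solomon Northup, a free black man from upstate New York, is abducted and sold into slavery.

  • Century Cinemax - Junction 12:10, 19:10, 21:40
  • Fox Cineplex Sarit 11:00, 14:00, 18:00, 20:40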

etc
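Once the raw strings are extracted, turning them into the per-cinema lines above is ordinary string handling. A minimal sketch, assuming show times are always comma-separated and that one link may hold several numbers joined by "&" (as the Planet Media - Kisumu entry does); split_times and split_phones are hypothetical helper names.

```ruby
# Hypothetical helper: " 12:10, 19:10, 21:40\n" -> ["12:10", "19:10", "21:40"]
def split_times(raw)
  raw.split(',').map(&:strip).reject(&:empty?)
end

# Hypothetical helper: a link text such as "0724999100 & 0202629388"
# can hold more than one number, so split on "&" as well.
def split_phones(raw)
  raw.split('&').map(&:strip).reject(&:empty?)
end

cinema = 'Century Cinemax - Junction'
times  = split_times(" 12:10, 19:10, 21:40\n")

# One line per cinema, in the shape of the desired output above.
puts "#{cinema} #{times.join(', ')}"
p split_phones('0724999100 & 0202629388')
```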

Any help would be greatly appreciated. Thanks in advance.

Please include a valid HTML snippet, not an extract. As it is, we have to jump through hoops to help you.

Answers

Looping through the cinemas in this HTML really isn't that bad, and you were on the right track with br + strong; that's what you want to iterate over:

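require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open('http://flix.co.ke/Frontpage/Listings'))

doc.search('.min-width div form').each do |form| 
    title = form.at('span').text.strip
    # the description is the bare text node right after the first <br>
    description = form.at('br').next.text.strip
    puts title, description

    # every cinema name is a <strong> directly preceded by a <br>
    form.search('br + strong').each do |el| 
        cinema = el.text.strip
        phones = [] 
        # step over the phone links: either the next element, or one <br> further on
        while (next_el = el.at('+ a', '+ br + a'))
            el = next_el 
            phones << el.text.strip
        end 
        # the show times are the text node after the <br> that follows the last link
        times = el.at('+ br').next.text.strip
        puts "#{cinema} #{times}"
        puts "    phones: #{phones.join(', ')}"
    end 
end 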

I can't stress enough how helpful this was. Thanks a bunch! ;-)


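This is horrible HTML :/ It's invalid, with 451 errors and 9 warnings. There are no semantics, so you have to rely on the page structure, which may change and break your scraping.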

However, you can get each piece by using the sibling methods:

doc.css('.min-width div form').each do |node| 
    description = node.at_css('br').next_sibling.text 
    puts description.strip 
    puts '-'*10 
end 

# >> In the antebellum United States, Solomon Northup, a free black man from upstate New York, is abducted and sold into slavery. 
# >> ---------- 
# >> A band of samurai set out to avenge the death and dishonor of their master at the hands of a ruthless shogun. 
# >> ---------- 
# >> An affable underachiever finds out he's fathered 533 children through anonymous donations to a fertility clinic 20 years ago. Now he must decide whether or not to come forward when 142 of them file a lawsuit to reveal his identity. 
# >> ---------- 
# >> Fearless optimist Anna teams up with Kristoff in an epic journey, encountering Everest-like conditions, and a hilarious snowman named Olaf in a race to find Anna's sister Elsa, whose icy powers have trapped the kingdom in eternal winter. 
# >> ---------- 
# >> A medical engineer and an astronaut work together to survive after an accident leaves them adrift in space. 
# >> ---------- 
# >> A pair of aging boxing rivals are coaxed out of retirement to fight one final bout -- 30 years after their last match. 
# >> ---------- 
# >> 
# >> ---------- 
# >> Harrison, overworked and underpaid is looking for money for bride price. A 'business' opportunity presents itself when he gets the keys to the Company house. With the CEO away on holiday, he has access to a vacant fully furnished house. He ... 
# >> ---------- 
# >> 
# >> ---------- 
# >> A chronicle of Nelson Mandela's life journey from his childhood in a rural village through to his inauguration as the first democratically elected president of South Africa. 
# >> ---------- 
# >> Author P. L. Travers reflects on her difficult childhood while meeting with filmmaker Walt Disney during production for the adaptation of her novel, Mary Poppins. 
# >> ---------- 
# >> The Manzoni family, a notorious mafia clan, is relocated to Normandy, France under the witness protection program, where fitting in soon becomes challenging as their old habits die hard. 
# >> ---------- 
# >> The dwarves, along with Bilbo Baggins and Gandalf the Grey, continue their quest to reclaim Erebor, their homeland, from Smaug. Bilbo Baggins is in possession of a mysterious and magical ring. 
# >> ---------- 
# >> The film begins as Katniss Everdeen has returned home safe after winning the 74th Annual Hunger Games along with fellow tribute Peeta Mellark. Winning means that they must turn around and leave their family and close friends, embarking on a ... 
# >> ---------- 
# >> A day-dreamer escapes his anonymous life by disappearing into a world of fantasies filled with heroism, romance and action. When his job along with that of his co-worker are threatened, he takes action in the real world embarking on a global ... 
# >> ---------- 
# >> Faced with an enemy that even Odin and Asgard cannot withstand, Thor must embark on his most perilous and personal journey yet, one that will reunite him with Jane Foster and force him to sacrifice everything to save us all. 
# >> ---------- 
# >> A journey into the lives of a mother polar bear and her two seven-month-old cubs as they navigate the changing Arctic wilderness they call home. 
# >> ---------- 
# >> See and feel what it was like when dinosaurs ruled the Earth, in a story where an underdog dino triumphs to become a hero for the ages. 
# >> ---------- 

Get the repeated items the same way you loop over the form elements, by using css instead of at_css.


Much better! – Bala