2016-03-05 139 views

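Scraping Google News with lxml and Python

I am trying to scrape Google News with Python and lxml. Everything works until I try to print the data from each div in a for loop, and then it all goes wrong. Here is my code: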

# -*- coding: utf-8 -*- 

from stem import Signal 
from stem.control import Controller 
from lxml import html 
from lxml import cssselect 
from lxml import etree 
import requests 

proxies = { 
    'http' : 'http://127.0.0.1:8123' 
} 

headers = { 
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36' 
} 

url = "https://www.google.it/search?hl=en&tbm=nws&as_occt=any&tbs=cdr:1,cd_min:9/1/2014,cd_max:9/1/2014,sbd:1&as_nsrc=Daily%20Mail&start=0" 

page = requests.get(url,proxies=proxies,headers=headers) 
tree = html.fromstring(page.content) 
results = tree.xpath('//div[@class="_cnc"]') 

for div in results: 
    print(div) 

I get output like this:

<Element div at 0x7f4154df9470> 
<Element div at 0x7f4154df94c8> 
<Element div at 0x7f4154df9520> 
<Element div at 0x7f4154df9578> 
<Element div at 0x7f4154df95d0> 
<Element div at 0x7f4154df9628> 
<Element div at 0x7f4154df9680> 
<Element div at 0x7f4154df96d8> 
<Element div at 0x7f4154df9730> 
<Element div at 0x7f4154df9788> 

I want to extract the title, href, and snippet from each div, something like this:

.... 

for div in results: 
    title = div.xpath('//a[@class="l _HId"]/text()') 
    href = div.xpath('//a[@class="l _HId"]/@href') 
    snippet = div.xpath('//div[@class="st"]/text()') 
    #for example 
    print(title) 
.... 

When I print the result, I get the same output repeated once per div:

['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for '] 
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for '] 
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for '] 
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for '] 
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for '] 
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for '] 
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for '] 
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for '] 
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for '] 
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for '] 

Does anyone know what is wrong with my code?

Answer


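You are almost there. Just prepend a dot to the inner XPath expressions so that they are evaluated relative to the current node. An expression that starts with // searches from the document root no matter which element you call xpath() on, which is why every iteration returned all ten titles. Starting it with .// limits the search to descendants of the current div: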

for div in results: 
    title = div.xpath('.//a[@class="l _HId"]/text()') 
    href = div.xpath('.//a[@class="l _HId"]/@href') 
    snippet = div.xpath('.//div[@class="st"]/text()') 
    #for example 
    print(title) 
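
To see the difference without hitting Google, here is a minimal sketch that runs both forms against a small inline HTML string. The markup, the example.com URLs, and the `articles` list are made up for illustration; only the class names are taken from the question, and Google's real markup may well differ.

```python
from lxml import html

# Hypothetical stand-in markup; only the class names come from the question.
SAMPLE = """
<html><body>
  <div class="_cnc">
    <a class="l _HId" href="http://example.com/a">First title</a>
    <div class="st">First snippet</div>
  </div>
  <div class="_cnc">
    <a class="l _HId" href="http://example.com/b">Second title</a>
    <div class="st">Second snippet</div>
  </div>
</body></html>
"""

tree = html.fromstring(SAMPLE)
results = tree.xpath('//div[@class="_cnc"]')

# '//' is anchored at the document root, so every div yields all titles.
absolute = [div.xpath('//a[@class="l _HId"]/text()') for div in results]

# './/' is anchored at the current div, so each div yields only its own title.
relative = [div.xpath('.//a[@class="l _HId"]/text()') for div in results]

# Collect one record per div; guard against a div that lacks a field.
articles = []
for div in results:
    titles = div.xpath('.//a[@class="l _HId"]/text()')
    hrefs = div.xpath('.//a[@class="l _HId"]/@href')
    snippets = div.xpath('.//div[@class="st"]/text()')
    articles.append({
        "title": titles[0].strip() if titles else None,
        "href": hrefs[0] if hrefs else None,
        "snippet": snippets[0].strip() if snippets else None,
    })

print(absolute)
print(relative)
print(articles)
```

Taking the first match and falling back to None is just one way to handle a result that is missing its link or snippet; adapt it to whatever the real page returns.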

Thank you so much. That was the key.