Using Beautiful Soup in Python: the 'contents' attribute

I am using the following code from the book "Hello! Python":

import urllib2 
from bs4 import BeautifulSoup 
import os 

def get_stock_html(ticker_name): 
    opener = urllib2.build_opener(urllib2.HTTPRedirectHandler(),urllib2.HTTPHandler(debuglevel=0),) 
    opener.addheaders = [('User-agent', "Mozilla/4.0 (compatible; MSIE 7.0; " "Windows NT 5.1; .NET CLR 2.0.50727; " ".NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)")] 
    url = "http://finance.yahoo.com/q?s=" + ticker_name 
    response = opener.open(url) 
    return ''.join(response.readlines()) 

def find_quote_section(html): 
    soup = BeautifulSoup(html) 
    # quote = soup.find('div', attrs={'class': 'yfi_rt_quote_summary_rt_top'}) 
    quote = soup.find('div', attrs={'class': 'yfi_quote_summary'}) 
    return quote 

def parse_stock_html(html, ticker_name): 
    quote = find_quote_section(html) 
    result = {} 
    tick = ticker_name.lower() 

    result['stock_name'] = quote.find('h2').contents[0] 

if __name__ == '__main__': 
    os.system("clear") 
    html = get_stock_html('GOOG') 
    # print find_quote_section(html) 
    print parse_stock_html(html, 'GOOG')

I get the following error:

Traceback (most recent call last): 
    File "dwlod.py", line 33, in <module> 
    print parse_stock_html(html, 'GOOG') 
    File "dwlod.py", line 25, in parse_stock_html 
    result['stock_name'] = quote.find('h2').contents[0] 
AttributeError: 'NoneType' object has no attribute 'contents'

I'm a newbie and really don't know what to do. Is the book wrong?

ADDED

I just replaced result['stock_name'] = quote.find('h2').contents[0] with:

x = BeautifulSoup(html).find('h2').contents[0] 
return x

Now nothing is returned, but the error no longer appears. So what is wrong with the original Python syntax?

2012-09-01 dwstein

Most likely the API changed between when the book was written and now.

@DavidRobinson Thanks! Where can I find the correct API? – dwstein

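Good question. I suggest you look at what the Yahoo page (the URL being queried) returns (I would, but I'm on a mobile device). Are you familiar enough with BeautifulSoup to change it?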

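Although Yahoo Finance hasn't really changed its layout in a while, it seems they may have tweaked it slightly since the book was published. The information you need, such as the h2 containing the stock ticker, can be found in yfi_rt_quote_summary, which is located at the top of the yfi_quote_summary container:

def find_quote_section(html): 
    soup = BeautifulSoup(html)   
    quote = soup.find('div', attrs={'class': 'yfi_rt_quote_summary'}) 
    return quote

Also note that parse_stock_html needs to return result if we want to print anything sensible; without a return statement the function returns None: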

def parse_stock_html(html, ticker_name): 
    quote = find_quote_section(html) 
    result = {} 
    tick = ticker_name.lower() 
    result['stock_name'] = quote.find('h2').contents[0] 
    return result 

>>> print parse_stock_html(html, 'GOOG') 
{'stock_name': u'Google Inc. (GOOG)'} 
>>>
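The original AttributeError comes from find returning None when nothing matches, and None has no contents attribute. Below is a minimal sketch of a more defensive version, written for Python 3 and bs4 rather than the Python 2 code above. The sample HTML string and the function name parse_stock_name are made up for illustration and do not reproduce Yahoo's actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<div class="yfi_rt_quote_summary">
  <h2>Google Inc. (GOOG)</h2>
</div>
"""


def parse_stock_name(html):
    """Return {'stock_name': ...}, or an empty dict if the markup is not as expected."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}

    # find() returns None when no element matches, so check before chaining.
    quote = soup.find("div", attrs={"class": "yfi_rt_quote_summary"})
    if quote is None:
        return result

    heading = quote.find("h2")
    if heading is None or not heading.contents:
        return result

    result["stock_name"] = heading.get_text(strip=True)
    # Without this explicit return the caller would receive None.
    return result


print(parse_stock_name(SAMPLE_HTML))
print(parse_stock_name("<div class='other'></div>"))
```

Returning an empty dict when the layout does not match is only one possible design; raising an exception with a descriptive message would make a changed page layout easier to notice.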

By the way, find only finds the first match.

>>> help(BeautifulSoup(html).find) 
find(self, name=None, attrs={}, recursive=True, text=None, **kwargs) method of BeautifulSoup.BeautifulSoup instance 
    Return only the first child of this Tag matching the given 
    criteria.

That first match seems to be empty. BeautifulSoup also has findAll, which returns all matches.

>>> BeautifulSoup(html).findAll('h2')[3].contents[0] 
u'Google Inc. (GOOG)'

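It seems the fourth value is the one we are looking for. One more thing: I'm sure you aren't doing this, but please don't parse the whole document every time, as that can be quite expensive.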
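Both points can be sketched together: parse the document once, keep the resulting soup object, and run every lookup against it. The example is written for Python 3 and bs4, where find_all is the current name and findAll remains as an alias. The sample HTML and the helper name heading_texts are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first h2 is empty, the one we want sits in a div.
SAMPLE_HTML = """
<h2></h2>
<h2>Market Summary</h2>
<div class="yfi_rt_quote_summary"><h2>Google Inc. (GOOG)</h2></div>
"""


def heading_texts(soup, name="h2"):
    """Return the stripped text of every matching tag, in document order."""
    return [tag.get_text(strip=True) for tag in soup.find_all(name)]


# Parse once and reuse the same soup object for every lookup.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

first = soup.find("h2")          # only the first match, here an empty tag
texts = heading_texts(soup)      # all matches

# Scoping the search to the container avoids relying on a positional index.
container = soup.find("div", attrs={"class": "yfi_rt_quote_summary"})
stock_name = container.find("h2").get_text(strip=True) if container else None

print(repr(first.get_text()))
print(texts)
print(stock_name)
```

Searching inside the container tends to be less fragile than indexing into the list of all h2 tags, since the index shifts whenever the page gains or loses a heading.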

2012-09-01 03:17:46
