2016-09-16 52 views
2

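Web scraping with Python on Canopy

I'm having trouble with this code, which is supposed to print the stock prices of the four listed companies. It runs without errors, but it only prints empty brackets where each stock price should go. That is the source of my confusion.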

import urllib2 
import re 

symbolslist = ["aapl","spy","goog","nflx"] 
i = 0 

while i<len(symbolslist): 
    url = "http://money.cnn.com/quote/quote.html?symb=' +symbolslist[i] + '" 
    htmlfile = urllib2.urlopen(url) 
    htmltext = htmlfile.read() 
    regex = '<span stream='+symbolslist[i]+' streamformat="ToHundredth" streamfeed="SunGard">(.+?)</span>' 
    pattern = re.compile(regex) 
    price = re.findall(pattern,htmltext) 
    print "the price of", symbolslist[i], " is ", price 
    i+=1 

Answer

1

Because you are not passing the variable:

url = "http://money.cnn.com/quote/quote.html?symb=' +symbolslist[i] + '" 
                 ^^^^^ 
                 a string not the list element 

Use str.format:

url = "http://money.cnn.com/quote/quote.html?symb={}".format(symbolslist[i]) 
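
To make the difference concrete, here is a small sketch that builds both URLs without touching the network. The names broken_url and fixed_url are only illustrative; the first string is the line from the question, where the single quotes sit inside the double-quoted literal and so no concatenation ever happens.

```python
symbolslist = ["aapl", "spy", "goog", "nflx"]
i = 0

# The line from the question: the whole thing is one double-quoted literal,
# so ' +symbolslist[i] + ' is just text inside the string.
broken_url = "http://money.cnn.com/quote/quote.html?symb=' +symbolslist[i] + '"

# The corrected line: str.format substitutes the list element for {}.
fixed_url = "http://money.cnn.com/quote/quote.html?symb={}".format(symbolslist[i])

print(broken_url)
print(fixed_url)
```

Printing both makes it clear that the original request never contained a ticker symbol at all.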

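You can also iterate directly over the list; there is no need for a while loop. Never parse HTML with a regex; use an HTML parser such as bs4. Your regex is also wrong: there is no stream="aapl" etc. What you want is the span where streamformat="ToHundredth" and streamfeed="SunGard":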

import urllib2 
from bs4 import BeautifulSoup 

symbolslist = ["aapl","spy","goog","nflx"] 


for symbol in symbolslist: 
    url = "http://money.cnn.com/quote/quote.html?symb={}".format(symbol) 
    htmlfile = urllib2.urlopen(url) 
    soup = BeautifulSoup(htmlfile.read(), "html.parser") 
    price = soup.find("span",streamformat="ToHundredth", streamfeed="SunGard").text 
    print "the price of {} is {}".format(symbol,price) 

You can see the result if we run the code:

In [1]: import urllib2 

In [2]: from bs4 import BeautifulSoup 

In [3]: symbols_list = ["aapl", "spy", "goog", "nflx"] 

In [4]: for symbol in symbols_list: 
    ...:   url = "http://money.cnn.com/quote/quote.html?symb={}".format(symbol) 
    ...:   htmlfile = urllib2.urlopen(url) 
    ...:   soup = BeautifulSoup(htmlfile.read(), "html.parser") 
    ...:   price = soup.find("span",streamformat="ToHundredth", streamfeed="SunGard").text 
    ...:   print "the price of {} is {}".format(symbol,price) 
    ...:  
the price of aapl is 115.57 
the price of spy is 215.28 
the price of goog is 771.76 
the price of nflx is 97.34 

We get what you want.
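
As for why the question's code printed empty brackets instead of failing: re.findall returns an empty list when the pattern matches nothing. The sketch below runs the question's pattern against a hypothetical HTML fragment; sample_html is an assumption based only on the two attributes named above, and the real page markup may differ.

```python
import re

# Hypothetical fragment for illustration; not copied from the real page.
sample_html = '<td><span streamformat="ToHundredth" streamfeed="SunGard">115.57</span></td>'

symbol = "aapl"

# The pattern from the question, which expects a stream=... attribute
# that the span does not have.
question_regex = ('<span stream=' + symbol +
                  ' streamformat="ToHundredth" streamfeed="SunGard">(.+?)</span>')

matches = re.findall(question_regex, sample_html)
print(matches)  # an empty list, which prints as []
```

No exception is raised, so the only visible symptom is the [] in the output.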

+0

Now I get this error after trying your code. It flags line 11 and says: AttributeError: 'NoneType' object has no attribute 'text' – Kainesplain

+0

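The output of the code is included in the answer. If you get different output using the symbols from the answer, then either you are using the code incorrectly or, for some reason, you are not getting the correct source. Just telling me you get an attribute error, without any more context, makes it a bit hard to debug –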
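
For context on that error: bs4's soup.find returns None when no element matches, so calling .text on the result fails whenever the fetched page lacks the expected span. One way to narrow it down is to check for None before reading the text. The sketch below is not the answer's bs4 code; it is a stand-in written for Python 3 with the standard library's html.parser so it runs offline, and SpanFinder, find_price and both sample pages are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class SpanFinder(HTMLParser):
    """Collects the text of the first <span> that carries all wanted attributes."""

    def __init__(self, wanted):
        HTMLParser.__init__(self)
        self.wanted = wanted   # dict of attribute name -> required value
        self.inside = False    # True while inside the matching span
        self.text = None       # stays None if no span matches

    def handle_starttag(self, tag, attrs):
        if tag == "span" and self.text is None:
            attrs = dict(attrs)
            if all(attrs.get(k) == v for k, v in self.wanted.items()):
                self.inside = True
                self.text = ""

    def handle_data(self, data):
        if self.inside:
            self.text += data

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False


def find_price(html):
    """Return the price text, or None when the expected span is missing."""
    finder = SpanFinder({"streamformat": "ToHundredth", "streamfeed": "SunGard"})
    finder.feed(html)
    return finder.text


# Hypothetical pages: one with the span, one without it.
page_with = '<span streamformat="ToHundredth" streamfeed="SunGard">115.57</span>'
page_without = '<span class="other">n/a</span>'

for name, page in [("with span", page_with), ("without span", page_without)]:
    price = find_price(page)
    if price is None:
        print("{}: no matching span, inspect the fetched HTML".format(name))
    else:
        print("{}: price is {}".format(name, price))
```

With bs4 the same idea would be to store the result of soup.find in a variable and test it against None before accessing .text, printing the URL or part of the page source when it is missing.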