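Extracting the "Price" and "Title" of a Flipkart.com product using Python
I have written the following Python code to extract the price of an item from flipkart.com: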
import sys
import urllib
import urllib2
import bs4

item = "Wilco Classic Library: Autobiography Of a Yogi (Hardcover)"
# str.replace() returns a new string, so the result must be assigned back;
# quote_plus() also turns spaces into "+" and escapes ":", "(" and ")".
item = urllib.quote_plus(item)
link = 'http://www.flipkart.com/search/a/all?query={0}&vertical=all&dd=0&autosuggest[as]=off&autosuggest[as-submittype]=entered&autosuggest[as-grouprank]=0&autosuggest[as-overallrank]=0&autosuggest[orig-query]=&autosuggest[as-shown]=off&Search=%C2%A0&otracker=start&_r=YSWdYULYzr4VBYklfpZRbw--&_l=pMHn9vNCOBi05LKC_PwHFQ--&ref=a2c6fadc-2e24-4412-be6a-ce02c9707310&selmitem=All+Categories'.format(item)
r = urllib2.Request(link, headers={"User-Agent": "Python-urlli~"})
try:
    response = urllib2.urlopen(r)
except urllib2.URLError as e:
    # Without a response there is nothing to parse, so stop here.
    print "Internet connection error:", e
    sys.exit(1)
thePage = response.read()
soup = bs4.BeautifulSoup(thePage, "html.parser")
# The first search-result block holds both the price and the title.
firstBlockSoup = soup.find('div', attrs={'class': 'fk-srch-item'})
priceSoup = firstBlockSoup.find('b', attrs={'class': 'fksd-bodytext price final-price'})
price = priceSoup.contents[0]
print price
titleSoup = firstBlockSoup.find('a', attrs={'class': 'fk-srch-title-text fksd-bodytext'})
title = titleSoup.findAll('b')
print title
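Because Python strings are immutable, `str.replace()` returns a new string and leaves `item` untouched, so a bare `item.replace(" ", "+")` call has no effect on the URL that gets built. The minimal sketch below shows the difference and percent-encodes the whole search term. It assumes Python 3's `urllib.parse.quote_plus`; the question's Python 2 code would use `urllib.quote_plus` instead.

```python
from urllib.parse import quote_plus

item = "Wilco Classic Library: Autobiography Of a Yogi (Hardcover)"

# Calling replace() without assigning the result changes nothing.
item.replace(" ", "+")
print(item)  # still contains spaces

# Assigning the result replaces spaces but leaves ":", "(" and ")" as-is.
plus_only = item.replace(" ", "+")

# quote_plus() replaces spaces with "+" and percent-escapes reserved characters,
# which is what a query-string value needs.
encoded = quote_plus(item)

print(plus_only)
print(encoded)
```

Encoding the value with `quote_plus` rather than a hand-written `replace` keeps characters such as `:` and `(` from being misread as part of the URL syntax.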
When run, the code above prints the price without any problem:
Rs. 138
But the title comes out like this:
[<b>Wilco</b>, <b>Classic</b>, <b>Library</b>, <b>Autobiography</b>, <b>Of</b>, <b>a</b>, <b>Yogi</b>, <b>Hardcover</b>]
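To illustrate why the title comes back as a list: `findAll('b')` returns only the `<b>` elements, so any text sitting between them is never collected. The markup in this sketch is hypothetical, reconstructed from the bold words in the output above; the punctuation placed between the tags is an assumption, not copied from Flipkart's page. A regular expression stands in for BeautifulSoup here only because the input is one fixed string; it is not a general way to parse HTML.

```python
import re

# Hypothetical title markup (an assumption, not Flipkart's actual HTML).
SAMPLE_TITLE_HTML = (
    '<a class="fk-srch-title-text fksd-bodytext" href="#">'
    '<b>Wilco</b> <b>Classic</b> <b>Library</b>: '
    '<b>Autobiography</b> <b>Of</b> <b>a</b> <b>Yogi</b> '
    '(<b>Hardcover</b>)</a>'
)

# Rough equivalent of titleSoup.findAll('b'): keep only the text inside each <b>.
bold_words = re.findall(r"<b>(.*?)</b>", SAMPLE_TITLE_HTML)
print(bold_words)

# Joining the bold words restores the spaces but loses ":", "(" and ")".
print(" ".join(bold_words))
```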
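The reason will be obvious if you look at the source code of the product page (using "Inspect Element"): each word of the title sits in its own <b> tag.
Now, how can I extract the TITLE in its proper format, so that it prints: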
Wilco Classic Library: Autobiography Of a Yogi (Hardcover)
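One way to get the title in its original form is to take all of the text inside the `<a>` element rather than only its `<b>` children; in BeautifulSoup that corresponds to calling `titleSoup.get_text()` instead of `titleSoup.findAll('b')`. The minimal sketch below assumes the punctuation (`:`, `(`, `)`) sits as plain text between the `<b>` tags, which is a guess about Flipkart's markup, and uses the standard library's `html.parser` in place of BeautifulSoup so it runs without extra packages. `SAMPLE_TITLE_HTML`, `AnchorTextCollector` and `anchor_text` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical title markup (an assumption, not Flipkart's actual HTML).
SAMPLE_TITLE_HTML = (
    '<a class="fk-srch-title-text fksd-bodytext" href="#">'
    '<b>Wilco</b> <b>Classic</b> <b>Library</b>: '
    '<b>Autobiography</b> <b>Of</b> <b>a</b> <b>Yogi</b> '
    '(<b>Hardcover</b>)</a>'
)


class AnchorTextCollector(HTMLParser):
    """Collect every text node inside <a> elements, including text between child tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while inside an <a> element
        self.parts = []  # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def anchor_text(html):
    parser = AnchorTextCollector()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace, roughly as a browser would display them.
    return " ".join("".join(parser.parts).split())


print(anchor_text(SAMPLE_TITLE_HTML))
```

Collapsing whitespace with `split()` and `join()` matters on real pages, where the markup often contains newlines and indentation; with BeautifulSoup, `" ".join(titleSoup.get_text().split())` applies the same cleanup.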
I don't know why I didn't think of that. Thanks! – SAGAR 2013-05-04 22:38:45
No problem, glad it worked. – 2013-05-04 22:39:22