Need help with string matching in web scraping (Python)

I am trying to extract some data from a web page. First, I use BeautifulSoup to extract a DIV named "score", which contains several similar images like this one:

<img class="sprite-rating_s_fill rating_s_fill s45" src="http://e2.tacdn.com/img2/x.gif" alt="4.5 of 5 stars">

The score I want to extract is inside this image tag; in this case it is "4.5". So I tried it this way:

pattern = re.compile('<img.*?alt="(.*?) of 5 stars">', re.S) 
items = re.findall(pattern, scores)

But it doesn't work. I'm new to web scraping, so could anyone help me?

2015-04-05 dec

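BeautifulSoup actually makes it easy to pull information out of a tag like this! Assuming scores is a BeautifulSoup Tag object for the img element (you can read about Tag objects in their documentation), what you want is the tag's alt attribute:

alt = scores['alt']

For the example you gave, alt should be u'4.5 of 5 stars'. Now you just need to strip off ' of 5 stars':

removeIndex = alt.index(' of 5 stars')
score = alt[:removeIndex]

That leaves you with a score of '4.5'. (If you want to work with it as a number, you'll have to do score = float(score).)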
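The suffix-stripping and conversion steps can be wrapped in a small helper. This is only a sketch: parse_rating and its suffix parameter are hypothetical names, and the suffix is assumed to be ' of 5 stars', matching the alt text shown in the question.

```python
def parse_rating(alt_text, suffix=" of 5 stars"):
    """Return the leading number of an alt text such as '4.5 of 5 stars'.

    parse_rating and suffix are hypothetical names used for illustration.
    Returns None when the text does not end with the expected suffix
    or when the remaining part is not a number.
    """
    if not alt_text.endswith(suffix):
        return None
    number = alt_text[:-len(suffix)]  # drop the trailing ' of 5 stars'
    try:
        return float(number)
    except ValueError:
        return None


print(parse_rating("4.5 of 5 stars"))  # 4.5
print(parse_rating("no rating yet"))   # None
```

If scores is the enclosing div rather than a single img, something like [parse_rating(img.get('alt', '')) for img in scores.find_all('img')] should cover every image in it.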

2015-04-05 02:40:59

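It works, thank you very much. Could you also give me some advice on the way I was matching the string? I'm still trying to figure out why it was wrong – dec 2015-04-06 17:22:39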
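On the regular expression: the pattern from the question does match the sample tag when it is given a plain string. One possible cause, and this is only a guess since the question does not show how scores is built, is that scores was a BeautifulSoup Tag rather than a string; re.findall raises a TypeError for non-string input, so converting with str(scores) first may be all that was missing. The sketch below uses a hard-coded html string in place of str(scores).

```python
import re

# Sample markup built from the tag in the question. In real code this string
# would come from str(scores), assuming scores is a BeautifulSoup Tag.
html = (
    '<div class="score">'
    '<img class="sprite-rating_s_fill rating_s_fill s45" '
    'src="http://e2.tacdn.com/img2/x.gif" alt="4.5 of 5 stars">'
    '</div>'
)

# Same pattern as in the question, written as a raw string.
pattern = re.compile(r'<img.*?alt="(.*?) of 5 stars">', re.S)
items = pattern.findall(html)
print(items)  # ['4.5']
```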
