TypeError: expected string when using re.findall on webpage text - why?

I'm trying to learn how to do screen scraping with BeautifulSoup.

from urllib import urlopen 
from BeautifulSoup import BeautifulSoup 
import re 

webpage = urlopen('http://feeds.feedburner.com/zenhabits').read() 

patFinderTitle = re.compile('<h4 class="itemtitle"><a href=(.*)</a></h4>') 

findPatTitle = re.findall(patFinderTitle,webpage) 
listIterator = [] 
listIterator[:] = range(1, 5) 

for i in listIterator: 
    print findPatTitle[i] 
    print("\n")

Error

Traceback (most recent call last):
  File "//da-srv1/users/xxxxx/Desktop/fetcher", line 14, in <module>
    print findPatTitle[i]
IndexError: list index out of range


2011-05-16 jerry

"urlopen('http://feeds.feedburner.com/zenhabits').read" is the method itself, not its result. I suspect what you want is "urlopen('http://feeds.feedburner.com/zenhabits').read()". – inspectorG4dget 2011-05-16 20:49:19

Use the following expression:

patFinderTitle.findall(webpage)

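Since re.findall only accepts the regular expression as a string, you can't do the equivalent of re.findall(re.compile(<expression>), <string>), because re.compile(<expression>) returns a compiled regular expression object. So you need to take your compiled regex object, patFinderTitle, and call its findall() method (see above).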

Edit: Oh. It turns out you can do re.findall(re.compile(<expression>), <string>). The more you know.
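The edit is easy to confirm in current Python 3: re.findall accepts either a pattern string or an already compiled pattern object, and calling findall() on the compiled object gives the same list. A minimal sketch, assuming a made-up one-line page (the sample_html string is hypothetical and only stands in for the downloaded text):

```python
import re

# Hypothetical stand-in for the downloaded page text.
sample_html = '<h4 class="itemtitle"><a href="/post-1">First</a></h4>'

# Same pattern as in the question, compiled once.
patFinderTitle = re.compile('<h4 class="itemtitle"><a href=(.*)</a></h4>')

# Module-level function called with a compiled pattern object...
via_module = re.findall(patFinderTitle, sample_html)

# ...and the findall() method on the compiled object itself.
via_method = patFinderTitle.findall(sample_html)

print(via_module)
print(via_method == via_module)
```

Both forms work; the method form just avoids passing the pattern a second time.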


2011-05-16 20:46:31 bluepnume

I tried that, but it gave me this: TypeError: findall() takes at least 2 arguments (1 given) – jerry 2011-05-16 20:48:21

@Jerry, post your new code. – 2011-05-16 20:48:53

Jerry: apparently not. re.compile().findall() definitely needs only one argument at minimum. – bluepnume 2011-05-16 20:51:10
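Jerry's new code was never posted, so this is only a guess: a "takes at least 2 arguments (1 given)" error is what you would expect from calling the module-level re.findall with the page text alone, since that function needs both a pattern and a string, while the compiled pattern's method needs only the string. A small sketch under that assumption (page_text is hypothetical, and Python 3 words the error differently from the Python 2 message above):

```python
import re

# Hypothetical page text.
page_text = '<h4 class="itemtitle"><a href="/a">A</a></h4>'
patFinderTitle = re.compile('<h4 class="itemtitle"><a href=(.*)</a></h4>')

# The module-level function needs both the pattern and the string.
error_message = None
try:
    re.findall(page_text)  # only one argument, so this raises TypeError
except TypeError as exc:
    error_message = str(exc)
    print("re.findall with one argument:", error_message)

# The method is bound to its pattern, so the string alone is enough.
matches = patFinderTitle.findall(page_text)
print(matches)
```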

You left out the parentheses on the read() call, so webpage is a function rather than a string.

webpage = urlopen('http://feeds.feedburner.com/zenhabits').read()
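The difference can be reproduced without a network connection. This sketch assumes an io.StringIO object as a hypothetical stand-in for the urlopen() response; like the real response, it has a read() method. Passing the method object to findall() raises the "expected string" TypeError from the title, while passing the result of read() works:

```python
import io
import re

# Hypothetical stand-in for the urlopen(...) response object.
response = io.StringIO('<h4 class="itemtitle"><a href="/x">X</a></h4>')

pattern = re.compile('<h4 class="itemtitle"><a href=(.*)</a></h4>')

# Without parentheses we only get the bound method, not the page text.
not_called = response.read
error_message = None
try:
    pattern.findall(not_called)  # a method object is not a string
except TypeError as exc:
    error_message = str(exc)
    print("Without parentheses:", error_message)

# Calling read() returns the text, which findall() can search.
webpage = response.read()
matches = pattern.findall(webpage)
print(matches)
```

Note that in Python 3 the real urlopen(...).read() returns bytes, so you would decode it (for example with .decode('utf-8')) before matching it against a str pattern.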


2011-05-16 20:49:34 kefeizhou

Thanks, that fixed it. Unfortunately, now I get this error: Traceback (most recent call last): File "//da-srv1/users/mwseymour/Desktop/cnnfetcher", line 14, in <module> print findPatTitle[i] IndexError: list index out of range – jerry 2011-05-16 20:52:21

@jerry: nothing matches your regex pattern, so findPatTitle is []. – kefeizhou 2011-05-16 21:06:39
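When findall() finds nothing it returns an empty list, and indexing an empty list is exactly what raises the IndexError. A defensive sketch, assuming hypothetical page text that lacks the expected markup: check for an empty result, then iterate over the matches themselves instead of over fixed indices. The question's range(1, 5) also starts at 1, which would skip the first match even when there are matches.

```python
import re

patFinderTitle = re.compile('<h4 class="itemtitle"><a href=(.*)</a></h4>')

# Hypothetical page text that does not contain the expected markup.
webpage = '<item><title>Some post</title></item>'

findPatTitle = patFinderTitle.findall(webpage)
print(findPatTitle)  # empty list: nothing matched

# Indexing findPatTitle[i] here would raise IndexError, so check first.
if not findPatTitle:
    print("No matches; compare the pattern with the actual page source.")

# Slicing never goes out of range, and [:4] keeps index 0.
for title in findPatTitle[:4]:
    print(title)
```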
