Converting a Python web crawler from 2.7 to 3.4

I am converting a working Python web crawler from 2.7 to 3.4. I have made some changes, but I still get an error when I run it:

Traceback (most recent call last): 
    File "Z:\testCrawler.py", line 11, in <module> 
    for i in re.findall('''href=["'](.[^"']+)["']''', urllib.request.urlopen(myurl).read(), re.I): 
    File "C:\Python34\lib\re.py", line 206, in findall 
    return _compile(pattern, flags).findall(string) 
TypeError: can't use a string pattern on a bytes-like object

Here is the code itself. Please take a look and tell me what the syntax error is.

#! C:\python34 

import re 
import urllib.request 

textfile = open('depth_1.txt','wt') 
print ("Enter the URL you wish to crawl..") 
print ('Usage - "http://phocks.org/stumble/creepy/" <-- With the double quotes') 
myurl = input("@> ") 
for i in re.findall('''href=["'](.[^"']+)["']''', urllib.request.urlopen(myurl).read(), re.I): 
     print (i) 
     for ee in re.findall('''href=["'](.[^"']+)["']''', urllib.request.urlopen(i).read(), re.I): 
       print (ee) 
       textfile.write(ee+'\n') 
textfile.close()

Source

2014-09-19 user2167980

You need to decode the response from 'read' into a 'str'. – roippi 2014-09-19 17:39:10

That said, please use an HTML parser to parse HTML, not regular expressions. – roippi 2014-09-19 17:39:42
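As a rough illustration of the parser suggestion in the comment above, here is a minimal sketch using the standard library's html.parser module. The names LinkCollector and extract_links are hypothetical, chosen only for this example, and the sketch collects href values from <a> tags only, which is narrower than the regex in the question (that pattern matches any href attribute).

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper for this sketch)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href values of all <a> tags found in an already-decoded str."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


sample = '<p><a href="http://example.com/a">A</a> <A HREF=\'/b\'>B</A></p>'
print(extract_links(sample))
```

The parser expects a str, so the bytes returned by .read() still have to be decoded before being passed to extract_links.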

Change

urllib.request.urlopen(myurl).read()

to, for example,

urllib.request.urlopen(myurl).read().decode('utf-8')

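What happens here is that in Python 3 .read() returns bytes, not str as it did in Python 2.7, so the result has to be decoded with some encoding before a str regex pattern can be applied to it.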
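The 'utf-8' above is a guess about the page's encoding. A slightly more defensive sketch might take the charset the server declares and fall back to UTF-8 otherwise. The helper names decode_body and find_links are hypothetical, and the UTF-8 fallback with errors="replace" is an assumption made for this example, not something the original code does.

```python
import re

# Same pattern as in the question, compiled once.
HREF_RE = re.compile(r'''href=["'](.[^"']+)["']''', re.I)


def decode_body(raw, charset=None):
    """Decode response bytes to str.

    charset is whatever the server declared, if anything. Falling back to
    UTF-8 is an assumption; undecodable bytes are replaced rather than raising.
    """
    return raw.decode(charset or "utf-8", errors="replace")


def find_links(raw, charset=None):
    """Apply the str pattern to the decoded text instead of to raw bytes."""
    return HREF_RE.findall(decode_body(raw, charset))


# With a live response, the declared charset could be passed along:
#   resp = urllib.request.urlopen(myurl)
#   links = find_links(resp.read(), resp.headers.get_content_charset())
sample = b'<a href="http://example.com/a">A</a> <A HREF=\'/b\'>B</A>'
print(find_links(sample))
```

The same change is needed on the inner urlopen(i).read() call in the nested loop, since it also returns bytes.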

Source

2014-09-19 17:39:17
