python crawler extract url not working

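I am trying to write a simple Python program that uses the imdb package to pull movie information from their database, but I can't figure out why the code returns an empty list. My guess is that the way I extract the URL from the page (using `(.*?)`) is wrong. I want to extract one specific URL from the web page. Here is the code. Thanks!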

import re
import urllib.request

import imdb  # the imdb package (IMDbPY)

imdb_access = imdb.IMDb()

top_num = 5

movie_list = ["The Matrix", "The Matrix", "The Matrix", "The Matrix", "The Matrix"]

# a pattern copied from the page source; the alt/title text is specific to this movie
pattern = re.compile(
    '<img alt="The Matrix Poster" title="The Matrix Poster" '
    'src="(.*?)" itemprop="image">'
)

for title in movie_list[:top_num]:
    contain = imdb_access.search_movie(title)

    movie_id = contain[0].movieID  # str, e.g. "0133093"

    # build the title page URL from the ID instead of hard-coding tt0133093
    url = "http://www.imdb.com/title/tt" + movie_id + "/"
    htmltext = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")

    result = pattern.findall(htmltext)  # list of captured src values
    print(result)
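Spelling out the whole tag literally breaks as soon as the page's attribute order, spacing or line breaks differ from the pattern, and `findall` then quietly returns `[]`. A minimal sketch of an alternative, assuming Python 3 and only the standard-library `html.parser` module: read the attributes of each `<img>` and keep the one whose alt text matches. The class, function and sample HTML are hypothetical, and IMDb's real markup may no longer contain this tag at all.

```python
from html.parser import HTMLParser


class PosterSrcFinder(HTMLParser):
    """Collect the src of every <img> whose alt text equals a given string."""

    def __init__(self, alt_text):
        super().__init__()
        self.alt_text = alt_text
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in whatever order the page uses
        if tag != "img":
            return
        attr_map = dict(attrs)
        if attr_map.get("alt") == self.alt_text and "src" in attr_map:
            self.sources.append(attr_map["src"])


def find_poster_urls(html_text, alt_text):
    finder = PosterSrcFinder(alt_text)
    finder.feed(html_text)
    finder.close()
    return finder.sources


# Hypothetical page fragment: attribute order and line breaks differ from the
# literal tag in the question's regex.
sample_html = """<div class="poster">
<img itemprop="image"
     src="https://example.com/matrix-poster.jpg"
     title="The Matrix Poster" alt="The Matrix Poster">
<img alt="Some other image" src="https://example.com/other.jpg">
</div>"""

print(find_poster_urls(sample_html, "The Matrix Poster"))
```

Because the parser compares attribute values rather than raw text, the order of `alt`, `title` and `src` inside the tag no longer matters.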


2016-03-23 781850685

I think the problem is with newlines; you could try (.*\n*.*?)
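The newline idea comes from how `.` behaves in Python's `re` module: by default it matches any character except `\n`, so a `.*?` cannot cross a line break. A small demonstration on a hypothetical tag split over two lines, comparing the default with the `re.DOTALL` flag, which lets `.` match newlines as well:

```python
import re

# Hypothetical tag with a line break between the alt and src attributes
html = '<img alt="Poster"\n     src="a.jpg">'

pattern = r'<img alt="Poster".*?src="(.*?)"'

# Without DOTALL, ".*?" cannot step over the "\n", so nothing is found.
print(re.findall(pattern, html))

# With DOTALL, "." also matches "\n" and the src value is captured.
print(re.findall(pattern, html, re.DOTALL))
```

This only helps when a line break really sits inside the matched span; if the page text differs in some other way, the result stays empty.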


2016-03-23 04:08:20

Thanks, but it still gives me the same result. – 781850685

You could try this regex: '' –

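Hi, thanks. That does return an image URL, but it's the wrong image on the site. I'm looking for the URL inside one specific line of the page. Can I put "The Matrix Poster" in front of it? – 781850685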
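Anchoring on the alt text is one way to single out the right image. A minimal sketch, assuming (as in the question's pattern) that `alt` comes before `src` inside the tag; `poster_src` and the sample HTML are hypothetical. `re.escape` keeps characters in the title from being read as regex syntax, and `[^>]*?` keeps the match inside a single tag while still allowing line breaks, since a negated character class matches `\n`.

```python
import re


def poster_src(html_text, movie_title):
    """Return src values of <img> tags whose alt is '<movie_title> Poster'.

    Assumes alt appears before src inside the tag, as in the question's pattern.
    """
    alt = re.escape(movie_title + " Poster")
    # [^>]*? cannot run past the end of the tag, but can span newlines.
    pattern = r'<img\b[^>]*?\balt="' + alt + r'"[^>]*?\bsrc="([^"]*)"'
    return re.findall(pattern, html_text)


sample_html = (
    '<img alt="Logo" src="https://example.com/logo.png">\n'
    '<img alt="The Matrix Poster" title="The Matrix Poster"\n'
    '     src="https://example.com/matrix.jpg" itemprop="image">'
)

print(poster_src(sample_html, "The Matrix"))
```

The other `<img>` on the page is skipped because its alt text differs, which addresses the "wrong image" problem without depending on `title` or `itemprop`.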
