Finding links in a file keeps repeating the same link

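I'm fairly new to Python, but I have taken a high-school-level Java class. I'm trying to write a Python script that pulls all of the torrent links out of my Humble Bundle download pages and puts them into a .txt file. Right now I'm just trying to get it to read all of them and print them, but I can't get it to look past the first one. I've tried a few different loops; some print the link once, and others print the same link over and over. Here is my code.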

f = open("Humble Bundle.htm").read() 

pos = f.find('torrents.humblebundle.com') #just to initialize it for the loop 
end = f.find('.torrent') #same here 

pos1 = f.find('torrents.humblebundle.com') #first time it appears 
end1 = f.rfind('.torrent') #last time it appears 
while pos >= pos1 and end <= end1: 
    pos = f.find('torrents.humblebundle.com') 
    end = f.find('.torrent') 
    link = f[pos:end+8]#the link in String form 
    print(link)

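I'd like help with the current problem and with how to get from here to the final script. This is my first post here, but I researched what I could before giving up and asking for help. Thank you for your time.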

Source

2013-09-21 fredbob3

Parse the HTML with an HTML parser such as BeautifulSoup or lxml. Treating HTML as a plain string makes this much more complicated than it needs to be. – Blender

Even though I have an answer to your question, I agree with @Blender. Your solution is quick and dirty; look into a better implementation. – laltin
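To illustrate the parser suggestion in the comments above, here is a minimal sketch. It uses the standard library's html.parser rather than BeautifulSoup or lxml so that it runs without installing anything; that swap is an assumption of this sketch, not something the commenters proposed. It also assumes Python 3 and that the links sit in the href attributes of <a> tags, which has not been checked against a real Humble Bundle page. TorrentLinkParser, extract_torrent_links and the sample markup are hypothetical names and data.

```python
from html.parser import HTMLParser


class TorrentLinkParser(HTMLParser):
    """Collects href values of <a> tags that point at the torrent host."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without a value
            if name == "href" and value and "torrents.humblebundle.com" in value:
                self.links.append(value)


def extract_torrent_links(html):
    """Return the matching href values in document order."""
    parser = TorrentLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample markup, not copied from a real Humble Bundle page
sample = (
    '<a href="http://torrents.humblebundle.com/game_one.torrent">one</a>'
    '<a href="http://example.com/about.html">about</a>'
    '<a href="http://torrents.humblebundle.com/game_two.torrent">two</a>'
)
print(extract_torrent_links(sample))
```

Because the parser hands over each attribute value whole, there is no need to work out where a link starts and ends by hand.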

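You can find details of the find method at http://docs.python.org/2/library/string.html#string.find

The problem is that every time these two lines run, they return the same values for pos and end, because the function always gets the same arguments.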

pos = f.find('torrents.humblebundle.com') 
end = f.find('.torrent')

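The find method has another optional parameter, named start, which tells the function where to begin searching for the given string. So, if you change your code to: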

pos = f.find('torrents.humblebundle.com', pos+1) 
end = f.find('.torrent', end+1)

it should work.
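For reference, here is one way the whole loop could look once the start argument is used. This is only a sketch, not the asker's final script: find_links and the sample string are hypothetical, and the loop is restructured so that it stops when find returns -1 (its "not found" value) and so that each '.torrent' is searched for after its own prefix.

```python
def find_links(text, prefix="torrents.humblebundle.com", suffix=".torrent"):
    """Return every substring running from prefix through the next suffix."""
    links = []
    pos = text.find(prefix)  # first occurrence, or -1 if there is none
    while pos != -1:
        # Look for the suffix only after the prefix that was just found
        end = text.find(suffix, pos + len(prefix))
        if end == -1:
            break  # prefix without a closing suffix: nothing more to collect
        links.append(text[pos:end + len(suffix)])
        # Resume the search just past the link that was collected
        pos = text.find(prefix, end + len(suffix))
    return links


# Made-up sample text with two links
sample = (
    'x href="http://torrents.humblebundle.com/a.torrent" y '
    'href="http://torrents.humblebundle.com/b.torrent" z'
)
print(find_links(sample))
```

Starting each search from the end of the previous link means the first link is not skipped and no link is reported twice.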

Source

2013-09-21 20:42:44 laltin

That worked. Thank you very much. I can't upvote yet, but I would. – fredbob3

You can try a regular expression here:

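import re

# Read the whole saved page into one string
with open('Humble Bundle.htm') as page:
    f = page.read()

# Non-greedy .*? stops at the first '.torrent' after each host name
pattern = re.compile(r'torrents\.humblebundle\.com.*?\.torrent')
print(pattern.findall(f))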
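A note on greedy versus non-greedy matching with patterns of this kind: a plain `.*` grabs as much of a line as it can, so two links on the same line come back as one long match, while `.*?` stops at the first '.torrent'. The sketch below shows the difference on a made-up sample string and then writes the links to a text file, one per line, since that was the stated goal. The file name links.txt and the temporary directory are hypothetical choices for the demo.

```python
import os
import re
import tempfile

# Made-up sample: two links on a single line
sample = (
    '<a href="http://torrents.humblebundle.com/a.torrent">a</a> '
    '<a href="http://torrents.humblebundle.com/b.torrent">b</a>'
)

# Greedy: runs from the first host name to the last '.torrent' on the line
greedy = re.findall(r'torrents\.humblebundle\.com.*\.torrent', sample)
# Non-greedy: each match ends at the nearest '.torrent'
lazy = re.findall(r'torrents\.humblebundle\.com.*?\.torrent', sample)

print(len(greedy), len(lazy))
print(lazy)

# Write one link per line; the output location is only for this demo
out_path = os.path.join(tempfile.mkdtemp(), "links.txt")
with open(out_path, "w") as out:
    out.write("\n".join(lazy) + "\n")
```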

Source

2013-09-21 20:53:50 pkacprzak
