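Python urllib: downloading the contents of an online directory
I want to write a program that opens a directory listing, uses a regular expression to get the names of the PowerPoint files, then creates the files locally and copies their contents into them. When I run it, it appears to work, but when I actually try to open the files, they keep saying the version is wrong.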
from urllib.request import urlopen
import re
urlpath = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/')
string = urlpath.read().decode('utf-8')
pattern = re.compile('ch[0-9]*.ppt') #the pattern actually creates duplicates in the list
filelist = pattern.findall(string)
print(filelist)
for filename in filelist:
    remotefile = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/' + filename)
    localfile = open(filename,'wb')
    localfile.write(remotefile.read())
    localfile.close()
    remotefile.close()
You should **never** parse HTML with a regex, see http://stackoverflow.com/a/1732454/851737. Use an HTML parsing library such as lxml or BeautifulSoup. – schlamar
BeautifulSoup it is. Thanks for the recommendation. – davelupt
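To illustrate the suggestion in the comments, here is a minimal sketch of collecting the `.ppt` links with an HTML parser instead of a regex. It uses the standard library's `html.parser` rather than lxml or BeautifulSoup, only so that it runs without extra installs. The names `PptLinkParser`, `extract_ppt_urls`, the sample HTML, and the `example.com` URL are hypothetical and not from the original post. It also assumes the directory listing exposes the files as ordinary `<a href="...">` links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PptLinkParser(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags ending in .ppt."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Keep each .ppt link once, in the order it first appears.
            if name == "href" and value and value.lower().endswith(".ppt"):
                if value not in self.links:
                    self.links.append(value)


def extract_ppt_urls(html, base_url):
    """Return absolute URLs for every distinct .ppt link found in html."""
    parser = PptLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


# Made-up listing for illustration; a real page would come from urlopen(...).read().
sample_html = (
    '<a href="ch01.ppt">ch01.ppt</a> '
    '<a href="ch01.ppt">ch01.ppt (again)</a> '
    '<a href="notes.txt">notes.txt</a> '
    '<a href="ch02.ppt">ch02.ppt</a>'
)
for url in extract_ppt_urls(sample_html, "http://example.com/presentation/"):
    print(url)
```

Because only `href` attributes are inspected, a file name that appears both in the link target and in the link text is counted once, which avoids the duplicates the regex approach produces.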