2012-06-04

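Python urllib: downloading the contents of an online directory

I want to write a program that opens a directory listing, uses a regular expression to get the names of the PowerPoint files, then creates the files locally and copies their contents. When I run it, it appears to work, but when I actually try to open the files, they keep saying the version is wrong.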

from urllib.request import urlopen 
import re 

urlpath = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/') 
string = urlpath.read().decode('utf-8') 

pattern = re.compile('ch[0-9]*.ppt') #the pattern actually creates duplicates in the list 

filelist = pattern.findall(string) 
print(filelist) 

for filename in filelist: 
    remotefile = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/' + filename) 
    localfile = open(filename,'wb') 
    localfile.write(remotefile.read()) 
    localfile.close() 
    remotefile.close() 

You should **never** parse HTML with a regex, see http://stackoverflow.com/a/1732454/851737. Use an HTML parsing library such as lxml or BeautifulSoup. – schlamar


BeautifulSoup it is. Thanks for the recommendation. – davelupt
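To illustrate the parser-based approach suggested in the comment, here is a minimal sketch that collects the `.ppt` links from a listing page. It uses the standard library's `html.parser` as a stand-in for lxml or BeautifulSoup so that it runs without extra installs. The names `LinkCollector` and `ppt_links` and the sample markup are assumptions made for illustration; the real page's HTML may differ.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper name)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def ppt_links(html):
    """Return the unique .ppt hrefs in document order."""
    collector = LinkCollector()
    collector.feed(html)
    ppt = [href for href in collector.links if href.lower().endswith(".ppt")]
    return list(dict.fromkeys(ppt))  # drop duplicates, keep first-seen order


# Assumed sample listing; not taken from the real page.
sample = (
    '<a href="ch1.ppt">ch1.ppt</a> '
    '<a href="ch2.ppt">ch2.ppt</a> '
    '<a href="../">Parent Directory</a>'
)
print(ppt_links(sample))
```

Because only the `href` attribute is read, the link text that repeats the file name no longer produces duplicate entries.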

Answers


This code works for me. I only modified it slightly, because each of your ppt files was being matched twice.

from urllib2 import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen
import re

urlpath = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/')
string = urlpath.read().decode('utf-8')

# The trailing quote is meant to match only the quoted href value,
# so each file should appear once in the list; the dot is escaped to match a literal "."
pattern = re.compile(r'ch[0-9]*\.ppt"')

filelist = pattern.findall(string)
print(filelist)

for filename in filelist:
    filename = filename[:-1]  # strip the trailing quote captured by the pattern
    remotefile = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/' + filename)
    localfile = open(filename, 'wb')
    localfile.write(remotefile.read())
    localfile.close()
    remotefile.close()

Thank you, you're a champion. – davelupt


See my comment [above](http://stackoverflow.com/questions/10875215/python-urllib-downloading-contents-of-an-online-directory#comment14174956_10875215) for the reason for the downvote. – schlamar


This is amazing, thanks – Anuj
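For Python 3, the download loop from the answer could be sketched roughly as follows. This is an illustration under stated assumptions, not the answerer's code: `download_all` and its `opener` parameter are hypothetical names introduced here, with `opener` defaulting to `urllib.request.urlopen` so it can be swapped out when experimenting offline.

```python
import os
import shutil
from urllib.parse import urljoin
from urllib.request import urlopen


def download_all(base_url, filenames, dest_dir=".", opener=urlopen):
    """Download each name under base_url into dest_dir; return the saved paths.

    `opener` is a hypothetical hook: any callable that takes a URL and
    returns a binary file-like object usable in a `with` statement.
    """
    saved = []
    for name in dict.fromkeys(filenames):  # skip duplicate names, keep order
        url = urljoin(base_url, name)
        path = os.path.join(dest_dir, os.path.basename(name))
        # 'with' closes both handles even if the transfer fails midway
        with opener(url) as remote, open(path, "wb") as local:
            shutil.copyfileobj(remote, local)  # copy in chunks, not all at once
        saved.append(path)
    return saved
```

With the list from the answer, a call might look like `download_all('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/', filelist)`. Removing duplicates inside the function means the trailing-quote trick in the pattern is no longer needed to avoid downloading each file twice.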