2016-12-06

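Iterating over and working with the HTML files in a directory - python

I need to iterate over the .html files in a given directory and extract data from them. This is my code so far. How would I go about accessing the script inside each file?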

import os

directory = '/Users/xxxxx/Documents/sample/'
for filename in os.listdir(directory):
    # Only .html files are of interest; everything else is skipped.
    if filename.endswith('.html'):
        print(os.path.join(directory, filename))

(OS: Mac / Python 3.x)

Answer


You can do something like this:

import os
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

directory = '/Users/xxxxx/Documents/sample/'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        # Build the full path, since os.listdir returns bare file names.
        fname = os.path.join(directory, filename)
        with open(fname, 'r') as f:
            soup = BeautifulSoup(f.read(), 'html.parser')
            # parse the html as you wish
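
Since the question asks specifically about getting at the script inside each file, here is a minimal sketch of one way to do that using only the standard library, so it does not assume bs4 is installed. The names `ScriptCollector` and `scripts_by_file` are hypothetical helpers invented for this illustration, and reading the files as UTF-8 is an assumption about your data. With BeautifulSoup, the rough equivalent would be calling `soup.find_all('script')` inside the loop above.

```python
from html.parser import HTMLParser
from pathlib import Path


class ScriptCollector(HTMLParser):
    """Hypothetical helper: collects the text inside every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # one string per <script> element, in document order
        self._in_script = False  # True while the parser is inside a <script> element

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append rather than assign.
        if self._in_script:
            self.scripts[-1] += data


def scripts_by_file(directory):
    """Map each .html file name in `directory` to its list of script bodies."""
    result = {}
    for path in sorted(Path(directory).glob("*.html")):
        collector = ScriptCollector()
        # Assumption: the files are UTF-8 encoded.
        collector.feed(path.read_text(encoding="utf-8"))
        collector.close()
        result[path.name] = collector.scripts
    return result
```

Calling `scripts_by_file('/Users/xxxxx/Documents/sample/')` would return a dictionary keyed by file name. A `<script src="...">` tag with no inline code shows up as an empty string in that file's list.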