2017-07-06

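Python - Saving changes with BeautifulSoup

I am using BeautifulSoup to parse an HTML file and check whether the text is uppercase; if it is, I change it to lowercase. When I save the output to a new HTML file, the changes are not reflected. Can someone point out what I am doing wrong?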

def recursiveChildren(x):
    if "childGenerator" in dir(x):
        for child in x.childGenerator():
            name = getattr(child, "name", None)
            if name is not None:
                print(child.name)
            recursiveChildren(child)
    else:
        if not x.isspace():
            print (x)
            if(x.isupper()):
                x.string = x.lower()
                x=x.replace(x,x.string)

if __name__ == "__main__":
    with open("\path\") as fp:
        soup = BeautifulSoup(fp)
    for child in soup.childGenerator():
        recursiveChildren(child)
    html = soup.prettify("utf-8")
    with open("\path\") as file:
        file.write(html)

Answer

I don't think your approach will cope with markup like this:

<p>TEXT<span>More Text<i>TEXT</i>TEXT</span>TEXT</p> 

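Also, the method you want is replaceWith() (spelled replace_with() in current bs4 releases), not replace(). And you have not opened the output file for writing.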
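To make the replace() versus replace_with() point concrete, here is a minimal sketch. It assumes bs4 is installed; the one-tag document and the variable names (`detached`, `after_replace`, `after_replace_with`) are made up for illustration. A text node is a NavigableString, which subclasses str, so calling replace() on it only builds a new, detached string, while replace_with() swaps the node inside the parse tree.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document, used only for this illustration.
soup = BeautifulSoup("<p>TEXT</p>", "html.parser")
node = soup.p.string  # a NavigableString (a str subclass)

# str.replace returns a new plain string; the tree is not touched,
# and rebinding a local name to the result changes nothing in the soup.
detached = node.replace(node, node.lower())
after_replace = str(soup)

# replace_with swaps the node in the tree itself.
node.replace_with(node.lower())
after_replace_with = str(soup)

print(after_replace)       # <p>TEXT</p>
print(after_replace_with)  # <p>text</p>
```

The first print still shows the uppercase text, which matches the symptom in the question: the saved file looks unchanged because the tree was never modified.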

Here is how I would do it.

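from bs4 import BeautifulSoup

filename = "test.html"

if __name__ == "__main__":
    # Open the file and parse it. The whole file is read here,
    # so the tree can be used after the with block closes it.
    with open(filename, "r") as fp:
        soup = BeautifulSoup(fp, "html.parser")  # Or BeautifulSoup(fp, "lxml")

    # Iterate over all the text nodes found in the document.
    # (findAll(text=True) and replaceWith() are the older spellings of these calls.)
    for txt in soup.find_all(string=True):
        # True if all the cased characters (letters) of the string are uppercase.
        if txt.isupper():
            # Replace the node in the tree with its lowercase version.
            txt.replace_with(txt.lower())

    # Write the file. prettify("utf-8") returns bytes, hence mode "wb".
    with open(filename, "wb") as file:
        file.write(soup.prettify("utf-8"))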
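The same loop can be tried on the nested markup from earlier in this answer without touching any files. This is only a sketch: the helper name `lowercase_upper_text` is hypothetical, and it returns `str(soup)` rather than the prettified bytes so the result is easy to compare by eye.

```python
from bs4 import BeautifulSoup


def lowercase_upper_text(markup):
    """Return markup with every all-uppercase text node lowercased."""
    soup = BeautifulSoup(markup, "html.parser")
    # find_all returns a list of the matching nodes, so replacing
    # nodes in the tree while looping over it is safe.
    for txt in soup.find_all(string=True):
        if txt.isupper():
            txt.replace_with(txt.lower())
    return str(soup)


sample = "<p>TEXT<span>More Text<i>TEXT</i>TEXT</span>TEXT</p>"
result = lowercase_upper_text(sample)
print(result)  # <p>text<span>More Text<i>text</i>text</span>text</p>
```

The mixed-case node "More Text" is left alone because isupper() is only true when every cased character is uppercase; the four "TEXT" nodes, at three different nesting levels, are all lowercased.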