2015-01-13 95 views
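import os
from bs4 import BeautifulSoup

# Raw strings, so backslashes in the Windows paths are not treated as escapes
do = dir_with_original_files = r'C:\Users\ADMIN\Desktop\new_folder'
dm = dir_with_modified_files = r'C:\Users\ADMIN\Desktop\new_folder\test'

for root, dirs, files in os.walk(do):
    # dm is inside do: don't descend into it, or the modified files
    # would be picked up and processed a second time
    dirs[:] = [d for d in dirs
               if os.path.normcase(os.path.join(root, d)) != os.path.normcase(dm)]
    for f in files:
        print(f)
        if f.endswith('~'):  # you don't want to process backups
            continue
        original_file = os.path.join(root, f)
        name, ext = os.path.splitext(f)
        mf = name + '_mod' + ext  # You can keep the same name if you omit
                                  # these two lines; the files are in separate
                                  # directories anyway. In that case, mf = f
        modified_file = os.path.join(dm, mf)
        # Binary mode: BeautifulSoup detects the input encoding itself
        with open(original_file, 'rb') as orig_f, \
             open(modified_file, 'wb') as modi_f:
            soup = BeautifulSoup(orig_f.read(), 'html.parser')

            # Wrap the text of each <font> inside <td class="findThisClass"> in <h2>
            for t in soup.find_all('td', class_='findThisClass'):
                for child in t.find_all('font'):
                    if child.string is not None:
                        child.string.wrap(soup.new_tag('h2'))
            # Remove every <table class="tableClass"> from the document
            for t in soup.find_all('table', class_='tableClass'):
                t.extract()
            # This is where you create your new modified file,
            # re-encoded in the encoding the original was read with
            modi_f.write(soup.prettify(encoding=soup.original_encoding or 'utf-8'))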

Select tags with specific attributes in BeautifulSoup/Python

This code finds all <font> tags inside <td class=findThisClass> and wraps the text of those font tags in <h2>.
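
As a minimal, self-contained sketch of what the loop does to a single document, the snippet below applies the same find_all/wrap steps to a small inline HTML string instead of files on disk. The sample markup is made up for illustration, and the sketch assumes bs4 is installed.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for one of the files on disk
html = (
    '<table><tr>'
    '<td class="findThisClass"><font>Keep me</font></td>'
    '<td class="other"><font>Leave me</font></td>'
    '</tr></table>'
)

soup = BeautifulSoup(html, 'html.parser')

# Same steps as the loop above: wrap the text of each <font>
# inside <td class="findThisClass"> in a new <h2> tag
for t in soup.find_all('td', class_='findThisClass'):
    for child in t.find_all('font'):
        if child.string is not None:
            child.string.wrap(soup.new_tag('h2'))

# Only the first cell changes; the <font> in the "other" cell is untouched
print(soup)
```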

What I'd like to do is find all the HTML with this tag:

<font color="#333333" face="Verdana" size="3" style="font-weight: bold; background-color: rgb(255, 255, 255);"> 

What is the best way to do this if:

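(A) I'm sure the font tag will always follow the same form (all attributes in the same order, so a Ctrl+F search for this string would find every match I want):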

<font color="#333333" face="Verdana" size="3" style="font-weight: bold; background-color: rgb(255, 255, 255);"> 

(B) I want it to work even if the attribute order is shuffled around, e.g. matching not only:

<font color="#333333" face="Verdana" size="3" style="font-weight: bold; background-color: rgb(255, 255, 255);"> 

but also:

<font face="Verdana" color="#333333" size="3" style="font-weight: bold; background-color: rgb(255, 255, 255);"> 
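
For the first case, where the opening tag is always character-for-character identical, a plain substring search on the raw HTML (what Ctrl+F does) is enough to count the matches without parsing at all. This minimal sketch assumes the page is already loaded into a string; the sample markup is made up. It also shows the limitation behind the second case: the reordered tag is not found.

```python
# The exact opening tag, written as one string
TARGET = ('<font color="#333333" face="Verdana" size="3" '
          'style="font-weight: bold; background-color: rgb(255, 255, 255);">')

# Made-up document: one tag in the original order, one with reordered attributes
html = (
    '<font color="#333333" face="Verdana" size="3" '
    'style="font-weight: bold; background-color: rgb(255, 255, 255);">one</font>'
    '<font face="Verdana" color="#333333" size="3" '
    'style="font-weight: bold; background-color: rgb(255, 255, 255);">two</font>'
)

# Count exact, non-overlapping occurrences of the opening tag
exact_matches = html.count(TARGET)
print(exact_matches)  # → 1: the reordered tag is not counted
```

A text search like this is fast and needs no parser, but it depends on attribute order, quoting, and spacing being identical, which is exactly what the second case cannot guarantee.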

Thanks a lot.

Answer


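Provide an attrs dictionary with the specific values. BeautifulSoup stores a tag's attributes in a dictionary, so the order in which they appear in the HTML doesn't matter and this works in both cases: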

t.find_all("font", attrs={'face': 'Verdana', 'color': '#333333', 'size': '3', 'style': 'font-weight: bold; background-color: rgb(255, 255, 255);'})
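
To check the answer against both cases from the question, the sketch below parses a made-up snippet containing the tag in its original attribute order, the reordered variant, and a non-matching tag, then applies the same attrs filter (on soup rather than the loop variable t). Attribute order does not matter, but each value is compared as an exact string, so a style attribute with different spacing would be missed. The same_style helper is a hypothetical addition, not part of the answer: BeautifulSoup also accepts a function as an attribute filter, and this one compares style declarations after stripping spaces and ignoring their order.

```python
from bs4 import BeautifulSoup

STYLE = 'font-weight: bold; background-color: rgb(255, 255, 255);'

# Made-up sample: original order, reordered attributes, and a non-match
html = ''.join([
    f'<font color="#333333" face="Verdana" size="3" style="{STYLE}">a</font>',
    f'<font face="Verdana" color="#333333" size="3" style="{STYLE}">b</font>',
    f'<font face="Arial" color="#333333" size="3" style="{STYLE}">c</font>',
])
soup = BeautifulSoup(html, 'html.parser')

# The answer's filter: every listed attribute must equal this exact string
wanted = {'face': 'Verdana', 'color': '#333333', 'size': '3', 'style': STYLE}
matches = soup.find_all('font', attrs=wanted)
print([m.get_text() for m in matches])  # → ['a', 'b']


def same_style(value, target=STYLE):
    """Hypothetical helper: compare CSS declarations, ignoring spaces and order."""
    def decls(s):
        return {d.replace(' ', '') for d in s.split(';') if d.strip()}
    # value is None when the tag has no style attribute at all
    return value is not None and decls(value) == decls(target)


# Same filter, but with a function for style instead of an exact string
loose = dict(wanted, style=same_style)
html2 = ('<font size="3" style="background-color: rgb(255,255,255); font-weight:bold" '
         'face="Verdana" color="#333333">d</font>')
loose_texts = [m.get_text()
               for m in BeautifulSoup(html2, 'html.parser').find_all('font', attrs=loose)]
print(loose_texts)  # → ['d']
```

The exact-string filter is enough when the pages come from one generator that always writes the style the same way; the function filter is only worth it if the spacing or declaration order inside style varies.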