Removing formatting from strings

I want to parse some data from the web with BeautifulSoup. So far, I have gotten the data I need out of the table using the following code:

import urllib
from bs4 import BeautifulSoup  # assumes BeautifulSoup 4, which provides find_all

def webParsing(canvas):
    url = 'http://www.cmu.edu/dining/hours/index.html'
    try:
        page = urllib.urlopen(url)
    except IOError:
        # urllib.urlopen raises IOError when the page cannot be reached
        print 'Error while opening html file. Please ensure that you',
        print ' have a working internet connection.'
        return
    sourceCode = page.read()
    soup = BeautifulSoup(sourceCode)
    #heading=soup.html.body.div
    tableData = soup.table.tbody
    parseTable(canvas, tableData)

def parseTable(canvas, tableData):
    # Map each location (first column) to its opening hours (second column)
    canvas.data.hoursOfOperation = dict()
    rowTag = 'tr'
    colTag = 'td'
    for row in tableData.find_all(rowTag):
        row_text = []
        for item in row.find_all(colTag):
            text = item.text.strip()
            row_text.append(text)
        (locations, hoursOpen) = (row_text[0], row_text[1])
        # One cell may list several comma-separated locations
        locations = locations.split(',')
        for location in locations:
            canvas.data.hoursOfOperation[location] = hoursOpen
    print canvas.data.hoursOfOperation

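As you can see, the items in the first column are mapped to those in the second column using a dictionary. The data is almost exactly what I want when I print it, but in Python the strings contain a lot of formatting, such as '\n', '\xe9', or '\n\xa0'. Is there any way to remove all of the formatting? In other words, remove all newline characters, anything that represents a specific encoding, and anything that represents accented characters, and just get the string literal? I don't need the most efficient or safest method; I'm a beginner programmer, so the simplest method would be greatly appreciated. Thanks!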


2013-11-25 user3029704

Here's a trick: you can encode the string to ASCII and drop everything else:

>>> 'abc\xe9'.encode('ascii', errors='ignore') 
b'abc'
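
As a rough sketch of how that trick behaves, assuming Python 3 (the question's own code uses Python 2 print statements): `encode` returns a `bytes` object, so a `decode` is needed to get a `str` back, and ASCII control characters such as `\n` survive the round trip. The name `to_ascii` and the sample string are made up for illustration.

```python
def to_ascii(text):
    """Drop every non-ASCII character from text and return a str."""
    # encode() yields bytes; decode() turns them back into a str
    return text.encode('ascii', errors='ignore').decode('ascii')

sample = 'Caf\xe9\n\xa0open'
cleaned = to_ascii(sample)
print(repr(cleaned))  # 'Caf\nopen' (the newline is still there)
```

Since the newline is plain ASCII, this step alone does not remove it.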

Edit:

Ah, I forgot that you don't want the standard special characters either. Use this instead:

''.join(s for s in string if ord(s) > 31 and ord(s) < 127)
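
A minimal sketch of applying that kind of filter to a dictionary shaped like `hoursOfOperation`, assuming Python 3. The names `keep_printable` and `clean_dict` and the sample entry are hypothetical, not taken from the actual page. Because `\n` is code point 10 and `\xa0` is code point 160, both fall outside the kept range and are dropped.

```python
def keep_printable(text):
    """Keep only printable ASCII characters (code points 32 to 126)."""
    return ''.join(ch for ch in text if 31 < ord(ch) < 127)

def clean_dict(mapping):
    """Return a new dict with both keys and values filtered."""
    return {keep_printable(k): keep_printable(v) for k, v in mapping.items()}

# Made-up sample data, not scraped from the real page
hours = {'Caf\xe9 Example\n': 'Mon-Fri:\n\xa08 am - 5 pm'}
print(clean_dict(hours))  # {'Caf Example': 'Mon-Fri:8 am - 5 pm'}
```

Note that a removed newline leaves nothing behind, so text on either side of it ends up joined together.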

Hope this helps!


2013-11-25 02:06:16 aIKid

Does this remove newline characters? – ton1c

From this question, you could try something like this:

def removeNonAscii(s): return "".join(i for i in s if ord(i) < 127 and ord(i) > 31)
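
If dropping accented letters entirely turns out to be too lossy, one alternative (a different technique from the filter above, sketched here assuming Python 3) is to decompose the characters with the standard `unicodedata` module first, so that `é` becomes `e` plus a combining accent, and only then discard the non-ASCII leftovers. The helper name `strip_accents` and the sample string are hypothetical.

```python
import unicodedata

def strip_accents(text):
    """Replace accented letters with their base letters and tidy whitespace."""
    # NFKD splits 'é' into 'e' + combining accent and maps '\xa0' to a space
    decomposed = unicodedata.normalize('NFKD', text)
    ascii_only = decomposed.encode('ascii', errors='ignore').decode('ascii')
    # split() followed by join() collapses newlines and runs of whitespace
    return ' '.join(ascii_only.split())

print(strip_accents('Caf\xe9\n\xa0Example'))  # Cafe Example
```

This keeps words readable and separated by single spaces, at the cost of one extra import.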


2013-11-25 02:06:00 ton1c
