2013-11-20 87 views
0

Removing invalid symbols in Python

I am trying to remove invalid symbols from a text. I have this code:

def parse_documentation(filename):
    filename=open(filename)
    invalidsymbols=["`","~","!", "@","#","$"]
    for lines in filename:
        print(lines)
        for word in lines:
            print(word)
            for letter in word:
                if invalidsymbols==letter:
                    print(letter)

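For now I am only testing by printing the letters; later I will add the code that deletes them (del()). I have more invalid symbols than the ones in the list, but there are a lot of them, so I wanted to check with 5 or 6 first. The problem is that it prints not only the invalid symbols but every letter in my text. Also, for some reason, it prints extra characters before my text. How do I fix this?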

The text I am using is:

he's a jolly good fellow# 
I want pizza! 
I'm driving to school$ 
+1

That's not how 'for' works with strings.

+0

@IgnacioVazquez-Abrams How should I access the letters in each line? – user2976821

+0

Maybe you should double-check what 'for' is actually doing.
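
To illustrate the point made in the comments: a for loop over a string yields one character at a time, so the inner loops in the question walk over characters rather than words, and comparing the whole list to a single character with == is always False. Below is a minimal sketch, assuming the intent was a membership test. The helper name find_invalid is made up for illustration and is not part of the original code.

```python
invalidsymbols = ["`", "~", "!", "@", "#", "$"]

def find_invalid(line):
    """Return the invalid symbols found in line, in order of appearance."""
    found = []
    for letter in line:  # a for loop over a string yields single characters
        # `invalidsymbols == letter` compares a list to a str and is always False;
        # `in` tests whether the character is one of the listed symbols.
        if letter in invalidsymbols:
            found.append(letter)
    return found

print(find_invalid("I want pizza!"))
```

Iterating over the open file already gives one line per step, so a single loop over the characters of each line is enough.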

Answers

0
def parse_documentation(filename):
    invalidsymbols = ["`", "~", "!", "@", "#", "$"]
    with open(filename, "r") as infile:  # open the file; it is closed automatically
        lines = infile.readlines()  # read all the lines in the file into a list named "lines"
    for line in lines:  # for each line in lines
        for x in invalidsymbols:  # loop through the list of invalid symbols
            if x in line:  # if the invalid symbol is in the line
                print(line)  # print out the line
                print(x)  # and also print the invalid symbol found in that line
                print(line.replace(x, ""))  # print the line with that symbol removed

How about this?

+1

Haha! Thanks for that – JoeC

+1

That did the trick. Thanks! – user2976821

+1

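You're welcome :) Note that you are still printing the line every time an invalid symbol is encountered (if more than one symbol occurs in it).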

0

JoeC has already answered, but I would add that if invalid symbols occur more than once in a line, you may be better off doing the following:

def parse_documentation(filename):
    filelines = open(filename)
    invalidsymbols=["`","~","!", "@","#","$"]
    for line in filelines:
        print(line)  # print each line once, before checking it for symbols
        for symbol in invalidsymbols:
            if symbol in line:
                print("Above line contains %s symbol"%symbol)

As for replacing the symbols, see JoeC's answer.
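
The detection and the replacement can also be combined so that each line is handled once, however many symbols it contains. This is only a sketch under that assumption. The helper name clean_line and the sample string are hypothetical and do not come from either answer.

```python
invalidsymbols = ["`", "~", "!", "@", "#", "$"]

def clean_line(line):
    """Return (cleaned_line, symbols_found) for a single line of text."""
    # Collect the symbols present, in the order they are listed in invalidsymbols.
    found = [s for s in invalidsymbols if s in line]
    for s in found:
        line = line.replace(s, "")  # str.replace removes every occurrence of s
    return line, found

cleaned, found = clean_line("I want pizza!! #now")
print(cleaned)
print(found)
```

Because str.replace returns a new string, the result has to be assigned back; the original line is never modified in place.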

3

You can use str.translate to remove all the unwanted symbols at once:

>>> txt = """he's a jolly good fellow# 
... I want pizza! 
... I'm driving to school$""" 
>>> print txt.translate(None, "`~!@#$")
he's a jolly good fellow 
I want pizza 
I'm driving to school 
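
The session above uses Python 2 syntax (the print statement and the two-argument form translate(None, deletechars)). In Python 3, str.translate takes a single table argument, so the deletion set is usually built with str.maketrans. A minimal sketch of the same example, assuming Python 3:

```python
txt = """he's a jolly good fellow#
I want pizza!
I'm driving to school$"""

# With three arguments, str.maketrans maps every character in the third
# argument to None, which makes translate() delete it.
table = str.maketrans("", "", "`~!@#$")
print(txt.translate(table))
```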

so your code could look something like this:

def parse_documentation(filename, invalid_symbols):
    symb_to_remove = ''.join(invalid_symbols)
    with open(filename, 'rb') as in_file:
        for line in in_file:
            # Python 2 form: translate(None, chars) deletes every character in chars
            safe_line = line.translate(None, symb_to_remove)
            # ... do something with safe_line here

and you would call this function with

parse_documentation(filename, ["`","~","!", "@","#","$"])
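
For readers on Python 3, here is a hedged sketch of the same function. The assumptions, none of which come from the answer: the file is opened in text mode, the utf-8 encoding is a guess, and the function yields the cleaned lines because the original leaves that step open. The temporary file exists only to make the demo self-contained.

```python
import os
import tempfile

def parse_documentation(filename, invalid_symbols):
    """Yield each line of the file with the invalid symbols removed."""
    table = str.maketrans("", "", "".join(invalid_symbols))
    # Text mode with utf-8 is an assumption; adjust to the real file encoding.
    with open(filename, "r", encoding="utf-8") as in_file:
        for line in in_file:
            yield line.translate(table)

# Demo on a temporary file holding the sample text from the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("he's a jolly good fellow#\nI want pizza!\nI'm driving to school$\n")
    path = tmp.name

cleaned = list(parse_documentation(path, ["`", "~", "!", "@", "#", "$"]))
os.remove(path)
print("".join(cleaned), end="")
```

Yielding the lines keeps the function lazy, so a large file is never held in memory all at once.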