Detecting tab characters in Python

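I am trying to separate words from integers in a particular file. The lines of the file take the following forms. Lines containing a word have no "\t" character, but lines containing an integer (all positive) do. Some of the words are themselves digits with a leading "-" character: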

-1234 
\t22 
\t44 
\t46 
absv 
\t1 
\t2 
\t4 
...

So my idea was to tell words and numbers apart by trying to convert each token to a float.

def is_number(s):
    # Return True if s can be parsed as a float, False otherwise.
    try:
        float(s)
        return True
    except ValueError:
        return False

import codecs

with codecs.open("/media/New Volume/3rd_step.txt", 'Ur') as file:  # open file
    for line in file:  # read line by line
        temp_buffer = line.split()  # split elements (this discards the tabs)
        for word in temp_buffer:
            if not ('-' in word or not is_number(word)):
                ....

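So if the token is a word I get the exception, and if not, it is a number. The file is 50 GB, and somewhere in the middle its format seems to be broken. So the only reliable way to separate words from numbers is to use the \t character. But how can I detect it? I split the line to get the strings, and in doing so I lose the special characters.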

Edit:

I'm sorry for being such a newbie and wasting your time. It seems I can do this much more easily:

with codecs.open("/media/D60A6CE00A6CBEDD/InvertedIndex/1.txt", 'Ur') as file:  # open file
    for line in file:  # read line by line
        if '\t' not in line:  # no tab, so this line holds a word
            print(line)
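The same tab check can also drive the word/number split directly, before anything is stripped. A minimal sketch, assuming (as described above) that number lines start with a tab and word lines do not; the function name `classify_lines` and the sample data are hypothetical, and the code is written for Python 3:

```python
def classify_lines(lines):
    """Yield (kind, value) pairs, where kind is 'number' or 'word'.

    Assumption: a line that starts with a tab holds a number; any other
    non-empty line holds a word, even if it looks like "-1234".
    """
    for line in lines:
        text = line.rstrip("\r\n")  # drop only the line ending, keep the tab
        if not text.strip():
            continue  # skip blank lines
        if text.startswith("\t"):
            yield ("number", text.strip())
        else:
            yield ("word", text.strip())


# Hypothetical sample in the same shape as the file excerpt.
sample = ["-1234 \n", "\t22 \n", "\t44 \n", "absv \n", "\t1 \n"]
result = list(classify_lines(sample))
```

Because the generator consumes its input lazily, an open file object can be passed in place of `sample`, so a very large file is still read one line at a time.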


2014-07-09 bill

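You should try passing an argument to split() instead of relying on the default, which splits on all whitespace characters. You can make the initial split on every whitespace character except \t. Try this: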

import string

white_str = list(string.whitespace)  # string.whitespace contains all whitespace.
white_str.remove("\t")               # Remove \t
white_str = ''.join(white_str)       # New whitespace string, without \t

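Then, instead of a bare split(), split on the characters in white_str. Bear in mind that split(white_str) treats the whole string as a single separator, so splitting on any one of these characters needs something like re.split with a character class built from white_str. This splits your line on every whitespace character except \t, giving you your strings with the tabs intact. You can then detect \t later as you need.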
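A minimal sketch of that idea in Python 3, using a regular-expression character class built from the whitespace string. The helper name `split_keep_tabs` is hypothetical, and the sample line is made up to match the file excerpt in the question:

```python
import re
import string

# All whitespace characters except the tab.
white_str = string.whitespace.replace("\t", "")

# A character class matches any ONE of these characters; str.split(white_str)
# would instead look for the whole string as a single separator.
_splitter = re.compile("[" + re.escape(white_str) + "]+")


def split_keep_tabs(line):
    """Split on whitespace other than tab, so tabs stay inside the tokens."""
    return [token for token in _splitter.split(line) if token]


tokens = split_keep_tabs("\t22 \n")
has_tab = ["\t" in token for token in tokens]
```

Each token can then be tested with `"\t" in token` or `token.startswith("\t")` to decide whether it came from a number line.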


2014-07-09 21:21:00 TheSoundDefense
