2015-09-01 85 views

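Pandas bad lines warning capture

Is there a way in pandas to capture the warning produced by setting error_bad_lines=False and warn_bad_lines=True? For example, the following script: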

import pandas as pd 
from StringIO import StringIO 
data = StringIO("""a,b,c 
        1,2,3 
        4,5,6 
        6,7,8,9 
        1,2,5 
        3,4,5""") 
pd.read_csv(data, warn_bad_lines=True, error_bad_lines=False) 

produces the warning:

Skipping line 4: expected 3 fields, saw 4 

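I would like to store this output in a string so that I can eventually write it to a log file and keep track of the records being skipped.

I tried using the warnings module, but this "warning" does not appear to be a warning in the traditional sense. I am using Python 2.7 and Pandas 0.16.

Any help would be greatly appreciated.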

Answer


I think this is not implemented in pandas.
source1, source2

My solutions:

1. Pre- or post-processing

import pandas as pd
import csv

df = pd.read_csv('data.csv', warn_bad_lines=True, error_bad_lines=False)

# Compare each row's length against a fixed expected field count:
RECOMMENDED = 3

with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        if len(row) != RECOMMENDED:
            print("Length of row is: %r" % len(row))
            print(row)

# Compare each row's length against the number of columns in df:
lencols = len(df.columns)
print(lencols)

with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        if len(row) != lencols:
            print("Length of row is: %r" % len(row))
            print(row)
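
The row-length check above can also build the skipped-line messages as strings, which fits the log-file goal in the question. This is only a minimal sketch using the standard library: the helper name `find_bad_lines` is hypothetical, the message format is copied from the pandas warning for illustration, and the line numbers assume the header is line 1 and that no quoted field spans several lines.

```python
import csv
import io


def find_bad_lines(csv_file, expected=None):
    """Return one message per row whose field count differs from `expected`.

    If `expected` is None, the field count of the first non-blank row
    (normally the header) is used.
    """
    messages = []
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        if not row:
            continue  # ignore completely blank lines
        if expected is None:
            expected = len(row)
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far
            messages.append(
                "Skipping line %d: expected %d fields, saw %d"
                % (reader.line_num, expected, len(row))
            )
    return messages


# Same sample data as in the question, without the leading indentation.
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n6,7,8,9\n1,2,5\n3,4,5\n")
bad_lines = find_bad_lines(sample)
print("\n".join(bad_lines))
```

Note that this sketch flags every row whose length differs from the expected count, including rows that are too short, so its output may not match the set of lines pandas itself skips.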

2. Replacing sys.stdout and sys.stderr

import pandas as pd
import sys

class RedirectStdStreams(object):
    def __init__(self, stdout=None, stderr=None):
        self._stdout = stdout or sys.stdout
        self._stderr = stderr or sys.stderr

    def __enter__(self):
        self.old_stdout, self.old_stderr = sys.stdout, sys.stderr
        self.old_stdout.flush(); self.old_stderr.flush()
        sys.stdout, sys.stderr = self._stdout, self._stderr

    def __exit__(self, exc_type, exc_value, traceback):
        self._stdout.flush(); self._stderr.flush()
        sys.stdout = self.old_stdout
        sys.stderr = self.old_stderr


if __name__ == '__main__':

    # Replaces sys.stdout and sys.stderr, see http://stackoverflow.com/a/6796752/2901002
    # The "with open" block closes log.txt once the messages are written.
    with open('log.txt', 'w') as log_file:
        with RedirectStdStreams(stdout=log_file, stderr=log_file):
            df = pd.read_csv('data.csv', warn_bad_lines=True, error_bad_lines=False)
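
To end up with the messages in a string rather than a file, the same redirection idea can be pointed at an in-memory buffer. The following is a minimal sketch for Python 3, where `contextlib.redirect_stderr` does the swapping. `noisy_parse` is a hypothetical stand-in for the `pd.read_csv` call so that the example runs without pandas, and `capture_stderr` is likewise a name made up for illustration. Like the class above, it only captures text written through Python's `sys.stderr`.

```python
import contextlib
import io
import sys


def noisy_parse():
    """Hypothetical stand-in for pd.read_csv(...): writes a message to stderr."""
    sys.stderr.write("Skipping line 4: expected 3 fields, saw 4\n")
    return [[1, 2, 3], [4, 5, 6]]


def capture_stderr(func, *args, **kwargs):
    """Call func and return (result, text written to sys.stderr meanwhile)."""
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


rows, captured = capture_stderr(noisy_parse)
print(captured, end="")
```

The `captured` string can then be appended to a log file, one call per input file, which keeps the messages for each file separate.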

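Thanks! I will probably use the second solution, since I need to iterate over multiple files and unfortunately we are still stuck with this format. – eroma934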