2017-06-07

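Can't create a DataFrame with read_csv because the whitespace delimiter is not constant

I am trying to read this text file (philadelphia.txt) into a pandas DataFrame: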

STATION   STATION_NAME          DATE  TAVG  TMAX  TMIN  
----------------- -------------------------------------------------- -------- -------- -------- -------- 
GHCND:USW00094732   PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970605 -9999 74  47  
GHCND:USW00094732   PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970606 -9999 68  50  
GHCND:USW00094732   PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970608 -9999 72  50  
GHCND:USW00094732   PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970609 -9999 83  47  
GHCND:USW00094732   PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970610 -9999 86  55  
GHCND:USW00094732   PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970611 -9999 88  61  
GHCND:USW00094732   PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970612 -9999 83  70  
GHCND:USW00094732   PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970613 -9999 80  66  
GHCND:USW00094732   PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970614 -9999 80  64  
GHCND:USW00094732   PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970615 -9999 77  55  
GHCND:USW00094732   PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970616 -9999 79  49 

However, if I use

data = pd.read_csv('philadelphia.txt', sep="\s+", header=0) 

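it produces the correct header, but it runs into trouble with the station name data. I want the whole name kept under the column "STATION_NAME", but sep="\s+" splits it at every space, which raises this error: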

pandas.errors.ParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 11 
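The field counts in this error can be reproduced without pandas. The snippet below is only an illustration: it splits the header and the first data row from the question on runs of whitespace, the same way sep="\s+" does, and counts the pieces. The variable names are made up for the example.

```python
import re

# Header and first data row, copied from philadelphia.txt in the question.
header = "STATION STATION_NAME DATE TAVG TMAX TMIN"
row = ("GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US "
       "19970605 -9999 74 47")

# Splitting on runs of whitespace mimics what sep="\s+" does to each line.
header_fields = re.split(r"\s+", header.strip())
row_fields = re.split(r"\s+", row.strip())

print(len(header_fields))  # 6 column names
print(len(row_fields))     # 11 tokens, because the station name has 6 words
```

The header sets the expected width to 6 fields, and the six-word station name pushes the first data row (line 3 of the file, after the dashed line) to 11.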

How can I split the data into 6 columns without breaking the station name into separate words?

I would also like to be able to pass in other text files with different station names, such as yellowknife.txt:

STATION   STATION_NAME          DATE  TMAX  TMIN  
----------------- -------------------------------------------------- -------- -------- -------- 
GHCND:CA002204101         YELLOWKNIFE A CA 20130117 -21  -35  
GHCND:CA002204101         YELLOWKNIFE A CA 20130118 -15  -21  
GHCND:CA002204101         YELLOWKNIFE A CA 20130119 -17  -29  
GHCND:CA002204101         YELLOWKNIFE A CA 20130120 -18  -28  
GHCND:CA002204101         YELLOWKNIFE A CA 20130121 -21  -34  
GHCND:CA002204101         YELLOWKNIFE A CA 20130122 -16  -30  
GHCND:CA002204101         YELLOWKNIFE A CA 2013-17  -28  
GHCND:CA002204101         YELLOWKNIFE A CA 20130124 -5  -17  

Answer

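Use the read_fwf() function, which reads fixed-width columns instead of splitting on a delimiter. The .drop(0) call removes the row of dashes that follows the header: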

In [7]: df = pd.read_fwf(r'/path/to/file.csv').drop(0) 

In [8]: df 
Out[8]: 
       STATION        STATION_NAME  DATE TAVG TMAX TMIN 
1 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970605 -9999 74 47 
2 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970606 -9999 68 50 
3 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970608 -9999 72 50 
4 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970609 -9999 83 47 
5 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970610 -9999 86 55 
6 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970611 -9999 88 61 
7 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970612 -9999 83 70 
8 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970613 -9999 80 66 
9 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970614 -9999 80 64 
10 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970615 -9999 77 55 
11 GHCND:USW00094732 PHILADELPHIA NE PHILADELPHIA AIRPORT PA US 19970616 -9999 79 49 

Columns:

In [9]: df.columns.tolist() 
Out[9]: ['STATION', 'STATION_NAME', 'DATE', 'TAVG', 'TMAX', 'TMIN']
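For files with different station names or a different set of columns (such as yellowknife.txt), one possible variation is to take the column boundaries from the dashed line rather than relying on inference. The following is a minimal sketch, not part of the original answer. It assumes the second line of every file is a dashed ruler whose runs of dashes line up with the columns, and that the header is aligned with that ruler. The helper name read_station_text and the sample text (built in code with made-up widths, since the exact spacing of the real files is not shown) are hypothetical.

```python
import io
import re

import pandas as pd


def read_station_text(text):
    """Parse a fixed-width report whose second line is a dashed ruler (assumed layout)."""
    lines = text.splitlines()
    header, ruler = lines[0], lines[1]
    # Each run of dashes gives the half-open (start, end) span of one column.
    colspecs = [m.span() for m in re.finditer(r"-+", ruler)]
    # Column names are the header text sliced with the same spans.
    names = [header[start:end].strip() for start, end in colspecs]
    body = "\n".join(lines[2:])
    return pd.read_fwf(io.StringIO(body), colspecs=colspecs, names=names, header=None)


# Hypothetical sample: column widths are illustrative, not taken from the real file.
widths = [17, 50, 8, 8, 8]
cols = ["STATION", "STATION_NAME", "DATE", "TMAX", "TMIN"]
rows = [
    ["GHCND:CA002204101", "YELLOWKNIFE A CA", "20130117", "-21", "-35"],
    ["GHCND:CA002204101", "YELLOWKNIFE A CA", "20130118", "-15", "-21"],
]


def fmt(values):
    # Left-justify each value to its column width, separated by one space.
    return " ".join(v.ljust(w) for v, w in zip(values, widths))


sample = "\n".join([fmt(cols), " ".join("-" * w for w in widths)] + [fmt(r) for r in rows])
df = read_station_text(sample)
print(df)
```

Because the spans come from the file itself, the same helper should handle both the six-column and the five-column layouts, provided the ruler assumption holds for the real data.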