2017-04-01 53 views
2

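How do I split a single pandas.DataFrame row into multiple space-delimited columns? (Python)

I have NCEI marine data in headerless .dat files (the files at https://www.ncei.noaa.gov/data/marine/icoads3.0/) that I am working with in Python. They look like this: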

166210151200 4962 35378 1306 101134  NL 1585 26              165 17796730133 5 0     2FF11FF11AAAAAAAAAAAA  98150000003002199 0 NAN  NATIONAAL ARCHIEF OF THE NETHERLANDS    DEN HAAG NEDERLAND  1.11.01.01  1229    AANW         112       AAN_1229_112     DUTCH       0493800N 102600E493700N   2 1TENERIFE                                                               0 21662101512   3   VM 8UNKNOWN  MAARSEVEEN     DUTCH     VOC        M. GERRITSZ. BOOS    OPPERSTUURMAN                               ROTTERDAM         BATAVIA           0             0977.216621015WZW     7.00                          UNKNOWN  UNKNOWN          UNKNOWN360 DEGREES                                                 ZZO                                           MOU (?) KOELTE                                                                                                                                                                                                  00000000CLIWOC VERSION 1.0 
166210161300 4907 35215 1306 101134  NL 1585 26              165 17797730133 5 0     2FF11FF11AAAAAAAAAAAA  98150000013002199 0 NAN  NATIONAAL ARCHIEF OF THE NETHERLANDS    DEN HAAG NEDERLAND  1.11.01.01  1229    AANW         112       AAN_1229_112     DUTCH       0490400N 84800E    1 1TENERIFE                                                               0 21662101612   3   VM 8UNKNOWN  MAARSEVEEN     DUTCH     VOC        M. GERRITSZ. BOOS    OPPERSTUURMAN                               ROTTERDAM         BATAVIA           0             0977.216621015ZW 1/2 N    18.00                          UNKNOWN  UNKNOWN          UNKNOWN360 DEGREES                                                 ZZO                                           MOU KOELTE                                                                                                                                                                                                   00000000CLIWOC VERSION 1.0 
166210171300 4812 35000 1306 101134  NL 1695 26              165 17680730133 5 0     2FF11FF11AAAAAAAAAAAA  98150000023002199 0 NAN  NATIONAAL ARCHIEF OF THE NETHERLANDS    DEN HAAG NEDERLAND  1.11.01.01  1229    AANW         112       AAN_1229_112     DUTCH       0483000N 63900E480700N   2 1TENERIFE                                                               0 21662101712   3   VM 8UNKNOWN  MAARSEVEEN     DUTCH     VOC        M. GERRITSZ. BOOS    OPPERSTUURMAN                               ROTTERDAM         BATAVIA           0             0977.216621015ZWTW     15.00                          UNKNOWN  UNKNOWN          UNKNOWN360 DEGREES                                                 ZTO                                           MOU KOELTE                                                          MOOI WEER                                                                                                                                       00000000CLIWOC VERSION 1.0 
166210181300 4758 34925 1306 101134  NL 1695 26              165 17670730133 5 0     2FF11FF11AAAAAAAAAAAA  98150000033002199 0 NAN  NATIONAAL ARCHIEF OF THE NETHERLANDS    DEN HAAG NEDERLAND  1.11.01.01  1229    AANW         112       AAN_1229_112     DUTCH       0474100N 55400E473500N   2 1TENERIFE                                                               0 21662101812   3   VM 8UNKNOWN  MAARSEVEEN     DUTCH     VOC        M. GERRITSZ. BOOS    OPPERSTUURMAN                               ROTTERDAM         BATAVIA           0             0977.216621015ZWTW     11.00                          UNKNOWN  UNKNOWN          UNKNOWN360 DEGREES                                                 ZTO                                           MOU KOELTE                                                          'ENN MOUT'?                                   REGEN                                                                                                  01000000CLIWOC VERSION 1.0 
166210191300 4757 34795 1306 101134  NL 1805 67              165 17672730133 5 0     2FF11FF11AAAAAAAAAAAA  98150000043002199 0 NAN  NATIONAAL ARCHIEF OF THE NETHERLANDS    DEN HAAG NEDERLAND  1.11.01.01  1229    AANW         112       AAN_1229_112     DUTCH       0473400N 43600E    1 1TENERIFE                                                               0 21662101912   3   VM 8UNKNOWN  MAARSEVEEN     DUTCH     VOC        M. GERRITSZ. BOOS    OPPERSTUURMAN                               ROTTERDAM         BATAVIA           0             0977.216621015W/Z     14.00                          UNKNOWN  UNKNOWN          UNKNOWN360 DEGREES                                                 Z                                            MARSZEILSKOELTE, TOUPKOULTE                                                                                            REGEN                                                                                                  01000000CLIWOC VERSION 1.0 

I have been treating these as tab-delimited files, and this is what I have been using to import them:

import pandas as pd
data = pd.read_table('file.dat', header=None)

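This imports the data as x rows of a single column that holds everything. Within that single column, the individual values are separated by spaces.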

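Is there a way to import this data directly into columns, or to read it into a variable and split each row into columns on the spaces? I thought that was what I was doing with the read_table function. The full dataset is large, so I would prefer a method that imports the files without an extra processing step afterwards.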

+0

Would this give you the output you want? `data = pd.read_table('file.dat', header=None).apply(lambda x: pd.Series(x[0].split(' ')), axis=1)`

+0

It splits them into columns, but with more than 2000 columns per row. – mgcrump
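The very large column count is what one would expect from splitting padded, fixed-width text on a single space: every extra space in a run of padding produces an empty string. A minimal sketch, using a made-up fragment modeled on the sample rows (the exact padding width is an assumption for illustration):

```python
# Hypothetical fragment: two values, 14 spaces of padding, then a third value.
line = "1585 26" + " " * 14 + "165"

# Splitting on exactly one space keeps an empty string for each extra space.
by_single_space = line.split(" ")

# Splitting on any run of whitespace collapses the padding.
by_any_whitespace = line.split()

print(len(by_single_space))
print(by_any_whitespace)
```

Even `split()` without arguments is not a full fix here, because text fields such as names can themselves contain spaces.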

Answers

1

I think what you need is the fixed-width formatted reader, pd.read_fwf.

Code:

import pandas as pd

df = pd.read_fwf('IMMA.dat', header=None)
print(df.dtypes)

Result:

[17 rows x 66 columns] 
0  int64 
1  int64 
2  int64 
3  int64 
     ... 
61  object 
62  object 
63  object 
64  object 
65 float64 
dtype: object 
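By default `pd.read_fwf` infers the column boundaries from the first rows of the file (`colspecs='infer'`, `infer_nrows=100`), so the inferred layout may not match the format's real field definitions. If the field positions are known, they can be passed explicitly. The sketch below uses the first 23 characters of three sample rows from the question; the offsets and the column names are assumptions for illustration and should be checked against the IMMA format documentation. It also shows `chunksize`, which may help with the large files mentioned in the question.

```python
import io
import pandas as pd

# First 23 characters of three records from the question's sample data.
sample = (
    "166210151200 4962 35378\n"
    "166210161300 4907 35215\n"
    "166210171300 4812 35000\n"
)

# ASSUMPTION: half-open (start, end) character offsets and field names,
# chosen for illustration; take the real ones from the format documentation.
colspecs = [(0, 4), (4, 6), (6, 8), (8, 12), (12, 17), (17, 23)]
names = ["year", "month", "day", "hour", "lat", "lon"]

df = pd.read_fwf(io.StringIO(sample), colspecs=colspecs, names=names, header=None)
print(df)

# For large files, read in pieces instead of all at once.
chunks = pd.read_fwf(
    io.StringIO(sample), colspecs=colspecs, names=names, header=None, chunksize=2
)
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)
```

With a real file, the `io.StringIO(sample)` argument would be replaced by the file path.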
+0

Yes, that's the one. Thanks! – mgcrump

0

You could try:

pd.read_csv('test.dat', delim_whitespace=True, engine='python', names=range(66))

Here 66 is the number of columns; you may need to adjust it for your data.
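In recent pandas releases `delim_whitespace` is deprecated in favor of `sep=r'\s+'`. The sketch below uses that spelling on a small made-up sample (not the real file layout) and shows a limitation of whitespace splitting for this kind of data: a text field that contains a space is broken across two columns, so rows no longer line up.

```python
import io
import pandas as pd

# Hypothetical, simplified rows: the last field is free text.
sample = (
    "1662 NL DEN HAAG\n"
    "1662 NL ROTTERDAM\n"
)

# names=range(4) reserves four columns; shorter rows are padded with NaN.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, names=range(4))
print(df)
```

Here "DEN HAAG" lands in columns 2 and 3, while "ROTTERDAM" fills only column 2. That misalignment is why a fixed-width reader tends to suit files like these better.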
