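How do I split a single pandas.DataFrame row into multiple columns separated by spaces? (Python)
I am working in Python with NCEI marine data, which comes as .dat files without headers (the files at https://www.ncei.noaa.gov/data/marine/icoads3.0/). They look like this: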
166210151200 4962 35378 1306 101134 NL 1585 26 165 17796730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000003002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0493800N 102600E493700N 2 1TENERIFE 0 21662101512 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015WZW 7.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES ZZO MOU (?) KOELTE 00000000CLIWOC VERSION 1.0
166210161300 4907 35215 1306 101134 NL 1585 26 165 17797730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000013002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0490400N 84800E 1 1TENERIFE 0 21662101612 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015ZW 1/2 N 18.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES ZZO MOU KOELTE 00000000CLIWOC VERSION 1.0
166210171300 4812 35000 1306 101134 NL 1695 26 165 17680730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000023002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0483000N 63900E480700N 2 1TENERIFE 0 21662101712 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015ZWTW 15.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES ZTO MOU KOELTE MOOI WEER 00000000CLIWOC VERSION 1.0
166210181300 4758 34925 1306 101134 NL 1695 26 165 17670730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000033002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0474100N 55400E473500N 2 1TENERIFE 0 21662101812 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015ZWTW 11.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES ZTO MOU KOELTE 'ENN MOUT'? REGEN 01000000CLIWOC VERSION 1.0
166210191300 4757 34795 1306 101134 NL 1805 67 165 17672730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000043002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0473400N 43600E 1 1TENERIFE 0 21662101912 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015W/Z 14.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES Z MARSZEILSKOELTE, TOUPKOULTE REGEN 01000000CLIWOC VERSION 1.0
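These are tab-delimited files, and this is what I have been using to import them:
import pandas as pd
data = pd.read_table('file.dat', header=None)
This imports the data as x rows with a single column that holds everything. Within that single column, the individual values are separated by spaces.
Is there a way to import this data into columns, or to read the data into a variable and split each row into columns on the spaces? I thought that was what I was doing with the read_table function. The full dataset is large, so I would prefer a method that imports the files without my having to process them first.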
Does this give you the output you want? `data = pd.read_table('file.dat', header=None).apply(lambda x: pd.Series(x[0].split(' ')), axis=1)` –
It splits them into columns, but with more than 2000 columns per row. – mgcrump
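One possible reason for the very large column count, assuming the real files pad their fields with runs of several spaces, is that `split(' ')` cuts at every single space and returns an empty string for each extra space in a run. The sketch below uses a made-up line (not taken from the actual files) to compare that with `split()` called without an argument, which treats any run of whitespace as one separator.

```python
# Hypothetical record: three spaces after the first value, two after the second.
line = "166210151200   4962  35378"

# split(' ') cuts at every single space, so each extra space in a run
# produces an empty string that would become its own empty column.
by_single_space = line.split(' ')

# split() with no argument collapses any run of whitespace into one separator.
by_whitespace = line.split()

print(len(by_single_space), by_single_space)
print(len(by_whitespace), by_whitespace)
```

In pandas, `Series.str.split()` called without a pattern also splits on whitespace, and `expand=True` returns the pieces as columns. Rows may still end up with different numbers of pieces, because free-text fields such as the archive and ship names contain spaces themselves. If the files turn out to be fixed-width rather than delimited, `pd.read_fwf` with explicit column positions may be a better fit.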