NumPy genfromtxt TypeError: data type not understood

I want to read in this file (test.txt):

01.06.2015;00:00:00;0.000;0;-9.999;0;8;0.00;18951;(SPECTRUM)ZERO(/SPECTRUM) 
01.06.2015;00:01:00;0.000;0;-9.999;0;8;0.00;18954;(SPECTRUM)ZERO(/SPECTRUM) 
01.06.2015;00:02:00;0.000;0;-9.999;0;8;0.00;18960;(SPECTRUM)ZERO(/SPECTRUM) 
01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;(SPECTRUM);;;;;;;;;;;;;;1;1;;;1;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;1;;;;;;;;;;;;(/SPECTRUM) 
01.06.2015;09:24:00;0.000;0;-9.999;0;29;0.00;19010;(SPECTRUM)ZERO(/SPECTRUM)

I tried it with NumPy's genfromtxt() function (see the code excerpt below).

import numpy as np 
col_names = ["date", "time", "rain_intensity", "weather_code_1", "radar_ref", "weather_code_2", "val6", "rain_accum", "val8", "val9"] 
types = ["object", "object", "float", "uint8", "float", "uint8", "uint8", "float", "uint8","|S10"] 
# Read in the file with np.genfromtxt 
mydata = np.genfromtxt("test.txt", delimiter=";", names=col_names, dtype=types)

Now, when I run the code, I get the following error:

    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #4 (got 79 columns instead of 10)

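I think the difficulty comes from the last column (val9), which contains many ;;;;;;; characters.
Clearly, the delimiter and the characters in the last column are the same!

How can I read the file without errors? Is it possible to skip the last column, or to replace the ; in the last column only?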

2016-09-21 Markus

The error in the title does not match the error in the text. – hpaulj

usecols can be used to ignore the extra delimiters, e.g.

In [546]: np.genfromtxt([b'1,2,3',b'1,2,3,,,,,,'], dtype=None,
    delimiter=',', usecols=np.arange(3))
Out[546]:
array([[1, 2, 3],
       [1, 2, 3]])
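
Applied to the semicolon-separated file from the question, the same idea might look like the sketch below. The sample lines are shortened copies of the question's data held in a list, and the variable names (lines, rain, numeric) are illustrative assumptions rather than part of the original answer.

```python
import numpy as np

# Shortened sample lines modelled on the question's test.txt (assumption:
# the real file has the same layout, with the numeric fields in columns 2-8).
lines = [
    "01.06.2015;00:00:00;0.000;0;-9.999;0;8;0.00;18951;(SPECTRUM)ZERO(/SPECTRUM)",
    "01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;(SPECTRUM);;;1;1;;;(/SPECTRUM)",
    "01.06.2015;09:24:00;0.000;0;-9.999;0;29;0.00;19010;(SPECTRUM)ZERO(/SPECTRUM)",
]

# Only the third column (index 2, rain_intensity); the extra ';' in the
# last field no longer matters because those columns are never requested.
rain = np.genfromtxt(lines, delimiter=";", usecols=(2,), dtype=float)

# All numeric columns (indices 2 to 8) as a plain float array.
numeric = np.genfromtxt(lines, delimiter=";", usecols=range(2, 9), dtype=float)

print(rain)
print(numeric.shape)
```

Reading the numeric columns as float avoids having to pick an integer width for each field; a filename such as "test.txt" can be passed in place of the list.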

2016-09-21 16:05:42 hpaulj

Yes, you are right, this also works for the example above. Many thanks! – Markus

From the numpy documentation:

invalid_raise : bool, optional
If True, an exception is raised if an inconsistency is detected in the number of columns. If False, a warning is emitted and the offending lines are skipped.

mydata = np.genfromtxt("test.txt", delimiter=";", names=col_names, dtype=types, invalid_raise = False)

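Note that there were errors in your code, which I have corrected (delimiter was misspelled, and the types list was referred to as dtypes in the function call).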
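
To see what invalid_raise=False does in practice, here is a minimal sketch on made-up numeric lines (the data and the variable names are assumptions for illustration only). The row with the extra delimiters is dropped, and a warning is issued instead of the ValueError.

```python
import warnings

import numpy as np

# Three made-up rows; the middle one has trailing delimiters and therefore
# more columns than the first row.
lines = ["1;2;3", "1;2;3;;;;", "4;5;6"]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    data = np.genfromtxt(lines, delimiter=";", invalid_raise=False)

print(data)  # only the rows with the expected number of columns remain
for w in caught:
    print(w.category.__name__, w.message)
```

The whole offending row is skipped, so this option only helps when those rows can be discarded.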

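Edit: From your comment, I see I slightly misunderstood. You meant that you want to skip the last column, not the last row.

Take a look at the following code. I have defined a generator that returns only the first ten elements of each line. This allows genfromtxt() to complete without error, and you now get column #3 from all rows.

Note, however, that you will still lose some data. If you look carefully, you will see that the problem line is actually two lines concatenated together, with garbage where the other lines have ZERO. So you will still lose the second line. Perhaps you could modify the generator to parse each line and handle this case differently, but I'll leave that as a fun exercise :)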

import numpy as np 

def filegen(filename):
    # Yield each line cut down to its first ten ';'-separated fields
    with open(filename, 'r') as infile:
        for line in infile:
            yield ';'.join(line.split(';')[:10])

col_names = ["date", "time", "rain_intensity", "weather_code_1", "radar_ref", "weather_code_2", "val6", "rain_accum", "val8", "val9"] 
dtypes = ["object", "object", "float", "uint8", "float", "uint8", "uint8", "float", "uint8","|S10"] 
# Read in the file with np.genfromtxt 
mydata = np.genfromtxt(filegen('temp.txt'), delimiter=";", names=col_names, dtype = dtypes)

Output

[('01.06.2015', '00:00:00', 0.0, 0, -9.999, 0, 8, 0.0, 7, '(SPECTRUM)') 
('01.06.2015', '00:01:00', 0.0, 0, -9.999, 0, 8, 0.0, 10, '(SPECTRUM)') 
('01.06.2015', '00:02:00', 0.0, 0, -9.999, 0, 8, 0.0, 16, '(SPECTRUM)') 
('01.06.2015', '09:23:00', 0.327, 61, 25.831, 39, 29, 0.18, 62, '01.06.2015') 
('01.06.2015', '09:24:00', 0.0, 0, -9.999, 0, 29, 0.0, 66, '(SPECTRUM)')]
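
One possible take on the exercise mentioned above is sketched here. It rests on assumptions that the thread does not confirm: that every record has exactly nine fields before the (SPECTRUM) marker, and that a malformed line is simply several such records run together. The name split_records and its parameters are hypothetical.

```python
import numpy as np

def split_records(lines, n_fields=9, marker="(SPECTRUM)"):
    """Yield one ';'-joined record per group of n_fields found before marker.

    Assumption: a concatenated line is several complete records placed
    back to back ahead of a single spectrum field.
    """
    for line in lines:
        head, _, _ = line.partition(marker)  # drop the spectrum field
        fields = head.split(";")
        if fields and fields[-1] == "":
            fields.pop()  # trailing ';' just before the marker
        for start in range(0, len(fields), n_fields):
            chunk = fields[start:start + n_fields]
            if len(chunk) == n_fields:  # ignore incomplete leftovers
                yield ";".join(chunk)

# Shortened sample lines modelled on the question's data.
lines = [
    "01.06.2015;00:00:00;0.000;0;-9.999;0;8;0.00;18951;(SPECTRUM)ZERO(/SPECTRUM)",
    "01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;"
    "01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;(SPECTRUM);;;1;;(/SPECTRUM)",
    "01.06.2015;09:24:00;0.000;0;-9.999;0;29;0.00;19010;(SPECTRUM)ZERO(/SPECTRUM)",
]

records = list(split_records(lines))

# Read only the numeric columns (indices 2 to 8) as floats.
data = np.genfromtxt(split_records(lines), delimiter=";",
                     usecols=range(2, 9), dtype=float)
print(len(records), data.shape)
```

In the sample the two halves of the concatenated line are identical, so whether to keep both records or drop the repeat is a decision left to the reader.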

2016-09-21 12:07:25 SiHa

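@SiHa, OK, thanks for your help and for fixing my mistakes. If I add invalid_raise=False, the whole of line 4 is dropped. But I need those lines (I need the third column from every row), because the .txt file is much longer and it is always column #3 that I am interested in. So even when the last column contains ;;;;;;; characters, I still need column #3! Thanks – Markus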

@Markus See the updated answer, but note that when dealing with malformed lines it will be difficult to preserve *all* of the data. – SiHa
