Problem importing a dataset (txt file) in Python with numpy's genfromtxt function

I'm working hard at learning Python, but I'm trying to import a dataset and can't get it to work properly.

The dataset has 16 columns and 16,320 rows and is saved as a txt file. I used the genfromtxt function as follows:

import numpy as np 
dt=np.dtype([('name', np.str_, 16),('platform', np.str_, 16),('year', np.float_, (2,)),('genre', np.str_, 16),('publisher', np.str_, 16),('na_sales', np.float_, (2,)), ('eu_sales', np.float64, (2,)), ('jp_sales', np.float64, (2,)), ('other_sales', np.float64, (2,)), ('global_sales', np.float64, (2,)), ('critic_scores', np.float64, (2,)),('critic_count', np.float64, (2,)),('user_scores', np.float64, (2,)),('user_count', np.float64, (2,)),('developer', np.str_, 16),('rating', np.str_, 16)]) 
data=np.genfromtxt('D:\\data3.txt',delimiter=',',names=True,dtype=dt)

I get this error:

ValueError: size of tuple must match number of fields.

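But my dt variable contains 16 types, one for each column. I specify the data types because otherwise the strings get replaced by nan.

Any help would be greatly appreciated.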

Source

2017-03-04 Ben_its

Suggestion: post the first few lines of your data3.txt file. Are you sure it has 16 columns? – payne

Why all the '(2,)' in the dtype? You defined 16 fields, but every float is doubled. Have you tried loading with 'dtype=None'? That lets it infer the best dtype. – hpaulj

Look at an array made with your dt:

In [78]: np.ones((1,),dt) 
Out[78]: 
array([ ('1', '1', [ 1., 1.], '1', '1', [ 1., 1.], [ 1., 1.], [ 1., 1.], 
     [ 1., 1.], [ 1., 1.], [ 1., 1.], [ 1., 1.], [ 1., 1.], 
     [ 1., 1.], '1', '1')], 
     dtype=[('name', '<U16'), ('platform', '<U16'), ('year', '<f8', (2,)), ('genre', '<U16'), ('publisher', '<U16'), ('na_sales', '<f8', (2,)), ('eu_sales', '<f8', (2,)), ('jp_sales', '<f8', (2,)), ('other_sales', '<f8', (2,)), ('global_sales', '<f8', (2,)), ('critic_scores', '<f8', (2,)), ('critic_count', '<f8', (2,)), ('user_scores', '<f8', (2,)), ('user_count', '<f8', (2,)), ('developer', '<U16'), ('rating', '<U16')])

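I count 26 1s (strings and floats), not the 16 you need. Did you think (2,) meant double precision? It means a 2-element sub-array in that field, so each of your 10 float fields expects two values, and 6 strings plus 20 floats makes 26.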
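To make the difference concrete, here is a minimal sketch comparing a plain float field with a `(2,)` field. The names `scalar_dt` and `pair_dt` are just for illustration and are not from your code.

```python
import numpy as np

# A plain float64 field: one value per record.
scalar_dt = np.dtype([('year', np.float64)])

# The same field with shape (2,): a 2-element sub-array per record.
pair_dt = np.dtype([('year', np.float64, (2,))])

print(scalar_dt['year'].shape)   # shape of the scalar field
print(pair_dt['year'].shape)     # shape of the sub-array field

# np.float64 is already double precision (8 bytes); the (2,) version
# stores two such values, so each record takes twice the space.
print(scalar_dt.itemsize, pair_dt.itemsize)
```

So `np.float64` on its own is already the "double" you were probably after.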

Take out all those (2,):

In [80]: np.ones((1,),dt) 
Out[80]: 
array([ ('1', '1', 1., '1', '1', 1., 1., 1., 1., 1., 1., 1., 1., 1., '1', '1')], 
     dtype=[('name', '<U16'), ('platform', '<U16'), ('year', '<f8'), ('genre', '<U16'), ('publisher', '<U16'), ('na_sales', '<f8'), ('eu_sales', '<f8'), ('jp_sales', '<f8'), ('other_sales', '<f8'), ('global_sales', '<f8'), ('critic_scores', '<f8'), ('critic_count', '<f8'), ('user_scores', '<f8'), ('user_count', '<f8'), ('developer', '<U16'), ('rating', '<U16')])

Now I have 16 fields, which should parse your 16 columns just fine.
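I don't have your file, so here is a rough sketch of the same idea on a cut-down, made-up sample: four columns instead of 16, read from an in-memory string instead of D:\data3.txt. The rows, the `sample` variable and the explicit `encoding='utf-8'` argument are my assumptions, not something taken from your data.

```python
import numpy as np
from io import StringIO

# Made-up stand-in for the first lines of the file (hypothetical values).
sample = """name,platform,year,na_sales
Game A,PlatX,2006,1.5
Game B,PlatY,1989,2.25
"""

# One entry per column, with no (2,) shapes on the float fields.
dt = np.dtype([('name', np.str_, 16),
               ('platform', np.str_, 16),
               ('year', np.float64),
               ('na_sales', np.float64)])

data = np.genfromtxt(StringIO(sample), delimiter=',', names=True,
                     dtype=dt, encoding='utf-8')

print(data.dtype.names)   # field names, one per column
print(data['name'])       # string column
print(data['year'])       # float column
```

The point is only that the number of fields in the dtype now matches the number of columns in each row.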

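But dtype=None usually works too. It lets genfromtxt deduce the best dtype for each field. In that case it takes the field names from the column header line (because of your names=True argument).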
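A minimal sketch of the dtype=None route, again on a made-up in-memory sample rather than your file (the rows and the `encoding='utf-8'` argument are assumptions on my part):

```python
import numpy as np
from io import StringIO

# Made-up stand-in for the first lines of the file (hypothetical values).
sample = """name,platform,year,na_sales
Game A,PlatX,2006,1.5
Game B,PlatY,1989,2.25
"""

# dtype=None asks genfromtxt to infer a type per column;
# names=True takes the field names from the header line.
data = np.genfromtxt(StringIO(sample), delimiter=',', names=True,
                     dtype=None, encoding='utf-8')

print(data.dtype)        # inferred structured dtype
print(data['platform'])  # string column, not nan
```

Note that the inferred types may differ from the ones you would write by hand. Here the year column holds only whole numbers, so it comes out as an integer field rather than a float.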

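It's a good idea to test a complicated line of code on its own before dropping it into a larger script, especially while you're still learning.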

Source

2017-03-04 17:18:09 hpaulj
