Reading a CSV that contains a numpy array with pandas

I have a CSV file with 3 columns (emotion, pixels, Usage) and 35000 rows, for example: 0,70 23 45 178 455,Training.

I read the file with pandas.read_csv as pd.read_csv(filename, dtype={'emotion':np.int32, 'pixels':np.int32, 'Usage':str}).

When I try the above, it says ValueError: invalid literal for long() with base 10: '70 23 45 178 455'. How can I read the pixels column as a numpy array?

2015-06-19 VeilEclipse

Try the code below instead:

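import numpy as np
import pandas as pd

# Read pixels as a plain string first; it cannot be parsed as a single integer
df = pd.read_csv(filename, dtype={'emotion': np.int32, 'pixels': str, 'Usage': str})

def makeArray(text):
    # Parse the space-separated values into a NumPy array (float64 by default)
    return np.fromstring(text, sep=' ')

df['pixels'] = df['pixels'].apply(makeArray)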
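
As a possible variation on the answer above (not the answerer's own code), the parsing can also be done while the file is being read, via the converters argument of read_csv. This is only a minimal sketch: the two-row csv_text sample and the parse_pixels helper are made up for illustration, and io.StringIO stands in for the real file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-row sample in the same layout as the question's file.
csv_text = """emotion,pixels,Usage
0,70 23 45 178 455,Training
1,12 0 255 34 9,PublicTest
"""

def parse_pixels(text):
    # Split on whitespace and convert each token to a 32-bit integer.
    return np.array([int(token) for token in text.split()], dtype=np.int32)

# converters applies parse_pixels to every raw cell of the pixels column.
df = pd.read_csv(io.StringIO(csv_text), converters={"pixels": parse_pixels})

print(df["pixels"].iloc[0])
print(df["pixels"].iloc[0].dtype)
```

Each cell of df['pixels'] then holds its own integer array, so the column itself has object dtype.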


2015-06-19 05:58:33

Hi, thanks for your help. It now says 'TypeError: data type not understood'. What could the error be? – VeilEclipse

Try my updated code now. –

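I believe it will be faster to use the vectorized str methods to split the string, create as many new pixel columns as needed, and concat the new columns onto the original df: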

In [175]:
# load the data
import pandas as pd
import io
t="""emotion,pixels,Usage
0,70 23 45 178 455,Training"""
df = pd.read_csv(io.StringIO(t))
df

Out[175]:
   emotion            pixels     Usage
0        0  70 23 45 178 455  Training

In [177]:
# now split the string and concat column-wise with the orig df
df = pd.concat([df, df['pixels'].str.split(expand=True).astype(int)], axis=1)
df
Out[177]:
   emotion            pixels     Usage   0   1   2    3    4
0        0  70 23 45 178 455  Training  70  23  45  178  455

If you specifically want a NumPy array (2-D, one row per CSV row), you can use the .values attribute:

In [181]:
df['pixels'].str.split(expand=True).astype(int).values

Out[181]:
array([[ 70,  23,  45, 178, 455]])
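
To illustrate how the split result might be used, here is a rough sketch that builds a 2-D pixel array plus a matching label array for one Usage value. The three-row sample, the variable names, and the choice to filter on 'Training' are my own assumptions, and the approach only yields a clean integer array if every row has the same number of pixel values.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample; every row has the same number of pixel values.
csv_text = """emotion,pixels,Usage
0,70 23 45 178 455,Training
1,12 0 255 34 9,Training
2,1 2 3 4 5,PublicTest
"""

df = pd.read_csv(io.StringIO(csv_text))

# One integer column per pixel position.
pixels = df["pixels"].str.split(expand=True).astype(np.int32)

# Boolean mask selecting the rows marked as training data.
train_mask = df["Usage"] == "Training"
X_train = pixels[train_mask].to_numpy()
y_train = df.loc[train_mask, "emotion"].to_numpy()

print(X_train.shape)  # (2, 5)
print(y_train)        # [0 1]
```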


2015-06-19 08:23:04 EdChum

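I ran into the same problem and figured out a hack. Save your data as a .npy file. When you load it, it comes back as an ndarray. You can then use pandas.DataFrame to convert the ndarray into a DataFrame. I found this easier than converting from string fields. Sample code below: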

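import numpy as np
import pandas as pd
np.save('file_name.npy', dataframe_to_be_saved)
# the dataframe is saved in 'file_name.npy' in your current working directory

# loading the saved file into an ndarray
# (mixed-type data is stored as a pickled object array, so recent NumPy needs allow_pickle=True)
arr = np.load('file_name.npy', allow_pickle=True)
# first column becomes the index; column_names is your list of remaining column labels
df = pd.DataFrame(data=arr[:, 1:], index=arr[:, 0], columns=column_names)

# df now holds your data again; columns loaded from an object array come back
# as object dtype, so cast them with astype if you need the original dtypes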
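
For anyone who wants to try the round trip end to end, here is a small self-contained sketch. The df_in sample frame and the temporary directory are assumptions for illustration; it uses np.save and np.load as in the answer, and shows that the columns come back as object dtype until they are cast.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical small frame with mixed int and string columns.
df_in = pd.DataFrame({"emotion": [0, 1], "Usage": ["Training", "PublicTest"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file_name.npy")
    # Mixed columns become an object array, which np.save stores via pickle.
    np.save(path, df_in.to_numpy())
    # Loading pickled object arrays must be enabled explicitly.
    arr = np.load(path, allow_pickle=True)

df_out = pd.DataFrame(arr, columns=df_in.columns)
print(df_out.dtypes)  # both columns are object at this point

# Cast back to the dtype you actually want.
df_out = df_out.astype({"emotion": np.int32})
print(df_out.dtypes)
```

Only load .npy files with allow_pickle=True when you trust where they came from, since unpickling can run arbitrary code.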


2017-06-22 09:03:41
