How to force pandas read_csv to use float32 for all float columns?

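I don't need double precision.
My machine has limited memory, and the datasets I want to process are large.
I need to pass the extracted data (as matrices) to BLAS libraries, and BLAS calls in single precision are 2x faster than their double-precision equivalents.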
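As a rough illustration of the memory point, here is a sketch using a made-up all-float frame (not the asker's data). Storing the float columns as float32 halves their footprint, and `to_numpy()` then yields a float32 matrix for the BLAS step:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one million rows, two float64 columns
n = 1_000_000
df64 = pd.DataFrame({"a": np.random.rand(n), "b": np.random.rand(n)})
df32 = df64.astype(np.float32)

# memory_usage(index=False) reports the bytes used by each column's data
bytes64 = int(df64.memory_usage(index=False).sum())
bytes32 = int(df32.memory_usage(index=False).sum())
print(bytes64, bytes32)  # 16000000 8000000

# The extracted matrix keeps the single-precision dtype
mat = df32.to_numpy()
gram = mat.T @ mat  # matrix product computed on float32 inputs
print(mat.dtype, gram.dtype)  # float32 float32
```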

Note that not all columns in the original csv file have a float type. I only need float32 to be the default for the float columns.

2015-05-27 Fabian

Try:

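import numpy as np
import pandas as pd

# filename: path to your CSV file
# Sample 100 rows of data to determine dtypes.
df_test = pd.read_csv(filename, nrows=100)

# Collect the columns inferred as float64 and map each one to float32.
float_cols = [c for c in df_test if df_test[c].dtype == "float64"]
float32_cols = {c: np.float32 for c in float_cols}

# Read the whole file, parsing those columns directly as float32.
df = pd.read_csv(filename, engine='c', dtype=float32_cols)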

This first reads a sample of 100 rows of data (adjust as needed) to determine the type of each column.

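It builds a list of the columns that are 'float64', then uses a dictionary comprehension to create a dictionary with those columns as keys and 'np.float32' as the value for each key.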
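The same dictionary can also be built with `DataFrame.select_dtypes`, which picks the float64 columns without an explicit loop. A minimal sketch, assuming a small in-memory sample in place of the real file:

```python
import io

import numpy as np
import pandas as pd

# Small in-memory sample standing in for the first rows of the real file
csv_text = "int_col,float1,string_col,float2\n1,1.2,a,2.2\n2,1.3,b,3.3\n3,1.4,c,4.4\n"
df_test = pd.read_csv(io.StringIO(csv_text))

# select_dtypes returns only the columns whose dtype is float64
float_cols = df_test.select_dtypes(include="float64").columns
float32_cols = {c: np.float32 for c in float_cols}
print(float32_cols)
```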

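Finally, it reads the whole file with the 'c' engine (which, in the pandas versions of the time, was required for assigning dtypes to columns), passing the float32_cols dictionary as the dtype argument.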
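The two passes can be wrapped in a small helper. This is a sketch: `read_csv_float32` and its `sample_rows` parameter are hypothetical names, and the example writes a throwaway temporary file so it runs on its own:

```python
import os
import tempfile

import numpy as np
import pandas as pd


def read_csv_float32(path, sample_rows=100):
    """Hypothetical helper: read a CSV, storing columns inferred as float64 as float32."""
    # Pass 1: infer column dtypes from a small sample only
    sample = pd.read_csv(path, nrows=sample_rows)
    float32_cols = {c: np.float32 for c in sample.columns
                    if sample[c].dtype == np.float64}
    # Pass 2: read the whole file, parsing those columns directly as float32
    return pd.read_csv(path, engine="c", dtype=float32_cols)


# Usage with a throwaway file (the file name is arbitrary)
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("int_col,float1,string_col,float2\n"
            "1,1.2,a,2.2\n2,1.3,b,3.3\n3,1.4,c,4.4\n")
    path = f.name

df32 = read_csv_float32(path)
os.remove(path)
print(df32.dtypes)
```

Because the dtypes come from the sample only, a column that holds integers in the sampled rows but decimals later will still be read as float64, and a column that is empty in the sample (inferred as float64) but contains text later would make the second read fail.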

>>> df = pd.read_csv(filename, nrows=100)
>>> df
   int_col  float1 string_col  float2
0        1     1.2          a     2.2
1        2     1.3          b     3.3
2        3     1.4          c     4.4

>>> df.info() 
<class 'pandas.core.frame.DataFrame'> 
Int64Index: 3 entries, 0 to 2 
Data columns (total 4 columns): 
int_col       3 non-null int64
float1        3 non-null float64
string_col    3 non-null object
float2        3 non-null float64
dtypes: float64(2), int64(1), object(1) 

>>> df32 = pd.read_csv(filename, engine='c', dtype={c: np.float32 for c in float_cols})
>>> df32.info() 
<class 'pandas.core.frame.DataFrame'> 
Int64Index: 3 entries, 0 to 2 
Data columns (total 4 columns): 
int_col       3 non-null int64
float1        3 non-null float32
string_col    3 non-null object
float2        3 non-null float32
dtypes: float32(2), int64(1), object(1)
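A simpler one-pass alternative (a different technique from this answer, shown only as a sketch) is to read with the default dtypes and downcast the float columns afterwards with `astype`. The trade-off is that the full float64 frame must fit in memory before the conversion, which is exactly what the question is trying to avoid; the two-pass approach parses straight to float32 instead.

```python
import io

import numpy as np
import pandas as pd

# In-memory stand-in for the CSV file
csv_text = "int_col,float1,string_col,float2\n1,1.2,a,2.2\n2,1.3,b,3.3\n3,1.4,c,4.4\n"

# Single read with default inference: float columns arrive as float64
df = pd.read_csv(io.StringIO(csv_text))
# Downcast every float64 column in one astype call
float_cols = df.select_dtypes(include="float64").columns
df = df.astype({c: np.float32 for c in float_cols})
print(df.dtypes)
```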


2015-05-28 00:17:08 Alexander
