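If I read just one chunk of the CSV, I get the data structure below. After concatenation, however, the category dtypes change to object/float64.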
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 100000 entries, (2015-11-01 00:00:00, 4980770) to (2016-06-01 00:00:00, 8850573)
Data columns (total 5 columns):
CHANNEL 100000 non-null category
MCC 92660 non-null category
DOMESTIC_FLAG 100000 non-null category
AMOUNT 100000 non-null float32
CNT 100000 non-null uint8
dtypes: category(3), float32(1), uint8(1)
memory usage: 1.9+ MB
If I read the whole CSV and concat the chunks as shown below, I get the following structure:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 30345312 entries, (2015-11-01 00:00:00, 4980770) to (2015-08-01 00:00:00, 88838)
Data columns (total 5 columns):
CHANNEL object
MCC float64
DOMESTIC_FLAG category
AMOUNT float32
CNT uint8
dtypes: category(1), float32(1), float64(1), object(1), uint8(1)
memory usage: 784.6+ MB
Why do the categorical variables change to object/float64? How can I avoid this type change, especially the change to float64?
This is the concatenation code:
df = pd.concat([process(chunk) for chunk in reader])
The process function just does some cleaning and type assignment.
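For context, `pd.concat` only keeps a column as category when every piece has exactly the same set of categories. Otherwise it falls back to the dtype of the category values, which would explain object for CHANNEL and float64 for MCC. Below is a minimal sketch of one possible workaround, assuming all chunks fit in memory as a list. The helper name `concat_keep_categories` and the sample values are hypothetical and not part of my actual code:

```python
import pandas as pd
from pandas.api.types import union_categoricals


def concat_keep_categories(chunks, cat_cols):
    """Hypothetical helper: concatenate chunks while keeping cat_cols categorical."""
    chunks = list(chunks)
    for col in cat_cols:
        # Collect every category seen for this column across all chunks.
        union = union_categoricals([c[col] for c in chunks]).categories
        # Give each chunk the same category set so concat can keep the dtype.
        chunks = [
            c.assign(**{col: c[col].cat.set_categories(union)}) for c in chunks
        ]
    return pd.concat(chunks)


# Two small made-up chunks whose categories differ, as can happen per CSV chunk.
a = pd.DataFrame({
    "CHANNEL": pd.Categorical(["ATM", "POS"]),
    "MCC": pd.Categorical([5411.0, None]),
})
b = pd.DataFrame({
    "CHANNEL": pd.Categorical(["WEB", "POS"]),
    "MCC": pd.Categorical([5812.0, 5411.0]),
})

naive = pd.concat([a, b])  # category dtype is lost here
fixed = concat_keep_categories([a, b], ["CHANNEL", "MCC"])

print(naive.dtypes)
print(fixed.dtypes)
```

With the real data this would presumably be called as `concat_keep_categories((process(chunk) for chunk in reader), ["CHANNEL", "MCC", "DOMESTIC_FLAG"])`. Missing values stay as `NaN` inside the categorical column rather than forcing a float64 fallback.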
Can you post the code you use to load and concatenate it? –
Categoricals also have problems with `NaN` sometimes –
Added to the text now – snovik