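Multiple Correspondence Analysis (MCA) in Python without binary categorical dummy variables
I am using the Python package mca for multiple correspondence analysis of several categorical variables. I am studying a set of geological data; here is a sample preview: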
Quartz Oxides Hematite Limonite Geothite Clay Soil_Type
1 2 3 4 1 0 A
2 1 4 3 0 1 B
3 4 2 1 4 0 A
4 3 1 2 0 3 C
0 2 3 4 1 2 D
1 0 2 4 3 4 C
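Legend: 0 - absent, 1 - present in very small (trace) amounts, 2 - present in small amounts, 3 - present in moderate amounts, 4 - present in abundant amounts.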
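To make the coding concrete, here is a minimal sketch (assuming pandas is installed) that rebuilds the six-row sample and stores the 0 to 4 abundance scale as ordered categories instead of plain integers. The names LEVELS and MINERALS and the short labels are hypothetical, my own wording of the legend, not part of the original code.

```python
import pandas as pd

# Hypothetical short labels for the 0-4 abundance scale in the legend
LEVELS = {0: "absent", 1: "trace", 2: "small", 3: "moderate", 4: "abundant"}
MINERALS = ["Quartz", "Oxides", "Hematite", "Limonite", "Geothite", "Clay"]

# The six-row sample preview from the question
geology = pd.DataFrame({
    "Quartz":   [1, 2, 3, 4, 0, 1],
    "Oxides":   [2, 1, 4, 3, 2, 0],
    "Hematite": [3, 4, 2, 1, 3, 2],
    "Limonite": [4, 3, 1, 2, 4, 4],
    "Geothite": [1, 0, 4, 0, 1, 3],
    "Clay":     [0, 1, 0, 3, 2, 4],
    "Soil_Type": list("ABACDC"),
})

# Ordered categorical type: absent < trace < small < moderate < abundant
abundance = pd.CategoricalDtype(categories=[LEVELS[k] for k in sorted(LEVELS)],
                                ordered=True)

# Map every integer code to its label and store it with the categorical type
coded = pd.DataFrame({m: geology[m].map(LEVELS).astype(abundance) for m in MINERALS})

print(coded.head(2))
```

With the columns stored as categories, pandas tools such as get_dummies treat them as categorical variables rather than as numbers.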
My code is as follows:
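import pandas as pd
import mca

# Load the data; missing readings are treated as 0 ("absent")
geology = pd.read_csv('geology_data.csv')
x = geology[['RigNumber','Quartz','Oxides','Hematite','Limonite','Geothite','Clay']].fillna(0)
y = geology[['Soil_Type']]
print('Dimensionality Reduction')
# MCA with the Benzecri correction
mca_ben = mca.mca(x)
print(mca_ben)
# MCA without the Benzecri correction
mca_ind = mca.mca(x, benzecri=False)
print(mca_ind)
print(mca.MCA.__doc__)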
But I get an error saying:
Traceback (most recent call last):
File "C:\Users\root\Desktop\Data\raw data\new raw\merged wit npt\multiclass without productive\parameter propagation\New Predict\clustering-mca.py", line 33, in <module>
mca_ben = mca.mca(x, ncols=31)
File "C:\Users\root\AppData\Roaming\Python\Python27\site-packages\mca.py", line 47, in __init__
self.D_r = numpy.diag(1/numpy.sqrt(self.r))
File "C:\Python27\lib\site-packages\numpy\lib\twodim_base.py", line 302, in diag
res = zeros((n, n), v.dtype)
MemoryError
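For context, the line that fails in the traceback, numpy.diag(1/numpy.sqrt(self.r)), builds a dense square matrix whose side equals the length of its input vector. Assuming self.r holds one row mass per row of x (as in standard correspondence analysis), that allocation grows with the square of the number of rows, which suggests the error may depend on the row count rather than on how the columns are coded. The sketch below uses plain NumPy with toy values; the helper dense_diag_bytes is hypothetical. It estimates that cost and checks that scaling rows by broadcasting gives the same product as multiplying by the diagonal matrix.

```python
import numpy as np


def dense_diag_bytes(n_rows, itemsize=8):
    """Bytes needed by numpy.diag(v) when v is a float64 vector of length n_rows."""
    return n_rows * n_rows * itemsize


# Toy row masses (hypothetical values) and a toy 3 x 2 matrix
r = np.array([0.25, 0.25, 0.5])
Z = np.arange(6, dtype=float).reshape(3, 2)

# numpy.diag turns a length-n vector into a full n x n matrix
D_r = np.diag(1 / np.sqrt(r))
print(D_r.shape)  # (3, 3)

# Left-multiplying by a diagonal matrix only scales each row,
# which broadcasting does without building the n x n matrix
scaled = (1 / np.sqrt(r))[:, np.newaxis] * Z
print(np.allclose(D_r.dot(Z), scaled))  # True

# Rough size of the dense diagonal for a few dataset sizes
for n in (1000, 10000, 50000):
    print(n, "rows ->", dense_diag_bytes(n) / 1e9, "GB")
```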
I suspect that MCA is limited to binary dummy variables.
I also tried
x = pd.get_dummies(x)
to make each dummy variable a separate column, but to no avail: I still get the same error.
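A likely reason pd.get_dummies(x) changed nothing is that, by default, it only expands columns with object, string or category dtype; integer-coded columns such as these abundance scores are passed through unchanged. A minimal sketch on the six-row sample (assuming pandas; MINERALS is an illustrative name) contrasting the default call with one that lists the columns explicitly through the columns= parameter:

```python
import pandas as pd

MINERALS = ["Quartz", "Oxides", "Hematite", "Limonite", "Geothite", "Clay"]

# The six-row sample preview from the question (mineral columns only)
x = pd.DataFrame({
    "Quartz":   [1, 2, 3, 4, 0, 1],
    "Oxides":   [2, 1, 4, 3, 2, 0],
    "Hematite": [3, 4, 2, 1, 3, 2],
    "Limonite": [4, 3, 1, 2, 4, 4],
    "Geothite": [1, 0, 4, 0, 1, 3],
    "Clay":     [0, 1, 0, 3, 2, 4],
})

# Integer columns are not encoded by default: the frame comes back as-is
unchanged = pd.get_dummies(x)
print(unchanged.shape)  # (6, 6)

# Listing the columns forces one 0/1 column per (mineral, code) pair,
# named like "Quartz_0", "Quartz_1", ...
indicator = pd.get_dummies(x, columns=MINERALS, dtype=int)
print(indicator.shape)  # (6, 27)
```

Each row of indicator contains exactly one 1 per mineral, i.e. the indicator (complete disjunctive) table that MCA operates on. Whether the mca package then needs extra arguments such as ncols for a pre-encoded table is not something this sketch checks.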
Note that I do not want to use PCA, for obvious reasons.
I also tried another Python package called prince, using the example I found in its documentation, but unfortunately I get an error there too:
Traceback (most recent call last):
File "C:\Users\root\Desktop\Data\raw data\new raw\merged wit npt\multiclass without productive\parameter propagation\New Predict\clustering-mca.py", line 14, in <module>
mca = prince.MCA(df, n_components=-1)
File "C:\Python27\lib\site-packages\prince\mca.py", line 42, in __init__
super(MCA, self).__init__(
TypeError: super() argument 1 must be type, not classobj
Any suggestions?