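I'm assuming your matrix is symmetric, so you can use nested loops to build a list of index labels and a list of the upper-triangular values. The inner loop, however, should start from the current value of the outer loop, so that each pair is visited only once.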
vals = []
idx = []
for i in range(df.shape[0]):
    for j in range(i, df.shape[1]):
        idx.append((df.index[i], df.columns[j]))
        vals.append(df.iat[i, j])
>>> pd.Series(vals, index=idx)
(Arnston, Arnston) 0
(Arnston, Berg) 1
(Arnston, Carlson) 2
(Berg, Berg) 0
(Berg, Carlson) 3
(Carlson, Carlson) 0
dtype: float64
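For anyone who wants to run the loop above end to end, here is a minimal self-contained sketch. The sample frame is an assumption reconstructed from the output shown (symmetric, zeros on the diagonal), and `upper_triangle_series` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown above.
names = ["Arnston", "Berg", "Carlson"]
df = pd.DataFrame(
    [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]],
    index=names,
    columns=names,
)


def upper_triangle_series(frame):
    """Collect the upper triangle (diagonal included) into a Series
    keyed by (row label, column label) tuples."""
    idx = []
    vals = []
    for i in range(frame.shape[0]):
        # Start at i so each unordered pair is visited only once.
        for j in range(i, frame.shape[1]):
            idx.append((frame.index[i], frame.columns[j]))
            vals.append(frame.iat[i, j])
    return pd.Series(vals, index=idx)


s = upper_triangle_series(df)
print(s)
```

To drop the diagonal as well, the inner loop could start at `i + 1` instead of `i`.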
To give some timing comparisons:
dfc = df.copy()
# Nested loop.
%%timeit
vals = []
idx = []
for i in range(dfc.shape[0]):
    for j in range(i, dfc.shape[1]):
        idx.append((dfc.index[i], dfc.columns[j]))
        vals.append(dfc.iat[i, j])
pd.Series(vals, index=idx)
1000 loops, best of 3: 187 µs per loop
# Melt.
%%timeit
df = dfc.reset_index()
df = pd.melt(df, id_vars=['index'])
df = df[df['index'] <= df['variable']].sort_values(by='value')
df['col'] = df['index'] + ',' + df['variable']
df = df[['col','value']]
df = df.set_index('col')
100 loops, best of 3: 3.39 ms per loop
The timings reverse when scaling up to a 100×100 symmetric matrix, where melt comes out ahead:
df = pd.DataFrame(np.random.randn(100, 100))
for i in range(df.shape[0]):
    df.iat[i, i] = 1
    for j in range(i + 1, df.shape[1]):
        df.iat[i, j] = df.iat[j, i]
df.columns = df.index = ['col_' + str(i) for i in range(100)]
dfc = df.copy()
# nested loop:
10 loops, best of 3: 55.2 ms per loop
# melt:
100 loops, best of 3: 5.72 ms per loop
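The `%%timeit` magic only works inside IPython/Jupyter. As a rough sketch, the same comparison could be reproduced in a plain script with the standard-library `timeit` module. The wrapper names `nested_loop` and `melt_version` are hypothetical, the symmetric matrix is built with NumPy rather than with the loop above (an assumption made for brevity), and absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Build a 100x100 symmetric matrix with ones on the diagonal.
# Assumption: the lower triangle is mirrored with NumPy for brevity.
rng = np.random.default_rng(0)
a = rng.standard_normal((100, 100))
sym = np.tril(a) + np.tril(a, -1).T
np.fill_diagonal(sym, 1.0)
labels = ["col_" + str(i) for i in range(100)]
dfc = pd.DataFrame(sym, index=labels, columns=labels)


def nested_loop(frame):
    """Upper triangle via explicit Python loops."""
    idx = []
    vals = []
    for i in range(frame.shape[0]):
        for j in range(i, frame.shape[1]):
            idx.append((frame.index[i], frame.columns[j]))
            vals.append(frame.iat[i, j])
    return pd.Series(vals, index=idx)


def melt_version(frame):
    """One entry per unordered label pair via pd.melt and a label comparison."""
    out = frame.reset_index()
    out = pd.melt(out, id_vars=["index"])
    out = out[out["index"] <= out["variable"]].sort_values(by="value")
    out["col"] = out["index"] + "," + out["variable"]
    return out[["col", "value"]].set_index("col")


n = 10  # number of repetitions per measurement
t_loop = timeit.timeit(lambda: nested_loop(dfc), number=n) / n
t_melt = timeit.timeit(lambda: melt_version(dfc), number=n) / n
print(f"nested loop: {t_loop * 1e3:.2f} ms per call")
print(f"melt:        {t_melt * 1e3:.2f} ms per call")
```

Note that the melt version compares labels as strings, so it keeps one entry per unordered pair but not necessarily the one above the diagonal in positional order; for a symmetric matrix the values are the same either way.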