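Note that your question was edited in such a way that this answer no longer answered it as written: the result had to be adjusted to give 1 for John in New York, even though he went there twice.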
Option 1 pir1
I like this answer because I think it's elegant.
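# One-hot encode both columns; the dot product counts visits per (customer, city) pair and clip caps each count at 1
pd.get_dummies(df.customer, dtype=int).T.dot(pd.get_dummies(df.visited_city, dtype=int)).clip(0, 1)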
       London  Melbourne  New_York  Paris
John        1          1         1      0
Mary        1          1         0      0
Peter       0          0         1      0
Steve       0          0         0      1
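To see why clip(0, 1) is needed, here is a minimal sketch on a small hypothetical df, reconstructed only so that it reproduces the table above (John visits New_York twice). It passes dtype=int to get_dummies; that is an assumption for newer pandas, where dummy columns default to bool (pandas 2.0 and later), so that the dot product is plain integer counting.

```python
import pandas as pd

# Hypothetical sample data, reconstructed to match the output above;
# John visits New_York twice.
df = pd.DataFrame({
    'customer': ['John', 'John', 'John', 'John', 'Mary', 'Mary', 'Peter', 'Steve'],
    'visited_city': ['London', 'Melbourne', 'New_York', 'New_York',
                     'London', 'Melbourne', 'New_York', 'Paris'],
})

# One-hot encode each column as 0/1 integers; rows are individual visits.
cust = pd.get_dummies(df.customer, dtype=int)      # visits x customers
city = pd.get_dummies(df.visited_city, dtype=int)  # visits x cities

# (customers x visits) dot (visits x cities) counts visits per pair.
counts = cust.T.dot(city)
print(counts.loc['John', 'New_York'])  # 2, because John went there twice

# clip(0, 1) turns the counts into a 0/1 "has visited" indicator.
visited = counts.clip(0, 1)
print(visited.loc['John', 'New_York'])  # 1
```

Without the clip, the result is a visit-count matrix rather than a binary one.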
Option 2 pir2
This answer should be fast.
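import numpy as np
import pandas as pd

# Integer codes and unique labels for each column
i, r = pd.factorize(df.customer.values)
j, c = pd.factorize(df.visited_city.values)
n, m = r.size, c.size
# Mark every (customer, city) pair; a repeated visit just writes 1 again
b = np.zeros((n, m), dtype=int)
b[i, j] = 1
pd.DataFrame(b, r, c).sort_index().sort_index(axis=1)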
       London  Melbourne  New_York  Paris
John        1          1         1      0
Mary        1          1         0      0
Peter       0          0         1      0
Steve       0          0         0      1
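A sketch of what the factorize step produces, using the same hypothetical sample frame (reconstructed to match the tables above). Codes are assigned in order of first appearance, which is why the option ends by sorting both axes. It uses .to_numpy() and len() in place of .values and .size; that substitution is my own choice for robustness across pandas versions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data matching the tables above.
df = pd.DataFrame({
    'customer': ['John', 'John', 'John', 'John', 'Mary', 'Mary', 'Peter', 'Steve'],
    'visited_city': ['London', 'Melbourne', 'New_York', 'New_York',
                     'London', 'Melbourne', 'New_York', 'Paris'],
})

# factorize returns (integer codes, unique labels in order of first appearance).
i, r = pd.factorize(df.customer.to_numpy())
j, c = pd.factorize(df.visited_city.to_numpy())
print(i)        # [0 0 0 0 1 1 2 3]
print(list(r))  # ['John', 'Mary', 'Peter', 'Steve']
print(j)        # [0 1 2 2 0 1 2 3]

# Fancy-index assignment writes 1 at each (row code, column code) pair.
# John/New_York appears twice, but writing 1 twice still leaves 1,
# so no clip is needed here.
b = np.zeros((len(r), len(c)), dtype=int)
b[i, j] = 1
out = pd.DataFrame(b, index=r, columns=c).sort_index().sort_index(axis=1)
print(out)
```

Because the matrix is filled by a single NumPy assignment, this approach avoids building the intermediate one-hot frames used by option 1.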
Option 3 pir3
Practical, pretty, and fast.
df.groupby(['customer', 'visited_city']).size().unstack(fill_value=0).clip(0, 1)
visited_city  London  Melbourne  New_York  Paris
customer
John               1          1         1      0
Mary               1          1         0      0
Peter              0          0         1      0
Steve              0          0         0      1
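A quick consistency check, assuming the same hypothetical sample frame: all three options should produce the same 0/1 matrix. Option 3 also carries the customer and visited_city axis names, so the comparison looks only at values and labels. The dtype=int argument and .to_numpy()/len() are my adjustments for newer pandas, not the answer's exact code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data matching the tables above.
df = pd.DataFrame({
    'customer': ['John', 'John', 'John', 'John', 'Mary', 'Mary', 'Peter', 'Steve'],
    'visited_city': ['London', 'Melbourne', 'New_York', 'New_York',
                     'London', 'Melbourne', 'New_York', 'Paris'],
})

def pir1(d):
    # Option 1: one-hot dot product, capped at 1
    return pd.get_dummies(d.customer, dtype=int).T.dot(
        pd.get_dummies(d.visited_city, dtype=int)).clip(0, 1)

def pir2(d):
    # Option 2: factorize and fill a NumPy array
    i, r = pd.factorize(d.customer.to_numpy())
    j, c = pd.factorize(d.visited_city.to_numpy())
    b = np.zeros((len(r), len(c)), dtype=int)
    b[i, j] = 1
    return pd.DataFrame(b, index=r, columns=c).sort_index().sort_index(axis=1)

def pir3(d):
    # Option 3: count per pair, pivot cities into columns, cap at 1
    return d.groupby(['customer', 'visited_city']).size().unstack(fill_value=0).clip(0, 1)

outs = [pir1(df), pir2(df), pir3(df)]
ref = outs[0]
# Compare values and labels only (ignores dtype and axis names).
same = all(
    o.values.tolist() == ref.values.tolist()
    and list(o.index) == list(ref.index)
    and list(o.columns) == list(ref.columns)
    for o in outs[1:]
)
print(same)
```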
Timing
Code below.
# Multiples of minimum time (index = number of copies of df; 1.0 = fastest in that row)
           pir1  pir2      pir3       wen       vai
10     1.392237   1.0  1.521555  4.337469  5.569029
30     1.445762   1.0  1.821047  5.977978  7.204843
100    1.679956   1.0  1.901502  6.685429  7.296454
300    1.568407   1.0  1.825047  5.556880  7.210672
1000   1.622137   1.0  1.613983  5.815970  5.396008
3000   1.808637   1.0  1.852953  4.159305  4.224724
10000  1.654354   1.0  1.502092  3.145032  2.950560
30000  1.555574   1.0  1.413612  2.404061  2.299856
from timeit import timeit

import numpy as np
import pandas as pd

wen = lambda d: d.pivot_table(index='customer', columns='visited_city', aggfunc=len, fill_value=0)
vai = lambda d: pd.crosstab(d.customer, d.visited_city)
pir1 = lambda d: pd.get_dummies(d.customer, dtype=int).T.dot(pd.get_dummies(d.visited_city, dtype=int)).clip(0, 1)
pir3 = lambda d: d.groupby(['customer', 'visited_city']).size().unstack(fill_value=0).clip(0, 1)

def pir2(d):
    i, r = pd.factorize(d.customer.values)
    j, c = pd.factorize(d.visited_city.values)
    n, m = r.size, c.size
    b = np.zeros((n, m), dtype=int)
    b[i, j] = 1
    return pd.DataFrame(b, r, c).sort_index().sort_index(axis=1)

# Rows: number of copies of df stacked together; columns: method names
results = pd.DataFrame(
    index=[10, 30, 100, 300, 1000, 3000, 10000, 30000],
    columns='pir1 pir2 pir3 wen vai'.split(),
    dtype=float
)

for i in results.index:
    d = pd.concat([df] * i, ignore_index=True)
    for j in results.columns:
        stmt = '{}(d)'.format(j)
        setp = 'from __main__ import d, {}'.format(j)
        results.at[i, j] = timeit(stmt, setp, number=10)

# Divide each row by its fastest time, so the winner in each row shows 1.0
print((lambda r: r.div(r.min(axis=1), axis=0))(results))
results.plot(loglog=True)
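The timing loop above relies on 'from __main__ import ...' setup strings, which only work when the code runs as a script. A sketch of the same measurement that passes a zero-argument callable straight to timeit (which accepts a callable as stmt), then applies the same row-wise normalization. The sample frame and the reduced grid of sizes are hypothetical choices to keep it quick; absolute and relative numbers will vary by machine and pandas version, and wen/vai are omitted for brevity.

```python
from timeit import timeit

import numpy as np
import pandas as pd

# Hypothetical sample frame matching the tables above.
df = pd.DataFrame({
    'customer': ['John', 'John', 'John', 'John', 'Mary', 'Mary', 'Peter', 'Steve'],
    'visited_city': ['London', 'Melbourne', 'New_York', 'New_York',
                     'London', 'Melbourne', 'New_York', 'Paris'],
})

def pir1(d):
    return pd.get_dummies(d.customer, dtype=int).T.dot(
        pd.get_dummies(d.visited_city, dtype=int)).clip(0, 1)

def pir2(d):
    i, r = pd.factorize(d.customer.to_numpy())
    j, c = pd.factorize(d.visited_city.to_numpy())
    b = np.zeros((len(r), len(c)), dtype=int)
    b[i, j] = 1
    return pd.DataFrame(b, index=r, columns=c).sort_index().sort_index(axis=1)

def pir3(d):
    return d.groupby(['customer', 'visited_city']).size().unstack(fill_value=0).clip(0, 1)

funcs = {'pir1': pir1, 'pir2': pir2, 'pir3': pir3}
sizes = [10, 100]  # hypothetical, much smaller grid than the answer's

results = pd.DataFrame(index=sizes, columns=list(funcs), dtype=float)
for n in sizes:
    d = pd.concat([df] * n, ignore_index=True)
    for name, f in funcs.items():
        # timeit accepts a zero-argument callable, so no setup string is needed;
        # the lambda is called immediately, so late binding of f and d is harmless
        results.at[n, name] = timeit(lambda: f(d), number=10)

# Divide each row by its fastest time: the winner in each row reads 1.0
relative = results.div(results.min(axis=1), axis=0)
print(relative)
```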
I missed this one. Interesting use of clip. +1 – Vaishali
Thanks @Vaishali – piRSquared
Fantastic answer, thanks @piRSquared!!! – cwl