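I've run into the following problem: I need to find the earliest occurrence, that is, the first time a user clicked on an email (variable sending), and put a 1 in the corresponding row when that happens.
The dataset has several thousand users (hash) clicking on emails that are part of a newsletter. I tried grouping by sending and hash and then finding the earliest date, but couldn't make it work.
So I went for a nasty little solution, which however returns something strange:
My dataset (relevant variables):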
>>> clicks[['datetime','hash','sending']].head()
datetime hash sending
0 2016-11-01 19:13:34 0b1f4745df5925dfb1c8f53a56c43995 5
1 2016-11-01 10:47:14 0a73d5953ebf5826fbb7f3935bad026d 5
2 2016-10-31 19:09:21 605cebbabe0ba1b4248b3c54c280b477 5
3 2016-10-31 13:42:36 d26d61fb10c834292803b247a05b6cb7 5
4 2016-10-31 10:46:30 48f8ab83e8790d80af628e391f3325ad 5
There are 6 sending rounds, and datetime is of type datetime64[ns].
I do it the following way:
clicks['first'] = 0
for hash in clicks['hash'].unique():
    t = clicks.ix[clicks.hash==hash, ['hash','datetime','sending']]
    part = t['sending'].unique()
    for i in part:
        temp = t.ix[t.sending == i,'datetime']
        clicks.ix[t[t.datetime == np.min(temp)].index.values,'first']=1
First, I don't think this is very Pythonic, and it is quite slow. But mainly, it returns a strange type! There are 0.0 and 1.0 values, but I can't work with them:
>>> type(clicks.first)
<type 'instancemethod'>
>>> clicks.loc[clicks.first==1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/air/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1296, in __getitem__
return self._getitem_axis(key, axis=0)
File "/Users/air/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1467, in _getitem_axis
return self._get_label(key, axis=axis)
File "/Users/air/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 93, in _get_label
return self.obj._xs(label, axis=axis)
File "/Users/air/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 1749, in xs
loc = self.index.get_loc(key)
File "/Users/air/anaconda/lib/python2.7/site-packages/pandas/indexes/base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 156, in pandas.index.IndexEngine.get_loc (pandas/index.c:3977)
File "pandas/index.pyx", line 373, in pandas.index.Int64Engine._check_type (pandas/index.c:7634)
KeyError: False
So any ideas, please? Thanks a lot!
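One possible way to express the group-then-find-earliest idea without the nested loops is groupby with transform. This is only a minimal sketch: the sample frame below is made up (shortened hashes, invented timestamps) and merely mimics the shape of the data shown above. It assumes that a row counts as a first click when its datetime equals the minimum datetime within its (hash, sending) group.

```python
import pandas as pd

# Hypothetical sample in the shape of the question's data (hashes shortened).
clicks = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2016-11-01 19:13:34",
        "2016-11-01 10:47:14",
        "2016-10-31 19:09:21",
        "2016-11-02 08:00:00",
        "2016-10-30 07:30:00",
    ]),
    "hash": ["a", "a", "b", "a", "b"],
    "sending": [5, 5, 5, 6, 5],
})

# Earliest click time per (hash, sending), broadcast back onto every row.
earliest = clicks.groupby(["hash", "sending"])["datetime"].transform("min")

# 1 where the row is the earliest click of its group, otherwise 0.
clicks["first"] = (clicks["datetime"] == earliest).astype(int)

print(clicks)
```

If two clicks in the same group share the exact same minimum timestamp, both rows would be flagged with this approach.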
----- UPDATE:------
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
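Wowza, thanks! I tried a lambda but couldn't get it to work; I didn't know how to select the minimum from it. So this looks good, but I still can't subset it and get the same error, although 'clicks.first' is finally an integer. Do you know why? –
Maybe the problem is duplicated minimum values. Does it work fine on the sample but not on the real data? – jezrael
There can't be duplicates for each 'hash' and 'sending'. The error for the subset says: 'TypeError: cannot do positional indexing on with these indexers [False]' so it looks like it is no longer a 'DataFrame' –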
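A likely explanation for the KeyError: False and the instancemethod type shown in the question: the column name first collides with the DataFrame method of the same name in pandas 0.18.1, so clicks.first resolves to the method rather than to the column. In Python 2, comparing that method with 1 gives a plain False, which .loc then tries to look up as a label. The sketch below, on a made-up three-row frame, uses bracket access instead; the alternative column name is_first is hypothetical and only illustrates that a non-colliding name also works with attribute access.

```python
import pandas as pd

# Hypothetical frame that already carries the 0/1 flag column.
clicks = pd.DataFrame({"hash": ["a", "a", "b"], "first": [0, 1, 1]})

# Bracket access always returns the column itself as a Series.
flags = clicks["first"]

# A boolean Series built from the column works as a row filter.
first_clicks = clicks.loc[clicks["first"] == 1]

# Alternative: pick a column name that does not shadow a DataFrame attribute.
renamed = clicks.rename(columns={"first": "is_first"})
via_attr = renamed.loc[renamed.is_first == 1]

print(first_clicks)
```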