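TypeError: cannot concatenate 'str' and 'float' objects (pandas)
I am trying to solve a data science problem with pandas. My dataset has the following columns: 'country', 'conversion', 'test', 'user_id', and so on. The 'country' column contains about 10 countries. The 'test' column takes the values 0 and 1 for the two test types: 0 for control and 1 for experiment. 'conversion' also takes the values 0 and 1, indicating whether the person converted.
I want to group by country and, for each group, compute the p-value and the mean conversion for test == 0 and test == 1. I tried the function below, but it raises "TypeError: cannot concatenate 'str' and 'float' objects". Can someone explain why?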
def f(x):
    control = x.loc[(x.test==0)]
    test = x.loc[(x.test==1)]
    p_value = stats.ttest_ind(control,test)[0]
    control_mean = control['conversion'].mean()
    test_mean = test['conversion'].mean()
    return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean})

bycountry = data1.groupby('country').apply(f)
bycountry = bycountry.reset_index(level='None')
bycountry
Full error message:
TypeError Traceback (most recent call last)
<ipython-input-495-bd6227878520> in <module>()
7 return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean})
8
----> 9 bycountry = data1.groupby("country").apply(f)
10 bycountry = bycountry.reset_index(level='None')
11 bycountry
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in apply(self, func, *args, **kwargs)
649 # ignore SettingWithCopy here in case the user mutates
650 with option_context('mode.chained_assignment', None):
--> 651 return self._python_apply_general(f)
652
653 def _python_apply_general(self, f):
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in _python_apply_general(self, f)
653 def _python_apply_general(self, f):
654 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 655 self.axis)
656
657 return self._wrap_applied_output(
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in apply(self, f, data, axis)
1525 # group might be modified
1526 group_axes = _get_axes(group)
-> 1527 res = f(group)
1528 if not _is_indexed_like(res, group_axes):
1529 mutated = True
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in f(g)
645 @wraps(func)
646 def f(g):
--> 647 return func(g, *args, **kwargs)
648
649 # ignore SettingWithCopy here in case the user mutates
<ipython-input-495-bd6227878520> in f(x)
2 control = x.loc[(x.test==0)]
3 test = x.loc[(x.test==1)]
----> 4 p_value = stats.ttest_ind(control,test)[0]
5 control_mean = control['conversion'].mean()
6 test_mean = test['conversion'].mean()
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\scipy\stats\stats.pyc in ttest_ind(a, b, axis, equal_var, nan_policy)
3865 return Ttest_indResult(np.nan, np.nan)
3866
-> 3867 v1 = np.var(a, axis, ddof=1)
3868 v2 = np.var(b, axis, ddof=1)
3869 n1 = a.shape[axis]
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\numpy\core\fromnumeric.pyc in var(a, axis, dtype, out, ddof, keepdims)
3098
3099 return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
-> 3100 keepdims=keepdims)
C:\Users\SnehaPriya\Anaconda2\lib\site-packages\numpy\core\_methods.pyc in _var(a, axis, dtype, out, ddof, keepdims)
89 # Note that if dtype is not of inexact type then arraymean will
90 # not be either.
---> 91 arrmean = umr_sum(arr, axis, dtype, keepdims=True)
92 if isinstance(arrmean, mu.ndarray):
93 arrmean = um.true_divide(
TypeError: cannot concatenate 'str' and 'float' objects
Output of df.dtypes:
user_id int64
date datetime64[ns]
source object
device object
browser_language object
ads_channel object
browser object
conversion int64
test int64
sex object
age float64
country object
dtype: object
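The dtypes listing shows several object columns (source, device, sex, country and so on). A missing value in such a column is stored as a float NaN next to the strings, so one way to see whether a column really mixes str and float is to list the Python types it holds. The sketch below is only an illustration: the frame `df`, its sample values and the helper `mixed_type_columns` are made up for this example and are not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data; values are invented for illustration.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "sex": pd.Series(["M", np.nan, "F", "M"], dtype=object),  # NaN is a float
    "age": [31.0, 25.0, np.nan, 40.0],
})

def mixed_type_columns(frame):
    """Return {column: set of Python type names} for object columns
    that hold more than one Python type."""
    report = {}
    for col in frame.columns:
        if pd.api.types.is_object_dtype(frame[col]):
            kinds = set(type(value).__name__ for value in frame[col])
            if len(kinds) > 1:
                report[col] = kinds
    return report

mixed = mixed_type_columns(df)
print(mixed)
```

Any column reported here contains both strings and floats, which is the combination named in the TypeError.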
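Please post the full stack trace. –
I suspect what is happening is that you have an 'object'-dtype column that mixes 'float' and 'string' values. –
@juanpa.arrivillaga: I have posted the full error message. – Gingerbread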
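For what it is worth, the traceback points at `stats.ttest_ind(control, test)`, where `control` and `test` are whole DataFrame slices that still contain the object columns, so NumPy ends up trying to sum strings and floats. A possible fix, sketched below under that assumption, is to pass only the numeric 'conversion' column to the test. Note also that element `[0]` of the result is the t statistic; the p-value is the second element (`.pvalue`). The sample data and the name `summarize` are hypothetical, and `reset_index()` is used without `level='None'`, which looks like it was meant to be the default.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data with the same column names as in the question.
rng = np.random.default_rng(0)
n = 400
data1 = pd.DataFrame({
    "country": rng.choice(["US", "MX", "FR"], size=n),
    "test": rng.integers(0, 2, size=n),
    "conversion": rng.integers(0, 2, size=n),
    "source": rng.choice(["ads", "seo"], size=n),  # object column, not used in the test
})

def summarize(group):
    # Keep only the numeric 'conversion' values for each arm.
    control = group.loc[group["test"] == 0, "conversion"]
    treated = group.loc[group["test"] == 1, "conversion"]
    result = stats.ttest_ind(control, treated)
    return pd.Series({
        "p_value": result.pvalue,  # second element of the result, not [0]
        "conversion_test": treated.mean(),
        "conversion_control": control.mean(),
    })

# Selecting the needed columns keeps the string columns out of the function.
bycountry = (
    data1.groupby("country")[["test", "conversion"]]
    .apply(summarize)
    .reset_index()
)
print(bycountry)
```

This gives one row per country with the p-value and the two conversion means. Whether a t-test is the right test for 0/1 conversion data is a separate question that this sketch does not address.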