2016-12-15 65 views

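TypeError: cannot concatenate 'str' and 'float' objects: pandas

I am trying to use pandas to solve a data science problem. My dataset contains the following columns: 'country', 'conversion', 'test', 'user_id', and so on. The 'country' column holds about 10 countries. The 'test' column takes the values 0 and 1 for the two arms of the test: 0 for control and 1 for experiment. 'conversion' also takes the values 0 and 1, indicating whether the person converted.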

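I want to group by country and, for each group, compute the p-value and the mean conversion for test == 0 and test == 1. I tried the function below, but it throws "TypeError: cannot concatenate 'str' and 'float' objects". Can someone clarify what is going on?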

def f(x): 
     control = x.loc[(x.test==0)] 
     test = x.loc[(x.test==1)] 
     p_value = stats.ttest_ind(control,test)[0] 
     control_mean = control['conversion'].mean() 
     test_mean = test['conversion'].mean() 
     return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean})  

bycountry = data1.groupby('country').apply(f) 
bycountry = bycountry.reset_index(level='None') 
bycountry 

Full error message:

TypeError         Traceback (most recent call last) 
<ipython-input-495-bd6227878520> in <module>() 
     7  return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean}) 
     8 
----> 9 bycountry = data1.groupby("country").apply(f) 
    10 bycountry = bycountry.reset_index(level='None') 
    11 bycountry 

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in apply(self, func, *args, **kwargs) 
    649   # ignore SettingWithCopy here in case the user mutates 
    650   with option_context('mode.chained_assignment', None): 
--> 651    return self._python_apply_general(f) 
    652 
    653  def _python_apply_general(self, f): 

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in _python_apply_general(self, f) 
    653  def _python_apply_general(self, f): 
    654   keys, values, mutated = self.grouper.apply(f, self._selected_obj, 
--> 655             self.axis) 
    656 
    657   return self._wrap_applied_output(

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in apply(self, f, data, axis) 
    1525    # group might be modified 
    1526    group_axes = _get_axes(group) 
-> 1527    res = f(group) 
    1528    if not _is_indexed_like(res, group_axes): 
    1529     mutated = True 

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in f(g) 
    645   @wraps(func) 
    646   def f(g): 
--> 647    return func(g, *args, **kwargs) 
    648 
    649   # ignore SettingWithCopy here in case the user mutates 

<ipython-input-495-bd6227878520> in f(x) 
     2  control = x.loc[(x.test==0)] 
     3  test = x.loc[(x.test==1)] 
----> 4  p_value = stats.ttest_ind(control,test)[0] 
     5  control_mean = control['conversion'].mean() 
     6  test_mean = test['conversion'].mean() 

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\scipy\stats\stats.pyc in ttest_ind(a, b, axis, equal_var, nan_policy) 
    3865   return Ttest_indResult(np.nan, np.nan) 
    3866 
-> 3867  v1 = np.var(a, axis, ddof=1) 
    3868  v2 = np.var(b, axis, ddof=1) 
    3869  n1 = a.shape[axis] 

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\numpy\core\fromnumeric.pyc in var(a, axis, dtype, out, ddof, keepdims) 
    3098 
    3099  return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof, 
-> 3100       keepdims=keepdims) 

C:\Users\SnehaPriya\Anaconda2\lib\site-packages\numpy\core\_methods.pyc in _var(a, axis, dtype, out, ddof, keepdims) 
    89  # Note that if dtype is not of inexact type then arraymean will 
    90  # not be either. 
---> 91  arrmean = umr_sum(arr, axis, dtype, keepdims=True) 
    92  if isinstance(arrmean, mu.ndarray): 
    93   arrmean = um.true_divide(

TypeError: cannot concatenate 'str' and 'float' objects 

Output of df.dtypes:

user_id      int64 
date    datetime64[ns] 
source      object 
device      object 
browser_language   object 
ads_channel     object 
browser      object 
conversion     int64 
test       int64 
sex       object 
age      float64 
country      object 
dtype: object 
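
The dtypes above show several object columns sitting next to the numeric ones, and the function in the question passes whole sub-frames to stats.ttest_ind. As a minimal sketch of how one might check which columns are non-numeric, here is a made-up miniature frame: the column names are borrowed from the question, while the values are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature of the question's data; all values are invented.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "country": ["Spain", "Spain", "Mexico", "Mexico"],
    "sex": ["M", "F", "F", "M"],
    "age": [31.0, 25.0, 40.0, 22.0],
    "conversion": [0, 1, 0, 1],
    "test": [0, 1, 0, 1],
})

# Columns that are not numeric and so cannot take part in a variance computation.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# The numeric subset that is safe to hand to numerical routines.
numeric_only = df.select_dtypes(include="number")
print(numeric_only.columns.tolist())
```

Any column listed in non_numeric would need to be dropped, or a single numeric column selected, before the data reaches a statistical function.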
Post the full stack trace.

I suspect what is happening is that you have a column of 'object' dtype with a mix of 'float' and 'string' values.

@juanpa.arrivillaga: I have posted the full error message. – Gingerbread

Answer

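import pandas as pd
from scipy import stats

def f(x):
    # Keep only the numeric 'conversion' column for each arm, so the
    # string columns never reach stats.ttest_ind.
    control = x.loc[(x.test==0)]
    control = control['conversion']
    test = x.loc[(x.test==1)]
    test = test['conversion']
    # ttest_ind returns (statistic, pvalue); index 1 is the p-value.
    p_value = stats.ttest_ind(control,test)[1]
    control_mean = control.mean()
    test_mean = test.mean()
    return pd.Series({'p_value': p_value, 'conversion_test': test_mean, 'conversion_control': control_mean})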

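This worked: stats.ttest_ind now receives only the numeric 'conversion' column instead of the whole sub-frame with its string columns. Thanks again, @juanpa.arrivillaga!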
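
For anyone who wants to try the approach end to end, here is a rough, self-contained sketch on synthetic data. The frame data1 below is a randomly generated stand-in, not the real dataset; the helper name summarize and the two country labels are assumptions made for illustration. It selects the 'test' and 'conversion' columns before apply so that no string column is ever handed to the t-test.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for data1: column names follow the question,
# values are randomly generated and purely illustrative.
rng = np.random.RandomState(0)
n = 400
data1 = pd.DataFrame({
    "country": rng.choice(["Spain", "Mexico"], size=n),
    "test": rng.randint(0, 2, size=n),
    "conversion": rng.randint(0, 2, size=n),
    "sex": rng.choice(["M", "F"], size=n),  # string column, as in the question
})

def summarize(group):
    # Pick the numeric 'conversion' values for each arm of the test.
    control = group.loc[group["test"] == 0, "conversion"]
    treated = group.loc[group["test"] == 1, "conversion"]
    result = stats.ttest_ind(control, treated)
    return pd.Series({
        "p_value": result.pvalue,  # the p-value, not the t-statistic
        "conversion_test": treated.mean(),
        "conversion_control": control.mean(),
    })

# Restrict each group to the columns the function needs before applying it.
bycountry = data1.groupby("country")[["test", "conversion"]].apply(summarize)
print(bycountry)
```

The result has one row per country, indexed by the country name; calling bycountry.reset_index() would turn that index back into an ordinary column.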
