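Combining pandas groupby and transform on a multi-index DataFrame
I'm fairly new to Python pandas, and I'm having trouble getting pandas GroupBy combined with transform to behave the way I want. I haven't been able to find an existing answer, but I may have missed something.
I have a DataFrame with a large number of entries, structured like this: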
GLT_City = pd.read_csv('GlobalLandTemperaturesByCity.csv', sep=',')
GLT_City.head()
AvgTemp AvgTempUncert City Country Lat Long year month day
0 6.068 1.737 Århus Denmark 57.05N 10.33E 1743 11 01
5 5.788 3.624 Århus Denmark 57.05N 10.33E 1744 04 01
6 10.644 1.283 Århus Denmark 57.05N 10.33E 1744 05 01
7 14.051 1.347 Århus Denmark 57.05N 10.33E 1744 06 01
8 16.082 1.396 Århus Denmark 57.05N 10.33E 1744 07 01
10 12.781 1.454 Århus Denmark 57.05N 10.33E 1744 09 01
11 7.950 1.630 Århus Denmark 57.05N 10.33E 1744 10 01
12 4.639 1.302 Århus Denmark 57.05N 10.33E 1744 11 01
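I want to compute the weighted average temperature for each city and each month, and add it as a new column to my original DataFrame as smoothly as possible, using transform(), for reasons that come up further down the line.
First, I define a function to compute the weighted average: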
def wavg(group, data_name, weight_name, sigma=None):
    data = group[data_name]
    weight = group[weight_name]
    # Check whether we have actual weights or measurement uncertainties
    if sigma == 'sigma':
        weight = 1./weight
    try:
        return (data * weight).sum()/weight.sum()
    except ZeroDivisionError:
        return data.mean()
Then I want to combine GroupBy and transform() to apply this function to my DataFrame and add the result as a new column, like so:
GLT_City['WeightedMonthlyMean'] = GLT_City.groupby(['City','month']).transform(wavg, 'AvgTemp','AvgTempUncert', sigma='sigma')
This results in a very long error message, pasted below:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:14010)()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-61-cef679f52b5f> in <module>()
----> 1 GLT_City['WeightedMonthlyMean'] = GLT_City.groupby(['City','month']).transform(wavg,
'AvgTemp','AvgTemp', sigma='sigma')
~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/groupby.py in transform(self, func, *args, **kwargs)
3814 result = getattr(self, func)(*args, **kwargs)
3815 else:
-> 3816 return self._transform_general(func, *args, **kwargs)
3817
3818 # a reduction transform
~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/groupby.py in _transform_general(self, func, *args, **kwargs)
3765 # Try slow path and fast path.
3766 try:
-> 3767 path, res = self._choose_path(fast_path, slow_path, group)
3768 except TypeError:
3769 return self._transform_item_by_item(obj, fast_path)
~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/groupby.py in _choose_path(self, fast_path, slow_path, group)
3861 def _choose_path(self, fast_path, slow_path, group):
3862 path = slow_path
-> 3863 res = slow_path(group)
3864
3865 # if we make it here, test if we can use the fast path
~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/groupby.py in <lambda>(group)
3856 fast_path = lambda group: func(group, *args, **kwargs)
3857 slow_path = lambda group: group.apply(
-> 3858 lambda x: func(x, *args, **kwargs), axis=self.axis)
3859 return fast_path, slow_path
3860
~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4260 f, axis,
4261 reduce=reduce,
-> 4262 ignore_failures=ignore_failures)
4263 else:
4264 return self._apply_broadcast(f, axis)
~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
4356 try:
4357 for i, v in enumerate(series_gen):
-> 4358 results[i] = func(v)
4359 keys.append(v.name)
4360 except Exception as e:
~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/groupby.py in <lambda>(x)
3856 fast_path = lambda group: func(group, *args, **kwargs)
3857 slow_path = lambda group: group.apply(
-> 3858 lambda x: func(x, *args, **kwargs), axis=self.axis)
3859 return fast_path, slow_path
3860
<ipython-input-58-181ef4bb1f30> in wavg(group, data_name, weight_name, sigma)
10
11 #Extracting data and weights.
---> 12 data = group[data_name]
13 weight = group[weight_name]
14 #Check whether we have actual weights, or measurement uncertainties
~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
599 key = com._apply_if_callable(key, self)
600 try:
--> 601 result = self.index.get_value(self, key)
602
603 if not is_scalar(result):
~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
2475 try:
2476 return self._engine.get_value(s, k,
-> 2477 tz=getattr(series.dtype, 'tz', None))
2478 except KeyError as e1:
2479 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4404)()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4087)()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5210)()
KeyError: ('AvgTemp', 'occurred at index AvgTemp')
So this clearly doesn't work, but it's not clear to me why. Any pointers or solutions would be most welcome.
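One reading of the traceback, offered as a hint rather than a definitive diagnosis: the failing frames go through DataFrame.apply and then Series.__getitem__, which suggests wavg is being called with a single column (a Series) rather than with the whole group. In that case group[data_name] becomes a label lookup in the Series index instead of a column selection. The small sketch below reproduces only that last step; the Series is a stand-in built from the sample values above, not the actual object pandas passes in.

```python
import pandas as pd

# Stand-in for one column of a group (values taken from the sample above).
column = pd.Series([6.068, 5.788, 10.644], name="AvgTemp")

try:
    # On a Series this looks for the label 'AvgTemp' in the index
    # (0, 1, 2 here); it does not select a column.
    column["AvgTemp"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(lookup_failed)
```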
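I can use the apply() method to get the desired output, but since I'm averaging over groups, I can't really merge it with the original DataFrame, because the Series produced by apply() has a different size.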
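For reference, here is a minimal sketch of one way to get a result with the same length as the original DataFrame, assuming the weights are always the inverse uncertainties (the sigma='sigma' case). It precomputes the weighted values as ordinary columns and uses transform('sum') on each, so every row receives its group's total. The toy data and the helper column names `_w` and `_wx` are made up for illustration and are not part of the original data set.

```python
import pandas as pd

# Toy data with the same column names as in the question (values are invented).
df = pd.DataFrame({
    "City": ["A", "A", "A", "B", "B"],
    "month": [1, 1, 2, 1, 1],
    "AvgTemp": [2.0, 4.0, 10.0, 1.0, 3.0],
    "AvgTempUncert": [1.0, 1.0, 2.0, 0.5, 0.5],
})

keys = ["City", "month"]

# Hypothetical helper columns: weight = 1 / uncertainty, and weight * value.
df["_w"] = 1.0 / df["AvgTempUncert"]
df["_wx"] = df["AvgTemp"] * df["_w"]

# transform('sum') returns one value per original row, aligned on the index,
# so the ratio is the group's weighted mean repeated for every row of the group.
grouped = df.groupby(keys)
df["WeightedMonthlyMean"] = (
    grouped["_wx"].transform("sum") / grouped["_w"].transform("sum")
)

# Drop the helper columns again.
df = df.drop(columns=["_w", "_wx"])
print(df)
```

This sidesteps passing a multi-column function to transform() altogether. Groups whose weights sum to zero would yield NaN or inf rather than raising ZeroDivisionError, so the fallback to a plain mean in wavg would need separate handling.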