2017-10-08

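ValueError: Buffer has wrong number of dimensions (expected 1, got 2) in if statement

I'm trying to use an "if" statement inside a "for" loop. It should check whether the index of the current item in the loop (its index in the pandas Series that contains it) matches one of the indices of another Series. Doing so raises a ValueError. Here is the line that causes the problem: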

if(ICM_items[ICM_items['track_id'] == i].index[0] in ICM_tgt_items.index.values.flatten().tolist()): 

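I tried replacing either side of the "in" statement with a random integer or a list, and that works. Both objects are also built correctly, but combining them in this statement raises the error.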

I hope someone can give me a hint about where the problem is, or suggest an alternative way to perform the same task.

ICM_items and ICM_tgt_items are both pandas.Series.

Here is the console error:

Traceback (most recent call last):
  File "/Users/LucaButera/git/rschallenge/similarity_to_recommandable_builder.py", line 27, in <module>
    dot[ICM_tgt_items[ICM_items[ICM_items['track_id'] == i].index[0]]] = 0
  File "/Users/LucaButera/anaconda/lib/python3.6/site-packages/pandas/core/series.py", line 603, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/LucaButera/anaconda/lib/python3.6/site-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas/index.c:3557)
  File "pandas/index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas/index.c:3240)
  File "pandas/index.pyx", line 147, in pandas.index.IndexEngine.get_loc (pandas/index.c:4194)
  File "pandas/index.pyx", line 280, in pandas.index.IndexEngine._ensure_mapping_populated (pandas/index.c:6150)
  File "pandas/src/hashtable_class_helper.pxi", line 446, in pandas.hashtable.Int64HashTable.map_locations (pandas/hashtable.c:9261)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
[Finished in 1.26s]

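The setup of your question is not very clear. It would help if you provided representative sample data; see [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve). Also, are you sure you only want to look at the first index of ICM_items matching 'track_id == i'? What happens if multiple indices are returned? –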
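To illustrate the point about multiple matches, here is a minimal sketch with hypothetical toy data (the names items and targets are made up, not from the question). A boolean selection can return several index labels, and .index[0] silently tests only the first one, whereas Index.isin tests all of them.

```python
import pandas as pd

# Hypothetical toy data: track_id 1 occurs under two labels, "C" and "A".
items = pd.DataFrame({"track_id": [1, 1, 2]}, index=["C", "A", "A"])
targets = pd.Series([0.5, 0.7], index=["A", "B"])

# All index labels whose track_id is 1, not just the first one.
matching_labels = items.loc[items["track_id"] == 1].index

print(list(matching_labels))                         # ['C', 'A']
print(matching_labels[0] in targets.index)           # False: only 'C' is tested
print(matching_labels.isin(targets.index).tolist())  # [False, True]: every label is tested
print(matching_labels.isin(targets.index).any())     # True: at least one label matches
```

Whether "first label only" or "any label" is the right rule depends on what the original loop is meant to do.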

Answer


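I would suggest simplifying the expression, using .loc, and keeping an eye on edge cases (for example, when no row has track_id equal to a given i, so the selection is empty).
With proper test data, these steps should help you narrow down where the error comes from.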
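One way to keep an eye on the empty-selection edge case without relying on an exception is to check DataFrame.empty before taking .index[0]. The sketch below uses a hypothetical helper, first_label_in_targets, and made-up toy data. It is an alternative to the try/except used further down, not part of the original answer.

```python
import pandas as pd


def first_label_in_targets(items, targets, track_id):
    """Hypothetical helper: None if no row has this track_id, otherwise
    whether the first matching row's index label appears in targets.index."""
    matches = items.loc[items["track_id"] == track_id]
    if matches.empty:  # edge case: track_id never occurs, so .index[0] would raise IndexError
        return None
    return matches.index[0] in targets.index


# Hypothetical toy data
items = pd.DataFrame({"track_id": [0, 1]}, index=["B", "C"])
targets = pd.Series([0.1, 0.2], index=["A", "B"])

for i in range(3):
    print(i, first_label_in_targets(items, targets, i))
# 0 True
# 1 False
# 2 None
```

Returning None keeps "track_id not present" distinguishable from "present but not found".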

ICM_items data:

import numpy as np 
import pandas as pd 

N = 7 
max_track_id = 5 
idx1 = ['A','B','C'] 
icm_idx = np.random.choice(idx1, size=N) 
icm = {"track_id":np.random.randint(0, max_track_id, size=N)} 
ICM_items = pd.DataFrame(icm, index=icm_idx) 

ICM_items 
    track_id 
C   1 
A   1 
A   2 
C   1 
B   0 
B   0 
B   2 

ICM_tgt_items data:

idx2 = ['A','B'] 
icm_tgt_idx = np.random.choice(idx2, size=N) 
icm = np.random.random(size=N) 
ICM_tgt_items = pd.DataFrame(icm, index=icm_tgt_idx) 

      0 
B 0.785614 
A 0.976523 
A 0.856821 
B 0.098086 
B 0.481140 
A 0.686156 
A 0.851714 

Now simply compare, and catch the potential edge cases:

for i in range(max_track_id):
    mask = ICM_items['track_id'] == i
    try:
        # use .loc for indexing, no need to flatten() or use .values on the right.
        if ICM_items.loc[mask].index[0] in ICM_tgt_items.index:
            print("found")
        else:
            print("not found")
    # catch error if i not found in track_id
    except IndexError as e:
        print(f"ERROR at i={i}: {e}")

Output:

found 
not found 
found 
ERROR at i=3: index 0 is out of bounds for axis 0 with size 0 
ERROR at i=4: index 0 is out of bounds for axis 0 with size 0 
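Since the random data above has no seed, rerunning it gives different frames. The sketch below hard-codes the two frames exactly as printed and replaces the loop with a vectorized check. It uses groupby(...).first() to get the first index label per track_id and then Series.isin against ICM_tgt_items.index. This is a swapped-in technique, not the answer's method.

```python
import pandas as pd

# The two frames printed above, hard-coded so the result is reproducible.
ICM_items = pd.DataFrame({"track_id": [1, 1, 2, 1, 0, 0, 2]},
                         index=["C", "A", "A", "C", "B", "B", "B"])
ICM_tgt_items = pd.DataFrame({0: [0.785614, 0.976523, 0.856821, 0.098086,
                                  0.481140, 0.686156, 0.851714]},
                             index=["B", "A", "A", "B", "B", "A", "A"])

# reset_index() turns the unnamed index into a column called "index";
# first() then keeps the first label seen for each track_id, in row order.
first_label = ICM_items.reset_index().groupby("track_id")["index"].first()

# One membership test for all track_ids instead of a Python-level loop.
found = first_label.isin(ICM_tgt_items.index)
print(found.to_dict())  # {0: True, 1: False, 2: True}
```

This matches the loop's found / not found / found. Track ids 3 and 4 simply do not appear in the result, which plays the role of the IndexError branch.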