With ordering criteria

In a previous question about merging DataFrames, I asked how to match values from this DataFrame, source:

    car_id   lat   lon
0      100  10.0  15.0
1      100  12.0  10.0
2      100  13.0  09.0
3      110  23.0  08.0
4      110  13.0  09.0
5      110  12.0  10.0
6      110  12.0  02.0
7      120  11.0  11.0
8      120  12.0  10.0
9      120  13.0  09.0
10     120  14.0  08.0
11     130  12.0  10.0

keeping only the rows whose coordinates appear in this second DataFrame, coords:

    lat   lon
0  12.0  10.0
1  13.0  09.0

But this time I want to match each car_id that has all the values from coords, in the same order, so that the resulting DataFrame, result, is:

   car_id
1     100
2     120

# 110 has all the values from coords, but not in the same order 
# 130 doesn't have all the values from coords

Is there a vectorized way to achieve this result, avoiding a lot of loops and conditionals?
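For anyone who wants to reproduce this, the two frames can be built roughly as follows. This is a minimal sketch: the values are copied from the tables above, and the variable names source and coords simply follow the wording of the question.

```python
import pandas as pd

# Values copied from the tables in the question.
source = pd.DataFrame({
    "car_id": [100, 100, 100, 110, 110, 110, 110, 120, 120, 120, 120, 130],
    "lat": [10.0, 12.0, 13.0, 23.0, 13.0, 12.0, 12.0, 11.0, 12.0, 13.0, 14.0, 12.0],
    "lon": [15.0, 10.0, 9.0, 8.0, 9.0, 10.0, 2.0, 11.0, 10.0, 9.0, 8.0, 10.0],
})

# The ordered coordinates every matching car_id must contain.
coords = pd.DataFrame({"lat": [12.0, 13.0], "lon": [10.0, 9.0]})
```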

Source

2017-04-13 Jivan

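Plan

We group by 'car_id' and evaluate each subset.
After an inner merge with coords, we should see two things:
1. The resulting merged DataFrame should have the same values as coords
2. The resulting merged DataFrame should contain everything in coords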

def duper(df): 
    m = df.merge(coords) 
    c = pd.concat([m, coords]) 
    # we put the merged rows first and those are 
    # the ones we'll keep after `drop_duplicates(keep='first')` 
    # `keep='first'` is the default, so I don't pass it 
    c1 = (c.drop_duplicates().values == coords.values).all() 

    # if `keep=False` then I drop all duplicates. If I got 
    # everything in `coords` this should be empty 
    c2 = c.drop_duplicates(keep=False).empty 
    return c1 & c2 

source.set_index('car_id').groupby(level=0).filter(duper).index.unique().values 

array([100, 120])
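To see what each of the two checks catches, they can be run separately on cars 110 and 130 from the question. This is only an illustrative sketch: the helper name checks and the hand-built per-car frames are assumptions made for the example, not part of the answer's code.

```python
import pandas as pd

coords = pd.DataFrame({"lat": [12.0, 13.0], "lon": [10.0, 9.0]})

# Rows of car 110 and car 130, copied from the question (lat/lon only).
car_110 = pd.DataFrame({"lat": [23.0, 13.0, 12.0, 12.0],
                        "lon": [8.0, 9.0, 10.0, 2.0]})
car_130 = pd.DataFrame({"lat": [12.0], "lon": [10.0]})

def checks(group):
    """Return (same_order, all_present) for one car's rows."""
    m = group.merge(coords)            # inner merge on lat and lon
    c = pd.concat([m, coords])         # merged rows first, then coords
    same_order = bool((c.drop_duplicates().values == coords.values).all())
    all_present = bool(c.drop_duplicates(keep=False).empty)
    return same_order, all_present

order_110, present_110 = checks(car_110)
order_130, present_130 = checks(car_130)
```

Car 110 contains both coordinates but in reverse order, so it fails the first check only; car 130 has just one of them, so it fails the second check only.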

Slight alternative

def duper(df): 
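    # drop the grouping column by keyword; the positional axis argument
    # is no longer accepted by recent pandas versions
    m = df.drop(columns='car_id').merge(coords)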
    c = pd.concat([m, coords]) 
    c1 = (c.drop_duplicates().values == coords.values).all() 
    c2 = c.drop_duplicates(keep=False).empty 
    return c1 & c2 

source.groupby('car_id').filter(duper).car_id.unique()

Source

2017-04-13 22:41:57 piRSquared

This is not pretty, but what about doing something like this:

df2 = pd.DataFrame(df, copy=True)
# pair each row with the coordinates of the row that follows it
df2[['lat2', 'lon2']] = df[['lat', 'lon']].shift(-1)
df2.set_index(['lat', 'lon', 'lat2', 'lon2'], inplace=True)
print(df2.loc[(12, 10, 13, 9)].reset_index(drop=True))

    car_id 
0  100 
1  120
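One thing to watch with a plain shift(-1) is that it also pairs the last row of one car with the first row of the next car. The sketch below uses made-up data (car 1 ends with (12, 10) and car 2 starts with (13, 9)) to show the difference when the shift is done within each car_id instead. The helper starts_of_pair and the sample values are hypothetical, chosen only to illustrate the point.

```python
import pandas as pd

# Hypothetical data: the wanted pair straddles the boundary between two cars.
df = pd.DataFrame({
    "car_id": [1, 1, 2, 2],
    "lat": [11.0, 12.0, 13.0, 14.0],
    "lon": [11.0, 10.0, 9.0, 8.0],
})

def starts_of_pair(frame, nxt):
    """car_ids where a row is (12, 10) and the row after it is (13, 9)."""
    mask = (
        (frame["lat"] == 12.0) & (frame["lon"] == 10.0)
        & (nxt["lat"] == 13.0) & (nxt["lon"] == 9.0)
    )
    return frame.loc[mask, "car_id"].tolist()

# Plain shift: the "next row" of car 1's last row belongs to car 2.
plain = starts_of_pair(df, df[["lat", "lon"]].shift(-1))

# Shift within each car: the last row of a car has no next row (NaN).
per_car = starts_of_pair(df, df.groupby("car_id")[["lat", "lon"]].shift(-1))
```

Here plain reports car 1 even though no single car contains the pair, while per_car reports nothing.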

In the general case, this would be:

import pandas as pd

raw_data = {'car_id': [100, 100, 100, 110, 110, 110, 110, 120, 120, 120, 120, 130], 
      'lat': [10, 12, 13, 23, 13, 12, 12, 11, 12, 13, 14, 12], 
      'lon': [15, 10, 9, 8, 9, 10, 2, 11, 10, 9, 8, 10], 
      } 
df = pd.DataFrame(raw_data, columns = ['car_id', 'lat', 'lon']) 

raw_data = { 
      'lat': [10, 12, 13], 
      'lon': [15, 10, 9], 
      } 

coords = pd.DataFrame(raw_data, columns = ['lat', 'lon']) 

def submatch(df, match):
    df2 = pd.DataFrame(df['car_id'])
    n = match.shape[0]
    # add the coordinates of the current row and of the rows after it
    for x in range(n):
        df2[['lat{}'.format(x), 'lon{}'.format(x)]] = df[['lat', 'lon']].shift(-x)

    # column order lat0, lon0, lat1, lon1, ... matches match.stack()
    cols = [item for sublist in
            [['lat{}'.format(x), 'lon{}'.format(x)] for x in range(n)]
            for item in sublist]

    df2.set_index(cols, inplace=True)
    return df2.loc[tuple(match.stack().values)].reset_index(drop=True)

print(submatch(df, coords)) 

    car_id 
0  100
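The same idea can also be written with a boolean mask instead of a MultiIndex lookup, shifting within each car so that a run of rows never spans two cars. This is a rough sketch rather than a drop-in replacement for submatch: the function name consecutive_match and its key parameter are assumptions made for the example, and it requires the rows of match to appear consecutively.

```python
import pandas as pd

def consecutive_match(df, match, key="car_id"):
    """Return keys whose rows contain `match` as a run of consecutive rows."""
    cols = list(match.columns)
    grouped = df.groupby(key)[cols]
    mask = pd.Series(True, index=df.index)
    for offset in range(len(match)):
        # Row `offset` steps ahead, looked up within the same car only.
        ahead = df[cols] if offset == 0 else grouped.shift(-offset)
        mask &= (ahead == match.iloc[offset]).all(axis=1)
    return df.loc[mask, key].unique().tolist()

# Same data as in the answer above.
df = pd.DataFrame({
    "car_id": [100, 100, 100, 110, 110, 110, 110, 120, 120, 120, 120, 130],
    "lat": [10, 12, 13, 23, 13, 12, 12, 11, 12, 13, 14, 12],
    "lon": [15, 10, 9, 8, 9, 10, 2, 11, 10, 9, 8, 10],
})
coords3 = pd.DataFrame({"lat": [10, 12, 13], "lon": [15, 10, 9]})
coords2 = pd.DataFrame({"lat": [12, 13], "lon": [10, 9]})

three_rows = consecutive_match(df, coords3)
two_rows = consecutive_match(df, coords2)
```

With the three-row coords only car 100 matches; with the two-row coords from the question, cars 100 and 120 match.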

Source

2017-04-13 19:27:21 nbraun

What is the original df in this answer?
