How to create an edge list from a pandas DataFrame?


I have a pandas DataFrame (df) of the form:

   Col1
A  [Green, Red, Purple]
B  [Red, Yellow, Blue]
C  [Brown, Green, Yellow, Blue]

I need to convert this into an edge list, i.e. a DataFrame of the form:

Source Target Weight 
    A   B   1 
    A   C   1 
    B   C   2

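EDIT: Note that the number of rows in the new DataFrame equals the total number of possible pairwise combinations of the original rows. To compute the 'Weight' column, we simply take the size of the intersection of the two lists. For example, B and C share two colours, Blue and Yellow, so the 'Weight' of the corresponding row is 2.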
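To make the weight definition concrete, here is a minimal sketch for rows B and C of the example (the variable names `b`, `c`, `shared` and `weight` are only for illustration):

```python
# Colour lists for rows B and C, copied from the example table
b = ['Red', 'Yellow', 'Blue']
c = ['Brown', 'Green', 'Yellow', 'Blue']

# The weight is the number of colours the two lists have in common
shared = set(b) & set(c)
weight = len(shared)
print(sorted(shared), weight)  # ['Blue', 'Yellow'] 2
```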

What is the fastest way to do this? The original DataFrame contains about 28,000 elements.


2017-07-09 Melsauce

Sorry, it's not clear how you want to get from the first to the second. –

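@cᴏʟᴅsᴘᴇᴇᴅ Each row's list is compared pairwise with every other row's list. For example, for A-B, the lists have one element in common (Red), so the weight of the Source-A, Target-B row is 1. In short, the new DataFrame will contain every pairwise combination of the original DataFrame's rows. – Melsauce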

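When you say 28k elements, do you mean rows/nodes? If so, any method that generates all combinations is going to produce something rather [large](https://www.google.com/search?q=28000+choose+2&oq=28000+cho&aqs=chrome.0.69i59j69i57j0.6150j0j8&sourceid=chrome&ie=UTF-8). –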
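For scale, the number of unordered pairs among n rows is n(n-1)/2. A quick check, assuming the "28,000 elements" in the question means 28,000 rows (`n_rows` and `n_pairs` are illustrative names):

```python
import math

n_rows = 28_000  # assumption: "28,000 elements" means 28,000 rows
n_pairs = math.comb(n_rows, 2)  # n * (n - 1) / 2 unordered pairs
print(f"{n_pairs:,}")  # 391,986,000
```

So the full edge list would have roughly 392 million rows, which is worth keeping in mind before materialising it in memory.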

First, start with the DataFrame:

In [823]: import pandas as pd
     ...: from itertools import combinations

In [824]: df = pd.DataFrame({'Col1': [['Green','Red','Purple'], ['Red', 'Yellow', 'Blue'], ['Brown', 'Green', 'Yellow', 'Blue']]}, index=['A', 
    ...: 'B', 'C']) 

In [827]: df['Col1'] = df.Col1.apply(lambda x: set(x)) 

In [828]: df 
Out[828]: 
          Col1 
A   {Purple, Red, Green} 
B   {Red, Blue, Yellow} 
C {Green, Yellow, Blue, Brown}

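Each list in Col1 has been converted to a set so that intersections can be computed efficiently. Next, we use itertools.combinations to create all pairwise combinations of the rows in df: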

In [845]: df1 = pd.DataFrame(data=list(combinations(df.index.tolist(), 2)), columns=['Src', 'Dst']) 

In [849]: df1 
Out[849]: 
    Src Dst 
0 A B 
1 A C 
2 B C

Now apply a function that takes the intersection of the two sets and returns its length. The Src and Dst columns act as lookups into df.

In [859]: df1['Weights'] = df1.apply(lambda x: len(df.loc[x['Src']]['Col1'].intersection(df.loc[x['Dst']]['Col1'])), axis=1) 

In [860]: df1 
Out[860]: 
    Src Dst Weights 
0 A B  1 
1 A C  1 
2 B C  2

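I'd recommend doing the set conversion once, right at the start. Converting your lists to sets on the fly every time is expensive and wasteful.

For more speed, you may also want to copy the sets into two columns of the new DataFrame, as @Wen does, because calling df.loc repeatedly will slow things down considerably.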
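A sketch of both suggestions combined, on the same three-row example: the lists are converted to sets once, each pair's sets are looked up in bulk with `Series.map`, and the weights come from a plain list comprehension rather than `df.loc` inside `apply`. The names `sets`, `src_sets` and `dst_sets` are illustrative and do not appear in either answer, and no timings are claimed here.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame(
    {'Col1': [['Green', 'Red', 'Purple'],
              ['Red', 'Yellow', 'Blue'],
              ['Brown', 'Green', 'Yellow', 'Blue']]},
    index=['A', 'B', 'C'],
)

# Convert every list to a set exactly once
sets = df['Col1'].apply(set)

# All unordered pairs of index labels
df1 = pd.DataFrame(list(combinations(df.index, 2)), columns=['Src', 'Dst'])

# Look up each endpoint's set in bulk instead of calling df.loc per row
src_sets = df1['Src'].map(sets)
dst_sets = df1['Dst'].map(sets)

# Weight = size of the intersection of the two sets
df1['Weights'] = [len(s & t) for s, t in zip(src_sets, dst_sets)]
print(df1)
```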


2017-07-09 02:40:40

I think we are using the same approach, but yours is better. Upvoted~ – Wen

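@Wen Sorry! I wrote my answer independently of yours. There are some subtle differences, but they are similar in many ways. I believe you deserve a +1 too :) –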

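Try this. It is not very neat, but it works. PS: I left the final output as it is so that you can adjust it yourself; I did not drop the helper columns or rename the columns.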

import pandas as pd
from itertools import combinations

df = pd.DataFrame({"Col1": [['Green', 'Red', 'Purple'], ['Red', 'Yellow', 'Blue'], ['Brown', 'Green', 'Yellow', 'Blue']], "two": ['A', 'B', 'C']})
df = df.set_index('two')
df.index.name = None  # clear the index name (assigning None instead of `del df.index.name`)
# One row per unordered pair of index labels (columns 0 and 1)
DF = pd.DataFrame(data=list(combinations(df.index, 2)))
# Copy each endpoint's list next to its label, so no per-row lookups are needed later
DF['0_0'] = DF[0].map(df['Col1'])
DF['1_1'] = DF[1].map(df['Col1'])
# Weight = number of colours the two lists share
DF['Weight'] = DF.apply(lambda x: len(set(x['0_0']).intersection(x['1_1'])), axis=1)



DF 
Out[174]: 
    0 1     0_0       1_1 Weight 
0 A B [Green, Red, Purple]   [Red, Yellow, Blue]  1 
1 A C [Green, Red, Purple] [Brown, Green, Yellow, Blue]  1 
2 B C [Red, Yellow, Blue] [Brown, Green, Yellow, Blue]  2


2017-07-09 02:16:51 Wen

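Get an array of sets
Get pairwise indices representing all combinations using np.triu_indices
Use the & operator to get the pairwise intersections, and get their lengths via a comprehension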

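import numpy as np

# Object array holding one set per row
c = df.Col1.apply(set).values

# Positions of every unordered pair of rows (strict upper triangle)
i, j = np.triu_indices(c.size, 1)

pd.DataFrame(dict(
    Source=df.index[i],
    Target=df.index[j],
    # `&` intersects the sets element-wise; the comprehension takes their sizes
    Weight=[len(s) for s in c[i] & c[j]]
))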

    Source Target Weight 
0  A  B  1 
1  A  C  1 
2  B  C  2
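For readers unfamiliar with the two NumPy pieces used here, a small self-contained sketch on the same example data. `np.triu_indices(n, 1)` returns the row and column positions of every entry strictly above the diagonal of an n×n matrix, i.e. every unordered pair, and `&` on object arrays is applied element by element, so for sets it performs set intersection. The names `common` and `weights` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Col1': [['Green', 'Red', 'Purple'],
              ['Red', 'Yellow', 'Blue'],
              ['Brown', 'Green', 'Yellow', 'Blue']]},
    index=['A', 'B', 'C'],
)

c = df['Col1'].apply(set).to_numpy()  # object array holding three sets

# Positions strictly above the diagonal of a 3x3 matrix
i, j = np.triu_indices(c.size, 1)
print(list(zip(i.tolist(), j.tolist())))  # [(0, 1), (0, 2), (1, 2)]

# Element-wise `&` on object arrays intersects each pair of sets
common = c[i] & c[j]
weights = [len(s) for s in common]
print(weights)  # [1, 1, 2]
```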


2017-07-09 05:51:12 piRSquared
