2016-09-01 28 views

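How do I get a list of all permutations of a combination of two columns, based on the value of another column?

My end goal is to create a Force-Directed graph with d3 that shows clusters of users who use certain features in my applications. To do that, I need to build a set of "links" in the following format (taken from the link above):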

{"source": "Napoleon", "target": "Myriel", "value": 1} 

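To get to that point, though, I am starting from a pandas dataframe that looks like this. How can I generate a list of permutations of the APP_NAME/FEAT_ID combinations for each USER_ID?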

 APP_NAME  FEAT_ID USER_ID CNT 
280  app1   feature1 user1 114 
2622 app2   feature2 user1 8 
1698 app2   feature3 user1 15 
184  app3   feature4 user1 157 
2879 app2   feature5 user1 7 
3579 app2   feature6 user1 5 
232  app2   feature7 user1 136 
295  app2   feature8 user1 111 
2620 app2   feature9 user1 8 
2047 app3   feature10 user2 11 
3395 app2   feature2 user2 5 
3044 app2   feature11 user2 6 
3400 app2   feature12 user2 5 

Expected result:

Based on the dataframe above, I expect user1 and user2 to produce the following permutations:

user1: 
    app1-feature1 -> app2-feature2, app2-feature3, app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9 
    app2-feature2 -> app2-feature3, app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9 
    app2-feature3 -> app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9 
    app3-feature4 -> app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9 
    app2-feature5 -> app2-feature6, app2-feature7, app2-feature8, app2-feature9 
    app2-feature6 -> app2-feature7, app2-feature8, app2-feature9 
    app2-feature7 -> app2-feature8, app2-feature9 
    app2-feature8 -> app2-feature9 

user2: 
    app3-feature10 -> app2-feature2, app2-feature11, app2-feature12 
    app2-feature2 -> app2-feature11, app2-feature12 
    app2-feature11 -> app2-feature12 

From this, I expect to be able to generate the input D3 needs, which for user2 would look like this:

{"source": "app3-feature10", "target": "app2-feature2"} 
{"source": "app3-feature10", "target": "app2-feature11"} 
{"source": "app3-feature10", "target": "app2-feature12"} 
{"source": "app2-feature2", "target": "app2-feature11"} 
{"source": "app2-feature2", "target": "app2-feature12"} 
{"source": "app2-feature11", "target": "app2-feature12"} 

How can I get a list of permutations of the APP_NAME/FEAT_ID combinations for each USER_ID in my dataframe?

Answer


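I would look at making tuples out of your dataframe rows, then using something like itertools.permutations to create all the permutations, and from there crafting your dictionaries as you need them: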

import itertools 

allUserPermutations = {} 

groupedByUser = df.groupby('USER_ID') 
for k, g in groupedByUser: 

    requisiteColumns = g[['APP_NAME', 'FEAT_ID']] 

    # tuples out of dataframe rows 
    userCombos = [tuple(x) for x in requisiteColumns.values] 

    # this is a generator obj 
    userPermutations = itertools.permutations(userCombos, 2) 

    # create a list of specified dicts for the current user 
    userPermutations = [{'source': s, 'target': tar} for s, tar in userPermutations] 

    # store the current users specified dicts 
    allUserPermutations[k] = userPermutations 

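If permutations doesn't return the desired behavior, you could try some of the other combinatoric generators found here. Hopefully this strategy works (I don't have a pandas-enabled REPL to test it at the moment). Good luck!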
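
One of those other generators, itertools.combinations, looks like the closer match for the expected output in the question, since it yields each pair once and keeps the row order, whereas itertools.permutations also yields the reversed pair. The following is only a sketch under a few assumptions: the sample dataframe is a hand-copied subset of the question's user2 rows, links_per_user is a hypothetical helper name, and the "app-feature" label is built by joining the two columns with a hyphen.

```python
import itertools
import json

import pandas as pd

# Assumption: a small sample copied from the user2 rows in the question.
df = pd.DataFrame({
    "APP_NAME": ["app3", "app2", "app2", "app2"],
    "FEAT_ID": ["feature10", "feature2", "feature11", "feature12"],
    "USER_ID": ["user2", "user2", "user2", "user2"],
    "CNT": [11, 5, 6, 5],
})


def links_per_user(frame):
    """Hypothetical helper: map each USER_ID to a list of source/target dicts."""
    links = {}
    # sort=False keeps users in order of first appearance;
    # rows inside each group keep their original order.
    for user_id, group in frame.groupby("USER_ID", sort=False):
        # Join the two columns into labels such as "app3-feature10".
        labels = (group["APP_NAME"] + "-" + group["FEAT_ID"]).tolist()
        # combinations(labels, 2) yields each unordered pair exactly once.
        links[user_id] = [
            {"source": source, "target": target}
            for source, target in itertools.combinations(labels, 2)
        ]
    return links


all_links = links_per_user(df)
for link in all_links["user2"]:
    print(json.dumps(link))
```

For the four user2 rows this gives six links, in the same order as the list in the question. If links in both directions are wanted instead, swapping itertools.combinations for itertools.permutations in the sketch doubles the list.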
