List as value in a dictionary comprehension - Python
I have this dictionary and this DataFrame:
In [40]:
atemp
Out[40]:
{0: ['adc telecommunications inc'],
1: ['aflac inc'],
2: ['agco corporation'],
3: ['agl resources inc'],
4: ['invesco ltd'],
5: ['ak steel holding corporation'],
6: ['amn healthcare services inc'],
7: ['amr corporation']}
In [42]:
cemptemp
Out[42]:
   Company name                  nstandar
0  1-800-FLOWERS.COM             1800flowerscom
1  1347 PROPERTY INS HLDGS INC   1347 property ins hldgs inc
2  1ST CAPITAL BANK              1st capital bank
3  1ST CENTURY BANCSHARES INC    1st century bancshares inc
4  1ST CONSTITUTION BANCORP      1st constitution bancorp
5  1ST ENTERPRISE BANK           1st enterprise bank
6  1ST PACIFIC BANCORP           1st pacific bancorp
7  1ST SOURCE CORP               1st source corporation
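In my code, for each value of the dictionary I look for the elements of the nstandar column of the pandas DataFrame whose Jaccard similarity with that value (computed by jack) is at least 0.1. I then build a new dictionary whose keys are the values of the first dictionary and whose values are the DataFrame entries selected by that Jaccard score.
I tried the code below, but it gives only one value per key, whereas I know I should get a list for each key.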
sd = {y: row['nstandar']
      for k, value in atemp.items()
      for y in value
      for index, row in cemptemp.iterrows()
      if jack(y, row['nstandar']) >= 0.1}
So sd is:
{'adc telecommunications inc': '1st century bancshares inc',
'aflac inc': '1st century bancshares inc',
'agco corporation': '1st source corporation',
'agl resources inc': '1st century bancshares inc',
'ak steel holding corporation': '1st source corporation',
'amn healthcare services inc': '1st century bancshares inc',
'amr corporation': '1st source corporation'}
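A minimal illustration of why each key ends up with a single string: a dict comprehension assigns one key/value pair per iteration, so when the same key y is produced for several matching rows, each assignment overwrites the previous one and only the last match survives. The toy pairs below are made up for illustration.

```python
# Each (key, value) pair produced by the comprehension is assigned in order,
# so a repeated key keeps only the value from its last assignment.
pairs = [
    ('adc telecommunications inc', '1347 property ins hldgs inc'),
    ('adc telecommunications inc', '1st century bancshares inc'),
]
d = {k: v for k, v in pairs}
print(d)  # {'adc telecommunications inc': '1st century bancshares inc'}
```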
However, the expected output for the first key should be:
'adc telecommunications inc': ['1347 property ins hldgs inc', '1st century bancshares inc']
So how can I fix my code to get what I want?
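One possible fix, sketched under assumptions: instead of emitting one key/value pair per match, put a list comprehension in the value position so that all matches for a key are collected at once. To keep the sketch self-contained it uses a plain list `nstandar` as a stand-in for the DataFrame column (in real code something like `cemptemp['nstandar'].tolist()`), a trimmed copy of `atemp`, and the `jack` function from the question's EDIT.

```python
def jack(a, b):
    # Jaccard formula from the question's EDIT, unchanged.
    x = a.split()
    y = b.split()
    xy = set(x + y)
    return float(len(x) + len(y) - len(xy)) / float(len(xy))

# Trimmed copy of atemp from the question.
atemp = {0: ['adc telecommunications inc'], 1: ['aflac inc'], 2: ['agco corporation']}

# Assumption: stand-in for cemptemp['nstandar'].tolist().
nstandar = [
    '1800flowerscom',
    '1347 property ins hldgs inc',
    '1st capital bank',
    '1st century bancshares inc',
    '1st constitution bancorp',
    '1st enterprise bank',
    '1st pacific bancorp',
    '1st source corporation',
]

# The inner list comprehension collects *all* matches for one key,
# so each key is assigned exactly once, with a list as its value.
sd = {y: [name for name in nstandar if jack(y, name) >= 0.1]
      for value in atemp.values()
      for y in value}

print(sd['adc telecommunications inc'])
# ['1347 property ins hldgs inc', '1st century bancshares inc']
```

Unlike the defaultdict loop, this version also creates a key (with an empty list) for names that have no match at all; filter out empty lists afterwards if only matched names should appear. Iterating over the column values directly also avoids the per-row overhead of `iterrows()`.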
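EDIT: the code for the Jaccard function is:
def jack(a, b):
    # Word-level Jaccard score of two strings: shared words / distinct words
    x = a.split()
    y = b.split()
    xy = set(x + y)
    return float(len(x) + len(y) - len(xy)) / float(len(xy))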
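A side note on `jack`: it returns |shared words| / |distinct words|, i.e. the Jaccard similarity (1.0 for identical word sets), so the 0.1 threshold is a minimum similarity; the Jaccard distance would be 1 minus this value. The `len(x) + len(y) - len(xy)` shortcut counts shared words correctly only when neither string repeats a word. The hypothetical `jaccard_sets` below is a set-based alternative, shown next to the original for comparison; treating two empty strings as a score of 0.0 is an assumption.

```python
def jaccard_sets(a, b):
    """Jaccard similarity |A & B| / |A | B| over the sets of words in a and b."""
    x = set(a.split())
    y = set(b.split())
    union = x | y
    if not union:  # both strings empty: score defined as 0.0 (assumption)
        return 0.0
    return len(x & y) / float(len(union))

def jack(a, b):
    # Original formula from the question, kept for comparison.
    x = a.split()
    y = b.split()
    xy = set(x + y)
    return float(len(x) + len(y) - len(xy)) / float(len(xy))

# Without repeated words both functions agree.
print(jack('aflac inc', '1st century bancshares inc'))          # 0.2
print(jaccard_sets('aflac inc', '1st century bancshares inc'))  # 0.2

# A repeated word inflates the original score past 1.0.
print(jack('1st bank bank', '1st bank'))          # 1.5
print(jaccard_sets('1st bank bank', '1st bank'))  # 1.0
```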
EDIT 2: I came up with a solution:
from collections import defaultdict

td = defaultdict(list)  # missing keys start out as an empty list
for k, value in atemp.items():
    for y in value:
        for index, row in cemptemp.iterrows():
            if jack(y, row['nstandar']) >= 0.1:
                td[y].append(row['nstandar'])  # accumulate every match for key y
But if I try to write the same code as a dictionary comprehension, it doesn't work:
from collections import defaultdict

td = defaultdict(list)
td = {y: td[y].append(row['nstandar'])
      for k, value in atemp.items()
      for y in value
      for index, row in cemptemp.iterrows()
      if jack(y, row['nstandar']) >= 0.1}
So what is the difference between my solution and the code that uses a dictionary comprehension?
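A small experiment that isolates the difference, using toy data rather than the real frame: `list.append` mutates the list in place and returns `None`, so every value in the comprehension's result is `None`. The appends still happen as a side effect on the defaultdict, but the assignment `td = {...}` then rebinds the name `td` to the new dict of `None`s, discarding the filled defaultdict. In the loop version `td` is never rebound, so it keeps the lists.

```python
from collections import defaultdict

td = defaultdict(list)
result = td['a'].append(1)
print(result)  # None: list.append mutates in place and returns None

# Same shape as the comprehension in the question, on toy data.
pairs = [('k', 'v1'), ('k', 'v2')]
td = defaultdict(list)
new = {key: td[key].append(val) for key, val in pairs}
print(new)       # {'k': None}: every value is append()'s return value
print(dict(td))  # {'k': ['v1', 'v2']}: the side effect did fill the defaultdict
```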