2017-10-12

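How do I use difflib to find similar rows in a column?

Hello, I have two CSV files, each with a column of company names. Using Python 3 and pandas, I merged them to compare the names: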

compara1 = pd.merge(
    dividas_dep, funrural, 
    left_on='Nome_Devedor', 
    right_on='Razao_Social') 

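The merge found 7 rows whose names are identical. However, the company names are not always typed correctly in some of the files. For example: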

AGROPECUARIA INDIANA LTDA 
AGROPECUARIA INDINA LTDA 

AGROTRI AGROPECUARIA TRIANGULO LTDA 
AGROTRI AGROPECUARI TRIANGULO LTDA 

So the merge does not find these near-identical names.
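For illustration, `pd.merge` (an inner join by default) only pairs rows whose key values are exactly equal, so a single missing letter is enough to drop a row. A minimal sketch with hypothetical two-row frames built from the example names above (only the name columns are reproduced):

```python
import pandas as pd

# Hypothetical sample frames: only the company-name columns from the question.
dividas_dep = pd.DataFrame({"Nome_Devedor": [
    "AGROPECUARIA INDIANA LTDA",
    "AGROTRI AGROPECUARIA TRIANGULO LTDA",
]})
funrural = pd.DataFrame({"Razao_Social": [
    "AGROPECUARIA INDINA LTDA",
    "AGROTRI AGROPECUARI TRIANGULO LTDA",
]})

# An inner merge keeps only rows whose keys compare exactly equal.
exact = pd.merge(dividas_dep, funrural,
                 left_on="Nome_Devedor", right_on="Razao_Social")
print(len(exact))  # 0: each pair differs by one letter, so nothing matches
```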

Then I tried difflib:

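from difflib import SequenceMatcher

def similar(a, b):
    threshold = 0.8
    # True when the similarity ratio (0.0 to 1.0) exceeds the threshold
    return SequenceMatcher(None, a, b).ratio() > threshold


# Compare every debtor name with every company name
for _, row_a in dividas_dep.iterrows():
    a = row_a['Nome_Devedor']
    for _, row_b in funrural.iterrows():
        b = row_b['Razao_Social']
        similar(a, b)  # return value is not used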

It runs for about 5 minutes, but nothing is returned. What is wrong?
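One way to narrow this down is to call `similar()` directly on the two example pairs. `SequenceMatcher.ratio()` is documented as 2.0*M/T, where T is the total number of characters in both strings and M the number of matched characters; the sketch below recomputes that from `get_matching_blocks()` as a cross-check. The `pairs` list is illustrative, and `threshold` is turned into a keyword argument here for convenience:

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """Return True when SequenceMatcher's ratio exceeds the threshold."""
    return SequenceMatcher(None, a, b).ratio() > threshold


# Illustrative pairs taken from the examples in the question.
pairs = [
    ("AGROPECUARIA INDIANA LTDA", "AGROPECUARIA INDINA LTDA"),
    ("AGROTRI AGROPECUARIA TRIANGULO LTDA", "AGROTRI AGROPECUARI TRIANGULO LTDA"),
]

for a, b in pairs:
    sm = SequenceMatcher(None, a, b)
    # M: total size of all matching blocks (the final dummy block has size 0)
    m = sum(block.size for block in sm.get_matching_blocks())
    # T: combined length of both strings
    t = len(a) + len(b)
    print(f"ratio={sm.ratio():.3f}  2M/T={2 * m / t:.3f}  similar={similar(a, b)}")
```

Both pairs clear the 0.8 threshold, which suggests the comparison itself works and the missing output comes from the loop never storing or printing what `similar()` returns.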

Answer


I now realize it just needed to display the results:

from difflib import SequenceMatcher

def similar(a, b):
    threshold = 0.8
    s = SequenceMatcher(None, a, b).ratio() > threshold
    print(s)  # show the True/False result
    return s


for _, row_a in dividas_dep.iterrows():
    a = row_a['Nome_Devedor']
    for _, row_b in funrural.iterrows():
        b = row_b['Razao_Social']
        similar(a, b)
        # print the pair that was compared, followed by a separator
        print(a)
        print(b)
        print("-/-")