Cannot compare strings in Python

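I have this code that is supposed to open and read two text files and match the words that appear in both. A match is indicated by printing "SUCCESS" and writing the word to the temp.txt file. listac.txt is formatted as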

/teetetet 
/eteasdsa 
/asdasdfsa 
/asdsafads 
. 
. 
...etc

paths.txt is formatted as

/asdadasd.php/asdadas/asdad/asd 
/adadad.html/asdadals/asdsa/asd 
. 
. 
...etc

So I use the split function to get the first /asadasda (in paths.txt) before the dot.

dir = open('listac.txt','r') 
path = open('paths.txt','r') 
paths = path.readlines() 
paths_size = len(paths) 
matches = open('temp.txt','w') 
dirs = dir.readlines() 

for pline in range(0,len(paths)): 
     for dline in range(0,len(dirs)): 
       p = paths[pline].rstrip('\n').split(".")[0].replace(" ", "") 
       dd = dirs[dline].rstrip('\n').replace(" ", "") 
       #print p.lower() 
       #print dd.lower() 
       if (p.lower() == dd.lower()): 
         print "SUCCESS\n" 
         matches.write(str(p).lower() + '\n')

The problem is that the words never match. I even printed each pair before every if statement, and they look equal. Is there something else Python does before comparing strings?

=======

Thanks everyone for the help. As you suggested, I cleaned up the code so that it now looks like this:

dir = open('listac.txt','r') 
path = open('paths.txt','r') 
#paths = path.readlines() 
#paths_size = len(paths) 

for line in path: 
     p = line.rstrip().split(".")[0].replace(" ", "") 
     for lines in dir: 
       d = str(lines.rstrip()) 
       if p == d: 
         print p + " = " + d

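Apparently, declaring and initializing p before entering the second for loop makes a difference to the comparison later on. When I declared p and d inside the second for loop, it didn't work. I don't know why, but if someone does, I'm listening :)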

Thanks again!
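One caveat about the cleaned-up loop above, offered as a possible explanation rather than a confirmed diagnosis: an open file is an iterator, so the inner 'for lines in dir' consumes it during the first outer iteration and yields nothing afterwards. The sketch below uses io.StringIO as a stand-in for the file and made-up contents; it is written for Python 3 (print as a function), unlike the Python 2 snippets in this thread.

```python
import io

# io.StringIO stands in for an open text file; the contents are made up.
dir_file = io.StringIO("/teetetet\n/eteasdsa\n")

first_pass = [line.rstrip() for line in dir_file]
second_pass = [line.rstrip() for line in dir_file]  # iterator is already exhausted

print(first_pass)
print(second_pass)

# Reading the lines into a list once lets the inner loop run for every path.
dir_file.seek(0)
dirs = [line.rstrip() for line in dir_file]
paths = ["/eteasdsa", "/teetetet"]  # hypothetical, already-cleaned path prefixes
found = [p for p in paths for d in dirs if p == d]
print(found)
```

With the file read into a list first, every path is compared against every directory entry, not just the first path.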

Source

2012-09-11 user1663160

In your example, nothing matches.

Too complicated. Don't loop over 'range(0, len(paths))'; just iterate over 'paths' directly. And why 'rstrip('\n')'? There may be an extra '\r'. Just use 'rstrip()'. – Matthias
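The effect Matthias describes can be reproduced with a couple of made-up lines. This is only a sketch: the '\r\n' line endings are an assumption about how the files were saved (for example on Windows), and it is written for Python 3. It also shows why printing the two strings can make them look equal, since a trailing '\r' is invisible unless you print the repr.

```python
# Hypothetical sample lines; the '\r\n' ending is an assumption.
path_line = "/teetetet.php/asdadas/asdad/asd\r\n"
dir_line = "/teetetet\r\n"

# rstrip('\n') removes only the newline, so the carriage return survives
# on the directory line (split('.') happens to cut it off the path line).
p = path_line.rstrip('\n').split(".")[0].replace(" ", "")
d_newline_only = dir_line.rstrip('\n').replace(" ", "")

# rstrip() with no argument removes all trailing whitespace, including '\r'.
d_all_whitespace = dir_line.rstrip().replace(" ", "")

print(repr(p))                 # '/teetetet'
print(repr(d_newline_only))    # '/teetetet\r'
print(p == d_newline_only)     # False
print(p == d_all_whitespace)   # True
```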

You could also move 'p = ...' outside the inner for loop, since it performs the same computation every time. – mgilson

I'd have to see more of the data set to tell why you aren't getting matches. I've refactored your code a bit to make it more pythonic.

dirFile = open('listac.txt','r')
pathFile = open('paths.txt','r')
paths = pathFile.readlines()
dirs = dirFile.readlines()

matches = open('temp.txt','w')

for pline in paths:
    p = pline.rstrip('\n').split(".")[0].replace(" ", "")
    for dline in dirs:
        dd = dline.rstrip('\n').replace(" ", "")
        #print p.lower()
        #print dd.lower()
        if p.lower() == dd.lower():
            print "SUCCESS\n"
            matches.write(str(p).lower() + '\n')

Source

2012-09-11 14:50:32 desimusxvii

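+1, but you could simplify this a lot by first converting 'dirs' into a set ('dirs = {line.lower() for line in dirFile}'), then checking 'if p.lower() in dirs', and by iterating over the files directly, which avoids 'readlines()' and all those 'rstrip()' calls entirely.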

@TimPietzcker Sure. I would write it completely differently. I figured it would be more helpful to ease a beginner into it. – desimusxvii

Since we're reading the data files entirely into memory anyway, why not try using sets and taking the intersection?

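def format_data(x):
    # strip trailing whitespace, drop spaces, keep the part before the first dot, lowercase
    return x.rstrip().replace(' ','').split('.')[0].lower()

with open('listac.txt') as dirFile:
    dirStuff = set(format_data(dline) for dline in dirFile)

with open('paths.txt') as pathFile:
    intersection = dirStuff.intersection(format_data(pline) for pline in pathFile)

# matches was not defined in this snippet; open it here as in the question
with open('temp.txt','w') as matches:
    for elem in intersection:
        print "SUCCESS\n"
        matches.write(str(elem)+"\n")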

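I used the same format_data function for both data sets because they look about the same, but you can use separate functions if you prefer. Also note that this solution reads only one of the two files into memory. The intersection with the other should be computed lazily.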
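To try the idea without the real files, here is a small sketch with made-up lines in the two formats shown in the question (the sample data and the names dir_lines, path_lines and common are assumptions; Python 3 syntax). It relies on set.intersection accepting any iterable, so the second data set can be fed in as a generator and never has to be held in memory as a whole.

```python
def format_data(x):
    # strip trailing whitespace, drop spaces, keep the part before the first dot, lowercase
    return x.rstrip().replace(' ', '').split('.')[0].lower()

# Made-up lines standing in for listac.txt and paths.txt.
dir_lines = ["/teetetet\r\n", "/Eteasdsa\n", "/asdasdfsa\n"]
path_lines = [
    "/eteasdsa.html/asdadals/asdsa/asd\n",
    "/nomatch.php/a/b\n",
    "/teetetet.php/x\n",
]

dir_stuff = set(format_data(line) for line in dir_lines)

# The generator is consumed item by item; only dir_stuff is stored as a set.
common = dir_stuff.intersection(format_data(line) for line in path_lines)

print(sorted(common))  # ['/eteasdsa', '/teetetet']
```

The result is a set, so it is sorted here only to get a stable order for printing.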

As pointed out in the comments, this makes no attempt to preserve order. If you really do need to preserve the order, try the following:

<snip> 
... 
</snip> 

with open('paths.txt') as pathFile:
    for line in pathFile:
        if format_data(line) in dirStuff:
            print "SUCCESS\n"
            #...
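A self-contained sketch of the order-preserving variant, again with made-up data in place of the files (dir_stuff and path_lines are hypothetical; Python 3 syntax). It is not the elided part of the snippet above, just one way the membership test could be exercised: each path line is checked against the set in the order it appears.

```python
def format_data(x):
    # strip trailing whitespace, drop spaces, keep the part before the first dot, lowercase
    return x.rstrip().replace(' ', '').split('.')[0].lower()

# Hypothetical, already-formatted directory names.
dir_stuff = {"/eteasdsa", "/teetetet"}

# Made-up path lines, in the order they would appear in paths.txt.
path_lines = [
    "/teetetet.php/x\n",
    "/nomatch.php/a/b\n",
    "/eteasdsa.html/y\n",
]

ordered = []
for line in path_lines:
    key = format_data(line)
    if key in dir_stuff:      # set membership test
        ordered.append(key)   # matches are kept in file order

print(ordered)  # ['/teetetet', '/eteasdsa']
```

Unlike the set intersection, this keeps duplicates: a path prefix that occurs twice in the input is reported twice.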

Source

2012-09-11 14:55:57 mgilson

Though this will produce different output, since the order of the lines will be lost. That most likely doesn't matter.

I'd prefer 'a & b' over 'a.intersection(b)'. "What's in a and b" == "what's in a & b". – Alfe

Operators like ∩ aren't ASCII and therefore aren't part of Python yet ;-) – Alfe
