How to find character offsets in text using Python

My goal is to identify matching strings in two aligned text documents, and then find the position of the starting character of the matching string in each document.

doc1=['the boy is sleeping', 'in the class', 'not at home'] 
doc2=['the girl is reading', 'in the class', 'a serious student']

I tried:

# find matching string(s) that exist in both document lists:
matchstring = [x for x in doc1 if x in doc2]
# Output: matchstring = ['in the class']
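The comprehension above checks `x in doc2` against a list, so `doc2` is scanned once for every sentence of `doc1`. A minimal sketch, assuming the documents are plain lists of sentence strings, that keeps `doc1`'s order but does the membership test against a set (`doc2_set` is just an illustrative name):

```python
doc1 = ['the boy is sleeping', 'in the class', 'not at home']
doc2 = ['the girl is reading', 'in the class', 'a serious student']

# Build the set once so each membership test is O(1) on average.
doc2_set = set(doc2)

# Keep the matches in the order they appear in doc1.
matchstring = [x for x in doc1 if x in doc2_set]
print(matchstring)  # ['in the class']
```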

Now the problem is to find the character offset of the matching string in doc1 and doc2 (punctuation excluded, spaces included).

Desired result:

Position of starting character for matching string in doc1=20 
Position of starting character for matching string in doc2=20

Any ideas on aligning the texts? Thanks.
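One way to get these positions without searching the joined text is to add up the lengths of the sentences that precede the match. The sketch below assumes, as the expected result implies, that the sentences are concatenated with no separator, that ASCII punctuation is dropped before counting (spaces kept), and that positions are 1-based; `strip_punct` and `start_position` are hypothetical helper names, and returning -1 for a missing sentence is an assumed convention.

```python
import string

doc1 = ['the boy is sleeping', 'in the class', 'not at home']
doc2 = ['the girl is reading', 'in the class', 'a serious student']

# Translation table that deletes ASCII punctuation but keeps spaces.
_NO_PUNCT = str.maketrans('', '', string.punctuation)


def strip_punct(text):
    """Return text with ASCII punctuation removed (spaces kept)."""
    return text.translate(_NO_PUNCT)


def start_position(doc, match):
    """1-based position of the first character of `match` in `doc`,
    counting the sentences as if joined with no separator."""
    offset = 0
    for sentence in doc:
        if sentence == match:
            return offset + 1  # convert the 0-based offset to a 1-based position
        offset += len(strip_punct(sentence))
    return -1  # assumed convention: match not found in doc


print(start_position(doc1, 'in the class'))  # 20
print(start_position(doc2, 'in the class'))  # 20
```

Because the offset is built only from the preceding sentences, a substring that happens to straddle a sentence boundary earlier in the text cannot produce a false hit.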


2014-03-02 Tiger1

Why do I get 19 rather than 21? – zhangxaochen

Hi @zhangxaochen, you stopped counting at the 'g' in 'sleeping' instead of at the character 'i' in 'in the class'. – Tiger1

The length of 'the boy is sleeping' is 19, so 'i' is the 20th character, which sits at index 19 if you index from 0. – zhangxaochen
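The disagreement is only 0-based versus 1-based counting. A quick check in Python, whose string indices start at 0, assuming the sentences are joined with no separator as in the question:

```python
doc1 = ['the boy is sleeping', 'in the class', 'not at home']
joined = ''.join(doc1)  # sentences concatenated with no separator

index = joined.find('in the class')  # str.find returns a 0-based index
print(len(doc1[0]))   # 19
print(index)          # 19
print(joined[index])  # i
print(index + 1)      # 20, the 1-based position from the question
```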

Hi man, try this:

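doc1 = ['the boy is sleeping', 'in the class', 'not at home']
doc2 = ['the girl is reading', 'in the class', 'a serious student']

# Sentences present in both lists (this assumes there is exactly one match).
temp = ''.join(set(doc1) & set(doc2))
# str.find returns a 0-based index into the concatenated text, or -1 if absent.
resultDoc1 = ''.join(doc1).find(temp)
resultDoc2 = ''.join(doc2).find(temp)

# Add 1 to report a 1-based position, matching the question's expected output.
print("Position of starting character for matching string in doc1=%d" % (resultDoc1 + 1))
print("Position of starting character for matching string in doc2=%d" % (resultDoc2 + 1))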

It works exactly as you expected!
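The answer joins every common sentence into one search string, which works here because there is exactly one match. With two or more matches, the joined string (whose order also depends on set iteration) will usually not occur in either document, and `find` then returns -1. A sketch that looks up each match separately, keeping the answer's no-separator concatenation and 1-based positions (`positions` is an illustrative name). Like the answer, `find` reports the first occurrence in the joined text, which could in principle fall before the matching sentence itself.

```python
doc1 = ['the boy is sleeping', 'in the class', 'not at home']
doc2 = ['the girl is reading', 'in the class', 'a serious student']

text1 = ''.join(doc1)
text2 = ''.join(doc2)

# One entry per common sentence, in doc1 order.
positions = {}
doc2_set = set(doc2)
for match in doc1:
    if match in doc2_set:
        # str.find gives the first 0-based occurrence; +1 makes it 1-based.
        positions[match] = (text1.find(match) + 1, text2.find(match) + 1)

for match, (pos1, pos2) in positions.items():
    print("%r: doc1=%d, doc2=%d" % (match, pos1, pos2))
```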


2014-03-02 19:40:05

Al Mamun, thanks for your solution. As you said, it works perfectly. – Tiger1

Accept the answer and upvote it, man :) –

@Al Mamum, I was still hoping to get a two-line answer. – Tiger1
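For a two-line version, one possibility is to take the first common sentence with `next()` and print both 1-based positions in one statement. This assumes the documents share at least one sentence (otherwise `next()` raises `StopIteration`) and, like the answer above, that the sentences are joined with no separator.

```python
doc1 = ['the boy is sleeping', 'in the class', 'not at home']
doc2 = ['the girl is reading', 'in the class', 'a serious student']

# The two lines: first sentence of doc1 that also appears in doc2,
# then its 1-based start position in each joined document.
match = next(s for s in doc1 if s in doc2)
print(''.join(doc1).find(match) + 1, ''.join(doc2).find(match) + 1)  # 20 20
```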
