Regex to find a string within a string, regardless of order?

I'm not sure how best to word this, so I'll jump straight into an example.

a bunch of lines we don't care about [...] 
This is a nice line I can look for 
This is the string I wish to extract 
a bunch more lines we do't care about [...] 
This line contains an integer 12345 related to the string above 
more garbage [...]

But sometimes (and I have no control over this) the order is swapped:

a bunch of lines we don't care about [...] 
Here is another string I wish to extract 
This is a nice line I can look for 
a bunch more lines we do't care about [...] 
This line contains an integer 67890 related to the string above 
more garbage [...]

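The two lines (the "nice line" and the "string I wish to extract") are always adjacent, but their order is unpredictable. The line containing the integer sits an inconsistent number of lines below them. The "nice line" appears multiple times and is always identical; the strings and integers I'm extracting may be the same or different across the file.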

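The end goal is to populate two lists, one containing the strings and the other containing the integers, both in the order the items are found, so that the two lists can later be used as key/value pairs.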

What I don't know how to do, or whether it's even possible, is write a regex that finds the string immediately before or after the target line.

I'm doing this in Python, by the way.

Thoughts?

Edit/addition: the result I'm expecting from the sample text above would be something like this:

list1["This is the string I wish to extract", "Here is another string I wish to extract"] 
list2[12345, 67890]
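
For the "key/value pairs" step, here is a minimal sketch of how two lists kept in discovery order could be paired up by position. The names `list1` and `list2` come from the expected result above; `pairs` is just an illustrative name.

```python
list1 = ["This is the string I wish to extract",
         "Here is another string I wish to extract"]
list2 = [12345, 67890]

# zip() pairs items by position, so the first string goes with the
# first integer, the second with the second, and so on.
pairs = dict(zip(list1, list2))
```

Since the extracted strings may repeat, a dict would keep only the last integer for a repeated string; `list(zip(list1, list2))` keeps every pair if that matters.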


2014-11-09 bcsteeve

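Do you need to generalize this to more than two out-of-order lines? – 2014-11-09 02:50:02

Can you give some concrete examples of what you want to see? – 2014-11-09 03:42:48

@KarlKnechtel In this case, no. There are only two. – bcsteeve 2014-11-09 08:21:47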

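A good strategy might be to find the "nice line" and then look at the lines directly above and below it.

See the following (untested) Python pseudocode: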

L1, L2 = [], []
with open("file.txt") as f:
    lines = f.readlines()
for i, line in enumerate(lines):
    if 'nice line' in line:
        # Clamp the indices so they stay inside the list
        before_line = lines[max(i - 1, 0)]
        after_line = lines[min(i + 1, len(lines) - 1)]
        # You can generalize the above to a few lines above and below

        # Use regex to parse information from `before_line` and `after_line`
        # and add it to the lists: L1, L2
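
To make the idea above concrete, here is one possible sketch of the full loop. It rests on assumptions taken only from the sample text in the question: the wanted line can be recognized by the hypothetical pattern `WANTED`, and the integer line by the hypothetical pattern `INTEGER`. Both patterns, and the function name `extract_pairs`, are illustrative and would need adapting to the real data.

```python
import re

NICE_LINE = "This is a nice line I can look for"
# Assumptions based on the sample text; adapt both patterns to the real data.
WANTED = re.compile(r"string I wish to extract")
INTEGER = re.compile(r"integer (\d+)")


def extract_pairs(lines):
    """Return (strings, integers), both in order of discovery."""
    strings, integers = [], []
    resume_at = None  # index from which to look for the pending integer
    for i, raw in enumerate(lines):
        line = raw.strip()
        if line == NICE_LINE:
            # Check the line above first, then the line below.
            for j in (i - 1, i + 1):
                if 0 <= j < len(lines) and WANTED.search(lines[j]):
                    strings.append(lines[j].strip())
                    # Skip past both adjacent lines before hunting the integer.
                    resume_at = max(i, j) + 1
                    break
        elif resume_at is not None and i >= resume_at:
            match = INTEGER.search(line)
            if match:
                integers.append(int(match.group(1)))
                resume_at = None
    return strings, integers
```

Calling `extract_pairs(lines)` on the lines read from the file gives the two lists in matching order. The sketch assumes every extracted string is followed by its integer line before the next "nice line" appears; if that does not hold, the two lists can drift out of step.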


2014-11-09 04:27:23 jaynp

Ah yes, that should work! Thanks. – bcsteeve 2014-11-09 08:20:44
