Parsing FIX messages with regex

I found the second answer to "Parsing FIX protocol in regex?" very useful, so I tried it out.

Here is my code:

import re

new_order_finder1 = re.compile("(?:^|\x01)(11|15|55)=(.*?)\x01")
new_order_finder2 = re.compile("(?:^|\x01)(15|55)=(.*?)\x01") 
new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)\x01") 

if __name__ == "__main__": 
    line = "20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x0134=739\x0149=PINE\x0152=20150702-05:36:08.687\x0156=CSUS\x011=KI\x0111=N09080243\x0115=USD\x0121=2\x0122=5\x0138=2100\x0140=2\x0144=126\x0148=AAPL.O\x0154=1\x0155=AAPL.O\x0157=DMA\x0158=TEXT\x0160=20150702-05:36:08.687\x01115=Tester\x016061=9\x0110=087\x01" 
    fields = dict(re.findall(new_order_finder1, line)) 
    print(fields) 

    fields2 = dict(re.findall(new_order_finder2, line)) 
    print(fields2) 

    fields3 = dict(re.findall(new_order_finder3, line)) 
    print(fields3)

Here is the output:

{'11': 'N09080243', '55': 'AAPL.O'} 
{'55': 'AAPL.O', '15': 'USD'} 
{'35': 'D', '38': '2100', '11': 'N09080243', '54': '1'}

It looks like some fields are not being matched by the regex.

What is wrong here?


2015-07-03 Johnyy

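The problem is that the trailing \x01 in your pattern consumes the \x01 delimiter. As a result, the pattern always fails on the key-value pair immediately after the one it just matched, because there is no \x01 left for (?:^|\x01) to match at the start of that pair.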

Take this substring of your input as an example, matched against new_order_finder3:

\x0154=1\x0155=AAPL.O\x01 
------------ 
            X

As you can see, after it matches the key-value pair 54=1, it also consumes the \x01, so the adjacent key-value pair 55=AAPL.O can never be matched.
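To see the consumption directly, the sketch below (not part of the original answers) runs the question's new_order_finder3 pattern and a lookahead variant over the same substring, printing each match's groups and span with re.finditer. The names chunk, consuming and lookahead are illustrative only.

```python
import re

# Substring from the question's message: tag 54 immediately followed by tag 55.
chunk = "\x0154=1\x0155=AAPL.O\x01"

# Original pattern: the trailing \x01 is part of the match, so it is consumed.
consuming = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)\x01")
# Variant: the trailing \x01 is only checked by a lookahead, not consumed.
lookahead = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)(?=\x01)")

for name, pattern in [("consuming", consuming), ("lookahead", lookahead)]:
    for m in pattern.finditer(chunk):
        # span() shows where each match starts and ends inside chunk.
        print(name, m.groups(), m.span())
```

With the consuming pattern, the single match ends at index 6, already past the \x01 that 55= would need, so the search resumes at the "5" of 55= and finds nothing more. The lookahead version stops at index 5, leaving that \x01 available for the next match.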

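There are several ways to fix this. One solution is to put the trailing \x01 in a lookahead assertion, so that the pattern still requires \x01 to end the key-value pair but does not consume it: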

new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)(?=\x01)")

The output now contains all the desired fields:

{'11': 'N09080243', '38': '2100', '15': 'USD', '55': 'AAPL.O', '54': '1', '35': 'D'}
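As another of the "several ways" to fix this (a sketch, not taken from either answer), you can drop the trailing delimiter from the pattern entirely and match the value with the negated class [^\x01]*, which by construction stops just before the next \x01 without consuming it. The name new_order_finder4 is hypothetical.

```python
import re

# Same tag list as new_order_finder3, but the value is [^\x01]*:
# it cannot contain the delimiter, so no trailing \x01 is needed or consumed.
new_order_finder4 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=([^\x01]*)")

line = ("20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x0134=739\x01"
        "49=PINE\x0152=20150702-05:36:08.687\x0156=CSUS\x011=KI\x01"
        "11=N09080243\x0115=USD\x0121=2\x0122=5\x0138=2100\x0140=2\x01"
        "44=126\x0148=AAPL.O\x0154=1\x0155=AAPL.O\x0157=DMA\x0158=TEXT\x01"
        "60=20150702-05:36:08.687\x01115=Tester\x016061=9\x0110=087\x01")

fields4 = dict(new_order_finder4.findall(line))
print(fields4)
```

One behavioral difference: unlike (.*?)(?=\x01), this variant also matches a final field that is not followed by \x01. Whether that is desirable depends on whether your input messages are always complete.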


2015-07-03 06:04:00 nhahtdh

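The trailing \x01 consumes something you want to match. The regex engine then looks for the next match only from the end of the previous one.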

With a lookahead, the fix is easy: just replace the trailing \x01 with (?=\x01).

import re

new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)(?=\x01)")

if __name__ == "__main__":
    line = "20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x0134=739\x01" \
           "49=PINE\x0152=20150702-05:36:08.687\x0156=CSUS\x011=KI\x01" \
           "11=N09080243\x0115=USD\x0121=2\x0122=5\x0138=2100\x0140=2\x01" \
           "44=126\x0148=AAPL.O\x0154=1\x0155=AAPL.O\x0157=DMA\x0158=TEXT\x01" \
           "60=20150702-05:36:08.687\x01115=Tester\x016061=9\x0110=087\x01"
    fields3 = dict(re.findall(new_order_finder3, line))
    print(fields3)
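Since the three patterns in the question differ only in their tag lists, one might build them from a single lookahead template. The functions make_fix_finder and extract_fields below are hypothetical helpers, a sketch rather than part of either answer. Because every tag in the alternation must be followed by "=", the order of the tags does not matter: for example, tag 1 (1=KI in the question's message) and tag 11 can be listed together.

```python
import re

def make_fix_finder(tags):
    """Hypothetical helper: compile a lookahead-based finder for the given FIX tags."""
    # Tags are plain digits, but re.escape keeps the alternation safe regardless.
    alternation = "|".join(re.escape(str(t)) for t in tags)
    # The trailing \x01 is checked with (?=\x01), so it is never consumed.
    return re.compile("(?:^|\x01)(" + alternation + ")=(.*?)(?=\x01)")

def extract_fields(finder, message):
    """Hypothetical helper: return {tag: value} for every field the finder matches."""
    return dict(finder.findall(message))

if __name__ == "__main__":
    msg = "8=FIX.4.2\x0135=D\x011=KI\x0111=N09080243\x0115=USD\x0155=AAPL.O\x01"
    print(extract_fields(make_fix_finder([11, 15, 55]), msg))
    print(extract_fields(make_fix_finder([1, 11]), msg))
```

Note that building a dict keeps only the last value if a tag occurs more than once; use the list returned by findall directly if you need every occurrence.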


2015-07-03 06:04:08 tripleee
