2015-11-20

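How to find occurrences when patterns overlap

Context: This is a log-analysis task. I am writing a regex program to find occurrences of certain requests sent from the client to the server. I have client log files that contain these requests along with other log output.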

Problem: When a request message is sent to the server, the client should write 2 log statements like:

sending.. 
message_type 

When the above statements (the pattern) are found, we can say that a request has been sent. It is a combined pattern.
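The combined pattern can be illustrated without a regex at all. A minimal sketch in Java, assuming each sending.. line is eventually followed by exactly one message_type line (the class and method names are hypothetical): keep a count of sends still waiting for their message_type and close one each time a message_type arrives. This is a line-based counter, not the regex approach the question asks for.

```java
import java.util.List;

public class CombinedPatternCounter {

    // Counts requests as a "sending.." line followed (not necessarily directly) by a "message_type" line.
    // Assumption: every "sending.." is eventually matched by exactly one "message_type".
    static int countRequests(List<String> lines) {
        int pending = 0;  // "sending.." lines still waiting for their "message_type"
        int requests = 0; // completed sending.. + message_type pairs
        for (String line : lines) {
            String t = line.trim();
            if (t.startsWith("sending..")) {
                pending++;
            } else if (t.startsWith("message_type") && pending > 0) {
                pending--;
                requests++;
            }
        }
        return requests;
    }

    public static void main(String[] args) {
        // The overlapping example from the question (List.of needs Java 9+).
        List<String> log = List.of(
            "sending..(1)", "...//other text",
            "sending..(2)", "message_type(2)", "...//other text",
            "message_type(1)",
            "sending..(3)", "message_type(3)");
        System.out.println(countRequests(log)); // 3
    }
}
```

Because only counts are kept, interleaving such as (1), (2), (2), (1) still yields the right total; the trade-off is that it cannot tell which message_type belongs to which send.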

We expect the log file content to look like this:

sending.. 
message_type 
...//other text 
sending.. 
message_type 
...//other text 
sending.. 
message_type 

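From the log above, we can say the client has sent 3 messages. But in the actual log file the patterns somehow overlap, as follows (not for all messages, but for some):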

sending..(1) 
...//other text 
sending..(2) 
message_type(2) 
...//other text 
message_type(1) 
sending..(3) 
message_type(3) 

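There are still 3 requests (I numbered the messages to make this clear). But the patterns overlap, i.e. the second message is logged before the first one has been completely logged. The description above is only for illustration. Below is part of the original log: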

Original log

Send message to server: 
Created post notification log dir 
Created post notification log dir 
Created post notification log dir 
Send message to server: 
Created post notification log dir 
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message> 
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></params></response></message> 
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message> 
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></response></message> 

As explained, a request here is identified by 2 parts:

Send message to server: 
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message> 

What I tried

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogMatcher { 

    static final String create_session= "Send message to server(.){10,1000}(<\\?xml(.){10,500}type=\"createsession\"(.){1,100}</message>)"; 

    public static void main(String[] args) throws IOException { 
        StringBuilder b = new StringBuilder(); 
        // I put the above log in this file; try-with-resources closes the reader
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(new File("D:/dummy.txt"))))) { 
            String line; 
            while ((line = reader.readLine()) != null) {  
                b.append(line); // readLine() strips the line break, so all lines end up joined into one
            } 
        } 

        findMatch(b, "Send message to server", "Send message to server"); 
        findMatch(b, create_session, "create_session"); 
    } 

    // Counts non-overlapping matches of pattern in b and prints the count labelled with type
    private static int findMatch(StringBuilder b, String pattern, String type) { 
        int count = 0; 
        Pattern regex = Pattern.compile(pattern, Pattern.MULTILINE); 
        Matcher regexMatcher = regex.matcher(b.toString()); 
        while (regexMatcher.find()) { 
            count++; 
        } 
        System.out.printf("%25s%2d\n", type + ": ", count); 
        return count; 
    } 
} 

Current output

The goal is to find the number of createsession messages sent.

Send message to server: 2 
     create_session: 1 

Expected output

From the log it is clear that 2 messages were sent, so the output should be:

Send message to server: 2 
     create_session: 2 
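One way to see why the current output is 1 rather than 2: because readLine() drops the line breaks, the whole file becomes one line, and (.){10,1000} is greedy. The first match therefore starts at the first Send message to server and runs past the second one to an XML further on; find() resumes after the end of a match, so the second Send message to server never starts a match of its own. A minimal diagnostic sketch, assuming a shortened, hypothetical stand-in for the joined log (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhyOnlyOneMatch {
    // The question's pattern, unchanged.
    static final String CREATE_SESSION =
        "Send message to server(.){10,1000}(<\\?xml(.){10,500}type=\"createsession\"(.){1,100}</message>)";

    // For each match, counts how many "Send message to server" prefixes the match contains.
    static List<Integer> sendsPerMatch(String joinedLog) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(CREATE_SESSION, Pattern.MULTILINE).matcher(joinedLog);
        while (m.find()) {
            result.add(m.group().split("Send message to server", -1).length - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened stand-in for the file after readLine() + append(): no line breaks survive.
        String xml = "<?xml version=\"1.0\"?><message><request type=\"createsession\"/></message>";
        String joined = "Send message to server:Created post notification log dir"
            + "Send message to server:Created post notification log dir"
            + xml + xml;
        System.out.println(sendsPerMatch(joined)); // [2]: one match that swallowed both send lines
    }
}
```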

You can see the pattern I tried in the code. Can anyone suggest a pattern that gives the desired result?

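Note: You might simply ask why not just count Send message to server on its own. The reason is that the log contains many types of messages, such as login, closesession and so on, and all of them begin with Send message to server. In addition, the message types are also logged separately for other purposes, so we cannot rely on either part alone; for a given request we have to rely on the combination.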


So why don't you count 'type=\"createsession\"' on its own? And what about 'sending(1)' + 'sending(2)' + 'message(1)' + 'message(2)'? – Mariano


Because these 'type=\"createsession\"' XMLs are logged not only together with 'Send message to server' but also in some other ways, such as being stored in the 'db'. So we cannot simply rely on that. –


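Then how would you know whether a 'type=\"createsession\"' logged in one of those '*other ways*' sits between 'Send message to server' and the XML you want to match? Can you show an example of an XML you want to ignore? – Mariano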

Answer

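Finding occurrences of requests sent from the client to the server.

I'll assume the "other ways" you want to ignore here have Store in DB : instead of Send message to server, followed by the xml message.

I propose a new strategy:

  1. Use only 1 regex that matches all the alternatives, so the log is parsed only once (better performance on long files).
  2. Match the type=\"createsession\" xmls independently.
  3. Also match the Store in DB: xmls, but ignore them (don't increment the counter).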

We can use the following expression to count the messages sent to the server:

^(?<toserver>Send message to server:) 
  • Notice I'm using a named group, which we can later reference as regexMatcher.group("toserver") to increment its own counter.
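A minimal sketch of this first piece on its own, assuming a small sample log (class and method names are hypothetical). Pattern.MULTILINE makes ^ match at the start of every line, and the group is looked up by name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupCount {
    // MULTILINE: ^ matches at the start of every line, not only at the start of the input.
    static final Pattern TO_SERVER =
        Pattern.compile("^(?<toserver>Send message to server:)", Pattern.MULTILINE);

    static int countToServer(String log) {
        int count = 0;
        Matcher m = TO_SERVER.matcher(log);
        while (m.find()) {
            // With a single branch the group is always set; the null check starts to
            // matter once this is one alternative among several.
            if (m.group("toserver") != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String log = "Send message to server:\n"
            + "Created post notification log dir\n"
            + "Send message to server:\n"
            + "Created post notification log dir\n";
        System.out.println(countToServer(log)); // 2
    }
}
```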

And to match the target XMLs:

^(?<message><\? *xml\b.{10,500} type *= *\"createsession\") 
  • Referenced later as regexMatcher.group("message").
  • We'll use a separate counter for it.
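On its own, this expression would also count the XML that follows a Store in DB : line, which is why step 3 of the strategy is needed. A minimal sketch, assuming a sample log with one such copy (the server-response line is shortened; class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MessageOnlyCount {
    static final Pattern MESSAGE = Pattern.compile(
        "^(?<message><\\? *xml\\b.{10,500} type *= *\"createsession\")", Pattern.MULTILINE);

    static int countMessages(String log) {
        int count = 0;
        Matcher m = MESSAGE.matcher(log);
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
            + "<message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\">"
            + "<request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>";
        String log = xml + "\n"
            + "Store in DB :\n"
            + xml + "\n" // the copy we do NOT want to count
            + "INFO [a] - Server Response: <?xml version=\"1.0\"?><response type=\"ok\"/>\n"
            + xml + "\n";
        System.out.println(countMessages(log)); // 3, one more than wanted
    }
}
```

The response line is not counted because ^ only matches at the start of a line and that line starts with INFO.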

So how do we ignore the Store in DB: XMLs? We can match them without creating a capture.

^Store in DB ?:\r?\n(?:.*\n)*?<\? *xml\b.* 
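  • It matches Store in DB : literally (the space before the colon is optional), followed by
  • \r?\n(?:.*\n)*?, which matches as few lines as possible, until
  • <\? *xml\b.* matches the first <?xml line.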
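A minimal sketch of this branch on its own, assuming a hypothetical sample with an unrelated line between the marker and the XML (class and method names are illustrative). It shows that the match runs from the marker to the end of the first <?xml line and captures nothing, so when it is the first alternative, that XML is consumed before the message branch can see it:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StoreInDbSkip {
    // The "ignore" branch from the answer; it contains only non-capturing groups.
    static final Pattern STORE_IN_DB = Pattern.compile(
        "^Store in DB ?:\\r?\\n(?:.*\\n)*?<\\? *xml\\b.*", Pattern.MULTILINE);

    // Returns the text this branch would consume, or null if it does not match.
    static String skippedSpan(String log) {
        Matcher m = STORE_IN_DB.matcher(log);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><message><request type=\"createsession\"/></message>";
        String log = "Store in DB:\nCreated post notification log dir\n" + xml + "\nnext line\n";
        String span = skippedSpan(log);
        System.out.println(span.endsWith("</message>"));           // true: stops at the end of the <?xml line
        System.out.println(STORE_IN_DB.matcher(log).groupCount()); // 0: nothing is captured
    }
}
```

One caveat: Java's . does not match \r, so if intermediate lines end in \r\n the lazy (?:.*\n)*? cannot step over them; normalizing line endings before matching is one way around that.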

Regex

^(?:Store in DB ?:\r?\n(?:.*\n)*?<\? *xml\b.*|(?<toserver>Send message to server:)|(?<message><\? *xml\b.{10,500} type *= *\"createsession\")) 

regex101 demo


Code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogMatcher {

    static final String create_session = "^(?:Store in DB ?:\\r?\\n(?:.*\\n)*?<\\? *xml\\b.*|(?<toserver>Send message to server:)|(?<message><\\? *xml\\b.{10,500} type *= *\\\"createsession\\\"))"; 

    public static void main (String[] args) throws java.lang.Exception 
    { 
        //for testing purposes 
        final String text = "Send message to server:\nCreated post notification log dir\nCreated post notification log dir\nCreated post notification log dir\nSend message to server:\nCreated post notification log dir\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nStore in DB :\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nINFO [a] - Server Response: <?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><response xaction_guid=\"new xaction guid\" type=\"ok\"></params></response></message>\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nINFO [a] - Server Response: <?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><response xaction_guid=\"new xaction guid\" type=\"ok\"></response></message>"; 
        System.out.println("INPUT:\n" + text + "\n\nCOUNT:"); 
        StringBuilder b = new StringBuilder(); 
        b.append(text); 

        findMatch(b,create_session,"create_session"); 
    } 

    private static int findMatch(StringBuilder b,String pattern, String type) { 
        int count =0; // counter for "Send message to server:" 
        int countType=0; // counter for "type=\"createsession\"" 
        Pattern regex = Pattern.compile(pattern,Pattern.MULTILINE); 
        Matcher regexMatcher = regex.matcher(b.toString()); 
        while (regexMatcher.find()) { 
            if (regexMatcher.group("toserver") != null) { 
                count++; 
            } else if (regexMatcher.group("message") != null) { 
                countType++; 
            } else { 
                // Ignoring "Store in DB :\n<?xml...." 
            } 
        } 
        System.out.printf("%25s%2d\n%25s%2d\n", "to server: ", count, type+": ", countType); 
        return countType; 
    } 
}

Output

INPUT: 
Send message to server: 
Created post notification log dir 
Created post notification log dir 
Created post notification log dir 
Send message to server: 
Created post notification log dir 
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message> 
Store in DB : 
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message> 
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></params></response></message> 
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message> 
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></response></message> 

COUNT: 
       to server: 2 
     create_session: 2 

ideone demo
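When reading the real file instead of the test string, the line breaks must be kept: the question's loop appends readLine() results without them, which would defeat the ^ anchors and the \n in this pattern. A minimal sketch, assuming the file is UTF-8 and reusing the answer's regex (class and method names are hypothetical; D:/dummy.txt is the path from the question). Files.readAllLines splits on \n, \r\n and \r, so re-joining with \n also normalizes Windows line endings:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogFileCount {
    static final Pattern CREATE_SESSION = Pattern.compile(
        "^(?:Store in DB ?:\\r?\\n(?:.*\\n)*?<\\? *xml\\b.*"
        + "|(?<toserver>Send message to server:)"
        + "|(?<message><\\? *xml\\b.{10,500} type *= *\"createsession\"))",
        Pattern.MULTILINE);

    // Reads the file and re-joins its lines with "\n" so line boundaries survive.
    static String readLog(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        return String.join("\n", lines) + "\n";
    }

    // Returns {toServerCount, createSessionCount}; "Store in DB" blocks are matched but not counted.
    static int[] count(String log) {
        int toServer = 0;
        int createSession = 0;
        Matcher m = CREATE_SESSION.matcher(log);
        while (m.find()) {
            if (m.group("toserver") != null) {
                toServer++;
            } else if (m.group("message") != null) {
                createSession++;
            }
        }
        return new int[] {toServer, createSession};
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "D:/dummy.txt");
        int[] c = count(readLog(file));
        System.out.printf("%25s%2d%n%25s%2d%n", "to server: ", c[0], "create_session: ", c[1]);
    }
}
```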

Thanks @Mariano, I like your strategy. I think I can start from this approach –