2016-05-29

How to match this pattern (with emoji)?

I have a text file with several thousand entries like this:

11111111111: text text text text text :: word11111111: text text text text :: word111111111:

where:

  • 11111111 is a large number
  • text text text text can be anything, including emoji
  • word is a single word of 8 characters
  • the second 111111111 is another number, but a different one.

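I tried, but I couldn't get it to match.

I don't know how to handle the emoji. Another problem is that the whitespace is inconsistent: sometimes it's spaces, sometimes tabs, and so on.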


What regex have you tried? – nwk


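Questions seeking debugging help ("Why isn't this code working?") must include the desired behavior, a specific problem or error, and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. [See: How to create a Minimal, Complete, and Verifiable example.](http://stackoverflow.com/help/mcve) –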


[This pattern](https://regex101.com/r/lB7kE5/1)? – Quinn

Answer


Description

^([0-9]+):\s*((?:(?!\s::).)*)\s::\s*([^:]+)\s*:\s*((?:(?!\s::).)*)\s::\s*([^:]+):$ 


This regular expression will do the following:

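  • captures the leading 11111111111
  • matches the `:`
  • captures the text text text text text, which may contain emoji
  • matches the `::`
  • captures word11111111
  • matches the `:`
  • captures the text text text text, which may contain emoji
  • matches the `::`
  • captures word111111111
  • matches the final `:`
  • allows `:` and `::` to be delimiters
  • excludes the whitespace surrounding the delimiters from the captured values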
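A minimal sketch of applying this pattern to a single entry, assuming Python 3's `re` module (the answer does not name a language). The helper `parse_entry` and the emoji sample line are hypothetical, made up for illustration. Since Python 3 strings are sequences of Unicode code points, the `.` inside the text groups consumes emoji just like any other character.

```python
import re

# The answer's pattern; groups are: number, text, word, text, word.
ENTRY_RE = re.compile(
    r"^([0-9]+):\s*((?:(?!\s::).)*)\s::\s*([^:]+)\s*:\s*((?:(?!\s::).)*)\s::\s*([^:]+):$"
)


def parse_entry(line):
    """Hypothetical helper: return the five captured fields, or None if the line does not match."""
    m = ENTRY_RE.match(line)
    return m.groups() if m else None


sample = "11111111111: text text text text text :: word11111111: text text text text :: word111111111:"
# Invented entry containing emoji in both text fields.
emoji_sample = "22222222222: hello 😀 world :: abcdefgh: 🎉 party :: word2222:"

print(parse_entry(sample))
# ('11111111111', 'text text text text text', 'word11111111', 'text text text text', 'word111111111')
print(parse_entry(emoji_sample))
# ('22222222222', 'hello 😀 world', 'abcdefgh', '🎉 party', 'word2222')
```

Because the pattern ends in `:$`, a line with spaces or tabs after the final colon does not match (`parse_entry(sample + " ")` returns `None`), so it is worth stripping trailing whitespace from each line before matching.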


Live Demo

https://regex101.com/r/qG7uZ7/1

Sample Matches

Group 0 is the whole sample line; groups 1 to 5 are the captured fields.

0. 11111111111: text text text text text :: word11111111: text text text text :: word111111111: 
1. `11111111111` 
2. `text text text text text` 
3. `word11111111` 
4. `text text text text` 
5. `word111111111` 
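Since the real input is a file of several thousand entries with mixed spaces and tabs, here is a hedged sketch of a line-by-line loop in Python. `io.StringIO` stands in for the actual file (for example `open(path, encoding="utf-8")`), and the sample entries are invented. Each line is `rstrip()`ed first because `:$` would otherwise reject trailing whitespace, and each captured field is `strip()`ed because the pattern only excludes a single whitespace character before `::`, so runs of several spaces or tabs can leak into the text groups.

```python
import io
import re

ENTRY_RE = re.compile(
    r"^([0-9]+):\s*((?:(?!\s::).)*)\s::\s*([^:]+)\s*:\s*((?:(?!\s::).)*)\s::\s*([^:]+):$"
)

# Stand-in for the real file; the entries below are made-up examples.
fake_file = io.StringIO(
    "11111111111: text text text text text :: word11111111: text text text text :: word111111111:\n"
    "22222222222:\thello 😀\t::\tabcdefgh:\tbye 👋 :: zyxwvuts:\n"
    "33333333333:  🎉🎉  ::  partying: still 🎉 :: word3333: \n"
    "this line is malformed\n"
)

entries = []
bad_lines = []
for lineno, raw in enumerate(fake_file, start=1):
    line = raw.rstrip()  # drop the newline and any trailing spaces/tabs
    m = ENTRY_RE.match(line)
    if m:
        # Trim whitespace that the single \s before "::" did not absorb.
        entries.append(tuple(field.strip() for field in m.groups()))
    else:
        bad_lines.append(lineno)

print(len(entries), bad_lines)  # 3 [4]
```

Collecting the numbers of non-matching lines instead of silently skipping them makes it easy to inspect entries that break the assumed format.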

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
        \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
        ::                       '::'
----------------------------------------------------------------------
      )                        end of look-ahead
----------------------------------------------------------------------
      .                        any character except \n
----------------------------------------------------------------------
    )*                       end of grouping
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  ::                       '::'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    [^:]+                    any character except: ':' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
        \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
        ::                       '::'
----------------------------------------------------------------------
      )                        end of look-ahead
----------------------------------------------------------------------
      .                        any character except \n
----------------------------------------------------------------------
    )*                       end of grouping
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  ::                       '::'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    [^:]+                    any character except: ':' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"
----------------------------------------------------------------------
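The key construct in groups 2 and 4 is `(?:(?!\s::).)*`, often called a tempered greedy token: it consumes one character at a time, but only while the upcoming text is not whitespace followed by `::`. A small Python comparison on an invented line with two `::` delimiters shows why a plain greedy `.*` is not enough; a lazy `.*?` is included as another common way to stop at the first delimiter.

```python
import re

line = "a 😀 :: b: c :: d:"

# Plain greedy .* runs to the end, then backtracks only until the LAST "\s::" fits.
greedy = re.match(r"(.*)\s::", line).group(1)

# Lazy .*? grows one character at a time and stops at the FIRST "\s::".
lazy = re.match(r"(.*?)\s::", line).group(1)

# Tempered token: never steps onto a position where "\s::" lies ahead.
tempered = re.match(r"((?:(?!\s::).)*)\s::", line).group(1)

print(repr(greedy))    # 'a 😀 :: b: c'
print(repr(lazy))      # 'a 😀'
print(repr(tempered))  # 'a 😀'
```

The lazy and tempered forms agree here; the tempered version keeps the stop condition inside the group itself, so the group can never run past a ` ::` delimiter no matter what follows it in the pattern.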