2012-06-01 80 views
2

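PHP/PCRE/regular expressions: splitting search terms apart

I am trying to split a typical Google search string apart, i.e. the string might be:

"how to" engine -fuel

So I would like to get "how to", engine and -fuel separately.

I tried the following preg_match_all, but the words inside the quoted phrase ("how" and "to") also come back separately, which might get unnecessarily difficult to process.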

preg_match_all(
    '=(["]{1}[^"]{1,}["]{1})' 
    .'|([-]{1}[^ ]{1,}[ ]{1})' 
    .'|([^-"]{1}[^ ]{1,}[ ]{1})=si', 
    $filter, 
    $matches, 
    PREG_PATTERN_ORDER); 

Any ideas on how to do this right?

+1

Similar question: http://stackoverflow.com/questions/10695143/ (splitting a quoted string by a delimiter) – nhahtdh

Answers

2

Try the pattern broken down below in your preg_match_all call, which will print:

 
Array 
(
    [0] => Array 
     (
      [0] => "how to" 
      [1] => engine 
      [2] => -fuel 
     ) 

) 

Meaning:

"[^"]*" # match a quoted string 
|   # OR 
\S+  # 1 or more non-space chars 
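
The pattern described above can be sketched in PHP roughly as follows. This is a minimal sketch, not necessarily the answerer's exact code: the sample input is the string from the question, `$filter` is the variable name the question uses, and `$terms` is a name introduced here for illustration.

```php
<?php
// Sample input from the question; $filter is the variable name used there.
$filter = '"how to" engine -fuel';

// Match either a double-quoted phrase or a run of non-whitespace characters.
preg_match_all('/"[^"]*"|\S+/', $filter, $matches);

// The pattern has no capture groups, so the full matches are in $matches[0].
// $terms is a hypothetical name chosen for this sketch.
$terms = $matches[0];

print_r($terms);
```

The quoted alternative is listed first, so a phrase such as "how to" is taken as one token before `\S+` gets a chance to split it. If a quote is never closed, the first alternative fails and `\S+` falls back to splitting on whitespace.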
+0

Thanks! It's shorter and does what I want. Helped a lot!

+0

You're welcome, @AndreschSerj.

1

Try this

(?i)("[^"]+") +([a-z]+) +(\-[a-z]+)\b 

Code

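// Capture the quoted phrase, the plain word and the minus-prefixed word in groups 1 to 3.
if (preg_match('/("[^"]+") +([a-z]+) +(-[a-z]+)\b/i', $subject, $regs)) {
    $howto = $regs[1];   // quoted phrase, quotes included
    $engine = $regs[2];  // plain word
    $fuel = $regs[3];    // word with its leading "-"
} else {
    $result = "";        // the subject did not match
}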

Explanation

" 
(?i)  # Match the remainder of the regex with the options: case insensitive (i)
(   # Match the regular expression below and capture its match into backreference number 1
    \"   # Match the character “\"” literally
    [^\"]  # Match any character that is NOT a “\"”
     +   # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
    \"   # Match the character “\"” literally
)
\   # Match the character “ ” literally
    +   # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(   # Match the regular expression below and capture its match into backreference number 2
    [a-z]  # Match a single character in the range between “a” and “z”
     +   # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\   # Match the character “ ” literally
    +   # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(   # Match the regular expression below and capture its match into backreference number 3
    \-   # Match the character “-” literally
    [a-z]  # Match a single character in the range between “a” and “z”
     +   # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\b   # Assert position at a word boundary
" 

Hope this helps.
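
For readers who want to keep the commented breakdown next to the pattern itself, here is a hedged sketch using PCRE's `x` (extended) modifier, which ignores unescaped whitespace in the pattern and treats `#` as the start of a comment. The `~` delimiter and the sample `$subject` value (the string from the question) are choices made for this sketch, not part of the answer.

```php
<?php
// Sample input, assumed here to be the example string from the question.
$subject = '"how to" engine -fuel';

// Same three groups as in the answer, written in free-spacing mode (x)
// and case-insensitive (i). Escaped spaces (\ ) still match literally.
$pattern = '~
    ("[^"]+")    # group 1: quoted phrase, quotes included
    \ +          # one or more literal spaces
    ([a-z]+)     # group 2: plain word
    \ +          # one or more literal spaces
    (-[a-z]+)    # group 3: word prefixed with a minus sign
    \b           # word boundary
~xi';

if (preg_match($pattern, $subject, $regs)) {
    echo $regs[1], "\n";
    echo $regs[2], "\n";
    echo $regs[3], "\n";
}
```

Note that this pattern only matches input with exactly this shape (quoted phrase, then a word, then a minus-prefixed word), so a search string with its terms in a different order or number would not match.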

+0

Great, but I prefer the other one because it is so simple. Still: thanks a lot, and have a nice day!