2014-02-23

preg_match to get part of the text

I have text like this:

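====================================================

Australian Medical Association president Steve Hambleton said on Monday any change which would deter people from seeing a doctor.

"I think we've got to identify what the problem is that we're trying to solve," Dr Hambleton said.

On Sunday federal Health Minister Peter Dutton refused to end speculation that the proposal had the government's support, only saying he was committed to ensuring the health system was "sustainable and accessible to the future".

"We won't be commenting on speculation around what the Commission of Audit may or may not recommend," Mr Dutton said in a statement.

The report predicts that the volume of "GP services" would decline by 3 per cent a year if co-payment was introduced, slowing the growth in Medicare Benefits Schedule outlays from July 2014.

====================================================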

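I want to match all the quotes in the paragraphs above.

Expected result:

  1. "I think we've got to identify what the problem is that we're trying to solve," Dr Hambleton said.
  2. "We won't be commenting on speculation around what the Commission of Audit may or may not recommend," Mr Dutton said in a statement.

I am using this code, but it does not work: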

$match = array(); 

preg_match_all("/\'\'(.*)\'\'/i", $str, $match); 

I would prefer to use a regular expression if possible, but the question is: how can I capture the quotes?

Answers

Assuming every quoted statement is followed by the verb "said":

\B"[^"]+"(?=.+?said)[^.]+. 


You should use the multiline and (optionally) case-insensitive flags:

preg_match_all('/\B"[^"]+"(?=.+?said)[^.]+./mi', $text, $matches); 
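As a quick check of the pattern above, here is a minimal sketch that runs it against a shortened, hypothetical two-line sample (the wording is adapted from the question's text, not copied from it). It only illustrates that a quote without "said" later on the same line is skipped.

```php
<?php
// Hypothetical sample: line 1 has a quote with no "said" after it,
// line 2 has a quote followed by "said".
$text = 'The minister wants a system that is "sustainable and accessible to the future".' . "\n"
      . '"We won\'t be commenting on speculation," Mr Dutton said in a statement.' . "\n";

// Same pattern and flags as in the answer above.
preg_match_all('/\B"[^"]+"(?=.+?said)[^.]+./mi', $text, $matches);

// $matches[0] holds the full matches.
print_r($matches[0]);
```

Because `.` does not match a newline by default, the lookahead `(?=.+?said)` only searches the rest of the current line. The `m` flag changes how `^` and `$` behave, and this pattern uses neither, so it should make no difference here.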

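This will produce what is listed under "Expected result". Note, however, that it does not match every quotation in the text you provided, for example "sustainable and accessible to the future". I know you did not include those in your expected result; I am only clarifying that they are skipped, and that this is why my solution does not return them.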

<?php
$text = <<<EOD
==================================================== 

Australian Medical Association president Steve Hambleton said on Monday any change which would deter people from seeing a doctor. 

"I think we've got to identify what the problem is that we're trying to solve," Dr Hambleton said. 

On Sunday federal Health Minister Peter Dutton refused to end speculation that the proposal had the government's support, only saying he was committed to ensuring the health system was "sustainable and accessible to the future". 

"We won't be commenting on speculation around what the Commission of Audit may or may not recommend," Mr Dutton said in a statement. 

The report predicts that the volume of "GP services" would decline by 3 per cent a year if co-payment was introduced, slowing the growth in Medicare Benefits Schedule outlays from July 2014. 

================================================ 
EOD; 

$matches = null; 
preg_match_all('#"[ a-zA-Z,\']+"[ a-zA-Z,\']+\.#m', $text, $matches); 
var_dump($matches); // $matches[0] contains array of all matched quotes 

?>


Sorry about the color formatting in the code above. It is caused by my use of [heredoc](http://us2.php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc).


Here is an example of working code:

$str = <<<EOT 
Australian Medical Association president Steve Hambleton said on Monday any change which would deter people from seeing a doctor. 

"I think we've got to identify what the problem is that we're trying to solve," Dr Hambleton said. 

On Sunday federal Health Minister Peter Dutton refused to end speculation that the proposal had the government's support, only saying he was committed to ensuring the health system was "sustainable and accessible to the future". 

"We won't be commenting on speculation around what the Commission of Audit may or may not recommend," Mr Dutton said in a statement. 

The report predicts that the volume of "GP services" would decline by 3 per cent a year if co-payment was introduced, slowing the growth in Medicare Benefits Schedule outlays from July 2014. 
EOT; 
$match = array(); 

preg_match_all('/(?:[^.?!\s][^.?!"]+)?"[^"]*"[^.?!"]*[.?!]"?/i', $str, $match); 
var_dump($match); 

This produces:

array(1) { 
    [0]=> 
    array(4) { 
    [0]=> 
    string(98) ""I think we've got to identify what the problem is that we're trying to solve," Dr Hambleton said." 
    [1]=> 
    string(228) "On Sunday federal Health Minister Peter Dutton refused to end speculation that the proposal had the government's support, only saying he was committed to ensuring the health system was "sustainable and accessible to the future"." 
    [2]=> 
    string(132) ""We won't be commenting on speculation around what the Commission of Audit may or may not recommend," Mr Dutton said in a statement." 
    [3]=> 
    string(190) "The report predicts that the volume of "GP services" would decline by 3 per cent a year if co-payment was introduced, slowing the growth in Medicare Benefits Schedule outlays from July 2014." 
    } 
} 

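Explanation of the regular expression:

  • (?:[^.?!\s][^.?!"]+)? allows the sentence to begin with unquoted material. It starts at a character that is neither whitespace nor sentence-ending punctuation, then matches characters that are neither sentence-ending punctuation nor quotation marks.
  • "[^"]*" a quotation mark, any non-quote characters, and another quotation mark.
  • [^.?!"]* (optional) more text containing neither sentence-ending punctuation nor quotation marks.
  • [.?!]"? a final punctuation mark and an optional closing quotation mark.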
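The four parts listed above can be sketched as one pattern assembled piece by piece. This is only an illustration: the variable names and the sample sentence are hypothetical, and the assembled pattern is intended to be identical to the one in the working code.

```php
<?php
// Same pattern as in the answer, split into the four parts described above.
$pattern = '/'
    . '(?:[^.?!\s][^.?!"]+)?' // optional unquoted text before the quote
    . '"[^"]*"'               // the quoted passage itself
    . '[^.?!"]*'              // optional unquoted text after the quote
    . '[.?!]"?'               // closing punctuation, optionally followed by a quote
    . '/i';

// Hypothetical sample text, not taken from the question.
$sample = 'He left. "I think so," Dr Smith said. The "GP services" figure fell.';

preg_match_all($pattern, $sample, $found);

// $found[0] holds the matched sentences; the first sentence has no quote and is skipped.
print_r($found[0]);
```

Building the pattern by concatenation keeps the regex itself unchanged while letting each part carry an ordinary PHP comment.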


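Known issue: it will not correctly match sentences that contain more than one quotation, or that contain a period before the quotation begins. For example: Mr. Brown said, "I think," then continued, "therefore I am."

Edit: this solves the issue above: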

preg_match_all('/(?=[^.?!\s])(?:(?:(?:Mrs?|Dr|Rev)\.\s|[^.?!"])*?"[^"]*?[^.?!]")+(?:(?:(?:Mrs?|Dr|Rev)\.\s|[^.?!"])*?(?:"[^"]*?[.?!]"?|[.?!]))/i', $str, $match); 

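It matches each of the sentences above, and produces a single match for this sentence:

Mr. Brown said, "I think," then continued as Mr. Green fidgeted, "therefore I am."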
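To try the edited pattern, here is a minimal sketch that applies it to the example sentence from the "Known issue" note. The variable names are hypothetical, and the honorific list (Mr, Mrs, Dr, Rev) is just the one from the pattern; other abbreviations followed by a period would presumably still split a sentence.

```php
<?php
// Sentence with an abbreviation ("Mr.") and two quoted parts.
$sentence = 'Mr. Brown said, "I think," then continued, "therefore I am."';

// The edited pattern from the answer, unchanged.
$pattern = '/(?=[^.?!\s])(?:(?:(?:Mrs?|Dr|Rev)\.\s|[^.?!"])*?"[^"]*?[^.?!]")+(?:(?:(?:Mrs?|Dr|Rev)\.\s|[^.?!"])*?(?:"[^"]*?[.?!]"?|[.?!]))/i';

preg_match_all($pattern, $sentence, $result);

// $result[0] holds the full matches.
print_r($result[0]);
```

The whole sentence should come back as a single match, because "Mr. " is consumed by the honorific alternative instead of being treated as the end of a sentence.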
