2013-10-31

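Replacing regular line breaks and Unicode line breaks

I'm working with text in paragraph format, where a date always sits on the line above each article. The problem is that each article is followed by an unknown number of line breaks, including various kinds of Unicode line breaks. I need to remove every run of line breaks between articles and replace it with exactly two newlines (\n\n).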

So from this:

05/12 
The 1959 Mexico hurricane was a devastating tropical cyclone 
that was one of the worst ever Pacific hurricanes. It 
impacted the Pacific coast of Mexico in October 1959. The 
hurricane killed at least 1,000 people. 




11/01 
The 1959 Mexico hurricane was a devastating tropical cyclone 
that was one of the worst ever Pacific hurricanes. It 
impacted the Pacific coast of Mexico in October 1959. The 
hurricane killed at least 1,000 people. 

to this:

05/12 
The 1959 Mexico hurricane was a devastating tropical cyclone 
that was one of the worst ever Pacific hurricanes. It 
impacted the Pacific coast of Mexico in October 1959. The 
hurricane killed at least 1,000 people. 

11/01 
The 1959 Mexico hurricane was a devastating tropical cyclone 
that was one of the worst ever Pacific hurricanes. It 
impacted the Pacific coast of Mexico in October 1959. The 
hurricane killed at least 1,000 people. 

I tried using preg_replace(), but it doesn't match every instance:

$text = preg_replace('/\r?\n+(?=\d{2}\/\d{2})/', "\n\n", $text); 

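Maybe you need to try matching all the Unicode characters that represent a 'newline'? I know of another one that messed up my text tokenizer a week ago: the carriage return '\r'. Just a hint, though... **scratch that, it looks like you are already matching '\r'.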

Answers


I posted on a similar question about this a month or so ago.

To match anything that is considered a line break sequence, you can use \R:

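\R matches a generic newline; that is, anything considered a line break sequence by Unicode. This includes all characters matched by \v (vertical whitespace), plus the multi-character sequence \x0D\x0A.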
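To make the quoted description concrete, here is a small sketch that checks which sequences \R accepts as a single line break. The helper name is_generic_newline() is hypothetical, not something from the thread. The sketch assumes PHP 7 or later (for the "\u{...}" string escapes) and a PCRE build that uses the default Unicode meaning of \R.

```php
<?php
// Hypothetical helper (not from the thread): returns true when $seq
// consists of exactly one sequence that \R treats as a line break.
function is_generic_newline(string $seq): bool
{
    // \A and \z anchor the match so \R has to consume the whole string.
    // The u modifier makes PCRE read the subject as UTF-8.
    return preg_match('~\A\R\z~u', $seq) === 1;
}

// Sequences the PCRE documentation lists for \R, plus a plain space
// as a control that should not match.
$candidates = [
    'LF'    => "\n",
    'CR'    => "\r",
    'CRLF'  => "\r\n",
    'VT'    => "\x0B",
    'FF'    => "\f",
    'NEL'   => "\u{0085}",
    'LS'    => "\u{2028}",
    'PS'    => "\u{2029}",
    'space' => " ",
];

foreach ($candidates as $name => $seq) {
    printf("%-6s %s\n", $name, is_generic_newline($seq) ? 'line break' : 'not a line break');
}
```

Note that "\r\n" is matched as one unit rather than as two separate breaks.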

Try this:

$text = preg_replace('~\R+(?=\d{2}/\d{2})~u', "\n\n", $text); 

See the PCRE documentation for different ways of achieving this.
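As a rough illustration, the pattern above can be wrapped in a function and run against a sample that mixes CRLF, LF, and U+2028 between two articles. The function name normalize_article_breaks() and the sample text are made up for this sketch, and the "\u{2028}" escape assumes PHP 7 or later.

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
function normalize_article_breaks(string $text): string
{
    // \R+ consumes a whole run of line breaks of any kind (LF, CRLF, U+2028, ...).
    // The lookahead restricts the match to runs directly followed by a dd/dd date,
    // so single line breaks inside an article are left alone.
    $result = preg_replace('~\R+(?=\d{2}/\d{2})~u', "\n\n", $text);

    // preg_replace() returns null on failure (for example, invalid UTF-8
    // combined with the u modifier); fall back to the unchanged input.
    return $result ?? $text;
}

// Invented sample: two short articles separated by a mixed run of line breaks.
$sample = "05/12 \nFirst article. \r\n\r\n\u{2028}\n11/01 \nSecond article. ";

echo normalize_article_breaks($sample);
```

This likely explains why the original attempt missed some cases: \r?\n+ cannot span a run such as \r\n\r\n, because \n+ stops at the next \r, and it knows nothing about Unicode separators. If the "blank" lines may also contain spaces or tabs, something like (?:\h*\R)+ in place of \R+ may be needed; that is an assumption about the data, not something stated in the thread.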


If '\R' isn't supported, '(\n|\r|\n\r)' can be used instead. – kirilloid


Wow, I never knew about this! That solved the problem...