2017-05-17 32 views
1

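trim() not working on array strings fetched from MySQL

What I want to do is take a block of HTML, strip out all the HTML tags, and put each line of text into a PHP array.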

I'm just testing a single block in my MySQL query (hence WHERE ID = '2409').

The HTML portion of ID 2409 looks like this:

<table class="description-table"> 
<tbody> 
<tr><td>Saepe Encomia 2.aD NEC Mirum Populo Soluni Iis 8679-1370 Status Error Sed 9.9</td></tr> 
<tr><td>Description</td></tr> 
<tr><td></td> 
<td><br> 
<br><p></p><p></p> 
<strong><br></strong> <strong><br></strong> <strong>Donec Rem </strong><br> 
<br> 
<strong>Animam Urgebat<br> 
<br></strong> <strong><br> 
<br> 
Rerum Sed 8613 - 3669 8358 & 6699<br> 
<br> 
1.mE (magNA) QUO Ad Nominum Statum Massa<br> 
ab SEM Autem Reddet Habitu Sit<br> 
<br></strong> <strong> PRAEDAM ACCUMSAN PERSONARUM DENEGARE AC DUORUM</strong> <strong><br></strong> <strong><br></strong> <strong>Lius typi sit nec quo adversis cras ministri oppressa, versus class hic rem quos colubros ullo commune!economy!</strong><strong><br></strong><strong>               ad Quisque Modeste</strong><strong>               ac Rem Wisi</strong><strong>               ex Hac Congue mus Leo</strong><strong>               ab 7/92" Alias</strong><strong>               ad 2/73" Adverso & Erat</strong><strong>               me Personom Eget</strong><strong>               ad Viribus Fuga Fuga</strong><strong>               ab Louor-Sit Molles</strong><strong class="c2">               3x Block-Off Plates</strong><strong class="c2">               ad Facunda</strong><strong class="c2">               ab Personas Diam<br> 
NUNC<br> 
ex Teniet te Palmam Eaque<br> 
me Teniet in Versus Urna<br></strong> <strong><br></strong><br> 
<strong class="c3">**CONDEMNENDUS REM CUM MAGNORUM**</strong><strong></strong><br> 
</td> 
</table> 

Here is my PHP script, designed to parse this:

//connect to mysqli

$results = $mysqli->query("SELECT ID, post_content
FROM wp_posts
WHERE ID = '2409';");

while($row = $results->fetch_array()) {
    // split the HTML on tags; each piece is the text between two tags
    $htmlarray2 = preg_split('/<.+?>/', $row['post_content']);
    // trim each piece, drop the empty ones, and re-index from 0
    $htmlarray = array_values(array_filter(array_map('trim', $htmlarray2)));
    echo '<pre>';
    print_r($htmlarray);
    echo '</pre>';
    // . . .
}

This produces output like this:

Array 
(
[0] => Saepe Encomia 2.aD NEC Mirum Populo Soluni Iis 8679-1370 Status Error Sed 9.9 
[1] => Donec Rem 
[2] => Animam Urgebat 
[3] => Rerum Sed 8613 - 3669 8358 & 6699 
[4] => 1.mE (magNA) QUO Ad Nominum Statum Massa 
[5] => ab SEM Autem Reddet Habitu Sit 
[6] => PRAEDAM ACCUMSAN PERSONARUM DENEGARE AC DUORUM 
[7] => Lius typi sit nec quo adversis cras ministri oppressa, versus class hic rem quos colubros ullo commune! 
[8] =>               ad Quisque Modeste 
[9] =>               ac Rem Wisi 
[10] =>               ex Hac Congue mus Leo 
[11] =>               ab 7/92" Alias 
[12] =>               ad 2/73" Adverso & Erat 
[13] =>               me Personom Eget 
[14] =>               ad Viribus Fuga Fuga 
[15] =>               ea Totam Poenam 
[16] =>               ab Louor-Sit Molles 
[17] =>               ad Facunda 
[18] =>               ab Personas Diam 
[19] => NUNC 
[20] => ex Teniet te Palmam Eaque 
[21] => me Teniet in Versus Urna 
[22] => **CONDEMNENDUS REM CUM MAGNORUM** 
) 

That's fine, but now I'm having trouble removing the whitespace before and after the strings in the array.

Let's take element 8 of the array as an example:

. . . 
$arrayvalue = $htmlarray2['8']; 

Echoing it gives this:

             ad Quisque Modeste 

Now, what I obviously want to do is trim every element of the array, but for testing I'm only using this variable, $arrayvalue.

My problem is that trim() doesn't work on this MySQL-fetched variable. That is, adding trim($arrayvalue); has no effect, and the value echoes exactly as above.

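I know this has something to do with fetching the array through my query, because if I test the same value in a standalone PHP script, it works normally: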

$string = '               ad Quisque Modeste '; 
echo trim($string); 

It works fine, and the echo output is simply ad Quisque Modeste, with no whitespace before or after the string.

Why doesn't trim() work in my while loop? Is there a trick to trimming leading and trailing whitespace from the elements?

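EDIT: Here is my full while loop, as requested. It's a bit different from the example above (I've made a lot of changes trying to solve this myself, so it keeps changing), but here is everything I have right now: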

while($row = $results->fetch_array()) { 
    $id = $row['ID']; 
    echo 'ID: ' . $id; 
    echo '<br />'; 

    //replace &nbsp; with white space 
    $converted = strtr($row['post_content'],array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES))); 
    trim($converted, chr(0xC2).chr(0xA0)); 

    //remove html elements 
    $htmlarray = preg_split('/<.+?>/', $converted); 

    // remove empty array elements and re-index array 
    $htmlarray2 = array_values(array_filter(array_map('trim', $htmlarray))); 

    // test by getting single value from array 
    $arrayvalue = $htmlarray2['9']; 

    // my attempt to trim string in while loop 
    trim($arrayvalue); 

    // doesn't trim 
    echo '<hr>' . $arrayvalue . '<hr>'; 

    // put this here so I can see the full array 
    echo '<pre>'; 
     print_r($htmlarray2); 
    echo '</pre>'; 
} 

As requested, here is var_export($row['post_content']):

'<table class="product-description-table"> 
<tbody> 
<tr> 
<td class="item" colspan="3">Saepe Encomia 2.aD NEC Mirum Populo Soluni Iis 8679-1370 Status Error Sed 9.9</td> 
</tr> 
<tr> 
<td class="title" colspan="3"></td> 
</tr> 
<tr> 
<td class="content"><br> 
<br> 
<p class="c1"></p> 
<p class="c1"></p> 
<strong><br></strong> <strong><br></strong> <strong>Donec Rem&nbsp;</strong><br> 
<br> 
<strong>Animam Urgebat<br> 
<br></strong> <strong><br> 
<br> 
Rerum Sed 8613 - 3669 8358 & 6699<br> 
<br> 
1.mE (magNA) QUO Ad Nominum Statum Massa<br> 
ab SEM Autem Reddet Habitu Sit<br> 
<br></strong> <strong>&nbsp;PRAEDAM ACCUMSAN PERSONARUM DENEGARE AC DUORUM</strong> <strong><br></strong> <strong><br></strong> <strong>Lius typi sit nec quo adversis cras ministri oppressa, versus class hic rem quos colubros ullo commune!economy!</strong><strong><br></strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ad Quisque Modeste</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ac Rem Wisi</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ex Hac Congue mus Leo</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ab 7/92" Alias</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ad 2/73" Adverso & Erat</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;me Personom Eget</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ad Viribus Fuga Fuga</strong><strong>&nbsp; 
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ab Louor-Sit Molles</strong><strong class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3x Block-Off Plates</strong><strong class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ad Facunda</strong><strong class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ab Personas Diam<br> 
NUNC<br> 
ex Teniet te Palmam Eaque<br> 
me Teniet in Versus Urna<br></strong> <strong><br></strong><br> 
<strong class="c3">**CONDEMNENDUS REM CUM MAGNORUM**</strong><strong>&nbsp;</strong><br></td> 
<td class="product-content-border"></td> 
</tr> 
<tr> 
<td class="gallery" colspan="3"> 
<table> 
<tbody> 
<tr> 
<td></td> 
<td></td> 
</tr> 
<tr> 
<td></td> 
<td></td> 
</tr> 
<tr> 
<td></td> 
<td></td> 
</tr> 
<tr> 
<td></td> 
<td></td> 
</tr> 
<tr> 
<td></td> 
<td></td> 
</tr> 
<tr> 
<td></td> 
<td></td> 
</tr> 
<tr> 
<td></td> 
<td></td> 
</tr> 
<tr> 
<td></td> 
<td></td> 
</tr> 
</tbody> 
</table> 
</td> 
</tr> 
<tr> 
<td></td> 
</tr> 
<tr> 
<td class="spacer" colspan="3"></td> 
</tr> 
<tr> 
<td class="product-content-border"></td> 
</tr> 
</tbody> 
</table> 
<br> 
<br> 
<br> 
<p class="c4"></p>' 

FINAL EDIT :) Result:

I've posted the solution below, but I won't accept my own answer.

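If anyone familiar with regular expressions can help explain what's going on behind all this, and why the expression /[\s]+/mu, i.e. $clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);, fixed the problem, I'd be happy to accept that as a proper answer and explanation.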

+0

What are array_values and array_filter doing in there? Would it work if you only used the map? Obligatory: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#1732454 – mkaatman
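On the first question in that comment: array_filter() with no callback removes empty strings (and, note, also the string "0"), but it keeps the original keys, so array_values() is what renumbers the result. A small sketch with made-up pieces:

```php
<?php
// Made-up pieces, similar to what preg_split() + trim leaves behind.
$pieces = ["", "Donec Rem", "", "ad Quisque Modeste", "0"];

// array_filter() with no callback drops every "falsy" element: "" but also "0".
// It keeps the original keys, so gaps are left behind.
$filtered = array_filter($pieces);        // [1 => 'Donec Rem', 3 => 'ad Quisque Modeste']

// array_values() renumbers the keys from 0 so $arr[0], $arr[1], ... line up again.
$reindexed = array_values($filtered);     // [0 => 'Donec Rem', 1 => 'ad Quisque Modeste']

print_r($reindexed);
```

Without array_values(), an index such as $htmlarray2[9] would refer to the old position in the split result, not the ninth non-empty piece.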

+1

https://3v4l.org/PMdrH ?? – hassan

+0

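I'm a bit confused about which part isn't working: `$htmlarray2` will keep the strings with their whitespace (along with some empty strings), and `$htmlarray` will have the strings without whitespace. You mention a loop that isn't working, but you haven't posted one. – iainn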

Answers

1

Here is the explanation you asked for of the regex that solved your problem:

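/[\s]+/ says: "find one or more whitespace characters" (including ' ', '\r', '\n', '\t', '\f' and '\v'). Because your pattern doesn't use anchors (^ or $), the multi-line modifier/flag m is unnecessary. The unicode modifier/flag u, however, is absolutely critical in your case, because the HTML text in your string contains lots of little devils called...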

"NO-BREAK SPACE": a Unicode character that UTF-8 encodes as the two-byte combination 194 160 (0xC2 0xA0), written \x{00A0} in a pattern; see here.

Without the u flag, the NO-BREAK SPACE characters survive, and extra filtering is needed to remove them.
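A small sketch of the difference the u flag makes, using a made-up string that contains decoded no-break spaces. It assumes PHP's default "C" locale; if a script changes LC_CTYPE with setlocale(), the non-/u result could differ:

```php
<?php
// A decoded &nbsp; is U+00A0, i.e. the bytes "\xC2\xA0" in UTF-8.
$text = "\xC2\xA0\xC2\xA0 ad Quisque Modeste";

// Without /u, PCRE works byte by byte and (in the default "C" locale)
// \s does not treat 0xC2 or 0xA0 as whitespace, so the no-break spaces remain.
$bytes = preg_replace('/\s+/', ' ', $text);

// With /u, PHP compiles the pattern in UTF-8 mode with Unicode properties,
// so \s also matches U+00A0 and the whole leading run becomes one space.
$unicode = preg_replace('/\s+/u', ' ', $text);

var_dump(strpos($bytes, "\xC2\xA0") === 0);   // bool(true): still starts with a no-break space
var_dump(trim($unicode));                     // string(18) "ad Quisque Modeste"
```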


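Although you did eventually get your code to produce the correct output, I'm happy to offer a leaner, single-step pattern using preg_split() that gets you there faster: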

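while($row = $results->fetch_array()) {
    // decode entities first so &nbsp; becomes U+00A0, which \s matches in /u mode
    $content = html_entity_decode($row['post_content'], ENT_QUOTES, 'UTF-8');
    // split on every tag plus any whitespace around it; -1 = no limit, drop empty pieces
    $texts = preg_split('/\s*<[^>]+>\s*/u', $content, -1, PREG_SPLIT_NO_EMPTY);
    var_export($texts);
}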

Here is a working demo.

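This new splitting pattern still looks for your tags, but it is more efficient: between < and > I only ask it to match characters that are "not >", using [^>]+. That lets the engine run straight to the next > in one pass, whereas the lazy .+? has to stop after every character it consumes and check whether a > comes next.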
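Beyond speed, [^>] also differs from . in one practical way: . does not match a newline unless the s flag is set, so <.+?> cannot match a tag whose attributes wrap onto a second line. A sketch with a made-up tag of that shape (the post content in the question doesn't happen to contain one):

```php
<?php
// A made-up tag whose attributes wrap onto a second line.
$html = "<td class=\"content\"\n    colspan=\"3\">Donec Rem</td>";

// '.' does not match "\n" without /s, so the lazy pattern cannot match
// the first tag, and the raw tag text leaks into the result.
$lazy = preg_split('/<.+?>/', $html, -1, PREG_SPLIT_NO_EMPTY);

// [^>] matches any character except '>', newlines included.
$negated = preg_split('/<[^>]+>/', $html, -1, PREG_SPLIT_NO_EMPTY);

print_r($lazy);      // one element that still begins with "<td class=..."
print_r($negated);   // Array ( [0] => Donec Rem )
```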

I also included matching for your Unicode-extended whitespace characters: \s* matches zero or more whitespace characters both before AND after each tag.

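Finally, I should explain the extra arguments to preg_split(). The third argument, null or -1, means "no limit" on the number of pieces. That is the default behavior, but a value must be supplied there in order to pass the final argument (passing null for this int parameter is deprecated as of PHP 8.1, so -1 is the safer choice). PREG_SPLIT_NO_EMPTY spares you the extra array_filter() step afterwards: it omits any empty elements produced by the split, so you only get the good stuff.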
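Below is a self-contained sketch of the single-step split run on a short, made-up fragment modeled on the var_export() output in the question. It assumes the entities are decoded first with html_entity_decode(), as the question's final script does, because the raw column still contains literal &nbsp; entities that \s cannot match:

```php
<?php
// A small, hypothetical stand-in for $row['post_content'] (not the real row).
$postContent = "<td class=\"content\"><br>\n<strong>Donec Rem&nbsp;</strong><br>\n"
    . "<strong>&nbsp; &nbsp; &nbsp;ad Quisque Modeste</strong>"
    . "<strong>&nbsp; &nbsp;ac Rem Wisi</strong><br></td>";

// Decode entities so &nbsp; becomes U+00A0, which \s matches in /u mode.
$decoded = html_entity_decode($postContent, ENT_QUOTES, 'UTF-8');

// Split on each tag together with any whitespace around it;
// -1 means "no limit" and PREG_SPLIT_NO_EMPTY drops the empty pieces.
$texts = preg_split('/\s*<[^>]+>\s*/u', $decoded, -1, PREG_SPLIT_NO_EMPTY);

var_export($texts);   // array ( 0 => 'Donec Rem', 1 => 'ad Quisque Modeste', 2 => 'ac Rem Wisi', )
```

Because the surrounding whitespace is consumed as part of each separator, no separate trim() or array_filter() pass is needed afterwards.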

I hope you find this helpful/educational. Good luck with your project.

+0

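No, it's perfect. Just woke up and read this :) A very concise and helpful answer, I got a lot out of it! I think I should dig deeper into regular expressions, since I can see they're a very powerful tool for processing large amounts of data. – bbruman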

0

Your trim() call isn't doing anything. You need this:

$arrayvalue = trim($arrayvalue); 

That's it. trim() returns the trimmed string; it doesn't modify the variable you pass in.
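A minimal sketch of that point, with a literal string standing in for the value fetched from MySQL (an assumption made so the example runs on its own):

```php
<?php
// PHP strings are passed by value, so trim() cannot change its argument;
// it returns a new, trimmed string that you must assign somewhere.
$arrayvalue = "   ad Quisque Modeste ";

trim($arrayvalue);                 // return value discarded; $arrayvalue is unchanged
var_dump($arrayvalue);             // still has the surrounding spaces

$arrayvalue = trim($arrayvalue);   // keep the returned value
var_dump($arrayvalue);             // string(18) "ad Quisque Modeste"
```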

+0

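So I did `$trim = trim($arrayvalue);` followed by `echo '<hr>' . $trim . '<hr>';` and then `var_dump($trim);`... it still returns $trim with all the extra whitespace I don't want. Like I said, this works with a normal string... but not in my while loop... – bbruman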

+0

Here's my full PHP script, if it helps... https://pastebin.com/6cA4Y7SQ – bbruman

+0

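There's definitely some strangeness going on. For starters, the `strtr` line can (I think) be replaced with: `$converted = html_entity_decode($row['post_content'], ENT_QUOTES);` Also, this line doesn't do anything: `trim($converted, chr(0xC2).chr(0xA0));` You probably meant: `$converted = trim($converted, chr(0xC2).chr(0xA0));` The trim on line 36 won't work if $converted has characters at the start or end that look like spaces but aren't standard whitespace. The fact that you're trying (but failing) to remove some weird non-space characters makes me think that's your next problem. –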
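To see why a character mask was needed at all, here is a small sketch using an assumed fragment of the post content. It shows that &nbsp; decodes to the bytes 0xC2 0xA0, which plain trim() leaves alone, and that trim()'s second argument is treated as a set of individual bytes:

```php
<?php
// Simulate the post content after html_entity_decode():
// each &nbsp; becomes U+00A0, i.e. the two bytes 0xC2 0xA0 in UTF-8.
$decoded = html_entity_decode("&nbsp; &nbsp;ad Quisque Modeste", ENT_QUOTES, 'UTF-8');

// trim() with no second argument only strips " \t\n\r\0\x0B",
// so the leading no-break spaces survive.
$plain = trim($decoded);
var_dump(bin2hex(substr($plain, 0, 2)));   // string(4) "c2a0"

// The second argument is a set of single bytes, not a sequence. Adding 0xC2
// and 0xA0 to the default set strips the UTF-8 no-break spaces and the
// ordinary spaces between them from both ends.
$masked = trim($decoded, " \t\n\r\0\x0B" . "\xC2\xA0");
var_dump($masked);                          // string(18) "ad Quisque Modeste"
```

Because the mask is a byte set, it can also strip half of a legitimate character at the edge of a string: "à" is encoded as 0xC3 0xA0, so a trailing "à" would lose its 0xA0 byte. The /u regex approach discussed in the answers avoids that.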

0

I found the solution.

I'm not entirely sure how it works. I'm new to regular expressions.

But the solution I found (maybe someone can explain it?) is:

$clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray); 

The whole script that worked (excluding the MySQL stuff) is:

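// decode entities: &nbsp; becomes U+00A0 (bytes 0xC2 0xA0), &amp; becomes &, etc.
$converted = html_entity_decode($row['post_content'], ENT_QUOTES);
// strip 0xC2 / 0xA0 bytes from the start and end of the whole string only
$converted = trim($converted, chr(0xC2).chr(0xA0));

// split on HTML tags; each piece is the text between two tags
$htmlarray = preg_split('/<.+?>/', $converted);

// in every piece, collapse each run of whitespace (including U+00A0, thanks to /u) into one space
$clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);

// trim each piece, then drop the empty ones
$htmlarray2 = array_filter(array_map('trim', $clean_htmlarray));

// re-index from 0
$clean_htmlarray2 = array_values($htmlarray2);

echo '<pre>';
print_r($clean_htmlarray2);
echo '</pre>';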

The output is:

Array 
(
    [0] => Saepe Encomia 2.aD NEC Mirum Populo Soluni Iis 8679-1370 Status Error Sed 9.9 
    [1] => Description 
    [2] => Donec Rem 
    [3] => Animam Urgebat 
    [4] => Rerum Sed 8613 - 3669 8358 & 6699 
    [5] => 1.mE (magNA) QUO Ad Nominum Statum Massa 
    [6] => ab SEM Autem Reddet Habitu Sit 
    [7] => PRAEDAM ACCUMSAN PERSONARUM DENEGARE AC DUORUM 
    [8] => Lius typi sit nec quo adversis cras ministri oppressa, versus class hic rem quos colubros ullo commune!economy! 
    [9] => ad Quisque Modeste 
    [10] => ac Rem Wisi 
    [11] => ex Hac Congue mus Leo 
    [12] => ab 7/92" Alias 
    [13] => ad 2/73" Adverso & Erat 
    [14] => me Personom Eget 
    [15] => ad Viribus Fuga Fuga 
    [16] => ab Louor-Sit Molles 
    [17] => 3x Block-Off Plates 
    [18] => ad Facunda 
    [19] => ab Personas Diam 
    [20] => NUNC 
    [21] => ex Teniet te Palmam Eaque 
    [22] => me Teniet in Versus Urna 
    [23] => **CONDEMNENDUS REM CUM MAGNORUM** 
) 

A fully trimmed array.

This also works in my while loop for all rows, i.e.:

$results = $mysqli->query("SELECT ID, post_content
FROM wp_posts
LIMIT 50;");

In this case, I get all 50 rows with fully trimmed strings.

So finally... this was a challenge to figure out!

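I just wish I understood it better. I really don't feel I should be credited with the answer to this question, since all I really did was try a bunch of different things until this one finally worked.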

If anyone wants to chime in and explain why $clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);, or rather /[\s]+/mu, is what I needed in this case, I'll gladly award the answer to them :)

For now I'm just glad it's working properly. Thanks, everyone, for the help and input!

+1

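Your regex simply replaces every run of consecutive whitespace characters with a single space. For example, if it finds 5 consecutive space characters, it replaces them with one space. The regex definition of a "space character" can be broad: it includes tabs, newlines, and so on. So a space followed by a tab, then a newline, then another space would also be replaced by a single space. Normally newlines would be effectively ignored, but the 'm' flag in your preg_replace changes that behavior. The PHP documentation on preg_replace has more detail on this. –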
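For comparison, \s matches newlines and tabs with or without the m flag; m only changes what ^ and $ match, as the first answer also notes. A quick sketch on a made-up string:

```php
<?php
$text = "Rerum Sed\n\t  8613";

// \s matches "\n" and "\t" whether or not /m is set.
$withoutM = preg_replace('/\s+/', ' ', $text);
$withM    = preg_replace('/\s+/m', ' ', $text);

var_dump($withoutM === $withM);   // bool(true)
var_dump($withoutM);              // string(14) "Rerum Sed 8613"

// Where /m does matter: ^ and $ also match at line starts/ends.
var_dump(preg_match_all('/^\w+/', "a\nb"));    // int(1)
var_dump(preg_match_all('/^\w+/m', "a\nb"));   // int(2)
```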

+1

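My best guess as to why it matters is the 'u' flag on your preg_replace. 'u' enables a Unicode mode that may use a more liberal definition of whitespace characters. With no extra arguments, the trim function only strips a small set of characters. If you have some non-standard Unicode whitespace characters, trim ignores them and leaves your string unchanged. But preg_replace with the 'u' flag may convert them into regular spaces, which trim can then remove. If you take the 'u' flag out and it stops working, that's probably what was happening. –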
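If you would rather trim Unicode whitespace directly instead of collapsing it first, one possible approach is a regex-based trim. The helper name unicode_trim() below is hypothetical (it is not a PHP built-in), and the sketch assumes UTF-8 input:

```php
<?php
// Hypothetical helper, not a PHP built-in: strips leading and trailing
// whitespace as \s defines it in /u mode, which includes U+00A0 NO-BREAK SPACE.
function unicode_trim(string $s): string
{
    // preg_replace() returns null on failure (e.g. invalid UTF-8); fall back to the input.
    return preg_replace('/^\s+|\s+$/u', '', $s) ?? $s;
}

$value = "\xC2\xA0 \xC2\xA0ad Quisque Modeste\xC2\xA0";

var_dump(trim($value) === $value);   // bool(true): plain trim() leaves it untouched
var_dump(unicode_trim($value));      // string(18) "ad Quisque Modeste"
```

Unlike a byte mask such as chr(0xC2).chr(0xA0), this works on whole characters, so it cannot split a multi-byte character at the edge of the string.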