2014-04-23

Perl DBD::mysql drops words from a longtext field

I have a WordPress MySQL database that I'm trying to pull some data out of with Perl's DBD::mysql.

If I run this on the command line:

mysql --raw mydb <<EOF
select post_content from wp_posts where ID = 195;
EOF

I get what I expect. Here are the first two sentences:

I guess someone famous enough to have <a 
href="http://en.wikipedia.org/wiki/Hee_Il_Cho" target="_blank">his own 
Wikipedia page</a> is worth anyone's consideration.  I'm not familiar 
with AIMAA, but they appear to have quite a few affiliated school 
(particularly in the UK). 

But if I do this in Perl:

use DBI;

# $dbname, $dbuser and $dbpass are set earlier in the script
my $dsn = "DBI:mysql:database=$dbname";
my $dbh = DBI->connect($dsn, $dbuser, $dbpass, { RaiseError => 1 });

my $sql_page_list = $dbh->prepare("
    SELECT post_title, post_content
    FROM wp_posts
    WHERE post_status = 'publish'
    AND post_type = 'page'
    ORDER BY post_title
");
$sql_page_list->execute();
while (my $prog_row = $sql_page_list->fetchrow_hashref) {
    print $prog_row->{post_content} . "\n";
    # ...
}

I get:

I guess someone famous enough to have <a 
href="http://en.wikipedia.org/wiki/Hee_Il_Cho" target="_blank">his own 
Wikipedia page</a>worth anyone's consideration. not familiar with 
AIMAA, but they appear to have quite a few affiliated school 
(particularly in the UK). 

Here is the same text with the missing words marked:

I guess someone famous enough to have <a 
href="http://en.wikipedia.org/wiki/Hee_Il_Cho" target="_blank">his own 
Wikipedia page</a> **is** worth anyone's consideration. **I'm** not 
familiar with AIMAA, but they appear to have quite a few affiliated 
school (particularly in the UK). 

Any idea what might be causing this? post_content is a longtext column, and the table_collation is utf8_general_ci.

The same pattern runs through the whole text: words go missing. It happens on every post.


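The two queries you are running are different. I'm not saying that's the problem, but troubleshooting is easier if you use exactly the same query in both places. – ThisSuitIsBlackNot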

Answer


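It turned out there were some octal 240 bytes (hex 0xA0, the Latin-1 non-breaking space) embedded in the text, and they were garbling the output. I ran the output through od -c and saw them.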
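od -c prints non-printable bytes as three-digit octal escapes, which is how the 240s showed up. If you want the same check from inside Perl, a rough sketch looks like this. It assumes the content is already in a scalar. The helper name find_odd_bytes and the sample string are made up for illustration. The sketch reports each byte outside printable ASCII, along with its offset and octal value:

```perl
use strict;
use warnings;

# Hypothetical helper: return [offset, octal value] for every character
# outside printable ASCII (newline and tab are allowed), similar to what
# `od -c` shows as \240 and friends.
sub find_odd_bytes {
    my ($text) = @_;
    my @hits;
    while ($text =~ /([^\x20-\x7e\n\t])/g) {
        # pos() points just past the match, so subtract 1 for its offset
        push @hits, [ pos($text) - 1, sprintf('%03o', ord $1) ];
    }
    return @hits;
}

# Made-up sample: "page</a>" followed by a raw 0xA0 (octal 240)
my $sample = "Wikipedia page</a>\xa0is worth";
for my $hit (find_odd_bytes($sample)) {
    printf "offset %d: \\%s\n", @$hit;    # prints: offset 18: \240
}
```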

Removing them is easy:

$content =~ s/\xa0/ /g;
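In the question's fetch loop, the substitution would go right after each row is read. The sketch below wraps it in a hypothetical helper, clean_content, so it can be tried without a database. The sample string is invented. The substitution itself is the one from the answer:

```perl
use strict;
use warnings;

# Hypothetical helper: replace every 0xA0 (octal 240, non-breaking space)
# with an ordinary space, using the substitution from the answer.
sub clean_content {
    my ($content) = @_;
    $content =~ s/\xa0/ /g;
    return $content;
}

# Inside the question's loop this would become:
#   print clean_content($prog_row->{post_content}) . "\n";
my $raw = "Wikipedia page</a>\xa0is worth anyone's consideration.\xa0 I'm not";
print clean_content($raw), "\n";
```

Working on a copy inside a sub leaves the fetched row untouched, which can help if the original bytes are still needed elsewhere.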