2014-02-06 48 views
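Displaying correct PHP JSON output from Unicode data

A friend and I are developing a Python scraper that uses Beautiful Soup 4 to parse a website. We filter parts of the page and print the result from the Python script.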

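The Python script is actually executed by PHP, and we are struggling with the classic encoding problem. By default, Beautiful Soup returns Unicode data, so that is what we forward to the PHP script.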

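What we want to do now is parse that output and encode it as valid JSON. In the process, we do not want Unicode escape sequences (such as \u00f6) in the output, but the equivalent UTF-8 characters.

Part of the output from the PHP script looks like this: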

["{"," \"course_count_grade\": 24,"," \"course_count_pass\": 3,"," \"course_count_pending\": 5,"," \"course_count_total\": 32,"," \"course_credits_grade\": 0.0,"," \"course_credits_pass\": 0.0,"," \"course_list_grade\": ["," {"," \"comment\": \"\\u00a0\","," \"course_id\": \"DM2571\","," \"course_name_sv\": \"Framtidens medier\","," \"credits\": \"\","," \"credits_registered\": \"10.0\","," \"date\": \"2013-12-27\","," \"details\": ["," {"," \"comment\": \"\\u00a0\","," \"credits\": \"\","," \"credits_registered\": \"1.5\","," \"date\": \"2013-12-20\","," \"detail_id\": \"\\u00a0LABA\","," \"detail_name_sv\": \"Laborationer\","," \"grade\": \"P\""," }"," ],"," \"grade\": \"A\""," },"," {"," \"comment\": \"\\u00a0\","," \"course_id\": \"DM2572\","," \"course_name_sv\": \"Teori och metod f\\u00f6r Medieteknik\","," \"credits\": \"\","," \"credits_registered\": \"7.5\","," \"date\": \"2013-12-20\","," \"details\": ["," {"," \"comment\": \"\\u00a0\","," \"credits\": \"\","," \"credits_registered\": \"7.0\","," \"date\": \"2013-12-27\","," \"detail_id\": \"\\u00a0PRO1\","," \"detail_name_sv\": \"Projekt\","," \"grade\": \"A\""," },"," {"," \"comment\": \"\\u00a0\","," \"credits\": \"\","," \"credits_registered\": \"3.0\","," \"date\": \"2013-12-27\","," \"detail_id\": \"\\u00a0LIT1\","," \"detail_name_sv\": \"Litteraturuppgift\","," \"grade\": \"P\""," }"," ],"," \"grade\": \"B\""," },"," {"," \"comment\": \"\\u00a0\", 

I have tried various options of PHP's json_encode() function, such as JSON_UNESCAPED_UNICODE, but to no avail.

Any hints as to what I might be doing wrong?
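One thing worth checking on the Python side: json.dumps() escapes every non-ASCII character as \uXXXX unless ensure_ascii=False is passed. The sketch below assumes Python 3 and that the scraper serializes its result with the standard json module; the helper name to_utf8_json and the sample data are made up for illustration.

```python
import json


def to_utf8_json(data):
    # ensure_ascii=False keeps characters such as "ö" as themselves
    # instead of writing them as \uXXXX escape sequences.
    text = json.dumps(data, ensure_ascii=False, sort_keys=True, indent=4)
    # Encode explicitly so the bytes handed to the caller are UTF-8,
    # whatever the default encoding of stdout happens to be.
    return text.encode("utf-8")


# Hypothetical sample shaped like one entry of the scraped data.
sample = {
    "course_id": "DM2572",
    "course_name_sv": "Teori och metod f\u00f6r Medieteknik",
}

escaped = json.dumps(sample)    # default behaviour: contains \u00f6
payload = to_utf8_json(sample)  # UTF-8 bytes containing the real character
```

In a script, the bytes could then be written with sys.stdout.buffer.write(payload) rather than print, so that the terminal or pipe encoding does not interfere.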

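Update: @Len_D, yes, I execute the Python script from PHP like this: exec($command, $output); Afterwards I take the output and return it. When I try what you suggested, utf8_decode($output);, I get an error saying "utf8_decode() expects parameter 1 to be string, array given". So I then tried utf8_decode(json_encode($output));, which does give me output, but it is the same as before: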

["{"," \"course_count_grade\": 24,"," \"course_count_pass\": 3,"," \"course_count_pending\": 5,"," \"course_count_total\": 32,"," \"course_credits_grade\": 0.0,"," \"course_credits_pass\": 0.0,"," \"course_list_grade\": ["," {"," \"comment\": \"\\u00a0\","," \"course_id\": \"DM2571\","," \"course_name_sv\": \"Framtidens medier\","," \"credits\": \"\","," \"credits_registered\": \"10.0\","," \"date\": \"2013-12-27\","," \"details\": ["," {"," \"comment\": \"\\u00a0\","," \"credits\": \"\","," \"credits_registered\": \"1.5\","," \"date\": \"2013-12-20\","," \"detail_id\": \"\\u00a0LABA\","," \"detail_name_sv\": \"Laborationer\","," \"grade\": \"P\""," }"," ],"," \"grade\": \"A\""," },"," {"," \"comment\": \"\\u00a0\","," \"course_id\": \"DM2572\","," \"course_name_sv\": \"Teori och metod f\\u00f6r Medieteknik\", 
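The shape of this output (a JSON array with one string per line, and \\u00f6 with a doubled backslash) is what you would expect if the lines collected by exec() are JSON-encoded a second time: exec() fills $output with one array element per printed line. The sketch below simulates that in Python 3 only to illustrate the mechanism; the function names and sample data are hypothetical, and the Python functions merely stand in for the PHP calls.

```python
import json


def lines_from_script(data):
    # Stand-in for what exec() collects: one string per printed line
    # of an already JSON-encoded document (with \uXXXX escapes).
    return json.dumps(data, indent=4, sort_keys=True).splitlines()


def encode_lines_directly(lines):
    # Encoding the list of lines wraps the existing JSON in a second layer.
    # Even with ensure_ascii=False the \uXXXX text stays, because at this
    # point it is just a backslash followed by ASCII characters.
    return json.dumps(lines, ensure_ascii=False)


def rejoin_and_reencode(lines):
    # Join the lines back into one document, parse it, and serialize once.
    data = json.loads("\n".join(lines))
    return json.dumps(data, ensure_ascii=False, sort_keys=True)


sample = {
    "comment": "\u00a0",
    "course_name_sv": "Teori och metod f\u00f6r Medieteknik",
}

lines = lines_from_script(sample)
double_encoded = encode_lines_directly(lines)  # same shape as the output above
fixed = rejoin_and_reencode(lines)             # plain JSON with real characters
```

If this is indeed the cause, the PHP counterpart would presumably be to implode() the lines, json_decode() the joined string, and only then call json_encode() with JSON_UNESCAPED_UNICODE (available from PHP 5.4).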

Answers

You can send this header: header('Content-Type: application/json');

Tried that; it made no difference. – Marty

Please be more specific about how I could use it in this case. As it stands, this does not really help. – Marty

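I assume you are capturing the output and storing it in a PHP variable, like $output = '...'. If so, use $output = utf8_decode(...), and the UTF-8 characters will be replaced with ISO-8859-1 characters. Am I understanding your question correctly? –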

See my updated question. – Marty
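On the utf8_decode() suggestion: transcoding from UTF-8 to ISO-8859-1 only changes bytes outside the ASCII range, and a \u00f6 escape sequence is six plain ASCII characters, so it passes through untouched. That would explain why the output looked the same as before. A small Python 3 sketch of the idea follows; utf8_to_latin1 is a made-up helper and only a rough analogue of the PHP function.

```python
import json


def utf8_to_latin1(raw):
    # Rough analogue of transcoding UTF-8 bytes to ISO-8859-1.
    return raw.decode("utf-8").encode("iso-8859-1")


# JSON text with an escape sequence: pure ASCII, nothing to transcode.
escaped = json.dumps("f\u00f6r").encode("utf-8")
unchanged = utf8_to_latin1(escaped)

# The real character, by contrast, is two bytes in UTF-8 and one in ISO-8859-1.
real_char = "f\u00f6r".encode("utf-8")
transcoded = utf8_to_latin1(real_char)
```

Here unchanged is byte-for-byte identical to escaped, while transcoded differs from real_char.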