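Displaying correct PHP JSON output from Unicode data
My friend and I are developing a Python scraper that uses BeautifulSoup 4 to parse a website. We filter out parts of the page and "print" the result from the Python script, which is in fact executed by PHP. However, we are struggling with classic encoding problems. By default, BeautifulSoup returns Unicode data, and that is what we forward to the PHP script.
What we are trying to do now is parse the output and encode it as valid JSON. In the process, we do not want Unicode escape sequences (such as \u00f6) in the output, but their UTF-8 equivalents. Part of the output from the PHP script looks like this: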
["{"," \"course_count_grade\": 24,"," \"course_count_pass\": 3,"," \"course_count_pending\": 5,"," \"course_count_total\": 32,"," \"course_credits_grade\": 0.0,"," \"course_credits_pass\": 0.0,"," \"course_list_grade\": ["," {"," \"comment\": \"\\u00a0\","," \"course_id\": \"DM2571\","," \"course_name_sv\": \"Framtidens medier\","," \"credits\": \"\","," \"credits_registered\": \"10.0\","," \"date\": \"2013-12-27\","," \"details\": ["," {"," \"comment\": \"\\u00a0\","," \"credits\": \"\","," \"credits_registered\": \"1.5\","," \"date\": \"2013-12-20\","," \"detail_id\": \"\\u00a0LABA\","," \"detail_name_sv\": \"Laborationer\","," \"grade\": \"P\""," }"," ],"," \"grade\": \"A\""," },"," {"," \"comment\": \"\\u00a0\","," \"course_id\": \"DM2572\","," \"course_name_sv\": \"Teori och metod f\\u00f6r Medieteknik\","," \"credits\": \"\","," \"credits_registered\": \"7.5\","," \"date\": \"2013-12-20\","," \"details\": ["," {"," \"comment\": \"\\u00a0\","," \"credits\": \"\","," \"credits_registered\": \"7.0\","," \"date\": \"2013-12-27\","," \"detail_id\": \"\\u00a0PRO1\","," \"detail_name_sv\": \"Projekt\","," \"grade\": \"A\""," },"," {"," \"comment\": \"\\u00a0\","," \"credits\": \"\","," \"credits_registered\": \"3.0\","," \"date\": \"2013-12-27\","," \"detail_id\": \"\\u00a0LIT1\","," \"detail_name_sv\": \"Litteraturuppgift\","," \"grade\": \"P\""," }"," ],"," \"grade\": \"B\""," },"," {"," \"comment\": \"\\u00a0\",
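A note on reading the output above: each element of the array is one line of pretty-printed JSON stored as a string, and sequences such as \\u00a0 are ASCII escapes that have been escaped a second time. Escapes of this kind are what Python's json.dumps writes by default (ensure_ascii=True). The scraper's code is not shown, so its use of json.dumps is an assumption. A minimal sketch of the difference, using a hypothetical record modelled on the data above:

```python
import json

# Hypothetical record, modelled on one entry of the output above.
record = {
    "comment": "\u00a0",
    "course_name_sv": "Teori och metod f\u00f6r Medieteknik",
}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape,
# so the resulting text is pure ASCII.
escaped = json.dumps(record)

# ensure_ascii=False keeps the characters themselves in the text.
# Leaving out indent keeps the whole document on a single line.
literal = json.dumps(record, ensure_ascii=False)

# Both texts are valid JSON and decode to the same data.
assert json.loads(escaped) == json.loads(literal) == record
```

If the scraper does call json.dumps, passing ensure_ascii=False would leave the characters unescaped. The resulting text would then have to be written to standard output as UTF-8 bytes.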
I tried different options for PHP's json_encode() function, such as JSON_UNESCAPED_UNICODE, but to no avail.
Any hints as to what I might be doing wrong?
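Update: @Len_D, yes, I execute the Python script like this: exec($command, $output);
Later I encode the output and return it. When I try what you suggested: utf8_decode($output);
I get an error saying "utf8_decode() expects parameter 1 to be string, array given". So I tried this instead: utf8_decode(json_encode($output));
That gives me output, but it is the same as before: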
["{"," \"course_count_grade\": 24,"," \"course_count_pass\": 3,"," \"course_count_pending\": 5,"," \"course_count_total\": 32,"," \"course_credits_grade\": 0.0,"," \"course_credits_pass\": 0.0,"," \"course_list_grade\": ["," {"," \"comment\": \"\\u00a0\","," \"course_id\": \"DM2571\","," \"course_name_sv\": \"Framtidens medier\","," \"credits\": \"\","," \"credits_registered\": \"10.0\","," \"date\": \"2013-12-27\","," \"details\": ["," {"," \"comment\": \"\\u00a0\","," \"credits\": \"\","," \"credits_registered\": \"1.5\","," \"date\": \"2013-12-20\","," \"detail_id\": \"\\u00a0LABA\","," \"detail_name_sv\": \"Laborationer\","," \"grade\": \"P\""," }"," ],"," \"grade\": \"A\""," },"," {"," \"comment\": \"\\u00a0\","," \"course_id\": \"DM2572\","," \"course_name_sv\": \"Teori och metod f\\u00f6r Medieteknik\",
Tried that, it made no difference. – Marty
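One possible explanation for why neither JSON_UNESCAPED_UNICODE nor utf8_decode() changes anything: exec() stores each line the script prints as a separate string in $output, so json_encode($output) encodes an array of text lines rather than the data itself. By that point the \u00a0 sequences are already plain ASCII characters, so there is nothing non-ASCII left to unescape. The sketch below imitates this in Python (the scraper's language) with a hypothetical two-field record; the list named lines stands in for $output.

```python
import json

# Stand-in for what PHP's exec() collects: one string per printed line.
# Each "\\u00a0" here is a backslash followed by "u00a0", i.e. plain ASCII.
lines = [
    "{",
    '    "comment": "\\u00a0",',
    '    "course_name_sv": "Teori och metod f\\u00f6r Medieteknik"',
    "}",
]

# Encoding the list of lines reproduces the shape of the output in the
# question: a JSON array of strings with every backslash escaped again.
double_encoded = json.dumps(lines)

# Joining the lines and decoding once recovers the actual data ...
data = json.loads("\n".join(lines))

# ... which can then be encoded a single time, without \uXXXX escapes.
clean = json.dumps(data, ensure_ascii=False)
```

On the PHP side the corresponding steps would presumably be implode() to join the lines, json_decode() to parse them, and a single json_encode() call with JSON_UNESCAPED_UNICODE. This has not been tested against the original scripts.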