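Python 2.7.2: plistlib with iTunes XML
2013-08-16

I am reading an iTunes-generated XML playlist with plistlib. The XML has a UTF-8 header. When I read the XML with plistlib, I get both unicode strings (e.g. 'Name': u'Don\u2019t You Remember') and byte strings (e.g. 'Name': 'Where Eagles Dare').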

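The standard advice is to decode whatever you read, using the correct encoding, as early as possible and to work with unicode inside the program. However,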

unicode_string.decode('utf8') 

fails (as it should) with

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 3: ordinal not in range(128) 
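
An encode error from a decode call is less surprising once the hidden step is spelled out: in Python 2, calling .decode() on a unicode object first encodes it to a byte string with the default ASCII codec, and that implicit step is what fails. A minimal sketch that performs only that step explicitly (the variable names are illustrative, not from the original code):

```python
# -*- coding: utf-8 -*-
name = u'Don\u2019t You Remember'

try:
    # Python 2 performs this encode implicitly before unicode.decode() can run
    name.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc

print(error.start)  # index of the character that could not be encoded
```

The index reported is that of the curly apostrophe, which matches "position 3" in the traceback above.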

The workaround seems to be:

for name in names:
    if isinstance(name, str):
        name = name.decode('utf8')
    # etc.

Is this the right way to handle the problem? Is there a better way?

I am on Windows 7.

Edit:

Reading the XML:

import plistlib
xml = plistlib.readPlist(fn)  # fn: path to the iTunes XML file
for track in xml['Tracks']:
    info = xml['Tracks'][track]
    info['Name']

In IDLE this produces:

u'Don\u2019t You Remember' 
'Where Eagles Dare' 

Here is the XML file:

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> 
<plist version="1.0"> 
<dict> 
    <key>Major Version</key><integer>1</integer> 
    <key>Minor Version</key><integer>1</integer> 
    <key>Date</key><date>2013-08-14T15:04:27Z</date> 
    <key>Application Version</key><string>10.6.3</string> 
    <key>Features</key><integer>5</integer> 
    <key>Show Content Ratings</key><true/> 
    <key>Music Folder</key><string>file://localhost/C:/Users/rdp/Music/iTunes/iTunes%20Media/</string> 
    <key>Library Persistent ID</key><string>FE28CCACD9A36C34</string> 
    <key>Tracks</key> 
    <dict> 
     <key>1019</key> 
     <dict> 
      <key>Track ID</key><integer>1019</integer> 
      <key>Name</key><string>Where Eagles Dare</string> 
      <key>Artist</key><string>Iron Maiden</string> 
      <key>Album</key><string>Piece Of Mind</string> 
      <key>Genre</key><string>Rock</string> 
      <key>Kind</key><string>MPEG audio file</string> 
      <key>Size</key><integer>7372755</integer> 
      <key>Total Time</key><integer>370128</integer> 
      <key>Track Number</key><integer>1</integer> 
      <key>Year</key><integer>1983</integer> 
      <key>Date Modified</key><date>2009-10-07T21:11:31Z</date> 
      <key>Date Added</key><date>2008-02-07T16:04:15Z</date> 
      <key>Bit Rate</key><integer>153</integer> 
      <key>Sample Rate</key><integer>44100</integer> 
      <key>Play Count</key><integer>4</integer> 
      <key>Play Date</key><integer>3414416760</integer> 
      <key>Play Date UTC</key><date>2012-03-12T21:06:00Z</date> 
      <key>Artwork Count</key><integer>1</integer> 
      <key>Persistent ID</key><string>FE28CCACD9A383E5</string> 
      <key>Track Type</key><string>File</string> 
      <key>Location</key><string>file://localhost/D:/music/Iron%20Maiden/Piece%20Of%20Mind/01%20Where%20Eagles%20Dare.mp3</string> 
      <key>File Folder Count</key><integer>-1</integer> 
      <key>Library Folder Count</key><integer>-1</integer> 
     </dict> 
     <key>11559</key> 
     <dict> 
      <key>Track ID</key><integer>11559</integer> 
      <key>Name</key><string>Don’t You Remember</string> 
      <key>Artist</key><string>Adele</string> 
      <key>Album</key><string>21</string> 
      <key>Genre</key><string>Pop</string> 
      <key>Kind</key><string>MPEG audio file</string> 
      <key>Size</key><integer>6120028</integer> 
      <key>Total Time</key><integer>229511</integer> 
      <key>Track Number</key><integer>4</integer> 
      <key>Track Count</key><integer>11</integer> 
      <key>Year</key><integer>2011</integer> 
      <key>Date Modified</key><date>2012-11-17T10:50:31Z</date> 
      <key>Date Added</key><date>2012-12-19T16:03:46Z</date> 
      <key>Bit Rate</key><integer>199</integer> 
      <key>Sample Rate</key><integer>44100</integer> 
      <key>Artwork Count</key><integer>1</integer> 
      <key>Persistent ID</key><string>7130C888606FB153</string> 
      <key>Track Type</key><string>File</string> 
      <key>Location</key><string>file://localhost/D:/music/Adele/21/04%20-%20Don%E2%80%99t%20You%20Remember.mp3</string> 
      <key>File Folder Count</key><integer>-1</integer> 
      <key>Library Folder Count</key><integer>-1</integer> 
     </dict> 
    </dict> 
    <key>Playlists</key> 
    <array> 
     <dict> 
      <key>Name</key><string>short</string> 
      <key>Playlist ID</key><integer>30888</integer> 
      <key>Playlist Persistent ID</key><string>166746C6572B0005</string> 
      <key>All Items</key><true/> 
      <key>Playlist Items</key> 
      <array> 
       <dict> 
        <key>Track ID</key><integer>11559</integer> 
       </dict> 
       <dict> 
        <key>Track ID</key><integer>1019</integer> 
       </dict> 
      </array> 
     </dict> 
    </array> 
</dict> 
</plist> 

What does the plist file that causes this look like? This should not happen; it should always return the same type.

I have included the XML file and the relevant Python code. – foosion

Actually, 'plistlib' in Python 2 is rather crude and ancient: it tries to encode everything as ASCII, and when that fails it leaves the data as Unicode. Bad, bad module!
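
The rule described in this comment can be imitated in a few lines. The sketch below is not plistlib's source; `ascii_or_unicode` is a hypothetical helper that applies the same "try ASCII, otherwise give up" logic to show how one parser can hand back two different types:

```python
def ascii_or_unicode(text):
    """Return an ASCII byte string when possible, else the unicode text unchanged."""
    try:
        return text.encode('ascii')
    except UnicodeError:
        # Non-ASCII characters: fall back to the original unicode object
        return text

plain = ascii_or_unicode(u'Where Eagles Dare')
curly = ascii_or_unicode(u'Don\u2019t You Remember')
```

Here `plain` comes back as a byte string while `curly` stays unicode, which is the mix seen in the question.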

Answers

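Wow, this is very strange behaviour. I would even say this inconsistency is a bug in the 2.x implementation of plistlib. plistlib in Python 3 always returns unicode strings, which is much better.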
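
For comparison, a small sketch of the Python 3 side, assuming Python 3.4 or later where `plistlib.loads` is available. The inline plist is a cut-down stand-in for the file in the question, not the real export:

```python
import plistlib

# Two tracks only: one pure-ASCII name, one with a curly apostrophe
PLIST = (
    u'<?xml version="1.0" encoding="UTF-8"?>'
    u'<plist version="1.0"><dict>'
    u'<key>1019</key><dict><key>Name</key>'
    u'<string>Where Eagles Dare</string></dict>'
    u'<key>11559</key><dict><key>Name</key>'
    u'<string>Don\u2019t You Remember</string></dict>'
    u'</dict></plist>'
).encode('utf-8')

tracks = plistlib.loads(PLIST)
names = [info['Name'] for info in tracks.values()]
```

Every entry in `names` is a `str` here, regardless of whether it contains non-ASCII characters.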

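But you have to live with it :) So the answer to your question is yes. You should always protect yourself when reading a string from a plist:

def safe_unicode(s):
    if isinstance(s, unicode):
        return s
    return s.decode('utf-8', errors='replace')

value = safe_unicode(info['Name'])

I added errors='replace' in case a string is not UTF-8 encoded. If it cannot be decoded, you will get a bunch of \ufffd characters. If you would rather have an exception raised, use s.decode('utf-8') instead.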
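
The same guard can be applied to the whole structure in one pass instead of field by field. This is only a sketch: `decode_tree` is a hypothetical helper, the sample dictionary is made up to resemble the parsed library, and it is written to run on both Python 2 and Python 3 (where `bytes` is the byte-string type):

```python
def decode_tree(value, encoding='utf-8', errors='replace'):
    """Recursively decode byte strings inside nested dicts and lists."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding, errors)
    if isinstance(value, dict):
        return dict(
            (decode_tree(k, encoding, errors), decode_tree(v, encoding, errors))
            for k, v in value.items()
        )
    if isinstance(value, list):
        return [decode_tree(item, encoding, errors) for item in value]
    return value  # unicode text, numbers, dates etc. pass through untouched

library = {
    b'Tracks': {
        b'1019': {b'Name': b'Where Eagles Dare', b'Year': 1983},
        b'11559': {b'Name': u'Don\u2019t You Remember'},
    }
}
clean = decode_tree(library)
```

After this call every key and string value in `clean` is unicode, so the rest of the program does not need any isinstance checks.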

Update:

When I tried it with ElementTree:

from xml.etree import ElementTree as et 
tree = et.parse('test.plist') 
map(lambda x: x.text, tree.findall('dict/dict/dict')[1].findall('string')) 

this gave me:

[u'Don\u2019t You Remember', 
'Adele', 
'21', 
'Pop', 
'MPEG audio file', 
'7130C888606FB153', 
'File', 
'file://localhost/D:/music/Adele/21/04%20-%20Don%E2%80%99t%20You%20Remember.mp3'] 

So Unicode and byte strings are mixed there as well :-/
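
If ElementTree is used anyway, the same normalisation can be applied to each element's text. A minimal sketch, assuming a tiny inline document in place of test.plist and a hypothetical helper `to_text`:

```python
from xml.etree import ElementTree as et

def to_text(value, encoding='utf-8'):
    """Decode byte strings; leave unicode text as it is."""
    return value.decode(encoding) if isinstance(value, bytes) else value

# UTF-8 bytes without an XML declaration are parsed as UTF-8 by default
DOC = (u'<dict><string>Don\u2019t You Remember</string>'
       u'<string>Adele</string></dict>').encode('utf-8')

root = et.fromstring(DOC)
texts = [to_text(node.text) for node in root.findall('string')]
```

Whatever mix of types the parser returns, `texts` ends up containing unicode strings only.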

I would have thought plistlib had been around long enough for the bugs to be ironed out. Oh well. – foosion

Not many people use it. As you can see in the [hg log](http://hg.python.org/releasing/2.7.4/log/tip/Lib/plistlib.py), it was last touched in 2011 :-/

Would you recommend ElementTree? – foosion