Unicode error in `str.format()`

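I'm trying to run the following script, which scans *.csproj files and checks the project dependencies in a Visual Studio solution, but I get the error below. I've already tried all sorts of codec, encode/decode and u'' combinations, to no avail...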

(The diacritics are intentional; I mean to have them.)

Traceback (most recent call last): 
    File "E:\00 GIT\SolutionDependencies.py", line 44, in <module> 
    references = GetProjectReferences("MiotecGit") 
    File "E:\00 GIT\SolutionDependencies.py", line 40, in GetProjectReferences 
    outputline = u'"{}" -> "{}"'.format(projectName, referenceName) 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 19: ordinal not in range(128)

import glob 
import os 
import fnmatch 
import re 
import subprocess 
import codecs 

gvtemplate = """ 
digraph g { 

rankdir = "LR" 

##### 

} 
""".strip() 

def GetProjectFiles(rootFolder):
    result = []
    for root, dirnames, filenames in os.walk(rootFolder):
        for filename in fnmatch.filter(filenames, "*.csproj"):
            result.append(os.path.join(root, filename))
    return result

def GetProjectName(path):
    result = os.path.splitext(os.path.basename(path))[0]
    return result

def GetProjectReferences(rootFolder):
    result = []
    projectFiles = GetProjectFiles(rootFolder)
    for projectFile in projectFiles:
        projectName = GetProjectName(projectFile)
        with codecs.open(projectFile, 'r', "utf-8") as pfile:
            content = pfile.read()
            references = re.findall("<ProjectReference.*?</ProjectReference>", content, re.DOTALL)
            for reference in references:
                referenceProject = re.search('"([^"]*?)"', reference).group(1)
                referenceName = GetProjectName(referenceProject)
                outputline = u'"{}" -> "{}"'.format(projectName, referenceName)
                result.append(outputline)
    return result

references = GetProjectReferences("MiotecGit") 

output = u"\n".join(references)  # join() takes one iterable, not unpacked arguments

with codecs.open("output.gv", "w", 'utf-8') as outputfile: 
    outputfile.write(gvtemplate.replace("#####", output)) 


graphvizpath = glob.glob(r"C:\Program Files*\Graphviz*\bin\dot.*")[0] 
command = '{} -Gcharset=latin1 -T pdf -o "output.pdf" "output.gv"'.format(graphvizpath) 
subprocess.call(command)


2016-08-17 heltonbiker

You can't parse XML files with regular expressions. Use an XML parser (such as `ElementTree`). – Daniel

Please name your functions in all lowercase, so that the syntax highlighting doesn't format them as class names. –

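@Daniel I'm not parsing, I'm searching. But I get the point, thanks for the advice! If I really end up using the script, as I think I will, it will be worth the extra work. – heltonbiker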
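For anyone who wants to try Daniel's suggestion, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. It assumes the classic (non-SDK) .csproj layout, where elements live in the MSBuild 2003 namespace and each `ProjectReference` carries its path in an `Include` attribute. The function name `project_reference_paths` is made up for illustration and is not part of the original script.

```python
import xml.etree.ElementTree as ET

# Namespace used by classic MSBuild project files (assumption: SDK-style
# projects omit it, in which case this prefix would have to be dropped).
MSBUILD_NS = "{http://schemas.microsoft.com/developer/msbuild/2003}"


def project_reference_paths(csproj_xml):
    """Return the Include path of every ProjectReference element.

    csproj_xml is the raw content of a .csproj file, preferably as bytes
    so that the parser can honour the encoding declared in the file.
    """
    root = ET.fromstring(csproj_xml)
    return [
        element.get("Include")
        for element in root.iter(MSBUILD_NS + "ProjectReference")
    ]
```

Each returned path could then be passed to `GetProjectName` in place of the regex match.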

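When Python 2.x tries to use a byte string in a Unicode context, it automatically tries to decode the byte string into a Unicode string using the ascii codec. The ascii codec is a safe default, but it fails as soon as the byte string contains any byte outside the 0-127 range.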
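The implicit step can be reproduced by hand. The sketch below does explicitly what Python 2 does behind the scenes when a byte string is formatted into a `u''` template. The sample bytes are hypothetical, and cp1252 is only an assumption about the Windows code page in use.

```python
# Hypothetical project name as raw bytes; 0xed is the letter i with an
# acute accent in cp1252.
raw = b"N\xedvel"

try:
    # This is the conversion Python 2 attempts implicitly.
    raw.decode("ascii")
    failed = False
    message = ""
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)

# Decoding with the codec that matches the bytes succeeds.
decoded = raw.decode("cp1252")
```

Here `failed` ends up true and `message` names the same byte, 0xed, as the traceback in the question.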

In a Windows environment, the mbcs codec selects the code page that Windows uses for 8-bit characters. You can decode the strings explicitly.

outputline = u'"{}" -> "{}"'.format(projectName.decode('mbcs'), referenceName.decode('mbcs'))


2016-08-17 21:45:25

Thanks, but now I get the error: `File "C:\Python27\lib\encodings\mbcs.py", line 21, in decode return mbcs_decode(input, errors, True) UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 19: ordinal not in range(128)` – heltonbiker

@heltonbiker That probably means one of the strings is already Unicode. Remove the `.decode` from that string. –

You're right, one of them is, but the other isn't. Thanks again! – heltonbiker
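Since one value turned out to be Unicode and the other a byte string, one way to avoid deciding case by case is to decode only when the value is actually bytes. This is a rough sketch, not the code the asker ended up with. `to_unicode` and `format_edge` are hypothetical helpers, and the default codec `mbcs` exists only on Windows, so the sample call passes cp1252 explicitly.

```python
def to_unicode(value, encoding="mbcs"):
    """Return value as a Unicode string, decoding it only if it is bytes.

    In Python 2 `bytes` is an alias of `str`, so this check catches
    byte strings and lets `unicode` objects through unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def format_edge(project_name, reference_name, encoding="mbcs"):
    """Build one Graphviz edge line from two names of either string type."""
    return u'"{}" -> "{}"'.format(
        to_unicode(project_name, encoding),
        to_unicode(reference_name, encoding),
    )


# Hypothetical names: the first is a byte string, the second already Unicode.
edge = format_edge(b"Medi\xe7\xe3o", u"N\xedvel", "cp1252")
```

In the original script this would replace the `u'"{}" -> "{}"'.format(projectName, referenceName)` call, which fails because `projectName` comes from `os.walk` on a byte-string path while `referenceName` comes from a file decoded as UTF-8.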
