2016-08-17 39 views
0

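Unicode error in `str.format()`

I am trying to run the following script, which scans for *.csproj files and checks the project dependencies in Visual Studio solutions, but I get the error below. I have already tried all sorts of `codecs`, `encode`/`decode`, and `u''` combinations, to no avail.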

(The diacritics are intentional, and I intend to keep them.)

Traceback (most recent call last): 
    File "E:\00 GIT\SolutionDependencies.py", line 44, in <module> 
    references = GetProjectReferences("MiotecGit") 
    File "E:\00 GIT\SolutionDependencies.py", line 40, in GetProjectReferences 
    outputline = u'"{}" -> "{}"'.format(projectName, referenceName) 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 19: ordinal not in range(128) 
import glob 
import os 
import fnmatch 
import re 
import subprocess 
import codecs 

gvtemplate = """ 
digraph g { 

rankdir = "LR" 

##### 

} 
""".strip() 

def GetProjectFiles(rootFolder): 
    result = [] 
    for root, dirnames, filenames in os.walk(rootFolder):
        for filename in fnmatch.filter(filenames, "*.csproj"):
            result.append(os.path.join(root, filename))
    return result 

def GetProjectName(path): 
    result = os.path.splitext(os.path.basename(path))[0] 
    return result 

def GetProjectReferences(rootFolder): 
    result = [] 
    projectFiles = GetProjectFiles(rootFolder) 
    for projectFile in projectFiles:
        projectName = GetProjectName(projectFile)
        with codecs.open(projectFile, 'r', "utf-8") as pfile:
            content = pfile.read()
            references = re.findall("<ProjectReference.*?</ProjectReference>", content, re.DOTALL)
            for reference in references:
                referenceProject = re.search('"([^"]*?)"', reference).group(1)
                referenceName = GetProjectName(referenceProject)
                outputline = u'"{}" -> "{}"'.format(projectName, referenceName)
                result.append(outputline)
    return result 

references = GetProjectReferences("MiotecGit") 

output = u"\n".join(references)

with codecs.open("output.gv", "w", 'utf-8') as outputfile: 
    outputfile.write(gvtemplate.replace("#####", output)) 


graphvizpath = glob.glob(r"C:\Program Files*\Graphviz*\bin\dot.*")[0] 
command = '{} -Gcharset=latin1 -T pdf -o "output.pdf" "output.gv"'.format(graphvizpath) 
subprocess.call(command) 
+2

You should not parse XML files with regular expressions. Use an XML parser (such as `ElementTree`). – Daniel

+0

Please name your functions in all lowercase so that the syntax highlighting does not format them as class names. –

+0

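@Daniel I'm not parsing, I'm searching. But I get the point, thanks for the advice! If I really end up using the script, as I think I will, it will be worth the extra work. – heltonbiker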
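Daniel's suggestion could look roughly like the sketch below. It is not the asker's code: `SAMPLE_CSPROJ`, the project name `Mídia`, and the helper `project_reference_names` are made up for illustration, and it assumes the project file uses the usual MSBuild 2003 XML namespace. `ntpath` is used so the Windows-style path is split the same way on any platform.

```python
import ntpath
import xml.etree.ElementTree as ET

# Namespace assumed for classic .csproj files.
MSBUILD_NS = "{http://schemas.microsoft.com/developer/msbuild/2003}"

# Hypothetical, trimmed-down project file; \xed is the character "í".
SAMPLE_CSPROJ = u"""
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup>
    <ProjectReference Include="..\\M\xeddia\\M\xeddia.csproj">
      <Name>M\xeddia</Name>
    </ProjectReference>
  </ItemGroup>
</Project>
""".strip()


def project_reference_names(csproj_xml):
    """Return the project names referenced by a .csproj document (bytes)."""
    root = ET.fromstring(csproj_xml)
    names = []
    for reference in root.iter(MSBUILD_NS + "ProjectReference"):
        include = reference.get("Include")
        if include:
            # Strip the folder and the .csproj extension from the path.
            names.append(ntpath.splitext(ntpath.basename(include))[0])
    return names


# Feed the parser UTF-8 bytes so it does the decoding itself.
names = project_reference_names(SAMPLE_CSPROJ.encode("utf-8"))
```

Here `names` holds the referenced project names as text, so no manual decoding of the reference side is needed.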

Answers

1

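When Python 2.x tries to use a byte string in a Unicode context, it automatically tries to `decode` the byte string into a Unicode string using the `ascii` codec. The `ascii` codec is a safe default, but it fails as soon as the byte string contains any byte outside the 0-127 range.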
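To make that concrete, here is a small sketch that performs the same `ascii` decode by hand. The byte string and the `cp1252` code page are assumptions chosen for illustration (byte `0xed` is "í" in that code page); the real names and code page on the asker's machine may differ.

```python
# Hypothetical project name as raw bytes; 0xed is "í" in Windows-1252.
raw_name = b"M\xeddia"

# This is the decode that Python 2 attempts implicitly inside
# u'...'.format(raw_name); done explicitly, it fails the same way.
try:
    raw_name.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as error:
    ascii_ok = False
    message = str(error)

# Decoding with a codec that actually matches the bytes succeeds.
decoded = raw_name.decode("cp1252")
line = u'"{}" -> "{}"'.format(decoded, u"Core")
```

After this runs, `message` contains the same kind of "'ascii' codec can't decode byte 0xed" text as the traceback in the question.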

In a Windows environment, the `mbcs` codec selects the code page that Windows uses for 8-bit characters. You can decode the strings explicitly.

outputline = u'"{}" -> "{}"'.format(projectName.decode('mbcs'), referenceName.decode('mbcs')) 
+0

Thanks, but now I get the error: `File "C:\Python27\lib\encodings\mbcs.py", line 21, in decode return mbcs_decode(input, errors, True) UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 19: ordinal not in range(128)` – heltonbiker

+0

@heltonbiker That probably means one of the strings is already Unicode. Remove the `.decode` from that string. –

+0

You are right, one of them was, but the other wasn't. Thanks again! – heltonbiker
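Since one value turned out to be a byte string and the other already Unicode, one way to avoid tracking which is which is a small helper that decodes only byte strings. In Python 2, calling `.decode` on a value that is already Unicode first encodes it with `ascii`, which is where the `UnicodeEncodeError` came from. This is a sketch, not code from the thread: `to_text` and the sample names are hypothetical, and `cp1252` stands in for the `mbcs` codec so the example also runs outside Windows.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


project_name = b"M\xeddia"       # byte string, e.g. derived from a file path
reference_name = u"N\xfacleo"    # already text, e.g. read through codecs.open

# On Windows, "mbcs" could be passed instead of "cp1252" (assumption).
line = u'"{}" -> "{}"'.format(
    to_text(project_name, "cp1252"),
    to_text(reference_name, "cp1252"),
)
```

Both arguments reach `format` as text, so neither an implicit decode nor an implicit encode is triggered.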