Python non-ASCII characters

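I have a Python script that creates and populates a table in MS SQL. The only problem is that the code breaks if there are any non-ASCII characters or single apostrophes (and there are quite a few of each). Although I could run a replace function to strip the apostrophes from the strings, I would prefer to keep them intact. I have also tried converting the data to UTF-8, with no luck either.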

Below are the error messages I get:

"'ascii' codec can't encode character u'\u2013' in position..." (for non-ASCII characters)

and for single quotes:

<class 'pyodbc.ProgrammingError'>: ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server] Incorrect syntax near 'S, 230 X 90M.; Eligibilty....

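When I try to encode the string as UTF-8, I get the following error message instead:

<type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode byte 0xe2 in position 219: ordinal not in range(128)

The Python code is included below. I believe the point where the code breaks is after the following line: InsertValue = str(row.GetValue(CurrentField['Name'])).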

# -*- coding: utf-8 -*- 

import pyodbc 
import sys 
import arcpy 
import arcgisscripting 

gp = arcgisscripting.create(9.3) 
SQL_KEYWORDS = ['PERCENT', 'SELECT', 'INSERT', 'DROP', 'TABLE'] 

#SourceFGDB = '###' 
#SourceTable = '###' 
SourceTable = sys.argv[1] 
TempInputName = sys.argv[2] 
SourceTable2 = sys.argv[3] 
#--------------------------------------------------------------------------------------------------------------------- 
# Target Database Settings 
#--------------------------------------------------------------------------------------------------------------------- 
TargetDatabaseDriver = "{SQL Server}" 
TargetDatabaseServer = "###" 
TargetDatabaseName = "###" 
TargetDatabaseUser = "###" 
TargetDatabasePassword = "###" 

# Get schema from FGDB table. 
# This should be an ordered list of dictionary elements [{'FGDB_Name', 'FGDB_Alias', 'FGDB_Type', FGDB_Width, FGDB_Precision, FGDB_Scale}, {}] 

if not gp.Exists(SourceTable): 
    print ('- The source does not exist.') 
    sys.exit(102) 
#### Should see if it is actually a table type. Could be a Feature Data Set or something... 
print('  - Processing Items From : ' + SourceTable) 
FieldList = [] 
Field_List = gp.ListFields(SourceTable) 
print('   - Getting number of rows.') 
result = gp.GetCount_management(SourceTable) 
Number_of_Features = gp.GetCount_management(SourceTable) 
print('    - Number of Rows: ' + str(Number_of_Features)) 
print('   - Getting fields.') 
Field_List1 = gp.ListFields(SourceTable, 'Layer') 
Field_List2 = gp.ListFields(SourceTable, 'Comments') 
Field_List3 = gp.ListFields(SourceTable, 'Category') 
Field_List4 = gp.ListFields(SourceTable, 'State') 
Field_List5 = gp.ListFields(SourceTable, 'Label') 
Field_List6 = gp.ListFields(SourceTable, 'DateUpdate') 
Field_List7 = gp.ListFields(SourceTable, 'OBJECTID') 
for Current_Field in Field_List1 + Field_List2 + Field_List3 + Field_List4 + Field_List5 + Field_List6 + Field_List7: 
     print('   - Field Found: ' + Current_Field.Name) 
     if Current_Field.AliasName in SQL_KEYWORDS: 
      Target_Name = Current_Field.Name + '_' 
     else: 
      Target_Name = Current_Field.Name 

     print('     - Alias : ' + Current_Field.AliasName) 
     print('     - Type  : ' + Current_Field.Type) 
     print('     - Length : ' + str(Current_Field.Length)) 
     print('     - Scale : ' + str(Current_Field.Scale)) 
     print('     - Precision: ' + str(Current_Field.Precision)) 
     FieldList.append({'Name': Current_Field.Name, 'AliasName': Current_Field.AliasName, 'Type': Current_Field.Type, 'Length': Current_Field.Length, 'Scale': Current_Field.Scale, 'Precision': Current_Field.Precision, 'Unique': 'UNIQUE', 'Target_Name': Target_Name}) 
# Create table in SQL Server based on FGDB table schema. 
cnxn = pyodbc.connect(r'DRIVER={SQL Server};SERVER=###;DATABASE=###;UID=sql_webenvas;PWD=###') 
cursor = cnxn.cursor()
#### DROP the table first? 
try: 
    DropTableSQL = 'DROP TABLE dbo.' + TempInputName + '_Test;' 
    print DropTableSQL 
    cursor.execute(DropTableSQL) 
    cnxn.commit()
except: 
    print('WARNING: Can not drop table - may not exist: ' + TempInputName + '_Test') 
CreateTableSQL = ('CREATE TABLE ' + TempInputName + '_Test ' 
' (Layer varchar(500), Comments varchar(5000), State int, Label varchar(500), DateUpdate DATETIME, Category varchar(50), OBJECTID int)') 
cursor.execute(CreateTableSQL) 
cnxn.commit() 
# Cursor through each row in the FGDB table, get values, and insert into the SQL Server Table. 
# We got Number_of_Features earlier, just use that. 
Number_Processed = 0 
print('  - Processing ' + str(Number_of_Features) + ' features.') 
rows = gp.SearchCursor(SourceTable) 
row = rows.Next() 
while row: 
    if Number_Processed % 10000 == 0: 
     print('   - Processed ' + str(Number_Processed) + ' of ' + str(Number_of_Features)) 
    InsertSQLFields = 'INSERT INTO ' + TempInputName + '_Test (' 
    InsertSQLValues = 'VALUES (' 
    for CurrentField in FieldList: 
     InsertSQLFields = InsertSQLFields + CurrentField['Target_Name'] + ', ' 
     InsertValue = str(row.GetValue(CurrentField['Name'])) 
     if InsertValue in ['None']: 
      InsertValue = 'NULL' 
     # Use an escape quote for the SQL. 
     InsertValue = InsertValue.replace("'","' '") 
     if CurrentField['Type'].upper() in ['STRING', 'CHAR', 'TEXT']: 
      if InsertValue == 'NULL': 
       InsertSQLValues = InsertSQLValues + "NULL, " 
      else: 
       InsertSQLValues = InsertSQLValues + "'" + InsertValue + "', " 
     elif CurrentField['Type'].upper() in ['GEOMETRY']: 
      ## We're not handling geometry transfers at this time. 
      if InsertValue == 'NULL': 
       InsertSQLValues = InsertSQLValues + '0' + ', ' 
      else: 
       InsertSQLValues = InsertSQLValues + '1' + ', ' 
     else: 
      InsertSQLValues = InsertSQLValues + InsertValue + ', ' 
    InsertSQLFields = InsertSQLFields[:-2] + ')' 
    InsertSQLValues = InsertSQLValues[:-2] + ')' 
    InsertSQL = InsertSQLFields + ' ' + InsertSQLValues 
    ## print InsertSQL 
    cursor.execute(InsertSQL) 
    cnxn.commit() 
    Number_Processed = Number_Processed + 1 
    row = rows.Next() 
print('   - Processed all ' + str(Number_Processed)) 
del row 
del rows

2011-09-28 James D.

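How does it break? Where? – Dave

It usually breaks here: InsertValue = str(row.GetValue(CurrentField['Name'])). It populates the SQL table it creates until it hits a non-ASCII character or a single apostrophe, and then it errors out there.

And what exception do you get? Can you edit your question to add it? – Dave

I'll use my psychic debugging skills and say that you are trying to str()-ify something and getting an error from the ASCII codec. What you really should do is use the UTF-8 codec instead, like this: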

insert_value_uni = unicode(row.GetValue(CurrentField['Name'])) 
InsertValue = insert_value_uni.encode('utf-8')
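
To make the difference between the two codecs concrete, here is a minimal sketch using the en dash (U+2013) from the error message. It is written for Python 3, where str is already Unicode text, so it is an analogy to the Python 2 unicode() call above rather than the asker's exact code. The helper name encode_value and the sample value are made up for illustration.

```python
# Sketch (Python 3): encoding text with the ASCII codec vs. UTF-8.
# `encode_value` is a hypothetical helper, not part of pyodbc or arcpy.


def encode_value(value, encoding="utf-8"):
    """Convert any value to text first, then encode it explicitly."""
    text = str(value)  # in Python 2 this would be unicode(value)
    return text.encode(encoding)


sample = u"230 \u2013 90M"  # made-up value containing an en dash, U+2013

try:
    encode_value(sample, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # The ASCII codec only covers code points 0-127.
    ascii_ok = False

# UTF-8 can represent the en dash; it becomes the bytes e2 80 93.
utf8_bytes = encode_value(sample)
```

The first byte of the encoded en dash is 0xe2, the same byte that shows up in the decode error quoted in the question.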

2011-09-28 21:56:28 Dave

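I got a different error when I tried encoding with UTF-8: 'ascii' codec can't decode byte 0xe2 in position 219: ordinal not in range(128)

@JamesD, could you put the entire traceback in your question? Make sure to indent it so the formatting is preserved. – Dave

Will do. Thanks!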

Or you could settle for ASCII only and use the awesomely named Unicode Hammer.

2011-09-28 23:37:03 billinkc

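James, I believe the real problem is that you are not using Unicode throughout. Try the following:

Make sure the input file you use to populate the database is UTF-8 and that you are reading it with a UTF-8 decoder.
Make sure your database actually stores the data as Unicode.
Whenever you retrieve data from a file or the database, or manipulate strings (for example with the + operator), you need to make sure every piece is proper Unicode. You can't use str(). You need to use unicode(), as Dave pointed out. If you define strings in your code, write u'my string' rather than 'my string' (otherwise it will not be treated as unicode).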
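
The decode error in the question is consistent with this advice: once a value has been encoded to UTF-8 bytes, Python 2 falls back to the ASCII codec whenever those bytes are combined with a unicode string or encoded again, and the first byte of an encoded en dash is 0xe2. Below is a small Python 3 sketch of the same mechanism with the implicit decode made explicit. The sample value is made up.

```python
# Sketch (Python 3): why mixing already-encoded bytes with text fails.
encoded = u"S \u2013 230".encode("utf-8")  # bytes, as produced by .encode('utf-8')

try:
    # Python 2 performs this ASCII decode implicitly when a byte string
    # meets a unicode string; here it is spelled out.
    encoded.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the codec that was actually used gives text back,
# which can then be safely combined with other text.
label = u"Comments: " + encoded.decode("utf-8")
```

Keeping every piece as text until the final output step means this kind of implicit conversion never has a chance to run.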

Also, please provide the full stack trace and the name of the exception.

2011-09-28 23:44:48 Michael

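In general, you want to convert data to unicode on input and convert it to the required encoding on output.

Your problem will be easier to track down if you work this way. That means changing all strings to unicode, so 'INSERT INTO' becomes u'INSERT INTO' (note the "u" in front of the string). Then, when you send the string off to be executed, convert it to the required encoding, "utf8":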

cursor.execute(InsertSQL.encode("utf8")) # Where InsertSQL is unicode
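
As an alternative to encoding the finished SQL string, DB-API drivers such as pyodbc also accept the values separately as query parameters using ? placeholders, which sidesteps the quote escaping as well. The sketch below uses the standard-library sqlite3 module as a stand-in for SQL Server so that it can run anywhere. The table name Demo_Test and the sample row are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server target.
cnxn = sqlite3.connect(":memory:")
cursor = cnxn.cursor()
cursor.execute("CREATE TABLE Demo_Test (Label TEXT, Comments TEXT, State INTEGER)")

# Values with an apostrophe and an en dash, passed as parameters
# instead of being concatenated into the SQL text.
row_values = (u"O'Brien", u"230 \u2013 90M", None)  # None is stored as NULL
cursor.execute(
    "INSERT INTO Demo_Test (Label, Comments, State) VALUES (?, ?, ?)",
    row_values,
)
cnxn.commit()

stored = cursor.execute("SELECT Label, Comments, State FROM Demo_Test").fetchone()
```

With pyodbc the same cursor.execute(sql, params) shape applies. Whether the non-ASCII characters survive the round trip also depends on the column type on the server (for example nvarchar rather than varchar), which should be checked against the actual database.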

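Also, you should add an encoding declaration to the top of your source code. That means adding the encoding cookie as one of the first two lines of the file:

#!/usr/bin/python
# -*- coding: <encoding name> -*-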

If you build your strings from a file, you can load the data with codecs.open, which converts from a specific encoding to unicode automatically.
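
A minimal sketch of reading a file that way is shown below. The file name labels.txt and its contents are made up, and the file is written first only so that the example is self-contained.

```python
import codecs
import os
import tempfile

# Write a small UTF-8 file to read back (stands in for a real input file).
path = os.path.join(tempfile.mkdtemp(), "labels.txt")
with codecs.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"S \u2013 230 X 90M\n")

# codecs.open decodes while reading, so each line is already Unicode text.
with codecs.open(path, "r", encoding="utf-8") as handle:
    lines = [line.rstrip(u"\n") for line in handle]
```

In Python 3 the built-in open(path, encoding="utf-8") does the same job, and io.open offers the same interface on Python 2.6 and later.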

2011-09-29 00:26:06 monkut

Converting my str() calls to unicode() solved the problem. A simple answer, and I appreciate everyone's help with this.

2011-09-29 16:02:15
