2011-09-28 194 views
2

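Python non-ASCII characters

I have a Python script that creates and populates a table in MS SQL. The only problem is that the code breaks whenever the data contains non-ASCII characters or single apostrophes (and there are quite a few of each). I could run a replace function to strip the apostrophes out of the strings, but I would prefer to keep them intact. I have also tried converting the data to UTF-8, with no luck.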

Here are the error messages I get:

"'ascii' codec can't encode character u'\u2013' in position..." (for non-ASCII characters)

and for single quotes:

<class 'pyodbc.ProgrammingError'>: ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server] Incorrect syntax near 'S, 230 X 90M.; Eligibilty....

When I try to encode the string as UTF-8, I get the following error message instead:

<type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode byte 0xe2 in position 219: ordinal not in range(128)

The Python code is included below. I believe the break happens right after this line: InsertValue = str(row.GetValue(CurrentField['Name'])).

# -*- coding: utf-8 -*- 

import pyodbc 
import sys 
import arcpy 
import arcgisscripting 

gp = arcgisscripting.create(9.3) 
SQL_KEYWORDS = ['PERCENT', 'SELECT', 'INSERT', 'DROP', 'TABLE'] 

#SourceFGDB = '###' 
#SourceTable = '###' 
SourceTable = sys.argv[1] 
TempInputName = sys.argv[2] 
SourceTable2 = sys.argv[3] 
#--------------------------------------------------------------------------------------------------------------------- 
# Target Database Settings 
#--------------------------------------------------------------------------------------------------------------------- 
TargetDatabaseDriver = "{SQL Server}" 
TargetDatabaseServer = "###" 
TargetDatabaseName = "###" 
TargetDatabaseUser = "###" 
TargetDatabasePassword = "###" 

# Get schema from FGDB table. 
# This should be an ordered list of dictionary elements [{'FGDB_Name', 'FGDB_Alias', 'FGDB_Type', FGDB_Width, FGDB_Precision, FGDB_Scale}, {}] 

if not gp.Exists(SourceTable): 
    print ('- The source does not exist.') 
    sys.exit(102) 
#### Should see if it is actually a table type. Could be a Feature Data Set or something... 
print('  - Processing Items From : ' + SourceTable) 
FieldList = [] 
Field_List = gp.ListFields(SourceTable) 
print('   - Getting number of rows.') 
result = gp.GetCount_management(SourceTable) 
Number_of_Features = gp.GetCount_management(SourceTable) 
print('    - Number of Rows: ' + str(Number_of_Features)) 
print('   - Getting fields.') 
Field_List1 = gp.ListFields(SourceTable, 'Layer') 
Field_List2 = gp.ListFields(SourceTable, 'Comments') 
Field_List3 = gp.ListFields(SourceTable, 'Category') 
Field_List4 = gp.ListFields(SourceTable, 'State') 
Field_List5 = gp.ListFields(SourceTable, 'Label') 
Field_List6 = gp.ListFields(SourceTable, 'DateUpdate') 
Field_List7 = gp.ListFields(SourceTable, 'OBJECTID') 
for Current_Field in Field_List1 + Field_List2 + Field_List3 + Field_List4 + Field_List5 + Field_List6 + Field_List7: 
     print('   - Field Found: ' + Current_Field.Name) 
     if Current_Field.AliasName in SQL_KEYWORDS: 
      Target_Name = Current_Field.Name + '_' 
     else: 
      Target_Name = Current_Field.Name 

     print('     - Alias : ' + Current_Field.AliasName) 
     print('     - Type  : ' + Current_Field.Type) 
     print('     - Length : ' + str(Current_Field.Length)) 
     print('     - Scale : ' + str(Current_Field.Scale)) 
     print('     - Precision: ' + str(Current_Field.Precision)) 
     FieldList.append({'Name': Current_Field.Name, 'AliasName': Current_Field.AliasName, 'Type': Current_Field.Type, 'Length': Current_Field.Length, 'Scale': Current_Field.Scale, 'Precision': Current_Field.Precision, 'Unique': 'UNIQUE', 'Target_Name': Target_Name}) 
# Create table in SQL Server based on FGDB table schema. 
cnxn = pyodbc.connect(r'DRIVER={SQL Server};SERVER=###;DATABASE=###;UID=sql_webenvas;PWD=###') 
cursor = cnxn.cursor()
#### DROP the table first? 
try: 
    DropTableSQL = 'DROP TABLE dbo.' + TempInputName + '_Test;' 
    print DropTableSQL 
    cursor.execute(DropTableSQL) 
    cnxn.commit()  # was dbconnection.commit(); that name is never defined in this script
except: 
    print('WARNING: Can not drop table - may not exist: ' + TempInputName + '_Test') 
CreateTableSQL = ('CREATE TABLE ' + TempInputName + '_Test ' 
' (Layer varchar(500), Comments varchar(5000), State int, Label varchar(500), DateUpdate DATETIME, Category varchar(50), OBJECTID int)') 
cursor.execute(CreateTableSQL) 
cnxn.commit() 
# Cursor through each row in the FGDB table, get values, and insert into the SQL Server Table. 
# We got Number_of_Features earlier, just use that. 
Number_Processed = 0 
print('  - Processing ' + str(Number_of_Features) + ' features.') 
rows = gp.SearchCursor(SourceTable) 
row = rows.Next() 
while row: 
    if Number_Processed % 10000 == 0: 
     print('   - Processed ' + str(Number_Processed) + ' of ' + str(Number_of_Features)) 
    InsertSQLFields = 'INSERT INTO ' + TempInputName + '_Test (' 
    InsertSQLValues = 'VALUES (' 
    for CurrentField in FieldList: 
     InsertSQLFields = InsertSQLFields + CurrentField['Target_Name'] + ', ' 
     InsertValue = str(row.GetValue(CurrentField['Name'])) 
     if InsertValue in ['None']: 
      InsertValue = 'NULL' 
     # Use an escape quote for the SQL. 
     InsertValue = InsertValue.replace("'","' '") 
     if CurrentField['Type'].upper() in ['STRING', 'CHAR', 'TEXT']: 
      if InsertValue == 'NULL': 
       InsertSQLValues = InsertSQLValues + "NULL, " 
      else: 
       InsertSQLValues = InsertSQLValues + "'" + InsertValue + "', " 
     elif CurrentField['Type'].upper() in ['GEOMETRY']: 
      ## We're not handling geometry transfers at this time. 
      if InsertValue == 'NULL': 
       InsertSQLValues = InsertSQLValues + '0' + ', ' 
      else: 
       InsertSQLValues = InsertSQLValues + '1' + ', ' 
     else: 
      InsertSQLValues = InsertSQLValues + InsertValue + ', ' 
    InsertSQLFields = InsertSQLFields[:-2] + ')' 
    InsertSQLValues = InsertSQLValues[:-2] + ')' 
    InsertSQL = InsertSQLFields + ' ' + InsertSQLValues 
    ## print InsertSQL 
    cursor.execute(InsertSQL) 
    cnxn.commit() 
    Number_Processed = Number_Processed + 1 
    row = rows.Next() 
print('   - Processed all ' + str(Number_Processed)) 
del row 
del rows 
+0

How does it break? Where? – Dave

+0

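It usually breaks here: InsertValue = str(row.GetValue(CurrentField['Name'])). It populates the SQL table it creates until it hits a non-ASCII character or a single apostrophe, and then it errors out at that point. –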

+0

And what exception do you get? Could you edit your question to add it? – Dave

Answers

1

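I'll use my psychic debugging skills and say that you are trying to str()-ify something and getting an error from the ASCII codec. What you should really do is use the UTF-8 codec instead, like this: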

insert_value_uni = unicode(row.GetValue(CurrentField['Name'])) 
InsertValue = insert_value_uni.encode('utf-8') 
+0

When I tried encoding as UTF-8 I ran into a different error: 'ascii' codec can't decode byte 0xe2 in position 219: ordinal not in range(128) –

+0

@JamesD, could you put the whole traceback in your question? Make sure to indent it so the formatting is preserved. – Dave

+0

Will do. Thanks! –

3

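James, I believe the real problem is that you are not using Unicode consistently. Try the following:

  • Make sure the input file you use to populate the database is UTF-8, and that you read it with the UTF-8 codec.
  • Make sure your database is actually storing the data as Unicode.
  • Whenever you retrieve data from a file or the database, or manipulate strings (for example with the + operator), make sure every piece is a proper Unicode string. You can't use str() for this; you need unicode(), as Dave pointed out. If you define string literals in your code, write u'my string' rather than 'my string' (otherwise the literal is not treated as Unicode).

Also, please provide the full stack trace and the name of the exception.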
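The advice above boils down to "decode once on the way in, work only with Unicode, encode once on the way out". Below is a minimal sketch of that pattern. The sample bytes and the variable names are made up for illustration, and the script is written so that it runs on both Python 2 and Python 3. The two `try` blocks reproduce the two exception types from the question by calling the ASCII codec explicitly, which is what Python 2 does implicitly in `str(some_unicode)` and in `some_byte_string.encode('utf-8')`.

```python
# -*- coding: utf-8 -*-

# Hypothetical input: UTF-8 bytes such as a file or a driver might hand back.
raw_bytes = b"OWNER'S LOT \xe2\x80\x93 230 X 90M"

# 1. Decode once, at the input boundary.
text = raw_bytes.decode('utf-8')

# 2. Combine the value only with other Unicode strings.
statement = u"Comments: " + text

# 3. Encode once, at the output boundary.
encoded = statement.encode('utf-8')

# Reproduce the two exception types from the question explicitly.
errors = []
try:
    text.encode('ascii')        # equivalent to Python 2's str() on this value
except UnicodeEncodeError as exc:
    errors.append(type(exc).__name__)
try:
    raw_bytes.decode('ascii')   # Python 2's implicit step before bytes.encode()
except UnicodeDecodeError as exc:
    errors.append(type(exc).__name__)

print(errors)
```

The en dash (U+2013) is the same character named in the first error message, and its UTF-8 encoding starts with byte 0xe2, the byte named in the second one.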

0

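In general, you want to convert data to unicode on input and convert it to the required encoding on output.

Your problem will be easier to track down if you work that way. That means changing all string literals to unicode, for example 'INSERT INTO' to u'INSERT INTO' (note the "u" in front of the string). Then, when you send the string off to be executed, convert it to the required encoding, "utf8".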

cursor.execute(InsertSQL.encode("utf8")) # Where InsertSQL is unicode 

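You should also add an encoding declaration to the top of your source file. That means putting the encoding cookie on one of the first two lines of the file:

    #!/usr/bin/python
    # -*- coding: <encoding name> -*-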

If you build your strings from a file, you can load the data with codecs.open, which converts from a specific encoding to Unicode automatically.
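As a rough illustration of the `codecs.open` suggestion, the sketch below writes a small UTF-8 file and reads it back as Unicode strings. The helper name `read_unicode_lines` and the sample text are assumptions made for this example, not part of the original script.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile


def read_unicode_lines(path, encoding='utf-8'):
    """Return the lines of a text file as Unicode strings, without line endings."""
    with codecs.open(path, 'r', encoding=encoding) as handle:
        return [line.rstrip(u'\r\n') for line in handle]


# Create a throwaway UTF-8 file containing an apostrophe and an en dash.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with codecs.open(sample_path, 'w', encoding='utf-8') as handle:
    handle.write(u"O'Brien \u2013 230 X 90M\n")

lines = read_unicode_lines(sample_path)
os.remove(sample_path)

print(repr(lines))
```

Because the decoding happens inside `codecs.open`, the rest of the program only ever sees Unicode strings and never has to guess the encoding of a byte string.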

0

The problem was solved when I changed my str() calls to unicode(). A simple answer; thanks to everyone for the help with this.
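The apostrophe half of the question (the "Incorrect syntax near 'S, ..." error) can also be avoided without stripping or hand-escaping quotes, by passing the values as query parameters rather than concatenating them into the SQL text. The sketch below uses the standard-library `sqlite3` module purely as a runnable stand-in for the SQL Server connection; pyodbc accepts the same `?` placeholders in `cursor.execute(sql, params)`. The table name `Demo_Test`, the sample values, and the helper `insert_row` are invented for illustration.

```python
# -*- coding: utf-8 -*-
import sqlite3


def insert_row(cursor, table_name, column_names, values):
    """Insert one row, letting the driver quote and encode the values."""
    # Identifiers cannot be bound as parameters, so the table and column
    # names are still formatted into the statement; only values use '?'.
    placeholders = ', '.join('?' for _ in column_names)
    sql = 'INSERT INTO {0} ({1}) VALUES ({2})'.format(
        table_name, ', '.join(column_names), placeholders)
    cursor.execute(sql, values)


connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE Demo_Test (Layer TEXT, Comments TEXT, State INTEGER)')

# An apostrophe, an en dash, and a NULL, with no manual escaping.
row_values = (u'Parcels', u"OWNER'S LOT \u2013 230 X 90M", None)
insert_row(cursor, 'Demo_Test', ['Layer', 'Comments', 'State'], row_values)
connection.commit()

cursor.execute('SELECT Layer, Comments, State FROM Demo_Test')
stored = cursor.fetchone()
connection.close()

print(repr(stored))
```

With this approach `None` is sent as NULL directly, so the special-casing of the string 'None' in the original loop would no longer be needed.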