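"list index out of range" in Python

I have Python code that indexes a text file containing Arabic text. I tested the code on English text and it works well, but when I test it on Arabic text it gives me an error. Note: the text file is saved in Unicode encoding, not ANSI encoding.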

Here is my code:

from whoosh import fields, index
import os.path
import csv
import codecs
from whoosh.qparser import QueryParser

# This list associates a name with each position in a row
columns = ["juza", "chapter", "verse", "voc"]

schema = fields.Schema(juza=fields.NUMERIC,
                       chapter=fields.NUMERIC,
                       verse=fields.NUMERIC,
                       voc=fields.TEXT)

# Create the Whoosh index
indexname = "indexdir"
if not os.path.exists(indexname):
    os.mkdir(indexname)
ix = index.create_in(indexname, schema)

# Open a writer for the index
with ix.writer() as writer:
    with open("h.txt", 'r') as txtfile:
        lines = txtfile.readlines()

    # Read each row in the file
    for i in lines:

        # Create a dictionary to hold the document values for this row
        doc = {}
        thisline = i.split()
        u = 0

        # Read the values for the row enumerated like
        # (0, "juza"), (1, "chapter"), etc.
        for w in thisline:
            # Get the field name from the "columns" list
            fieldname = columns[u]
            u += 1
            #if isinstance(w, basestring):
            #  w = unicode(w)
            doc[fieldname] = w
        # Pass the dictionary to the add_document method
        writer.add_document(**doc)

with ix.searcher() as searcher:
    query = QueryParser("voc", ix.schema).parse(u"بسم")
    results = searcher.search(query)
    print(len(results))
    print(results[1])

And the error is:

Traceback (most recent call last):
  File "C:\Python27\yarab.py", line 38, in <module>
    fieldname = columns[u]
IndexError: list index out of range

Here is a sample of the file:

1 1 1 كتاب 
1 1 2 قرأ 
1 1 3 لعب 
1 1 4 كتاب


2013-02-21 user2091683

Have you printed the result of `thisline = i.split()`? It undoubtedly has more than 4 items. – StoryTeller 2013-02-21 16:10:10
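Printing `repr()` of each split line is the quickest way to see where a fifth item comes from. The sketch below rests on an assumption the question does not confirm: that "saved in Unicode, not ANSI" refers to the Windows Notepad encoding options, where "Unicode" means UTF-16 LE with a byte-order mark. Python 2's `open(..., 'r')` returns undecoded bytes, so `readlines()` cuts at each `\n` byte and `split()` keeps the `\x00` byte that sits between `\r` and `\n` as an extra item. The same situation is simulated here in Python 3 with `io.BytesIO`:

```python
import io

# Two lines from the sample, encoded as UTF-16 LE with a BOM
# (assumed to be what "saved as Unicode" means here).
sample = "1 1 1 كتاب\r\n1 1 2 قرأ\r\n"
raw = b"\xff\xfe" + sample.encode("utf-16-le")

# Undecoded bytes, as Python 2's open(..., 'r') would return them:
# readlines() cuts at every b"\n", leaving stray b"\x00" bytes behind.
byte_lines = io.BytesIO(raw).readlines()
for line in byte_lines:
    print(len(line.split()), repr(line.split()))

# Decoded text: every line splits into exactly four fields.
text_lines = raw.decode("utf-16").splitlines()
for line in text_lines:
    print(len(line.split()), repr(line.split()))
```

On these two sample lines the byte version yields 5, 5 and 1 items (the last "line" is the lone trailing `\x00`), while the decoded text yields 4 items per line, which is what `columns` expects.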

For this, it is better to use the Python csv module. Have a look here: [link](http://docs.python.org/2/library/csv.html) – Crazyshezy 2013-10-23 08:53:12
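A minimal sketch of the `csv` suggestion for Python 3, where `open()` takes an `encoding` argument (the Python 2.7 `csv` module linked above reads bytes, so decoding works differently there). It assumes fields separated by spaces and a known file encoding; `utf-16` stands in for Notepad's "Unicode" option, and `read_rows` is a hypothetical helper. The `csv` module handles the splitting, but a length check is still needed because it yields as many fields as a line contains.

```python
import csv
import os
import tempfile

columns = ["juza", "chapter", "verse", "voc"]


def read_rows(path, encoding):
    """Yield one dict per non-blank line, keyed by the names in `columns`."""
    with open(path, encoding=encoding, newline="") as f:
        # Strip each line first so a trailing space does not become an empty field.
        stripped = (line.strip() for line in f)
        # skipinitialspace=True ignores extra spaces that follow a delimiter.
        reader = csv.reader(stripped, delimiter=" ", skipinitialspace=True)
        for row in reader:
            if not row:
                continue  # blank line
            if len(row) != len(columns):
                raise ValueError("expected %d fields, got %r" % (len(columns), row))
            yield dict(zip(columns, row))


# Usage: write a small UTF-16 file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "h.txt")
    with open(path, "w", encoding="utf-16") as f:
        f.write("1 1 1 كتاب\n1 1 2 قرأ\n")
    rows = list(read_rows(path, "utf-16"))

print(rows)
```

Each resulting dict has the shape the question's loop passes to `writer.add_document(**doc)`.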

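While I can't see any obvious error in this, I would make sure you are designing for error. Make sure you catch any case where split() returns more elements than expected and handle it promptly (for example, print it and terminate). It looks like you may be dealing with malformed data.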
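A sketch of that advice in Python 3: check the number of fields before mapping them, and report the line number plus `repr()` of the offending line so invisible characters such as a stray `\x00` become visible. `check_fields` and the malformed sample line are hypothetical; raising `ValueError` instead of printing and calling `sys.exit()` is a design choice that lets the caller decide whether to stop.

```python
columns = ["juza", "chapter", "verse", "voc"]


def check_fields(lines):
    """Map each line onto `columns`, failing loudly on the first malformed line."""
    docs = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            continue  # tolerate blank lines
        if len(tokens) != len(columns):
            # repr() makes hidden characters such as "\x00" visible.
            raise ValueError("line %d: expected %d fields, got %d: %r"
                             % (lineno, len(columns), len(tokens), line))
        docs.append(dict(zip(columns, tokens)))
    return docs


good = check_fields(["1 1 1 كتاب\n", "1 1 2 قرأ\n"])
print(good)

try:
    # Hypothetical malformed line with a stray NUL character at the end.
    check_fields(["1 1 3 لعب\n", "1 1 4 كتاب \x00\n"])
except ValueError as exc:
    print(exc)  # a script could also let this propagate and terminate
```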


2013-02-21 16:35:02 ellimilial

You are missing the encoding header for Unicode in your script. The first line should be:

# -*- coding: utf-8 -*-
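That line is a PEP 263 encoding declaration. It tells the interpreter how to decode the bytes of the script itself, which matters for the non-ASCII literal `u"بسم"` in the question: Python 2 assumes ASCII source and rejects non-ASCII bytes without a declaration, while Python 3 defaults to UTF-8. It does not affect how files opened at run time are decoded. A small Python 3 sketch showing `compile()` honoring the declaration when the source is given as bytes (`query_text` and `<demo>` are illustrative names):

```python
# Script source as bytes, with the encoding declaration on the first line.
source = "# -*- coding: utf-8 -*-\nquery_text = u\"بسم\"\n".encode("utf-8")

namespace = {}
# compile() reads the declaration and decodes the bytes as UTF-8.
exec(compile(source, "<demo>", "exec"), namespace)

print(namespace["query_text"] == "بسم")
```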

Also, open the file with a Unicode encoding:

import codecs 
with codecs.open("s.txt",encoding='utf-8') as txtfile:
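A sketch of this answer's idea in Python 3, where `io.open` is the built-in `open` (Python 2.7 also has `io.open` with the same `encoding` argument). Decoding on open makes `split()` work on text rather than bytes, but the encoding has to match how the file was saved. Here the sample is written as UTF-16, standing in for Notepad's "Unicode" option (an assumption); opening that file as `utf-8`, as in the snippet above, fails on the byte-order mark. `split_lines` is a hypothetical helper.

```python
import io
import os
import tempfile


def split_lines(path, encoding):
    """Decode the file on open; return the whitespace-split fields of each non-blank line."""
    with io.open(path, "r", encoding=encoding) as txtfile:
        return [line.split() for line in txtfile if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "h.txt")
    # "utf-16" writes a BOM followed by UTF-16 in native byte order.
    with io.open(path, "w", encoding="utf-16") as f:
        f.write("1 1 1 كتاب\n1 1 2 قرأ\n")

    rows = split_lines(path, "utf-16")  # matching encoding: 4 fields per line
    print(rows)

    try:
        split_lines(path, "utf-8")  # mismatched encoding
        wrong_encoding_failed = False
    except UnicodeDecodeError as exc:
        wrong_encoding_failed = True
        print("utf-8 decode failed:", exc)
```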


2015-08-24 12:58:54 user3099761
