2013-12-09 27 views

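Optimizing Excel data collection/reduction with xlrd/xlwt and loop iteration

I have just started coding in Python and have a lot to learn. The goal of my code is to pull a string from a cell, check its character length, and replace words with specific abbreviations. I then write the new string to a different Excel sheet and save once all the data has been reduced. I finally figured out how to get it working, but it takes a really long time. I am working with more than 10,000 strings, and my loop iteration is probably far from optimized. Any information you can offer would be great.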

import xlwt 
import xlrd 

book = xlrd.open_workbook() # opens excel file for data input 
reduc = xlwt.Workbook()  # creates the workbook that the reduced data will be saved in 

# Calls the sheets I will be working with 
Data = book.sheet_by_index(3) 
Table = book.sheet_by_index(5) 
sheet1 = reduc.add_sheet("sheet 1") 

# the initial loop pulls the string from excel 

for x in xrange(30): # I use a limited range for debugging 
    From = str(Data.col(15)[x].value) 
    To = str(Data.col(16)[x].value) 
    print x # I just print this to let me know that i'm not stuck 

    if len(From) <= 30 and len(To) <= 30:
        sheet1.write(x, 3, From)
        sheet1.write(x, 4, To)
    else:
        while len(From) > 30 or len(To) > 30:
            for y in xrange(Table.nrows):
                word = str(Table.col(0)[y].value)
                abbrv = str(Table.col(1)[y].value)
                if len(From) > 30:
                    From = From.replace(word, abbrv)
                if len(To) > 30:
                    To = To.replace(word, abbrv)
            sheet1.write(x, 3, From)
            sheet1.write(x, 4, To)
            break

reduc.save("newdoc.xls") 
print " DONE! "

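Below is my updated code. It is almost instantaneous, which is what I expected. I preload all the columns I need and then run them through the same loop system. I then store the reduced data in lists instead of writing it straight to the new Excel file. After all the data has been reduced, I write each cell in a separate for loop. Thanks for the advice, guys.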

import xlwt 
import xlrd 

# Workbook must be located in the Python27 folder in the C:/directory 
book = xlrd.open_workbook() # opens Excel file for data input

# Calls the sheets I will be working with 
Data = book.sheet_by_index(0) 
Table = book.sheet_by_index(1) 

# Import column data from excel 
From = Data.col_values(15) 
To = Data.col_values(16) 
word = Table.col_values(0) 
abbrv = Table.col_values(1) 

# Empty variables to be filled with reduced string 
From_r = [] 
To_r = [] 

# Notes to be added 
for x in xrange(Data.nrows): 
    if len(From[x]) <= 28 and len(To[x]) <= 28:
        From_r.append(From[x])
        To_r.append(To[x])
    else:
        while len(From[x]) > 28 or len(To[x]) > 28:
            for y in xrange(Table.nrows):
                if len(From[x]) > 28:
                    From[x] = From[x].replace(word[y], abbrv[y])
                if len(To[x]) > 28:
                    To[x] = To[x].replace(word[y], abbrv[y])
            From_r.append(From[x])
            To_r.append(To[x])
            break

# Create new excel file to write reduced strings into 
reduc = xlwt.Workbook() 
sheet1 = reduc.add_sheet("sheet 1") 

# Iterate through the lists to write each object into Excel
for i in xrange(Data.nrows): 
    sheet1.write(i, 3, From_r[i]) 
    sheet1.write(i, 4, To_r[i]) 

# Save reduced string in new excel file 
reduc.save("lucky.xls") 
print " DONE! " 
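
The reduction step described above can also be separated from the spreadsheet I/O, which makes it easier to test on its own. The following is a minimal sketch rather than the code from the post: `shorten`, its `limit` parameter, and the sample pairs are hypothetical names and data chosen for illustration. It applies the abbreviations in table order and stops as soon as the string fits.

```python
def shorten(text, pairs, limit=28):
    """Apply (word, abbreviation) pairs in order until text fits in limit.

    Hypothetical helper: `pairs` stands in for the two columns of the
    abbreviation table, `limit` for the maximum allowed length.
    """
    for word, abbrv in pairs:
        if len(text) <= limit:
            break  # already short enough, keep the remaining words readable
        if word:  # an empty word would make str.replace insert abbrv everywhere
            text = text.replace(word, abbrv)
    return text


# Made-up sample data, for illustration only
pairs = [("Building", "Bldg"), ("Street", "St"), ("North", "N")]
short_name = shorten("North Building Main Street Entrance", pairs, limit=28)
```

Because the length is checked before each replacement, words later in the table (here "North") are left untouched once the string is short enough.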

Answers

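The slowness is probably due to the inefficient replacement code. You should try loading in all of the words and their corresponding abbreviations up front, unless the list is so large that you would run out of memory. Then, for even more speed, you can replace all of the words in a single pass.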

To do this, move the loading out of the loop:

words = [str(cell.value) for cell in Table.col(0)] #list comprehension 
abbr = [str(cell.value) for cell in Table.col(1)] 
replacements = zip(words, abbr) 

This function, taken from here, uses the regular expression module to replace every match from a given list of pairs.

import re 
def multiple_replacer(*key_values): 
    replace_dict = dict(key_values) 
    replacement_function = lambda match: replace_dict[match.group(0)] 
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values])) 
    return lambda string: pattern.sub(replacement_function, string) 

To use it, do this:

replaceFunc = multiple_replacer(*replacements) #constructs the function. Do this outside the loop, after the replacements have been gathered. 
myString = replaceFunc(myString) 
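
To try the single-pass replacement without a workbook, here is a self-contained sketch that repeats the function above with made-up pairs. One detail is my own assumption and not part of the original answer: the keys are sorted longest first, so that a longer word such as "Northwest" is tried before a shorter word that is its prefix.

```python
import re


def multiple_replacer(*key_values):
    """Build a function that replaces every key with its value in one pass."""
    replace_dict = dict(key_values)
    # Assumption beyond the original answer: longest keys first, so that
    # "Northwest" is tried before "North" at the same position.
    keys = sorted(replace_dict, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda string: pattern.sub(lambda m: replace_dict[m.group(0)], string)


# Made-up pairs standing in for the two table columns
replacements = [("North", "N"), ("Northwest", "NW"), ("Street", "St")]
replace_func = multiple_replacer(*replacements)  # build once, outside any loop
result = replace_func("Northwest Street, North Gate")
```

The pattern is compiled once, so each string is scanned a single time no matter how many pairs the table holds.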

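Thank you for the input; I will load in all of the words and abbreviations up front. The only problem with replacing every match is that I only want to shorten the strings to a specific length. These strings will be used to build an index for a system, and I want them to stay easy to read while still fitting in a specific box. – user3081146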

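You could use a different approach, such as splitting the string and replacing terms from left to right or from right to left until the total length is short enough, although it may be slower than the regex answer. Personally I think it would be fine, because you still would not have the doubly nested loop that makes this so time-consuming, and tens of thousands of strings really is not that many, but you would have to try it and see. If that sounds like something you want to test, I can add an answer showing the details. –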
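
The split-and-replace idea from this comment might look roughly like the sketch below. It is only one possible reading of the suggestion: `shorten_by_tokens`, the `limit` parameter, and the sample dictionary are hypothetical. It assumes that words are separated by single spaces and that the table has been loaded into a dict once.

```python
def shorten_by_tokens(text, abbreviations, limit=28):
    """Abbreviate whole words left to right until text fits in limit.

    Hypothetical helper: `abbreviations` is a dict built once from the
    table, for example dict(zip(words, abbr)).
    """
    tokens = text.split(" ")
    length = len(text)  # tracked incrementally instead of re-joining each time
    for i, token in enumerate(tokens):
        if length <= limit:
            break  # short enough, leave the remaining words spelled out
        short = abbreviations.get(token)
        if short is not None:
            length -= len(token) - len(short)
            tokens[i] = short
    return " ".join(tokens)


# Made-up sample data, for illustration only
abbreviations = {"Station": "Stn", "Platform": "Plat", "Number": "No"}
label = shorten_by_tokens("Central Station Platform Number Four", abbreviations, limit=30)
```

Each token needs only one dictionary lookup, so there is no inner loop over the abbreviation table. The trade-off is that only exact whole-word matches are replaced; a token with attached punctuation, such as "Station,", would not match.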