2016-01-06 54 views

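I want to write a simple script that takes a CSV as input and writes it to a single spreadsheet document. I have it working now, but the script is slow: it takes about 10 minutes to write roughly 350 rows across two worksheets. Is there a way to write a whole row at once to Google Spreadsheets using gspread in Python?

Here is the script I have: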

#!/usr/bin/python 
import json, sys 
import gspread 
from oauth2client.client import SignedJwtAssertionCredentials 

json_key = json.load(open('client_secrets.json')) 
scope = ['https://spreadsheets.google.com/feeds'] 

# change to True to see Debug messages 
DEBUG = False 

def updateSheet(csv,sheet): 
    linelen = 0 
    counter1 = 1 # starting column in spreadsheet: A 
    counter2 = 1 # starting row in spreadsheet: 1 
    counter3 = 0 # helper for iterating through line entries 
    credentials = SignedJwtAssertionCredentials(json_key['client_email'], json_key['private_key'], scope) 

    gc = gspread.authorize(credentials) 

    wks = gc.open("Test Spreadsheet") 
    worksheet = wks.get_worksheet(sheet) 
    if worksheet is None:
        if sheet == 0:
            worksheet = wks.add_worksheet("First Sheet",1,8)
        elif sheet == 1:
            worksheet = wks.add_worksheet("Second Sheet",1,8)
        else:
            print "Error: spreadsheet does not exist"
            sys.exit(1)

    worksheet.resize(1,8)

    for i in csv:
        line = i.split(",")
        linelen = len(line)-1
        if (counter3 > linelen):
            counter3 = 0
        if (counter1 > linelen):
            counter1 = 1

        if (DEBUG):
            print "entry length (starting from 0): ", linelen
            print "line: ", line
            print "counter1: ", counter1
            print "counter3: ", counter3
        while (counter3<=linelen):
            if (DEBUG):
                print "writing line: ", line[counter3]
            worksheet.update_cell(counter2, counter1, line[counter3].rstrip('\n'))
            counter3 += 1
            counter1 += 1

        counter2 += 1
        worksheet.resize(counter2,8)

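I'm a sysadmin, so I apologize in advance for the poor code.

Anyway, the script reads the CSV line by line, splits each line on commas and writes the values one cell at a time, which is why it takes so long. The idea is to have cron run it once a day; it deletes the old entries and writes the new ones, which is why I use resize().

Now, I'm wondering whether there is a better way to take a whole CSV row and write it to the worksheet with each value in its own cell, without writing cell by cell as I do now? That would significantly reduce the execution time.

Thanks!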

Answers


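Yes, this can be done. I upload in chunks of 100 rows by 12 columns and it handles that fine, though I'm not sure how well this scales to something like a whole CSV in one go. Also note that the default length of a worksheet is 1000 rows, and you will get an error if you try to reference a row outside this range (so call add_rows beforehand to make sure there is enough space). Simplified example: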

from itertools import chain

data_to_upload = [[1, 2], [3, 4]]

column_names = ['','A','B','C','D','E','F','G','H', 'I','J','K','L','M','N',
       'O','P','Q','R','S','T','U','V','W','X','Y','Z', 'AA']

# To make it dynamic, assuming that all rows contain the same number of elements
cell_range = 'A1:' + column_names[len(data_to_upload[0])] + str(len(data_to_upload))

# `worksheet` is the gspread worksheet opened as in the question
cells = worksheet.range(cell_range)

# Flatten the nested list row by row. 'cells' is a flat list and does not accept x/y indexing.
flattened_data = list(chain.from_iterable(data_to_upload))

# Go based on the length of flattened_data, not cells.
# This is because if you chunk large data into blocks, all excess cells will take an empty value
# Doing it the other way around will raise an index out of range error
for x in range(len(flattened_data)):
    value = flattened_data[x]
    # Decode byte strings only (Python 2 str); numbers and unicode text are passed through unchanged
    cells[x].value = value.decode('utf-8') if isinstance(value, bytes) else value

# Send all the cells in one batch instead of one request per cell
worksheet.update_cells(cells)
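
The lookup list above stops at column AA, and the answer mentions uploading in blocks of 100 rows. As a rough sketch of how both could be generalised, the helpers below compute column letters for any index and split the data into blocks with their matching A1 ranges. The names `column_letter` and `chunk_ranges` are hypothetical (they are not part of gspread), and the sketch assumes every row has the same number of columns.

```python
def column_letter(index):
    """Convert a 1-based column index to A1 letters (1 -> 'A', 27 -> 'AA')."""
    letters = ''
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return letters


def chunk_ranges(rows, chunk_size=100):
    """Yield (a1_range, block) pairs, one per block of at most chunk_size rows.

    Assumes all rows have the same width, as in the example above.
    """
    width = len(rows[0])
    for start in range(0, len(rows), chunk_size):
        block = rows[start:start + chunk_size]
        first_row = start + 1           # spreadsheet rows are 1-based
        last_row = start + len(block)
        a1_range = 'A%d:%s%d' % (first_row, column_letter(width), last_row)
        yield a1_range, block


# 250 rows of two columns are split into three blocks
rows = [[str(r), str(r * 2)] for r in range(250)]
ranges = [a1_range for a1_range, _ in chunk_ranges(rows)]
print(ranges)
```

Each pair could then be handled as in the example above: fetch the cells with worksheet.range(a1_range), fill them from the flattened block, and send them with worksheet.update_cells(cells).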

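If the rows are of different lengths, then you will need to pad the shorter rows with the appropriate number of empty strings so that the flattened data and cells do not get out of sync. I use decode for convenience: the upload kept crashing on special characters, so it seemed best to just leave it in.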
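
A minimal sketch of that padding step, assuming the data is a list of lists of strings. The helper names `pad_rows` and `flatten` are illustrative and not part of gspread; the padded width is simply the length of the longest row.

```python
def pad_rows(rows, fill=''):
    """Return a copy of rows in which every row is padded to the longest row's length."""
    width = max(len(row) for row in rows) if rows else 0
    return [list(row) + [fill] * (width - len(row)) for row in rows]


def flatten(rows):
    """Flatten a list of rows into one list, row by row."""
    return [value for row in rows for value in row]


ragged = [['a', 'b', 'c'], ['d'], ['e', 'f']]
padded = pad_rows(ragged)
flat = flatten(padded)
print(flat)
```

After padding, every row has the same width, so the flattened list lines up one-to-one with the cells of a rectangular range covering that width and number of rows.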
