2012-11-26 191 views
0

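Dynamically counting Excel rows with Python

I am writing a Python script that parses an Excel file. For each distinct value in column 1, the script should count how many times each value in column 2 appears alongside it.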

For example, given an Excel file that looks like this:

12 abc 
12 abc 
12 efg 
12 efg 
13 hij 
13 hij 
13 klm 

My script should return:

For cell value 12 : 2 values "abc", 2 values "efg" and for cell value 13 : 2 values "hij" and 1 value "klm". 

I tried doing this with a hash in Python. Here is what I am attempting:

import xlrd 
workbook = xlrd.open_workbook('myexcelfile.xls') 
worksheet = workbook.sheet_by_name('myexcelsheet') 
num_rows = worksheet.nrows - 1 
num_cells = worksheet.ncols - 1 
first_col = 0 
scnd_col = 1 
curr_row = 1 
hash = [] 
while curr_row < num_rows: 
    curr_row += 1
    curr_cell = -1
    print 'IN ROW', curr_row
    while curr_cell < num_cells:
        curr_cell += 1
        print 'IN CELL', curr_cell
        cell0_val = int(worksheet.cell_value(curr_row,first_col))
        cell1_val = worksheet.cell_value(curr_row,scnd_col)
        print 'CELL VALUE', cell0_val, cell1_val
        hash[cell0_val][cell1_val]+=1

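I am surely using this hash the wrong way, but I am really new to Python and could not find a good example that matches what I want. Any help would be greatly appreciated. Thanks.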

+0

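Are you sure you are parsing an _Excel_ file, and not something more like a 'csv' or some other format? I very much doubt you can easily parse an '.xls' or '.xlsx' file with Python. – jdotjdot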

+0

He is using 'xlrd', a library for reading Excel files. –

Answers

0

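What you mean is a dictionary, perhaps with a list stored under each key. To start with, it should be hash = {} rather than hash = [].

Also, if there are only two columns, you do not need the second loop. Inside the row loop, just do this: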

cell0_val = int(worksheet.cell_value(curr_row,first_col)) 
cell1_val = worksheet.cell_value(curr_row,scnd_col) 

if cell0_val in hash: 
    hash[cell0_val].append(cell1_val) 
else: 
    hash[cell0_val] = [cell1_val] 

You should end up with something like hash = {12: ['abc', 'abc', 'efg', 'efg'], 13: ['hij', 'hij', 'klm']}
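
To get from those lists to the counts the question asks for, one option is `collections.Counter` from the standard library. This is a minimal sketch, assuming the dictionary of lists has already been built as described above; the literal `grouped` is stand-in data matching the example sheet, not something read from a real workbook.

```python
from collections import Counter

# Stand-in for the dictionary built by the row loop (hypothetical data
# that mirrors the example sheet in the question).
grouped = {12: ['abc', 'abc', 'efg', 'efg'], 13: ['hij', 'hij', 'klm']}

# Turn each list of second-column values into a value -> count mapping.
counts = {key: Counter(values) for key, values in grouped.items()}

for key in sorted(counts):
    for value, n in sorted(counts[key].items()):
        print('cell value {}: {} x "{}"'.format(key, n, value))
```

Keeping the raw lists is useful if the order of the values matters later; if only the counts are needed, counting directly while reading the rows avoids the intermediate lists.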

0

I would use a two-level dictionary.

So your dictionary is defined as:

celldict = dict()  # or celldict = {}

import xlrd
workbook = xlrd.open_workbook('myexcelfile.xls')
worksheet = workbook.sheet_by_name('myexcelsheet')

first_col = 0
scnd_col = 1

# Read data into a two-level dictionary:
# {column-1 value: {column-2 value: count}}
celldict = dict()
# range(worksheet.nrows) covers every row, including the last one
for curr_row in range(worksheet.nrows):
    cell0_val = int(worksheet.cell_value(curr_row, first_col))
    cell1_val = worksheet.cell_value(curr_row, scnd_col)

    # If this cell value isn't in the dictionary yet, add it
    if cell0_val not in celldict:
        celldict[cell0_val] = dict()

    # If the entry isn't in the second-level dictionary, add it with count 1
    if cell1_val not in celldict[cell0_val]:
        celldict[cell0_val][cell1_val] = 1
    # Otherwise increase the count
    else:
        celldict[cell0_val][cell1_val] += 1

# Output the dictionary hierarchy
print(celldict)
# Output it more prettily
for cellval in celldict:
    print("For cell value {}:".format(cellval))
    for cellval2 in celldict[cellval]:
        print("{} values {}".format(cellval2, celldict[cellval][cellval2]))
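
The same two-level structure can be written more compactly with `collections.defaultdict` and `collections.Counter`, which remove both membership checks. This is a sketch rather than a drop-in replacement: it is written for Python 3, and `rows` is hypothetical stand-in data for the (column 1, column 2) pairs that would come from the worksheet.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the (column 1, column 2) pairs read from the sheet.
rows = [(12, 'abc'), (12, 'abc'), (12, 'efg'), (12, 'efg'),
        (13, 'hij'), (13, 'hij'), (13, 'klm')]

# A missing outer key gets a fresh Counter; a missing inner key starts at 0.
celldict = defaultdict(Counter)
for first, second in rows:
    celldict[first][second] += 1

for first in sorted(celldict):
    print(first, dict(celldict[first]))
```

The explicit version above is arguably easier to follow for someone new to Python; this one trades that for brevity.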
1

You could also do it this way:

from itertools import groupby
from operator import itemgetter
from collections import Counter
import xlrd

workbook = xlrd.open_workbook('myexcelfile.xls')
sheet = workbook.sheet_by_name('myexcelsheet')

# groupby only merges adjacent rows, so sort on the same key it groups by (column 0)
as_list = sorted([sheet.row_values(rownum) for rownum in range(sheet.nrows)],
                 key=itemgetter(0))

for cell_value, vals in groupby(as_list, itemgetter(0)):
    # Collect the column-1 entries of this group and count each distinct one
    letter_values = [v[1] for v in vals]
    occurrences = Counter(letter_values)

    print('For cell value {}:'.format(int(cell_value)))
    print(', '.join('{} values {}'.format(count, value)
                    for value, count in occurrences.items()))

Then format the output as needed.
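
As one possible way to do that formatting, the sketch below builds a single sentence close to the one shown in the question. The helper name `describe` and the `example` data are hypothetical, introduced only for illustration; it assumes the counts are available as a nested mapping of the form {column-1 value: {column-2 value: count}}.

```python
def describe(counts):
    """Format {key: {value: count}} as one sentence (hypothetical helper)."""
    parts = []
    for key in sorted(counts):
        # Use "value" for a count of 1 and "values" otherwise.
        items = ['{} value{} "{}"'.format(n, '' if n == 1 else 's', value)
                 for value, n in sorted(counts[key].items())]
        parts.append('for cell value {} : {}'.format(key, ', '.join(items)))
    if not parts:
        return ''
    sentence = ' and '.join(parts)
    # Capitalize only the first character and close the sentence.
    return sentence[0].upper() + sentence[1:] + '.'


# Stand-in counts matching the example sheet in the question.
example = {12: {'abc': 2, 'efg': 2}, 13: {'hij': 2, 'klm': 1}}
print(describe(example))
```

Sorting the keys and values makes the output deterministic, which is not guaranteed by the sheet order alone.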