2014-02-27 64 views
0

How to split merged cells in an Excel workbook using Python

Is there any way to split/unmerge cells in an Excel workbook using Python? What I want is shown below:

enter image description here

The result should be a new Excel file with the following entries:

enter image description here

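My solution, which uses xlrd to copy the same string into every cell of a merged column, is given below:

[Note: the "formatting_info=True" flag is not yet implemented in the xlrd version I am using, so I cannot get the list of merged cells directly. I am not supposed to upgrade xlrd on this setup.]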

import xlrd

def xlsx_to_dict(xlsfile):
    workbook = xlrd.open_workbook(xlsfile)
    result_dict = {}
    for worksheet_name in workbook.sheet_names():
        worksheet = workbook.sheet_by_name(worksheet_name)
        num_rows = worksheet.nrows
        num_cols = worksheet.ncols

        # Row 0 is the header, so the data starts at row 1.
        # Column 0 holds the city and column 1 the name; an empty cell in
        # either column belongs to the merged cell above it.
        current_city = None
        curr_name = None
        for curr_row in range(1, num_rows):
            for curr_cell in range(num_cols):
                cell_value = worksheet.cell_value(curr_row, curr_cell)
                if curr_cell == 0:
                    if cell_value != "":
                        current_city = cell_value
                        result_dict.setdefault(current_city, {})
                    continue
                if curr_cell == 1:
                    if cell_value != "":
                        curr_name = cell_value
                        result_dict.setdefault(current_city, {}).setdefault(curr_name, {})
                    continue
                # Every remaining column is collected as a phone number.
                names = result_dict.setdefault(current_city, {})
                names.setdefault(curr_name, {}).setdefault('Phone', []).append(cell_value)
    return result_dict

The function above returns a Python dictionary like the following:

{ 'New York' : { 'Tom' : { 'Phone' : [92929292, 33929] } }, ........}

I would then iterate over the dictionary and write it to a new Excel file.

However, I want a generic way of splitting merged cells.

+1

Could you share what you have tried so far? Otherwise people will keep downvoting. – Gogo

Answers

0

If your file has no empty cells in the middle, this may help: read the file, do some work, then rewrite it.

import xlrd

def read_merged_xls(file_contents):
    book = xlrd.open_workbook(file_contents=file_contents)
    data = []
    sheet = book.sheet_by_index(0)
    for rx in range(sheet.nrows):
        line = []
        for ry in range(sheet.ncols):
            cell = sheet.cell_value(rx, ry)
            # xlrd returns '' for empty cells; comparing against '' keeps 0 and False intact
            if cell == '':
                # copy the value from the row above, if there is one
                cell = data[-1][ry] if data else ''
            line.append(cell)
        data.append(line)
    return data
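
The fill-down idea in this answer can be tried without a workbook. The sketch below is only an illustration: it assumes the sheet has already been read into a rectangular list of rows, and that a vertically merged cell shows up as an empty string everywhere except in its top cell. `fill_down` is a name made up for this example and is not part of xlrd.

```python
def fill_down(rows, empty=''):
    """Return a copy of rows where each empty cell takes the value above it."""
    filled = []
    for row in rows:
        new_row = []
        for col, cell in enumerate(row):
            if cell == empty and filled:
                # Reuse the already-filled value from the previous row.
                cell = filled[-1][col]
            new_row.append(cell)
        filled.append(new_row)
    return filled


rows = [
    ['New York', 'Tom', 92929292],
    ['', '', 33929],
    ['', 'Ann', 5551234],
]
for row in fill_down(rows):
    print(row)
```

Note that a cell which is genuinely empty gets filled as well, which is why this approach only fits files with no empty cells in the middle. Horizontal merges are not handled.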
0

This function gets the "real" value of a cell, i.e. the value of the merged cell if the coordinates fall anywhere inside a merged cell.

def unmergedValue(rowx, colx, thesheet):
    for crange in thesheet.merged_cells:
        rlo, rhi, clo, chi = crange
        # rhi and chi are exclusive upper bounds
        if rlo <= rowx < rhi and clo <= colx < chi:
            # the value of a merged range lives in its top-left cell
            return thesheet.cell_value(rlo, clo)
    # if you reached this point, it's not in any merged cells
    return thesheet.cell_value(rowx, colx)

Loosely based on http://www.lexicon.net/sjmachin/xlrd.html#xlrd.Sheet.merged_cells-attribute

Very inefficient, since every lookup scans all merged ranges, but it should be acceptable for small-ish spreadsheets.
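
If the per-cell scan becomes too slow, one possible alternative is to visit each merged range once and copy its top-left value into every cell it covers. The sketch below assumes the sheet is already a list of row lists and that the ranges are `(rlo, rhi, clo, chi)` tuples with exclusive upper bounds, as in xlrd's `merged_cells`. `unmerge_grid` is a hypothetical helper name, not an xlrd function.

```python
def unmerge_grid(grid, merged_ranges):
    """Return a copy of grid with every merged range filled with its top-left value."""
    result = [list(row) for row in grid]
    for rlo, rhi, clo, chi in merged_ranges:
        value = grid[rlo][clo]
        for r in range(rlo, rhi):
            for c in range(clo, chi):
                result[r][c] = value
    return result


grid = [
    ['City', 'Name', 'Phone'],
    ['New York', 'Tom', 92929292],
    ['', '', 33929],
    ['', 'Ann', ''],
]
# Column 0 is merged over rows 1-3, column 1 over rows 1-2.
merged = [(1, 4, 0, 1), (1, 3, 1, 2)]
for row in unmerge_grid(grid, merged):
    print(row)
```

Only cells inside a merged range are touched, so a cell that is genuinely empty (the last phone number here) stays empty.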

0
import xlrd
import pandas as pd

def rep(values, i):
    # Walk upwards from row i and return the nearest non-empty value.
    j = i - 1
    while j >= 0:
        if values[j] != u'':
            return values[j]
        j -= 1
    return u''

def write_df2xlsx(df, filename):
    # Create a Pandas Excel writer using XlsxWriter as the engine;
    # leaving the with-block closes the writer and saves the file.
    with pd.ExcelWriter(filename, engine='xlsxwriter') as writer:
        # Convert the dataframe to an XlsxWriter Excel object.
        df.to_excel(writer, sheet_name='Sheet1', index=False)

def csv_from_excel(filename):
    wb = xlrd.open_workbook(filename)
    for worksheet_name in wb.sheet_names():
        sh = wb.sheet_by_name(worksheet_name)
        # The header is the first row in which no cell is empty.
        header_index = None
        for i in range(sh.nrows):
            if all(cell.value != xlrd.empty_cell.value for cell in sh.row(i)):
                header_index = i
                break
        if header_index is None:
            continue  # no complete header row on this sheet
        columns = sh.row_values(header_index)
        rows = [sh.row_values(rownum) for rownum in range(header_index + 1, sh.nrows)]
        data = pd.DataFrame(rows, columns=columns)
        # Fill each empty cell with the nearest non-empty value above it.
        for col in data.columns:
            values = list(data[col])
            if u'' in values:
                data[col] = [rep(values, i) if v == u'' else v for i, v in enumerate(values)]
        # Every sheet is written to the same file, so the last sheet wins.
        write_df2xlsx(data, 'ResultFile.xlsx')
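
Since this answer comes without an explanation, here is a small sketch of its first step as it appears to work: the header is taken to be the first row in which no cell is empty. The sketch runs on plain lists so it can be tried without xlrd. `find_header_index` is a hypothetical name, and skipping completely empty rows and returning `None` when no header is found are assumptions of this example, not something the answer specifies.

```python
def find_header_index(rows, empty=''):
    """Return the index of the first non-empty row with no empty cells, or None."""
    for index, row in enumerate(rows):
        if row and all(cell != empty for cell in row):
            return index
    return None


rows = [
    ['Report', '', ''],          # title row: has empty cells, so it is skipped
    ['City', 'Name', 'Phone'],   # first complete row: treated as the header
    ['New York', 'Tom', 92929292],
]
print(find_header_index(rows))
```

Everything below the header row is then treated as data, and the empty cells in each column are filled from the nearest value above.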
+1

Welcome to StackOverflow. When posting code as an answer, it is best to include a short explanation. See [answer] – BenH
