2014-07-24 68 views
2

Exporting a pandas DataFrame to Excel raises a TypeError. I am trying to export the DataFrame like this:

writer = pd.io.excel.ExcelWriter(args.out_file, engine='xlsxwriter', options={'constant_memory': True}) 
summary_data.to_excel(writer, sheet_name='summary', na_rep='NA', index=False) 

But I get this message:

"cannot convert the series to {0}".format(str(converter))) 
TypeError: cannot convert the series to <type 'float'> 

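I am a little confused by this error message and cannot tell what is wrong with my DataFrame. The export works when the DataFrame contains fewer than 1000 rows, but as soon as it gets bigger this error occurs.

Any ideas?

Thanks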

Update: output of summary_data.info()

<class 'pandas.core.frame.DataFrame'> 
Int64Index: 2176 entries, 0 to 2175 
Data columns (total 27 columns): 
chrom         2176 non-null object 
coord         2176 non-null int64 
ref_base        2176 non-null object 
var_base        2176 non-null object 
normal_ref_counts      2176 non-null int64 
normal_var_counts      2176 non-null int64 
VOA867-A1_S43_merged_ref_counts   2176 non-null object 
VOA867-A1_S43_merged_var_counts   2176 non-null object 
VOA867-A1_S43_merged_somatic_status  2176 non-null object 
VOA867-E02_S73_merged_ref_counts  2176 non-null object 
VOA867-E02_S73_merged_var_counts  2176 non-null object 
VOA867-E02_S73_merged_somatic_status 2176 non-null object 
VOA867-F03_S76_merged_ref_counts  2176 non-null object 
VOA867-F03_S76_merged_var_counts  2176 non-null object 
VOA867-F03_S76_merged_somatic_status 2176 non-null object 
VOA867-F04_S75_merged_ref_counts  2176 non-null object 
VOA867-F04_S75_merged_var_counts  2176 non-null object 
VOA867-F04_S75_merged_somatic_status 2176 non-null object 
VOA867-F09_S74_merged_ref_counts  2176 non-null object 
VOA867-F09_S74_merged_var_counts  2176 non-null object 
VOA867-F09_S74_merged_somatic_status 2176 non-null object 
VOA867-T_S41_merged_ref_counts   2176 non-null object 
VOA867-T_S41_merged_var_counts   2176 non-null object 
VOA867-T_S41_merged_somatic_status  2176 non-null object 
VOA867xeno_S18_merged_ref_counts  2176 non-null object 
VOA867xeno_S18_merged_var_counts  2176 non-null object 
VOA867xeno_S18_merged_somatic_status 2176 non-null object 
dtypes: int64(3), object(24)None 

Here is the code that generates it:

def get_summary_data(data, normal_sample):
    summary_data = []
    for index, normal_row in data[normal_sample].iterrows():
        out_row = {'chrom': index[0],
                   'coord': index[1],
                   'ref_base': normal_row['ref_base'],
                   'var_base': normal_row['var_base'],
                   'normal_ref_counts': normal_row['ref_counts'],
                   'normal_var_counts': normal_row['var_counts'],
                   }

        normal_variant_status = normal_row['variant_status']

        normal_depth = out_row['normal_ref_counts'] + out_row['normal_var_counts']

        if normal_depth > 0:
            normal_var_freq = out_row['normal_var_counts']/normal_depth
        else:
            normal_var_freq = 0

        for sample in data:
            if sample == normal_sample:
                continue

            sample_row = data[sample].ix[[index]]

            out_row['{0}_ref_counts'.format(sample)] = sample_row['ref_counts']

            out_row['{0}_var_counts'.format(sample)] = sample_row['var_counts']

            sample_variant_status = str(sample_row['variant_status'].iget(0))

            sample_somatic_status = call_somatic_status(normal_variant_status,
                                                        sample_variant_status,
                                                        normal_var_freq,
                                                        args.min_normal_germline_var_freq)

            out_row['{0}_somatic_status'.format(sample)] = sample_somatic_status

        summary_data.append(out_row)

    columns = ['chrom', 'coord', 'ref_base', 'var_base', 'normal_ref_counts', 'normal_var_counts']

    for sample in data:
        if sample == normal_sample:
            continue

        columns.append('{0}_ref_counts'.format(sample))

        columns.append('{0}_var_counts'.format(sample))

        columns.append('{0}_somatic_status'.format(sample))

    summary_data = pd.DataFrame(summary_data, columns=columns)

    return summary_data

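The count columns should be int, but I can see that they are treated as object (string-like) here, possibly because they are extracted from another DataFrame?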
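One plausible reading of the object dtypes, offered as a sketch rather than a confirmed diagnosis: selecting a row with a list (as in `data[sample].ix[[index]]`) returns a one-row DataFrame, so `sample_row['ref_counts']` is a one-element Series rather than a number. Storing such Series in the row dictionaries would give an object column. The example below uses `.loc` and `.iloc` instead of the question's `.ix` and `.iget`, which were removed from later pandas versions; the names `sample`, `rows_series` and `rows_scalar` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for one per-sample DataFrame from the question.
sample = pd.DataFrame(
    {"ref_counts": [10, 20], "var_counts": [1, 2]},
    index=["a", "b"],
)

rows_series = []
rows_scalar = []
for key in sample.index:
    # A list selector returns a one-row DataFrame, not a single row.
    sample_row = sample.loc[[key]]
    # Indexing a column of that DataFrame yields a one-element Series.
    rows_series.append({"ref_counts": sample_row["ref_counts"]})
    # Taking the first element yields a plain integer instead.
    rows_scalar.append({"ref_counts": sample_row["ref_counts"].iloc[0]})

as_series = pd.DataFrame(rows_series)
as_scalar = pd.DataFrame(rows_scalar)

print(as_series["ref_counts"].dtype)  # object column whose cells are Series
print(as_scalar["ref_counts"].dtype)  # integer column
```

If this is what happens in `get_summary_data`, extracting the scalar when filling `out_row` (as is already done for `variant_status`) would keep the count columns numeric.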

+0

Show `df.info()` – Jeff

+0

You should only have `object` dtypes for columns that really hold objects. How did you generate the data? – Jeff

+0

That's right, but I can see that the counts have suspicious object dtypes – Rad

Answer

0

.to_excel only accepts columns of type object. A quick way to work around this is to cast all columns to object type before writing:

summary_data = summary_data.astype(object) 

Then you can write it without crashing:

summary_data.to_excel(writer, sheet_name='summary', na_rep='NA', index=False) 

There is some munging to do here; in some cases I had to copy columns as object type. Strange. Another option is to drop the problematic columns.
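As an alternative to casting or dropping columns, the munging could also be done by turning the suspicious cells back into plain values before writing. The following is only a sketch under the assumption that the object cells are one-element Series; the helpers `unwrap` and `clean_for_export` and the demo column names are hypothetical and not part of pandas or of the question's code.

```python
import pandas as pd

def unwrap(cell):
    """Return the single value of a one-element Series, else the cell unchanged."""
    if isinstance(cell, pd.Series) and len(cell) == 1:
        return cell.iloc[0]
    return cell

def clean_for_export(frame):
    """Return a copy in which object columns hold plain values, not Series."""
    cleaned = frame.copy()
    for name in cleaned.columns:
        if cleaned[name].dtype == object:
            # Replace each cell by its unwrapped value.
            cleaned[name] = [unwrap(cell) for cell in cleaned[name]]
            # Assumption: columns ending in "_counts" are meant to be numeric.
            if name.endswith("_counts"):
                cleaned[name] = pd.to_numeric(cleaned[name])
    return cleaned

# Small demo frame that mimics the suspected problem.
raw = pd.DataFrame([
    {"chrom": "1", "s1_ref_counts": pd.Series([10]), "s1_somatic_status": "somatic"},
    {"chrom": "2", "s1_ref_counts": pd.Series([20]), "s1_somatic_status": "germline"},
])
cleaned = clean_for_export(raw)
print(cleaned.dtypes)
```

The cleaned frame can then be passed to `to_excel` in the same way as in the question.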