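Expected string, int found when using a pandas DataFrame on a large file
I have already asked a question, but when I run the code below on a file with more than a million rows, I run into a problem.
Code: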
import numpy as np
import pandas as pd
import xlrd
import xlsxwriter
df = pd.read_excel('full-cust-data-nonconcat.xlsx')
df =df.groupby('ORDER_ID')['ASIN'].agg(','.join).reset_index()
writer = pd.ExcelWriter('PythonExport-Data.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
print df
Error:
Traceback (most recent call last):
File "grouping-data.py", line 9, in <module>
df =df.groupby('ORDER_ID')['ASIN'].agg(','.join).reset_index()
File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 2668, in aggregate
result = self._aggregate_named(func_or_funcs, *args, **kwargs)
File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 2786, in _aggregate_named
output = func(group, *args, **kwargs)
TypeError: sequence item 0: expected string, int found
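Since the file is huge, how can I check where it expected a string and found an int?
Is there a way to convert all of these values to strings?
Sample data (these IDs are alphanumeric):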
ID1 Some_other_id1
ID2 Some_other_id2
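The two questions above (finding the offending values, and converting everything to strings) can be sketched roughly as follows. This is only an illustration under assumptions: the small DataFrame is made-up stand-in data, not the real spreadsheet, and it assumes the int comes from an ASIN cell that contains only digits, which read_excel would then parse as a number.

```python
import pandas as pd

# Hypothetical stand-in for the real spreadsheet: one ASIN consists only of
# digits, so it ends up as an int instead of a string.
df = pd.DataFrame({
    'ORDER_ID': ['ID1', 'ID1', 'ID2'],
    'ASIN': ['B00ABC', 12345, 'B00XYZ'],
})

# Locate the rows whose ASIN is not a string; these are the values that make
# ','.join raise TypeError. (Under Python 2, test against basestring instead
# of str so that unicode values are not reported as well.)
non_string = df[~df['ASIN'].map(lambda v: isinstance(v, str))]
print(non_string)

# Cast the column to str before grouping so that every item can be joined.
grouped = (
    df.assign(ASIN=df['ASIN'].astype(str))
      .groupby('ORDER_ID')['ASIN']
      .agg(','.join)
      .reset_index()
)
print(grouped)
```

On the real file, the same two steps would be applied to the DataFrame returned by pd.read_excel; non_string then lists the row positions worth inspecting in the spreadsheet.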
Thanks, it worked. But when I try it on another file with a string column, it gives this error: File "grouping-data.py", line 11, in <module> df = df['ASIN'].stype(str).groupby(df['ORDER_ID']).agg(','.join).reset_index() File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 2059, in __getitem__ return self._getitem_column(key) File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 2066, in _getitem_column return self._get_item_cache(key) File "/Library/Python/2.7/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache values = sel – user2696258
I'm not sure what this is. Maybe you could share some of the data on which the command fails. – Psidom