2017-01-06 52 views
0

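Pandas hashtable gives KeyError: 0

I want to find the elements that two pandas DataFrames have in common by indexing the data and merging on the index. I use this to process a large amount of data (millions of rows). The first table (d1, built from df) is constant; the second (d2) changes on every loop iteration, and its new elements are merged with the first table.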

Here is my code for this process:

import json
import pandas as pd

# `twitter` (the API client) and `next_cursor` are defined elsewhere in the script (not shown).
df = pd.read_csv("inputfile.csv", header=None)
d1 = pd.DataFrame(df).set_index(0)

for i in range(0, len(df)):
    try:
        follower_id = twitter.get_followers_ids(user_id=df.iloc[i][0], cursor=next_cursor)

        f = follower_id['ids']
        json.dumps(f)
        d2 = pd.DataFrame(f).set_index(0)  # this is the line that raises KeyError: 0
        match_result = pd.merge(d1, d2, left_index=True, right_index=True)
        fk = [df.iloc[i][0] for number in range(len(match_result))]
        DF = pd.DataFrame(fk)

        DF.to_csv(r'output1.csv', header=None, sep=' ', index=None)
        match_result.to_csv(r'output2.csv', header=None, sep=' ')

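In my experience this code runs fine for a while, but then the program gives me the following error message and stops running. It may be related to the size of the second DataFrame, which changes on every loop iteration: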

Traceback (most recent call last): 
File "halozat3.py", line 39, in <module> 
d2 = pd.DataFrame(f).set_index(0) #1Trump koveto kovetolistaja 
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 2372, in set_index 
level = frame[col].values 
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1678, in __getitem__ 
return self._getitem_column(key) 
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1685, in _getitem_column 
return self._get_item_cache(key) 
File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1052, in _get_item_cache 
values = self._data.get(item) 
File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 2565, in get 
loc = self.items.get_loc(item) 
File "/usr/lib/python2.7/dist-packages/pandas/core/index.py", line 1181, in get_loc 
return self._engine.get_loc(_values_from_object(key)) 
File "index.pyx", line 129, in pandas.index.IndexEngine.get_loc (pandas/index.c:3656) 
File "index.pyx", line 149, in pandas.index.IndexEngine.get_loc (pandas/index.c:3534) 
File "hashtable.pyx", line 381, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:7035) 
File "hashtable.pyx", line 387, in   pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6976) 
KeyError: 0 

What could be the problem?

+1

Quite simply, `0` is not a column label in that frame. Have you tried `d2 = pd.DataFrame(f).set_index('ids')`? – IanS
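A quick way to check which column labels are actually available is to print them. The sketch below is only an illustration: it assumes f is a plain list of integer ids (as follower_id['ids'] appears to be), and the id values are made up.

```python
import pandas as pd

# Hypothetical follower-id lists standing in for follower_id['ids'].
non_empty = pd.DataFrame([101, 202, 303])
empty = pd.DataFrame([])

# A frame built from a non-empty flat list has one column, labelled with the integer 0.
print(list(non_empty.columns))

# A frame built from an empty list has no columns, so there is no label 0 to index by.
print(list(empty.columns))

# 'ids' is a key of the API response dict, not a column label of the frame.
has_ids_label = "ids" in non_empty.columns
print(has_ids_label)
```

Under these assumptions, set_index(0) can only fail when the list is empty, and set_index('ids') fails in either case.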

+0

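I suggest catching the exception and printing f when the KeyError occurs. That way you can see how this f differs from what you expect, because for some reason this f has no column 0 while the others do. – Skirrebattie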
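A minimal sketch of that debugging step might look like the following. The helper name to_indexed_frame is hypothetical, and the two sample inputs are made up to show one call that succeeds and one that fails.

```python
import pandas as pd

def to_indexed_frame(f):
    """Build the per-user frame; print the offending input if column 0 is missing."""
    try:
        return pd.DataFrame(f).set_index(0)
    except KeyError:
        # Show exactly what the API returned for this user before giving up on it.
        print("set_index(0) failed for f = %r" % (f,))
        return None

ok = to_indexed_frame([11, 22])  # ordinary list of ids
bad = to_indexed_frame([])       # e.g. a user with no followers
```

In the loop from the question, a None result could be used to skip that user with continue instead of stopping the whole run.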

+0

I tried it, but it gave me: `KeyError: 'ids'` – John

Answers

0

Is there only one row in your DataFrame?
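If the failing frame turns out to be empty rather than single-row, one possible workaround is to build the index directly instead of calling set_index(0). This is a sketch under that assumption, not a confirmed fix. The function name match_followers and the sample data are hypothetical.

```python
import pandas as pd

def match_followers(d1, f):
    """Return the rows of d1 whose index values also occur in the id list f."""
    # Building the index directly avoids the column lookup, so an empty list is fine.
    d2 = pd.DataFrame(index=pd.Index(f, dtype="int64"))
    return pd.merge(d1, d2, left_index=True, right_index=True)

# Made-up stand-in for the constant first table.
d1 = pd.DataFrame({"name": ["a", "b", "c"]}, index=[1, 2, 3])

some_matches = match_followers(d1, [2, 3, 9])
no_followers = match_followers(d1, [])
print(len(some_matches), len(no_followers))
```

The merge stays an inner join on the index, as in the question, so ids that are not in d1 are simply dropped.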
