2017-09-05 27 views
3

How do I split this DataFrame on its blank (NaN) rows? I want to convert it into a dictionary of DataFrames, split at the numpy.nan rows.

import pandas 
import numpy 
names = ['a', 'b', 'c'] 
df = pandas.DataFrame([1,2,3,numpy.nan, 4,5,6,numpy.nan, 7, 8,9]) 
>>> df 

     0 
0 1.0 
1 2.0 
2 3.0 
3 NaN 
4 4.0 
5 5.0 
6 6.0 
7 NaN 
8 7.0 
9 8.0 
10 9.0 

Desired output:

df_dict = {'a': <df1>, 'b': <df2>, 'c': <df3>} 

df1 = 

     0 
0 1.0 
1 2.0 
2 3.0 

df2 = 

4 4.0 
5 5.0 
6 6.0 

df3 = 

8 7.0 
9 8.0 
10 9.0 

Answers

3

Use a dict comprehension with groupby:

d = {names[i]: x.dropna() for i, x in df.groupby(df[0].isnull().cumsum())} 

{'c':      0 
8 7.0 
9 8.0 
10 9.0, 'b':  0 
4 4.0 
5 5.0 
6 6.0, 'a':  0 
0 1.0 
1 2.0 
2 3.0} 

print (d['a']) 
    0 
0 1.0 
1 2.0 
2 3.0 

print (d['b']) 
    0 
4 4.0 
5 5.0 
6 6.0 

print (d['c']) 
     0 
8 7.0 
9 8.0 
10 9.0 
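
To see why the grouping works, it may help to look at the key on its own. The following is a minimal sketch that reuses `df` and `names` from the question; the variable name `key` is introduced here purely for illustration. `df[0].isnull()` is True only on the NaN separator rows, so its running sum increases by one at each separator and gives every block its own integer label.

```python
import numpy
import pandas

names = ['a', 'b', 'c']
df = pandas.DataFrame([1, 2, 3, numpy.nan, 4, 5, 6, numpy.nan, 7, 8, 9])

# True on the NaN separator rows; the running sum labels the blocks 0, 1, 2.
key = df[0].isnull().cumsum()
print(key.tolist())  # [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

# Each NaN row lands in the group of the block that follows it,
# so dropna() removes it and the original index labels are kept.
d = {names[i]: g.dropna() for i, g in df.groupby(key)}
print({k: v.index.tolist() for k, v in d.items()})
```

Note that this assumes the frame does not start with a NaN row and has no consecutive NaN rows; in those cases the labels would no longer line up with `names` one to one.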
+0

You are right. Can you go back to your original answer? – jezrael

+1

I've added a note. – Zero

1

Here's one way.

Originally:

In [2109]: df_dict = dict(zip(
          names, 
          [g.dropna() for _, g in df.groupby(df[0].isnull().cumsum())] 
          )) 

After editing, I realized it is the same as the other answer:

In [2100]: df_dict = {names[i]: g.dropna() for i, g in df.groupby(df[0].isnull().cumsum())} 

In [2101]: df_dict['a'] 
Out[2101]: 
    0 
0 1.0 
1 2.0 
2 3.0 

In [2102]: df_dict['b'] 
Out[2102]: 
    0 
4 4.0 
5 5.0 
6 6.0 

In [2103]: df_dict['c'] 
Out[2103]: 
     0 
8 7.0 
9 8.0 
10 9.0 
+1

:( Now the answers are the same :( – jezrael

2

Another way is to split with numpy's array_split, i.e.

import numpy as np 
dic = {names[i]: j.dropna() for i,j in enumerate(np.array_split(df, np.where(df[0].isnull())[0]))} 
 
%%timeit 
dic = {names[i]: j.dropna() for i,j in enumerate(np.array_split(df, np.where(df[0].isnull())[0]))} 
100 loops, best of 3: 2.51 ms per loop 
%%timeit 
d = {names[i]: x.dropna() for i, x in df.groupby(df[0].isnull().cumsum())} 
100 loops, best of 3: 6.1 ms per loop 
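
The same split can also be written with plain positional slices, which avoids handing the DataFrame itself to NumPy. This is only a sketch under the question's assumptions (one column named `0`, single NaN rows as separators); the names `cuts` and `bounds` are introduced here for illustration and it has not been timed against the versions above.

```python
import numpy as np
import pandas as pd

names = ['a', 'b', 'c']
df = pd.DataFrame([1, 2, 3, np.nan, 4, 5, 6, np.nan, 7, 8, 9])

# Integer positions of the NaN separator rows.
cuts = np.flatnonzero(df[0].isnull().to_numpy())

# Block boundaries: start of the frame, each separator, end of the frame.
bounds = [0, *cuts.tolist(), len(df)]

# Slice each block by position; every block after the first begins with
# its separator row, which dropna() removes.
dic = {name: df.iloc[start:stop].dropna()
       for name, start, stop in zip(names, bounds[:-1], bounds[1:])}

print(dic['b'])
```

Because `zip` stops at the shortest input, extra blocks beyond the length of `names` would be silently ignored rather than raising an error.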