Iterating through MultiIndex data in Python pandas

I'd like to iterate over a pandas DataFrame by grouping on a MultiIndex. Here, I want to process the rows of each industry together as a group. I load the data with a MultiIndex:

from io import StringIO

import pandas as pd

data = """industry,location,number
retail,brazil,294
technology,china,100
retail,nyc,2913
retail,paris,382
technology,us,2182
"""

df = pd.read_csv(StringIO(data), sep=",", index_col=['industry', 'location'])

So I'd like something to this effect:

# iter_multiindex() is hypothetical: it only illustrates the behavior I'm after
for industry, rows in df.iter_multiindex():
    for row in rows:
        process_row(row)

Is there a way to do this?

Source

2014-12-03 lollercoaster

You can group by the first level of the MultiIndex (industry) and then iterate through the groups:

In [102]: for name, group in df.groupby(level='industry'):
    .....:     print(name, group, sep='\n', end='\n\n')
    .....:
retail
                   number
industry location
retail   brazil       294
         nyc         2913
         paris        382

technology
                     number
industry   location
technology china        100
           us          2182

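Each time, group is a DataFrame, which you can then iterate over in turn (for example with for idx, row in group.iterrows(), which yields (index, row) pairs).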
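As a rough sketch of the nested loop described above: process_row here is only a hypothetical stand-in for whatever the question's function does, and the three-argument signature is my own assumption. Note that each group keeps the full two-level index, so the index of every row is an (industry, location) tuple.

```python
from io import StringIO

import pandas as pd

data = """industry,location,number
retail,brazil,294
technology,china,100
retail,nyc,2913
retail,paris,382
technology,us,2182
"""

df = pd.read_csv(StringIO(data), index_col=["industry", "location"])


def process_row(industry, location, row):
    # Hypothetical placeholder: just collects the values of one row.
    return (industry, location, int(row["number"]))


results = []
for industry, group in df.groupby(level="industry"):
    # iterrows() yields (index, Series) pairs; the index is an
    # (industry, location) tuple because the group keeps both levels.
    for (_, location), row in group.iterrows():
        results.append(process_row(industry, location, row))

print(results)
```

The groups come out sorted by industry, and the rows inside each group keep their original order.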

In most cases, though, this kind of iteration isn't needed! What does process_row have to do? You can probably do it in a vectorized way directly on the groupby object.
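For illustration only, since the question doesn't say what process_row does: assuming the goal is some per-industry computation on number, a vectorized version might look like the sketch below. The choice of a sum and a per-industry share is my assumption, not something from the question.

```python
from io import StringIO

import pandas as pd

data = """industry,location,number
retail,brazil,294
technology,china,100
retail,nyc,2913
retail,paris,382
technology,us,2182
"""

df = pd.read_csv(StringIO(data), index_col=["industry", "location"])

grouped = df.groupby(level="industry")["number"]

# One value per industry: the total of "number" within each group.
totals = grouped.sum()

# One value per original row: each row's share of its industry total.
# transform("sum") broadcasts the group total back onto every row.
share = df["number"] / grouped.transform("sum")

print(totals)
print(share)
```

Here totals holds 3589 for retail and 2282 for technology, and share is aligned with the original rows, so no explicit loop over groups or rows is needed.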

Source

2014-12-03 20:29:57 joris

Not sure why you'd want to do this, but you could do it like this:

for x in df.index:
    print(x[0])  # industry
    process(df.loc[x])  # row

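But that's not how you usually work with a DataFrame. You probably want to read up on apply() (the Essential Basic Functionality section of the docs is also really useful).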
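A minimal sketch of what a row-wise apply() could look like on this data. The describe function is made up for the example; the point is only that with axis=1 each row arrives as a Series whose name is the (industry, location) index tuple.

```python
from io import StringIO

import pandas as pd

data = """industry,location,number
retail,brazil,294
technology,china,100
retail,nyc,2913
retail,paris,382
technology,us,2182
"""

df = pd.read_csv(StringIO(data), index_col=["industry", "location"])


def describe(row):
    # Hypothetical per-row function; row.name is the MultiIndex tuple.
    industry, location = row.name
    return f"{industry}/{location}: {row['number']}"


# axis=1 calls describe once per row and returns a Series
# indexed by the same MultiIndex as df.
labels = df.apply(describe, axis=1)

print(labels)
```

This still calls a Python function once per row, so it is a convenience rather than a speed-up; a truly vectorized expression is usually faster when one exists.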

Source

2014-12-03 19:57:05

I'd like to see some comment explaining the -1 – 2014-12-04 17:52:13
