2016-02-29

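Reshaping non-numeric values in a pandas DataFrame

I've already searched Google for an answer, but had no luck. I need to reshape a pandas DataFrame so that the non-numeric column (comp_url) ends up as one of the "values" in a MultiIndex DataFrame. Here is a sample of the data: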

store_name  sku    comp   price  ship  comp_url
CSE         A1025  compA  30.99  9.99  some url
CSE         A1025  compB  30.99  9.99  some url
CSE         A1025  compC  30.99  9.99  some url

I have several store_names, so I need the result to look like this:

SKU    CSE                     store_name2
       comp_url  price  ship   comp_url  price  ship
A1025  some url  30.99  9.99   some url  30.99  9.99

Any advice or guidance would be greatly appreciated!

Answer


Assuming each sku/store_name combination is unique, here is a working example:

# imports 
import pandas as pd 

# Create a sample DataFrame. 
cols = ['store_name', 'sku', 'comp', 'price', 'ship', 'comp_url'] 
records = [['CSA', 'A1025', 'compA', 30.99, 9.99, 'some url'], 
      ['CSB', 'A1025', 'compB', 32.99, 9.99, 'some url2'], 
      ['CSA', 'A1026', 'compC', 30.99, 19.99, 'some url'], 
      ['CSB', 'A1026', 'compD', 30.99, 9.99, 'some url3']] 
df = pd.DataFrame.from_records(records, columns=cols) 

# Move both 'sku' and 'store_name' to the row index; the combination
# of these two columns provides a unique identifier for each row.
df.set_index(['sku', 'store_name'], inplace=True) 
# Move 'store_name' from the row index to the column index. Each
# unique value in the 'store_name' index gets its own set of columns.
# In the resulting column MultiIndex, 'store_name' will be below the
# existing column labels.
df = df.unstack(1) 
# To get the 'store_name' above the other column labels, we simply 
# reorder the levels in the MultiIndex and sort it. 
df.columns = df.columns.reorder_levels([1, 0]) 
df.sort_index(axis=1, inplace=True) 

# Show the result.
print(df)

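This works because each sku/store_name label combination is unique. When we call unstack(), we are only moving labels and cells around; no aggregation takes place. If the labels were not unique and some aggregation were required, pivot_table() would probably be the better choice.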
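A minimal sketch of the difference described above, assuming a reasonably recent pandas. The records are hypothetical: store CSA lists sku A1025 twice, so the sku/store_name pair is no longer unique. unstack() then raises a ValueError, while pivot_table() can still build the wide layout. Using aggfunc='first' is an assumption about how duplicates should be resolved (it simply keeps the first row of each group); unlike the default 'mean', it also works for strings such as comp_url.

```python
import pandas as pd

cols = ['store_name', 'sku', 'comp', 'price', 'ship', 'comp_url']
# Hypothetical data: CSA has two rows for sku A1025 (two competitors),
# so the (sku, store_name) pair is not unique.
records = [['CSA', 'A1025', 'compA', 30.99, 9.99, 'some url'],
           ['CSA', 'A1025', 'compB', 31.99, 9.99, 'other url'],
           ['CSB', 'A1025', 'compC', 32.99, 9.99, 'some url2']]
df = pd.DataFrame(records, columns=cols)

# unstack() only moves labels and cells, so duplicate pairs are an error.
try:
    df.set_index(['sku', 'store_name']).unstack('store_name')
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table() aggregates each (sku, store_name) group instead.
# aggfunc='first' keeps the first row of a group and works on strings too.
wide = df.pivot_table(index='sku', columns='store_name',
                      values=['comp_url', 'price', 'ship'], aggfunc='first')

# Put store_name on the top column level, as in the unstack() answer.
wide = wide.swaplevel(axis=1).sort_index(axis=1)
print(wide)
```

Whether keeping the first duplicate is acceptable depends on the data; another aggfunc (or de-duplicating beforehand) may be more appropriate.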


This works perfectly! Care to explain? :) Do unstack/stack only work with unique index combinations? @SPKoder – Kevin


@Kevin - I added some comments. Hope this helps! – SPKoder


@Kevin - If it solved your problem, please mark it as the answer! – SPKoder
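On Kevin's question about stack/unstack: with the unique sample data from the answer, unstack() and stack() act as inverses, so the original long frame can be rebuilt from the wide one. This is only a sketch; the names long_df, wide and back are illustrative, and newer pandas versions (2.1 and later) may emit a FutureWarning about the stack() implementation.

```python
import pandas as pd

cols = ['store_name', 'sku', 'comp', 'price', 'ship', 'comp_url']
records = [['CSA', 'A1025', 'compA', 30.99, 9.99, 'some url'],
           ['CSB', 'A1025', 'compB', 32.99, 9.99, 'some url2'],
           ['CSA', 'A1026', 'compC', 30.99, 19.99, 'some url'],
           ['CSB', 'A1026', 'compD', 30.99, 9.99, 'some url3']]
long_df = pd.DataFrame(records, columns=cols).set_index(['sku', 'store_name'])

# unstack() requires every (sku, store_name) pair to be unique.
assert long_df.index.is_unique

# Move store_name from the row index to the columns...
wide = long_df.unstack('store_name')
# ...and stack() moves it back to the row index.
back = wide.stack('store_name')
print(back)
```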


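Perhaps a pandas.Panel would be more appropriate: Panels are meant for 3-dimensional data, whereas DataFrames are 2D. (Note that pandas.Panel was deprecated in pandas 0.20 and has since been removed; a MultiIndex DataFrame, as in the answer above, is the usual replacement.)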
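As a rough sketch of the same three-dimensional idea without Panel, a DataFrame with a two-level row index (store_name, sku) can hold the data, and DataFrame.xs() slices out a 2-D frame along either level. The variable names cube, csa and a1025 are illustrative only.

```python
import pandas as pd

cols = ['store_name', 'sku', 'comp', 'price', 'ship', 'comp_url']
records = [['CSA', 'A1025', 'compA', 30.99, 9.99, 'some url'],
           ['CSB', 'A1025', 'compB', 32.99, 9.99, 'some url2'],
           ['CSA', 'A1026', 'compC', 30.99, 19.99, 'some url'],
           ['CSB', 'A1026', 'compD', 30.99, 9.99, 'some url3']]
df = pd.DataFrame(records, columns=cols)

# Rows are (store_name, sku) pairs and columns are the fields, so the
# three "axes" are store_name, sku and field.
cube = df.set_index(['store_name', 'sku']).sort_index()

# All skus for one store: a 2-D frame indexed by sku.
csa = cube.xs('CSA', level='store_name')
# One sku across all stores: a 2-D frame indexed by store_name.
a1025 = cube.xs('A1025', level='sku')
print(csa)
print(a1025)
```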