2017-09-22

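pandas DataFrame.append gives error: ValueError: Plan shapes are not aligned

I have two DataFrames with the columns listed below. When I try to append the second one to the first, I get ValueError: Plan shapes are not aligned.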

DF1 columns:

Index([u'asin', u'view_publish_data',
       u'data_viewer', u'relationship_viewer',
       u'parent_task_id', u'submission_id',
       u'source', u'creation_date',
       u'created_by', u'vendor_code',
       u'week', u'processor',
       u'brand_name', u'brand_name_new',
       u'bullet_point', u'cost_price',
       u'country_of_origin', u'cpu_type',
       u'cpu_type_new', u'item_name',
       u'item_type_keyword', u'list_price',
       u'minimum_order_quantity', u'model',
       u'product_category', u'product_site_launch_date',
       u'product_subcategory', u'product_tier_id',
       u'replenishment_category', u'product_description',
       u'style_name', u'vc',
       u'vendor_code', u'warranty_description'],
      dtype='object')

DF2 columns:

Index([u'asin', u'view_publish_data',
       u'data_viewer', u'relationship_viewer',
       u'parent_task_id', u'submission_id',
       u'source', u'creation_date',
       u'created_by', u'vendor_code',
       u'week', u'brand_name',
       u'bullet_features', u'color_name',
       u'itk', u'item_name',
       u'list_price', u'new_brand',
       u'product_catagory', u'product_sub_catagory',
       u'product_tier_id', u'replenishment_category',
       u'size_name', u'cost_price',
       u'item_type_keyword', u'our_price',
       u'is_shipped_from_vendor', u'manufacturer_vendor_code',
       u'product_description', u'vendor_code'],
      dtype='object')
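
How do you append it? – jezrael

They are two different DataFrames with different columns. How would you append them? – Yorian

I want to append based on column names. If a column name is present in both DataFrames, the final DataFrame should hold the values from both in that column. If a column is missing from one of the DataFrames, the final DataFrame should have NaN in that column for the rows coming from that DataFrame. –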

Answer

You can use concat together with align, which returns a tuple of aligned DataFrames:

import pandas as pd

# Column names of the first frame, copied from the question
# (note that 'vendor_code' appears twice).
cols1 = pd.Index(['asin', 'view_publish_data',
                  'data_viewer', 'relationship_viewer',
                  'parent_task_id', 'submission_id',
                  'source', 'creation_date',
                  'created_by', 'vendor_code',
                  'week', 'processor',
                  'brand_name', 'brand_name_new',
                  'bullet_point', 'cost_price',
                  'country_of_origin', 'cpu_type',
                  'cpu_type_new', 'item_name',
                  'item_type_keyword', 'list_price',
                  'minimum_order_quantity', 'model',
                  'product_category', 'product_site_launch_date',
                  'product_subcategory', 'product_tier_id',
                  'replenishment_category', 'product_description',
                  'style_name', 'vc',
                  'vendor_code', 'warranty_description'])

# Column names of the second frame, copied from the question.
cols2 = pd.Index(['asin', 'view_publish_data',
                  'data_viewer', 'relationship_viewer',
                  'parent_task_id', 'submission_id',
                  'source', 'creation_date',
                  'created_by', 'vendor_code',
                  'week', 'brand_name',
                  'bullet_features', 'color_name',
                  'itk', 'item_name',
                  'list_price', 'new_brand',
                  'product_catagory', 'product_sub_catagory',
                  'product_tier_id', 'replenishment_category',
                  'size_name', 'cost_price',
                  'item_type_keyword', 'our_price',
                  'is_shipped_from_vendor', 'manufacturer_vendor_code',
                  'product_description', 'vendor_code'])

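# One-row sample frames whose values are simply the column positions.
df1 = pd.DataFrame([range(len(cols1))], columns=cols1)
df2 = pd.DataFrame([range(len(cols2))], columns=cols2)

# align() returns both frames reindexed to the same outer-joined labels,
# so concat can stack them row by row.
df = pd.concat(list(df1.align(df2)), ignore_index=True)
print(df)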

   asin  brand_name  brand_name_new  bullet_features  bullet_point  \
0     0          12            13.0              NaN          14.0
1     0          11             NaN             12.0           NaN

   color_name  cost_price  country_of_origin  cpu_type  cpu_type_new  ...  \
0         NaN          15               16.0      17.0          18.0  ...
1        13.0          23                NaN       NaN           NaN  ...

   style_name  submission_id    vc  vendor_code  vendor_code  vendor_code  \
0        30.0              5  31.0            9            9           32
1         NaN              5   NaN            9           29            9

   vendor_code  view_publish_data  warranty_description  week
0           32                  1                  33.0    10
1           29                  1                   NaN    10

[2 rows x 46 columns]
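
Note that vendor_code is listed twice in both column lists in the question, which is why it shows up four times in the output above. Those repeated names are a likely cause of the original error, though nobody in the thread confirms it. If one vendor_code column per source column is preferred, a possible alternative is to make the names unique first and then use a plain concat. The following is only a minimal sketch under assumptions: dedupe_columns is a hypothetical helper (not part of pandas), the _1 suffix scheme is arbitrary, and the two small frames are stand-ins for the real data. It also uses concat rather than DataFrame.append, since append was removed in pandas 2.0.

```python
import pandas as pd


def dedupe_columns(columns):
    """Hypothetical helper: suffix repeated names as name_1, name_2, ...

    Assumes no existing column already uses one of the generated names.
    """
    seen = {}
    unique = []
    for name in columns:
        count = seen.get(name, 0)
        unique.append(name if count == 0 else "{}_{}".format(name, count))
        seen[name] = count + 1
    return unique


# Small stand-ins for the real frames; each repeats 'vendor_code',
# as the column lists in the question do.
df1 = pd.DataFrame([[1, 2, 3, 4]],
                   columns=["asin", "vendor_code", "processor", "vendor_code"])
df2 = pd.DataFrame([[5, 6, 7, 8]],
                   columns=["asin", "vendor_code", "color_name", "vendor_code"])

# List the names that occur more than once in the first frame.
dupes = df1.columns[df1.columns.duplicated()].tolist()

# Make the column names unique in both frames.
df1.columns = dedupe_columns(df1.columns)
df2.columns = dedupe_columns(df2.columns)

# With unique names, concat takes the union of the columns and fills
# the cells a frame does not have with NaN.
combined = pd.concat([df1, df2], ignore_index=True, sort=False)
print(combined)
```

Whether the second vendor_code column should be renamed, dropped, or merged into the first depends on what it actually holds in the real data, so the renaming step is the part most likely to need adjusting.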