2016-02-22 17 views
4

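Which data structure to use in Python to replace an Excel 2-dim array of strings/amounts?

I'm using xlwings to replace my VB code, but since I'm not an experienced programmer, I was wondering: which data structure should I use?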

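  1. The data is in an .xls file with 2 columns, in the format below. In VB I load it into a basic two-dimensional array arrCampaignsAmounts(i, j):

    col 1: 'market_channel_campaign_product'; col 2: '2334.43 $'

  2. Then I concatenate words from 4 columns on another sheet into a similar string and store it in another 2-dim array, arrStrings(i, j):

    'Austria_Facebook_Winter_Active vacation'; 'rowNumber'

  3. Finally, I search for each string from the first array in the second array; if it is found, I write the amount into the row given by rowNumber from arrStrings(i, 2).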

Would I use 4 lists for this task?
Two dictionaries?
Something else?

+2

You should take a look at `pandas` and its `DataFrame` structure. It is basically a big array with column names and a row index. – CoMartel

+0

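HarryPotfleur - I'm currently using Python's built-in csv import (reader). I can get it to import the data and print it to the terminal. Do you know what data structure it imports into? Strings, lists? I will check out pandas –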

+0

I think pandas has a method to import from and export to csv files – CoMartel
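
To illustrate the difference raised in these comments: `csv.reader` yields each row as a list of strings, while `pandas.read_csv` returns a `DataFrame` with typed columns. The sketch below uses an in-memory string as a stand-in for a real file; the sample data and the column names `key` and `amount` are made up for illustration.

```python
import csv
import io

import pandas as pd

# Made-up sample standing in for a real CSV file
raw = "key,amount\nAustria_Facebook_Winter_Active vacation,2334.43\n"

# csv.reader: every row is a list of str, numbers included
rows = list(csv.reader(io.StringIO(raw)))
print(rows[1])

# pandas.read_csv: a DataFrame; the numeric column is parsed as float64
df = pd.read_csv(io.StringIO(raw))
print(df.dtypes)
```

With the plain `csv` module, numeric fields have to be converted by hand (for example with `float(...)`), which is one reason pandas tends to be more convenient for this kind of task.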

Answer

1

Definitely use a pandas DataFrame. Here are some references and a few very simple DataFrame examples.

#reference: http://pandas.pydata.org/pandas-docs/stable/10min.html 
#reference: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html. 

import numpy as np 
import pandas as pd 

def df_dupes(df_in):
    '''
    Returns the number of occurrences of each unique row in the dataframe.
    Accepts a DataFrame, or a list/tuple of rows that is converted to one.
    '''
    if isinstance(df_in, (list, tuple)):
        df_in = pd.DataFrame(df_in)
    return df_in.groupby(df_in.columns.tolist(), as_index=False).size()


def df_filter_example():
    '''
    Builds the dataframe below and returns the rows where A == 1 and D == 6.

       A  B  C  D
    0  1  4  9  1
    1  4  5  0  2
    2  5  5  1  0
    3  1  3  9  6
    '''
    df = pd.DataFrame([[1,4,9,1],[4,5,0,2],[5,5,1,0],[1,3,9,6]], columns=['A','B','C','D'])
    # Combine boolean masks with & (and); each condition needs its own parentheses
    return df[(df.A == 1) & (df.D == 6)]

def df_compare(df1, df2, compare_col_list, join_type):
    '''
    df_compare compares 2 dataframes.
    Returns a left, right, inner or outer join.
    df1 is the first/left dataframe
    df2 is the second/right dataframe
    compare_col_list is a list of column names that must match between df1 and df2
    join_type = 'inner', 'left', 'right' or 'outer'
    '''
    return pd.merge(df1, df2, how=join_type, on=compare_col_list)

def df_compare_examples():
    df1 = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]], columns=['c1', 'c2', 'c3'])
    #    c1  c2  c3
    # 0   1   2   3
    # 1   4   5   6
    # 2   7   8   9
    df2 = pd.DataFrame([[4,5,6],[7,8,9],[10,11,12]], columns=['c1', 'c2', 'c3'])
    #    c1  c2  c3
    # 0   4   5   6
    # 1   7   8   9
    # 2  10  11  12
    # One can see that df1 contains 1 row ([1,2,3]) not in df2 and
    # df2 contains 1 row ([10,11,12]) not in df1.

    # Assume c1 is not relevant to the comparison. So, we merge on cols c2 and c3.
    df_merge = pd.merge(df1, df2, how='outer', on=['c2','c3'])
    print(df_merge)
    #    c1_x  c2  c3  c1_y
    # 0   1.0   2   3   NaN
    # 1   4.0   5   6   4.0
    # 2   7.0   8   9   7.0
    # 3   NaN  11  12  10.0
    # Columns c2 and c3 are returned, plus c1_x and c1_y, where c1_x is the
    # value of column c1 in the first dataframe and c1_y is the value of c1
    # in the second dataframe. (The NaN values turn both columns into floats.)
    # As such,
    #   any row where c1_y is NaN is a row from df1 not in df2, and
    #   any row where c1_x is NaN is a row from df2 not in df1.

    # Rows only in df1: left join, then keep rows with no match from df2
    df1_unique = pd.merge(df1, df2, how='left', on=['c2','c3'])
    df1_unique = df1_unique[df1_unique['c1_y'].isnull()]
    print(df1_unique)
    # Rows only in df2: right join, then keep rows with no match from df1
    df2_unique = pd.merge(df1, df2, how='right', on=['c2','c3'])
    df2_unique = df2_unique[df2_unique['c1_x'].isnull()]
    print(df2_unique)
    # Rows present in both dataframes
    df_common = pd.merge(df1, df2, how='inner', on=['c2','c3'])
    print(df_common)


def delete_column_example():
    print('create df')
    df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]], columns=['a','b','c'])
    print(df)
    print('drop (delete/remove) column')
    col_name = 'b'
    df.drop(col_name, axis=1, inplace=True)  # or: df = df.drop(columns=col_name)
    print(df)

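def delete_rows_example():
    print('\n\ncreate df')
    df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]], columns=['col_1','col_2','col_3'])
    print(df)
    print('\n\nappend rows')
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating index 0.
    new_rows = pd.DataFrame([[11,22,33]], columns=['col_1','col_2','col_3'])
    df = pd.concat([df, new_rows], ignore_index=True)
    print(df)
    print('\n\nkeep only rows where col_1 == 4 (all other rows are deleted)')
    df = df[df.col_1 == 4]  # use != instead to delete the matching rows
    print(df)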
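
Applied to the task in the question, a minimal sketch might look like the following. It assumes both sheets have already been loaded into DataFrames (for example via xlwings or `pd.read_excel`). The column names `key`, `amount`, `market`, `channel`, `campaign` and `product`, as well as the sample values, are hypothetical and only serve the illustration.

```python
import pandas as pd

# Sheet 1 (hypothetical sample): campaign key and amount stored as text
campaigns = pd.DataFrame({
    "key": ["Austria_Facebook_Winter_Active vacation",
            "Germany_Google_Summer_City break"],
    "amount": ["2334.43 $", "120.00 $"],
})

# Sheet 2 (hypothetical sample): the four columns that get concatenated
targets = pd.DataFrame({
    "market": ["Austria", "France"],
    "channel": ["Facebook", "Twitter"],
    "campaign": ["Winter", "Spring"],
    "product": ["Active vacation", "Wine tour"],
})

# Turn text such as '2334.43 $' into the float 2334.43
campaigns["amount"] = (
    campaigns["amount"].str.replace("$", "", regex=False).str.strip().astype(float)
)

# Build the same 'market_channel_campaign_product' key on the second sheet
targets["key"] = (
    targets["market"] + "_" + targets["channel"] + "_"
    + targets["campaign"] + "_" + targets["product"]
)

# A left join keeps every row of the second sheet in its original order;
# rows without a matching key get NaN in the amount column
result = targets.merge(campaigns, on="key", how="left")
print(result[["key", "amount"]])
```

Here the row index of `result` plays the role of `rowNumber`, so no second array is needed. The sketch assumes each key occurs only once in the first sheet; duplicate keys would produce duplicate rows in the join. If only the lookup is needed, a plain dict mapping each key to its amount would also work.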