2016-09-08
1

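I have this simple clean_data function that rounds the numbers in an input dataframe. The code works, but I'm confused about why it works. Can someone help me understand? How does Python pandas handle a list of tables?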

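This is the part that confuses me. table_list is a new list of dataframes, so after running the code I expected every item in table_list to be formatted while tablea, tableb and tablec stayed unchanged. But apparently I'm wrong: after running the code, all three tables are formatted correctly. What is going on? Thanks a lot for your help.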

table_list = [tablea, tableb, tablec] 

def clean_data(df):

    for i in df:
        df[i] = df[i].map(lambda x: round(x, 4))

    return df

map(clean_data, table_list) 

Answers

0

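In Python, a list of dataframes (or of any other objects) is just a list of references to the underlying objects; putting a dataframe into a list does not copy it. For example, the first element of table_list is a reference to tablea. So clean_data follows the reference given by table_list[0] straight to the dataframe itself, i.e. tablea, and modifies it in place.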
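To make the reference behaviour concrete, here is a minimal sketch. The sample data is hypothetical (the question does not show what tablea contains), and the .copy() call at the end is just one possible way to keep the original untouched, not something the asker's code does.

```python
import pandas as pd

# Hypothetical sample data; the real tablea is not shown in the question.
tablea = pd.DataFrame({"x": [1.123456, 2.654321]})
table_list = [tablea]

def clean_data(df):
    # Assigning to df[col] modifies the dataframe object that df refers to.
    for col in df:
        df[col] = df[col].map(lambda x: round(x, 4))
    return df

# The list element and the variable are the same object, not copies.
same_object = table_list[0] is tablea

# Cleaning through the list therefore changes tablea itself.
clean_data(table_list[0])
rounded_in_place = tablea["x"].tolist()

# To leave an original untouched, pass an explicit copy instead.
tableb = pd.DataFrame({"x": [3.141592]})  # hypothetical data again
cleaned_copy = clean_data(tableb.copy())
original_after_copy = tableb["x"].tolist()
```

Here same_object is True, tablea ends up rounded even though only table_list[0] was passed to the function, and tableb keeps its original values because only its copy was modified.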

0

The simplest approach is to break this code down completely:

# List of 3 dataframes 
table_list = [tablea, tableb, tablec] 

# function that cleans 1 dataframe 
# This will get applied to each dataframe in table_list 
# when the python function map is used AFTER this function 
def clean_data(df): 

    # for loop.
    # i iterates through the column names,
    # so df[i] is a different column of df on each iteration.
    for i in df:
        # df[i] = ... overwrites column i
        # df[i].map(lambda x: round(x, 4)) in this case
        # does the same thing as df[i].apply(lambda x: round(x, 4))
        # in other words, it rounds each element of the column
        # and assigns the reformatted column back to the column
        df[i] = df[i].map(lambda x: round(x, 4))

    # returns the formatted SINGLE dataframe 
    return df 

# I expect this is where the confusion comes from.
# This map is a built-in Python (not pandas) function that applies the
# function clean_data to each item in table_list
# and returns the results (a list in Python 2, a lazy iterator in Python 3).
# map was also used inside clean_data above. That map was
# a pandas Series method, not the same function as this map. They do similar
# things, but not exactly the same.
map(clean_data, table_list) 
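One caveat about the built-in map: the "returns a list" behaviour described above is Python 2. In Python 3, map returns a lazy iterator, so nothing is called until the iterator is consumed. A small sketch of that, where record is a hypothetical stand-in for clean_data that only logs what it receives:

```python
calls = []

def record(item):
    # Hypothetical stand-in for clean_data: it just logs what it was given.
    calls.append(item)
    return item

# In Python 3 this only builds an iterator; record has not been called yet.
lazy = map(record, ["tablea", "tableb", "tablec"])
calls_before = list(calls)

# Consuming the iterator is what actually triggers the calls.
results = list(lazy)
calls_after = list(calls)
```

So on Python 3 you would likely write list(map(clean_data, table_list)), or just a plain for loop, which tends to read more clearly when the function is called for its side effects.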

Hope that helps.

+2

Worth mentioning that map(lambda df: numpy.round(df, 4), table_list) (probably) does the same thing as the whole script, doesn't it? –
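A quick sketch of that suggestion, with hypothetical sample data. As far as I can tell it yields the same rounded values, but with one difference from the original script: numpy.round returns new dataframes and leaves the originals as they were, so you have to keep the returned results.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real tables are not shown in the question.
tablea = pd.DataFrame({"x": [1.123456, 2.654321]})
table_list = [tablea]

# Build a new list of rounded dataframes instead of modifying in place.
rounded_list = [np.round(df, 4) for df in table_list]

rounded_values = rounded_list[0]["x"].tolist()
original_values = tablea["x"].tolist()  # still the unrounded numbers
```

df.round(4) is the equivalent pandas method if you would rather not go through numpy.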

+0

This is just an example. The real code has a lot of other details. Thanks for your help. – qqzj