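KeyError after converting object to string in pandas?
I have three CSV files that we can call a, b, and c. File a has geographic information, including zip codes. File b has statistical data. File c has only zip codes.
I used pandas to convert a and b into dataframes (a_df and b_df), which I then joined on the columns those two dataframes share (intermediate_df). File c was read and converted to a dataframe that had zipcode as an integer type. I had to convert it to a string so that zipcode is not treated as an integer. However, c_df treats the column as object after I convert it to string, which means I cannot join c_df and intermediate_df to create final_df.
To illustrate what I mean: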
import pandas as pd

a_data = pd.read_csv("a.csv")
b_data = pd.read_csv("b.csv", dtype={'zipcode': 'str'})
a_df = pd.DataFrame(a_data)
b_df = pd.DataFrame(b_data)
# file c conversion
c_data = pd.read_csv("slcsp.csv", dtype={'zipcode': 'str'})
print ("This is c data types: ", c_data.dtypes)
c_conversion = c_data['zipcode'].apply(str)
print ("This is c_conversion data types: ", c_conversion.dtypes)
c_df = pd.DataFrame(c_conversion)
print ("This is c_df data types: ", c_df.dtypes)
# Joining on the two common columns to avoid duplicates
joined_ab_df = pd.merge(a_df, b_df, on=['state', 'area'])
# Dropping columns that are not needed anymore
ab_for_analysis_df = joined_ab_df.drop(['county_code', 'name', 'area'], axis=1)
# Time to analyze this dataframe. Let's pick out only the silver values for
# a specific attribute
silver_only_df = ab_for_analysis_df[ab_for_analysis_df.metal_name == 'Silver']
# Getting second lowest value of silver only
sorted_silver = silver_only_df.groupby('zipcode')['rate'].nsmallest(2)
sorted_silver_df = sorted_silver.to_frame()
print ("We cleaned up our data. Let's join the dataframes.")
print ("Final result...")
print (c_df.dtypes)
final_df = pd.merge(sorted_silver_df,c_df, on ='zipcode')
This is what we get after running it:
This is c_data types: zipcode object
rate float64
dtype: object
This is c_conversion_data types: object
This is c_df data types: zipcode object
dtype: object
zipcode object
dtype: object
We cleaned up our data. Let's join the dataframes.
This is the final result...
KeyError: 'zipcode'
Any ideas why the data type changed, and how I can fix it so that everything joins at the end?
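On the data type part of the question, here is a minimal sketch with made-up zip codes (the inline CSV text is a hypothetical stand-in for file c, not the real data). It suggests that `object` is simply how pandas commonly reports a column of Python strings, so the dtype by itself may not be what breaks the join:

```python
import io
import pandas as pd

# Hypothetical stand-in for file c; these zip codes are invented.
csv_text = "zipcode,rate\n01234,\n64148,\n"

# Default parsing infers integers, which drops the leading zero.
as_int = pd.read_csv(io.StringIO(csv_text))

# Requesting strings keeps the text exactly as written in the file.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"zipcode": str})

print(as_int["zipcode"].dtype)     # an integer dtype
print(as_str["zipcode"].dtype)     # often "object"; newer pandas may show a string dtype
print(as_str["zipcode"].tolist())  # the values are Python strings
```

If that holds for the real file, `dtype={'zipcode': 'str'}` in `read_csv` already does the conversion, and the later `.apply(str)` step is probably redundant.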
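You could add `print(c_df.columns)` and `print(sorted_silver_df.columns)` – Dark
So the second-to-last line, `print(c_df.dtypes)`, doesn't print either? That is odd. I suggest using ipython/jupyter and the `%debug` magic function, so you can step through errors like this. –
This is a strange problem, @AndyHayden. Printing c_df.dtypes works, although it gives odd results. –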
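Following up on the suggestion to print the columns, here is a minimal sketch with made-up rates (only the column names mirror the question). It shows one plausible cause: after `groupby('zipcode')['rate'].nsmallest(2)`, `zipcode` ends up as an index level rather than a column, so a merge that looks for a `zipcode` column may not find it. Newer pandas versions may accept index level names in `on`, so the error might not reproduce everywhere. Moving the level back into a column with `reset_index` makes the key explicit either way:

```python
import pandas as pd

# Made-up data; only the column names follow the question.
silver_only_df = pd.DataFrame({
    "zipcode": ["01234", "01234", "01234", "64148", "64148"],
    "rate": [250.0, 245.5, 260.0, 300.0, 310.0],
})
c_df = pd.DataFrame({"zipcode": ["01234", "64148"]})

# Same steps as in the question: the two smallest rates per zipcode.
sorted_silver = silver_only_df.groupby("zipcode")["rate"].nsmallest(2)
sorted_silver_df = sorted_silver.to_frame()

print(sorted_silver_df.columns.tolist())  # zipcode is not among the columns
print(sorted_silver_df.index.names)       # zipcode is an index level instead

# Turn the zipcode index level back into an ordinary column, then merge.
flat_df = sorted_silver_df.reset_index(level="zipcode")
final_df = pd.merge(flat_df, c_df, on="zipcode")
print(final_df)
```

Under these assumptions, the `object` dtype is not the culprit. The missing key is a matter of where `zipcode` lives (index versus column) when the merge runs.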