I currently have the following situation. How can I compare one value against multiple values in Python using the Pandas library?
Excel Data Frame =
________
|sector|
--------
AXYZ
BYYT
IOPL
XFTR
MANU

SQL Data Frame =
________ _______ ___________ ___________
|sector| | hour| | value_cs| | value_ps|
-------- ------- ----------- -----------
AXYZ     0       78.90       87.10
RACH     0       87.12       13.90
IOPL     0       93.10       13.87
AXYZ     1       27.90       12.87
IOPL     1       23.09       90.09
FRES     2       34.09       12.34
YYYT     2       12.43       32.98
REWT     3       98.09       99.99
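I have an Excel file and a set of SQL results. I want to compare each value of the sector column from the Excel file against all values of the sector column from the SQL results. Where a value matches between the two columns, the hour, value_cs and value_ps columns from the SQL results should be added to a new data frame. Note: the SQL results and the Excel data do not have the same number of rows.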
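For illustration, the matching step described above could be sketched with `Series.isin`, which tests every SQL sector against the whole Excel column in one vectorized call. The frame names `excel_df`, `sql_df` and `matched` are placeholders chosen here, and the data is copied from the sample tables above rather than read from the real Excel file or database.

```python
import pandas as pd

# Sample data copied from the tables above (stand-ins for the real sources)
excel_df = pd.DataFrame({"sector": ["AXYZ", "BYYT", "IOPL", "XFTR", "MANU"]})
sql_df = pd.DataFrame({
    "sector": ["AXYZ", "RACH", "IOPL", "AXYZ", "IOPL", "FRES", "YYYT", "REWT"],
    "hour": [0, 0, 0, 1, 1, 2, 2, 3],
    "value_cs": [78.90, 87.12, 93.10, 27.90, 23.09, 34.09, 12.43, 98.09],
    "value_ps": [87.10, 13.90, 13.87, 12.87, 90.09, 12.34, 32.98, 99.99],
})

# Boolean mask: True where the SQL sector also appears in the Excel column
mask = sql_df["sector"].isin(excel_df["sector"])

# Keep only the matching SQL rows and the columns of interest
matched = sql_df.loc[mask, ["sector", "hour", "value_cs", "value_ps"]].reset_index(drop=True)
print(matched)
```

With the sample data, only the AXYZ and IOPL rows remain, because the other Excel sectors (BYYT, XFTR, MANU) never occur in the SQL result.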
Expected result
New data frame 1 for value cs
________ ____ ___ ___ ___ ___ ___ ___ ____
|sector| |0| |1| |2| |3| |4| |5| |6| .... |23|
-------- ---- --- ---- --- --- --- ---- ----
AXYZ 78.90 27.90 78.89 54.90 98.23 85.0 45.90 68.23
BYYT 18.94 67.10 65.69 76.32 76.56 56.03 56.23 87.65
IOPL 93.10 23.09 34.29 97.34 34.34 14.54 34.91 23.21
... ...
New data frame 2 for value ps
________ ____ ___ ___ ___ ___ ___ ___ ____
|sector| |0| |1| |2| |3| |4| |5| |6| .... |23|
-------- ---- --- ---- --- --- --- ---- ----
AXYZ 87.10 12.87 49.89 84.90 76.23 15.01 12.90 68.23
BYYT 28.43 27.11 54.69 57.12 19.56 45.12 45.23 47.15
IOPL 13.87 90.09 24.19 47.34 18.34 21.54 67.11 13.61
... ...
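The wide layout shown in these two tables (one row per sector, one column per hour from 0 to 23) could be produced with `DataFrame.pivot`. The following is only a sketch: the helper name `to_wide` and the frame `matched` are made up for illustration, the data is the subset of the sample SQL rows whose sector also appears in the Excel column, and it assumes each (sector, hour) pair occurs at most once. If duplicates are possible, `pivot_table` with an aggregation function would be needed instead.

```python
import pandas as pd

# Rows of the sample SQL result whose sector also appears in the Excel file
matched = pd.DataFrame({
    "sector": ["AXYZ", "IOPL", "AXYZ", "IOPL"],
    "hour": [0, 0, 1, 1],
    "value_cs": [78.90, 93.10, 27.90, 23.09],
    "value_ps": [87.10, 13.87, 12.87, 90.09],
})

def to_wide(df, value_col):
    """Hypothetical helper: one row per sector, one column per hour 0..23."""
    # pivot raises if a (sector, hour) pair is duplicated
    wide = df.pivot(index="sector", columns="hour", values=value_col)
    # Make sure all 24 hour columns exist; hours without data become NaN
    return wide.reindex(columns=list(range(24)))

df_cs = to_wide(matched, "value_cs")  # new data frame 1
df_ps = to_wide(matched, "value_ps")  # new data frame 2
print(df_cs.iloc[:, :4])
```

Hours that are missing for a sector are left as NaN, so the result always has 24 hour columns regardless of which hours the query returned.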
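My approach so far is to load the SQL result into a data frame, along with the data from the Excel file, but I don't know how to perform the comparison without a for loop, using only Pandas (a for loop would take too long to run).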
import pandas as pd
import pypyodbc
from datetime import date


def get_and_compare():
    start_date = date.today()
    retrieve_values = "[DEV].[CS].[QA_Export] @start_date='{start_date:%Y-%m-%d}'".format(start_date=start_date)

    # Connect to the database
    db_connection = pypyodbc.connect(driver="{SQL Server}", server="xxx.xxx.xxx.xxx", uid="xxx",
                                     pwd="xxx", Trusted_Connection="No")

    # Get the SQL result into a data frame
    data_frame_sql = pd.read_sql(retrieve_values, db_connection)

    # Declare the new data frames
    new_df_one = pd.DataFrame(columns=['sector', 'value cs', 'hour 0', 'hour 1', 'hour 2', 'hour 3', 'hour 4',
                                       'hour 5', 'hour 6', 'hour 7', 'hour 8', 'hour 9', 'hour 10', 'hour 11',
                                       'hour 12', 'hour 13', 'hour 14', 'hour 15', 'hour 16', 'hour 17', 'hour 18',
                                       'hour 19', 'hour 20', 'hour 21', 'hour 22', 'hour 23'])
    new_df_two = pd.DataFrame(columns=['sector', 'value ps', 'hour 0', 'hour 1', 'hour 2', 'hour 3', 'hour 4',
                                       'hour 5', 'hour 6', 'hour 7', 'hour 8', 'hour 9', 'hour 10', 'hour 11',
                                       'hour 12', 'hour 13', 'hour 14', 'hour 15', 'hour 16', 'hour 17', 'hour 18',
                                       'hour 19', 'hour 20', 'hour 21', 'hour 22', 'hour 23'])

    # Read the Excel file
    current_wb = pd.ExcelFile("C:\\U\\dev\\testing\\Main values to compare.xlsx")

    # Get the specific sheet to compare
    working_values = current_wb.parse("Main values")

    # Get the sector column from Excel
    sector_from_excel = working_values['sector']

    # Comparison to perform
    # .... unknown part
Any suggestions or comments that help me complete this part of the code would be greatly appreciated.