I have a PySpark script, shown below. I want to measure, in Python, how long each line of code takes to run.
#!/usr/bin/env python
from datetime import datetime
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext
conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)
hivedb='MySql'
table='abc_123'
df = sqlContext.table("{}.{}".format(hivedb,table))
# Register the Data Frame as a TempTable
df.registerTempTable('mytempTable')
#Time:
date=datetime.now().strftime('%Y-%m-%d %H:%M:%S')
#Find min value ID:
min_id = sqlContext.sql("select nvl(min(id),0) as minval from mytempTable").collect()[0].asDict()['minval']
sc.stop()
Now I want to find out the time taken by each line of code separately, something like the following:
df = sqlContext.table("{}.{}".format(hivedb,table))
Time taken for `df` to create was 10 seconds
date=datetime.now().strftime('%Y-%m-%d %H:%M:%S')
Time taken for finding `date` was 1 second
min_id = sqlContext.sql("select nvl(min(id),0) as minval from mytempTable").collect()[0].asDict()['minval']
Time taken for `min_id` query to execute was 3 seconds
How can I do this?
If possible, I would also like to print these values.
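Record the time before calling the function, record it again after the call, then subtract the two and display the difference... https://docs.python.org/3/library/time.html#time.time might be useful. – MooingRawr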
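A minimal sketch of the "record, run, subtract" idea, wrapped in a context manager so each statement can be timed with one extra line. The helper name `timed` and the `timings` dict are hypothetical (they are not part of PySpark or the original script), and `time.sleep` stands in for a slow call so the sketch runs without a Spark cluster. It uses `time.perf_counter()` rather than `time.time()` because the former is intended for measuring intervals; `time.time()` would work the same way here.

```python
import time
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def timed(label, results=None):
    """Hypothetical helper: print how long the enclosed statement(s) took."""
    start = time.perf_counter()          # time before the statement runs
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start   # time after, minus time before
        if results is not None:
            results[label] = elapsed     # keep the value for later reporting
        print("Time taken for `{}` was {:.3f} seconds".format(label, elapsed))


timings = {}

with timed("date", timings):
    date = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

with timed("sleep", timings):
    time.sleep(0.05)  # stand-in for a slow call such as a Spark query
```

In the script from the question, the same pattern would wrap each statement, for example `with timed("df"): df = sqlContext.table(...)`. One caveat: Spark evaluates lazily, so creating `df` will likely appear very fast, and most of the measured time will show up on the line that calls `.collect()`, since that is where the query actually executes.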