2017-06-26 17 views

Find the time taken by each line of code separately in Python

I have a pyspark script like the one below.

#!/usr/bin/env python 

from datetime import datetime 
from pyspark import SparkContext, SparkConf 
from pyspark.sql import HiveContext 

conf = SparkConf() 
sc = SparkContext(conf=conf) 
sqlContext = HiveContext(sc) 

hivedb='MySql' 
table='abc_123' 

df = sqlContext.table("{}.{}".format(hivedb,table)) 

# Register the Data Frame as a TempTable 
df.registerTempTable('mytempTable') 

#Time: 
date=datetime.now().strftime('%Y-%m-%d %H:%M:%S') 

#Find min value ID: 
min_id = sqlContext.sql("select nvl(min(id),0) as minval from mytempTable").collect()[0].asDict()['minval'] 

sc.stop() 

Now I want to find out how long each line of code takes, separately. Something like this:

df = sqlContext.table("{}.{}".format(hivedb,table)) 

Time taken for `df` to create was 10 seconds 

date=datetime.now().strftime('%Y-%m-%d %H:%M:%S') 

Time taken for finding `date` was 1 second 

min_id = sqlContext.sql("select nvl(min(id),0) as minval from mytempTable").collect()[0].asDict()['minval'] 

Time taken for `min_id` query to execute was 3 seconds 

How can I achieve this?

If possible, I would also like to print these values.


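Record the time before calling the function, record it again after the call, then subtract the two and display the difference. https://docs.python.org/3/library/time.html#time.time might be useful. – MooingRawr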
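The approach in the comment can be sketched as a small context manager that records the time before and after a block and prints the difference. This is only an illustrative sketch: the names `timed` and `timings` are hypothetical helpers, not part of pyspark, and `time.sleep` stands in for the real Spark calls so the example runs without a cluster. It uses `time.perf_counter` (Python 3.3+); on Python 2, `time.time` from the link above would be the substitute.

```python
import time
from contextlib import contextmanager
from datetime import datetime

# Hypothetical store for the measured durations, keyed by label.
timings = {}

@contextmanager
def timed(label):
    """Measure the wall-clock time of the enclosed block and print it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = elapsed
        print("Time taken for `{}` was {:.3f} seconds".format(label, elapsed))

# Stand-in for a slow call such as sqlContext.sql(...).collect().
with timed("slow_step"):
    time.sleep(0.05)

with timed("date"):
    date = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
```

In the original script, each statement of interest would be wrapped the same way, for example `with timed("min_id"): min_id = sqlContext.sql(...).collect()[0].asDict()['minval']`. Note that Spark DataFrames are evaluated lazily, so a line like `df = sqlContext.table(...)` will probably measure as very fast, and most of the cost will show up on the line that calls an action such as `collect()`.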

Answers


You can use the built-in cProfile. If you want to visualize the results, you can use Snakeviz.

TL;DR: run the script with `python -m cProfile [-o output_file] [-s sort_order] myscript.py`, then download Snakeviz and run `snakeviz output_file`.
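cProfile can also be driven from inside a script rather than from the command line. The sketch below is an assumption-laden illustration, not the answerer's code: `work` is a hypothetical placeholder for the code being measured, and the report is captured in a string so it can be printed or logged. Keep in mind that cProfile reports time per function call, not per source line, so it answers a slightly different question from the one asked.

```python
import cProfile
import io
import pstats

def work():
    """Hypothetical placeholder for the code you want to profile."""
    return sum(i * i for i in range(100000))

# Profile only the section between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
result = work()
profiler.disable()

# Render the ten most expensive entries, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
report = stream.getvalue()
print(report)
```

Calling `profiler.dump_stats("output_file")` instead of printing would write the same data to a file that `snakeviz output_file` can open.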
