
Computing skewness: I have a problem with the following code, which uses spark.sql and Cloudant:

def skewTemperature(cloudantdata, spark):
    # Third standardized moment: the mean and standard deviation are
    # computed by separate helpers and substituted into the query string.
    return spark.sql(
        """SELECT (1/count(temperature)) * (sum(POW(temperature-%s,3))/pow(%s,3)) AS skew FROM washing"""
        % (meanTemperature(cloudantdata, spark), sdTemperature(cloudantdata, spark))
    ).first().skew
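
For reference, the query evaluates the usual population skewness estimator, skew = (1/n) · Σ (xᵢ − μ)³ / σ³, where n = count(temperature) and μ and σ are the values returned by meanTemperature and sdTemperature.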

meanTemperature and sdTemperature both work fine, but with the query above I get the following error:

Py4JJavaError: An error occurred while calling o2849.collectToPython. 
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 315.0 failed 10 times, most recent failure: Lost task 3.9 in stage 315.0 (TID 1532, yp-spark-dal09-env5-0045): java.lang.RuntimeException: Database washing request error: {"error":"too_many_requests","reason":"You've exceeded your current limit of 5 requests per second for query class. Please try later.","class":"query","rate":5 

Does anyone know how to fix this?


Could you clarify? The question is not clear. – Kondal

Answer


The error indicates that you are exceeding the Cloudant API call threshold for the query class, which appears to be 5 requests per second for the service plan you are using. One potential fix is to limit the number of partitions by setting the jsonstore.rdd.partitions configuration property, as shown in the following Spark 2 example:

from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("Cloudant Spark SQL Example in Python using dataframes")\
    .config("cloudant.host", "ACCOUNT.cloudant.com")\
    .config("cloudant.username", "USERNAME")\
    .config("cloudant.password", "PASSWORD")\
    .config("jsonstore.rdd.partitions", 5)\
    .getOrCreate()

Start with 5 and work your way down to 1 if the error persists. This setting essentially limits how many concurrent requests are sent to Cloudant. If a setting of 1 does not resolve the problem, you may need to consider upgrading to a service plan with a higher threshold.
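
A minimal sketch of that tuning loop, assuming skewTemperature from the question and a hypothetical load_washing_data helper that re-reads the Cloudant database and registers the washing temp view (both have to be redone whenever the session is rebuilt):

from pyspark.sql import SparkSession
from py4j.protocol import Py4JJavaError

def build_session(partitions):
    # Build a fresh session so the new partition cap actually takes effect;
    # ACCOUNT/USERNAME/PASSWORD are placeholders, as above.
    return SparkSession.builder\
        .appName("Cloudant skew with throttled partitions")\
        .config("cloudant.host", "ACCOUNT.cloudant.com")\
        .config("cloudant.username", "USERNAME")\
        .config("cloudant.password", "PASSWORD")\
        .config("jsonstore.rdd.partitions", partitions)\
        .getOrCreate()

for partitions in range(5, 0, -1):
    spark = build_session(partitions)
    # load_washing_data is a hypothetical helper that reloads the Cloudant
    # "washing" database and registers the temp view used by skewTemperature.
    cloudantdata = load_washing_data(spark)
    try:
        print(skewTemperature(cloudantdata, spark))
        break  # this partition count stayed under the 5 req/s limit
    except Py4JJavaError as err:
        if "too_many_requests" not in str(err):
            raise  # unrelated failure; surface it
        spark.stop()  # tear down before retrying with fewer partitions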