2017-07-26

PySpark type conversion problem: from date to string

I am using PySpark 2.1. Below is the content of my DataFrame:

expecteddays,date 

139,30.JUl.2017 

134,01.NOV.2018 

My output should look like this:

138,30.JUL.2017,<30/SEP/2018,4/FEB/2019> 

The last column is populated by my functions dateRangebetween and get_date, shown below.

Here is my code:

from datetime import datetime, timedelta

import pandas as pd
from pyspark import SparkContext
from pyspark.sql import SparkSession, types
from pyspark.sql.functions import concat, explode, udf
from pyspark.sql.types import ArrayType, StructType, StructField, IntegerType, StringType

maintenance_final_join = spark.read.csv('/user/NaveenSri/adh_dev_engg/test.csv', header=True)

def get_date(dateFormat="%d-%m-%Y", addDays=0, timeNow=0):
    #print('inside get date',timesNow)
    if (addDays != 0):
        anotherTime = timeNow + timedelta(days=addDays)
    else:
        anotherTime = timeNow
    return anotherTime.strftime(dateFormat)

def dateRangebetween(expectedDate, estimatedDays):
    output_format = '%d-%m-%Y'

    dateRangeList = []
    j = 2
    #print('inside Date range',expectedDate)
    rangeEnddate = datetime.strptime(get_date(output_format, 730, expectedDate), '%d-%m-%Y').date()
    #print('rangeEnddate---',rangeEnddate)
    calculatedDate = datetime.strptime(get_date(output_format, estimatedDays, expectedDate), '%d-%m-%Y').date()
    #print('calculatedDate----',calculatedDate)

    while (calculatedDate <= rangeEnddate):
        # print(calculatedDate)
        #print (estimatedDays)
        dateRangeList.append(calculatedDate)
        calculatedDate = datetime.strptime(get_date(output_format, estimatedDays, calculatedDate), '%d-%m-%Y').date()

    #print('-----', datetime.strptime(get_date(output_format,estimatedDays ,calculatedDate), '%d-%m-%Y').date())
    return dateRangeList

dateRange = udf(dateRangebetween, types.ArrayType(types.StringType())) 
addDays=182 
result = maintenance_final_join.withColumn('Part_Dates',dateRange(maintenance_final_join.Expected,maintenance_final_join.estimateddays)).show() 

When I run it, I get this error:

TypeError: coercing to Unicode: need string or buffer, datetime.timedelta found 

Answer

First of all, could you please fix your indentation? Your dateRangebetween() function is hard to read correctly as posted.

That said, your problem is here:

dateRangeList.append(calculatedDate) 
calculatedDate = datetime.strptime(get_date(output_format,estimatedDays, 
     calculatedDate), '%d-%m-%Y').date() 

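Your calculatedDate is a date object (the result of calling .date() on a datetime). You append this object, not its string representation, to dateRangeList and return it. Then, in your main program, you build a udf whose return type is declared as an array of strings, while the function actually returns a list of date objects.

I assume your intent is to use the string representation. If you change the append to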
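A quick way to see this outside Spark is to repeat the same parse-then-.date() round trip in plain Python. This is only an illustrative sketch; the literal date and the names `calculated`, `kind` and `as_text` are made up for the example and are not part of the question's code.

```python
from datetime import datetime

# Same round trip as in the question: parse a formatted string, then take .date().
calculated = datetime.strptime("30-07-2017", "%d-%m-%Y").date()

kind = type(calculated).__name__           # class name of what ends up in the list
as_text = calculated.strftime("%d-%m-%Y")  # explicit string representation

print(kind, as_text)
```

The value stored in the list is a date, and only the strftime() result is a string.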

dateRangeList.append(calculatedDate.strftime("......")) 

and insert the correct format string in place of the dots, you will at least be dealing with string objects instead of date objects.
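As a rough sketch of how the whole function might look with that change, here is a plain-Python version that returns strings. Several things in it are assumptions rather than facts from the question: the input format `%d.%b.%Y` is guessed from the sample rows, the output format `%d/%b/%Y` is guessed from the expected output, and the name `date_range_between` is hypothetical. It also parses both arguments from strings, because `spark.read.csv` is called without a schema in the question, so the columns would arrive as strings; the error text (a string being combined with a `timedelta`) is consistent with that reading, though that is an inference.

```python
from datetime import datetime, timedelta

INPUT_FORMAT = "%d.%b.%Y"   # assumed from the sample rows, e.g. 30.JUl.2017
OUTPUT_FORMAT = "%d/%b/%Y"  # assumed from the expected output, e.g. 30/SEP/2018
HORIZON_DAYS = 730          # same 730-day window the question uses


def date_range_between(expected_date, estimated_days):
    """Return formatted date strings, stepping by estimated_days up to the horizon."""
    # Both values are parsed from strings, since no schema is given to read.csv.
    start = datetime.strptime(expected_date.strip(), INPUT_FORMAT).date()
    step = timedelta(days=int(estimated_days))
    if step <= timedelta(0):
        return []  # a non-positive step would never reach the end of the range

    range_end = start + timedelta(days=HORIZON_DAYS)
    result = []
    current = start + step
    while current <= range_end:
        # Append the string form, not the date object.
        # Note: %b is locale dependent; English month names are assumed here.
        result.append(current.strftime(OUTPUT_FORMAT).upper())
        current += step
    return result


dates = date_range_between("30.JUl.2017", "139")
```

Wrapped as `udf(date_range_between, types.ArrayType(types.StringType()))`, as in the question, the returned values would then match the declared return type. The dates this sketch produces follow its own stepping rule and are not guaranteed to match the sample output in the question.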

Thanks a lot, Hannu, that worked. Thank you for the suggestion.