Spark SQL: get the month from week number and year

I have a DataFrame with "Week" and "Year" columns and need to compute the month, as follows:
Input:

+----+----+
|Week|Year|
+----+----+
|  50|2012|
|  50|2012|
|  50|2012|
+----+----+
Expected output:

+----+----+-----+
|Week|Year|Month|
+----+----+-----+
|  50|2012|   12|
|  50|2012|   12|
|  50|2012|   12|
+----+----+-----+
Any help would be appreciated. Thanks.
Thanks to @zero323, who pointed me to the sqlContext.sql query; I converted the query as shown below:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import static org.apache.spark.sql.functions.*;

public class MonthFromWeekSparkSQL {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("MonthFromWeekSparkSQL").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        List<Row> myList = Arrays.asList(RowFactory.create(50, 2012), RowFactory.create(50, 2012),
                RowFactory.create(50, 2012));
        JavaRDD<Row> myRDD = sc.parallelize(myList);

        // Create StructFields
        List<StructField> structFields = new ArrayList<StructField>();
        StructField structField1 = DataTypes.createStructField("week", DataTypes.IntegerType, true);
        StructField structField2 = DataTypes.createStructField("year", DataTypes.IntegerType, true);
        // Add the StructFields to the list
        structFields.add(structField1);
        structFields.add(structField2);

        // Create a StructType from the StructFields; this is used to build the DataFrame
        StructType schema = DataTypes.createStructType(structFields);
        DataFrame df = sqlContext.createDataFrame(myRDD, schema);

        DataFrame df2 = df.withColumn("yearAndWeek", concat(col("year"), lit(" "), col("week")))
                .withColumn("month", month(unix_timestamp(col("yearAndWeek"), "yyyy w").cast("timestamp")))
                .drop("yearAndWeek");
        df2.show();
    }
}
You basically create a new column with year and week formatted as "yyyy w", then use unix_timestamp to convert it to a timestamp, from which the month can be extracted.
PS: It seems the cast behavior is incorrect in Spark 1.5; in that case it is more general to do .cast("double").cast("timestamp").
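Outside Spark, the same year-plus-week derivation can be sketched in plain Java with java.time. This is an illustrative sketch, not part of the original answer: the class and method names are hypothetical, and it assumes US-locale week numbering (first day Sunday, minimal days in first week 1), which is what the "w" pattern typically resolves to on an en_US JVM. Locales using ISO weeks can yield a different month for boundary weeks.

```java
import java.time.LocalDate;
import java.time.temporal.WeekFields;
import java.util.Locale;

public class WeekToMonth {

    // Hypothetical helper: derive the month from a year and a week-of-year number,
    // mirroring what the "yyyy w" conversion does in the Spark answer above.
    public static int monthFromWeek(int year, int week) {
        WeekFields wf = WeekFields.of(Locale.US);
        // Start at Jan 1, move to the requested week (keeping the day-of-week),
        // then snap to the first day of that week.
        LocalDate date = LocalDate.of(year, 1, 1)
                .with(wf.weekOfYear(), week)
                .with(wf.dayOfWeek(), 1);
        return date.getMonthValue();
    }

    public static void main(String[] args) {
        System.out.println(monthFromWeek(2012, 50)); // week 50 of 2012 falls in December
        System.out.println(monthFromWeek(2012, 5));  // week 5 straddles January/February;
                                                     // its first day decides the month
    }
}
```

Note that a week straddling a month boundary is resolved to the month containing the first day of that week, which is exactly the ambiguity raised in the comments below.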
In my case it just adds a time without changing the month and year. Please take a look at the gist https://gist.github.com/nareshbab/7d945ccaaae07ca743dec0ea07bb50c0 – nareshbabral
You didn't copy the code correctly, so please check your code! – eliasah
Thanks, it works now. – nareshbabral
What about weeks that span two months? Isn't a week too weak a variable to derive the month from? –