
Spark Scala CSV column names to lower case

Please find the code below and let me know how to change the column names to lower case. I tried using withColumnRenamed, but then I would have to call it for each column and type out every column name. I just want to apply this across all the columns, so I don't want to list each column name, because there are too many of them.

Scala version: 2.11, Spark: 2.2

import org.apache.spark.sql.SparkSession 
import org.apache.log4j.{Level, Logger} 
import com.datastax 


import org.apache.spark.SparkContext 
import org.apache.spark.SparkConf 
import com.datastax.spark.connector._ 
import org.apache.spark.sql._ 

object dataframeset { 

    def main(args: Array[String]): Unit = { 

    val conf = new SparkConf().setAppName("Sample1").setMaster("local[*]") 
    val sc = new SparkContext(conf) 
    sc.setLogLevel("ERROR") 
    val rdd1 = sc.cassandraTable("tdata", "map3") 
    Logger.getLogger("org").setLevel(Level.ERROR) 
    Logger.getLogger("akka").setLevel(Level.ERROR) 
    val spark1 = org.apache.spark.sql.SparkSession.builder().master("local").config("spark.cassandra.connection.host","127.0.0.1") 
     .appName("Spark SQL basic example").getOrCreate() 

    val df = spark1.read.format("csv").option("header","true").option("inferschema", "true").load("/Users/Desktop/del2.csv") 
    import spark1.implicits._ 
    println("\nTop Records are:") 
    df.show(1) 


    // select a few of the mixed-case columns from the CSV
    val dfprev1 = df.select("sno", "year", "StateAbbr") 

    dfprev1.show(1) 
} 
} 

Expected output:

|sno|year|stateabbr| statedesc|cityname|geographiclevel 

All the column names should be in lower case. 

Actual output:

Top Records are: 
+---+----+---------+-------------+--------+---------------+----------+----------+--------+--------------------+---------------+---------------+--------------------+----------+--------------------+---------------------+--------------------------+-------------------+---------------+-----------+----------+---------+--------+---------+-------------------+ 
|sno|year|StateAbbr| StateDesc|CityName|GeographicLevel|DataSource| category|UniqueID|    Measure|Data_Value_Unit|DataValueTypeID|  Data_Value_Type|Data_Value|Low_Confidence_Limit|High_Confidence_Limit|Data_Value_Footnote_Symbol|Data_Value_Footnote|PopulationCount|GeoLocation|categoryID|MeasureId|cityFIPS|TractFIPS|Short_Question_Text| 
+---+----+---------+-------------+--------+---------------+----------+----------+--------+--------------------+---------------+---------------+--------------------+----------+--------------------+---------------------+--------------------------+-------------------+---------------+-----------+----------+---------+--------+---------+-------------------+ 
| 1|2014|  US|United States| null|    US|  BRFSS|Prevention|  59|Current lack of h...|    %|  AgeAdjPrv|Age-adjusted prev...|  14.9|    14.6|     15.2|      null|    null|  308745538|  null| PREVENT| ACCESS2| null|  null| Health Insurance| 
+---+----+---------+-------------+--------+---------------+----------+----------+--------+--------------------+---------------+---------------+--------------------+----------+--------------------+---------------------+--------------------------+-------------------+---------------+-----------+----------+---------+--------+---------+-------------------+ 
only showing top 1 row 

+---+----+---------+ 
|sno|year|StateAbbr| 
+---+----+---------+ 
| 1|2014|  US| 
+---+----+---------+ 
only showing top 1 row 

Answers


Just use toDF:

df.toDF(df.columns map(_.toLowerCase): _*) 
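
Applied to the df from the question, a minimal sketch could look like this (lowerDF is just an illustrative name, not in the original code):

// Rebuild the DataFrame, passing every existing column name lower-cased as the new names
val lowerDF = df.toDF(df.columns.map(_.toLowerCase): _*) 
lowerDF.printSchema()  // all column names should now print in lower case 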

Got it, thank you. –


Another way to achieve it is to use the foldLeft method.

val myDFcolNames = myDF.columns.toList 
// fold over the column names, renaming each one to its lower-case form
val rdoDenormDF = myDFcolNames.foldLeft(myDF)((accDF, c) => 
    accDF.withColumnRenamed(c, c.toLowerCase))
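
Both approaches end with the same lower-cased schema; toDF rebuilds the DataFrame with all the new names in one call, while foldLeft issues one withColumnRenamed per column. A quick check, reusing the rdoDenormDF name from the snippet above:

// print the resulting column names to confirm they are all lower case
rdoDenormDF.columns.foreach(println)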