How do I get the name of a Column or change its existing name?

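I have to write a function "removePunctuation" that strips punctuation, and its result has to pass the test below. My question: how do I get the name of a Column, or change its existing name?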

# TEST Capitalization and punctuation (4b) 
testPunctDF = sqlContext.createDataFrame([(" The Elephant's 4 cats. ",)]) 
testPunctDF.show() 
Test.assertEquals(testPunctDF.select(removePunctuation(col('_1'))).first()[0], 
        'the elephants 4 cats', 
        'incorrect definition for removePunctuation function')

This is what I have managed to write so far.

def removePunctuation(column): 
    """Removes punctuation, changes to lower case, and strips leading and trailing spaces. 

    Note: 
     Only spaces, letters, and numbers should be retained. Other characters should be 
     eliminated (e.g. it's becomes its). Leading and trailing spaces should be removed after 
     punctuation is removed. 

    Args: 
     column (Column): A Column containing a sentence. 

    Returns: 
     Column: A Column named 'sentence' with clean-up operations applied. 
    """ 

    return lower(trim(regexp_replace("column_name", "[\W_]+"," "))).alias("sentence");

But I still cannot get regexp_replace to work with the alias "sentence". I get this error:

AnalysisException: u"cannot resolve 'sentence' given input columns: [_1];"


2016-09-03 Dmitrij Kostyushko

I would try:

stringWithPunctuation.translate(None, string.punctuation)

It runs in C under the hood, so it is about as efficient as it gets!
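
The one-liner above relies on the Python 2 signature of str.translate. A minimal sketch of the same idea for Python 3 might look like the following. The helper name strip_punctuation is made up for illustration, and it works on plain Python strings rather than on a Spark Column, so it would have to be wrapped (for example in a UDF) before it could be used inside removePunctuation.

```python
import string


def strip_punctuation(text):
    """Delete ASCII punctuation, trim outer whitespace, and lowercase a plain str."""
    # In Python 3 the third argument of str.maketrans lists characters to delete.
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table).strip().lower()


cleaned = strip_punctuation(" The Elephant's 4 cats. ")
print(cleaned)
```

Note that string.punctuation only covers ASCII punctuation, so other non-alphanumeric characters would pass through unchanged.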

You tried:

return lower(trim(regexp_replace(, "[\W_]+"," "))).alias("sentence");

It does not seem to use the parameter column anywhere, which may explain the error.


2016-09-03 17:47:50 gsamaras

Oh, sorry, there was a mistake in the code I posted: the first argument of regexp_replace() should have been "column_name". Anyway, I have already solved it, but thanks.

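@DmitrijKostyushko Glad you solved it! Had I known that the code in your question was not the code you were actually using, I might have posted a better answer. Please remember to accept an answer later. ;) – gsamaras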

Surprisingly, I could only pass a Column object to regexp_replace(), not a column name.


2016-09-03 17:48:31
