
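I am trying to use rlike when querying a DataFrame with Spark SQL, without much success. I want to find all strings that end with digits (trailing numbers).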

Sample data:

column_a|column_b 
1|abc xyz 
2|123 abc xyz 
3|abc 123 xyz 
4|abc 123 
5|xyz 123 

Expected output:

column_a|column_b 
4|abc 123 
5|xyz 123 

I tried:

select * from table_1 where column_b rlike '\d+$' (select * from table_1 where column_b rlike '/\d+$') 

Output (no results):

column_a|column_b 

I also tried:

select * from table_1 where column_b rlike '\d*$' (select * from table_1 where column_b rlike '/\d*$') 

Output (all rows):

column_a|column_b 
1|abc xyz 
2|123 abc xyz 
3|abc 123 xyz 
4|abc 123 
5|xyz 123 

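Is my regular expression incorrect? I have tested it in Python and in an online tester, and it looks correct there. Or does rlike only support certain kinds of regular expressions?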

Answer


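You need a bit more escaping to make this work. When the query is passed to spark.sql(...) as an ordinary string literal, the backslash is consumed once by the host language and once more by the Spark SQL parser (with its default settings), so the regex \d has to be written with four backslashes. Specifically: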

spark.sql("SELECT 'abc 123' RLIKE '\\\\d+$'").show()
+------------------+
|abc 123 RLIKE \d+$|
+------------------+
|              true|
+------------------+

spark.sql("SELECT '123 abc xyz' RLIKE '\\\\d+$'").show()
+----------------------+
|123 abc xyz RLIKE \d+$|
+----------------------+
|                 false|
+----------------------+
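
To see why the original attempts returned no rows or all rows, here is a minimal Python sketch that does not need Spark. It assumes a simplified model of how the SQL parser treats a string literal: a backslash escapes whatever character follows it, so \\ becomes \ and \d becomes a plain d. The helpers unescape_sql_literal, rlike and select_ids are hypothetical names made up for this illustration, not Spark APIs, and Python's re module only stands in for the Java regex engine that Spark uses.

```python
import re

# Sample rows from the question: (column_a, column_b).
ROWS = [
    (1, "abc xyz"),
    (2, "123 abc xyz"),
    (3, "abc 123 xyz"),
    (4, "abc 123"),
    (5, "xyz 123"),
]


def unescape_sql_literal(body: str) -> str:
    """Simplified model (an assumption): a backslash escapes the next character."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # drop the backslash, keep the escaped char
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def rlike(value: str, literal_body: str) -> bool:
    """Match value against the regex that remains after SQL-style unescaping."""
    pattern = unescape_sql_literal(literal_body)
    return re.search(pattern, value) is not None


def select_ids(literal_body: str) -> list:
    """Return column_a for every row whose column_b matches."""
    return [a for a, b in ROWS if rlike(b, literal_body)]


# Raw strings below are the text between the quotes as the SQL parser sees it.
print(select_ids(r"\d+$"))     # regex becomes d+$   -> []
print(select_ids(r"\d*$"))     # regex becomes d*$   -> [1, 2, 3, 4, 5]
print(select_ids(r"\\d+$"))    # regex becomes \d+$  -> [4, 5]
print(select_ids(r"[0-9]+$"))  # no backslash needed -> [4, 5]
```

Under this model, the first two calls reproduce the empty and all-rows results from the question, and the third matches the expected output. The last call shows that a character class such as [0-9] avoids the escaping problem altogether.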