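I'm trying to identify the unique Unicode values in a data frame made up of strings. I've tried using grep, but I get the following error when searching for the Unicode values in the strings: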
Error: '\U' used without hex digits in character string starting ""\U"
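This error comes from R's parser, not from grep. Inside a string literal, \U starts a Unicode escape and must be followed by hex digits, so "\U" on its own cannot be parsed at all. In addition, the \U0001f618 shown when the data frame is printed is typically how R displays a single emoji that the console cannot render; the string itself contains no backslash. The following is a minimal check, assuming a reasonably recent R session. The variable name x is just for illustration.

```r
# A literal written with the \U escape stores the emoji itself,
# not the ten characters "\U0001f618"
x <- "\U0001f618"

nchar(x)                       # 1: a single character
grepl("\\", x, fixed = TRUE)   # FALSE: no literal backslash is stored
utf8ToInt(x)                   # 128536: the character's Unicode code point
sprintf("U+%X", utf8ToInt(x))  # "U+1F618": the same code point in hex
```

So searching for the text "\U" cannot find these messages, even with the backslash escaped. You have to search for the characters themselves, or for "anything that is not ASCII".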
A sample data frame:
time sender message
1 2012-12-04 13:40:00 1 Hello handsome!
2 2012-12-04 13:40:08 1 \U0001f618
3 2012-12-04 14:39:24 1 \U0001f603
4 2012-12-04 16:04:25 2 <image omitted>
73 2012-12-05 06:02:17 1 Haha not white and blue... White with blue eyes \U0001f61c
40619 2015-05-08 10:00:58 1 \U0001f631\U0001f637
grep("\U", dat$messages)
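One way to get the rows the attempted grep call was aiming for is to match any character outside the ASCII range. The sketch below uses the bracket expression "[^\001-\177]", which means "any character whose code is not 1 to 127". It assumes the messages are stored as UTF-8, which is the case for strings written with \U escapes. The vector messages is a hypothetical stand-in for the posted data. With the dat object from the question, pass dat$message instead. Note that the column is named message, not messages.

```r
# Stand-in for dat$message from the posted data
messages <- c("Hello handsome!", "\U0001f618", "\U0001f603", "<image omitted>",
              "Haha not white and blue... White with blue eyes \U0001f61c",
              "\U0001f631\U0001f637")

# "[^\001-\177]" matches any character outside ASCII codes 1-127,
# so grep() returns the indices of messages containing emoji or other non-ASCII text
non_ascii_rows <- grep("[^\001-\177]", messages)
non_ascii_rows

# The matching messages themselves
messages[non_ascii_rows]
```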
Data:
dat <-
structure(list(time = c("2012-12-04 13:40:00", "2012-12-04 13:40:08",
"2012-12-04 14:39:24", "2012-12-04 16:04:25", "2012-12-05 06:02:17",
"2015-05-08 10:00:58"), sender = c(1L, 1L, 1L, 2L, 1L, 1L), message = c("Hello handsome!",
"\U0001f618", "\U0001f603", "<image omitted>", "Haha not white and blue... White with blue eyes \U0001f61c",
"\U0001f631\U0001f637")), .Names = c("time", "sender", "message"
), class = "data.frame", row.names = c("1", "2", "3", "4", "73",
"40619"))
Thanks, that works. How would I then use it to extract the individual non-ASCII characters in each row? – Andrews
To extract them, use 'gregexpr' instead of 'grep'. For example: 'm <- gregexpr("[^\001-\177]", dat$message); regmatches(dat$message, m)' – MrFlick
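The gregexpr/regmatches approach can be extended to answer the original question, which was to find the unique Unicode values. The sketch below first collects the non-ASCII characters in each message. It then keeps the distinct ones and labels them with their code points. Two pieces are additions and not part of the comment: the stand-in vector messages, which replaces dat$message, and the use of utf8ToInt() for the code points.

```r
# Stand-in for dat$message from the posted data
messages <- c("Hello handsome!", "\U0001f618", "\U0001f603", "<image omitted>",
              "Haha not white and blue... White with blue eyes \U0001f61c",
              "\U0001f631\U0001f637")

# gregexpr() records every non-ASCII match in each string;
# regmatches() returns them as a list with one character vector per message
m <- gregexpr("[^\001-\177]", messages)
per_row <- regmatches(messages, m)

lengths(per_row)  # number of non-ASCII characters in each message

# Distinct non-ASCII characters across all messages, with their code points
uniq <- unique(unlist(per_row))
data.frame(char = uniq,
           code_point = sprintf("U+%X", vapply(uniq, utf8ToInt, integer(1),
                                               USE.NAMES = FALSE)))
```

Replacing unique() with table(unlist(per_row)) gives a count for each character instead of only the distinct values.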