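Regular expression replacement in R

I have a text column containing speech-to-text transcripts of phone calls between customers and agents. After some text manipulation on the raw values, say I have something like the vector below as an example (note the space at the beginning of the vector text):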
text <- " customer:customer text1 agent:agent text 1 customer:customer text2 agent:agent text 2"
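Question: how can I extract the customer text and the agent text into two separate fields from the original source field (in this case, the text vector)?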
# desired outputs:
# field for customer texts
"customer text1, customer text2"
# field for agent texts
"agent text 1, agent text 2"
What I have been able to do so far (with my limited experience with regular expressions) is:
customerText <- gsub("^ customer:| agent:(.*)", "", text)
customerText
[1] "customer text1"
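The attempt above returns only the first customer utterance because `(.*)` after ` agent:` is greedy and removes everything up to the end of the string. One possible direction, offered as a sketch rather than a definitive answer, is to extract matches instead of deleting text, using base R's `regmatches()` and `gregexpr()` with `perl = TRUE`. The helper name `extract_speaker` is hypothetical, and the sketch assumes that the only speaker tags are `customer:` and `agent:`, each preceded by a space.

```r
text <- " customer:customer text1 agent:agent text 1 customer:customer text2 agent:agent text 2"

# Hypothetical helper (not from the original post): collect every utterance
# that follows "<speaker>:" and join the pieces with ", ".
extract_speaker <- function(x, speaker) {
  # Lazily match the text after "<speaker>:" up to the next speaker tag
  # (" customer:" or " agent:") or the end of the string.
  pattern <- paste0("(?<=", speaker, ":).*?(?= (?:customer|agent):|$)")
  hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  vapply(hits, function(h) paste(trimws(h), collapse = ", "), character(1))
}

extract_speaker(text, "customer")
# [1] "customer text1, customer text2"
extract_speaker(text, "agent")
# [1] "agent text 1, agent text 2"
```

The lookbehind and lookahead keep the tags themselves out of the match, so no second cleanup pass is needed.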
Edit:
Please consider the following reproducible code for a data-frame-based approach, rather than the vector-based one above.
> callid <- c("1","2")
> conversation <- c(" customer:customer text 1 agent:agent text 1 customer:customer text 2 agent:agent text 2",
+ " agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9")
> conversationCustomer <- c("customer text 1, customer text 2", "customer text 8, customer text 9")
> conversationAgent <- c("agent text 1, agent text 2", "agent text 8, agent text 9")
> df <- data.frame(callid, conversation)
> dfDesired <- data.frame(callid, conversation, conversationCustomer, conversationAgent)
> rm(callid, conversation, conversationCustomer, conversationAgent)
>
> df
callid conversation
1 1 customer:customer text 1 agent:agent text 1 customer:customer text 2 agent:agent text 2
2 2 agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9
> dfDesired
callid conversation conversationCustomer conversationAgent
1 1 customer:customer text 1 agent:agent text 1 customer:customer text 2 agent:agent text 2 customer text 1, customer text 2 agent text 1, agent text 2
2 2 agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9 customer text 8, customer text 9 agent text 8, agent text 9
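For the data frame case, a minimal sketch of one way to build the two desired columns is to split each conversation into turns just before every speaker tag and then keep the turns belonging to one speaker. The helper name `extract_turns` is hypothetical, and the sketch assumes the column holds character data (hence `stringsAsFactors = FALSE`) and that only the tags `customer:` and `agent:` occur.

```r
df <- data.frame(
  callid = c("1", "2"),
  conversation = c(
    " customer:customer text 1 agent:agent text 1 customer:customer text 2 agent:agent text 2",
    " agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original post): one joined string per row.
extract_turns <- function(x, speaker) {
  tag <- paste0(speaker, ":")
  # Split on the single space that directly precedes a speaker tag.
  turns <- strsplit(trimws(as.character(x)), " (?=(?:customer|agent):)", perl = TRUE)
  vapply(turns, function(turn) {
    mine <- turn[startsWith(turn, tag)]            # keep this speaker's turns
    paste(substring(mine, nchar(tag) + 1), collapse = ", ")  # drop the tag
  }, character(1))
}

df$conversationCustomer <- extract_turns(df$conversation, "customer")
df$conversationAgent    <- extract_turns(df$conversation, "agent")

df$conversationCustomer
# [1] "customer text 1, customer text 2" "customer text 8, customer text 9"
df$conversationAgent
# [1] "agent text 1, agent text 2" "agent text 8, agent text 9"
```

Because the helper is vectorised over rows, it does not matter whether a call starts with the customer or the agent.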
Thanks!
R for text parsing? God bless you. – Matt