2017-08-15 68 views
-1

Split a column into multiple fields using R

My CSV has a column named "features". The data in that field looks like this:

{""Air conditioning"",""Elevator"",""Smoke detector""} 
{""Air conditioning"",""Railing Lights"",""Smoke detector""} 
{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""} 

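There are 20,000 records, and the strings inside the "features" field are not in any particular order.

How can I split them into separate columns, so that every "Air conditioning" lands in the first column, every "Elevator" in the second, and so on?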

a                 b          c               d
air conditioning  elevators  smokedetectors
air conditioning  elevators  smokedetectors  washer
air conditioning  elevators  smokedetectors  washer
+0

Have you looked at cSplit from the splitstackshape package? –

+0

You can use `read.csv(text = gsub('[{}]', '', txt), header = FALSE, quote = '"')`, where `txt` is the text above as a single string – alistaire

Answers

0

A combination of separate from tidyr and mutate_at from dplyr (with gsub thrown in):

dfr <- data.frame(features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}', 
           '{""Air conditioning"",""Railing Lights"",""Smoke detector""}', 
           '{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}')) 

library(tidyr) 
library(dplyr) 

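# Remove the braces ({ and }) and the double quotes (")
fix_txt <- function(x) gsub("[{]\"|\"|[}]", "", x)

# Split on commas into three columns; any extra pieces are merged into the last one
separate(dfr, features, c("a", "b", "c"), sep = ",", extra = "merge") %>%
  mutate_at(vars(a:c), fix_txt)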

                 a              b                    c
1 Air conditioning       Elevator       Smoke detector
2 Air conditioning Railing Lights       Smoke detector
3 Air conditioning         Washer Dryer,Smoke detector

Note that the extra fields are merged into the last column (as in the third record); see ?separate for more options.
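One of those options is to request as many columns as the longest record has and let the shorter records be padded with NA instead of merging the overflow. The following is only a sketch under assumptions: it presumes no record has more than four features, and the column name "d" and the object name res are illustrative rather than taken from the answer above. The values remain positional, so column b still mixes Elevator, Railing Lights and Washer.

```r
library(tidyr)
library(dplyr)

dfr <- data.frame(features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}',
                               '{""Air conditioning"",""Railing Lights"",""Smoke detector""}',
                               '{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}'),
                  stringsAsFactors = FALSE)

# Strip braces and double quotes before splitting
fix_txt <- function(x) gsub("[{}\"]", "", x)

# Assumption: four columns are enough; fill = "right" pads short rows with NA
res <- dfr %>%
  mutate(features = fix_txt(features)) %>%
  separate(features, c("a", "b", "c", "d"), sep = ",", fill = "right")

print(res)
```

With this input, the first two records get NA in column d, while the third record fills all four columns.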

+0

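Thanks. Note, though, that in the output column b holds Elevator for record 1 but Washer for record 3. How can I get all the washers under one column and all the elevators under another? – SNT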

+0

Your original question didn't really make that clear! I think we'll have to rethink the solution. –

0

As mentioned earlier, you can look at the "splitstackshape" package, specifically the cSplit_e function. With it, you can try:

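library(splitstackshape)
library(data.table)  # provides as.data.table() and :=

# Strip braces and quotes, then expand into one column per distinct feature
cSplit_e(as.data.table(dfr)[, features := gsub("[{}\"]", "", features)],
         "features", ",", mode = "value", type = "character", drop = TRUE)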
##    features_Air conditioning features_Dryer features_Elevator features_Railing Lights features_Smoke detector features_Washer
## 1:          Air conditioning             NA          Elevator                      NA          Smoke detector              NA
## 2:          Air conditioning             NA                NA          Railing Lights          Smoke detector              NA
## 3:          Air conditioning          Dryer                NA                      NA          Smoke detector          Washer

where "dfr" is defined as in @Remko's answer:

dfr <- data.frame(features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}', 
           '{""Air conditioning"",""Railing Lights"",""Smoke detector""}', 
           '{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}'))
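If adding a package is not an option, the same one-column-per-feature layout can be approximated in base R. This is a rough sketch rather than a tested solution for the full 20,000-record file: the names cleaned, parts, all_features and wide are made up for illustration, and it assumes that no feature name itself contains a comma.

```r
dfr <- data.frame(features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}',
                               '{""Air conditioning"",""Railing Lights"",""Smoke detector""}',
                               '{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}'),
                  stringsAsFactors = FALSE)

# Strip braces and double quotes, then split each record on commas
cleaned <- gsub("[{}\"]", "", dfr$features)
parts <- strsplit(cleaned, ",", fixed = TRUE)

# One column per distinct feature found anywhere in the data
all_features <- sort(unique(unlist(parts)))

# For each record, keep the feature name where present and NA otherwise
wide <- do.call(rbind, lapply(parts, function(p) {
  ifelse(all_features %in% p, all_features, NA_character_)
}))
colnames(wide) <- all_features
wide <- as.data.frame(wide, stringsAsFactors = FALSE)

print(wide)
```

Each record becomes one row, and a feature always appears in the same column regardless of where it sat in the original string.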