2017-06-30 33 views

Splitting a column in a data frame?

I am having trouble splitting a column in my data frame:

2010 census 2014 land area   city 
8175133  302.6 sq mi 783.8 km2  New york 
3792621  468.7 sq mi 1213.9 km2 Los Angeles 
2695598  227.7 sq mi 589.6 km2  Chicago 

I want:

2010 census area sq/mi  area sq/km   city 
8175133  302.6   783.8    New york 
3792621  468.7   1213.9    Los Angeles 
2695598  227.7   589.6    Chicago 

Please read [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) carefully. – Masoud

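*"I'm having trouble"* ... what is the trouble? It would help if you showed the code you tried and the error or output you got. (Please read the link Masoud provided; it is very helpful and will improve your chances of getting an answer faster.) – r2evans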

Another base R option: `with(df, cbind(df, do.call(rbind, regmatches(land.area, gregexpr("[0-9]+\\.*[0-9]+", land.area)))))` – user20650

Answers

Here is a tidyr solution.

library(tidyr)

df <- data.frame(census = c(8175133, 3792621, 2695598),
                 land.area = c("302.6 sq mi 783.8 km2", "468.7 sq mi 1213.9 km2", "227.7 sq mi 589.6 km2"),
                 city = c("New york", "Los Angeles", "Chicago"), stringsAsFactors = FALSE)

# sub() is vectorised, so the trailing " km2" can be dropped without sapply()
df$land.area <- sub(" km2", "", df$land.area)

# split what is left (e.g. "302.6 sq mi 783.8") on the " sq mi " separator
df <- df %>% separate(col = land.area, into = c("area sq/mi", "area sq/km"), sep = " sq mi ")
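Note that `separate()` leaves the two new columns as character strings. If numbers are wanted instead, a minimal sketch is to pass `convert = TRUE`, which asks tidyr to run type conversion on the new columns. The column names `area_sq_mi` and `area_sq_km` and the object name `df_num` are illustrative choices, not part of the original answer.

```r
library(tidyr)

df <- data.frame(census = c(8175133, 3792621, 2695598),
                 land.area = c("302.6 sq mi 783.8 km2", "468.7 sq mi 1213.9 km2", "227.7 sq mi 589.6 km2"),
                 city = c("New york", "Los Angeles", "Chicago"),
                 stringsAsFactors = FALSE)

# drop the trailing unit so only "<sq mi value> sq mi <km2 value>" remains
df$land.area <- sub(" km2", "", df$land.area)

# convert = TRUE turns the resulting character pieces into numeric columns
df_num <- separate(df, col = land.area,
                   into = c("area_sq_mi", "area_sq_km"),  # hypothetical names
                   sep = " sq mi ", convert = TRUE)

str(df_num)
```

Syntactic names such as `area_sq_mi` also avoid having to backtick-quote the columns later, which names like `area sq/mi` would require.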

You can use `stringr::str_split_fixed`:

library(stringr) 
splitted <- str_split_fixed(dt$X2014.land.area, " sq mi ", 2) 

splitted[,2] <- gsub(" km2", "", as.character(splitted[,2])) 

colnames(splitted) <- c("area sq. mi", "area sq km") 

splitted <- data.frame(splitted) 


dt.2 <- cbind(dt[,c(1,3)], splitted) 

dt.2 

# X2010.census  city area.sq..mi area.sq.km 
# 1  8175133 New york  302.6  783.8 
# 2  3792621 Los Angeles  468.7  1213.9 
# 3  2695598  Chicago  227.7  589.6 

Data

structure(list(X2010.census = c(8175133L, 3792621L, 2695598L), 
    X2014.land.area = c("302.6 sq mi 783.8 km2", "468.7 sq mi 1213.9 km2", 
    "227.7 sq mi 589.6 km2"), city = c("New york", "Los Angeles", 
    "Chicago")), .Names = c("X2010.census", "X2014.land.area", 
    "city"), row.names = c(NA, -3L), class = "data.frame") -> dt 

Here is one using `sub`:

dt$Area_SqMi = sub("\\s*sq\\s*mi.*", "", dt$X2014.land.area) 
dt$Area_km2 = sub(".*mi\\s+(\\S+)\\s+km2.*", "\\1", dt$X2014.land.area) 

dt 
    X2010.census  X2014.land.area  city Area_SqMi Area_km2 
1  8175133 302.6 sq mi 783.8 km2 New york  302.6 783.8 
2  3792621 468.7 sq mi 1213.9 km2 Los Angeles  468.7 1213.9 
3  2695598 227.7 sq mi 589.6 km2  Chicago  227.7 589.6 

This is a base R solution, of course. If you want to get rid of the original column, you can add `dt = dt[,-2]`.
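Another base R route, sketched here under two assumptions: R is version 3.4.0 or later (where `utils::strcapture` is available), and every row follows the `<number> sq mi <number> km2` layout. `strcapture` pulls both capture groups out in one pass and types them according to a prototype data frame, so the areas come back as numeric. The names `areas`, `result`, `area_sq_mi` and `area_sq_km` are illustrative.

```r
dt <- data.frame(X2010.census = c(8175133L, 3792621L, 2695598L),
                 X2014.land.area = c("302.6 sq mi 783.8 km2", "468.7 sq mi 1213.9 km2",
                                     "227.7 sq mi 589.6 km2"),
                 city = c("New york", "Los Angeles", "Chicago"),
                 stringsAsFactors = FALSE)

# one capture group per value; the proto data frame fixes column names and types
areas <- strcapture("([0-9.]+) sq mi ([0-9.]+) km2",
                    dt$X2014.land.area,
                    proto = data.frame(area_sq_mi = numeric(),
                                       area_sq_km = numeric()))

# keep the untouched columns and bind the parsed areas next to them
result <- cbind(dt[, c("X2010.census", "city")], areas)
result
```

Rows that do not match the pattern come back as `NA`, which makes malformed entries easy to spot.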

Using the `tstrsplit` function from the data.table package:

library(data.table)

dat <- fread("2010 census, 2014 land area, city 
       8175133, 302.6 sq mi 783.8 km2, New york 
       3792621, 468.7 sq mi 1213.9 km2, Los Angeles 
       2695598, 227.7 sq mi 589.6 km2, Chicago") 
dat[, c("area sq/mi", "area sq/km") := tstrsplit(`2014 land area`, " ", keep = c(1,4))] 
dat[, .(`2010 census`, `area sq/mi`, `area sq/km`, city)] 

# 2010 census area sq/mi area sq/km  city 
# 1:  8175133  302.6  783.8 New york 
# 2:  3792621  468.7  1213.9 Los Angeles 
# 3:  2695598  227.7  589.6  Chicago
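As written, `tstrsplit` returns character pieces. A minimal sketch of a variant that yields numeric columns uses its `type.convert` argument. To keep the example independent of `fread`'s parsing, the table is built directly with `data.table()`. The names `area_sq_mi` and `area_sq_km` are illustrative.

```r
library(data.table)

dat <- data.table(`2010 census` = c(8175133L, 3792621L, 2695598L),
                  `2014 land area` = c("302.6 sq mi 783.8 km2", "468.7 sq mi 1213.9 km2",
                                       "227.7 sq mi 589.6 km2"),
                  city = c("New york", "Los Angeles", "Chicago"))

# pieces 1 and 4 of the space-split string are the two values;
# type.convert = TRUE converts them from character to numeric
dat[, c("area_sq_mi", "area_sq_km") :=
      tstrsplit(`2014 land area`, " ", fixed = TRUE, keep = c(1, 4), type.convert = TRUE)]

# drop the original combined column
dat[, `2014 land area` := NULL]
dat
```

Picking pieces by position assumes every entry has exactly the same word layout. A row with a different unit string would shift the positions.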