2016-09-15

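Insert spaces at specific positions in a string

I have a data frame and want to insert spaces at specific positions in each string. Here is a sample of the data: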

0MHOCAN000006026421HOCAN000000392457HOCAN000005311227 
0FHOUSA000002272874HOUSA000002272874HOUSA000050206641 
0MHOUSA000002272874HOUSA000002076121HOUSA000014569699 

And here is what I want (a space before every letter H):

0M HOCAN000006026421 HOCAN000000392457 HOCAN000005311227 
0F HOUSA000002272874 HOUSA000002272874 HOUSA000050206641 
0M HOUSA000002272874 HOUSA000002076121 HOUSA000014569699 
Comment: Looking at the accepted answer, you should have mentioned that you are reading the data from a file.

Answers


We can read the data as fixed-width fields.

Using read.fwf from base R:

x1 <- read.fwf("temp.txt", 
       widths = c(2, 17, 17, 17), 
       col.names = paste0("myColName",1:4), 
       stringsAsFactors = FALSE) 
# check output 
str(x1) 
# 'data.frame': 3 obs. of 4 variables: 
# $ myColName1: chr "0M" "0F" "0M" 
# $ myColName2: chr "HOCAN000006026421" "HOUSA000002272874" "HOUSA000002272874" 
# $ myColName3: chr "HOCAN000000392457" "HOUSA000002272874" "HOUSA000002076121" 
# $ myColName4: chr "HOCAN000005311227" "HOUSA000050206641" "HOUSA000014569699" 
x1 
# myColName1  myColName2  myColName3  myColName4 
# 1   0M HOCAN000006026421 HOCAN000000392457 HOCAN000005311227 
# 2   0F HOUSA000002272874 HOUSA000002272874 HOUSA000050206641 
# 3   0M HOUSA000002272874 HOUSA000002076121 HOUSA000014569699 
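
If the strings are already in an R vector rather than in a file, the same fixed-width idea can probably be applied through a text connection. The following is a minimal sketch, not part of the original answer: the vector x is copied from the question, and the names con, fields and spaced are illustrative. It reads the fields with read.fwf and pastes them back together with single spaces, which yields the format asked for in the question.

```r
# Sample strings copied from the question
x <- c("0MHOCAN000006026421HOCAN000000392457HOCAN000005311227",
       "0FHOUSA000002272874HOUSA000002272874HOUSA000050206641",
       "0MHOUSA000002272874HOUSA000002076121HOUSA000014569699")

# read.fwf accepts a connection, so an in-memory vector can stand in for a file
con <- textConnection(x)
fields <- read.fwf(con,
                   widths = c(2, 17, 17, 17),
                   colClasses = "character")  # keep every field as text
close(con)

# Join the four columns of each row with a single space
spaced <- do.call(paste, unname(as.list(fields)))
spaced
```

The widths (2, 17, 17, 17) are the ones used in the answer above; they must match the actual layout of the data.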

Using read_fwf from the readr package:

library(readr) 

x2 <- read_fwf("temp.txt", 
       fwf_widths(c(2, 17, 17, 17), 
          col_names = paste0("myColName",1:4))) 
# check output 
str(x2) 
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 4 variables: 
# $ myColName1: chr "0M" "0F" "0M" 
# $ myColName2: chr "HOCAN000006026421" "HOUSA000002272874" "HOUSA000002272874" 
# $ myColName3: chr "HOCAN000000392457" "HOUSA000002272874" "HOUSA000002076121" 
# $ myColName4: chr "HOCAN000005311227" "HOUSA000050206641" "HOUSA000014569699" 
# - attr(*, "spec")=List of 2 
# ..$ cols :List of 4 
# .. ..$ myColName1: list() 
# .. .. ..- attr(*, "class")= chr "collector_character" "collector" 
# .. ..$ myColName2: list() 
# .. .. ..- attr(*, "class")= chr "collector_character" "collector" 
# .. ..$ myColName3: list() 
# .. .. ..- attr(*, "class")= chr "collector_character" "collector" 
# .. ..$ myColName4: list() 
# .. .. ..- attr(*, "class")= chr "collector_character" "collector" 
# ..$ default: list() 
# .. ..- attr(*, "class")= chr "collector_guess" "collector" 
# ..- attr(*, "class")= chr "col_spec" 
x2 
# # A tibble: 3 × 4 
# myColName1  myColName2  myColName3  myColName4 
#  <chr>    <chr>    <chr>    <chr> 
# 1   0M HOCAN000006026421 HOCAN000000392457 HOCAN000005311227 
# 2   0F HOUSA000002272874 HOUSA000002272874 HOUSA000050206641 
# 3   0M HOUSA000002272874 HOUSA000002076121 HOUSA000014569699 

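These solutions should work even if the IDs do not start with the letter H, or if an ID contains more than one H, because the fields are split by position rather than by character.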
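
To illustrate why splitting by position is more robust, here is a small sketch using only base R's substring. The helper insert_spaces is hypothetical (it is not from the original answer), and the second test string is made up so that one ID contains two H characters and another starts with A.

```r
# Hypothetical helper: cut each string into fields of the given widths
# and join the fields with single spaces.
insert_spaces <- function(s, widths) {
  ends <- cumsum(widths)
  starts <- ends - widths + 1
  vapply(s,
         function(one) paste(substring(one, starts, ends), collapse = " "),
         character(1),
         USE.NAMES = FALSE)
}

x <- c("0MHOCAN000006026421HOCAN000000392457HOCAN000005311227",  # from the question
       "0FHXUSH000002272874AOUSA000002272874HOUSA000050206641")  # made-up edge case

by_position <- insert_spaces(x, c(2, 17, 17, 17))
by_position

# For comparison: replacing every "H" misplaces the spaces in the made-up string
by_letter <- gsub("H", " H", x, fixed = TRUE)
by_letter
```

For the first string both approaches agree; for the made-up string only the position-based version keeps the three 17-character IDs intact.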


You can use gsub with a fixed-string replacement:

x <- c("0MHOCAN000006026421HOCAN000000392457HOCAN000005311227", 
"0FHOUSA000002272874HOUSA000002272874HOUSA000050206641", 
"0MHOUSA000002272874HOUSA000002076121HOUSA000014569699") 
gsub("H", " H", x, fixed=TRUE) 

See the R demo.

Output:

[1] "0M HOCAN000006026421 HOCAN000000392457 HOCAN000005311227" 
[2] "0F HOUSA000002272874 HOUSA000002272874 HOUSA000050206641" 
[3] "0M HOUSA000002272874 HOUSA000002076121 HOUSA000014569699" 

If your data frame is df and the column is named col1, you can use

df$col1 = gsub("H", " H", df$col1, fixed=TRUE) 
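
A self-contained sketch of the data frame case follows. The data frame df and its column col1 are built here only for illustration, with two strings taken from the question; the step that splits the result into a matrix called parts is an optional extra and not part of the original answer. Note that this approach relies on every ID starting with H and containing no other H.

```r
# Illustrative data frame with one character column
df <- data.frame(
  col1 = c("0MHOCAN000006026421HOCAN000000392457HOCAN000005311227",
           "0FHOUSA000002272874HOUSA000002272874HOUSA000050206641"),
  stringsAsFactors = FALSE
)

# Insert a space before every literal "H" (fixed = TRUE disables regex matching)
df$col1 <- gsub("H", " H", df$col1, fixed = TRUE)
df$col1

# Optional: split the spaced strings into a character matrix, one field per column
parts <- do.call(rbind, strsplit(df$col1, " ", fixed = TRUE))
parts
```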