Converting strings to a data.frame in R


I extracted more than 1000 rows of strings from one column of an Excel worksheet. The data looks like this (3 rows):

Chicken(31%);Duck(16%);Wild duck(14%);Turkey(10%);Pigeon(4%);Goose(4%);Wild bird(4%);Tree sparrow(2%)

Tree sparrow(2%)

Chicken(1%)

I need to put the data into a table (in this example: 8 columns x 3 rows). Can anyone help?

x <- c("Chicken(31%);Duck(16%);Wild duck(14%);Turkey(10%);Pigeon(4%);Goose(4%);Wild bird(4%);Tree sparrow(2%)", 
"Tree sparrow(2%)", "Chicken(1%)")

Source

2015-09-05 Rbeginner

What have you tried? Is the semicolon the column separator? If a row has fewer than 8 entries, what value do you want to fill in? – dd3

Here is one possible solution:

library(qdapTools) 
mtabulate(strsplit(gsub("\\(\\d+%\\)", "", x), ";")) 



##   Chicken Duck Goose Pigeon Tree sparrow Turkey Wild bird Wild duck
## 1       1    1     1      1            1      1         1         1
## 2       0    0     0      0            1      0         0         0
## 3       1    0     0      0            0      0         0         0
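
For readers who do not have qdapTools installed, the three steps behind that one-liner (strip the percentages, split on the semicolon, tabulate the names) can be sketched in base R. This is only an illustrative sketch, not the answerer's method: the names `items`, `species`, `counts` and `counts_df` are made up here, and the columns come out in order of first appearance rather than alphabetically.

```r
x <- c("Chicken(31%);Duck(16%);Wild duck(14%);Turkey(10%);Pigeon(4%);Goose(4%);Wild bird(4%);Tree sparrow(2%)",
       "Tree sparrow(2%)", "Chicken(1%)")

# Step 1: drop the "(NN%)" suffixes, then split each string on ";"
items <- strsplit(gsub("\\(\\d+%\\)", "", x), ";")

# Step 2: collect the distinct names in order of first appearance
# (`species` is a hypothetical helper name)
species <- unique(unlist(items))

# Step 3: count how often each name occurs in each input string
counts <- t(vapply(
  items,
  function(v) tabulate(match(v, species), nbins = length(species)),
  integer(length(species))
))
colnames(counts) <- species

# One row per input string, one column per name
counts_df <- as.data.frame(counts)
```

Like the mtabulate() call, this yields presence counts (0/1 here) and discards the percentages themselves.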

Source

2015-09-05 19:01:56

There is probably a more concise way, but you can try something like this:

library(stringi) 
library(data.table) 

# Drop empty lines if any 
txt <- Filter(function(x) !stri_isempty(stri_trim(x)), x) 
# Extract matches 
# Extract matches. The digit class is [0-9]+ rather than [1-9]+
# so that values containing a zero, such as 10, are matched too
matches <- stri_match_all_regex(txt, "([\\w\\s]+)\\(([0-9]+)%\\);?")

matches[[1]]

##      [,1]               [,2]           [,3]
## [1,] "Chicken(31%);"    "Chicken"      "31"
## [2,] "Duck(16%);"       "Duck"         "16"
## [3,] "Wild duck(14%);"  "Wild duck"    "14"
## [4,] "Turkey(10%);"     "Turkey"       "10"
## [5,] "Pigeon(4%);"      "Pigeon"       "4"
## [6,] "Goose(4%);"       "Goose"        "4"
## [7,] "Wild bird(4%);"   "Wild bird"    "4"
## [8,] "Tree sparrow(2%)" "Tree sparrow" "2"

# Rearrange: one named list of numeric values per input string
rows <- lapply(
    matches,
    function(x) setNames(as.list(as.numeric(x[, 3])), x[, 2]))

# Bind the lists into one table, filling missing columns with NA
rbindlist(rows, fill=TRUE)

##    Chicken Duck Wild duck Turkey Pigeon Goose Wild bird Tree sparrow
## 1:      31   16        14     10      4     4         4            2
## 2:      NA   NA        NA     NA     NA    NA        NA            2
## 3:       1   NA        NA     NA     NA    NA        NA           NA

Explanation of the regular expression

([\\w\\s]+)  # At least one word character or whitespace *, 1st group
\\(          # Left parenthesis
([0-9]+)     # At least one digit, 2nd group. You can replace + with {1,2}
%            # Percent sign
\\)          # Right parenthesis
;?           # Optional semicolon

* Could be \\w[\\w\\s]+
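
The same extract-then-rearrange idea can also be sketched in base R, for readers who would rather not depend on stringi and data.table. This is a rough sketch under assumptions, not the answerer's code: `parse_row`, `cols`, `values_mat` and `values_df` are hypothetical names, each entry is assumed to have exactly the form `Name(NN%)`, and the digits are matched with `[0-9]+` so that a value such as 10 is captured.

```r
x <- c("Chicken(31%);Duck(16%);Wild duck(14%);Turkey(10%);Pigeon(4%);Goose(4%);Wild bird(4%);Tree sparrow(2%)",
       "Tree sparrow(2%)", "Chicken(1%)")

# Hypothetical helper: turn one "Name(NN%);Name(NN%)" string
# into a named numeric vector of percentages
parse_row <- function(s) {
  parts <- trimws(strsplit(s, ";", fixed = TRUE)[[1]])
  # Group 1: everything before "(", group 2: the digits before "%"
  m <- regmatches(parts, regexec("^([^(]+)\\(([0-9]+)%\\)$", parts))
  values <- as.numeric(vapply(m, `[`, character(1), 3))
  names(values) <- vapply(m, `[`, character(1), 2)
  values
}

rows <- lapply(x, parse_row)

# Union of all names, in order of first appearance
cols <- unique(unlist(lapply(rows, names)))

# Look every column up in every row; names absent from a row become NA
values_mat <- t(vapply(rows, function(r) unname(r[cols]), numeric(length(cols))))
colnames(values_mat) <- cols

values_df <- as.data.frame(values_mat)
```

Indexing a named vector by a name it does not contain returns NA, which is what fills the gaps for rows with fewer than 8 entries.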

Source

2015-09-05 19:47:35 zero323

Oh, that makes sense –

Many thanks to Tyler Rinker and zero323 for the guidance. zero323's code does exactly what I wanted. Thank you both! – Rbeginner

Could you explain the regular expression ([\\w\\s]+)\\(([1-9]+)%\\) in detail? I adapted your code, but I get NA in every list of my output. Many thanks! – Rbeginner
