library(stringr)

# Sample input. The empty line after the third record matches nothing,
# which is why row 4 of the result is all NA.
data <- readLines(con = textConnection("014003051906,ETN5080 ,0450,BOLT KIT UPPER SHAFT WITH 5 SPEED,1.000,F
014003051906,ETN5967 ,0460,SENSOR SENSOR FH BACKSHAFT SPEED,1.000,F
014003051906,ETN64267 ,0470,TILT UNIT SENSOR,1.000,F

014003065376,03M7184 ,0020,BOLT - M 8.0 X 1.250 X 20.0 - 8.8-Zinc,4.000,G
014003065376,03M7386 ,0090,BOLT, RD HD SQ SHORT NECK, METRIC,18.000,G
014003065376,14M7296 ,0090,NUT, METRIC, HEX FLANGE,14.000,G"))

# Fields 1-3 and 5-6 cannot contain commas; field 4 (the description) may.
pattern <- "^([^,]*),([^,]*),([^,]*),(.*),([^,]*),([^,]*)$"

# Column 1 of str_match() is the full match; drop it to keep only the groups.
str_match(data, pattern)[, -1]
#      [,1]           [,2]        [,3]   [,4]                                     [,5]     [,6]
# [1,] "014003051906" "ETN5080 "  "0450" "BOLT KIT UPPER SHAFT WITH 5 SPEED"      "1.000"  "F"
# [2,] "014003051906" "ETN5967 "  "0460" "SENSOR SENSOR FH BACKSHAFT SPEED"       "1.000"  "F"
# [3,] "014003051906" "ETN64267 " "0470" "TILT UNIT SENSOR"                       "1.000"  "F"
# [4,] NA             NA          NA     NA                                       NA       NA
# [5,] "014003065376" "03M7184 "  "0020" "BOLT - M 8.0 X 1.250 X 20.0 - 8.8-Zinc" "4.000"  "G"
# [6,] "014003065376" "03M7386 "  "0090" "BOLT, RD HD SQ SHORT NECK, METRIC"      "18.000" "G"
# [7,] "014003065376" "14M7296 "  "0090" "NUT, METRIC, HEX FLANGE"                "14.000" "G"
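If a data frame is more convenient than a character matrix, the result can be post-processed roughly as follows. This is only a sketch: the column names (`order_id`, `part_no`, `seq`, `description`, `qty`, `flag`) are invented for illustration and do not come from the question, and `lines` is a shortened stand-in for `data`.

```r
library(stringr)

# Shortened stand-in for `data`; the empty string mimics a line that does not match.
lines <- c(
  "014003051906,ETN5080 ,0450,BOLT KIT UPPER SHAFT WITH 5 SPEED,1.000,F",
  "",
  "014003065376,03M7386 ,0090,BOLT, RD HD SQ SHORT NECK, METRIC,18.000,G"
)
pattern <- "^([^,]*),([^,]*),([^,]*),(.*),([^,]*),([^,]*)$"

# Keep only the captured groups (column 1 is the full match).
m <- str_match(lines, pattern)[, -1, drop = FALSE]

# Drop lines that did not match (rows that are all NA).
m <- m[!is.na(m[, 1]), , drop = FALSE]

# Hypothetical column names, chosen only for this example.
colnames(m) <- c("order_id", "part_no", "seq", "description", "qty", "flag")

df <- as.data.frame(m, stringsAsFactors = FALSE)
df$part_no <- str_trim(df$part_no)  # remove the trailing blank after the part number
df$qty <- as.numeric(df$qty)        # convert the quantity from text to a number

df
```

Whether to drop the non-matching rows or keep them as `NA` depends on whether you need to trace malformed lines back to the source file.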
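Edit:
An explanation of the regular expression for beginners, in plain words (please excuse any inaccuracies):
- The initial `^` and the terminal `$` mean the start and the end of the string.
- Parentheses are used for grouping (`str_match()` extracts the groups).
- `.` means any character, and `.*` means any number (zero or more) of any characters.
- `[^,]` means any character that is not a comma, so `[^,]*` is a run of such characters with no comma in it.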
Put together, the pattern reads: start of string, then "substring without a comma, comma" repeated 3 times, then a substring possibly containing commas, a comma, a substring without a comma, a comma, a substring without a comma, and finally end of string. Only the parenthesized groups are extracted.
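To see why the breakdown above copes with commas inside the description, it may help to run the pattern on a single line using base R only. The sketch below is illustrative: `line`, `groups` and `strict` are names made up here, and `strict` is a deliberately wrong variant used for contrast.

```r
pattern <- "^([^,]*),([^,]*),([^,]*),(.*),([^,]*),([^,]*)$"
line <- "014003065376,03M7386 ,0090,BOLT, RD HD SQ SHORT NECK, METRIC,18.000,G"

# regexec() reports the full match followed by each captured group;
# dropping the first element leaves the six fields.
groups <- regmatches(line, regexec(pattern, line))[[1]][-1]

# The fourth group is the description, embedded commas included.
groups[4]

# Contrast: if the fourth group were also [^,]*, the pattern would allow
# exactly five commas in total, so this line would not match at all.
strict <- "^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)$"
grepl(strict, line)
```

The last two groups are anchored to the end of the string by `$`, so `(.*)` can only take whatever lies between the third comma and the second-to-last comma. This relies on the description being the only field that may contain commas.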
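How did you come by this data? (Saved as CSV from Excel?) The best solution would be to ask for the data to be saved with quoted fields, or in a format that uses a different delimiter. – Benjamin
@Benjamin I thought of that. But unfortunately, this is our only source. – darkage
You could use a regular expression –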