Merging similar descriptions in bank statement output
Below is a sample portion of a bank statement:
Description<-c(
"EXXONMOBIL 46344172 ",
"EXXONMOBIL 97142239 ",
"EXXONMOBIL 97523322 ",
"EXXONMOBIL 99123183 ",
"JIMMY JOHNS - 1236 ",
"JIMMY JOHNS - 2453 ",
"JIMMY JOHNS # 95612 ",
"KWIK FILL 212 ",
"KWIK TRIP 245000",
"KWIK TRIP0002342 ",
"KWIK TRIP 67200003453 ",
"MCDONALD'S F11123 ",
"MCDONALD'S F11234 ",
"MCDONALD'S F25345 ",
"MCDONALD'S F5349 "
)
Debit<-as.numeric(c(
"25.98",
"24.54",
"29.59",
"31.85",
"7.61",
"17.82",
"10.58",
"26.5",
"22.48",
"146.62",
"52.51",
"2.57",
"7.77",
"9.59",
"11.85"
))
df<-data.frame(Description,Debit)
which gives the following output:
Description Debit
EXXONMOBIL 46946182 25.98
EXXONMOBIL 97302509 24.54
EXXONMOBIL 97585822 29.59
EXXONMOBIL 99374183 31.85
JIMMY JOHNS - 1476 7.61
JIMMY JOHNS - 2763 17.82
JIMMY JOHNS # 90012 10.58
KWIK FILL 228 26.5
KWIK TRIP 24500002451 22.48
KWIK TRIP 146.62
KWIK TRIP 67200006726 52.51
MCDONALD'S F11780 2.57
MCDONALD'S F11883 7.77
MCDONALD'S F25398 9.59
MCDONALD'S F4789 11.85
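I am wondering how to aggregate the results by description so that the unique codes are removed and I get the total amount spent at each company, such as Exxonmobil, Jimmy Johns, and so on. I am not sure whether to remove everything after the whitespace, remove all numeric characters, or (what seems to me probably the best option) get rid of all digits and special characters and keep only the letters.
Either way, the desired output would look like this: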
Description Debit
EXXONMOBIL 111.96
JIMMY JOHNS 36.01
KWIK FILL 26.5
KWIK TRIP 221.61
MCDONALD'S 31.78
Any suggestions?
Check out [OpenRefine](http://openrefine.org) –
@BenBolker Thanks. It is not ideal, but it is a nice option because of its clustering feature – Oposum
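One possible approach, sketched in base R, follows the "keep only the letters" idea from the question: replace everything that is not a letter, apostrophe, or space, collapse the leftover whitespace, and then sum `Debit` by the cleaned name. The helper `clean_merchant` and the column name `Merchant` are hypothetical names introduced here for illustration. The last step, dropping a trailing single-letter token (such as the `F` left over from `F11123`), is an assumption about how these codes are formed and may not hold for other statements.

```r
# Sample data from the question (commas added so that c() parses).
Description <- c(
  "EXXONMOBIL 46344172 ", "EXXONMOBIL 97142239 ",
  "EXXONMOBIL 97523322 ", "EXXONMOBIL 99123183 ",
  "JIMMY JOHNS - 1236 ", "JIMMY JOHNS - 2453 ", "JIMMY JOHNS # 95612 ",
  "KWIK FILL 212 ",
  "KWIK TRIP 245000", "KWIK TRIP0002342 ", "KWIK TRIP 67200003453 ",
  "MCDONALD'S F11123 ", "MCDONALD'S F11234 ",
  "MCDONALD'S F25345 ", "MCDONALD'S F5349 "
)
Debit <- c(25.98, 24.54, 29.59, 31.85, 7.61, 17.82, 10.58, 26.5,
           22.48, 146.62, 52.51, 2.57, 7.77, 9.59, 11.85)
df <- data.frame(Description, Debit, stringsAsFactors = FALSE)

# Hypothetical helper: reduce a statement description to a merchant name.
clean_merchant <- function(x) {
  # Replace digits and special characters with spaces,
  # keeping letters, apostrophes, and spaces.
  x <- gsub("[^A-Za-z' ]", " ", x)
  # Collapse runs of spaces and trim both ends.
  x <- trimws(gsub(" +", " ", x))
  # Assumption: a trailing single letter is a leftover code prefix
  # (e.g. the "F" from "F11123"), so drop it.
  sub(" [A-Za-z]$", "", x)
}

df$Merchant <- clean_merchant(df$Description)

# Sum the debits for each cleaned merchant name.
totals <- aggregate(Debit ~ Merchant, data = df, FUN = sum)
print(totals)
```

On the sample data this yields one row each for EXXONMOBIL, JIMMY JOHNS, KWIK FILL, KWIK TRIP, and MCDONALD'S, with totals matching the desired output above. Note that KWIK FILL and KWIK TRIP remain separate because the cleaning is purely textual; merging near-duplicate names would need fuzzy matching or a tool such as OpenRefine.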