2015-06-23 22 views

R - Group specific list items

I have character data that looks like this:

x= c("Clause 1 - AGREEMENT. Buyer agrees to buy, and Seller agrees to sell, the Property described below on the terms and conditions set forth in this contract.", 
     "Clause 2 - Buyer. Buyer, will take title to the Property described below:", 
     "Item 2.1 - Seller. Seller, is the current owner of the Property described below assignable by Buyer without Seller’s prior written consent.", 
     "Clause 3 - Inclusions. The Purchase Price includes the following items: ", 
     "Item 3.1 - Fixtures. If attached to the Property on the date of this Contract, the following items are included unless") 


x[grep("Clause . - ", x)]= c(x[1], paste(x[2], x[3]), paste(x[4], x[5])) 

x= x[grep("Clause . - ", x)] 





> nums <- gsub("^..* (\\d+\\.*\\d*) -..*$", "\\1", x, perl = TRUE) 
> nums 
[1] "1" "2" "2.1" "3" "3.1" 


> nums <- as.integer(nums) 
> nums 
[1] 1 2 2 3 3 


> grouped <- tapply(x, nums, paste, collapse='\n') 
> cat(grouped[2]) 
Clause 2 - Buyer. Buyer, will take title to the Property described below: 
Item 2.1 - Seller. Seller, is the current owner of the Property described below assignable by Buyer without Seller’s prior written consent. 

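Nice! Thanks! This solves 70% of my problem. To solve the rest, would it be possible to build the variable you call nums from the indices returned by `grep("Clause . - ", x)` instead of `gsub`, repeating each number until the next non-missing number is found? – victor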


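Can you post an example where this doesn't work? The pattern I posted matches at least one digit, possibly followed by a period and zero or more further digits. – Zelazny7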
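To illustrate the point raised in the comments, here is a small sketch of what that pattern does on lines with and without a number. The vector `sample_lines` is made up for illustration and is not part of the original data. Lines that do not match are returned unchanged by `gsub`, so `as.integer` turns them into `NA`, which seems to be the "missing number" case mentioned above.

```r
# Pattern copied from the answer above; sample_lines is a hypothetical input.
pattern <- "^..* (\\d+\\.*\\d*) -..*$"
sample_lines <- c("Clause 2 - Buyer", "Item 2.1 - Seller", "I - property")

# gsub() leaves non-matching elements untouched.
extracted <- gsub(pattern, "\\1", sample_lines, perl = TRUE)
print(extracted)

# as.integer() truncates "2.1" to 2 and yields NA for the non-numeric line
# (suppressWarnings() hides the coercion warning for that line).
nums <- suppressWarnings(as.integer(extracted))
print(nums)
```

Because `tapply` drops elements whose group is `NA`, lines such as "I - property" would fall out of the grouped result with this approach.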



> x= c("Clause 1 - AGREEMENT. Buyer agrees to buy", 
     "Item 1.2 - Seller agrees to sell", 
     "Item 1.2 - the Property described below", 
     "Item 1.3 - on the terms and conditions set forth in this contract", 
     "Item 1.4 - If attached to the Property on the date of this Contract", 
     "Item 1.5 - the following items are included:", 
     "I - property", 
     "II - car", 
     "III - motorcycle", 
     "Clause 2 - Buyer, will take title to the Property described below:", 
     "Item 2.1 - Seller. Seller, is the current owner of the Property", 
     "I - this is binding contract", 
     "Item 2.2 - by Buyer without Seller’s prior written consent.", 
     "Clause 3 - The Purchase Price includes the following items", 
     "Clause 4 - property will be transmited", 
     "Clause 5 - as discribed in", 
     "Each party is signing this agreement on the date stated opposite that party’s signature.", 
     "city, date") 


> f= grep("Clause . - ", x) 
> f 
[1] 1 10 14 15 16 


> nums= f 
> for (i in seq_len(length(f) - 1)) { 
+   a= f[i+1]-f[i]-1 # number of lines following clause i that belong to it 
+   nums= c(nums, rep(f[i], times= a)) 
+ } 
> sort(nums) 
[1] 1 1 1 1 1 1 1 1 1 10 10 10 10 14 15 16 


> nums= sort(c(nums, (1+f[length(f)]):length(x))) 
> nums 
[1] 1 1 1 1 1 1 1 1 1 10 10 10 10 14 15 16 17 18 


> grouped <- tapply(x, nums, paste, collapse='\n') 
> cat(grouped[1]) 
Clause 1 - AGREEMENT. Buyer agrees to buy 
Item 1.2 - Seller agrees to sell 
Item 1.2 - the Property described below 
Item 1.3 - on the terms and conditions set forth in this contract 
Item 1.4 - If attached to the Property on the date of this Contract 
Item 1.5 - the following items are included: 
I - property 
II - car 
III - motorcycle
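
The loop above could probably be replaced by a running count of clause lines. The following is only a sketch: it uses a shortened, illustrative version of `x`, and the names `is_clause` and `group_id` are my own. Note one difference from the loop version: trailing lines such as "city, date" end up in the last clause's group instead of getting groups of their own.

```r
# Shortened, illustrative version of the data (not the full vector above).
x <- c("Clause 1 - AGREEMENT. Buyer agrees to buy",
       "Item 1.2 - Seller agrees to sell",
       "I - property",
       "II - car",
       "Clause 2 - Buyer, will take title to the Property described below:",
       "Item 2.1 - Seller. Seller, is the current owner of the Property",
       "Clause 3 - The Purchase Price includes the following items",
       "city, date")

# TRUE for every line that starts a new clause.
is_clause <- grepl("Clause . - ", x)

# The running sum increases by one at each clause line, so every line
# carries the id of the most recent clause seen so far.
group_id <- cumsum(is_clause)

# Same grouping step as in the answers above.
grouped <- tapply(x, group_id, paste, collapse = "\n")
cat(grouped[1])
```

The pattern `"Clause . - "` only matches a single character between "Clause" and the dash, so for clause numbers with two or more digits something like `"Clause \\d+ - "` would likely be needed.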