2015-06-23 22 views
0

R - group specific list items

I have character data that looks like this:

x= c("Clause 1 - AGREEMENT. Buyer agrees to buy, and Seller agrees to sell, the Property described below on the terms and conditions set forth in this contract.", 
     "Clause 2 - Buyer. Buyer, will take title to the Property described below:", 
     "Item 2.1 - Seller. Seller, is the current owner of the Property described below assignable by Buyer without Seller’s prior written consent.", 
     "Clause 3 - Inclusions. The Purchase Price includes the following items: ", 
     "Item 3.1 - Fixtures. If attached to the Property on the date of this Contract, the following items are included unless") 

I am trying to merge every item into the clause it belongs to. Basically, I want it to do this:

x[grep("Clause . - ", x)]= c(x[1], paste(x[2], x[3]), paste(x[4], x[5])) 

x= x[grep("Clause . - ", x)] 

But dynamically. How can I do this without specifying which list items I want to combine? Thank you all.

Answers

1

First, pull out just the numbers:

> nums <- gsub("^..* (\\d+\\.*\\d*) -..*$", "\\1", x, perl = T) 
> nums 
[1] "1" "2" "2.1" "3" "3.1" 

Group them by dropping the decimal part:

> nums <- as.integer(nums) 
> nums 
[1] 1 2 2 3 3 

Iterate over these groups and paste their elements together:

> grouped <- tapply(x, nums, paste, collapse='\n') 
> cat(grouped[2]) 
Clause 2 - Buyer. Buyer, will take title to the Property described below: 
Item 2.1 - Seller. Seller, is the current owner of the Property described below assignable by Buyer without Seller’s prior written consent. 
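To check that this grouping reproduces the vector the question builds by hand, a minimal sketch follows. The strings are shortened, hypothetical stand-ins for the original clauses, and `collapse = " "` is used instead of `'\n'` only so the result can be compared with `paste(x[2], x[3])`.

```r
# Shortened, hypothetical stand-ins for the clauses in the question
x <- c("Clause 1 - AGREEMENT. Buyer agrees to buy.",
       "Clause 2 - Buyer. Buyer will take title to the Property:",
       "Item 2.1 - Seller. Seller is the current owner of the Property.",
       "Clause 3 - Inclusions. The Purchase Price includes:",
       "Item 3.1 - Fixtures. If attached to the Property.")

# Same pattern as in the answer: capture the number in front of " -"
nums <- as.integer(gsub("^..* (\\d+\\.*\\d*) -..*$", "\\1", x, perl = TRUE))

# One string per clause; as.vector() drops the dim and names tapply() adds
combined <- as.vector(tapply(x, nums, paste, collapse = " "))

# The hand-written version from the question
expected <- c(x[1], paste(x[2], x[3]), paste(x[4], x[5]))
identical(combined, expected)
```

Note that `tapply()` orders its result by the sorted group values, so this relies on the clauses already appearing in numeric order.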
+0

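Nice! Thank you! This solves 70% of my problem. To solve the rest, is it possible to use the indices from `grep("Clause . - ", x)` to generate the variable you called nums, instead of `gsub`? That is, rep each index until the next non-missing number is found – victor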

+0

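Can you post an example that does not work? The pattern I posted matches at least one digit, possibly followed by a period and zero or more further digits – Zelazny7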
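One case where the pattern has nothing to capture is a line with no Arabic number at all, such as a Roman-numeral sub-item. The sketch below uses hypothetical, shortened strings to illustrate what seems to happen then: `gsub` returns the line unchanged, `as.integer` turns it into `NA`, and `tapply` silently drops elements whose group is `NA`.

```r
# Hypothetical lines; the third one carries no Arabic number
x <- c("Clause 1 - Agreement",
       "Item 1.1 - Seller agrees to sell",
       "I - property",
       "Clause 2 - Buyer")

# No match on line 3, so gsub() hands the whole string back
raw <- gsub("^..* (\\d+\\.*\\d*) -..*$", "\\1", x, perl = TRUE)

# Non-numeric strings become NA (the coercion warning is silenced here)
nums <- suppressWarnings(as.integer(raw))

# Elements with an NA group do not appear in any group
grouped <- tapply(x, nums, paste, collapse = "\n")
any(grepl("I - property", grouped, fixed = TRUE))
```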

0

I solved my problem by adapting the answer Zelazny7 provided. With the data:

> x= c("Clause 1 - AGREEMENT. Buyer agrees to buy", 
     "Item 1.2 - Seller agrees to sell", 
     "Item 1.2 - the Property described below", 
     "Item 1.3 - on the terms and conditions set forth in this contract", 
     "Item 1.4 - If attached to the Property on the date of this Contract", 
     "Item 1.5 - the following items are included:", 
     "I - property", 
     "II - car", 
     "III - motorcycle", 
     "Clause 2 - Buyer, will take title to the Property described below:", 
     "Item 2.1 - Seller. Seller, is the current owner of the Property", 
     "I - this is binding contract", 
     "Item 2.2 - by Buyer without Seller’s prior written consent.", 
     "Clause 3 - The Purchase Price includes the following items", 
     "Clause 4 - property will be transmited", 
     "Clause 5 - as discribed in", 
     "Each party is signing this agreement on the date stated opposite that party’s signature.", 
     "city, date") 

First, find which elements are clauses:

> f= grep("Clause . - ", x) 
> f 
[1] 1 10 14 15 16 

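Since `rep` alone did not do what I needed here, loop over the clause indices and repeat each clause's index once for every non-clause line that follows it: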

> nums= f 
> for (i in seq_len(length(f) - 1)) { 
+   a= f[i+1]-f[i]-1  # number of non-clause lines that follow clause i
+   nums= c(nums, rep(f[i], times= a))  # repeat the clause index for each of them
+ } 
> sort(nums) 
[1] 1 1 1 1 1 1 1 1 1 10 10 10 10 14 15 16 

Add the indices of everything that comes after the last clause:

> nums= sort(c(nums, (1+f[length(f)]):length(x))) 
> nums 
[1] 1 1 1 1 1 1 1 1 1 10 10 10 10 14 15 16 17 18 
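For comparison, `rep` does accept a vector for `times`, so the loop and the tail step might be collapsed into a single call. This is a minimal sketch on hypothetical, shortened data. It assumes the first element is a clause, and it attaches trailing lines to the last clause, which differs from the output above, where elements 17 and 18 each get a group of their own.

```r
# Hypothetical, shortened data; the first element must be a clause
x <- c("Clause 1 - a", "Item 1.1 - b", "I - c",
       "Clause 2 - d", "Item 2.1 - e",
       "Clause 3 - f", "closing line")

f <- grep("Clause . - ", x)                # positions of the clauses

# Lines covered by each clause: the gap to the next clause,
# or to one past the end of x for the last clause
run_lengths <- diff(c(f, length(x) + 1))

# Repeat each clause position once per line it covers
nums <- rep(f, times = run_lengths)
nums
```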

Finally, group the items into their clauses:

> grouped <- tapply(x, nums, paste, collapse='\n') 
> cat(grouped[1]) 
Clause 1 - AGREEMENT. Buyer agrees to buy 
Item 1.2 - Seller agrees to sell 
Item 1.2 - the Property described below 
Item 1.3 - on the terms and conditions set forth in this contract 
Item 1.4 - If attached to the Property on the date of this Contract 
Item 1.5 - the following items are included: 
I - property 
II - car 
III - motorcycle
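A more compact way to get the same kind of grouping is a running count of clause headers with `cumsum(grepl(...))`. The sketch below is an alternative under stated assumptions, not the method used above: the data is hypothetical, the pattern `^Clause [0-9]+ - ` is my own choice so that clause numbers with several digits also match, and lines before the first clause end up in a group numbered 0.

```r
# Hypothetical data, including a preamble and a two-digit clause number
x <- c("Preamble",
       "Clause 1 - a", "Item 1.1 - b", "I - c",
       "Clause 10 - d", "Item 10.1 - e")

# TRUE wherever a new clause starts
is_clause <- grepl("^Clause [0-9]+ - ", x)

# Running count of clause headers: every line inherits the latest clause
id <- cumsum(is_clause)

# One string per group; as.vector() drops the names tapply() adds
grouped <- as.vector(tapply(x, id, paste, collapse = "\n"))
cat(grouped[2])
```

Unlike the loop above, lines after the last clause are merged into that clause rather than kept as separate groups, so which variant fits depends on how the closing lines should be treated.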