2016-09-16

What is the appropriate Nokogiri XPath to get a range of rows?

I have a table in the following format:

<tr class="style6"><td>SomeStuff</td></tr> 
<tr><td>Some other stuff</td></tr> 
<tr><td>Some other stuff</td></tr> 
<tr><td>Some other stuff</td></tr> 
<tr><td>Some other stuff</td></tr> 
<tr><td>Some other stuff</td></tr> 
<tr class="style6"><td>SomeStuff</td></tr> 
<tr><td>Some other stuff</td></tr> 
<tr><td>Some other stuff</td></tr> 
<tr><td>Some other stuff</td></tr> 
<tr><td>Some other stuff</td></tr> 
<tr><td>Some other stuff</td></tr> 

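I would like to split the rows into blocks, each starting at a row with class style6 and ending at the last row before the next style6 row, so that I can iterate over the groups. Is there a way to split the table into chunks like this? I know about the XPath position function, but I am not sure whether it makes sense in this case.

Any ideas?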

Answer


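A useful pattern is to count the preceding occurrences of <tr class="style6"><td>SomeStuff</td></tr>. Every row in the same group has the same number of style6 rows before it.

For the first group in your example, this would be: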

//tr[not(@class="style6")][count(preceding-sibling::tr[@class="style6"])=1]

For the second group:

//tr[not(@class="style6")][count(preceding-sibling::tr[@class="style6"])=2]

I don't use Nokogiri, so here is an example using Python and lxml:

>>> import lxml.html 
>>> from pprint import pprint 

>>> doc = lxml.html.fromstring('''<tr class="style6"><td>SomeStuff</td></tr> 
... <tr><td>Some other stuff group 1</td></tr> 
... <tr><td>Some other stuff group 1</td></tr> 
... <tr><td>Some other stuff group 1</td></tr> 
... <tr><td>Some other stuff group 1</td></tr> 
... <tr><td>Some other stuff group 1</td></tr> 
... <tr class="style6"><td>SomeStuff</td></tr> 
... <tr><td>Some other stuff group 2</td></tr> 
... <tr><td>Some other stuff group 2</td></tr> 
... <tr><td>Some other stuff group 2</td></tr> 
... <tr><td>Some other stuff group 2</td></tr> 
... <tr><td>Some other stuff group 2</td></tr> 
... <tr class="style6"><td>SomeStuff</td></tr> 
... <tr><td>Some other stuff group 3</td></tr> 
... <tr><td>Some other stuff group 3</td></tr> 
... <tr><td>Some other stuff group 3</td></tr> 
... <tr><td>Some other stuff group 3</td></tr> 
... <tr><td>Some other stuff group 3</td></tr>''') 

>>> pprint(list(lxml.html.tostring(row) 
...   for row in doc.xpath(''' 
...     //tr[not(@class="style6")] 
...      [count(preceding-sibling::tr[@class="style6"])=1]'''))) 
[b'<tr><td>Some other stuff group 1</td></tr>\n', 
b'<tr><td>Some other stuff group 1</td></tr>\n', 
b'<tr><td>Some other stuff group 1</td></tr>\n', 
b'<tr><td>Some other stuff group 1</td></tr>\n', 
b'<tr><td>Some other stuff group 1</td></tr>\n'] 
>>> pprint(list(lxml.html.tostring(row) 
...   for row in doc.xpath(''' 
...     //tr[not(@class="style6")] 
...      [count(preceding-sibling::tr[@class="style6"])=2]'''))) 
[b'<tr><td>Some other stuff group 2</td></tr>\n', 
b'<tr><td>Some other stuff group 2</td></tr>\n', 
b'<tr><td>Some other stuff group 2</td></tr>\n', 
b'<tr><td>Some other stuff group 2</td></tr>\n', 
b'<tr><td>Some other stuff group 2</td></tr>\n'] 
>>> pprint(list(lxml.html.tostring(row) 
...   for row in doc.xpath(''' 
...     //tr[not(@class="style6")] 
...      [count(preceding-sibling::tr[@class="style6"])=3]'''))) 
[b'<tr><td>Some other stuff group 3</td></tr>\n', 
b'<tr><td>Some other stuff group 3</td></tr>\n', 
b'<tr><td>Some other stuff group 3</td></tr>\n', 
b'<tr><td>Some other stuff group 3</td></tr>\n', 
b'<tr><td>Some other stuff group 3</td></tr>'] 
>>> 
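
If the goal is to iterate over all groups without writing one XPath query per group, a single pass over the rows may be simpler. The following is only a minimal sketch of that alternative, not the XPath approach shown above: it uses Python's standard-library xml.etree.ElementTree instead of lxml or Nokogiri, assumes the markup is well-formed and wrapped in a <table> element, and the names HTML and group_rows are hypothetical. Unlike the XPath queries above, each group here includes its leading style6 row, as described in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, wrapped in <table> so that it parses as XML.
HTML = """<table>
<tr class="style6"><td>SomeStuff</td></tr>
<tr><td>Some other stuff group 1</td></tr>
<tr><td>Some other stuff group 1</td></tr>
<tr class="style6"><td>SomeStuff</td></tr>
<tr><td>Some other stuff group 2</td></tr>
</table>"""


def group_rows(table, marker_class="style6"):
    """Split the <tr> children of `table` into lists.

    A new list starts at every row whose class equals `marker_class`.
    Rows that appear before the first marker row form their own list.
    """
    groups = []
    for row in table.findall("tr"):
        if row.get("class") == marker_class or not groups:
            groups.append([])
        groups[-1].append(row)
    return groups


table = ET.fromstring(HTML)
groups = group_rows(table)

# Iterate over the groups, printing the cell text of each row.
for number, rows in enumerate(groups, start=1):
    print(number, [row.findtext("td") for row in rows])
```

The same loop should carry over to Nokogiri or lxml, since it only needs the rows in document order and each row's class attribute.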