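Match a pattern, save it in a variable, and append it to the end of the line using sed/awk/grep

This is a problem I have been struggling with for the past 4 days. I have read tutorials on Google and SO, but none of them helped. I am posting it here as a question so that others can try it and help me solve it. I have already solved it with a crude approach, but I am wondering whether there is a smarter way. There is a file that lists ball bearings and their properties. It looks like this: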
<li class="odd first">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003030&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=1">33030</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003030&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=1&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 59 mm
|<strong>Bore diameter: </strong> 150 mm
|<strong>Outside diameter: </strong> 225 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 2600 r/min
|<strong>Reference speed: </strong> 2000 r/min
</li>
<li class="even ">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310000230&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=2">30230</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310000230&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=2&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 49 mm
|<strong>Bore diameter: </strong> 150 mm
|<strong>Outside diameter: </strong> 270 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 2400 r/min
|<strong>Reference speed: </strong> 1800 r/min
</li>
<li class="odd ">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003024&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=3">33024</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003024&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=3&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 48 mm
|<strong>Bore diameter: </strong> 120 mm
|<strong>Outside diameter: </strong> 180 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 3400 r/min
|<strong>Reference speed: </strong> 2600 r/min
</li>
<li class="even ">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003022&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=4">33022</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003022&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=4&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 47 mm
|<strong>Bore diameter: </strong> 110 mm
|<strong>Outside diameter: </strong> 170 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 3600 r/min
|<strong>Reference speed: </strong> 2600 r/min
</li>
<li class="odd ">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003220&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=5">33220</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003220&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=5&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 63 mm
|<strong>Bore diameter: </strong> 100 mm
|<strong>Outside diameter: </strong> 180 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 3600 r/min
|<strong>Reference speed: </strong> 2400 r/min
</li>
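Now, look at how this HTML renders (rather than at the HTML source itself). I want to parse it and extract a parameter from the href link (in the first entry, the href link contains the prodid parameter, prodid=1310003030). If possible, I would also like to append the whole link to the end of each line.
I want to extract the parameter and append it to the end of each line, so that the entries look like this: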
33030 |Product category: Tapered roller bearings single row |Width: 59 mm |Bore diameter: 150 mm |Outside diameter: 225 mm |Source: - |Limiting speed: 2600 r/min |Reference speed: 2000 r/min | 1310003030
30230 |Product category: Tapered roller bearings single row |Width: 49 mm |Bore diameter: 150 mm |Outside diameter: 270 mm |Source: - |Limiting speed: 2400 r/min |Reference speed: 1800 r/min | 1310000230
33024 |Product category: Tapered roller bearings single row |Width: 48 mm |Bore diameter: 120 mm |Outside diameter: 180 mm |Source: - |Limiting speed: 3400 r/min |Reference speed: 2600 r/min | 1310003024
33022 |Product category: Tapered roller bearings single row |Width: 47 mm |Bore diameter: 110 mm |Outside diameter: 170 mm |Source: - |Limiting speed: 3600 r/min |Reference speed: 2600 r/min | 1310003022
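For reference, the match / save / append idea described above can be sketched line by line, much as one would in awk. This is only a minimal sketch in Python, not the asker's actual solution: the names `flatten_entries`, `PRODID_RE`, `TAG_RE` and `SAMPLE` are made up for illustration, and it assumes every entry starts on a line beginning with `<li` and ends on a line containing only `</li>`, as in the sample above.

```python
import re

# First entry from the sample above, copied verbatim.
SAMPLE = '''<li class="odd first">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003030&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=1">33030</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310003030&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=1&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 59 mm
|<strong>Bore diameter: </strong> 150 mm
|<strong>Outside diameter: </strong> 225 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 2600 r/min
|<strong>Reference speed: </strong> 2000 r/min
</li>
'''

PRODID_RE = re.compile(r'[?&]prodid=(\d+)')  # the pattern to "save in a variable"
TAG_RE = re.compile(r'<[^>]+>')              # crude tag stripper (assumption: no '>' inside attributes)


def flatten_entries(html_text):
    """Return one flattened line per <li> block, with its prodid appended."""
    rows = []
    parts = None   # text fragments of the current entry, or None outside an entry
    prodid = ''    # first prodid seen in the current entry
    for line in html_text.splitlines():
        stripped = line.strip()
        if stripped.startswith('<li'):
            parts, prodid = [], ''
        elif stripped == '</li>':
            if parts:
                rows.append(' '.join(parts) + ' | ' + prodid)
            parts = None
        elif parts is not None:
            if not prodid:
                match = PRODID_RE.search(stripped)
                if match:
                    prodid = match.group(1)
            # Drop the tags and collapse runs of whitespace.
            text = ' '.join(TAG_RE.sub('', stripped).split())
            if text:
                parts.append(text)
    return rows


for row in flatten_entries(SAMPLE):
    print(row)
```

Appending the whole link instead of just the id would only need a second pattern that captures the `href` value. Like any regex-based scrape, this depends on the exact line layout shown above and will break if the markup changes.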
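Scraping HTML into machine-readable output is an exercise in futility. See whether you can get in touch with whoever produces the HTML. – tripleee 2014-12-13 13:22:42
Well, I want to scrape some data from a website for my analysis, so I have no control over the source. – 2014-12-13 14:27:34
Save yourself some time by using the right tool; try `xmllint` for this job – BMW 2014-12-13 21:55:50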
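In the spirit of the comments recommending a real parser over text matching, here is a hedged sketch using Python's standard `html.parser` and `urllib.parse` modules. It is not `xmllint` itself, just the same idea with a different tool. The class name `BearingListParser` and the variable `SAMPLE` are hypothetical, and the sketch assumes each entry is a single, non-nested `<li>` whose first `<a>` carries the `prodid` query parameter.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit

# Second entry from the question's sample, copied verbatim.
SAMPLE = '''<li class="even ">
<a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310000230&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=2">30230</a>
|<strong>Product category: </strong> <a href="/productcatalogue/prodlink.html?lang=en&imperial=false&prodid=1310000230&pubid=21&WT.oss=&WT.z_oss_boost=0&WT.z_oss_ref=ProductSearch&WT.z_oss_rank=2&isTableView=true" class="product-table-link">Tapered roller bearings single row</a>
|<strong>Width: </strong> 49 mm
|<strong>Bore diameter: </strong> 150 mm
|<strong>Outside diameter: </strong> 270 mm
|<strong>Source: </strong> -
|<strong>Limiting speed: </strong> 2400 r/min
|<strong>Reference speed: </strong> 1800 r/min
</li>
'''


class BearingListParser(HTMLParser):
    """Collect (text, prodid, href) for every <li> element fed to the parser."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []
        self._chunks = None  # text chunks of the current <li>, or None outside one
        self._href = None    # href of the first <a> inside the current <li>

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self._chunks, self._href = [], None
        elif tag == 'a' and self._chunks is not None and self._href is None:
            self._href = dict(attrs).get('href')

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == 'li' and self._chunks is not None:
            # Join the text nodes and collapse all whitespace, including newlines.
            text = ' '.join(''.join(self._chunks).split())
            query = parse_qs(urlsplit(self._href or '').query)
            prodid = query.get('prodid', [''])[0]
            self.rows.append((text, prodid, self._href))
            self._chunks = None


parser = BearingListParser()
parser.feed(SAMPLE)
parser.close()
for text, prodid, href in parser.rows:
    print(text + ' | ' + prodid)
```

Because the parser keeps the full `href` alongside the extracted `prodid`, appending the whole link instead is a one-line change in the final loop.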