2017-07-15 34 views
1

I need to extract the data shown in bold from the HTML code below. Can this be done with sed? If not, how should I do it?

Thanks in advance!

+2

I suggest using an XML/HTML parser (xmlstarlet, xmllint ...). – Cyrus

+2

Your snippet is not valid. Could you please post valid input? –

Answers

1

Try awk:

$ cat file 
<div class="name-ad hidden" data-count="91"> 
<div class="name-data-item" data-name="**I NEED TO SCRAPE THIS**" data- 
count="92"> 
<div class="name-data-name">Washington NH</div>     
<div class="name-data-location">Sullivan, Washington, 
NH<br></div><div class="name-data-status">**I NEED TO 
SCRAPE THIS AS WELL**</div> </div> 

$ awk -F\" '/name-data-item/ {print $4}' file 
**I NEED TO SCRAPE THIS** 
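
Since the question asks about sed, the same attribute can also be pulled out with a sed substitution. This is only a minimal sketch: it assumes the `data-name` attribute and its value sit on a single line, and the variable names `line` and `value` are just for illustration.

```shell
# One line of the sample input, stored in a variable for the demo.
line='<div class="name-data-item" data-name="**I NEED TO SCRAPE THIS**" data-count="92">'

# -n suppresses default output; the trailing p prints only lines where the
# substitution matched. \([^"]*\) captures everything up to the closing quote.
value=$(printf '%s\n' "$line" | sed -n 's/.*data-name="\([^"]*\)".*/\1/p')

printf '%s\n' "$value"
```

Like the awk one-liner, this is line-oriented pattern matching and not real HTML parsing, so it is only reasonable for input whose layout you control.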
+0

You missed `**I NEED TO SCRAPE THIS AS WELL**`. –

+0

`awk -F\" '/name-data-(item|status)/ {print $4}' file` should solve both. –
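
One caveat with splitting on `"`: the status text is element content, not a quoted attribute value, and in the sample it wraps onto a second line, so the fourth `"`-separated field may not contain it. A rough alternative is to join the lines first and then use `match()`. This is a sketch under assumptions, not a general HTML parser: the file name `sample.html` and the variable `result` are made up for the demo, and the lines are assumed to carry no trailing whitespace.

```shell
# Recreate the sample input (file name is hypothetical).
cat > sample.html <<'EOF'
<div class="name-ad hidden" data-count="91">
<div class="name-data-item" data-name="**I NEED TO SCRAPE THIS**" data-
count="92">
<div class="name-data-name">Washington NH</div>
<div class="name-data-location">Sullivan, Washington,
NH<br></div><div class="name-data-status">**I NEED TO
SCRAPE THIS AS WELL**</div> </div>
EOF

result=$(awk '
  # Join all lines into one buffer, separated by single spaces.
  { buf = buf (NR > 1 ? " " : "") $0 }
  END {
    # data-name=" is 11 characters; drop it and the closing quote.
    if (match(buf, /data-name="[^"]*"/))
      print substr(buf, RSTART + 11, RLENGTH - 12)
    # class="name-data-status"> is 25 characters; drop it and the next "<".
    if (match(buf, /class="name-data-status">[^<]*</))
      print substr(buf, RSTART + 25, RLENGTH - 26)
  }
' sample.html)

printf '%s\n' "$result"
```

This only picks up the first occurrence of each pattern; for anything beyond a fixed snippet like this, a real parser such as xmlstarlet is the safer choice.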

1

With xmlstarlet and this more valid HTML (file.html):

<html> 
    <body> 
    <div class="name-ad hidden" data-count="91"> 
     <div class="name-data-item" data-name="**I NEED TO SCRAPE THIS**" data-count="92"> 
     <div class="name-data-name">Washington NH</div>     
     <div class="name-data-location">Sullivan, Washington, NH<br /></div> 
     <div class="name-data-status">**I NEED TO SCRAPE THIS AS WELL**</div> 
     </div> 
    </div> 
    </body> 
</html> 

Command:

xmlstarlet sel --html -t \ 
    -v "//html/body/div/div/@data-name" \ 
    -v "//html/body/div/div/div[@class='name-data-status']" file.html 

Output:

 
**I NEED TO SCRAPE THIS****I NEED TO SCRAPE THIS AS WELL** 

Or with a newline between the two values:

xmlstarlet sel --html -t \ 
    -v "//html/body/div/div/@data-name" \ 
    -n \ 
    -v "//html/body/div/div/div[@class='name-data-status']" file.html 

Output:

 
**I NEED TO SCRAPE THIS** 
**I NEED TO SCRAPE THIS AS WELL**