2016-02-20

Sanitize JS scripts inside the script tags of an HTML file in Ruby

I'm trying to extract the content of an HTML file using Ruby (not RoR).

This is what I'm doing:

require 'sanitize' 
require 'nokogiri' 

doc = Nokogiri::HTML(html_document) 
a = Sanitize.fragment(doc.css('body')) 

This extracts the content inside the <body> tag and removes all the HTML tags. Unfortunately, the JS code inside the <script> tags is still there.

How can I remove the JS scripts as well as the HTML tags?

Answer


I'm assuming you are using the latest version of Sanitize:

html = "<html><head><title></title><style>.red{color:red;}</style></head><body><div>... <b>some content</b> ...</div><script>... a script ...</script></body></html>" 

Sanitize.fragment(html, :remove_contents => ['script']) 
# => ".red{color:red;} ... some content ... " 

Sanitize.fragment(html, :remove_contents => ['script', 'style']) 
# => " ... some content ... " 

See the :remove_contents option in the Sanitize documentation.