2012-01-02 75 views
0

I have a text file. How do I search for and match content in the file?

<table style="background-color: #f3f3f3; font-family: Arial; font-size: 8pt; border-top: #e7e7e7 5px solid" border="0" cellspacing="0" cellpadding="0"> 
    <tbody> 
<tr> 
<td style="padding-bottom: 20px; padding-left: 20px; padding-right: 20px; padding-top: 20px"> 
<p style="color: #b0b0b0"><font color="#808080" size="1"><strong>Important information</strong>: on this communication as it does not purport to be comprehensive. This disclaimer does not purport to exclude any warranties implied by law which may not be lawfully excluded. We have taken precautions to minimise the risk of transmitting software viruses, but we advise you to carry out your own virus checks on any attachment to this e-mail. We cannot accept liability for any loss or damage caused by software </p> 

This is not a dump of a website; it is what the application writes to the file.

The method I use to check the text file looks like this:

def check_email_exists(firstname, email_sub, search_string)
  email_fldr = "C:\\Agent\\TestMailFolder"
  email_id = "[email protected]"
  Dir.chdir("#{email_fldr}\\#{firstname}") do
    Dir.glob("#{email_id}*#{email_sub}*") do |filename|
      File.open(filename) do |file|
        file.readlines(filename).index("#{search_string}")
      end
    end
  end
end

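This doesn't work.

The value I pass as search_string is a string. For example, I am trying to check whether string = "transmitting software" is in the file. I am also checking whether the file contains some random string that does not exist. If the value is found and matched in the file, the check should pass; if not, it should fail.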

Answers

0

Your file contains HTML. For more than 90% of applications that involve HTML, you should use a parser. I recommend Nokogiri:

require 'nokogiri' 

html = <<EOT 
<table style="background-color: #f3f3f3; font-family: Arial; font-size: 8pt; border-top: #e7e7e7 5px solid" border="0" cellspacing="0" cellpadding="0"> 
    <tbody> 
<tr> 
<td style="padding-bottom: 20px; padding-left: 20px; padding-right: 20px; padding-top: 20px"> 
<p style="color: #b0b0b0"><font color="#808080" size="1"><strong>Important information</strong>: on this communication as it does not purport to be comprehensive. This disclaimer does not purport to exclude any warranties implied by law which may not be lawfully excluded. We have taken precautions to minimise the risk of transmitting software viruses, but we advise you to carry out your own virus checks on any attachment to this e-mail. We cannot accept liability for any loss or damage caused by software </p> 
EOT 

doc = Nokogiri::HTML::DocumentFragment.parse(html) 

content = doc.content 

puts content 

This outputs:

Important information: on this communication as it does not purport to be comprehensive. This disclaimer does not purport to exclude any warranties implied by law which may not be lawfully excluded. We have taken precautions to minimise the risk of transmitting software viruses, but we advise you to carry out your own virus checks on any attachment to this e-mail. We cannot accept liability for any loss or damage caused by software 

If you want to see whether the result contains the string "transmitting software", try this:

puts "contains transmitting software" if content['transmitting software']
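
As a side note on why the method in the question returns nothing useful: `file.readlines(filename)` passes the file name as the line separator rather than as a path, `Array#index` only matches an element that equals the whole string (not a substring), and `Dir.glob` with a block returns `nil`. The following is a minimal sketch of a substring check using only the standard library. The helper name `any_file_contains?`, its parameters, and the sample file are hypothetical and not part of the original post.

```ruby
require 'tmpdir'

# Hypothetical helper (not from the original post): returns true when any file
# matching glob_pattern inside dir contains search_string as a substring.
# An optional block can transform the raw file contents first, for example to
# strip HTML markup before searching.
def any_file_contains?(dir, glob_pattern, search_string)
  Dir.glob(File.join(dir, glob_pattern)).any? do |filename|
    raw = File.read(filename)
    text = block_given? ? yield(raw) : raw
    text.include?(search_string)
  end
end

# Small demonstration against a throwaway directory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'sample_mail.txt'),
             '<p>the risk of transmitting software viruses</p>')

  puts any_file_contains?(dir, '*mail*', 'transmitting software')  # prints true
  puts any_file_contains?(dir, '*mail*', 'no such text here')      # prints false
end
```

If the search text may be split by tags, one option is to pass a block that extracts plain text first, for example `{ |raw| Nokogiri::HTML::DocumentFragment.parse(raw).content }`, which assumes the nokogiri gem is installed and required. Because the helper returns true or false, the result can be asserted directly in a test step.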

Thank you for your reply; I get the idea of using Nokogiri. – user1126946 2012-01-04 01:11:22


I didn't mention earlier that I am using Cucumber to test these scenarios. – user1126946 2012-01-04 01:12:12