How do I access this node using Nokogiri?
Here is the start of my HTML:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="Generator" content="Microsoft Word 12 (filtered medium)">
<!--[if !mso]><style>v\\:* {behavior:url(#default#VML);}\no\\:* {behavior:url(#default#VML);}\nw\\:* {behavior:url(#default#VML);}\n.shape {behavior:url(#default#VML);}\n</style><![endif]--><style><!--\n/* Font Definitions */\n@font-face\n\t{font-family:"Cambria Math";\n\tpanose-1:2 4 5 3 5 4 6 3 2 4;}\n@font-face\n\t{font-family:Calibri;\n\tpanose-1:2 15 5 2 2 2 4 3 2 4;}\n@font-face\n\t{font-family:Tahoma;\n\tpanose-1:2 11 6 4 3 5 4 4 2 4;}\n/* Style Definitions */\np.MsoNormal, li.MsoNormal, div.MsoNormal\n\t{margin:0in;\n\tmargin-bottom:.0001pt;\n\tfont-size:12.0pt;\n\tfont-family:"Times New Roman","serif";}\na:link, span.MsoHyperlink\n\t{mso-style-priority:99;\n\tcolor:blue;\n\ttext-decoration:underline;}\na:visited, span.MsoHyperlinkFollowed\n\t{mso-style-priority:99;\n\tcolor:purple;\n\ttext-decoration:underline;}\np\n\t{mso-style-priority:99;\n\tmso-margin-top-alt:auto;\n\tmargin-right:0in;\n\tmso-margin-bottom-alt:auto;\n\tmargin-left:0in;\n\tfont-size:12.0pt;\n\tfont-family:"Times New Roman","serif";}\nspan.EmailStyle18\n\t{mso-style-type:personal-reply;\n\tfont-family:"Calibri","sans-serif";\n\tcolor:#1F497D;}\n.MsoChpDefault\n\t{mso-style-type:export-only;\n\tfont-size:10.0pt;}\n@page WordSection1\n\t{size:8.5in 11.0in;\n\tmargin:1.0in 1.0in 1.0in 1.0in;}\ndiv.WordSection1\n\t{page:WordSection1;}\n--> </style>
<!--[if gte mso 9]><xml>\n<o:shapedefaults v:ext="edit" spidmax="1026" />\n</xml><![endif]--> <!--[if gte mso 9]> <xml>\n<o:shapelayout v:ext="edit">\n<o:idmap v:ext="edit" data="1"/>\n</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><p> </p></span></p>
<p class="MsoNormal"><a name="_MailEndCompose"><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><p> </p></span></a></p>
<div><div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in"><p class="MsoNormal"><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> EMAIL SENDER NAME [mailto:[email protected]] <br><b>Sent:</b>!! DATE I NEED TO GRAB HERE !! <br><b>To:</b> EMAIL ADDRESS HERE <br><b>Subject:</b> SUBJECT LINE HERE <p></p></span></p></div></div>
I need to get the date the email was sent. Here is what I have tried:
label_tag_name = 'div div p span br b'
if label_tag = @doc.at_css(%Q{#{label_tag_name}:contains("#{label}:")})
  @attributes[field] = label_tag.text.gsub("#{label}:", '').gsub("\n", "").strip
end
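I have also tried a few shorter paths in label_tag_name, basically adding one more HTML tag to the start each time. Every time, though, the sent date comes back nil.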
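When you add sample data, please strip it down to the bare minimum needed for the example. Anything more wastes the time of the people answering, who have to wade through the unnecessary parts. –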