2014-07-17 53 views
0

I want to use Nokogiri to find the XML content in a file like this one and extract data from it:

From: XXX <[email protected]> 
To: [email protected] 
Subject: Sabertooth; zebra oto Hammerjaw pompano, cusk-eel lighthousefish frogmouth catfish. 

----- BEGIN PGP SIGNED MESSAGE ----- 
Hash: SHA1 

Dear [email protected]: 

Sabertooth; zebra oto Hammerjaw pompano, cusk-eel lighthousefish frogmouth catfish. "Smalleye squaretail antenna codlet dartfish peacock flounder plaice, luminous hake oceanic flyingfish tiger shark, bramble shark, California halibut. Australian prowfish lake chub knifefish African lungfish; southern Dolly Varden pike conger. Gouramie glass catfish loosejaw, three-toothed puffer. Nase ridgehead featherfin knifefish Rattail gulper false brotula Atlantic eel zebra oto. Marlin mahi-mahi freshwater eel false brotula mojarra naked-back knifefish Steve fish bocaccio. Amago kanyu algae eater bullhead shark orangespine unicorn fish bangus, "Pacific cod zander banjo catfish half-gill pejerrey Indian mul." 
<? xml version = "1.0" encoding = "UTF-8"?> 
<Case> 
   <ID> 48456856568 </ ID> 
   <Status> Open </ Status> 
   <Severity> Normal </ Severity> 
</ Case> 
<Complainant> 
   <Entity> Sabertooth </ Entity> 
   <Contact> California halibut </ Contact> 
   <Address> Pacific cod zander banjo catfish half-gill pejerrey Indian mul. </ Address> 
   <phone> +1 (352) 584 8413 </ Phone> 
   <Email> [email protected] </ Email> 
</ Complainant> 
<Service_Provider> 
   <Entity> Hammerjaw pompano </ Entity> 
   <Contact/> 
   <Address/> 
   <Phone/> 
   <Email> [email protected] </ Email> 
</ Service_Provider> 
<Source> 
   <TimeStamp> 2012-12-30T14: 24:05 Z </ TimeStamp> 
   <IP_Address> 158.01.52.23 </ IP_Address> 
   <Port> 8080 </ Port> 
   <Type> Browser </ Type> 
   <Protocol="IP"/> 
   <UserName/> 
   <Number_Files> 5 </ Number_Files> 
</ Source> 
<Content> 
   <Item> 
   <TimeStamp> 2012-12-30T14: 24:05 Z </ TimeStamp> 
    <Title> Dolly Varden pike conger </ Title> 
    <FileName> Dolly Varden pike conger </ FileName> 
    <FileSize> 2143534544 </ FileSize> 
    <InfoHash> 67asdv6a6sdv7d7sfb3c32da79dcc9a6cdc70 </ InfoHash> 
   </ Item> 
</ Content> 
<History/> 
<Notes/> 
<Type Retraction="false"/> 
<Verification/> 
</ Infringement> 

----- BEGIN PGP SIGNATURE ----- 
Version: GnuPG 

0zjdfbkHGBVJKhdbvskjdvbhBHSDJvhbvEtqs/WYMcIAL1 +4 ufOjdvXiDLcN1PzM/QJ 
IIj9KCq +/PYuMU6fTd800EOcbRX43RgeX6Qrgu + MDdDbte + CwKZL2Q28IZ0Viv +8 
YItYXdgwhNnUO2QE7jn/g5KXn4v72QqpnsPJjWQVVD12 + h6DDUdaQHMsTdYyYIVD 
Jkc8dPDVTLutVnuK2HZ4wQWRoiIWIMsUzePUht0eWi7DJFOlC5NuwS + E6FuxtgFj 
IwJyCr/dLC/u6YtVCAb37UUSu7k3F5iD3hFTt1RyswK7HBDizV1CHIlc2diARfkL 
CwRpYc/SlpZNgbAXaUzwHhtIQjCuRXQGsXtvDFke4CvM9nGe6Uk095yVOAKla1Y = 
= mVny 
----- END PGP SIGNATURE ----- 

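I need information such as the source IP, which is at /Source/IP_Address, the sender's email, which is in the Address/Email fields, and the fields at the beginning of the letter, i.e. in the letter itself. How can I do this in Ruby with Nokogiri?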

I tried to get the IP address like this:

def ip_address
  ip = Nokogiri::XML("mail/*.txt")
  ip.each { |node|
    p node.inner_xml if node.name == "IP_Address"
  }
end

But nothing comes out. Does anyone have an idea how to get data out of this kind of file?

+0

Did you cut the XML out first, or did you pass the whole email to Nokogiri? – mudasobwa

+0

@mudasobwa No, I thought Nokogiri would recognize the XML in the file – D7na

+0

Well, it doesn't. – mudasobwa

Answers

0

Nokogiri doesn't parse email, so you have to get rid of the non-XML content first:

message = 'From: XXX <[email protected]> 
To: [email protected] 
Subject: Sabertooth; zebra oto Hammerjaw pompano, cusk-eel lighthousefish frogmouth catfish. 

----- BEGIN PGP SIGNED MESSAGE ----- 
Hash: SHA1 

Dear [email protected]: 

Sabertooth; zebra oto Hammerjaw pompano, cusk-eel lighthousefish frogmouth catfish. "Smalleye squaretail antenna codlet dartfish peacock flounder plaice, luminous hake oceanic flyingfish tiger shark, bramble shark, California halibut. Australian prowfish lake chub knifefish African lungfish; southern Dolly Varden pike conger. Gouramie glass catfish loosejaw, three-toothed puffer. Nase ridgehead featherfin knifefish Rattail gulper false brotula Atlantic eel zebra oto. Marlin mahi-mahi freshwater eel false brotula mojarra naked-back knifefish Steve fish bocaccio. Amago kanyu algae eater bullhead shark orangespine unicorn fish bangus, "Pacific cod zander banjo catfish half-gill pejerrey Indian mul." 
<? xml version = "1.0" encoding = "UTF-8"?> 
<Case> 
    <ID> 48456856568 </ ID> 
    <Status> Open </ Status> 
    <Severity> Normal </ Severity> 
</ Case> 
<Complainant> 
    <Entity> Sabertooth </ Entity> 
    <Contact> California halibut </ Contact> 
    <Address> Pacific cod zander banjo catfish half-gill pejerrey Indian mul. </ Address> 
    <phone> +1 (352) 584 8413 </ Phone> 
    <Email> [email protected] </ Email> 
</ Complainant> 
<Service_Provider> 
    <Entity> Hammerjaw pompano </ Entity> 
    <Contact/> 
    <Address/> 
    <Phone/> 
    <Email> [email protected] </ Email> 
</ Service_Provider> 
<Source> 
    <TimeStamp> 2012-12-30T14: 24:05 Z </ TimeStamp> 
    <IP_Address> 158.01.52.23 </ IP_Address> 
    <Port> 8080 </ Port> 
    <Type> Browser </ Type> 
    <Protocol="IP"/> 
    <UserName/> 
    <Number_Files> 5 </ Number_Files> 
</ Source> 
<Content> 
    <Item> 
    <TimeStamp> 2012-12-30T14: 24:05 Z </ TimeStamp> 
    <Title> Dolly Varden pike conger </ Title> 
    <FileName> Dolly Varden pike conger </ FileName> 
    <FileSize> 2143534544 </ FileSize> 
    <InfoHash> 67asdv6a6sdv7d7sfb3c32da79dcc9a6cdc70 </ InfoHash> 
    </ Item> 
</ Content> 
<History/> 
<Notes/> 
<Type Retraction="false"/> 
<Verification/> 
</ Infringement> 

----- BEGIN PGP SIGNATURE ----- 
Version: GnuPG 

0zjdfbkHGBVJKhdbvskjdvbhBHSDJvhbvEtqs/WYMcIAL1 +4 ufOjdvXiDLcN1PzM/QJ 
IIj9KCq +/PYuMU6fTd800EOcbRX43RgeX6Qrgu + MDdDbte + CwKZL2Q28IZ0Viv +8 
YItYXdgwhNnUO2QE7jn/g5KXn4v72QqpnsPJjWQVVD12 + h6DDUdaQHMsTdYyYIVD 
Jkc8dPDVTLutVnuK2HZ4wQWRoiIWIMsUzePUht0eWi7DJFOlC5NuwS + E6FuxtgFj 
IwJyCr/dLC/u6YtVCAb37UUSu7k3F5iD3hFTt1RyswK7HBDizV1CHIlc2diARfkL 
CwRpYc/SlpZNgbAXaUzwHhtIQjCuRXQGsXtvDFke4CvM9nGe6Uk095yVOAKla1Y = 
= mVny 
----- END PGP SIGNATURE ----- 
' 

Here is how to cut the message down to its XML:

require 'nokogiri' 
xml = message[/(<\? xml .+)----- BEGIN/m, 1] 
doc = Nokogiri::XML::DocumentFragment.parse(xml) 
doc.at('IP_Address').text # => " 158.01.52.23 " 

The magic is:

xml = message[/(<\? xml .+)----- BEGIN/m, 1] 

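which grabs everything from <? xml up to, but not including, the last ----- BEGIN marker (the PGP signature line). Nokogiri::XML::DocumentFragment.parse can then build a searchable DOM from that text.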
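To see why the greedy .+ runs to the signature rather than stopping at the first ----- BEGIN line, here is a small sketch on a shortened, made-up message (the sample text and variable names are illustrative, not from the original post). With the m flag, . also matches newlines, so .+ consumes the rest of the string and backtracks to the last ----- BEGIN:

```ruby
# Shortened, hypothetical stand-in for the signed email above.
sample = <<~MAIL
  ----- BEGIN PGP SIGNED MESSAGE -----
  Dear user:
  <? xml version = "1.0" encoding = "UTF-8"?>
  <Source>
    <IP_Address> 158.01.52.23 </ IP_Address>
  </ Source>

  ----- BEGIN PGP SIGNATURE -----
  = mVny
  ----- END PGP SIGNATURE -----
MAIL

# String#[] with a regexp and a capture index returns capture group 1.
# The /m flag lets "." match newlines, so ".+" can span several lines.
xml = sample[/(<\? xml .+)----- BEGIN/m, 1]

puts xml
p xml.include?('PGP')   # => false, neither PGP marker ends up in the capture
```

The match can only start at <? xml, so the PGP SIGNED MESSAGE header before it is never part of the capture.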

0

Since you seem to be hunting only for the IP address, I would forget about Nokogiri:

puts $~[1] if s =~ /<IP_Address>\s*([\d.]+)\s*<\/\s*IP_Address>/m 

will do the trick, assuming the file content has been loaded into s:

s = File.read(...) 

Hope it helps.
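The attempt in the question passed the pattern "mail/*.txt" straight to the parser, which treats it as text to parse rather than as a list of files. A minimal sketch of the regex approach applied to every matching file, assuming the mails live under a mail/ directory as in the question (ip_addresses and IP_PATTERN are hypothetical names introduced here):

```ruby
# Matches the IP_Address element, tolerating the stray spaces seen in the sample.
IP_PATTERN = /<IP_Address>\s*([\d.]+)\s*<\/\s*IP_Address>/m

# Hypothetical helper: returns { path => ip } for every file matching the
# glob; the value is nil when a file has no IP_Address element.
def ip_addresses(glob)
  Dir.glob(glob).sort.map do |path|
    # String#[] with a capture index returns group 1, or nil on no match.
    [path, File.read(path)[IP_PATTERN, 1]]
  end.to_h
end

ip_addresses('mail/*.txt').each do |path, ip|
  puts "#{path}: #{ip || 'no IP_Address found'}"
end
```

Dir.glob expands the wildcard into real paths, and File.read loads each file as a string, which is what the regex needs.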

UPD: to grep out the XML:

xml = $~[1] if s =~ /(<\?\s*xml.*?Infringement>)/m 
+0

def ip s = Nokogiri::XML("mail/*.txt") if s =~ /\s*([\d.]+)\s*/m end returns nil – D7na

+0

Please forget Nokogiri. If your file is 'a.txt', just run 'puts $~[1] if File.read('a.txt') =~ /<IP_Address>\s*([\d.]+)\s*<\/\s*IP_Address>/m' – mudasobwa

+0

How do I strip everything before <?xml and after the closing tag, and then parse the rest with Nokogiri? – D7na