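Grab only the entries that have a specific attribute value on a specific node

I am using an Xml::Parser class built on top of Nokogiri::XML::Reader to extract entries from an XML file. I want to grab only the tags matching "Property/PropertyID/Identification['OrganizationName'=='northsteppe']", but I can't figure out the correct syntax to do this. Below is the entire rake task I have been building, followed by a sample node with all of its information and tags. Any guidance would be much appreciated.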
================ UPDATE ===============
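In production the file I am parsing is pulled in with open-uri, since it comes from an external source. On my local machine I am just using a hard copy of an older version to speed things up during development, because the file is 300MB+. I tried using a SAX parser, but the logic seemed too convoluted for me to really grasp what was going on, and I ran into the same problem: restricting the results to only those entries that have 'northsteppe' as the OrganizationName in the Identification tag. That said, I chose to attempt the same task with the current approach, and I am able to grab almost all of the information I need. I am only missing the last piece mentioned above.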
=============== BEING AS SPECIFIC AS POSSIBLE ===============
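So, I feel that describing the exact task I am trying to perform will help fill in any gaps. The task is as follows.
Grab every Property from the XML file that has OrganizationName = 'northsteppe' in its <Identification> tag, then collect all of the corresponding information for each of those properties and insert it into a hash. After all the information for a single property has been collected and placed in that hash, it needs to be uploaded to the database as an individual entry; the database is already structured the way it needs to be. Once that property has been inserted into the database, the rake task moves on to the next Property entry that has OrganizationName = 'northsteppe' in its <Identification> tag and repeats the process, until every property meeting that specification has been inserted into the database. The purpose of this is so that I can quickly search the data for Northsteppe properties without bogging the system down with every property in the XML file.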
Eventually, I will use open-uri to pull the file from its external source, and run a cron job that executes this rake task every 6 hours and replaces the database.
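For the switch from the local copy to the remote feed, a small helper along these lines might keep the rest of the task unchanged. This is only a sketch: `with_feed` is a made-up name, and the cron line in the comment uses a placeholder application path. Note that since Ruby 3.0 a plain `open` no longer fetches URLs even with open-uri loaded, so the remote branch calls `URI.open` explicitly.

```ruby
require 'open-uri'

# Yields an IO for the feed. Remote http(s) URLs go through open-uri,
# anything else is treated as a local file path (handy in development).
# (Hypothetical helper; the name is not from the question.)
def with_feed(source, &block)
  if source.match?(%r{\Ahttps?://})
    URI.open(source, &block)
  else
    File.open(source, &block)
  end
end

# A crontab entry for "every 6 hours" could look like this
# (/path/to/app is a placeholder):
#   0 */6 * * * cd /path/to/app && bundle exec rake db:print_properties
```

The yielded IO can be passed straight to `Nokogiri::XML::Reader`, e.g. `with_feed('app/assets/xml/mits.xml') { |io| Nokogiri::XML::Reader(io).each { |node| ... } }`.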
================= CODE =================
namespace :db do
  # RAKE TASK DESCRIPTION
  desc "Fetch property information and insert it into the database"
  # RAKE TASK NAME
  task :print_properties => :environment do
    require 'rubygems'
    require 'nokogiri'

    module Xml
      class Parser
        def initialize(node, &block)
          @node = node
          @node.each do
            instance_eval(&block)
          end
        end

        def name
          @node.name
        end

        def inner_xml
          @node.inner_xml.strip
        end

        def is_start?
          @node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
        end

        def is_end?
          @node.node_type == Nokogiri::XML::Reader::TYPE_END_ELEMENT
        end

        def attribute(attribute)
          @node.attribute(attribute)
        end

        def for_element(name, &block)
          return unless self.name == name && is_start?
          instance_eval(&block)
        end

        def inside_element(name=nil, &block)
          return if @node.self_closing?
          return unless name.nil? || (self.name == name && is_start?)
          name = @node.name
          depth = @node.depth
          @node.each do
            return if self.name == name && is_end? && @node.depth == depth
            instance_eval(&block)
          end
        end
      end
    end
    Xml::Parser.new(Nokogiri::XML::Reader(open("app/assets/xml/mits.xml"))) do
      inside_element 'Property' do
        # OPEN AND PARSE THE <PropertyID> TAG
        inside_element 'PropertyID' do
          inside_element 'Identification' do
            puts attribute_nodes()
          end
          # OPEN AND PARSE THE <Address> TAG
          inside_element 'Address' do
            for_element 'AddressLine1' do puts "Street Address: #{inner_xml}" end
            for_element 'City' do puts "City: #{inner_xml}" end
            for_element 'PostalCode' do puts "Zipcode: #{inner_xml}" end
          end
          for_element 'MarketingName' do puts "Short Description: #{inner_xml}" end
        end
        # OPEN AND PARSE THE <Information> TAG
        inside_element 'Information' do
          for_element 'LongDescription' do puts "Long Description: #{inner_xml}" end
          inside_element 'Rents' do
            for_element 'StandardRent' do puts "Rent: #{inner_xml}" end
          end
        end
        inside_element 'Fee' do
          for_element 'ApplicationFee' do puts "Application Fee: #{inner_xml}" end
        end
        inside_element 'ILS_Identification' do
          for_element 'Latitude' do puts "Latitude: #{inner_xml}" end
          for_element 'Longitude' do puts "Longitude: #{inner_xml}" end
        end
      end
    end
  end #END INSERT_PROPERTIES TASK
end #END NAMESPACE
And here is a sample of the XML:
<Property IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
  <PropertyID>
    <Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="northsteppe" IDType="property"/>
    <Identification IDValue="6e1e61523972d5f0e260e3d38eb488337424f21e" OrganizationName="northsteppe" IDType="Company"/>
    <MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
    <WebSite>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</WebSite>
    <Address AddressType="property">
      <Description>Address of Available Listing</Description>
      <AddressLine1>1689 N 4th St </AddressLine1>
      <City>Columbus</City>
      <State>OH</State>
      <PostalCode>43201</PostalCode>
      <Country>US</Country>
    </Address>
    <Phone PhoneType="office">
      <PhoneNumber>(614) 299-4110</PhoneNumber>
    </Phone>
    <Email>[email protected]</Email>
  </PropertyID>
  <ILS_Identification ILS_IdentificationType="Apartment" RentalType="Market Rate">
    <Latitude>39.997694</Latitude>
    <Longitude>-82.99903</Longitude>
    <LastUpdate Month="11" Day="11" Year="2013"/>
  </ILS_Identification>
  <Information>
    <StructureType>Standard</StructureType>
    <UnitCount>1</UnitCount>
    <ShortDescription>Spacious House Central Campus OSU, available fall</ShortDescription>
    <LongDescription>One of our favorites! This great house is perfect for students or a single family. With huge living and sleeping rooms, there is plenty of space. The kitchen is totally modernized with new appliances, and the bathroom has been updated. Natural woodwork and brick accents are seen within the house, and the decorative mantles. Ceiling fans and mini-blinds are included, as well as a FREE stack washer and dryer. The front and side deck. On site parking available.</LongDescription>
    <Rents>
      <StandardRent>2000.00</StandardRent>
    </Rents>
    <PropertyAvailabilityURL>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</PropertyAvailabilityURL>
  </Information>
  <Fee>
    <ProrateType>Standard</ProrateType>
    <LateType>Standard</LateType>
    <LatePercent>0</LatePercent>
    <LateMinFee>0</LateMinFee>
    <LateFeePerDay>0</LateFeePerDay>
    <NonRefundableHoldFee>0</NonRefundableHoldFee>
    <AdminFee>0</AdminFee>
    <ApplicationFee>30.00</ApplicationFee>
    <BrokerFee>0</BrokerFee>
  </Fee>
  <Deposit DepositType="Security Deposit">
    <Amount AmountType="Actual">
      <ValueRange Exact="2000.00" Currency="USD"/>
    </Amount>
  </Deposit>
  <Policy>
    <Pet Allowed="false"/>
  </Policy>
  <Phase IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Name/>
    <Description/>
    <UnitCount>1</UnitCount>
    <RentableUnits>1</RentableUnits>
    <TotalSquareFeet>0</TotalSquareFeet>
    <RentableSquareFeet>0</RentableSquareFeet>
  </Phase>
  <Building IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Name/>
    <Description/>
    <UnitCount>1</UnitCount>
    <SquareFeet>0</SquareFeet>
  </Building>
  <Floorplan IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Name/>
    <UnitCount>1</UnitCount>
    <Room RoomType="Bedroom">
      <Count>4</Count>
      <Comment/>
    </Room>
    <Room RoomType="Bathroom">
      <Count>1</Count>
      <Comment/>
    </Room>
    <SquareFeet Min="0" Max="0"/>
    <MarketRent Min="2000" Max="2000"/>
    <EffectiveRent Min="2000" Max="2000"/>
  </Floorplan>
  <ILS_Unit IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Units>
      <Unit>
        <Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="UL Portfolio"/>
        <MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
        <UnitBedrooms>4</UnitBedrooms>
        <UnitBathrooms>1.0</UnitBathrooms>
        <MinSquareFeet>0</MinSquareFeet>
        <MaxSquareFeet>0</MaxSquareFeet>
        <SquareFootType>internal</SquareFootType>
        <UnitRent>2000.00</UnitRent>
        <MarketRent>2000.00</MarketRent>
        <Address AddressType="property">
          <AddressLine1>1689 N 4th St </AddressLine1>
          <City>Columbus</City>
          <PostalCode>43201</PostalCode>
          <Country>US</Country>
        </Address>
      </Unit>
    </Units>
    <Availability>
      <VacateDate Month="7" Day="23" Year="2014"/>
      <VacancyClass>Unoccupied</VacancyClass>
      <MadeReadyDate Month="7" Day="23" Year="2014"/>
    </Availability>
    <Amenity AmenityType="Other">
      <Description>All new stainless steel appliances! Refinished hardwood floors</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>Ceramic tile</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>Ceiling fans</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>Wrap-around porch</Description>
    </Amenity>
    <Amenity AmenityType="Dryer">
      <Description>Free Washer and Dryer</Description>
    </Amenity>
    <Amenity AmenityType="Washer">
      <Description>Free Washer and Dryer</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>off-street parking available</Description>
    </Amenity>
  </ILS_Unit>
  <File Active="true" FileID="820982141">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/31077069-6e81-4373-8a89-508c57585543/medium.jpg</Src>
    <Width>360</Width>
    <Height>300</Height>
    <Rank>1</Rank>
  </File>
  <File Active="true" FileID="820982145">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/84e1be40-96fd-4717-b75d-09b39231a762/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>2</Rank>
  </File>
  <File Active="true" FileID="820982149">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/cd419635-c37f-4676-a43e-c72671a2a748/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>3</Rank>
  </File>
  <File Active="true" FileID="820982152">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/6b68dbd5-2cde-477c-99d7-3ca33f03cce8/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>4</Rank>
  </File>
  <File Active="true" FileID="820982155">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/17b6c7c0-686c-4e46-865b-11d80744354a/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>5</Rank>
  </File>
  <File Active="true" FileID="820982157">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/3545ac8b-471f-404a-94b2-fcd00dd16e25/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>6</Rank>
  </File>
  <File Active="true" FileID="820982160">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/02471172-2183-4bf1-a3d7-33415f902c1c/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>7</Rank>
  </File>
</Property>
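http://amolnpujari.wordpress.com/2012/03/31/reading_huge_xml-rb/ I also found that Ox is about 5 times faster than Nokogiri when reading large XML. In addition, I have a wrapper that lets you search large XML with Ox and iterate over the elements you specify: https://gist.github.com/amolpujari/5966431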