I want to parse a very large file (240 MB), and I have to use SAX so the whole file is not loaded into memory. How do I use SAX with Nokogiri?
My XML looks like this:
<?xml version="1.0" encoding="utf-8"?>
<hotels>
<hotel>
<hotelId>1568054</hotelId>
<hotelFileName>Der_Obere_Wirt_zum_Queri</hotelFileName>
<hotelName>"Der Obere Wirt" zum Queri</hotelName>
<rating>3</rating>
<cityId>34633</cityId>
<cityFileName>Andechs</cityFileName>
<cityName>Andechs</cityName>
<stateId>212</stateId>
<stateFileName>Bavaria</stateFileName>
<stateName>Bavaria</stateName>
<countryCode>DE</countryCode>
<countryFileName>Germany</countryFileName>
<countryName>Germany</countryName>
<imageId>51498149</imageId>
<Address>Georg Queri Ring 9</Address>
<minRate>85.9800</minRate>
<currencyCode>EUR</currencyCode>
<Latitude>48.009423000000</Latitude>
<Longitude>11.214504000000</Longitude>
<NumberOfReviews>16</NumberOfReviews>
<ConsumerRating>4.25</ConsumerRating>
<PropertyType>0</PropertyType>
<ChainID>0</ChainID>
<Facilities>1|3|5|8|22|27|45|49|53|56|64|66|67|139|202|209|213|256|</Facilities>
</hotel>
<hotel>
<hotelId>1658359</hotelId>
<hotelFileName>Seclusions_of_Yallingup</hotelFileName>
<hotelName>"Seclusions" of Yallingup</hotelName>
<rating>4</rating>
<cityId>72257</cityId>
<cityFileName>Yallingup</cityFileName>
<cityName>Yallingup</cityName>
<stateId>172</stateId>
<stateFileName>Western_Australia</stateFileName>
<stateName>Western Australia</stateName>
<countryCode>AU</countryCode>
<countryFileName>Australia</countryFileName>
<countryName>Australia</countryName>
<imageId>53234107</imageId>
<Address>58 Zamia Grove</Address>
<minRate>218.1825</minRate>
<currencyCode>AUD</currencyCode>
<Latitude>-33.691192000000</Latitude>
<Longitude>115.061938999999</Longitude>
<NumberOfReviews>0</NumberOfReviews>
<ConsumerRating>0</ConsumerRating>
<PropertyType>3</PropertyType>
<ChainID>0</ChainID>
<Facilities>3|6|13|14|21|22|28|39|40|41|51|53|54|56|57|58|65|66|141|191|202|204|209|210|211|292|</Facilities>
</hotel>
<hotel>
<hotelId>1491947</hotelId>
<hotelFileName>1_Melrose_Blvd</hotelFileName>
<hotelName>#1 Melrose Blvd</hotelName>
<rating>5</rating>
<cityId>964</cityId>
<cityFileName>Johannesburg</cityFileName>
<cityName>Johannesburg</cityName>
<stateId/>
<stateFileName/>
<stateName/>
<countryCode>ZA</countryCode>
<countryFileName>South_Africa</countryFileName>
<countryName>South Africa</countryName>
<imageId>46777171</imageId>
<Address>1 Melrose Boulevard Melrose Arch</Address>
<minRate/>
<currencyCode>ZAR</currencyCode>
<Latitude>-26.135656000000</Latitude>
<Longitude>28.067751000000</Longitude>
<NumberOfReviews>0</NumberOfReviews>
<ConsumerRating>0</ConsumerRating>
<PropertyType>9</PropertyType>
<ChainID>0</ChainID>
<Facilities>6|7|9|11|12|15|17|18|21|32|34|39|41|42|50|51|56|58|60|140|173|202|293|296|</Facilities>
</hotel>
<hotel>
<hotelId>1726938</hotelId>
<hotelFileName>1_Value_Inn_Clovis</hotelFileName>
<hotelName>#1 Value Inn Clovis</hotelName>
<rating>2</rating>
<cityId>28538</cityId>
<cityFileName>Clovis_New_Mexico</cityFileName>
<cityName>Clovis (New Mexico)</cityName>
<stateId>32</stateId>
<stateFileName>New_Mexico</stateFileName>
<stateName>New Mexico</stateName>
<countryCode>US</countryCode>
<countryFileName>United_States</countryFileName>
<countryName>United States</countryName>
<imageId/>
<Address>1720 Mabry</Address>
<minRate/>
<currencyCode>USD</currencyCode>
<Latitude>34.396549224853</Latitude>
<Longitude>-103.182769775390</Longitude>
<NumberOfReviews>0</NumberOfReviews>
<ConsumerRating>0</ConsumerRating>
<PropertyType>2</PropertyType>
<ChainID>0</ChainID>
<Facilities>6|7|8|18|21|22|27|41|50|52|56|222|281|292|</Facilities>
</hotel>
</hotels>
I tried this code:
require 'nokogiri'

class Wikihandler < Nokogiri::XML::SAX::Document
  def initialize
    # do one-time setup here; called once by Wikihandler.new
    super
  end

  def start_element(name, attributes = [])
    # check the element name here and create an ActiveRecord object if appropriate
    if name == 'hotel'
      # attributes arrives as an array of [name, value] pairs (<hotel> has none in this file)
      a = attributes.to_h
      puts a.inspect
      # more business...
    end
  end

  def characters(s)
    # save the characters that appear here and possibly use them in the current tag object
  end

  def end_element(name)
    # check the tag name, possibly use the characters you've collected,
    # and save your ActiveRecord object now
  end
end

parser = Nokogiri::XML::SAX::Parser.new(Wikihandler.new)
parser.parse_file('HotelCombinedXml/Hotels_All.xml')
I can access the element names, but how can I access their content?
Thanks for your help! – Sebastien 2012-01-11 10:11:21
SAX Machine still tries to read the whole document first, which doesn't work for larger files. :( – unflores 2013-12-03 14:30:22
You saved my day! Thanks a lot! – sadfuzzy 2015-03-26 09:05:21