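Why does Nokogiri truncate this element?
I'm parsing XML files with Nokogiri and Ruby 1.9.2. Everything seems to work until I read the Descriptions (below), where the text is being truncated. The input text: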
<Value>The Copthorne Aberdeen enjoys a location proximate to several bars, restaurants and other diversions. This Aberdeen hotel is located on the city’s West End, roughly a mile from the many opportunities to engage in sightseeing or simply shopping the day away. The Aberdeen International Airport is approximately 10 miles from the Copthorne Hotel in Aberdeen.
There are 89 rooms in total at the Copthorne Aberdeen Hotel. Each of the is provided with direct-dial telephone service, trouser presses, coffee and tea makers and a private bath with a bathrobe and toiletries courtesy of the hotel. The rooms are light in color.
The Hotel Copthorne Aberdeen offers its guests a restaurant where they can enjoy their meals in a somewhat formal setting. For something more laid-back, guests may have a drink and a light meal in the hotel bar. This hotel does offer business services and there are rooms for meetings located onsite. The hotel also provides a secure parking facility for those who arrive by private car.</Value>
But instead I'm getting:
g. For something more laid-back, guests may have a drink and a light meal in the hotel bar. This hotel does offer business services and there are rooms for meetings located onsite. The hotel also provides a secure parking facility for those who arrive by private car.
Notice that it starts at "g."
More than half of the text has been left out.
Here is the complete XML file:
<?xml version="1.0" encoding="utf-8"?>
<Hotel>
<HotelID>1040900</HotelID>
<HotelFileName>Copthorne_Hotel_Aberdeen</HotelFileName>
<HotelName>Copthorne Hotel Aberdeen</HotelName>
<CityID>10</CityID>
<CityFileName>Aberdeen</CityFileName>
<CityName>Aberdeen</CityName>
<CountryCode>GB</CountryCode>
<CountryFileName>United_Kingdom</CountryFileName>
<CountryName>United Kingdom</CountryName>
<StarRating>4</StarRating>
<Latitude>57.146068572998</Latitude>
<Longitude>-2.111680030823</Longitude>
<Popularity>1</Popularity>
<Address>122 Huntly Street</Address>
<CurrencyCode>GBP</CurrencyCode>
<LowRate>36.8354</LowRate>
<Facilities>1|2|3|5|6|8|10|11|15|17|18|19|20|22|27|29|30|34|36|39|40|41|43|45|47|49|51|53|55|56|60|62|140|154|209</Facilities>
<NumberOfReviews>239</NumberOfReviews>
<OverallRating>3.95</OverallRating>
<CleanlinessRating>3.98</CleanlinessRating>
<ServiceRating>3.98</ServiceRating>
<FacilitiesRating>3.83</FacilitiesRating>
<LocationRating>4.06</LocationRating>
<DiningRating>3.93</DiningRating>
<RoomsRating>3.68</RoomsRating>
<PropertyType>0</PropertyType>
<ChainID>92</ChainID>
<Checkin>14</Checkin>
<Checkout>12</Checkout>
<Images>
<Image>19305754</Image>
<Image>19305755</Image>
<Image>19305756</Image>
<Image>19305757</Image>
<Image>19305758</Image>
<Image>19305759</Image>
<Image>19305760</Image>
<Image>19305761</Image>
<Image>19305762</Image>
<Image>19305763</Image>
<Image>19305764</Image>
<Image>19305765</Image>
<Image>19305766</Image>
<Image>19305767</Image>
<Image>37102984</Image>
</Images>
<Descriptions>
<Description>
<Name>General Description</Name>
<Value>The Copthorne Aberdeen enjoys a location proximate to several bars, restaurants and other diversions. This Aberdeen hotel is located on the city’s West End, roughly a mile from the many opportunities to engage in sightseeing or simply shopping the day away. The Aberdeen International Airport is approximately 10 miles from the Copthorne Hotel in Aberdeen.
There are 89 rooms in total at the Copthorne Aberdeen Hotel. Each of the is provided with direct-dial telephone service, trouser presses, coffee and tea makers and a private bath with a bathrobe and toiletries courtesy of the hotel. The rooms are light in color.
The Hotel Copthorne Aberdeen offers its guests a restaurant where they can enjoy their meals in a somewhat formal setting. For something more laid-back, guests may have a drink and a light meal in the hotel bar. This hotel does offer business services and there are rooms for meetings located onsite. The hotel also provides a secure parking facility for those who arrive by private car.</Value>
</Description>
<Description>
<Name>LocationDescription</Name>
<Value>Aberdeen's premier four star hotel located in the city centre just off Union Street and the main business and entertainment areas. Within 10 minutes journey of Aberdeen Railway Station and only 10-20 minutes journey from International Airport.</Value>
</Description>
</Descriptions>
</Hotel>
Here is my Ruby program:
require 'rubygems'
require 'nokogiri'
require 'ap'
include Nokogiri
class Hotel < Nokogiri::XML::SAX::Document
  def initialize
    @h = {}
    @h["Images"] = Array.new([])
    @h["Descriptions"] = Array.new([])
    @desc = {}
  end

  def end_document
    ap @h
    puts "Finished..."
  end

  def start_element(element, attributes = [])
    @element = element
    @desc = {} if element == "Description"
  end

  def end_element(element, attributes = [])
    @h["Images"] << @characters if element == "Image"
    @desc["Name"] = @characters if element == "Name"
    if element == "Value"
      @desc["Value"] = @characters
      @h["Descriptions"] << @desc
    end
    @h[element] = @characters unless %w(Images Image Descriptions Description Hotel Name Value).include? element
  end

  def characters(string)
    @characters = string
  end
end
# Create a new parser
parser = Nokogiri::XML::SAX::Parser.new(Hotel.new)
# Feed the parser some XML
parser.parse(File.open("/Users/cbmeeks/Projects/shared/data/text/HotelDatabase_EN/00/1040900.xml", 'rb'))
Thanks.
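No real reason why I'm using SAX, other than that I have 200k of these to parse. :-) I'd love to see an example of using the DOM to turn an XML file into a Ruby object. – cbmeeks 2011-04-07 00:47:17
@cbmeeks, "I'd love to see an example of using the DOM to turn an XML file into a Ruby object": why? Walking a DOM and grabbing what I need is so simple that I've never wanted an XML-to-object converter. I used them in Perl and have heard that Rails can do it, but I just don't see the point; I've written some large applications that parse many RDF/RSS/Atom feeds with Nokogiri, and it handled the job effortlessly. – 2011-04-07 00:51:23
I guess my last experience was with a Java DOM parser, which was a pain. I'll look for some Ruby DOM tutorials too. – cbmeeks 2011-04-07 00:57:39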