2011-10-21 140 views
1

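I'm using Nokogiri code to extract the text between HTML nodes, and I get these warnings when I read in a list of files. I didn't get them with simple embedded HTML. I'd like to eliminate or suppress the warnings, but I don't know how. They appear at the end of each block, as "already initialized constant" warnings: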

extract.rb:18: warning: already initialized constant EXTRACT_RANGES 
extract.rb:25: warning: already initialized constant DELIMITER_TAGS 

Here is my code:

#!/usr/bin/env ruby -wKU 
require 'rubygems' 
require 'nokogiri' 
require 'fileutils' 

source = File.open('/documents.txt') 
source.readlines.each do |line| 
    line.strip! 
    if File.exists? line 
    file = File.open(line) 

doc = Nokogiri::HTML(File.read(line)) 

# suggested by dan healy, stackoverflow 
# Specify the range between delimiter tags that you want to extract 
# triple dot is used to exclude the end point 
# 1...2 means 1 and not 2 
EXTRACT_RANGES = [ 
    1...2 
] 

# Tags which count as delimiters, not to be extracted 
DELIMITER_TAGS = [ 
    "h1", 
    "h2", 
    "h3" 
] 

extracted_text = [] 

i = 0 
# Change /"html"/"body" to the correct path of the tag which contains this list 
(doc/"html"/"body").children.each do |el| 

    if (DELIMITER_TAGS.include? el.name) 
    i += 1 
    else 
    extract = false 
    EXTRACT_RANGES.each do |cur_range| 
     if (cur_range.include? i) 
     extract = true 
     break 
     end 
    end 

    if extract 
     s = el.inner_text.strip 
     unless s.empty? 
     extracted_text << el.inner_text.strip 
     end 
    end 
    end 
end 

print("\n") 
puts line 
print(",\n") 
# Print out extracted text (each element's inner text is separated by newlines) 
puts extracted_text.join("\n\n") 
    end 
end 
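
The section-counting logic in the loop above can be tried without any HTML files. The following is only a minimal sketch: `element` is a hypothetical stand-in for the parsed nodes, and `extract_sections` is an illustrative helper name that does not appear in the original code. The ranges and delimiter tags are passed in as arguments.

```ruby
# Hypothetical stand-in for parsed nodes: just a tag name and its inner text.
element = Struct.new(:name, :inner_text)

# Illustrative helper (not from the original code): walks the elements,
# counts delimiter tags, and keeps the text of elements whose section
# number falls inside one of the given ranges.
def extract_sections(elements, ranges, delimiter_tags)
  section = 0 # number of delimiter tags seen so far
  extracted = []
  elements.each do |el|
    if delimiter_tags.include?(el.name)
      section += 1
    elsif ranges.any? { |range| range.include?(section) }
      text = el.inner_text.strip
      extracted << text unless text.empty?
    end
  end
  extracted
end

elements = [
  element.new("p",  "preamble"),
  element.new("h1", "First heading"),
  element.new("p",  "  first section text "),
  element.new("p",  "   "),
  element.new("h2", "Second heading"),
  element.new("p",  "second section text")
]

result = extract_sections(elements, [1...2], %w[h1 h2 h3])
puts result.inspect
```

With the range 1...2, only text that follows the first delimiter tag and precedes the second one is kept, and whitespace-only elements are skipped.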
+0

Have you tried renaming the constants to something other than EXTRACT_RANGES/DELIMITER_TAGS? –

+0

Yes, I renamed the constants and I got the same warnings, just reflecting the new names. – chuckfinley

+0

This question has nothing to do with Nokogiri; it is a "where should I define a constant" question, so I removed the Nokogiri tag and adjusted the title. –

Answers

1

I didn't notice that before. Just move the constants out of the each block:

require 'nokogiri'

# Specify the range between delimiter tags that you want to extract
# (three dots exclude the end point, so 1...2 means 1 and not 2)
EXTRACT_RANGES = [
  1...2
]

# Tags which count as delimiters, not to be extracted
DELIMITER_TAGS = [
  "h1",
  "h2",
  "h3"
]

# The constants above are now assigned once, before the loop starts
File.readlines('/documents.txt').each do |line|
  line.strip!
  # File.exist? replaces File.exists?, which was removed in Ruby 3.2
  if File.exist?(line)
    doc = Nokogiri::HTML(File.read(line))

    extracted_text = []

    i = 0
    # Change /"html"/"body" to the correct path of the tag which contains this list
    (doc/"html"/"body").children.each do |el|
      if DELIMITER_TAGS.include?(el.name)
        i += 1
      elsif EXTRACT_RANGES.any? { |cur_range| cur_range.include?(i) }
        s = el.inner_text.strip
        extracted_text << s unless s.empty?
      end
    end

    print("\n")
    puts line
    print(",\n")
    # Print out extracted text (each element's inner text is separated by newlines)
    puts extracted_text.join("\n\n")
  end
end
+0

I'm Dan Healy, and I approve this message –

+0

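Hi Dan (and users) - taking two steps back. Thanks, but I don't know how to move the constants. I tried Googling for hints and kept coming up with my own question. – chuckfinley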

+0

For now, just replace your code with the code I posted. Then do a diff, and you should see how I moved the constants out of the loop –
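
To see why moving the assignment out of the loop silences the warning, here is a minimal sketch that needs no HTML files. It rests on a few assumptions: `capture_stderr` is a hypothetical helper written for this demo, the two anonymous modules only serve as throwaway containers, and `const_set` stands in for a plain `NAME = value` assignment so the example stays self-contained. It also assumes Ruby's warnings are not disabled (for example with `-W0`).

```ruby
require 'stringio'

# Hypothetical helper: run the block and return everything written to
# $stderr while it ran, which includes Ruby's warnings.
def capture_stderr
  original = $stderr
  $stderr = StringIO.new
  yield
  $stderr.string
ensure
  $stderr = original
end

# Throwaway containers so the demo does not touch any real constants.
in_loop = Module.new
hoisted = Module.new

# Assigning inside the loop: every pass after the first one re-assigns
# a constant that already exists, so Ruby warns each time.
inside_output = capture_stderr do
  3.times { in_loop.const_set(:DELIMITER_TAGS, %w[h1 h2 h3]) }
end

# Assigning once before the loop and only reading inside it.
hoisted_output = capture_stderr do
  hoisted.const_set(:DELIMITER_TAGS, %w[h1 h2 h3])
  3.times { hoisted.const_get(:DELIMITER_TAGS).include?("h1") }
end

puts "warned inside loop: #{inside_output.include?('already initialized constant')}"
puts "warned when hoisted: #{!hoisted_output.empty?}"
```

The same reasoning applies to the script in the question: the each block runs once per line of the file list, so any constant assigned inside it is re-assigned on every line after the first.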

2

If the code were properly indented, it would be easier to notice that the constant definitions are being done inside a loop.

Compare this:

source.readlines.each do |line| 
    # code 
    if true 

# Wrongly indented code 

# More 
# Wrongly 
# Indented 
# Code 
EXTRACT_RANGES = [ 
    1...2 
] 

# Several more pages of code 
    end 
end 

with this:

source.readlines.each do |line| 
    # code 
    if true 

    # Correctly indented code 

    # What is a constant doing being defined 
    # this far indented? 
    # Oh no - it's in a loop! 

    EXTRACT_RANGES = [ 
     1...2 
    ] 

    # Several more pages of code 
    end 
end 
0

As a programming tip:

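Be careful using the .. and ... range definitions. The three-dot version is not as commonly used as the two-dot version, and the extra dot can easily be missed, which makes the code harder to maintain. I would need a very good reason to use the three-dot version. Compare these outputs from IRB: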

(1...2).to_a 
=> [1] 

(1..1).to_a 
=> [1] 

See how misleading the first one is.
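
The comparison can also be checked in a script. This is a small sketch with illustrative variable names only. The last two lines go slightly beyond the IRB session above: they show that the two ranges agree for integers but not for fractional values, which matters only if the counter is ever something other than an integer.

```ruby
exclusive = (1...2) # three dots: the end point 2 is left out
inclusive = (1..1)  # two dots: the end point 1 is kept

# Both ranges expand to the same integers, as in the IRB output above.
puts exclusive.to_a.inspect
puts inclusive.to_a.inspect
same_integers = exclusive.to_a == inclusive.to_a

# exclude_end? reports which form was used, without counting dots by eye.
puts exclusive.exclude_end?
puts inclusive.exclude_end?

# For numeric ranges, include? compares against the end points, so the
# two forms stop agreeing once fractional values are involved.
puts exclusive.include?(1.5)
puts inclusive.include?(1.5)
```

For an integer counter like the i used in the question, writing the range as 1..1 expresses the same test with less room for misreading.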

+0

Thanks, 田文, I have learned something from all of your comments on this page – chuckfinley
