2011-05-03 20 views
1

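Hi, our application is a web application. The exam folder contains HTML files that display question papers, that is, each file contains questions and their answer options. In the code-behind, I need to load an HTML file and get its content with the HTML tags removed.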

I want to read the questions and answer options from the HTML file and save them to a database.

Can anyone help? The rendered HTML file looks like this:

Export grade: 4 Export subject: Reading NGS

Item ID: 4RMINI0521080000000076 
Group ID: passage_Bullying 
Benchmark: LA.4.1.6.7 
Webb's Cognitive Complexity:2 
Item Type: Multiple Choice 
Correct Answer: B 
Item Stem 

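Read this sentence from the passage.

Bullying is when someone repeatedly says or does things to make someone else feel bad.

What is the base word for the word repeatedly?

Answer A

eat

Answer B

repeat

Answer C

ed

Answer D

re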

Item ID: 4RMINI0521080000000077 
    Group ID: passage_Bullying 
    Benchmark: LA.4.1.6.8 
    Webb's Cognitive Complexity:2 
    Item Type: Multiple Choice 
    Correct Answer: D 
    Item Stem 

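Read this sentence from the passage.

Common forms of bullying include:...

Which word has the OPPOSITE meaning of the word common?

Answer A

usual

Answer B

popular

Answer C

continual

Answer D

rare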

The HTML source looks like this:

<BODY> 
<P><B>Export grade:</B> 4<BR><B>Export subject:</B> Reading NGS<BR><BR><BR></P><!-- ITEM_START --> 
<P><B>Item ID:</B> 4RMINI0521080000000076<BR><B>Group ID:</B> 

passage_Bullying<BR><B>Benchmark:</B> LA.4.1.6.7<BR><B>Webb's Cognitive 
Complexity:</B>2<BR><B>Item Type:</B> Multiple Choice<BR><B>Correct Answer:</B> 
B<BR></P> 
<P><B>Item Stem</B></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=left>Read this sentence 
from the passage.</P> 

<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 12pt"></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt; MARGIN-LEFT: 10pt" 
align=left><B>Bullying is when someone repeatedly says or does things to make 
someone else feel bad.</B></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 12pt"></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=left>What is the base word 
for the word <I>repeatedly</I>?</P> 
<P><B>Answer A</B></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=left>eat</P> 
<P><B>Answer B</B></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=left>repeat</P> 
<P><B>Answer C</B></P> 

<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=left>ed</P> 
<P><B>Answer D</B></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=left>re</P> 
<P><BR><BR></P><!-- ITEM_START --> 
<P><B>Item ID:</B> 4RMINI0521080000000077<BR><B>Group ID:</B> 
passage_Bullying<BR><B>Benchmark:</B> LA.4.1.6.8<BR><B>Webb's Cognitive 
Complexity:</B>2<BR><B>Item Type:</B> Multiple Choice<BR><B>Correct Answer:</B> 

D<BR></P> 
<P><B>Item Stem</B></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=left>Read this sentence 
from the passage.</P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 12pt"></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt; MARGIN-LEFT: 10pt" 
align=left><B>Common forms of bullying include:...</B></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 12pt"></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=left>Which word has the 
OPPOSITE meaning of the word <I>common</I>?</P> 
<P><B>Answer A</B></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=left>usual</P> 

<P><B>Answer B</B></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=left>popular</P> 
<P><B>Answer C</B></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=left>continual</P> 
<P><B>Answer D</B></P> 
<P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=left>rare</P> 
<P><BR><BR></P>
</BODY> 
+0

Can you show us an example of the HTML file and tell us which fields you need to extract? – 2011-05-03 06:11:54

Answers

1

Upload the file and read it with an HTML parser.

The Html Agility Pack (http://htmlagilitypack.codeplex.com/) offers the best way to read the content of an HTML document.
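
As a rough illustration of the parser approach, here is a minimal sketch that loads a file with the Html Agility Pack and returns its text with the tags removed. The class and method names (`HtmlTextExtractor`, `FromFile`, `FromString`) are made up for this example. Treating every `<BR>` and `<P>` as a line break is an assumption based on the markup shown in the question.

```csharp
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // external library: Html Agility Pack

// Hypothetical helper; the names are illustrative only.
public static class HtmlTextExtractor
{
    // Loads an HTML file from disk and returns its text without tags.
    public static string FromFile(string path)
    {
        var doc = new HtmlDocument();
        doc.Load(path);
        return ToPlainText(doc);
    }

    // Same as FromFile, but for markup that is already in memory.
    public static string FromString(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return ToPlainText(doc);
    }

    private static string ToPlainText(HtmlDocument doc)
    {
        var sb = new StringBuilder();

        // Descendants() walks the tree in document order.
        foreach (HtmlNode node in doc.DocumentNode.Descendants())
        {
            if (node.NodeType == HtmlNodeType.Text)
            {
                // Decode entities and fold source line wraps into single spaces.
                string text = HtmlEntity.DeEntitize(node.InnerText);
                sb.Append(Regex.Replace(text, @"\s+", " "));
            }
            else if (node.Name == "br" || node.Name == "p")
            {
                // Element names are lower-cased by the parser.
                // Assumption: <BR> and <P> are the only line separators.
                sb.Append('\n');
            }
            // Comment nodes such as <!-- ITEM_START --> are skipped.
        }

        // Trim each line and drop the empty ones.
        var lines = sb.ToString()
            .Split('\n')
            .Select(line => line.Trim())
            .Where(line => line.Length > 0);
        return string.Join("\n", lines);
    }
}
```

From the code-behind this could be called as `string text = HtmlTextExtractor.FromFile(Server.MapPath("~/exam/paper.html"));`, where the path is only a placeholder. Text inside `<script>` or `<style>` elements would also be returned, which does not matter for the sample file because it has none.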

+0

Which namespace do I need to add to get HtmlDocument? – user42348 2011-05-03 06:18:42

+0

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); – 2011-05-03 06:22:22
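
The fully qualified name in the comment above suggests that a `using HtmlAgilityPack;` directive is what brings `HtmlDocument` into scope. Building on that, the following sketch walks the paragraphs of the sample file and groups them into items before they are saved to the database. `ExamItem` and `ExamItemParser` are hypothetical names. The code assumes the layout shown in the question: a header paragraph starting with "Item ID:", an "Item Stem" marker, then "Answer A" to "Answer D" markers, with every paragraph properly closed by `</P>`.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // namespace that contains HtmlDocument

// Hypothetical record type; field names are illustrative only.
public class ExamItem
{
    public string ItemId = "";
    public string CorrectAnswer = "";
    public string Stem = "";
    public Dictionary<string, string> Answers = new Dictionary<string, string>();
}

public static class ExamItemParser
{
    public static List<ExamItem> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var items = new List<ExamItem>();
        ExamItem current = null;
        string section = null; // "stem", an answer letter, or null while in the header

        // Element names are lower-cased by the parser, so "//p" also matches <P>.
        HtmlNodeCollection paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null) return items; // SelectNodes returns null when nothing matches

        foreach (HtmlNode p in paragraphs)
        {
            string text = Regex.Replace(HtmlEntity.DeEntitize(p.InnerText), @"\s+", " ").Trim();
            if (text.Length == 0) continue; // spacer paragraphs

            Match answer = Regex.Match(text, @"^Answer ([A-D])$");
            if (text.StartsWith("Item ID:"))
            {
                // Header paragraph: start a new item and pull out two of its fields.
                current = new ExamItem
                {
                    ItemId = Regex.Match(text, @"Item ID:\s*(.+?)\s*Group ID:").Groups[1].Value,
                    CorrectAnswer = Regex.Match(text, @"Correct Answer:\s*([A-D])").Groups[1].Value
                };
                items.Add(current);
                section = null;
            }
            else if (current == null)
            {
                continue; // text before the first item, e.g. the export grade line
            }
            else if (text == "Item Stem")
            {
                section = "stem";
            }
            else if (answer.Success)
            {
                section = answer.Groups[1].Value;
                current.Answers[section] = "";
            }
            else if (section == "stem")
            {
                current.Stem = (current.Stem + " " + text).Trim();
            }
            else if (section != null)
            {
                current.Answers[section] = (current.Answers[section] + " " + text).Trim();
            }
        }
        return items;
    }
}
```

Each `ExamItem` then holds one question with its stem, options, and correct answer, which maps naturally onto one row (or one row plus four option rows) in the database. Other header fields such as Benchmark could be captured with further regular expressions in the same way.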