2010-10-20 53 views

How do I remove HTML tags from a string?

When I search for the keyword "data", I get the abstract of a paper in a digital library:

Many organizations often underutilize their existing <span class='snippet'>data</span> warehouses. In this paper, we suggest a way of acquiring more information from corporate <span class='snippet'>data</span> warehouses without the complications and drawbacks of deploying additional software systems. Association-rule mining, which captures co-occurrence patterns within <span class='snippet'>data</span>, has attracted considerable efforts from <span class='snippet'>data</span> warehousing researchers and practitioners alike. Unfortunately, most <span class='snippet'>data</span> mining tools are loosely coupled, at best, with the <span class='snippet'>data</span> warehouse repository. Furthermore, these tools can often find association rules only within the main fact table of the <span class='snippet'>data</span> warehouse (thus ignoring the information-rich dimensions of the star schema) and are not easily applied on non-transaction level <span class='snippet'>data</span> often found in <span class='snippet'>data</span> warehouses 

How can I remove all the <span class='snippet'>..</span> tags but still keep the keyword "data", so that the abstract looks like this:

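Many organizations often underutilize their existing data warehouses. In this paper, we suggest a way of acquiring more information from corporate data warehouses without the complications and drawbacks of deploying additional software systems. Association-rule mining, which captures co-occurrence patterns within data, has attracted considerable efforts from data warehousing researchers and practitioners alike. Unfortunately, most data mining tools are loosely coupled, at best, with the data warehouse repository. Furthermore, these tools can often find association rules only within the main fact table of the data warehouse (thus ignoring the information-rich dimensions of the star schema) and are not easily applied on non-transaction level data often found in data warehouses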

Will it always be <span class='snippet'>? If so, you can use a simple string replace or a regular expression. – Marko 2010-10-20 03:45:17

If any kind of HTML can be present, I suggest you use a parser rather than a regular expression. Have a look at this wiki if you want a good parser... http://stackoverflow.com/questions/773340/can-you-provide-an-example-of-parsing-html-with-your-favorite-parser – InSane 2010-10-20 03:50:53

Re: regex and HTML ... thar be dragons. – 2010-10-20 03:53:48

Answers

strip_tags() is your friend. Code kindly copied from here:

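// Put this method inside a class of your choice; it needs these imports:
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String strip_tags(String text, String allowedTags) {
    // Comma-separated tag names to keep, e.g. "b,i"; sorted for binarySearch
    String[] tag_list = allowedTags.split(",");
    Arrays.sort(tag_list);

    // Group 1 captures the tag name of an opening, closing or <!...> tag.
    // In Java source, \\s is the regex whitespace class.
    final Pattern p = Pattern.compile("<[/!]?([^\\s>]*)\\s*[^>]*>",
            Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(text);

    StringBuilder out = new StringBuilder();
    int lastPos = 0;
    while (m.find()) {
        String tag = m.group(1);
        if (Arrays.binarySearch(tag_list, tag) < 0) {
            // Tag not allowed: copy the text before it and put a space in
            // its place, so words on either side do not run together
            out.append(text.substring(lastPos, m.start())).append(" ");
        } else {
            // Tag allowed: copy the text up to and including the tag
            out.append(text.substring(lastPos, m.end()));
        }
        lastPos = m.end();
    }
    if (lastPos > 0) {
        // Copy whatever follows the last tag
        out.append(text.substring(lastPos));
        return out.toString().trim();
    } else {
        // No tags found: return the input unchanged
        return text;
    }
}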
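
If the markup really is always the same <span class='snippet'> wrapper, as one of the comments asks, a single regex replacement may be enough and avoids the extra spaces that strip_tags() puts in place of each removed tag. This is only a minimal sketch under that assumption: the class name SnippetStripper and the method stripSpans are made up for illustration, and arbitrary HTML is probably better handled by a real parser.

```java
import java.util.regex.Pattern;

final class SnippetStripper {
    // Matches an opening <span ...> or a closing </span> tag, ignoring case.
    // \b stops the match from also hitting tags that merely start with "span".
    private static final Pattern SPAN_TAG =
            Pattern.compile("</?span\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Removes the span tags themselves but keeps the text between them.
    static String stripSpans(String html) {
        return SPAN_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "existing <span class='snippet'>data</span> warehouses";
        System.out.println(stripSpans(input)); // prints: existing data warehouses
    }
}
```

Because the tags are replaced with an empty string, the spacing of the original abstract is left untouched, which matches the output asked for in the question.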