Pattern.compile for a span HTML tag

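Hi, I'm working on a cloud computing project that uses Amazon. The part of the code I'm stuck on is getting a user's wish list from Amazon. Because of permission restrictions, I extracted the entire page source for a given wish list URL instead. To extract the item IDs I compiled a pattern like this: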

Pattern p = Pattern.compile("/dp/(\\w+)/");
Matcher matcher = p.matcher(content);
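A minimal, self-contained sketch of the item-ID loop, assuming `content` holds the wish-list page source. The class name, the sample HTML in `main`, and the de-duplication with a `LinkedHashSet` are assumptions of this sketch, not part of the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ItemIdExtractor {
    // Same pattern as in the question: "/dp/", then word characters (captured), then "/".
    static final Pattern ITEM_ID = Pattern.compile("/dp/(\\w+)/");

    // Returns each distinct item ID in the order it first appears in the page source.
    public static List<String> extractItemIds(String content) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher matcher = ITEM_ID.matcher(content);
        while (matcher.find()) {
            ids.add(matcher.group(1)); // group(1) is the text captured by (\\w+)
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        // Hypothetical page fragment; real wish-list markup will differ.
        String content = "<a href=\"/dp/B00EXAMPLE1/ref=x\">A</a>"
                + "<a href=\"/dp/B00EXAMPLE2/\">B</a>"
                + "<a href=\"/dp/B00EXAMPLE1/\">A again</a>";
        System.out.println(extractItemIds(content)); // [B00EXAMPLE1, B00EXAMPLE2]
    }
}
```

A product usually links to its page several times, so keeping only the first occurrence of each ID avoids duplicates in the list.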

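That part was easy, and it now correctly lists all the products in the wish list with their item IDs. I also need the price of each item. In the page source, the price looks like this: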

<span class="a-size-base a-color-price a-text-bold"> 
         $7.19 
        </span>

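I need to write a pattern for this, and I'm confused and stuck. I'm terrible at regular expressions. Can anyone please help? I've seen online references for matching hrefs, but I don't think they will work for me.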

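Thanks to dkatzel, I found the Jsoup library. I tried the online tool (Try jsoup online), and when I run a CSS query for div I get the output I need. But how do I hard-code this in my Java program? I have the jsoup jar.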


2013-12-17 sa_nyc

I suggest using an HTML parsing library like http://jsoup.org/ to do all of this for you (unless you have to parse it yourself for schoolwork). – dkatzel

I don't need to parse it myself. My main project is about something completely different. –

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – GriffeyDog

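Wouldn't a simple expression do the job?

\\$\\d+(?:\\.\\d+)?

\\$ matches a literal $.

\\d+ matches the digits.

(?:\\.\\d+)? matches an optional decimal part. That should be enough, unless you don't want the dollar sign; in that case you can use a capturing group and take group 1 (i.e. \\$(\\d+(?:\\.\\d+)?)) or a lookbehind (i.e. (?<=\\$)\\d+(?:\\.\\d+)?).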
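The three variants can be compared side by side in a small sketch. Note the `?` after the decimal group: it is what makes the fractional part optional, so `$12` matches as well as `$7.19`. The class and method names are hypothetical, and the input is a made-up sample shaped like the span in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceRegexDemo {
    // Whole match includes the dollar sign, e.g. "$7.19".
    static final Pattern WITH_DOLLAR = Pattern.compile("\\$\\d+(?:\\.\\d+)?");
    // Capturing group 1 holds only the number, e.g. "7.19".
    static final Pattern CAPTURE = Pattern.compile("\\$(\\d+(?:\\.\\d+)?)");
    // Lookbehind: "$" must precede the match but is not part of it.
    static final Pattern LOOKBEHIND = Pattern.compile("(?<=\\$)\\d+(?:\\.\\d+)?");

    // Collects the requested group from every match of the pattern in text.
    static List<String> findAll(Pattern pattern, String text, int group) {
        List<String> results = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            results.add(m.group(group));
        }
        return results;
    }

    public static void main(String[] args) {
        String content = "<span class=\"a-size-base a-color-price a-text-bold\">\n"
                + "         $7.19\n"
                + "        </span> ... $12 ...";
        System.out.println(findAll(WITH_DOLLAR, content, 0)); // [$7.19, $12]
        System.out.println(findAll(CAPTURE, content, 1));     // [7.19, 12]
        System.out.println(findAll(LOOKBEHIND, content, 0));  // [7.19, 12]
    }
}
```

The capturing-group and lookbehind versions give the same result here; the group version avoids lookbehind syntax, while the lookbehind version lets you keep using `group(0)`.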


2013-12-17 21:23:28 Jerry

I did:

List<String> price = new ArrayList<String>();
Pattern pr = Pattern.compile("\\$\\d+(?:\\.\\d+)");
Matcher priceMatcher = pr.matcher(content);
while (priceMatcher.find()) {
    if (!price.contains(priceMatcher.group(1)))
        price.add(priceMatcher.group(1));
}
System.out.println("Prices extracted: " + count);
for (String s : price) {
    System.out.println(s);
}

It throws IndexOutOfBoundsException("No group " + group). –

@sa_nyc Use '.group(0)', since that is the whole match. – Jerry

If you want to match the whole tag, you can use this: '\\s*(\\$\\d+(?:\\.\\d+))\\s*' and then use '.group(1)', since there is one capturing group. – Jerry
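Putting the two comments together, a corrected version of the loop might look like the sketch below. `Matcher.group(1)` throws `IndexOutOfBoundsException` when the pattern has no capturing group, which is why the plain price pattern reads group 0. The whole-tag pattern is an assumption: it hard-codes the span's class attribute exactly as shown in the question, so it stops matching if that markup changes. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceListFix {
    // No capturing group: the price is the whole match, i.e. group(0).
    static final Pattern PRICE = Pattern.compile("\\$\\d+(?:\\.\\d+)?");

    // Assumption: the class attribute is exactly as in the question's page source.
    // One capturing group, so the price is group(1); \\s* absorbs the surrounding whitespace.
    static final Pattern PRICE_SPAN = Pattern.compile(
            "<span class=\"a-size-base a-color-price a-text-bold\">\\s*(\\$\\d+(?:\\.\\d+)?)\\s*</span>");

    // Adds each distinct value of the given group, preserving first-seen order.
    static List<String> distinctPrices(Pattern pattern, int group, String content) {
        List<String> prices = new ArrayList<>();
        Matcher m = pattern.matcher(content);
        while (m.find()) {
            String value = m.group(group);
            if (!prices.contains(value)) {
                prices.add(value);
            }
        }
        return prices;
    }

    public static void main(String[] args) {
        String content = "<span class=\"a-size-base a-color-price a-text-bold\"> \n"
                + "         $7.19 \n"
                + "        </span> shipping $4.99";
        System.out.println(distinctPrices(PRICE, 0, content));      // [$7.19, $4.99]
        System.out.println(distinctPrices(PRICE_SPAN, 1, content)); // [$7.19]
    }
}
```

The plain pattern also picks up dollar amounts outside the price span (the shipping cost here), which is the trade-off the whole-tag pattern avoids.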

An alternative answer using Jsoup:

Document doc = Jsoup.parse(content); // content = the wish-list page source
Element e = doc.select("span.a-size-base").first();

Include jsoup-1.x.x.jar in your project (or on the classpath when you compile) and add the following imports:

import org.jsoup.Jsoup; 
import org.jsoup.nodes.Document; 
import org.jsoup.nodes.Element;


2013-12-17 22:12:51
