How to get the domain name from a URL using a regular expression?

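I need to display a Word document on a web page. I am using a library called Docx4j to convert .doc to HTML, and that works fine. However, the hyperlinks come out in the format below. How can I use a regular expression to get the domain name from the URL?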

To search on google go to this link [#?] HYPERLINK \"http://www.google.com/\" [#?][#?] google[#?] and type the text.

With the code below, I can convert it to

To search on google go to this link (http://www.google.com) google and type the text.

String myText = "To search on google go to this link [#?] HYPERLINK \"http://www.google.com/\" [#?][#?] google[#?] and type the text.";
System.out.println(myText);
// Strip the [ and ] brackets, then the remaining #? markers
String firstReplace = myText.replaceAll("\\[", "").replaceAll("\\]", "").replaceAll("#\\?", "");
System.out.println(firstReplace);
// Replace 'HYPERLINK "' with an opening parenthesis
String secondReplace = firstReplace.replaceAll("HYPER\\S+\\s+\"", "(");
System.out.println(secondReplace);
// Replace the trailing slash(es), the closing quote and the character after it with ')'
String finalReplace = secondReplace.replaceAll("/*\".", ")");
System.out.println("\n" + finalReplace);

Could someone please give me a regular expression that converts the above string to

To search on google go to this link google (http://www.google.com) and type the text.

- EDIT--

Some links appear as

[#?] HYPERLINK \"http://www.google.com/\" [#?][#?] google page[#?]

and I need to change them to

google page (http://www.google.com)

How can I do this?


2015-11-06 Aakash

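You can use capture groups and group references to match the parenthesized part and the word google that follows it, then swap them.

Replace the matches of the following regular expression:

'(\([^)]*\))\s?(\w+)'

with:

'$2 $1'

You can use the str.replaceAll() function for this.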

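Explanation:

The first capture group, (\([^)]*\)), matches the parentheses and everything between them; [^)]* is a negated character class that matches any run of characters other than a closing parenthesis.

The second group, (\w+), matches the word after that part; \w+ matches any run of word characters.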
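A minimal Java sketch of this answer, applied to the string the question's code already produces. The pattern is the answer's regex written as a Java string literal (backslashes doubled); the class and method names are hypothetical. The second call shows a limitation worth knowing for the EDIT case: \w+ captures only a single word, so multi-word link text such as "google page" gets split.

```java
public class SwapLinkText {
    // Group 1: "(" + any characters except ")" + ")"; then an optional
    // whitespace character; group 2: the following run of word characters.
    private static final String PATTERN = "(\\([^)]*\\))\\s?(\\w+)";

    static String swap(String text) {
        // "$2 $1" writes the word first, then the parenthesized URL.
        return text.replaceAll(PATTERN, "$2 $1");
    }

    public static void main(String[] args) {
        String converted = "To search on google go to this link (http://www.google.com) google and type the text.";
        System.out.println(swap(converted));
        // prints: To search on google go to this link google (http://www.google.com) and type the text.

        // \w+ stops at the space, so only "google" is moved:
        System.out.println(swap("(http://www.google.com) google page"));
        // prints: google (http://www.google.com) page
    }
}
```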


2015-11-06 08:05:41 Kasramvd

Could you please elaborate? –

@SumodhS Check out the edit. – Kasramvd

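Is there a way to replace "http://www.google.com/" directly with "(http://www.google.com/)"? I can't use this script as in the question, because what I have is HTML, and replacing every " would also replace the quotes elsewhere in my HTML –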
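One possible approach to this comment, sketched in Java and not taken from the thread: anchor the pattern on the HYPERLINK keyword so that only the quotes around the link target are replaced, and other quotes (for example HTML attribute values) are left alone. It assumes the quotes appear as literal " characters in the converted HTML (if they are escaped, e.g. as &quot;, the pattern would need adjusting); the class and method names are hypothetical. As in the question's own secondReplace step, the HYPERLINK keyword is consumed along with the quotes.

```java
public class QuotedUrlToParens {
    // Assumption: the target URL is wrapped in literal " characters and
    // immediately follows the HYPERLINK keyword.
    static String parenthesizeHyperlinkTarget(String html) {
        // Only a quoted string right after "HYPERLINK" is rewritten;
        // the URL itself (including any trailing slash) is kept as-is.
        return html.replaceAll("HYPERLINK\\s+\"([^\"]*)\"", "($1)");
    }

    public static void main(String[] args) {
        String html = "<p class=\"x\">go to this link [#?] HYPERLINK \"http://www.google.com/\" [#?][#?] google[#?]</p>";
        System.out.println(parenthesizeHyperlinkTarget(html));
        // prints: <p class="x">go to this link [#?] (http://www.google.com/) [#?][#?] google[#?]</p>
    }
}
```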

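As soon as you strip the [#?] markers, as you do in the question, you lose essential information you need for the later text adjustments. The basic template of your input is: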

[#?] HYPERLINK *target* [#?] [#?] *clickable textual description of link* [#?]

So why not use those markers to your advantage?

Something like the following regex (note: not tested, may be wrong, but it gives you the basic idea):

String result = mystring.replaceAll("\\[#\\?\\]\\s*HYPERLINK\\s*\"([^\"]*?)/?\"\\s*\\[#\\?\\]\\s*\\[#\\?\\]\\s*(.*?)\\s*\\[#\\?\\]", "$2 ($1)");

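The above is meant to give you "google page (http://www.google.com)". But I would also question why you want to display it that way. Normally, for an HTML page, you would want <a href="http://www.google.com">google page</a>. To get that, just change the replacement string in the code above.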
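A self-contained Java sketch of this marker-based approach, producing both the plain-text form and the <a href> form. It assumes the Docx4j output matches the question's sample exactly (literal " around the target, [#?] markers with optional whitespace between them); the class and method names are hypothetical. The trailing slash of the URL is dropped to match the output the question asks for, and the link text is not HTML-escaped.

```java
public class HyperlinkFieldRewriter {
    // Assumed input shape: [#?] HYPERLINK "target" [#?][#?] link text[#?]
    // \s* allows optional whitespace between markers; the reluctant
    // quantifiers (*?) stop each group at the first following delimiter.
    private static final String FIELD =
            "\\[#\\?\\]\\s*HYPERLINK\\s*\"([^\"]*?)/?\""
            + "\\s*\\[#\\?\\]\\s*\\[#\\?\\]\\s*(.*?)\\s*\\[#\\?\\]";

    // Plain-text form: link text (target)
    static String toText(String s) {
        return s.replaceAll(FIELD, "$2 ($1)");
    }

    // HTML anchor form: <a href="target">link text</a>
    static String toAnchor(String s) {
        return s.replaceAll(FIELD, "<a href=\"$1\">$2</a>");
    }

    public static void main(String[] args) {
        String input = "To search on google go to this link [#?] HYPERLINK \"http://www.google.com/\" [#?][#?] google page[#?] and type the text.";
        System.out.println(toText(input));
        // prints: To search on google go to this link google page (http://www.google.com) and type the text.
        System.out.println(toAnchor(input));
        // prints: To search on google go to this link <a href="http://www.google.com">google page</a> and type the text.
    }
}
```

Because the groups are delimited by the [#?] markers rather than by \w+, multi-word link text such as "google page" stays intact.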


2015-11-06 12:46:19
