2015-09-08

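I've almost reached my goal: a function that checks for a valid URL and then returns its parts. A valid URL may be scheme-less or protocol-less (this is used on the front end to listen to a textarea, and on the back end to parse scraped URLs and their tags). The problem is that my regex can't split off the hash when the URL has no GET parameters.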

Here is a working example: http://jsfiddle.net/6v1u5w1f/2/

```javascript
var url = {
    check: function (url) {
        var urlPattern = /((http|ftp|https):)?\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/gi;
        var patt = new RegExp(urlPattern);
        return patt.exec(url);
    },

    getParts: function (url) {
        url = this.check(url);
        if (!url) return false;
        var singleUrlPattern = /^(?:(.*?):)?(\/\/)?(?:[^\/\.]+\.)*?([^\/\.]+)\.?([^\/]*)(?:([^?]*)?(?:\?([^#]*))?)?(.*)?/;
        return singleUrlPattern.exec(url[0]);
    }
};

var urls = {
    schema_less: '//cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg',
    http:        'http://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg',
    https:       'https://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg',
    get:         'https://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg?this=that',
    hash:        'https://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg#this=that',
    getAndHash:  'https://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg?this=that#dogs=cats',
    none:        'cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg',
    sentence:    'bob https://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg dog',
    barUrl:      'hi there this is my url: mazonaws.com/img/636/nl_nbp.jpg'
};

// loop over each url, run it through getParts and print what it returns
for (var keya in urls) {
    console.log(keya + ':');
    console.log(url.getParts(urls[keya]));
    console.log('    ');
}
```

The example loops through a set of possible URL patterns and logs the function's result for each one to the console.

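The problem occurs only when the URL has a hash and nothing else: I can't separate the hash from the URL's path. When there is a GET parameter, it is found, but not when there is just a hash.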
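The behaviour described here can be traced to the path group `([^?]*)` in `singleUrlPattern`: it stops only at `?`, so when there is no query string it keeps going through `#this=that`, leaving nothing for the last group. Below is a minimal sketch of one alternative, assuming the input is already a single URL (such as the `url[0]` that `check()` returns): stop the path at `?` or `#`, and give the hash its own optional group. The `splitUrl` name and the five-group layout are hypothetical, not part of the original code, and the pattern is deliberately loose (for example, a scheme-less `host:port` would be read as a scheme).

```javascript
// Hypothetical helper, not the asker's url.getParts().
// Groups: 1 scheme, 2 host, 3 path, 4 query, 5 hash.
// The path group stops at "?" OR "#", so a hash is split off even without a query.
var SPLIT_URL = /^(?:([a-z][a-z0-9+.-]*):)?(?:\/\/)?([^\/?#\s]+)([^?#\s]*)(?:\?([^#\s]*))?(?:#(\S*))?$/i;

function splitUrl(str) {
  var m = SPLIT_URL.exec(str);
  if (!m) return null;
  return {
    scheme: m[1] || '',  // '' for scheme-less input such as "//host/..."
    host: m[2],
    path: m[3],
    query: m[4] || '',   // without the leading "?"
    hash: m[5] || ''     // without the leading "#"
  };
}

console.log(splitUrl('https://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg#this=that'));
```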

Answer


Given what you have already tried and the scope of your question, you can match the URLs with the following expression:

```
(?:(https?|ftp):\/\/)?\/*([^:\/\s]+)((\/\w+)*\/)([-.\w]+[^#?\s]*)?(?:\?([^#\s]*))?(#[-\w@?^=%&;\/~+#]+)?
```

Test it online and read the explanation in the pane on the right.

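Disclaimer: this regex does produce some false positives; it matches almost anything containing a "/", because a URI is not fundamentally different from any other word in a text. It also doesn't cover all valid URIs. It is only meant as an approach along the lines the OP asked for.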
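A small sketch of how the expression can be used from JavaScript, with `RegExp.prototype.exec` returning the capture groups. The group meanings in the comments are read off the pattern itself (group 4 only holds the last directory segment of the repeated group, so it is skipped). The `answerPattern` and `matchParts` names are illustrative, and the hash character class is assumed to be `[-\w@?^=%&;\/~+#]`.

```javascript
// The answer's expression as a JavaScript regex literal (illustrative variable name).
var answerPattern = /(?:(https?|ftp):\/\/)?\/*([^:\/\s]+)((\/\w+)*\/)([-.\w]+[^#?\s]*)?(?:\?([^#\s]*))?(#[-\w@?^=%&;\/~+#]+)?/;

// Hypothetical helper that names the capture groups.
function matchParts(str) {
  var m = answerPattern.exec(str);
  if (!m) return null;
  return {
    protocol: m[1],   // undefined for scheme-less input
    host: m[2],
    directory: m[3],  // includes the trailing "/"
    file: m[5],
    query: m[6],      // undefined when there is no "?"
    hash: m[7]        // includes the leading "#"
  };
}

console.log(matchParts('https://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg#this=that'));
```

Because the file group excludes `#` and `?`, the hash ends up in its own group even when there is no query string.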

However, if you want to parse URLs, a regex is not the right tool. The browser already does this for you. Check the code below:

```javascript
var urls = {
    schema_less: '//cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg',
    http:        'http://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg',
    https:       'https://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg',
    get:         'https://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg?this=that',
    hash:        'https://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg#this=that',
    getAndHash:  'https://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg?this=that#dogs=cats',
    none:        'cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg'
};

// An <a> element parses whatever is assigned to its href and exposes the parts as properties
var parseURL = document.createElement('a');

for (var keya in urls) {
    parseURL.href = urls[keya];
    ['href', 'protocol', 'host', 'hostname', 'port', 'pathname', 'search', 'hash'].forEach(function (part) {
        document.write('<br />', keya, ': ', part, ': ', parseURL[part]);
    });
}
```
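Outside the DOM (for example, if the scraping back end runs on Node.js), the WHATWG `URL` constructor exposes the same properties as the `<a>` element. A minimal sketch: `urlParts` is an illustrative name, and the base `https://example.invalid/` is a hypothetical placeholder that plays the role the current page plays for the `<a>` element when resolving scheme-less and relative input.

```javascript
// Hypothetical base: resolves "//host/..." and relative strings, like the page does for <a>.
var BASE = 'https://example.invalid/';

function urlParts(str) {
  var u = new URL(str, BASE); // throws a TypeError if the string cannot be parsed
  return {
    href: u.href,
    protocol: u.protocol,
    host: u.host,
    hostname: u.hostname,
    port: u.port,
    pathname: u.pathname,
    search: u.search,
    hash: u.hash
  };
}

var parts = urlParts('https://cartassets.s3.amazonaws.com/img/636/nl_nbp.jpg#this=that');
console.log(parts.pathname, parts.search, parts.hash);
```

As with the anchor trick, a string without `//`, such as the `none` entry, is treated as a path relative to the base rather than as a host, so that case still needs its own handling.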