Splitting a URL in Python 2.x

I have a link, parsed out of some HTML code, that looks like this:

"http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"

What I want to do is extract the second part of the link, starting at the second occurrence of http. So, in the case above, I want to extract:

"http://truelink.com/football/abcde.html?"

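I have considered slicing the URL into segments, but I am not sure the structure of the first part will stay the same over time.

Is it possible to identify the second occurrence of 'http' and then extract everything from there to the end?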

Source

2015-06-06 thefragileomen

Just out of curiosity - how did you end up with a string like this? :)

link = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?" 

link[link.rfind("http://"):]

Returns:

"http://truelink.com/football/abcde.html?"

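This is how I would do it. rfind finds the last occurrence of "http://" and returns its index. In your example, that last occurrence is clearly the start of the real, original URL. You can then extract the substring that starts at that index and runs to the end of the string.

So, if you have some string myStr, a substring is extracted in Python with an array-like slice expression: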

myStr[0] # returns the first character
myStr[0:5] # returns the first 5 characters, i.e. indices 0 <= i < 5
myStr[5:] # returns all characters from index 5 to the end of the string
myStr[:5] # is the same as myStr[0:5]
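If you specifically need the second occurrence rather than the last one, as the question asks, str.find accepts a start offset that can be used to skip past the first hit. The following is only a minimal sketch; the helper name second_url and its marker parameter are made up for illustration, and it assumes the wrapped URL always starts with the same scheme as the marker.

```python
def second_url(link, marker="http://"):
    """Return the part of link starting at the second occurrence of marker.

    Returns None when marker occurs fewer than two times.
    (second_url is a hypothetical helper, not a library function.)
    """
    first = link.find(marker)
    if first == -1:
        return None
    # Start searching just past the first hit so it is not matched again.
    second = link.find(marker, first + 1)
    if second == -1:
        return None
    return link[second:]


link = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
print(second_url(link))
```

Unlike rfind, this stays on the second match even if the marker shows up again later in the string, and returning None makes the "no second URL" case explicit.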

Source

2015-06-06 21:11:22 daniel451

What if the URL is "http://advert.com/go/2/12345/0/http://truelink.com/football/http"? – vaultah

Then I would change rfind("http") to rfind("http://") – daniel451

@ascenator, that should be your actual answer :)

I would do something like this:

addr = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
http_part = 'http://'
parts = addr.split(http_part)
res = []
for part in parts:
    if part:  # skip the empty string that precedes the first 'http://'
        res.append(http_part + part)  # put the removed prefix back
print(res)
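The same idea can be extended to links that mix http:// and https://, which a plain split on 'http://' would miss. This is a rough sketch under that assumption (the question itself only shows http://); split_urls is a hypothetical helper name, and it uses re.finditer to locate where each URL starts.

```python
import re


def split_urls(addr):
    """Split addr into the URLs it contains, one per 'http://' or 'https://'.

    (split_urls is a hypothetical helper name used for illustration.)
    """
    # Index of every place where a scheme begins.
    starts = [m.start() for m in re.finditer(r"https?://", addr)]
    # Each piece ends where the next one starts; the last runs to the end.
    ends = starts[1:] + [len(addr)]
    return [addr[s:e] for s, e in zip(starts, ends)]


addr = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
urls = split_urls(addr)
print(urls[-1])
```

Taking urls[-1] gives the wrapped link, and urls[0] gives the advert prefix if that is needed as well.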

Source

2015-06-06 21:20:48
