How do I strip part of a string in Python?

-3

I am using BeautifulSoup to append all links to the list "get_link". How do I strip the leading 'index.html?' from each string in Python?

get_link = []
for a in soup.find_all('a', href=True):
    # Keep only links that have non-empty visible text
    if a.get_text(strip=True):
        get_link.append(a['href'])

Output of get_link:

['index.html?country=2', 
'index.html?country=25', 
'index.html?country=1', 
'index.html?country=6', 
'index.html?country=2']

How can I get the output below?

['country=2', 
'country=25', 
'country=1', 
'country=6', 
'country=2']

Source

2017-10-05 Raju Singh

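I don't understand what you're asking. Your title has little or nothing to do with the code you show. Are you just trying to figure out how to get the 'country=...' part of each of your 'index.html?country=...' strings? That seems easy with 'str.index' and a slice, but I'm hesitant to write an answer when I'm not sure that's actually what you're asking. – Blckknght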

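@Blckknght My English is not good, which is why I can't explain it better. Is there any way to use right/left functions on the list, so that I keep only the necessary text in get_link? –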

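Sorry, I still don't know what you mean by "right, left function". If all your links are of the same type (they always start with 'index.html?', and that is what you want to cut off), you can use 'get_link.append(a['href'][11:])'. The '[11:]' is a slice that cuts off the first 11 characters. If your links may look different, you may need more complex logic. – Blckknght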
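
The slicing idea from the comment above can be sketched as follows. This is only a minimal sketch, assuming the prefix to drop is the literal 'index.html?'; the helper name strip_prefix and the sample list are hypothetical and not from the original post.

```python
PREFIX = 'index.html?'  # 11 characters, which is why the comment slices with [11:]

def strip_prefix(link, prefix=PREFIX):
    """Return link without the leading prefix; links without it are unchanged."""
    if link.startswith(prefix):
        return link[len(prefix):]
    return link

# Hypothetical sample data; the last entry shows a link with a different shape
get_link = ['index.html?country=2', 'index.html?country=25', 'about.html']
trimmed = [strip_prefix(link) for link in get_link]
print(trimmed)
```

Checking startswith before slicing avoids silently chopping 11 characters off a link that has a different shape.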

An optimized way to get all 'a' tags (links) that have a non-empty text value and an href attribute:

links = [l.get('href').replace('index.html?','') 
     for l in soup.find_all('a', href=True, string=True) if l.text.strip()] 
print(links)
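
If the part before the '?' is not always 'index.html', one alternative (not the approach used in this answer) is to let the standard library split off the query string. A minimal sketch, assuming the hrefs are plain relative URLs like the ones in the question; the list name hrefs is hypothetical:

```python
from urllib.parse import urlsplit

# Sample hrefs copied from the question's output
hrefs = ['index.html?country=2', 'index.html?country=25', 'index.html?country=1']

# urlsplit(...).query is everything after the '?' (and before any '#')
queries = [urlsplit(href).query for href in hrefs]
print(queries)
```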

Source

2017-10-05 09:02:18 RomanPerekhrest

Yes, this is another way to remove "index.html?". Thanks! –

@RajuSingh, you're welcome – RomanPerekhrest

There are many ways to get only the "country=" part, and some are already available in BS4, but if you like you can use a regular expression:

import re

ui = ['index.html?country=2',
      'index.html?country=25',
      'index.html?country=1',
      'index.html?country=6',
      'index.html?country=2']

# Match 'country=' followed by one or more digits
pattern = r'country=[0-9]+'

print("\n".join(re.search(pattern, i).group() for i in ui))

Result:

country=2 
country=25 
country=1 
country=6 
country=2
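
Note that re.search returns None when a link has no 'country=' part, so calling .group() directly would raise AttributeError on such a link. A minimal sketch of a guarded version, assuming unmatched links should map to None; the helper name extract_country and the second sample link are hypothetical:

```python
import re

# Compile once; matches 'country=' followed by one or more digits
COUNTRY_RE = re.compile(r'country=\d+')

def extract_country(link):
    """Return the 'country=N' part of link, or None if it is absent."""
    match = COUNTRY_RE.search(link)
    return match.group() if match else None

links = ['index.html?country=25', 'index.html?lang=en']
print([extract_country(link) for link in links])
```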

Source

2017-10-05 09:33:18
