How to print everything after a keyword? For instance, print everything between the word "Apple" and the word "Pen"

I need to make a script that shows me all the characters between two keywords.

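Say I download an HTML page and then read it (it has 33985 characters). I need to print everything between '<td class="ml_subject"><a href="?tab=inbox"' and '</a></td>', which is a dozen or so characters further on.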

I can find the starting point with:

if '<td class="ml_subject"><a href="?tab=inbox"' in html:
    print("Success")

But what then?


2012-02-25 Skylight

Use the find() method: http://docs.python.org/library/stdtypes.html#str.find

It would look something like this:

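# html is your input string
start_tag = '<td class="ml_subject"><a href="?tab=inbox">'
start = html.find(start_tag)
end = html.find('</a></td>', start)
# find() returns -1 when a marker is missing, so check before slicing
if start != -1 and end != -1:
    result = html[start + len(start_tag):end]  # skip past the opening marker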
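
If the page holds several such cells (the question does not say so, this is an assumption), the same find() call can be repeated, starting each search where the previous match ended. A minimal sketch; the function name find_all_between and the sample HTML are made up for illustration:

```python
def find_all_between(text, start_marker, end_marker):
    """Return every substring located between start_marker and end_marker."""
    results = []
    pos = 0
    while True:
        start = text.find(start_marker, pos)
        if start == -1:
            break  # no further opening marker
        start += len(start_marker)  # skip past the marker itself
        end = text.find(end_marker, start)
        if end == -1:
            break  # opening marker without a closing one
        results.append(text[start:end])
        pos = end + len(end_marker)  # continue after this match
    return results


# Made-up sample input, not the real page from the question
sample_html = (
    '<tr><td class="ml_subject"><a href="?tab=inbox">Hello</a></td></tr>'
    '<tr><td class="ml_subject"><a href="?tab=inbox">Invoice</a></td></tr>'
)
subjects = find_all_between(
    sample_html, '<td class="ml_subject"><a href="?tab=inbox">', '</a></td>')
print(subjects)
```

The second argument to find() is what keeps the loop moving forward instead of matching the first cell again.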


2012-02-25 18:39:02 alex

string = 'how to print everything after keyword ? for instance print everything between word "Apple" and word "Pen"'
s, e = string.index('Apple') + len('Apple'), string.index('Pen')
# add len('Apple') because we do not want to capture "Apple" itself
print(string[s:e])
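
Note that str.index() raises ValueError when the substring is absent, unlike find(), which returns -1. A hedged sketch of guarding the lookup; the function name between_or_none is hypothetical and not part of the answer above:

```python
def between_or_none(text, first, second):
    """Return the text between first and second, or None if either is missing."""
    try:
        s = text.index(first) + len(first)
        e = text.index(second, s)  # look for second only after first
    except ValueError:
        return None
    return text[s:e]


print(between_or_none('I have an Apple and a Pen', 'Apple', 'Pen'))
print(between_or_none('no keywords here', 'Apple', 'Pen'))
```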


2012-02-25 18:39:03 Doboy

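Use find to locate the keywords in your string and use slice notation to extract the text. find returns -1 if the string is not found, so make sure you check for that in your actual implementation.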

>>> a = "stuff Apple more stuff Pen blah blah" 
>>> delim1 = 'Apple' 
>>> delim2 = 'Pen' 
>>> i1 = a.find(delim1) 
>>> i1 
6 
>>> i2 = a.find(delim2) 
>>> i2 
23 
>>> a[i1+len(delim1):i2] 
' more stuff '
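
The "not found" case can also be handled without index arithmetic by using str.partition(), which is a different technique from the find() session above. This is only a sketch; the function name between is made up, and unlike the session above it looks for the second delimiter only after the first one:

```python
def between(text, delim1, delim2):
    """Return the text between delim1 and delim2, or None if either is missing."""
    _, sep1, rest = text.partition(delim1)
    if not sep1:
        return None  # delim1 not found
    middle, sep2, _ = rest.partition(delim2)
    if not sep2:
        return None  # delim2 not found after delim1
    return middle


a = "stuff Apple more stuff Pen blah blah"
print(repr(between(a, 'Apple', 'Pen')))
```

partition() returns an empty separator when nothing matches, so the two checks play the role of the -1 test.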


2012-02-25 18:42:47

Use lxml or some other HTML processing module:

from lxml.html import fragment_fromstring 
from lxml.cssselect import CSSSelector 

HTML = '<td class="ml_subject"><a href="?tab=inbox">Foobar</a></td>' 

tree = fragment_fromstring(HTML) 
selector = CSSSelector('td.ml_subject > a[href="?tab=inbox"]') 
result = selector(tree)[0].text
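
As one example of "some other HTML processing module", the same extraction can be sketched with the standard library's html.parser in Python 3, which needs no extra installation. The class name SubjectLinkParser is made up for illustration, and the sketch is looser than the CSS selector above: it accepts the link anywhere inside the td rather than only as a direct child.

```python
from html.parser import HTMLParser  # Python 3 standard library


class SubjectLinkParser(HTMLParser):
    """Collect the text of <a href="?tab=inbox"> links inside td.ml_subject."""

    def __init__(self):
        super().__init__()
        self.in_td = False
        self.in_link = False
        self.subjects = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'td' and 'ml_subject' in (attrs.get('class') or '').split():
            self.in_td = True
        elif tag == 'a' and self.in_td and attrs.get('href') == '?tab=inbox':
            self.in_link = True
            self.subjects.append('')  # start a new subject entry

    def handle_endtag(self, tag):
        if tag == 'a':
            self.in_link = False
        elif tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        if self.in_link:
            self.subjects[-1] += data  # accumulate the link text


parser = SubjectLinkParser()
parser.feed('<td class="ml_subject"><a href="?tab=inbox">Foobar</a></td>')
print(parser.subjects)
```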


2012-02-25 19:12:06 Gandaro

To print the text of all the links, you can use BeautifulSoup:

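try:
    from urllib2 import urlopen
except ImportError: # Python 3.x
    from urllib.request import urlopen

from bs4 import BeautifulSoup # pip install beautifulsoup4

# url is the address of the page you want to fetch
soup = BeautifulSoup(urlopen(url), 'html.parser')
# get_text() turns each matching <a> tag into a plain string for join()
print('\n'.join(a.get_text() for a in soup('a', href="?tab=inbox")))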

If the link must have a td.ml_subject parent, you can use a function as the search criterion:

def link_inside_td(tag):
    td = tag.parent
    # bs4 returns the class attribute as a list, so test membership
    return (tag.name == 'a' and tag.get('href') == "?tab=inbox" and
            td.name == 'td' and 'ml_subject' in td.get('class', []))

print('\n'.join(a.get_text() for a in soup(link_inside_td)))


2012-02-25 19:31:37 jfs
