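Searching titles in the Medline database using Entrez and Biopython

I am trying to search for papers with specific words in the title. More precisely, I want papers published between 2010 and 2015 that have the word viral or virus in the title. The code I have: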
import re
from Bio import Entrez
from Bio import Medline

handle = Entrez.esearch(db="pubmed",  # database to search
                        term="2010[Date - Publication]:2015[Date - Publication]"
                        )
record = Entrez.read(handle)
handle.close()
pmid_list = record["IdList"]  # list of record IDs

handle = Entrez.efetch(db="pubmed", id=pmid_list, rettype="medline", retmode="text")
records = Medline.parse(handle)

titles = []  # start with empty list of titles
for record in records:
    ti_list = record['TI']  # titles
    for title in ti_list:
        if title == "virus" and title not in titles:  # searching viral/virus
            titles.append(title)

print('Publications with viral or virus in the title:')
for record in records:
    print(" ", title)
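If I just print(record['TI']), I get a list of all the titles in my search query, but I can't search for a specific word. I think my mistake may be in `if title == "virus"` (because obviously no paper would have that single word as its whole title).
I'm really stuck. Is there a better way to look for words in the titles of the papers in my query?
Thanks.
Edit: updated code (still no luck)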
import re
from Bio import Entrez
from Bio import Medline

handle = Entrez.esearch(db="pubmed",  # database to search
                        term="2010[Date - Publication]:2015[Date - Publication]"
                        )
record = Entrez.read(handle)
handle.close()
pmid_list = record["IdList"]  # list of record IDs

handle = Entrez.efetch(db="pubmed", id=pmid_list, rettype="medline", retmode="text")
records = Medline.parse(handle)

r = re.compile(r"\bvir(al|us)\b")
titles = set()  # start with an empty set of titles
for record in records:
    ti_list = record['TI']  # titles
    for title in ti_list:
        if r.search(title):
            titles.add(title)

print('Publications with viral or virus in the title:')
for record in records:
    print(" ", title)
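A possible reason the updated code still prints nothing useful: in Bio.Medline the `TI` field of a record is a single string, so `for title in ti_list` walks over individual characters, and the second `for record in records` loop runs over a parser that has already been consumed. Below is a minimal sketch of the filtering step only, assuming the records behave like dictionaries. The helper name `titles_matching` and the sample records are hypothetical, used so the snippet runs without a network call.

```python
import re

# Match "viral" or "virus" as whole words, ignoring case.
TITLE_PATTERN = re.compile(r"\bvir(?:al|us)\b", re.IGNORECASE)


def titles_matching(records, pattern=TITLE_PATTERN):
    """Return unique titles, in first-seen order, that match the pattern."""
    seen = set()
    matches = []
    for record in records:
        title = record.get("TI", "")  # some records may have no title field
        if pattern.search(title) and title not in seen:
            seen.add(title)
            matches.append(title)
    return matches


# Hypothetical stand-ins for the dict-like records yielded by Medline.parse.
sample_records = [
    {"TI": "A novel virus in bats."},
    {"TI": "Viral load dynamics in chronic infection."},
    {"TI": "Antivirus software evaluation."},  # no word boundary before "virus"
    {"TI": "A novel virus in bats."},          # duplicate title
    {"PMID": "0"},                             # record without a title
]

print("Publications with viral or virus in the title:")
for title in titles_matching(sample_records):
    print(" ", title)
```

With real data, the same function could be called as `titles_matching(Medline.parse(handle))`, printing from the returned list rather than looping over `records` a second time.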
New code:
import re
from Bio import Entrez
from Bio import Medline

handle = Entrez.efetch(db="pubmed", id=pmid_list, rettype="medline", retmode="text",
                       term="2010[Date - Publication]:2015[Date - Publication]")
records = Medline.parse(handle)

titles = []
for record in records:
    ti_list = record['TI']
    for title in ti_list:
        titles.append(title)
handle.close()

for title in titles:
    print(title)
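As far as I know, EFetch retrieves records by ID rather than running a query, so the `term` argument probably belongs in `Entrez.esearch`, where PubMed's `[Title]` field tag can do the title filtering on the server side. The sketch below builds such a query string; the helpers `build_title_term` and `search_pmids` are hypothetical names, and the date range is copied from the code above. `search_pmids` is defined but not called, because it needs Biopython and network access.

```python
def build_title_term(words, start_year, end_year):
    """Build a PubMed query limiting words to the title and to a year range."""
    title_part = " OR ".join(f"{word}[Title]" for word in words)
    date_part = (f"{start_year}[Date - Publication]:"
                 f"{end_year}[Date - Publication]")
    return f"({title_part}) AND {date_part}"


def search_pmids(term, email, retmax=100):
    """Run the search and return PubMed IDs (needs Biopython and network)."""
    from Bio import Entrez  # imported lazily so the query builder runs alone
    Entrez.email = email  # NCBI asks for a contact address
    # esearch returns only 20 IDs by default, so retmax is set explicitly.
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    return result["IdList"]


term = build_title_term(["virus", "viral"], 2010, 2015)
print(term)
```

The resulting ID list would then go to `Entrez.efetch(db="pubmed", id=..., rettype="medline", retmode="text")` exactly as in the earlier code, without a `term` argument.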
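Sorry, I'm very new to this. How do I put the regex version from your answer into my code? – jarch
@user3723011, what are you trying to achieve? You are appending to the titles list, but you don't seem to use it. Are you still looking for a substring or an exact match? –
My goal is to have output that says 'Publications with viral or virus in the title: [list of publications with viral or virus in the title]'. I am trying to get an exact match. – jarch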
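To illustrate the difference between a substring test and an exact (whole-word) match, and to produce output in the form described in the last comment, here is a rough sketch. The function `format_report` and the sample titles are hypothetical; real titles would come from the `TI` field of the parsed Medline records.

```python
import re


def format_report(titles, words=("viral", "virus")):
    """Return report text listing titles that contain any word as a whole word."""
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    hits = [title for title in titles if pattern.search(title)]
    header = f"Publications with {' or '.join(words)} in the title:"
    return "\n".join([header] + [f"  {title}" for title in hits])


# Hypothetical titles for illustration only.
sample_titles = [
    "Zika virus outbreak.",
    "Retroviral vectors for gene therapy.",  # substring hit, not a whole word
    "Viral evolution in hosts.",
]

report = format_report(sample_titles)
print(report)
```

A plain `"viral" in title.lower()` check would also pick up "Retroviral", which is why the sketch uses `\b` word boundaries when an exact word match is wanted.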