2016-07-29

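Python web crawler (multiple pages)

I am using requests and bs4. Inside the loop I print the "soup" for every page, but only the last "soup" is correct. The others do not match the HTML source of the page. Please help me. Thanks.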

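import bs4
import requests

# files (assumed): an iterable of neuron names, e.g. an open text file with one name per line
for eachLine in files:
    addr = 'http://neuromorpho.org/neuron_info.jsp?neuron_name=' + eachLine
    print(addr)
    st = []
    st1 = []
    r2 = requests.get(addr)
    soup2 = bs4.BeautifulSoup(r2.text, "lxml")
    print(soup2)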

Answer

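The response object returned by requests.get has a content attribute that holds the full body of the page as bytes. You can pass it to BS4 to parse it:

import bs4
import requests

for eachLine in files:
    addr = 'http://neuromorpho.org/neuron_info.jsp?neuron_name=' + eachLine
    r2 = requests.get(addr)
    content = r2.content  # raw bytes of the response body
    soup2 = bs4.BeautifulSoup(content, "lxml")  # name the parser explicitly, as in the question
    print(soup2)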
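
A further possible cause, offered as an assumption because the thread does not show how files is created: if files is an open text file, every line except possibly the last ends with "\n", and that newline becomes part of the neuron_name value. That would match the symptom that only the last request returns the expected page. A minimal sketch of stripping each line before building the URL (build_urls and BASE_URL are hypothetical names, not from the original post):

```python
BASE_URL = "http://neuromorpho.org/neuron_info.jsp?neuron_name="


def build_urls(lines):
    """Yield one request URL per non-empty line, with surrounding whitespace removed."""
    for line in lines:
        # Iterating over a file keeps the trailing "\n" on each line;
        # strip() removes it along with any stray spaces.
        name = line.strip()
        if not name:
            continue  # skip blank lines instead of requesting an empty name
        yield BASE_URL + name
```

Each URL produced this way can be passed to requests.get in place of addr in the loop, and comparing repr(addr) before and after stripping shows whether a newline was present.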