Entering a value into a search bar and downloading the output from a web page

I am trying to search a web page (http://www.phillyhistory.org/historicstreets/). I believe the relevant source HTML is this:

<input name="txtStreetName" type="text" id="txtStreetName">

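You can see the rest of the source HTML on the site. I want to enter a street name into that text box and download the output (i.e. type 'Jefferson' into the page's search box and see the historic street names for Jefferson). I have tried using requests.post, and I also tried appending ?get=Jefferson to the URL to test whether that works, with no luck. Does anyone have any idea how to get this page? Thanks,

Cameron

This is what I have tried so far (some imports are unused because I plan to parse the results later).

Code: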

import requests
from bs4 import BeautifulSoup
import csv
from string import ascii_lowercase
import codecs
import os.path
import time

arrayofstreets = ['Jefferson']

for each in arrayofstreets:
    url = 'http://www.phillyhistory.org/historicstreets/default.aspx'
    # Only the text box value is posted here
    payload = {'txtStreetName': each}
    r = requests.post(url, data=payload).content
    outfile = "raw/" + each + ".html"
    # .content is bytes, so the file is opened in binary mode
    with open(outfile, "wb") as code:
        code.write(r)
    time.sleep(2)

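This did not work; it only downloaded the default web page (i.e. 'Jefferson' was not entered into the search bar and no results were retrieved).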

Source

2016-06-20 www3

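I assume that by "requests.post" you mean the Python Requests module.

Since you haven't specified what you want to scrape from the search results, I'll just give you a snippet that gets the HTML for a given search query: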

import requests

query = 'Jefferson'

url = 'http://www.phillyhistory.org/historicstreets/default.aspx'
post_data = {'txtStreetName': query}

# .content holds the raw response body
html_result = requests.post(url, data=post_data).content

print(html_result)

If you need to process the HTML further and extract some data from it, I suggest using the Beautiful Soup module for that.
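
As a rough illustration of that suggestion, here is a minimal sketch of pulling table rows out of HTML with Beautiful Soup. The `sample_html` string and the `extract_rows` helper are hypothetical stand-ins made up for this example; the markup the site actually returns may look quite different, so the selectors would need adjusting.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a results page; the real markup may differ.
sample_html = """
<table id="results">
  <tr><th>Historic name</th><th>Current name</th></tr>
  <tr><td>Jefferson Street</td><td>Example Street</td></tr>
  <tr><td>Jefferson Lane</td><td>Sample Avenue</td></tr>
</table>
"""

def extract_rows(html):
    """Return the text of each data row's cells as a list of lists."""
    # "html.parser" is the parser bundled with Python, so no extra install is needed
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows contain only <th> cells, so they are skipped
            rows.append(cells)
    return rows

rows = extract_rows(sample_html)
for row in rows:
    print(row)
```

In practice the string passed to `extract_rows` would be the response body from `requests.post` rather than a hard-coded sample.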

Updated version:

#!/usr/bin/python
import requests
from bs4 import BeautifulSoup
import csv
from string import ascii_lowercase
import codecs
import os.path
import time

def get_post_data(html_soup, query):
    # Hidden form fields that have to be sent back with the search
    view_state = html_soup.find('input', {'name': '__VIEWSTATE'})['value']
    event_validation = html_soup.find('input', {'name': '__EVENTVALIDATION'})['value']
    btn_search = 'Find'
    return {'__VIEWSTATE': view_state,
            '__EVENTVALIDATION': event_validation,
            'Textbox1': '',
            'txtStreetName': query,
            'btnSearch': btn_search
            }

arrayofstreets = ['Jefferson']

url = 'http://www.phillyhistory.org/historicstreets/default.aspx'
# Fetch the page once to read the hidden form fields
html = requests.get(url).content
for each in arrayofstreets:
    payload = get_post_data(BeautifulSoup(html, 'lxml'), each)
    r = requests.post(url, data=payload).content
    outfile = "raw/" + each + ".html"
    # .content is bytes, so the file is opened in binary mode
    with open(outfile, "wb") as code:
        code.write(r)
    time.sleep(2)

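The problem with my/your first version was that we weren't posting all of the required parameters. To find out what needs to be sent, open the network monitor in your browser (Ctrl + Shift + Q in Firefox) and run a search as you normally would. If you select the POST request in the network log, you should see a "Params" tab on the right showing the POST parameters your browser sent.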
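
If the list of hidden fields turns out to be longer than expected, one option is to collect every hidden input on the page instead of naming them one by one. The following is only a sketch using Python's standard `html.parser`: `HiddenInputCollector`, `build_payload`, and the `sample_form` markup with its placeholder values are all made up for illustration, and the real form may contain other fields.

```python
from html.parser import HTMLParser

class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Attribute values can be None when the attribute has no value
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""

def build_payload(html, extra_fields):
    """Merge the page's hidden fields with the fields we fill in ourselves."""
    collector = HiddenInputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload.update(extra_fields)
    return payload

# Hypothetical form markup with placeholder values, for illustration only.
sample_form = """
<form method="post" action="default.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input name="txtStreetName" type="text" id="txtStreetName">
  <input type="submit" name="btnSearch" value="Find" />
</form>
"""

payload = build_payload(sample_form, {"txtStreetName": "Jefferson", "btnSearch": "Find"})
print(payload)
```

The resulting dictionary could then be passed as `data=payload` to `requests.post`, with the HTML taken from a prior `requests.get` of the search page.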

Source

2016-06-20 15:59:56 Dziugas

Hi Dziugas, this is exactly what I tried. I am not getting the correct output. I have edited the question to show my attempt – www3
