Reading a GitHub file with Python returns HTML tags

I want to read a text file stored on GitHub using the requests package. Here is the Python code I am using:

import requests

url = 'https://github.com/...../filename'
page = requests.get(url)
print(page.text)

Instead of the text, I get HTML tags. How can I read the text of the file rather than the HTML tags?

Source

2016-07-20 Sandy

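Make sure you are getting what you think you are: what do you see if you put the URL into a browser? GitHub usually returns a page with the file embedded in it, so you may need to adjust your URL to point directly at the file. Try 'https://github.com/repo/raw/.../filename', which redirects to 'https://raw.githubusercontent.com/repo/.../filename' – brichins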

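Hi, thanks for your reply. I put the URL into the browser and I do get the file. I also tried the https://github.com/repo/raw/.../filename link. I can open the file in raw format in the browser, but when I read it from Python I still get only HTML tags. – Sandy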

There are some good solutions here already, but if you are using requests you can simply go through the GitHub API.

The endpoint for all content is

GET /repos/:owner/:repo/contents/:path

Keep in mind, though, that by default the GitHub API encodes the content with base64.

In your case, you would do the following:

#!/usr/bin/env python3
import base64
import requests


url = 'https://api.github.com/repos/{user}/{repo_name}/contents/{path_to_file}'
req = requests.get(url)
if req.status_code == requests.codes.ok:
    data = req.json()  # the response body is JSON
    # data is now a dict with keys: name, encoding, url, size ...
    # and content, which is base64-encoded.
    # base64.decodestring was removed in Python 3.9; b64decode returns bytes.
    content = base64.b64decode(data['content']).decode('utf-8')
else:
    print('Content was not found.')
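
The decoding step can be tried without any network access. The following is only a minimal sketch: `decode_contents` and the `sample` dict are hypothetical names made up for illustration, and the sketch assumes the parsed response is a dict with the `encoding` and `content` keys mentioned above, and that error responses carry a `message` key instead.

```python
import base64


def decode_contents(payload):
    """Return the file text from a parsed Contents API response (a dict)."""
    if 'content' not in payload:
        # Assumption: error responses (such as a 404) have no 'content' key.
        raise ValueError('no file content in response: %r' % payload.get('message'))
    if payload.get('encoding') != 'base64':
        raise ValueError('unexpected encoding: %r' % payload.get('encoding'))
    # By default b64decode skips the line breaks embedded in the encoded text.
    return base64.b64decode(payload['content']).decode('utf-8')


# Hand-built stand-in for a real API response, for illustration only.
sample = {
    'name': 'hello.txt',
    'encoding': 'base64',
    'content': base64.encodebytes(b'Hello from GitHub\n').decode('ascii'),
}
print(decode_contents(sample), end='')
```

Checking for the `content` key up front turns the `KeyError` you would otherwise hit on an error response into a clearer message.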

Source

2016-07-21 07:04:01 dasdachs

Hi, I tried this, but I ended up with an error: ValueError: No JSON object could be decoded – Sandy

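I could not reproduce your error, even when I got a 404 response (which results in a 'KeyError' in that case). I tried with Python 2.7 and 3.4. Could you give me the URL you are using, or the user, repo and path? Thanks. – dasdachs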

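Hi, I think I built the wrong URL, which is why I got the earlier error. I have the correct URL now, but I get a 404 response. Do you know how to avoid this? Thanks for your help. – Sandy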

You can access the text version by changing the beginning of your link to

https://raw.githubusercontent.com/
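
As a rough sketch of that rewrite, the helper below turns a github.com file-page URL into its raw.githubusercontent.com form. The function names `to_raw_url` and `fetch_text` and the example URL are hypothetical, and the sketch assumes the page URL follows the usual owner/repo/blob/branch/path layout.

```python
from urllib.parse import urlsplit


def to_raw_url(page_url):
    """Convert a github.com file-page URL to its raw.githubusercontent.com form."""
    parts = urlsplit(page_url)
    segments = parts.path.strip('/').split('/')
    # Assumed layout: owner/repo/blob/branch/path/to/file ('raw' is accepted too).
    if (parts.netloc != 'github.com' or len(segments) < 5
            or segments[2] not in ('blob', 'raw')):
        raise ValueError('not a GitHub file URL: %s' % page_url)
    owner, repo, _, *rest = segments
    return 'https://raw.githubusercontent.com/' + '/'.join([owner, repo] + rest)


def fetch_text(page_url):
    """Download the file behind a GitHub page URL as plain text."""
    import requests  # imported here so to_raw_url works without requests installed
    response = requests.get(to_raw_url(page_url))
    response.raise_for_status()  # fail loudly on a 404 instead of returning HTML
    return response.text


# Placeholder URL for illustration; it does not refer to a real repository.
print(to_raw_url('https://github.com/owner/repo/blob/main/docs/notes.txt'))
```

Calling `fetch_text` with the page URL of a public file should then return the file's text rather than the HTML page around it.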

Source

2016-07-20 22:14:52 patrick
