Given a URL to a text file, what is the simplest way to read the contents of a text file that contains a large amount of data?

I have already looked at another answer on this forum, "In Python, given a URL to a text file, what is the simplest way to read the contents of the text file?", and found it useful. But if you look at my file at http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt

you will see that there is a large amount of data in it. So, when I use this code:

import urllib2

# Read only the first 69700 characters of the file
data = urllib2.urlopen('http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt').read(69700)

# Then split the text into lines
data = data.split("\n")

for line in data:
    print line

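It turns out that the number of characters Python can read while still showing the headers in the URL file is 69700, but my problem is that I need all of the data in there, which is roughly 30,000,000 characters.

When I put in that many characters, I only get a chunk of the data showing rather than all of it, and the headers for each column in the URL file data are gone. Can anyone help me fix this problem?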

Source

2013-10-02 dontbadick

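The SO answer you reference shows how to read the URL line by line. Given that you are dealing with line-oriented data, that is likely the way to go. You may want to pass the urlopen object to a CSV reader and let it pull the data in. – tdelaney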
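As a rough illustration of the CSV reader suggestion, here is a minimal sketch. It is written for Python 3 (where `urllib.request` replaces `urllib2`), and it assumes the file is comma-delimited and UTF-8 encoded, neither of which is confirmed in the thread. The helper name `rows_from_response` and the column names in the usage example are made up for illustration.

```python
import csv
import io


def rows_from_response(response, encoding="utf-8", delimiter=","):
    """Return a csv.reader over a binary file-like object.

    `response` can be anything with a binary read interface, such as the
    object returned by urllib.request.urlopen(). The encoding and the
    delimiter are assumptions; adjust them to match the real file.
    """
    # Wrap the byte stream so csv.reader receives text, decoded lazily.
    text_stream = io.TextIOWrapper(response, encoding=encoding, newline="")
    return csv.reader(text_stream, delimiter=delimiter)


# Stand-in for a real response, so the sketch runs without network access.
sample = io.BytesIO(b"STN,DATE,TEMP\n1,20130101,30.5\n")
for row in rows_from_response(sample):
    print(row)
```

With a real URL you would pass the result of `urllib.request.urlopen(url)` instead of the `io.BytesIO` stand-in. The reader pulls rows one at a time, so the whole file never has to sit in memory.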

"*The number of characters Python can read with the headers in the URL file is 69700 characters*" - I disagree. Get rid of the '.read(69700)' and everything will be fine. –
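To see why the argument to `read` matters, here is a small sketch in Python 3. An in-memory `io.BytesIO` stands in for the URL response (an assumption, so that it runs offline), and the payload is invented. The point is only that `read(n)` stops after at most `n` bytes, while `read()` with no argument keeps going until the end of the stream.

```python
import io

# Invented payload: one header line followed by many data rows.
payload = b"header\n" + b"row\n" * 1000
stream = io.BytesIO(payload)

first = stream.read(10)  # returns at most 10 bytes
rest = stream.read()     # no argument: returns everything that is left

print(len(first), len(rest), len(payload))
```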

What you'll want to do here is read and process the data in chunks, for example:

import urllib2

f = urllib2.urlopen('http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt')
while True:
    next_chunk = f.read(4096)  # read the next 4 KB
    if not next_chunk:  # an empty result means all data has been read
        break
    process_chunk(next_chunk)  # placeholder: replace with your own processing
f.close()
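One thing to watch with fixed-size chunks is that a chunk boundary can fall in the middle of a line. The sketch below (Python 3, with a hypothetical helper name `iter_lines_from_chunks`) shows one way to carry the incomplete tail of each chunk over to the next, so that only whole lines are handed on. It assumes lines end in `\n` and works on bytes.

```python
import io


def iter_lines_from_chunks(stream, chunk_size=4096):
    """Yield complete lines (bytes, without the trailing newline) from a
    binary stream that is read in fixed-size chunks."""
    pending = b""  # tail of the previous chunk that did not end in a newline
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty result: end of stream
            break
        pending += chunk
        lines = pending.split(b"\n")
        pending = lines.pop()  # the last piece may be an incomplete line
        for line in lines:
            yield line
    if pending:  # final line with no trailing newline
        yield pending


# Tiny chunk size on purpose, to force splits in the middle of lines.
sample = io.BytesIO(b"a,b\n1,2\n3,4")
for line in iter_lines_from_chunks(sample, chunk_size=3):
    print(line)
```

With a real file you would pass the object returned by `urllib.request.urlopen(url)` in place of the `io.BytesIO` stand-in.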

Source

2013-10-02 17:09:22 Claudiu

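The simple way works just fine.

If you want to go through the file line by line:

import urllib2

for line in urllib2.urlopen('http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt'):
    # Do something with each line, for example print it
    # (the trailing comma stops print from adding a second newline):
    print line,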

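Or, if you want to download all of the data:

import sys
import urllib2

data = urllib2.urlopen('http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt')
data = data.read()  # no size argument, so this reads to the end of the file
sys.stdout.write(data)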
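For anyone on Python 3, where `urllib2` no longer exists, the same two approaches might look roughly like this with `urllib.request`. The function names `read_lines` and `read_all` are hypothetical, and UTF-8 is an assumed encoding that may not match the actual file.

```python
import urllib.request


def read_lines(url, encoding="utf-8"):
    """Yield the resource at `url` one decoded line at a time."""
    with urllib.request.urlopen(url) as response:
        for raw_line in response:  # the response is iterable line by line
            yield raw_line.decode(encoding).rstrip("\r\n")


def read_all(url, encoding="utf-8"):
    """Return the whole resource at `url` as a single string."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding)
```

`read_lines` keeps memory use low because only one line is held at a time, while `read_all` is simpler when the file fits comfortably in memory. In both cases the first line returned is the header row, so nothing is dropped from the data itself.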

Source

2013-10-02 17:25:30
