Python requests - chunked streaming

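Because of a faulty server design, I have to stream down JSON and fix any null byte I find. I'm using python requests to do this. Each JSON event is delimited by \n. What I'm trying to do here is pull down a chunk (always smaller than one log line) and search that chunk for the end-of-event indicator ("\"status\":\d+\d+\d+}}\n").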

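If that indicator is there, I do something with the complete JSON event. If not, I add the chunk to a buffer, grab the next chunk and look for the indicator again. Once I have that working, I'll start searching for the null bytes.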
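Since every event is said to end with \n, one simpler route is to buffer the raw bytes and split on newlines, carrying the partial last line over to the next chunk. The sketch below is only an illustration under assumptions: `iter_events` is a hypothetical helper, the chunks are bytes (as `iter_content` yields them by default), and "fixing" a null byte is assumed to mean deleting it.

```python
import json
from typing import Iterable, Iterator


def iter_events(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON event per newline-terminated line.

    Hypothetical helper: `chunks` can be any iterable of bytes, for example
    r.iter_content(chunk_size=25) on a requests response opened with stream=True.
    """
    buffer = b""  # holds a partial line until the rest of it arrives
    for chunk in chunks:
        buffer += chunk
        # Everything before the last b"\n" is complete; the final piece may be partial.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            line = line.replace(b"\x00", b"")  # assumption: a null byte is simply dropped
            if line:
                yield json.loads(line)
    buffer = buffer.replace(b"\x00", b"")
    if buffer:  # the stream ended without a trailing newline
        yield json.loads(buffer)


# Made-up data: two events cut at arbitrary points, with a stray null byte.
stream = [b'{"event":{"id":1,"sta',
          b'tus":200}}\n{"event":{"id',
          b'":2,"status":4\x0004}}\n']
events = list(iter_events(stream))
print(events)
```

requests also ships `Response.iter_lines()`, which does this kind of line splitting itself and may be enough if the per-event terminator check is not strictly needed.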

import re

# r is assumed to be a streamed response, e.g. requests.get(url, stream=True)
b = ""

for d in r.iter_content(chunk_size=25):
    s = re.search("\"status\":\d+\d+\d+}}\n", d)
    if s:
        d = d.split("\n", 1)
        fullLogLine = b + d[0]
        b = d[1]
    else:
        b = b + d

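In this case I completely lose the value of b. It doesn't seem to carry over between iterations of iter_content. Whenever I try to print b, it's empty. I feel like I'm missing something obvious here. Any help is appreciated. Thanks.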
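One failure mode worth ruling out with a loop like this is that the end-of-event pattern may never appear whole inside a single 25-byte chunk, in which case the `if s:` branch never runs. A quick offline check, using two made-up events in the shape the question describes (the sample data is an assumption, not real server output):

```python
import re

# Two made-up events in the shape the question describes.
stream = (b'{"event":{"id":1,"status":200}}\n'
          b'{"event":{"id":2,"status":404}}\n')
pattern = re.compile(rb'"status":\d+}}\n')

# Cut the stream into 25-byte pieces, the way iter_content(chunk_size=25) may deliver it.
chunks = [stream[i:i + 25] for i in range(0, len(stream), 25)]

per_chunk_hits = sum(1 for c in chunks if pattern.search(c))  # searching each chunk alone
whole_stream_hits = len(pattern.findall(stream))              # searching the joined data
print(per_chunk_hits, whole_stream_hits)  # prints: 0 2
```

Both delimiters straddle a chunk boundary here, so a per-chunk search finds neither; searching the buffered data instead of the lone chunk avoids that.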


2017-06-13 HectorOfTroy407

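For starters, what's the point of '\d+\d+\d+'? It's just '\d+' in disguise (strictly speaking it requires at least three digits, like '\d{3,}'), written in a way that can trip up some regex engines. – zwer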
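To make the comment concrete, a small check (sample strings are made up) that the chained pattern accepts exactly the inputs that "three or more digits" accepts:

```python
import re

chained = re.compile(r"\d+\d+\d+")  # the pattern from the question
bounded = re.compile(r"\d{3,}")     # "three or more digits", stated directly

samples = ["7", "42", "200", "2000"]
for s in samples:
    # fullmatch asks whether the entire sample fits the pattern
    print(s, bool(chained.fullmatch(s)), bool(bounded.fullmatch(s)))
```

If the status is always a three-digit HTTP-style code (an assumption, not stated in the question), '\d{3}' would be tighter still.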

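First, that regex is messed up: \d+ means "one or more digits", so why chain three of them together? Also, you should use a "raw string" for patterns like this. \ is treated as an escape character in a normal string literal, so escapes such as \b never reach the regex engine as written (\d only survives in your pattern because it isn't a recognized string escape, and newer Python versions warn about it). You'd want to change it to re.search(r'"status":\d+}}', d). Second, if there are two newlines in the chunk, the d.split() line may pick the wrong \n.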
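Both points can be seen in a few lines. This is an illustrative sketch with made-up strings: the first part contrasts a normal and a raw pattern string, the second shows that the first newline in a chunk need not be the one the regex matched.

```python
import re

line = 'id 7, "status":200}}\n'

# In a normal string literal "\b" is a backspace character, so the regex engine
# never sees a word-boundary escape. The raw string passes \b through intact.
plain = re.search("\bstatus\b", line)
raw = re.search(r"\bstatus\b", line)
print(plain is None, raw is not None)  # prints: True True

# Made-up chunk containing two newlines: split("\n", 1) cuts at the first one,
# while the delimiter the regex found ends at a later position.
chunk = 'x"}\n{"status":200}}\n{"e'
m = re.search(r'"status":\d+}}\n', chunk)
print(chunk.index("\n"), m.end() - 1)  # prints: 3 19
```

Slicing at `m.end()` rather than splitting at the first newline keeps the cut aligned with the delimiter that was actually found.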

You don't even need a regex for this; good ol' Python string searching/slicing is more than enough to make sure you get your delimiter right:

logs = [] # store for our individual entries
pending = b"" # unfinished data carried over from previous chunks
for chunk in r.iter_content(chunk_size=25): # read chunk-by-chunk (bytes)...
    pending += chunk # search old + new data so a delimiter split across chunks is still found
    start = 0 # where the current, unfinished event begins in `pending`
    eoe = pending.find(b"}}\n") # seek the guaranteed event delimiter
    while eoe != -1: # a potential delimiter found, let's dig deeper...
        value_index = pending.rfind(b":", start, eoe) # find the last colon before it
        if start + 8 <= value_index and eoe - 4 <= value_index < eoe - 1: # woo hoo, 1-3 characters between
            digits = pending[value_index + 1:eoe] # let's see if it's a number...
            if digits.isdigit() and pending[value_index - 8:value_index] == b'"status"': # ding, ding, a match!
                logs.append(pending[start:eoe + 2]) # everything up to and including "}}"
                start = eoe + 3 # the next event starts right after "}}\n"
        eoe = pending.find(b"}}\n", eoe + 1) # maybe there is another delimiter in the buffer...
    pending = pending[start:] # keep only the unfinished tail for the next chunk
if pending: # there is still some data in the buffer
    logs.append(pending) # add it, even if not complete...

# Do whatever you want with the `logs` list...

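It looks complicated, but it's actually quite easy if you read it line by line, and you would have to do something complicated (overlapping matches and the like) with regex matching too, to account for potentially multiple event delimiters in the same chunk.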
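For comparison, here is a minimal regex-based sketch that handles several delimiters at once by running `re.finditer` over a carried-over buffer rather than over each chunk alone. It is a swapped-in alternative, not the approach the answer recommends; the helper name `split_events` and the 1-3 digit status limit are assumptions.

```python
import re
from typing import Iterable, List

# End-of-event marker; the 1-3 digit status is an assumption mirroring the answer's check.
EOE = re.compile(rb'"status":\d{1,3}\}\}\n')


def split_events(chunks: Iterable[bytes]) -> List[bytes]:
    """Hypothetical helper: collect complete events from a chunked byte stream."""
    events: List[bytes] = []
    pending = b""  # bytes not yet assigned to a complete event
    for chunk in chunks:
        pending += chunk  # search across chunk boundaries, not inside one chunk
        start = 0
        for m in EOE.finditer(pending):  # every delimiter, however many arrived at once
            events.append(pending[start:m.end() - 1])  # keep "}}", drop the b"\n"
            start = m.end()
        pending = pending[start:]  # carry the unfinished tail into the next round
    if pending:
        events.append(pending)  # incomplete trailing data, kept as-is
    return events


stream = (b'{"event":{"id":1,"status":200}}\n'
          b'{"event":{"id":2,"status":404}}\n{"event":{"id":3')
chunks = [stream[i:i + 25] for i in range(0, len(stream), 25)]
result = split_events(chunks)
print(result)
```

Because `finditer` reports non-overlapping matches left to right, each delimiter closes exactly one event and no manual position bookkeeping beyond `start` is needed.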


2017-06-13 23:46:19 zwer

Wow. This is the most comprehensive answer I've ever received. Wish I could give you more points! – HectorOfTroy407
