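First, your regex is messed up: `\d+` already means "one or more digits", so why chain three of them together? You also need to use a raw string for a pattern like this, because `\` is treated as an escape character and your pattern may not be built the way you intended. Change it to `re.search(r'"status":\d+}}', d)`. Second, your `d.split()` line may pick up the wrong `\n` if a chunk contains two newlines.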
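As a rough illustration of both points, here is a minimal sketch. The `sample` and `chunk` values are made up for this example and are not taken from the question; the real log format may differ.

```python
import re

# Made-up sample entry (an assumption, not the asker's actual data).
sample = '{"event":{"id":7,"status":200}}\n'

# Raw string: the backslash in \d reaches the regex engine untouched.
match = re.search(r'"status":\d+}}', sample)
print(match.group(0))

# Without the r prefix, Python converts recognized escapes first,
# so '\b' is a single backspace character rather than a regex word boundary.
print(len('\b'), len(r'\b'))

# A chunk can hold more than one newline, so splitting on "\n" alone
# does not tell you which newline actually ends an event.
chunk = '"msg":"a\nb","status":200}}\n{"eve'
pieces = chunk.split("\n")
print(len(pieces))
```

Only the second newline in `chunk` follows a `"status":<digits>}}` sequence, which is why the delimiter has to be checked in context rather than split on blindly.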
You don't even need regex for this; good ol' Python string search and slicing is more than enough to make sure you get your delimiters right:
logs = []  # store for our individual entries
buffer = []  # buffer for our partial chunks
# NOTE: the string searches below assume each chunk is text (str); on Python 3,
# iter_content() yields bytes unless decode_unicode=True is passed and r.encoding is set
for chunk in r.iter_content(chunk_size=25):  # read chunk-by-chunk...
    eoe = chunk.find("}}\n")  # seek the guaranteed event delimiter
    while eoe != -1:  # a potential delimiter found, let's dig deeper...
        value_index = chunk.rfind(":", 0, eoe)  # find the first colon before it
        if eoe-1 >= value_index >= eoe-4:  # woo hoo, there are at most 3 characters between
            try:  # let's see if it's a number...
                status_value = int(chunk[value_index+1:eoe])  # omg, we're getting there...
                if chunk[value_index-8:value_index] == '"status"':  # ding, ding, a match!
                    buffer.append(chunk[:eoe+2])  # buffer everything up to the delimiter
                    logs.append("".join(buffer))  # flatten the buffer and write it to logs
                    chunk = chunk[eoe + 3:]  # remove everything before the delimiter
                    eoe = 0  # reset search position
                    buffer = []  # reset our buffer
            except (ValueError, TypeError):  # close but no cigar, ignore
                pass  # let it slide...
        eoe = chunk.find("}}\n", eoe + 1)  # maybe there is another delimiter in the chunk...
    buffer.append(chunk)  # add the current chunk to buffer
if buffer and buffer[0] != "":  # there is still some data in the buffer
    logs.append("".join(buffer))  # add it, even if not complete...
# Do whatever you want with the `logs` list...
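It looks complicated, but it's actually quite easy if you read it line by line, and you'd have to deal with some of the same complexity (overlapping matches and such) with regex matching, too, to account for potentially multiple event delimiters in the same chunk.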
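For comparison, here is a minimal sketch of what a regex-based variant might look like. It is not the code from this answer: `split_events`, `DELIMITER`, and the sample chunks are hypothetical names and data, and it assumes the chunks are already decoded text. Unlike the loop above, it keeps all unconsumed text in one string, so a delimiter that straddles two chunks is still found.

```python
import re

# Hypothetical delimiter pattern: a 1-3 digit status value closing an event.
DELIMITER = re.compile(r'"status":\d{1,3}\}\}\n')


def split_events(chunks):
    """Collect events from an iterable of text chunks (illustrative helper)."""
    logs = []
    pending = ""  # text received so far that has no complete delimiter yet
    for chunk in chunks:
        pending += chunk
        start = 0
        for match in DELIMITER.finditer(pending):
            # keep the closing braces, drop the trailing newline
            logs.append(pending[start:match.end() - 1])
            start = match.end()
        pending = pending[start:]  # carry the unfinished tail to the next chunk
    if pending:
        logs.append(pending)  # trailing data, possibly incomplete
    return logs


# Made-up chunks: the first delimiter is split across two chunks, and the
# second event contains a "}}\n" that is not a real delimiter.
chunks = ['{"a":1,"sta', 'tus":200}', '}\n{"b":"x}}\ny","status":404}}\n{"c"']
events = split_events(chunks)
print(events)
```

The trade-off is that `pending` is rescanned on every chunk, which is fine for short events but may be wasteful if a single event can grow very large.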
For starters, what's the meaning of `\d+\d+\d+`? That's just `\d+` in disguise, written in a way that may break on some regex engines. – zwer