Cannot add multi-line text to a list as a single item

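For debugging purposes, I am trying to collect the posts from my scrapy crawl output in a list. I cannot add multi-line text to the list as a single item.

Here is my code: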

import bisect
import re

post_list = []

with open('last_crawl_output.txt', 'r') as f:
    crawl_output = f.read()

# Find every 'referer'; each marks the start of a scrapy crawl AFTER the initial crawl of the search results page.
referer_list = [m.start(0) for m in re.finditer("referer", crawl_output)]

# Find indicator of crawl finished.
closing_list = [m.start(0) for m in re.finditer("scrapy", crawl_output)]

# Skip the first 'referer' (the initial crawl of the search results page).
del referer_list[0]

for pos1 in referer_list:
    # Index of the first 'scrapy' position after this referer.
    pos2_index = bisect.bisect(closing_list, pos1)
    # Get post from positions: slice up to 21 characters before the second 'scrapy' after the referer.
    pos2 = closing_list[pos2_index+1]
    post = crawl_output[pos1:pos2-21]

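I also tried post_list.append(post), but it did not work.

Here is some sample output.

This is what I want to add to post_list as a single string: here

This is what I get instead. Here is post_list with the post: output

When I use insert, the newlines come through as \n.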
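A likely explanation, sketched here with made-up data: append does store the multi-line text as one item, and the \n only appears because printing a whole list shows the repr of each string inside it. The sample_post string below is hypothetical and only stands in for one slice of crawl_output.

```python
# Hypothetical multi-line post; stands in for one slice of crawl_output.
sample_post = "Title: first post\nBody line one\nBody line two"

post_list = []
post_list.append(sample_post)

# The list holds exactly one item, newlines included.
item_count = len(post_list)

# Converting the whole list to text uses repr() for each string,
# so every newline is displayed as the two characters \ and n.
list_view = str(post_list)

# The item itself still contains real newlines.
item_view = post_list[0]

print(item_count)
print(list_view)
print(item_view)
```

If this is what is happening, the data in the list is already correct, and printing each item separately (rather than the list as a whole) shows the original formatting.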

2015-10-19 Manix

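Can you provide an example of 'referer_list' and 'closing_list'? I'm also a bit confused about why you don't write a single regular expression that finds the start and end indicators in one go (e.g. 'post_list = re.findall("referrer.*?scrapy", crawl_output)'). – Blckknght

@Blckknght I'm a complete noob, so I'm just doing it the way I know how. I've updated the question. Does a regex really let you do that in one line, the way you wrote it? – Manix

I'm sure you can come up with a regex that matches what you're looking for, though I suspect the one I gave in my comment isn't it (it ends the match at the first 'scrapy' after 'referrer', not the second). – Blckknght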
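As a rough sketch of the regex idea from these comments: the pattern below runs from 'referer' through the second 'scrapy' that follows it. The log text is invented for illustration, and the pattern is an assumption about what the asker wants, not a tested match for real scrapy output. Note that '.' does not match newlines unless re.DOTALL is passed, so the one-line pattern from the comment finds nothing in multi-line text as written.

```python
import re

# Hypothetical log text; real scrapy output will look different.
crawl_output = (
    "referer: post one\n"
    "line A\n"
    "scrapy: first marker\n"
    "line B\n"
    "scrapy: second marker\n"
    "referer: post two\n"
    "line C\n"
    "scrapy: first marker\n"
    "scrapy: second marker\n"
)

# Without re.DOTALL, '.' stops at newlines, so nothing spans several lines.
single_line_matches = re.findall("referer.*?scrapy", crawl_output)

# With re.DOTALL, each match runs from 'referer' through the second 'scrapy'.
post_list = re.findall("referer.*?scrapy.*?scrapy", crawl_output, re.DOTALL)

for post in post_list:
    print(post)
    print("Next Post")
```

Each element of post_list is one multi-line string, so no position bookkeeping with bisect is needed in this variant.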

Separately, I decided to solve this list problem my own way, like this:

# Split the post by newline into a list of lines.
post_lines = post.split('\n')

# Add the words "Next Post" to differentiate each post.
post_lines.append('Next Post')

# Print each line, and get perfect formatting.
for line in post_lines:
    print(line)

2015-10-21 20:09:34 Manix

A better solution is to add the posts to a dictionary. This preserves the formatting and uses less code.

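post_count = 0
post_dict = {}

for pos1 in referer_list:

    # Number the posts starting from 1.
    post_count += 1

    # Second 'scrapy' position after this referer.
    pos2_index = bisect.bisect(closing_list, pos1)
    pos2 = closing_list[pos2_index+1]

    post = crawl_output[pos1:pos2-21]

    # Store the whole multi-line post under its number.
    post_dict[post_count] = post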
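The loop above can also be wrapped in a function, shown here as a minimal sketch on invented data. The name collect_posts and its trim parameter are hypothetical (the original code trims a fixed 21 characters and drops the first referer, which this sketch does not do), and the bounds check is an addition so that a referer without a second 'scrapy' after it ends the loop instead of raising IndexError.

```python
import bisect
import re


def collect_posts(crawl_output, trim=0):
    """Map 1-based post numbers to multi-line slices of crawl_output."""
    referer_list = [m.start() for m in re.finditer("referer", crawl_output)]
    closing_list = [m.start() for m in re.finditer("scrapy", crawl_output)]

    post_dict = {}
    for post_count, pos1 in enumerate(referer_list, start=1):
        # Index of the second 'scrapy' position after this referer.
        pos2_index = bisect.bisect(closing_list, pos1) + 1
        if pos2_index >= len(closing_list):
            break  # no second marker left; stop instead of raising IndexError
        pos2 = closing_list[pos2_index]
        post_dict[post_count] = crawl_output[pos1:pos2 - trim]
    return post_dict


# Hypothetical crawl text, only for illustration.
sample = "referer A\nbody 1\nscrapy x\nscrapy y\nreferer B\nbody 2\nscrapy x\n"
posts = collect_posts(sample)

# Print each post separately so the newlines render as line breaks.
for number, post in sorted(posts.items()):
    print("Post %d" % number)
    print(post)
```

Printing the values one at a time, rather than printing the dictionary itself, is what keeps the line breaks visible.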

2015-10-21 20:23:51 Manix
