2012-05-19

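Inserting values into a Python list

I'm working on a script that parses a text file and tries to normalize it enough to insert it into a database. The data represents articles, each written by one or more authors. The problem is that, because the number of authors is not fixed, I get a variable number of columns in my output text file. For example: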

author1, author2, author3, this is the title of the article 
author1, author2, this is the title of the article 
author1, author2, author3, author4, this is the title of the article 

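These rows give me a maximum of 5 columns. So for the second article, which has only 3 columns, I need to add blank columns so that every row in the output has the same number of columns. What is the best way to do this? My input text is tab-delimited, and I can iterate over the fields easily by splitting on tabs.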

Is it safe to assume that the article title is always the last item in the list? Also, what have you tried?

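I have it working with a variable number of columns, but that won't do. I need a fixed number of columns. I have built the lists and tried appending to them, but I'm stuck on adding blank items to a list. – aeupinhere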

This is where I stand... http://pastebin.com/A2CT97s9 – aeupinhere

Answers

2

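Assuming you already know the maximum number of columns and have already split each row into its own list, you should be able to use list.insert(-1, item) to add the empty columns. Inserting at index -1 places the new item just before the last element, so the title stays in the final column: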

def columnize(mylists, maxcolumns):
    # Pad each row in place: insert None just before the last item (the title)
    # until the row has maxcolumns entries.
    for row in mylists:
        while len(row) < maxcolumns:
            row.insert(-1, None)

mylists = [["author1", "author2", "author3", "this is the title of the article"],
           ["author1", "author2", "this is the title of the article"],
           ["author1", "author2", "author3", "author4", "this is the title of the article"]]

columnize(mylists, 5)
print(mylists)

[['author1', 'author2', 'author3', None, 'this is the title of the article'], ['author1', 'author2', None, None, 'this is the title of the article'], ['author1', 'author2', 'author3', 'author4', 'this is the title of the article']] 

An alternative version that uses a list comprehension and does not modify the original lists:

def columnize(mylists, maxcolumns):
    # Build new rows: authors, then None padding, then the title (last item).
    return [j[:-1] + [None] * (maxcolumns - len(j)) + j[-1:] for j in mylists]

print(columnize(mylists, 5))

[['author1', 'author2', 'author3', None, 'this is the title of the article'], ['author1', 'author2', None, None, 'this is the title of the article'], ['author1', 'author2', 'author3', 'author4', 'this is the title of the article']] 
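The column count does not have to be hard-coded as 5; it can be derived from the rows themselves. A minimal sketch, assuming the title is always the last item of each row; `pad_rows` and its `fill` parameter are hypothetical names used for illustration, not part of the answer above:

```python
def pad_rows(rows, fill=None):
    """Return new rows padded to the widest row, keeping the title last."""
    if not rows:
        return []
    # The widest row determines how many columns every row must have.
    width = max(len(row) for row in rows)
    # Authors, then padding, then the title; the input rows are not modified.
    return [row[:-1] + [fill] * (width - len(row)) + row[-1:] for row in rows]

rows = [
    ["author1", "author2", "author3", "this is the title of the article"],
    ["author1", "author2", "this is the title of the article"],
    ["author1", "author2", "author3", "author4", "this is the title of the article"],
]
padded = pad_rows(rows)
print(padded)
```

Passing `fill=""` instead of `None` may be more convenient if the rows are going to be written straight back to a delimited text file.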
1

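Forgive me if I've misunderstood, but it sounds like you are approaching this problem the hard way. It is easy to convert the text file into a dictionary that maps each title to its list of authors: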

>>> lines = ["auth1, auth2, auth3, article1", "auth1, auth2, article2", "auth1, article3"]
>>> d = dict((x[-1], x[:-1]) for x in [line.split(', ') for line in lines])
>>> d
{'article1': ['auth1', 'auth2', 'auth3'], 'article2': ['auth1', 'auth2'], 'article3': ['auth1']}
>>> total_articles = len(d)
>>> total_articles
3
>>> max_authors = max(len(val) for val in d.values())
>>> max_authors
3
>>> for k, v in d.items():
...     print(k)
...     print(v + [None] * (max_authors - len(v)))
...
article1
['auth1', 'auth2', 'auth3']
article2
['auth1', 'auth2', None]
article3
['auth1', None, None]

Then, if you really want to, you can write this data out with the csv module built into Python. Alternatively, you can directly generate the SQL you will need.
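As a rough illustration of the csv route, here is a minimal sketch that writes one padded, tab-delimited row per article. The `articles` dictionary mirrors `d` from the session above; the in-memory buffer, the empty-string padding, and the column order (authors first, title last) are assumptions made for the example:

```python
import csv
import io

# Title -> authors, as built in the interactive session above.
articles = {
    "article1": ["auth1", "auth2", "auth3"],
    "article2": ["auth1", "auth2"],
    "article3": ["auth1"],
}
max_authors = max(len(authors) for authors in articles.values())

# An in-memory buffer stands in for a real file here; a script would
# typically use open("out.tsv", "w", newline="") instead (file name assumed).
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
for title, authors in articles.items():
    # Empty strings pad the missing author columns so every row has the same width.
    padding = [""] * (max_authors - len(authors))
    writer.writerow(authors + padding + [title])

output = buffer.getvalue()
print(output, end="")
```

Every row comes out with `max_authors + 1` fields, so the file can be loaded into a table with a fixed number of columns.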

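Your script opens the same file several times and reads it several times just to get counts that can be derived from data you already have in memory. Don't read the file more than once for that purpose.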
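A minimal sketch of the single-pass approach, assuming tab-delimited input with the title in the last field, as described in the question. The sample data and the variable names are made up for illustration; the pastebin script itself is not reproduced here:

```python
import io

# Stand-in for the real input file; an actual script would use open(path) instead.
source = io.StringIO(
    "auth1\tauth2\tauth3\tarticle1\n"
    "auth1\tauth2\tarticle2\n"
    "auth1\tarticle3\n"
)

# Read the file exactly once and keep the parsed rows in memory.
rows = [line.rstrip("\n").split("\t") for line in source if line.strip()]

# Every count below is derived from `rows`, not from re-reading the file.
total_articles = len(rows)
max_columns = max(len(row) for row in rows)
unique_authors = {author for row in rows for author in row[:-1]}

print(total_articles, max_columns, len(unique_authors))
```

Once `rows` is in memory, the padding step from the earlier answer can run on it directly with `max_columns` as the target width.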