2017-09-12 24 views

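Generate a hierarchical file from webscraped comments and output it as JSON

I webscraped some data from a forum (using Python) and stored it in a dictionary that looks like this: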

thread = {"1.Init_Post": init_post, 
     "2.Time_Posted": time_posted, 
     "3.URL": url, 
     "4.Discussion_Posts": discussion_posts, 
     "5.Discussion_Post_Times": post_dates} 

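It contains the initial post, the time the initial post was made, the URL of the original post, the corresponding discussion posts, and the time each discussion post was made.

An example of the output from one discussion is: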

{'1.Init_Post': u'I purchased a piece of land over 12 years ago which did not come with any title guarantee. I now wish to register this with the land registry. Does anyone know how I do this please? thanks so much', 
'2.Time_Posted': '17/08/17 22:47', 
'3.URL': 'http://www.thelawforum.co.uk/how-register-land-unregistered-title', 
'4.Discussion_Posts': [u'How did you manage that? Registration has been compulsory for years. https://www.gov.uk/government/publications/first-registrations/practice-...', 
    u'I read that it had to be done within 3 months and that regime started in 1998? i have another look at the emails from the solicitor when it was purchased in 2005. The solicitor said the land registry had refused to register the land because its previous use of grazing livestock and cutting hay was not sufficiently strong to warrant granting of title. So we purchased indemnity insurance and was told to wait 10/12 years before trying again. was this advice incorrect? thanks', 
    u'sounds about right. Try with LR again.', 
    u'Registration has been must for any land you buy from someone. How did you manage this issue from last 12 years. You need to consult a good lawyer. Or need to create documents as soon as possible.'], 
'5.Discussion_Post_Times': ['18/08/17 08:19', 
    '18/08/17 09:42', 
    '18/08/17 13:25', 
    '02/09/17 06:14']} 

What I want is a hierarchical structure (which I can turn into JSON) that looks like this:

{'1.Init_Post': u'I purchased a piece of land over 12 years ago which did not come with any title guarantee. I now wish to register this with the land registry. Does anyone know how I do this please? thanks so much', 
'2.Time_Posted': '17/08/17 22:47', 
'3.URL': 'http://www.thelawforum.co.uk/how-register-land-unregistered-title', 
'4.Discussion':[ 
    {'a.Discussion_Post':u'How did you manage that? Registration has been compulsory for years. https://www.gov.uk/government/publications/first-registrations/practice-...', 
    'b.Discussion_Post_Time':'18/08/17 08:19'}, 
    {'a.Discussion_Post':u'I read that it had to be done within 3 months and that regime started in 1998? i have another look at the emails from the solicitor when it was purchased in 2005. The solicitor said the land registry had refused to register the land because its previous use of grazing livestock and cutting hay was not sufficiently strong to warrant granting of title. So we purchased indemnity insurance and was told to wait 10/12 years before trying again. was this advice incorrect? thanks', 
    'b.Discussion_Post_Time':'18/08/17 09:42'}, 
    {'a.Discussion_Post':u'sounds about right. Try with LR again.', 
    'b.Discussion_Post_Time':'18/08/17 13:25'}, 
    {'a.Discussion_Post':'Registration has been must for any land you buy from someone. How did you manage this issue from last 12 years. You need to consult a good lawyer. Or need to create documents as soon as possible.', 
    'b.Discussion_Post_Time':'02/09/17 06:14'} 
] 
} 

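I have looked at this question: Translate a table to a hierarchical dictionary?. But I think I can do this more efficiently than converting the data to a table and then into a hierarchical structure. Any suggestions would be appreciated!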

+0

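You need to show some effort. How about [adding the two lists into a dictionary](https://stackoverflow.com/questions/18502197/how-to-add-two-lists-into-dictionary) and nesting it in the original dictionary under a "Discussion" key? Or, in your case, appending them to a list... – nutmeg64

Answers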

1

You can zip the values of those two keys into a new '4.Discussion' key, then delete the original keys.

thread['4.Discussion'] = [ 
    {'a.Discussion_Post':i[0], 'b.Discussion_Post_Time':i[1]} 
    for i in zip(thread['4.Discussion_Posts'], thread['5.Discussion_Post_Times']) 
] 
del thread['4.Discussion_Posts'] 
del thread['5.Discussion_Post_Times'] 
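
Since the question also asks for JSON output, here is a minimal end-to-end sketch of the same zip approach. It assumes Python 3; the `thread` sample is a shortened, hypothetical stand-in for the scraped dictionary, and `dict.pop` is swapped in for the separate `del` statements.

```python
import json

# Hypothetical, shortened sample standing in for the scraped `thread` dict.
thread = {
    "1.Init_Post": "I purchased a piece of land over 12 years ago ...",
    "2.Time_Posted": "17/08/17 22:47",
    "3.URL": "http://www.thelawforum.co.uk/how-register-land-unregistered-title",
    "4.Discussion_Posts": [
        "How did you manage that?",
        "sounds about right. Try with LR again.",
    ],
    "5.Discussion_Post_Times": ["18/08/17 08:19", "18/08/17 13:25"],
}

# pop() returns each flat list and removes its key in one step.
posts = thread.pop("4.Discussion_Posts")
times = thread.pop("5.Discussion_Post_Times")

# Pair every post with its timestamp under a single nested key.
thread["4.Discussion"] = [
    {"a.Discussion_Post": post, "b.Discussion_Post_Time": time}
    for post, time in zip(posts, times)
]

# Serialize the nested structure; sort_keys keeps the numbered keys in order.
json_text = json.dumps(thread, indent=2, sort_keys=True)
print(json_text)
```

To write a file instead of a string, `json.dump(thread, f)` with an open file object `f` does the same serialization.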
0

This will help you:

# `out` is the scraped dictionary (called `thread` in the question)
out['4.Discussion'] = [{'a.Discussion_Post': i, 'b.Discussion_Post_Time': j} for i, j in zip(out['4.Discussion_Posts'], out['5.Discussion_Post_Times'])]
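
One caveat with zip: it stops at the shorter input, so a scrape that missed a timestamp would silently drop posts. The sketch below wraps the pairing in a hypothetical helper, `nest_discussion` (not part of any library), that checks the lengths first and returns a new dictionary rather than modifying the input. Treat it as one possible shape under those assumptions.

```python
def nest_discussion(thread):
    """Return a new dict with posts and times paired under '4.Discussion'."""
    posts = thread["4.Discussion_Posts"]
    times = thread["5.Discussion_Post_Times"]

    # zip() would silently truncate, so fail loudly on a mismatch instead.
    if len(posts) != len(times):
        raise ValueError(
            "posts and times differ in length: %d vs %d" % (len(posts), len(times))
        )

    # Copy every key except the two flat lists, leaving the input untouched.
    flat_keys = ("4.Discussion_Posts", "5.Discussion_Post_Times")
    nested = {key: value for key, value in thread.items() if key not in flat_keys}

    nested["4.Discussion"] = [
        {"a.Discussion_Post": post, "b.Discussion_Post_Time": time}
        for post, time in zip(posts, times)
    ]
    return nested


# Hypothetical sample input for illustration.
sample = {
    "3.URL": "http://www.thelawforum.co.uk/how-register-land-unregistered-title",
    "4.Discussion_Posts": ["sounds about right. Try with LR again."],
    "5.Discussion_Post_Times": ["18/08/17 13:25"],
}
result = nest_discussion(sample)
print(result["4.Discussion"])
```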