2014-09-05 163 views
0

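Parsing a JSON string in Python

I am new to parsing information out of JSON strings. I am using json.loads to parse a block of text, but I am having a hard time figuring out how to get the titles out.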

Code below:

from alchemyapi import AlchemyAPI
import json

alchemyapi = AlchemyAPI()

def run_alchemy_api(articleurl):
    # Ask AlchemyAPI for entities, restricting the source text to the title elements.
    response = alchemyapi.entities('url', articleurl, {'showSourceText': 1, 'sourceText': 'xpath', 'xpath': '//*[contains(@class,"title may-blank")][1]'})
    if response['status'] == 'OK':
        print('## Response Object ##')
        print(json.dumps(response, indent=4))
        json_string = json.dumps(response, indent=4)
        titles = json.loads(json_string)
        print('This is the decode test,')
        print(titles)  # <---- this is what I want to organize into a list
    else:
        print('Error in entity extraction call: ', response['statusInfo'])

run_alchemy_api('http://www.reddit.com/r/worldnews/')

I just want to parse the u'text' category, but this is a partial listing of the output:

{u'status': u'OK', u'language': u'english', u'text': u'Lego is now the world\u2019s largest toymaker, as kids choose bricks over Barbie\n\nAfter convincing China to give up shark fin soup, Yao Ming sets out to save Africa\'s elephants from the ivory trade\n\nThree top ISIS lieutenants killed in US bombing raid\n\nAnonymous Really Wants a Cyberwar with the Islamic State\n\nBP found \'grossly negligent\' in 2010 Gulf oil spill\n\nA group of indigenous people in Brazil\'s Amazon region have detained and expelled loggers working illegally in their ancestral lands.\n\nAnti-ISIS flag-burning campaign launched by a trio of fearless Lebanese teens have ignited an Internet anti-terror sensation\n\nNova Scotia to ban fracking\n\nWHO and others criticised by numerous experts for misleading the public by publishing an ignorant and alarmist report into E-Cigarettes.\n\nRussia warns NATO not to offer membership to Ukraine\n\nKorean 20 year old dies in military service after a month of systematic beating, military is accused of covering up bullying\n\nNATO Chief to Russia: Pull Troops From Ukraine\n\nLarge asteroid to pass "very close" to Earth on Sunday\n\nNew dinosaur discovered! Ancient behemoth: Meet Dreadnoughtus, a supermassive dino\n\nThe U.N. nuclear watchdog said it has seen releases of steam and water indicating that North Korea may be operating a reactor, in the latest update on a plant that experts say could make plutonium for atomic bombs.\n\nWorld-first experiment achieves direct brain-to-brain communication in human subjects\n\nNATO allies to supply Ukraine with lethal military equipment\n\nUS doctor infected with Ebola heading to Nebraska\n\nNorth Korea\'s suicide rate among worst in world, says WHO report\n\nIslamic State Using Leaked Snowden Info To Evade Intelligence - U.S. 
Military Official Said Most Mid-Level And High-Ranking Islamic State Operators Have Virtually Disappeared, Giving No Hint As To Their Whereabouts Or Actions.\n\nEbola epidemic in West Africa is outpacing current responses.\u201cThe window of opportunity to stop Ebola from spreading widely throughout Africa and becoming a global threat for years to come is closing, but it is not yet closed,\u201d\n\nGrim Ebola Prediction: Outbreak Is Unstoppable for Now, MD Says\n\nFor the first time, scientists glimpse inside the cosmic nursery to see baby planets form\n\nCanadian beekeepers sue Bayer, Syngenta over neonicotinoid pesticides for over $400 million\n\nUkraine army on alert to repel possible rebel attack near Mariupol - military source', u'entities': [{u'relevance': u'0.803767', u'count': u'4', u'type': u'Country', u'text': u'Ukraine'}, {u'relevance': u'0.671762', u'count': u'3', u'type': u'Organization', u'disambiguated': {u'website': u'http://www.natoonline.org/', u'yago': u'http://yago-knowledge.org/resource/National_Association_of_Theatre_Owners', u'name': u'National Association of Theatre Owners', u'freebase': u'http://rdf.freebase.com/ns/m.031hx_', u'subType': [], u'dbpedia': u'http://dbpedia.org/resource/National_Association_of_Theatre_Owners'}, u'text': u'NATO'}, {u'relevance': u'0.564646', u'count': u'3', u'type': u'HealthCondition', u'text': u'Ebola'}, {u'relevance': u'0.543892', u'count': u'3', u'type': u'Region', u'text': u'West Africa'}, {u'relevance': u'0.521051', u'count': u'2', u'type': u'FieldTerminology', u'text': u'military equipment'}, {u'relevance': u'0.491148', u'count': u'2', u'type': u'Country', u'disambiguated': {u'website': u'http... and so on 

How do I go about extracting just the titles from u'text' into something like this?

articles = [Lego is now the world\u2019s largest toymaker, as kids choose bricks over Barbie, After convincing China to give up shark fin soup, Yao Ming sets out to save Africa\'s elephants from the ivory trade ... etc.] 
+1

Why are you dumping and then reloading the response? You can use the response directly. – tdelaney 2014-09-05 08:37:16

Answers

1

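It looks like the titles in your text are separated by two newlines (Unix-style). So you have to extract the text key from your response dictionary (don't convert it to JSON and back to Python) and split it into its titles.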

text = response['text'] 
titles = text.split('\n\n') 
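As a minimal sketch of the split described above, the snippet below uses a short hard-coded string standing in for `response['text']` (the sample value is an assumption for illustration, and no API call is made). Empty fragments are filtered out in case the text has leading or trailing blank lines.

```python
# Stand-in for response['text']; the real value would come from the AlchemyAPI call.
sample_text = (
    "Nova Scotia to ban fracking\n\n"
    "Russia warns NATO not to offer membership to Ukraine\n\n"
    "NATO Chief to Russia: Pull Troops From Ukraine"
)

# Titles are separated by a blank line, i.e. two consecutive newlines.
titles = [t.strip() for t in sample_text.split("\n\n") if t.strip()]

# Print the titles one per line.
for title in titles:
    print(title)
```

In Python 2, printing the list itself shows the repr of each element (including the `u` prefix of unicode strings), while printing the elements one by one shows the plain text.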
+0

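So close! Now I get `[u'Lego is now the world\u2019s largest toymaker, as kids choose bricks over Barbie', u"After convincing China to give up shark fin soup, Yao Ming sets out to save Africa's elephants from the ivory trade", u'Three top ISIS lieutenants killed in US bombing raid', u'Anonymous Really Wants a Cyberwar with the Islamic State',` – 2014-09-05 08:57:55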

+0

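What exactly do you want? This? `[u'Lego is now the world\u2019s largest toymaker, as kids choose bricks over Barbie', u"After convincing China to give up shark fin soup, Yao Ming sets out to save Africa's elephants from the ivory trade", u'Three top ISIS lieutenants killed in US bombing raid', u'Anonymous Really Wants a Cyberwar with the Islamic State', ...]` – semptic 2014-09-05 09:01:16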

+0

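I want to get rid of the u. So: `[Lego is now the world's largest toymaker, as kids choose bricks over Barbie, After convincing China to give up shark fin soup, Yao Ming sets out to save Africa's elephants from the ivory trade, etc.]` – 2014-09-05 09:02:25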

0

After parsing the JSON, you need to extract text manually, like this:

json.loads(json_string).get('text')

When working with huge JSON files, try to use an iterative JSON parser such as ijson.
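The lookup described in this answer can be sketched as follows. The JSON string here is a hypothetical, shortened response made up for illustration; only `json.loads` and the dictionary method `.get` from the standard library are used.

```python
import json

# Hypothetical, shortened response serialized as a JSON string.
json_string = '{"status": "OK", "text": "Title one\\n\\nTitle two"}'

data = json.loads(json_string)

# .get() returns None (or a supplied default) when the key is missing,
# whereas data['text'] would raise KeyError in that case.
text = data.get("text", "")
missing = data.get("no_such_key")

print(text.split("\n\n"))
print(missing)
```

Whether `.get` or plain indexing is the better choice depends on whether a missing key should be tolerated or treated as an error.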

+0

It's just a dictionary, and the standard access is `json.loads(json_string)['text']` – tdelaney 2014-09-05 08:38:22

+0

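Thanks, but it comes back in the original format, like this: `[u'Lego is now the world\u2019s largest toymaker, as kids choose bricks over Barbie\n\nAfter convincing China to give up shark fin soup, Yao Ming sets out to save Africa\'s elephants from the ivory trade\n\n...]` I'm trying to get [Title 1, Title 2, etc.] – 2014-09-05 08:42:05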

0

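The response is a Python dictionary and 'text' is one of its keys. Just use it. There are many ways to build the list. One is to pass in a list and append the titles on success.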

from alchemyapi import AlchemyAPI

alchemyapi = AlchemyAPI()

def run_alchemy_api(articleurl, article_list):
    response = alchemyapi.entities('url', articleurl, {'showSourceText': 1, 'sourceText': 'xpath', 'xpath': '//*[contains(@class,"title may-blank")][1]'})
    if response['status'] == 'OK':
        print(response['text'])
        # Collect the extracted text in the caller's list.
        article_list.append(response['text'])
    else:
        print('Error in entity extraction call: ', response['statusInfo'])


urls = [ 'url1', ...]
titles = []
for url in urls:
    run_alchemy_api(url, titles)
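A runnable sketch of the same accumulate-into-a-list pattern is shown below. It assumes a hypothetical stub, `fake_entities`, in place of `alchemyapi.entities`, so it runs without an API key; the URLs and canned responses are placeholders, not real data. Unlike the code above, it also splits each text into individual titles before adding them.

```python
def fake_entities(url):
    """Hypothetical stand-in for alchemyapi.entities(); returns a canned response."""
    canned = {
        "http://example.com/a": {"status": "OK", "text": "Title A1\n\nTitle A2"},
        "http://example.com/b": {"status": "OK", "text": "Title B1"},
    }
    return canned.get(url, {"status": "ERROR", "statusInfo": "unknown url"})


def collect_titles(url, title_list):
    """Add every title found at `url` to `title_list`."""
    response = fake_entities(url)
    if response["status"] == "OK":
        # One string per title instead of one big block of text.
        title_list.extend(response["text"].split("\n\n"))
    else:
        print("Error in entity extraction call:", response["statusInfo"])


titles = []
for url in ["http://example.com/a", "http://example.com/b", "http://example.com/c"]:
    collect_titles(url, titles)

print(titles)
```

Passing the list in keeps the function free of global state; returning a list from the function and concatenating the results would work just as well.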