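Regex and CSV | Making the output more readable
I have a text file containing various news articles about terrorist attacks. Each article starts with an HTML tag (<p>Advertisement), and I want to extract one specific piece of information from each article: the number of people injured in the attack.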
Here is a sample of the text file, showing how the articles are separated:
[<p>Advertisement , By MILAN SCHREUER and ALISSA J. RUBIN OCT. 5, 2016
, BRUSSELS — A man wounded 2 police officers with a knife in Brussels around noon on Wednesday in what the authorities called “a potential terrorist attack.” , The two officers were attacked on the Boulevard Lambermont.....]
[<p>Advertisement ,, By KAREEM FAHIM and MOHAMAD FAHIM ABED JUNE 30, 2016
, At least 33 people were killed and 25 were injured when the Taliban bombed buses carrying police cadets on the outskirts of Kabul, Afghanistan, on Thursday. , KABUL, Afghanistan — Taliban insurgents bombed a convoy of buses carrying police cadets on the outskirts of Kabul, the Afghan capital, on Thursday, killing at least 33 people, including four civilians, according to government officials and the United Nations. , During a year...]
Here is my code so far:
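import re

# Read the whole file; every article begins with "<p>"
text_open = open("News_cleaned_definitive.csv")
text_read = text_open.read()
text_open.close()
splitted = text_read.split("<p>")
pattern = r"wounded (\d+)|(\d+) were wounded|(\d+) were injured"
for article in splitted:
    # One tuple per match; only the alternative that matched is non-empty
    result = re.findall(pattern, article)
    print(result)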
The output I get is:
[]
[]
[]
[('', '40', '')]
[('', '150', '')]
[('94', '', '')]
I would like to make the output more readable and then save it as a CSV file:
article_1,0
article_2,0
article_3,40
article_3,150
article_3,94
Any suggestions on how to make it more readable?
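One possible way to get from the list-of-tuples output to `article_N,count` lines, as a minimal sketch rather than a definitive answer. The sample string and the helper name `count_rows` are made up for illustration; the sketch assumes each `re.findall` tuple has exactly one non-empty group (true for this pattern, since only one alternative can match at a time) and that an article with no match should be reported as 0.

```python
import re

# Hypothetical sample standing in for the contents of News_cleaned_definitive.csv
text_read = (
    "<p>Advertisement , BRUSSELS - A man wounded 2 police officers with a knife. "
    "<p>Advertisement ,, At least 33 people were killed and 25 were injured in Kabul. "
    "<p>Advertisement , No casualty figures were reported."
)

pattern = re.compile(r"wounded (\d+)|(\d+) were wounded|(\d+) were injured")


def count_rows(text):
    """Return (label, count) pairs: one per match, or a single 0 row if none."""
    rows = []
    # Splitting on "<p>" leaves an empty first piece; drop blank pieces.
    articles = [a for a in text.split("<p>") if a.strip()]
    for i, article in enumerate(articles, start=1):
        # Pick the one non-empty group out of each findall tuple.
        counts = [int(next(g for g in groups if g))
                  for groups in pattern.findall(article)]
        for count in counts or [0]:
            rows.append((f"article_{i}", count))
    return rows


rows = count_rows(text_read)
for label, count in rows:
    print(f"{label},{count}")
```

Numbering the articles with `enumerate` only after dropping the blank pieces keeps the labels aligned with the real articles instead of with the raw split output.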
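This is exactly what I was looking for. I was wondering how to save it properly? 'with open("wounded.csv", "w", newline="") as f: writer = csv.writer(f, delimiter=",") writer.writerows([row])'
You got it almost right! Let me edit it into my answer for you to make it cleaner.
Well, a month later, I can say you are my favorite admin.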
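For the saving step raised in the comments, a hedged sketch of one way to write the rows with `csv.writer`. The `rows` list is invented sample data in the shape the question asks for, and the file is written to a temporary directory only so the example can run anywhere; in practice the path would simply be "wounded.csv". It assumes the file is opened once and all rows are passed to `writerows` together, instead of wrapping a single row in a list.

```python
import csv
import os
import tempfile

# Hypothetical rows in the "article_N,count" shape from the question
rows = [("article_1", 0), ("article_2", 40), ("article_3", 150)]

# Temporary location used only for this demo
path = os.path.join(tempfile.mkdtemp(), "wounded.csv")

# newline="" lets the csv module control line endings itself
with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter=",")
    writer.writerows(rows)  # one CSV line per (label, count) pair

# Read the file back to check what was saved (values come back as strings)
with open(path, newline="") as f:
    saved = list(csv.reader(f))

print(saved)
```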