2016-01-23 23 views
1

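How to use a flag to return only one group using findall - Python

I am working on art code (if that is possible?), and I am trying to retrieve the third group using a findall regex. I read the official docs for findall and found that it returns tuples, which is a bit lacking. I would like to pass a flag that returns only the third group instead of a tuple of 3 groups (the first two are placeholders). What is the most efficient way to chain this so that only the name (the third group) is returned, rather than iterating afterwards?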

import re, requests 

rgx = r"([<][TDtd][>])|(target[=]new[>])(?P<the_deceased>[A-Z].*?)[,]" 

urls = {2013: "http://www.killedbypolice.net/kbp2013.html",
        2014: "http://www.killedbypolice.net/kbp2014.html",
        2015: "http://www.killedbypolice.net/"}

names_of_the_dead = [] 

for url in urls.values(): 
    response = requests.get(url) 
    content = response.text  # decoded str, so the str pattern also works on Python 3
    people_killed_by_police_that_year_alone = re.findall(rgx, content)
    for dead_person in people_killed_by_police_that_year_alone:
        names_of_the_dead.append(dead_person)

#dead_americans_as_string = ",".join(names_of_the_dead) 
#print("RIP, {} since 2013:\n".format(len(names_of_the_dead))) 
#print(dead_americans_as_string) 

In [67]: names_of_the_dead 
Out[67]: 
[('', 'target=new>', 'May 1st - Dec 31st'), 
('', 'target=new>', 'Ricky Junior Toney'), 
('', 'target=new>', 'William Jackson'), 
('', 'target=new>', 'Bethany Lytle'), 
('', 'target=new>', 'Christopher George'), 

Answers

1

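Turn the first and second capturing groups into non-capturing groups. With only one capturing group left, findall returns a list of that group's strings instead of a list of tuples.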

rgx = r"(?:[<][TDtd][>])|(?:target[=]new[>])(?P<the_deceased>[A-Z].*?)[,]" 
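
To illustrate the difference, here is a minimal offline sketch. It assumes a made-up HTML snippet (`sample_html`, with placeholder names) rather than the real page markup, and compares the original three-group pattern with the non-capturing version:

```python
import re

# Hypothetical snippet standing in for the real page markup (an assumption).
sample_html = (
    '<a href="#" target=new>Alice Example, 30</a>'
    '<a href="#" target=new>Bob Sample, 41</a>'
)

# Original pattern: three capturing groups, so findall yields 3-tuples.
three_groups = r"([<][TDtd][>])|(target[=]new[>])(?P<the_deceased>[A-Z].*?)[,]"
# Same pattern with the first two groups made non-capturing.
one_group = r"(?:[<][TDtd][>])|(?:target[=]new[>])(?P<the_deceased>[A-Z].*?)[,]"

as_tuples = re.findall(three_groups, sample_html)
as_names = re.findall(one_group, sample_html)

print(as_tuples)  # list of 3-tuples, name in the last slot
print(as_names)   # flat list of names
```

One caveat: because of the `|`, a match of the left alternative on its own leaves the named group unmatched, and findall reports that as an empty string. If that happens on the real pages, something like `[n for n in as_names if n]` would drop the empties.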

Looks good, let me try it out, Avinash – codyc4321

1

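Since it is HTML data you are parsing, why not use a tool made for the job: an HTML parser, like BeautifulSoup. The idea is to iterate over the table rows and take the text of the 4th column: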

import requests 
from bs4 import BeautifulSoup 


urls = {2013: "http://www.killedbypolice.net/kbp2013.html",
        2014: "http://www.killedbypolice.net/kbp2014.html",
        2015: "http://www.killedbypolice.net/"}

names_of_the_dead = [] 

for url in urls.values(): 
    response = requests.get(url) 
    soup = BeautifulSoup(response.content, "html.parser") 

    for row in soup.select("table tr")[2:]:
        cells = row.find_all("td")
        if len(cells) > 3:
            names_of_the_dead.append(cells[3].text.split(",")[0].strip())

print(names_of_the_dead) 
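
The row and column logic can also be tried without any network access. The sketch below runs the same loop on a small inline table. The table layout, the two header rows, and the placeholder names are all assumptions made for illustration, not the real page structure:

```python
from bs4 import BeautifulSoup

# Hypothetical table; the real pages' layout is an assumption here.
sample_html = """
<table>
  <tr><th>header row 1</th></tr>
  <tr><th>header row 2</th></tr>
  <tr><td>(1)</td><td>Jan 1</td><td>TX</td><td>Alice Example, 30</td></tr>
  <tr><td>(2)</td><td>Jan 2</td><td>CA</td><td>Bob Sample, 41</td></tr>
  <tr><td>too</td><td>short</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

names = []
# Skip the two header rows, then read the 4th cell of every full row.
for row in soup.select("table tr")[2:]:
    cells = row.find_all("td")
    if len(cells) > 3:  # ignore rows with fewer than 4 cells
        names.append(cells[3].text.split(",")[0].strip())

print(names)
```

If the list comes back empty on a real page, printing `len(soup.select("table tr"))` and `len(cells)` for a few rows is a quick way to check whether the assumed layout matches the actual markup.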

This is what I wanted, Alec. I don't think those extra 700 people should have died in vain, with their names forgotten. gracias mucho – codyc4321


For some reason it isn't working. I will work on it this week. In [17]: print(names_of_the_dead) [] – codyc4321
