2016-07-28

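Using regex to extract hostnames from a single line of text

I am trying to write a Python script that pulls all of the Google Cloud compute subnets from their DNS records. More information on this here: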

https://cloud.google.com/compute/docs/faq#where_can_i_find_short_product_name_ip_ranges

So far I can pull the TXT record for an individual hostname without any problem, and it comes back as a basestring.

import dns.resolver

# Set the resolver
my_resolver = dns.resolver.Resolver()
my_resolver.nameservers = ['8.8.8.8']

answer = my_resolver.query('_cloud-netblocks.googleusercontent.com', 'TXT')

for rdata in answer:
    for txt_string in rdata.strings:
        txt_record = txt_string

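This leaves me with the string:

v=spf1 include:_cloud-netblocks1.googleusercontent.com include:_cloud-netblocks2.googleusercontent.com include:_cloud-netblocks3.googleusercontent.com include:_cloud-netblocks4.googleusercontent.com include:_cloud-netblocks5.googleusercontent.com ?all

What I want to do is use re.match to extract the 5 hostnames from this initial response, so that I can do successive lookups, strip out the subnets, and put them into an array. So far none of my attempts with regular expressions have gone well. Could anyone offer some guidance? Thanks!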

Edit:

Here is the complete script, for anyone else who needs to collect all of Google's cloud IP addresses.

import re

import dns.resolver

# Set the resolver
my_resolver = dns.resolver.Resolver()
my_resolver.nameservers = ['8.8.8.8']

# Note: dnspython 2.x deprecates query() in favour of resolve()
answer = my_resolver.query('_cloud-netblocks.googleusercontent.com', 'TXT')

for rdata in answer:
    for txt_string in rdata.strings:
        # On Python 3 dnspython yields bytes; decode so the str operations below work
        txt_record = txt_string.decode('ascii')

# Extract hostnames into array
hostnames = [x.split(":")[1] for x in txt_record.split() if ":" in x]
total_subnets = []

for host in hostnames:
    answer = my_resolver.query(host, 'TXT')

    for rdata in answer:
        for txt_string in rdata.strings:
            txt_record = txt_string.decode('ascii')

    # Collect the IPv4 and IPv6 netblocks listed in this record
    ip4_subnets = re.findall(r'ip4:(\S+)', txt_record)
    ip6_subnets = re.findall(r'ip6:(\S+)', txt_record)

    total_subnets.extend(ip4_subnets)
    total_subnets.extend(ip6_subnets)

print(total_subnets)
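
For anyone who wants to check the parsing step without doing live DNS lookups, the extraction can be pulled into a small pure function. This is only a sketch: `parse_spf_record` is a hypothetical helper name, and the sample record uses made-up documentation ranges (192.0.2.0/24 and 2001:db8::/32), not real Google netblocks.

```python
import re


def parse_spf_record(txt_record):
    """Return (includes, subnets) found in one SPF-style TXT string.

    Hypothetical helper for illustration; not part of dnspython.
    """
    # Hostnames referenced by include: mechanisms
    includes = re.findall(r'include:(\S+)', txt_record)
    # IPv4 and IPv6 netblocks, in the order they appear in the record
    subnets = re.findall(r'ip[46]:(\S+)', txt_record)
    return includes, subnets


# Made-up sample record using documentation address ranges
sample = ("v=spf1 include:_cloud-netblocks1.googleusercontent.com "
          "ip4:192.0.2.0/24 ip6:2001:db8::/32 ?all")

includes, subnets = parse_spf_record(sample)
print(includes)
print(subnets)
```

Because the function takes a plain string, the same call can be applied to each decoded TXT record returned by the resolver.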

Answer


You don't need a regex for this. Use split twice together with a list comprehension:

s = "v=spf1 include:_cloud-netblocks1.googleusercontent.com include:_cloud-netblocks2.googleusercontent.com include:_cloud-netblocks3.googleusercontent.com include:_cloud-netblocks4.googleusercontent.com include:_cloud-netblocks5.googleusercontent.com ?all" 
print([x.split(":")[1] for x in s.split() if ":" in x]) 
# => ['_cloud-netblocks1.googleusercontent.com', 
#  '_cloud-netblocks2.googleusercontent.com', 
#  '_cloud-netblocks3.googleusercontent.com', 
#  '_cloud-netblocks4.googleusercontent.com', 
#  '_cloud-netblocks5.googleusercontent.com'] 


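Details

  • s.split() - splits the string on whitespace
  • if ":" in x - keeps only the entries that contain :
  • x.split(":")[1] - splits each of those entries on : and takes the second piece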
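
The three steps above can be sketched one at a time as follows. The record here is shortened from the one in the question, and the ip6 entry at the end is a made-up value (a documentation range) added only to show a caveat: split(":")[1] keeps just the text between the first and second colon, so for values that themselves contain colons, split(":", 1)[1] is probably the safer choice.

```python
# Record shortened from the question for readability
s = ("v=spf1 include:_cloud-netblocks1.googleusercontent.com "
     "include:_cloud-netblocks2.googleusercontent.com ?all")

tokens = s.split()                                  # step 1: split on whitespace
with_colon = [t for t in tokens if ":" in t]        # step 2: keep entries containing ':'
hostnames = [t.split(":")[1] for t in with_colon]   # step 3: take the piece after ':'

print(tokens)
print(with_colon)
print(hostnames)

# Caveat, using a made-up IPv6 entry: the value itself contains colons
entry = "ip6:2001:db8::/32"
truncated = entry.split(":")[1]    # only the text up to the second ':'
full = entry.split(":", 1)[1]      # maxsplit=1 keeps everything after the first ':'
print(truncated, full)
```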

Of course, if you prefer, you can use a regex:

include:(\S+) 


This matches include: and captures one or more non-whitespace characters into Group 1. re.findall then returns the list of captured values (re.findall(r'include:(\S+)', s)).
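
As a minimal sketch of the regex route, again with the record shortened from the question: re.findall scans the whole string, whereas re.match only tries to match at the very start of the string, which is likely why the re.match attempts mentioned in the question found nothing.

```python
import re

# Record shortened from the question for readability
s = ("v=spf1 include:_cloud-netblocks1.googleusercontent.com "
     "include:_cloud-netblocks2.googleusercontent.com ?all")

# findall returns the Group 1 capture of every match in the string
hostnames = re.findall(r'include:(\S+)', s)
print(hostnames)

# match anchors at position 0, where the string starts with "v=spf1"
at_start = re.match(r'include:(\S+)', s)
print(at_start)
```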


Very nice! Works great! Thank you very much! – user2421173