How do I extract domains from anywhere in a text file?

Here is what I have tried so far:

import re

with open('text.txt', 'r') as fh:
    p = re.findall(r'^[a-z0-9]([a-z0-9-]+\.){1,}[a-z0-9]+\Z"', fh.readline())
print(p)

I want to extract the domains or URLs from this file: File link
I would like to know how I can do this with regular expressions.
Please advise.

Source

2017-03-25 Jaffer Wilson

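Are there line breaks between the lines? – RomanPerekhrest

Yes, it is on one line. I am trying to extract all domains from the file. It can happen that a single line contains two domains, and both need to be extracted. Can you please help? –

Each line of the file above looks very much like a JSON-encoded dictionary.
So this is a good use case for the json module: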

import json

with open("text.txt", "r") as fh:
    domains = []
    for l in fh:
        d = json.loads(l)
        domains.append(d["name"])
        # some domains are stored under the `value` key of records that have "type": "cname"
        if d["type"] == "cname":
            domains.append(d["value"])

print(domains)

Output:

['mail.callfieldcompanion.com', 'reseauocoz.cluster007.ovh.net', 'cluster007.ovh.net', 'ghs.googlehosted.com', 'googlehosted.l.googleusercontent.com', 'isutility.web9.hubspot.com', 'a1049.b.akamai.net', 'plato.mx25.net']

If the input file holds all the records on a single line, use the following approach instead:

import json
import re

with open("text.txt", "r") as fh:
    domains = []
    # emulate a list of dictionaries: put commas between adjacent objects and wrap them in [...]
    line = "[" + re.sub(r'\}\s*\{', '},{', fh.read()) + "]"
    l = json.loads(line)
    for d in l:
        domains.append(d["name"])
        # some domains are stored under the `value` key of records that have "type": "cname"
        if d["type"] == "cname":
            domains.append(d["value"])

print(domains)
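
As an alternative to rewriting the text with re.sub, the standard library's json.JSONDecoder.raw_decode can read one object at a time from a string and report where it stopped, so it copes with several objects on one line as well as one object per line. The following is only a minimal sketch: the helper names iter_json_objects and extract_domains and the sample records are made up for illustration, and the keys "name", "type" and "value" are assumed to match the file in the question.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value that appears back to back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not accept leading whitespace, so skip it first
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # returns the decoded value and the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def extract_domains(text):
    """Collect `name`, plus `value` for cname records (assumed keys)."""
    domains = []
    for record in iter_json_objects(text):
        domains.append(record["name"])
        if record.get("type") == "cname":
            domains.append(record["value"])
    return domains


# Hypothetical sample: two records on one line, then one on its own line
sample = (
    '{"name": "mail.example.com", "type": "cname", "value": "host.example.net"} '
    '{"name": "example.org", "type": "a", "value": "192.0.2.1"}\n'
    '{"name": "www.example.org", "type": "cname", "value": "cdn.example.net"}\n'
)

print(extract_domains(sample))
```

For a real file, the text would come from fh.read(). Unlike the substitution above, this does not depend on the sequence `}{` never occurring inside a string value.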

Source

2017-03-26 16:56:43 RomanPerekhrest

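Let me try this, my friend. –

I got this error: 'Traceback (most recent call last): File "test.py", line 6, in <module> d = json.loads(l) File "/usr/lib/python3.5/json/__init__.py", line 319, in loads return _default_decoder.decode(s) File "/usr/lib/python3.5/json/decoder.py", line 342, in decode raise JSONDecodeError("Extra data", s, end) json.decoder.JSONDecodeError: Extra data: line 1 column 98 (char 97)' –

@JafferWilson, remember my question, *Are there line breaks between the lines?* Based on your answer, *Yes, it is on one line.*, I assumed that every record sits on its own line. The first snippet works if each record ends with a newline. (Perhaps this was a misunderstanding.) – RomanPerekhrest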
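
Since the question asked for a regular-expression approach, here is a rough sketch of one. The pattern in the question can never match, because it is anchored with ^ and requires a literal " after \Z (the end of the string). The version below drops the anchors and uses lookarounds instead. The pattern, the name find_domains and the sample text are assumptions made for illustration, not a full hostname validator: any dotted token whose last label is alphabetic (a file name such as text.txt, for instance) would also match.

```python
import re

# Unanchored hostname-like pattern (an illustrative assumption, not a validator)
DOMAIN_RE = re.compile(
    r'(?<![a-z0-9.-])'                          # not preceded by a hostname character
    r'(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+'   # one or more labels, each followed by a dot
    r'[a-z]{2,}'                                # last label: letters only
    r'(?![a-z0-9-])',                           # not followed by a hostname character
    re.IGNORECASE,
)


def find_domains(text):
    """Return every hostname-like match in `text`, in order of appearance."""
    return DOMAIN_RE.findall(text)


# Hypothetical sample line holding two records
sample = (
    '{"name": "mail.example.com", "type": "cname", "value": "host.example.net"} '
    '{"name": "example.org", "type": "a", "value": "192.0.2.1"}'
)

print(find_domains(sample))
```

Because the last label must consist of letters, the IP address in the sample is skipped. Unlike the json-based answer, this scans every field of every record, so it may also return domains from keys other than name and value.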
