Python text file processing

I am trying to clean up a text file by processing it in Python. I am new to Python, so bear with me.

My text looks like this:

NHIST_0003 (ZS.MC.BGE.0424SPVCOS) (21.12) 14.08
(ZS.MC.BLK.0424SPVCOS) (21.12) 14.08
(ZS.MC.GRY.0424SPVCOS) (21.12) 14.08
(ZS.MC.BLK.0525SPVCOS3) (21.12) 14.08
(ZS.MC.GRY.0525SPVCOS2) (21.12) 14.08
NHIST_0004 (ZS.MC.BGE.0424SPVCOS) (21.12) 14.08

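If a line has any text before the first "(" bracket, I need to delete that text, and I also need to remove the parentheses around the text I want to keep. I also need to get rid of the number in brackets. Looking at line number one, I just want to keep

ZS.MC.BGE.0424SPVCOS 14.08

This is the code I came up with to try to get things going. I would rather not use regular expressions, since they are too advanced for me at this stage.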

fileName = 'reach.txt'
fileName2 = 'outreach.txt'


while True:
    f = open(fileName, 'r')
    for words in f:
        x = words.split('(', 1)[-1]
        g = open(fileName2, 'w')
        g.write(x)
        g.close()

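This loop is infinite. I thought that by closing the file I was telling the system to stop processing lines.

Any help would be greatly appreciated.

Thanks

2014-04-08 weemo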

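'with open(file, 'r') as fh: for row in fh: row[:row.find('(')]' or just do 'row.split()' to pick out the parts you want. For example 'x = row.split()' and 'x[1], x[3]' – Torxed

But would x = row.split() and x[1], x[3] work even if the text file is not all formatted the same? – weemo

It would not, so I rewrote the code to look for '(...)' and then take the last item on the line, since that appears to be consistent. – Torxed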
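A note on why the loop in the question never ends: 'while True:' has no break, so once the for loop reaches the end of the file, the file is simply reopened and processed again. Closing g does not stop anything. Opening the output with 'w' inside the loop also truncates it on every line, so at most one line would survive. Below is a minimal sketch of a fix without regular expressions. The names keep_code_and_value and clean_file are hypothetical, and the sketch assumes that the wanted code is the first parenthesized item and the wanted value is the last item on the line.

```python
def keep_code_and_value(line):
    """Return 'CODE VALUE' for one line, or None if the line does not fit."""
    # Drop everything up to and including the first '('.
    _before, paren, after = line.partition('(')
    if not paren:
        return None  # no '(' on this line (for example a blank line)
    # The wanted code runs up to the first ')'.
    code, close, rest = after.partition(')')
    tokens = rest.split()
    if not close or not tokens:
        return None
    # The last whitespace-separated token is the value to keep.
    return code + ' ' + tokens[-1]


def clean_file(in_name, out_name):
    # Open each file once; the for loop stops by itself at end of file.
    with open(in_name) as src, open(out_name, 'w') as dst:
        for line in src:
            result = keep_code_and_value(line)
            if result is not None:
                dst.write(result + '\n')
```

Calling clean_file('reach.txt', 'outreach.txt') would then write one cleaned line per matching input line and silently skip the rest.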

You can iterate over the lines of a file like this:

with open('filename.txt') as f:
    for line in f.readlines():
        pass  # do stuff with line

To extract the information you want from a line, you can do this:

cleaned = []
items = line.split()
for item in items:
    if item.startswith('(') and item.endswith(')'):
        cleaned.append(item.strip('()'))
        break
cleaned.append(items[-1])
cleaned = ' '.join(cleaned)

Full program:

in_file = 'reach.txt'
out_file = 'outreach.txt'

def clean(string):
    if not string:
        return string

    cleaned = []
    items = string.split()
    for item in items:
        if item.startswith('(') and item.endswith(')'):
            cleaned.append(item.strip('()'))
            break
    cleaned.append(items[-1])
    return ' '.join(cleaned)

with open(in_file) as i, open(out_file, 'w') as o:
    o.write('\n'.join([clean(line) for line in i]))

2014-04-08 22:42:14

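Or just 'for line in f', same thing. Also, this gave a syntax error because of a missing ':' (fixed it for you) – Torxed

Brilliant! Thank you very much. I like how you wrote it. Very readable and simple. – weemo

Scorpion_God, I like the code, but it raises an index error – weemo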
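The index error mentioned in the last comment is most likely caused by blank lines. A line that holds only a newline is the non-empty string '\n', so 'if not string' does not catch it; split() then returns an empty list, and items[-1] raises IndexError. A minimal sketch of a guarded variant follows, assuming blank lines should simply be dropped. The name clean_safe and the sample lines are made up for illustration.

```python
def clean_safe(string):
    items = string.split()
    if not items:
        # Blank or whitespace-only line: nothing to extract.
        return ''
    cleaned = []
    for item in items:
        # Keep only the first fully parenthesized item, without its brackets.
        if item.startswith('(') and item.endswith(')'):
            cleaned.append(item.strip('()'))
            break
    cleaned.append(items[-1])
    return ' '.join(cleaned)


lines = ["NHIST_0003 (ZS.MC.BGE.0424SPVCOS) (21.12) 14.08\n", "\n"]
results = [clean_safe(line) for line in lines]
# Filter out the empty results produced by blank lines.
kept = [r for r in results if r]
```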

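fileName = 'reach.txt'
fileName2 = 'outreach.txt'

def isfloat(s):
    # True if s parses as a float, e.g. '14.08'
    try:
        float(s)
        return True
    except ValueError:
        return False

with open(fileName, 'r') as fh, open(fileName2, 'w') as g:
    for row in fh:
        x = row.split()
        first = second = None
        # first item containing both brackets, e.g. '(ZS.MC.BGE.0424SPVCOS)'
        for item in x:
            if '(' in item and ')' in item:
                first = item.strip('()')
                break
        # walk backwards from the end to find the last float on the row
        for i in range(-1, 0-len(x), -1):
            if isfloat(x[i]):
                second = x[i]
                break
        if first is None or second is None:
            continue  # blank or malformed row, nothing to write
        print(first, second)
        g.write(' '.join((first, second)) + '\n')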

Which gives:

ZS.MC.BGE.0424SPVCOS 14.08 
ZS.MC.BLK.0424SPVCOS 14.08 
ZS.MC.GRY.0424SPVCOS 14.08 
ZS.MC.BLK.0525SPVCOS3 14.08 
ZS.MC.GRY.0525SPVCOS2 14.08 
ZS.MC.BGE.0424SPVCOS 14.08

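There we go, this code will handle various kinds of faulty data. For example, if the float value is not at the end of the line, that is covered, and if the (...) data is not fixed at, say, the second position but sits in the first, that is covered as well.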
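To illustrate the two cases just mentioned, here is the same idea wrapped in a function so it can be tried on single rows. This is only a sketch: the name extract and the sample rows are made up, and unlike the loop above it also considers the first item when looking for the float.

```python
def isfloat(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


def extract(row):
    """Return (code, value) for one row, or None if either part is missing."""
    tokens = row.split()
    # First token that contains both '(' and ')'.
    code = next((t.strip('()') for t in tokens if '(' in t and ')' in t), None)
    # Last token that parses as a float; '(21.12)' does not, because of
    # its parentheses.
    value = next((t for t in reversed(tokens) if isfloat(t)), None)
    if code is None or value is None:
        return None
    return code, value


# The (...) item in first position, with no leading text.
leading = extract("(ZS.MC.BLK.0424SPVCOS) (21.12) 14.08")
# The float is not the last item on the row.
trailing = extract("NHIST_0003 (ZS.MC.BGE.0424SPVCOS) 14.08 (21.12) note")
```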

2014-04-08 22:35:36 Torxed

You could try a regular expression, if every line contains (code you want) (thing you don't want).

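import re
infile = 'reach.txt'
outfile = 'outreach.txt'

# capture whatever sits between a pair of parentheses
pattern = re.compile(r"\(([A-Za-z0-9.]*)\)")

with open(infile, 'r') as inf, open(outfile, 'w') as outf:
    for line in inf:
        # each line has "* (what you want) (trash) *"
        matches = pattern.findall(line)
        if not matches:
            continue  # skip lines without any "(...)" part
        # always take first one
        first = matches[0]

        items = line.split()
        second = items[-1]  # last item on the line, e.g. 14.08
        to_write = " ".join((first, second))
        outf.write(to_write + "\n")

The regular expression r"\(([A-Za-z0-9.]*)\)" matches any combination (denoted by [ ]*) of:

letters (A-Za-z),
numbers (0-9), and
periods (.)

that sits inside parentheses (\( \)). The inner, unescaped pair of parentheses is a capture group, so re.findall returns the matched text without the surrounding brackets. A-Z and a-z are spelled out because the single range A-z would also match a few punctuation characters such as [ and _.

From your example, there will always be two matches, such as ZS.MC.BLK.0424SPVCOS and 21.12. re.findall will find both in the order they appear. Since the one you want is always first, grab it with re.findall(regex, line)[0].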

2014-04-08 22:39:51 wflynny

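Not yet. Too advanced. Reading about it, I just don't get the wildcards – weemo

There is no better time to learn than now! – wflynny

@weemo '.' just means any character. So "a.." will match any three-character string that starts with "a". –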
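To make the wildcard comment concrete, here is a small sketch using re.fullmatch from the standard library. It also shows why a literal period in a pattern is written as \. outside a character class: an unescaped . matches any character (except a newline, by default), not just a period. The sample strings are made up for illustration.

```python
import re

candidates = ["abc", "a12", "ab", "xbc"]

# 'a..' means: the letter a followed by any two characters.
three_chars = [s for s in candidates if re.fullmatch(r"a..", s)]

# Unescaped '.' matches any character, so 'x' is accepted here.
any_char = re.fullmatch(r"21.12", "21x12") is not None

# Escaped '\.' matches only a literal period, so 'x' is rejected.
literal_dot = re.fullmatch(r"21\.12", "21x12") is not None
```

Here three_chars ends up holding only the strings of exactly three characters that start with "a".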

blacklist = set('1234567890.')
with open('reach.txt') as infile, open('outreach.txt', 'w') as outfile:
    for line in infile:
        line = line.strip()
        if not line:
            continue
        _left, line = line.split("(", 1)
        parts = [p.rstrip(")").lstrip("(") for p in line.split()]
        parts = [p for i,p in enumerate(parts) if not all(char in blacklist for char in p) or i==len(parts)-1]
        outfile.write("%s\n" %(' '.join(parts)))

With your example reach.txt, I get:

ZS.MC.BGE.0424SPVCOS 14.08 
ZS.MC.BLK.0424SPVCOS 14.08 
ZS.MC.GRY.0424SPVCOS 14.08 
ZS.MC.BLK.0525SPVCOS3 14.08 
ZS.MC.GRY.0525SPVCOS2 14.08 
ZS.MC.BGE.0424SPVCOS 14.08

2014-04-08 22:46:19 inspectorG4dget

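ValueError: need more than 1 value to unpack – weemo

@weemo: show me the input. I suspect there is an empty line at the end of the file. If so, the edit should help – inspectorG4dget

Is there any way I can post the whole text file? It is 4000 lines. Unfortunately, it does not strictly follow the format – weemo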
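Since the real file reportedly does not follow the format strictly, one option is to collect the lines that cannot be parsed instead of letting the unpacking fail. The sketch below keeps the blacklist idea from the answer but uses str.partition, which returns an empty separator rather than raising when "(" is missing. The names parse_line and split_good_and_bad are hypothetical, and what counts as a bad line is an assumption, because the actual 4000-line file is not shown.

```python
blacklist = set('1234567890.')


def parse_line(line):
    """Return the cleaned parts of a line, or None if it contains no '('."""
    _left, sep, rest = line.strip().partition('(')
    if not sep:
        return None  # partition never raises, unlike unpacking split('(', 1)
    parts = [p.strip('()') for p in rest.split()]
    last = len(parts) - 1
    # Drop purely numeric parts, except the final one, which is the value.
    return [p for i, p in enumerate(parts)
            if i == last or not all(ch in blacklist for ch in p)]


def split_good_and_bad(lines):
    """Return (cleaned lines, [(line number, raw text)] for unparsable lines)."""
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        parts = parse_line(line)
        if parts:
            good.append(' '.join(parts))
        elif line.strip():
            # Non-blank line that did not fit: keep it for inspection.
            bad.append((number, line.rstrip('\n')))
    return good, bad


sample = [
    "NHIST_0003 (ZS.MC.BGE.0424SPVCOS) (21.12) 14.08\n",
    "\n",
    "no parentheses here\n",
]
good, bad = split_good_and_bad(sample)
```

Printing the bad list after a run over the full file would show which line numbers break the format, which is easier to share than the whole file.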
