2017-08-08 40 views
-2

How do I do this in Python? Remove phrases from a file based on another file (Python)

badphrases.txt contains

Go away 
Don't do that 
Stop it 

allphrases.txt contains

I don't know why you do that. Go away. 
I was wondering what you were doing. 
You seem nice 

I want allphrases.txt to be cleaned of every line that contains a phrase from badphrases.txt.

This is how I do it in bash:

cat badphrases.txt | while read b
do
cat allphrases.txt | grep -v "$b" > tmp
cat tmp > allphrases.txt
done

Oh, it is trivial, so you assume I did not look or try. I searched for an hour.

Here is my code:

# Files 
ttv = "/tmp/tv.dat" 
tmp = "/tmp/tempfile" 
bad = "/tmp/badshows" 

The bad file already exists
... code goes here to create ttv

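import os

# Function grep_v: return True if any phrase listed in file f occurs in text
def grep_v(f, text):
    with open(f, "r") as phrases:
        for phrase in phrases:
            phrase = phrase.strip()  # drop the trailing newline before comparing
            # Skip blank lines: an empty phrase would match every line
            if phrase and phrase in text:
                return True
    return False

with open(ttv, "r") as tfile, open(tmp, "w") as t:
    for line in tfile:
        if not grep_v(bad, line):
            t.write(line)
os.rename(tmp, ttv)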

Answers

0

First, Google how to read a file in Python:

You will probably find something like this: How do I read a file line-by-line into a list?

Use this to read both files into lists:

with open('badphrases.txt') as f: 
    content = f.readlines() 
badphrases = [x.strip() for x in content] 

with open('allphrases.txt') as f: 
    content = f.readlines() 
allphrases = [x.strip() for x in content] 

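Now you have the contents of both files in two lists.

Iterate through allphrases and check whether any phrase from badphrases occurs in each line.

At this point, you might consider Googling: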

  • how to iterate through a list in Python
  • how to check whether a string occurs in another string in Python
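Those two building blocks can be sketched as follows. This is only an illustration: the sample list is hard-coded from the question's allphrases.txt, and the helper name `contains` is made up for this example.

```python
# Sample data mirroring the question's allphrases.txt (hard-coded for illustration)
allphrases = [
    "I don't know why you do that. Go away.",
    "I was wondering what you were doing.",
    "You seem nice",
]

def contains(text, phrase):
    """Return True if phrase occurs anywhere inside text (case-sensitive)."""
    return phrase in text

# A for loop visits each element of the list in order
for line in allphrases:
    print(contains(line, "Go away"), line)
```

Note that `in` on strings is a plain substring test, so it is case-sensitive and does not care about word boundaries.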

Take the code from those places and build a brute-force algorithm like this:

for line in allphrases:
    flag = True
    for badphrase in badphrases:
        if badphrase in line:
            flag = False
            break
    if flag:
        print(line)

If you can understand this code, you will see that you need to replace the print with writing to a file:

  • Now Google how to print to a file in Python.
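One possible way to do that is the `file=` argument of `print()`. The sketch below reuses the brute-force loop unchanged; the function name `write_clean` and the output filename `cleaned.txt` are assumptions made for this example, not something from the question.

```python
def write_clean(allphrases, badphrases, out_path):
    """Same brute-force loop as above, but printing into a file."""
    with open(out_path, "w") as out:
        for line in allphrases:
            flag = True
            for badphrase in badphrases:
                if badphrase in line:
                    flag = False
                    break
            if flag:
                # print() accepts a file= argument and appends the newline itself
                print(line, file=out)

write_clean(
    ["I don't know why you do that. Go away.", "You seem nice"],
    ["Go away", "Stop it"],
    "cleaned.txt",  # hypothetical output filename
)
```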

Then think about how to improve the algorithm. All the best.
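As one possible direction for that improvement, the flag variable can be replaced by `any()`, and the result can be written to a temporary file that is then swapped into place. This is a sketch under assumptions: the function name `remove_bad_lines` and the `.tmp` suffix are invented here.

```python
import os

def remove_bad_lines(all_path, bad_path):
    """Rewrite all_path without the lines that contain any phrase from bad_path."""
    with open(bad_path) as f:
        # Ignore blank lines: an empty string is "in" every line
        badphrases = [p.strip() for p in f if p.strip()]

    tmp_path = all_path + ".tmp"  # hypothetical temp-file name
    with open(all_path) as src, open(tmp_path, "w") as dst:
        for line in src:
            # any() stops at the first matching phrase, like the break above
            if not any(bad in line for bad in badphrases):
                dst.write(line)
    os.replace(tmp_path, all_path)  # swap the cleaned copy into place
```

With the question's files, `remove_bad_lines("allphrases.txt", "badphrases.txt")` would be the call. The `with` blocks close both files before the rename, so no buffered data is lost.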

UPDATE:

@COLDSPEED suggested that you could simply Google how to replace a line in a file in Python:

You will probably find something like this: Search and replace a line in a file in Python

That would work too.
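One common approach to in-place editing uses the standard `fileinput` module. The sketch below adapts it to dropping lines rather than replacing them; the function name `drop_lines_inplace` is made up for this example, and this is not necessarily the method the linked question settles on.

```python
import fileinput

def drop_lines_inplace(path, badphrases):
    """Filter path in place: lines containing a bad phrase are not re-emitted."""
    with fileinput.input(path, inplace=True) as f:
        for line in f:
            if not any(bad in line for bad in badphrases):
                # With inplace=True, standard output is redirected into the file
                print(line, end="")
```

Because every `print()` inside the loop goes into the file, debugging output must not be printed to standard output while the loop runs.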

+0

You might as well ask the user to Google "how to replace a line in a file with Python".

+0

There are probably 100 ways to do this. Clearly he/she is trying to learn Python, so I gave some basic hints.

+0

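Sometimes things are easier in another language. Whatever the solution in Python is, it is certainly far more complicated than it needs to be. I don't understand why Python is popular.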

0

The solution is not too bad.

#!/usr/bin/env python3 
# -*- coding: utf-8 -*- 

import feedparser, os, re 

# Files 
h = os.environ['HOME'] 
ttv = h + "/WEB/Shows/tv.dat" 
old = h + "/WEB/Shows/old.dat" 
moo = h + "/WEB/Shows/moo.dat" 
tmp = h + "/WEB/Shows/tempfile" 
bad = h + "/WEB/Shows/badshows" 

# Function not_present 
def not_present(f, text):
    # Return True if text occurs in no line of file f
    with open(f, "r") as file:
        for line in file:
            if text in line:
                return False
    return True

# Sources (shortened) 
sources = ['http://predb.me/?cats=tv&rss=1'] 

# Grab all the feeds and put them into ttv and old 
k = open(old, 'a') 
f = open(ttv, 'a') 
for h in sources: 
    d = feedparser.parse(h) 
    for post in d.entries: 
      if not_present(old,post.link): 
       f.write(post.title + "|" + post.link + "\n") 
       k.write(post.title + "|" + post.link + "\n") 
f.close()
k.close()

# Remove shows without [Ss][0-9] and put them in moo 
m = open(moo, 'a') 
t = open(tmp, 'w') 
file = open(ttv, "r") 
for line in file: 
    if re.search(r's[0-9]', line, re.I) is None: 
      m.write(line) 
#   print("moo", line) 
    else: 
      t.write(line) 
#   print("tmp", line) 
t.close()
m.close()
file.close()
os.rename(tmp, ttv)

# Remove badshows 
t = open(tmp, 'w') 
with open(bad) as f: 
    content = f.readlines() 
bap = [x.strip() for x in content] 

with open(ttv) as f: 
    content = f.readlines() 
all = [x.strip() for x in content] 

for line in all:
    flag = True
    for b in bap:
        if b in line:
            flag = False
            break
    if flag:
        t.write(line + "\n")
t.close()
os.rename(tmp, ttv) 
+0

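See, Python is not so bad after all. You could also improve the '# Function not_present'. Currently it reads the file on every call. Read the file once and store it in a list. Then, when the function is called, check against that list.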
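That suggestion could look roughly like this. It is a sketch, not the poster's code: `load_lines` is a hypothetical helper, `not_present` is changed to take the cached list instead of a filename, and the sample entry is made up.

```python
def load_lines(path):
    """Read path once and return its lines without trailing newlines."""
    with open(path) as f:
        return [line.strip() for line in f]

def not_present(lines, text):
    """Return True if text occurs in none of the cached lines."""
    return not any(text in line for line in lines)

# In the script this would be load_lines(old); a made-up entry is used here
seen = ["Some Show S01E01|http://example.com/1"]
print(not_present(seen, "http://example.com/1"))  # False
print(not_present(seen, "http://example.com/2"))  # True
```

If new entries are appended to the cached list as they are written out, duplicates within the same run are caught as well.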
