2017-03-18 19 views
0

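Python: I need to print only unique sentences

I have a txt file containing 10 sentences (all plain words) that I pass to a Python script as a command-line argument. I want to print the sentences that contain any of the words listed in dic. The script below finds the matching sentences, but it prints a sentence once for every matching word it finds, so the same sentence appears multiple times.

Is there another approach I could use to do this? Also, I don't want the output lines to be separated by blank lines (an extra \n).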

import sys 

dic=["april","aprils","ask","aug","augee","august","bid","bonds","brent","buy","call","callroll","calls","chance","checking","close","collar","condor","cover"] 

f=open(sys.argv[1]) 

for i in range(0,10):
    line=f.readline()
    words=line.split()
    if len(words) > 3:
        for j in words:
            if j in dic:
                print(line)

Output:

eighty two is what i am bidding on the brent 

eighty two is what i am bidding on the brent 

eighty two is what i am bidding on the brent 

call on sixty five to sixty seventy 

call on sixty five to sixty seventy 

call on sixty five to sixty seventy 

call on sixty five to sixty seventy 

call on sixty five to sixty seventy 

no nothing is going on double 

i am bidding on the option for eighty five 

i am bidding on the option for eighty five 

recross sell seller selling sept 

recross sell seller selling sept 

recross sell seller selling sept 

recross sell seller selling sept 

recross sell seller selling sept 

blah blah blah blah close 

Desired output:

eighty two is what i am bidding on the brent 
call on sixty five to sixty seventy 
no nothing is going on double 
i am bidding on the option for eighty five 
recross sell seller selling sept 
blah blah blah blah close 
+1

Put a 'break' after 'print(line)', so it doesn't check the remaining words. – trincot

+0

@trincot, thank you, that solved 80% of the problem. Using break completely slipped my mind. –

+0

The other 20% is caused by the newline character in the 'line' string. See my answer. – trincot

Answers

1
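  1. Suppressing duplicate lines in the output

    Add a break statement after print(line), so that the for loop over the words of the line stops at the first match.

  2. Suppressing the extra newlines

    The extra blank lines come from f.readline(), which keeps the \n at the end of the returned string; print() then adds a second one. Remove it with line.rstrip("\n") (or line.strip()) before printing. It is also better to iterate with the for line in f syntax instead of calling readline() a fixed number of times, although the lines it yields still end in \n.

Here is the code:

for line in f:
    words = line.split()
    if len(words) > 3:
        for j in words:
            if j in dic:
                print(line.rstrip("\n"))
                break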
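
As a self-contained sketch of the same two fixes (stop at the first matching word, and strip the trailing newline before printing), the loop can be wrapped in a generator. The names matching_lines, min_words, sample, and vocab are illustrative assumptions, not part of the original script, and an in-memory list stands in for the file:

```python
def matching_lines(lines, vocabulary, min_words=4):
    """Yield each qualifying line once, without its trailing newline.

    A line qualifies if it has at least min_words words (the same test as
    len(words) > 3) and at least one of them is in vocabulary.
    """
    for line in lines:
        words = line.split()
        if len(words) < min_words:
            continue
        for word in words:
            if word in vocabulary:
                yield line.rstrip("\n")
                break  # first match is enough; skip the remaining words


# Hypothetical sample data standing in for the lines of the input file.
sample = [
    "call on sixty five to sixty seventy\n",
    "no nothing is going on double\n",
    "buy buy\n",
    "blah blah blah blah close\n",
]
vocab = ["call", "close", "buy"]

result = list(matching_lines(sample, vocab))
for sentence in result:
    print(sentence)
```

With a real file, the same function could be called as matching_lines(f, dic) inside a with open(sys.argv[1]) as f: block, since a file object yields its lines when iterated.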
+0

To suppress the newlines I did: 'for line in f: words=line.split() if len(words) > 3: for j in words: if j in dic: print(line.strip("\n")) break' –

1

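I suggest creating a set for your vocabulary dictionary and a second set containing the words of each line of your file. You can then compare the two sets with & to get their intersection, that is, the words the two sets have in common. This is more efficient than looping over the words to look for matches.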

import sys 

dic=set(["april","aprils","ask","aug","augee","august","bid","bonds","brent","buy","call","callroll","calls","chance","checking","close","collar","condor","cover"]) 

filename = sys.argv[1] 

with open(filename) as f:
    for line in f:
        s = set(line.split())
        if s & dic:
            print(line.strip())
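
To make the & test concrete, here is a minimal sketch of what the intersection evaluates to for a matching and a non-matching line. The shortened vocabulary and the two sample lines are assumptions chosen for illustration; set.isdisjoint is shown as an alternative for the case where only a yes/no answer is needed:

```python
# Shortened, hypothetical vocabulary for illustration.
vocabulary = {"bid", "brent", "call", "close"}

matching = set("call on sixty five to sixty seventy\n".split())
non_matching = set("no nothing is going on double\n".split())

# The intersection holds the words present in both sets.
common = matching & vocabulary
no_common = non_matching & vocabulary

# An empty set is falsy, which is why "if s & dic:" works as a test.
has_match = bool(common)
has_no_match = bool(no_common)

# isdisjoint answers the same yes/no question without building a new set.
also_has_match = not matching.isdisjoint(vocabulary)

print(common, has_match, has_no_match, also_has_match)
```

Because split() is called without arguments, the trailing newline never ends up in the set of words, so it does not interfere with the comparison.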
+0

The solution using this approach is: 'import sys dic=set(["april","aprils","ask","aug","augee","august","bid","bonds","brent","buy","call","callroll","calls","chance","checking","close","collar","condor","cover"]) filename = sys.argv[1] with open(filename) as f: for line in f: if len(line.split()) > 3: s = set(line.split()) if s & dic: print(line.strip())' –

+0

Thanks for optimizing the program –

+0

What is the 'len > 3' check for? – Crispin