Python: nesting loops instead of creating multiple inputs and outputs

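I've just started learning Python and programming, so this may be a very naive question, but I would appreciate any help.

The code below works, but I was told that having these multiple inputs and outputs is bad and that I should nest the loops. However, every time I try to nest anything, I just end up with an empty file.

So my question is: how do I nest all of this?

Thanks, and sorry for the long post.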

import subprocess
import sys

# 1) Call a Perl script and run it to produce the input file.
perl = "/usr/bin/perl"
perl_script = "geoFF.pl"
params = " --mount-doom-hot"
pl_script = subprocess.Popen([perl, perl_script, params], stdout=sys.stdout)
pl_script.communicate()

# 2) Read the output of the Perl script, keeping only the wanted data.
# The input is a BIG file and I just want some specific lines from it.
infile1 = "inputperl.txt"
outfile1 = "c1.txt"

f1 = open(infile1, 'r')
o1 = open(outfile1, 'w+')

words = ['Acc', 'title', 'orgn', 'date', 'GP']  # keep the lines of f1 that contain any of these words

for line in f1:
    if any(word in line for word in words):
        o1.write(line)

# Close both files so c1.txt is fully written before it is read again.
f1.close()
o1.close()

# From the selected lines, delete some symbols/words I don't want.

input1 = open("c1.txt", 'r')
output1 = open("c2.txt", 'w')
del_list = ['>','title', 'orgn','date','<','GP','/Item','"','</Item>','<DS>','Name=','DocS','Acc']  # I want to keep the rest of the line but not these words.

for line in input1:
    for word in del_list:
        line = line.replace(word, "")
    output1.write(line)

input1.close()
output1.close()

# One specific word in the lines is AB. The file has lines with AB129, AB8877, AB0997 and AB(etc).
# Here I want to prepend a URL to each AB accession so that it becomes a hyperlink.
inp = open("c2.txt", 'r')
out = open("c3.txt", 'w')
filedata2 = inp.read()
newdata2 = filedata2.replace('AB', "\n" 'http://www.whatever.com/g/qu/acc.cgi?acc=AB')
out.write(newdata2)
inp.close()
out.close()
# This outputs the line as http://www.whatever.com/g/qu/acc.cgi?acc=AB(somenumber),
# for example http://www.whatever.com/g/qu/acc.cgi?acc=AB129
# and http://www.whatever.com/g/qu/acc.cgi?acc=AB8877 etc.

### Then I want to take the file with these changes and send it by email.
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

fromaddr = "[email protected]" 
toaddr = "[email protected]" 
msg = MIMEMultipart() 
msg['From'] = fromaddr 
msg['To'] = toaddr 
msg['Subject'] = "RESULT" 

# send txt file in email body 
f6 = open("c3.txt", 'r')
results = MIMEText(f6.read(),'plain') 
f6.close() 
msg.attach(results) 

# Send the message through Gmail's SMTP server.
import smtplib
server = smtplib.SMTP('smtp.gmail.com', 587)
server.ehlo()
server.starttls()
server.ehlo()
server.login("sender email", "password")
text = msg.as_string()  # convert the message to a string
server.sendmail(fromaddr, toaddr, text)
server.quit()

The input file looks like this:

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE> 
<eSummaryResult> 
<DS> 
    <Id>20006767</Id> 
    <Item Name="Acc" Type="String">AB64767</Item> 
    <Item Name="GDS" Type="String"></Item> 
    <Item Name="title" Type="String">word word title of this word...</Item> 
    <Item Name="summary" Type="String">word word word..word word word..</Item> 
    <Item Name="GP" Type="String">11002;13112</Item> 
    <Item Name="AB" Type="String">64767</Item> 
    <Item Name="orgn" Type="String">Mus musculus</Item> 
    <Item Name="entryType" Type="String">AB</Item> 
    <Item Name="gdsType" Type="String">word word word..word word word..word word word..</Item> 
    <Item Name="ptechType" Type="String"></Item> 
    <Item Name="valType" Type="String"></Item> 
    <Item Name="SSInfo" Type="String"></Item> 
    <Item Name="subsetInfo" Type="String"></Item> 
    <Item Name="date" Type="String">2015/12/09</Item> 
    <Item Name="suppFile" Type="String">WIG</Item> 
    <Item Name="Samples" Type="List"> 
    </Item> 
    <Item Name="n_samples" Type="Integer">12</Item> 
    <Item Name="SeriesTitle" Type="String"></Item> 
    <Item Name="PlatformTitle" Type="String"></Item> 
    <Item Name="PlatformTaxa" Type="String"></Item> 
    <Item Name="SamplesTaxa" Type="String"></Item> 
    <Item Name="Ids" Type="List"> 
</Item> 
    <Id>200098567</Id> 
    <Item Name="Acc" Type="String">AB64789</Item> 
    <Item Name="GDS" Type="String"></Item> 
    <Item Name="title" Type="String">word word word...</Item> 
    <Item Name="summary" Type="String">word word word..word word word..</Item> 
    <Item Name="GP" Type="String">11002;13112</Item> 
    <Item Name="AB" Type="String">AB64789</Item> 
    <Item Name="orgn" Type="String">Mus musculus</Item> 
    <Item Name="entryType" Type="String">AB</Item> 
    <Item Name="gdsType" Type="String">word word word..word word word..word word word..</Item> 
    <Item Name="ptechType" Type="String"></Item> 
    <Item Name="valType" Type="String"></Item> 
    <Item Name="SSInfo" Type="String"></Item> 
    <Item Name="subsetInfo" Type="String"></Item> 
    <Item Name="date" Type="String">2015/12/09</Item> 
    <Item Name="suppFile" Type="String">WIG</Item> 
    <Item Name="Samples" Type="List"> 
</Item> 
    </Item>  
    <Id>200064997</Id> 
    <Item Name="Acc" Type="String">AB69957</Item> 
    <Item Name="GDS" Type="String"></Item> 
    <Item Name="title" Type="String">word word word...</Item> 
    <Item Name="summary" Type="String">word word word..word word word..</Item> 
    <Item Name="GP" Type="String">1100</Item> 
    <Item Name="AB" Type="String">69957</Item> 
    <Item Name="orgn" Type="String">Mus musculus</Item> 
    <Item Name="entryType" Type="String">AB</Item> 
    <Item Name="gdsType" Type="String">word word word..word word word..word word word..</Item> 
    <Item Name="ptechType" Type="String"></Item> 
    <Item Name="valType" Type="String"></Item> 
    <Item Name="SSInfo" Type="String"></Item> 
    <Item Name="subsetInfo" Type="String"></Item> 
    <Item Name="date" Type="String">2015/12/09</Item> 
    <Item Name="suppFile" Type="String">WIG</Item> 
    <Item Name="Samples" Type="List"> 
    </Item> 
    <Item Name="n_samples" Type="Integer">12</Item> 
    <Item Name="SeriesTitle" Type="String"></Item> 
    <Item Name="PlatformTitle" Type="String"></Item> 
    <Item Name="PlatformTaxa" Type="String"></Item> 
    <Item Name="SamplesTaxa" Type="String"></Item> 
    <Item Name="Ids" Type="List"> 
    <Item Name="int" Type="Integer">26476451</Item> 
    </Item> 
    <Item Name="Projects" Type="List"></Item> 
    <Item Name="G2R" Type="String">no</Item>

I only want the following data:

<Item Name="Acc" Type="String">AB64767</Item> 
<Item Name="title" Type="String">word word title of this word...</Item> 
<Item Name="AB" Type="String">64767</Item> 
<Item Name="orgn" Type="String">Mus musculus</Item> 
<Item Name="date" Type="String">2015/12/09</Item>

But displayed as:

http://www.whatever.com/g/qu/acc.cgi?acc=AB64767 
word word title of this word... 
Mus musculus 
2015/12/09 

http://www.whatever.com/g/qu/acc.cgi?acc=AB64789 
word word title of this word... 
Mus musculus 
2015/12/09 

http://www.whatever.com/g/qu/acc.cgi?acc=AB69957 
word word title of this word... 
Mus musculus 
2015/12/09

Source

2015-12-10 Carol M

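Not sure if it's relevant, but you probably need to remove the leading space from ' --mount-doom-hot'; the argument the Perl script receives starts with a space rather than a "-", so it may not be recognized as an option. – chepner

You're absolutely right. Thanks.

Reading the file once and using regular expressions would be a better approach: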
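
The effect chepner describes can be observed directly. The following is a minimal sketch, not the asker's setup: since `geoFF.pl` is not available here, a tiny Python child process (an assumption for illustration) stands in for the Perl script and simply reports the arguments it received. The helper name `received_args` is hypothetical.

```python
import json
import subprocess
import sys

# Tiny child program (a stand-in for geoFF.pl) that reports the
# arguments it received as a JSON list.
CHILD = "import json, sys; print(json.dumps(sys.argv[1:]))"


def received_args(*args):
    """Run the child with the given arguments and return what it saw."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD] + list(args),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


# Each list element is passed through verbatim, including the leading space.
with_space = received_args(" --mount-doom-hot")
without_space = received_args("--mount-doom-hot")
print(with_space, without_space)
```

Because the argument list is handed to the child without shell processing, the leading space survives, and an option parser that expects the argument to start with "-" would likely not treat it as an option.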

import re

del_list = ['>', 'title', 'orgn', 'date', '<', 'GP', '/Item', '"', '</Item>', '<DS>', 'Name=', 'DocS',
            'Acc']  # I want to keep the rest of the line but not these words.
words = ['Acc', 'title', 'orgn', 'date', 'GP']

# re.escape guards against regex metacharacters in the substrings
rep = re.compile('|'.join(map(re.escape, del_list)))
keep = re.compile('|'.join(map(re.escape, words)))
r3 = re.compile(r"AB(?=\d)")

with open("test.txt") as f, open("out.txt", "w") as out:
    for line in f:
        # if the line contains a match from words
        if keep.search(line):
            # replace all unwanted substrings
            line = rep.sub("", line.lstrip())
            line = r3.sub("\n" 'http://www.whatever.com/g/qu/acc.cgi?acc=AB', line)
            out.write(line)

out.txt:

Item Type=String 
http://www.whatever.com/g/qu/acc.cgi?acc=AB64767 
Item Type=Stringword word of this word... 
Item Type=String11002;13112 
Item Type=StringMus musculus 
Item Type=String2015/12/09 
Item Type=String 
http://www.whatever.com/g/qu/acc.cgi?acc=AB64789 
Item Type=Stringword word word... 
Item Type=String11002;13112 
Item Type=StringMus musculus 
Item Type=String2015/12/09 
Item Type=String 
http://www.whatever.com/g/qu/acc.cgi?acc=AB69957 
Item Type=Stringword word word... 
Item Type=String1100 
Item Type=StringMus musculus 
Item Type=String2015/12/09
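
The output above still carries the leftover `Item Type=String` prefix and the GP lines, which the question's desired output does not show. One possible way to get closer to that format is to capture the value of each wanted element rather than deleting substrings. This is a sketch under assumptions, not part of the original answer: the pattern `ITEM`, the function `extract`, and the rule "start a new record at each Acc line" are all illustrative choices.

```python
import re

# Capture the Name attribute and the text of the wanted <Item> elements only.
ITEM = re.compile(r'<Item Name="(Acc|title|orgn|date)"[^>]*>(.*?)</Item>')
URL = "http://www.whatever.com/g/qu/acc.cgi?acc="


def extract(lines):
    """Return output lines: a URL for each Acc, then title, organism, date."""
    out = []
    for line in lines:
        match = ITEM.search(line)
        if match is None:
            continue  # not one of the wanted elements
        name, value = match.groups()
        if name == "Acc":
            if out:
                out.append("")  # blank line between records
            out.append(URL + value)
        else:
            out.append(value)
    return out


sample = [
    '    <Item Name="Acc" Type="String">AB64767</Item>',
    '    <Item Name="title" Type="String">word word title of this word...</Item>',
    '    <Item Name="GP" Type="String">11002;13112</Item>',
    '    <Item Name="orgn" Type="String">Mus musculus</Item>',
    '    <Item Name="date" Type="String">2015/12/09</Item>',
]
result = extract(sample)
print("\n".join(result))
```

Matching on `Name="..."` with the quotes included also avoids accidental hits such as `SeriesTitle` or `entryType`.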

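If you need to match some words exactly, you will need word boundaries in the regex; otherwise you end up matching "foo" in "foobar". Also, if you only want to send the file, you don't have to write it to disk.

Source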
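
To illustrate the word-boundary point, here is a minimal sketch that reuses the `words` list from the question. The sample line with `Name="update"` is made up for the demonstration, and the names `loose` and `strict` are hypothetical.

```python
import re

words = ['Acc', 'title', 'orgn', 'date', 'GP']
alternatives = "|".join(map(re.escape, words))

# Without boundaries, "date" also matches inside "update".
loose = re.compile(alternatives)
# With \b on both sides, only whole words match.
strict = re.compile(r"\b(?:" + alternatives + r")\b")

unwanted = '<Item Name="update" Type="String">x</Item>'
wanted = '<Item Name="date" Type="String">2015/12/09</Item>'

loose_hits_unwanted = loose.search(unwanted) is not None
strict_hits_unwanted = strict.search(unwanted) is not None
strict_hits_wanted = strict.search(wanted) is not None
print(loose_hits_unwanted, strict_hits_unwanted, strict_hits_wanted)
```

The non-capturing group `(?:...)` matters here: without it, the `\b` anchors would bind only to the first and last alternatives.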

2015-12-10 20:09:54

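Wow, this is amazing in all sorts of ways. Thank you very much.

No problem, you're welcome.

While this is close to done, here are some tips:

Disk I/O is slow, so you will get better performance if you read the input only once, do all the processing, and then write the output, rather than passing a file through every filtering step.

For example, let's examine this: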

for line in f1:
    if any(word in line for word in words):
        o1.write(line)

# From the selected lines, delete some symbols/words I don't want.

input1 = open("c1.txt", 'r')
output1 = open("c2.txt", 'w')
del_list = ['>','title', 'orgn','date','<','GP','/Item','"','</Item>','<DS>','Name=','DocS','Acc']  # I want to keep the rest of the line but not these words.

for line in input1:
    for word in del_list:
        line = line.replace(word, "")
    output1.write(line)

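In the first loop you select only a few lines from your input file. In the second loop you delete some words from the selected lines. In between, you write all of your data to disk.

A fairly simple optimization is to do the word replacement directly, before writing back to disk, i.e.: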

del_list = ['>','title', 'orgn','date','<','GP','/Item','"','</Item>','<DS>','Name=','DocS','Acc']

for line in f1:
    if any(word in line for word in words):
        for word in del_list:
            line = line.replace(word, "")
        o1.write(line)

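Can you see how this saves a round trip to disk? An alternative technique is to keep the data in memory, by reading the file into a list and then operating on that list instead of going back and forth to disk each time.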
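
The in-memory alternative could look roughly like the following. This is only a sketch: `io.StringIO` stands in for `open("inputperl.txt")` so the example runs on its own, and the helper names `select` and `strip_words` are hypothetical.

```python
import io

words = ['Acc', 'title', 'orgn', 'date', 'GP']
del_list = ['>', 'title', 'orgn', 'date', '<', 'GP', '/Item', '"',
            '</Item>', '<DS>', 'Name=', 'DocS', 'Acc']


def select(lines, words):
    """Keep only the lines that contain at least one of the words."""
    return [line for line in lines if any(word in line for word in words)]


def strip_words(lines, del_list):
    """Remove every unwanted substring from every line."""
    cleaned = []
    for line in lines:
        for word in del_list:
            line = line.replace(word, "")
        cleaned.append(line)
    return cleaned


# Stand-in for the real input file (assumption for this sketch).
source = io.StringIO(
    '<Id>20006767</Id>\n'
    '<Item Name="orgn" Type="String">Mus musculus</Item>\n'
)
lines = source.readlines()          # one read from "disk"
result = strip_words(select(lines, words), del_list)
print(result)
```

Each step takes a list and returns a list, so no intermediate file such as `c1.txt` is needed. For a very large input, holding every line in memory may be a concern, in which case the line-by-line loop above is the safer choice.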

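I hope this points you in the right direction, and that you can now figure out how to get rid of the third set of files, so that you end up with only one input file and one output file.

Source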
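
For reference, one way the three steps might collapse into a single pass is sketched below. It keeps the question's plain `str.replace` approach; the generator name `process` is hypothetical, and `io.StringIO` objects stand in for the one input file and the one output file.

```python
import io

words = ['Acc', 'title', 'orgn', 'date', 'GP']
del_list = ['>', 'title', 'orgn', 'date', '<', 'GP', '/Item', '"',
            '</Item>', '<DS>', 'Name=', 'DocS', 'Acc']
URL = 'http://www.whatever.com/g/qu/acc.cgi?acc=AB'


def process(lines):
    """Select, clean, and link each line in one pass."""
    for line in lines:
        if not any(word in line for word in words):
            continue  # step 1: skip lines without a wanted word
        for word in del_list:
            line = line.replace(word, "")  # step 2: drop unwanted substrings
        yield line.replace('AB', "\n" + URL)  # step 3: turn AB into a link


# Stand-ins for the single input and single output file (assumption).
source = io.StringIO(
    '<Item Name="Acc" Type="String">AB64767</Item>\n'
    '<Item Name="summary" Type="String">word word</Item>\n'
)
sink = io.StringIO()
for line in process(source):
    sink.write(line)

body = sink.getvalue()
print(body)
```

Since `body` is an ordinary string, it could also be handed straight to `MIMEText(body, 'plain')` when the only goal is to email the result, with no output file at all.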

2015-12-10 19:58:02 ted

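This is one of the things I tried, but it replaced the entire data with "" and I ended up with an empty file. I'll try to figure out what I'm doing wrong here. Thanks.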
