2013-03-23 54 views
0

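Separating strings that are contained in other strings

I'm setting up a script to merge PDFs based on text contained in the file names. My problem is that "Violin I" is also contained in "Violin II", and "Alto Saxophone I" is also contained in "Alto Saxophone II". How can I set this up so that tempList contains only the entries for "Violin I" and excludes "Violin II", and vice versa?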

pdfList = ["01 Violin I.pdf", "02 Violin I.pdf", "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello", "Contrabass", "Alto Saxophone I", "Alto Saxophone II", "Tenor Saxophone", "Baritone Saxophone"]


# create arrays for each instrument that can be used for merging/organization
def organizer():
    for fileName in pdfList:
        tempList = []  # reset once per file, not once per instrument
        for instrument in instruments:
            # substring test: "Violin I" is also found in "Violin II"
            if instrument in fileName:
                tempList.append(fileName)
        print(tempList)


print(pdfList)
organizer()

Are the PDFs always named like this, i.e. 'number + instrument + .pdf'? Or should we assume that a PDF can have any name that contains the instrument? – woemler 2013-03-23 16:22:39


Yes, the PDFs will always be in the format "(initial number) + (some text) + (instrument) + .pdf". – jumbopap 2013-03-23 16:23:37

Answers

1

Try making this change:

... 
if instrument+'.pdf' in fileName: 
... 

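Does this cover all of your cases? Requiring the instrument name to be followed directly by '.pdf' avoids matching names that are substrings of other names.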
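To see the effect of that check on the whole list, here is a minimal sketch that builds one list per instrument. The function name `group_by_suffix` is hypothetical, and the sketch assumes, as stated in the comments on the question, that every file name ends with the instrument name followed by `.pdf`. It uses `str.endswith` rather than `in`, which is slightly stricter than the change above.

```python
def group_by_suffix(pdf_list, instruments):
    """Map each instrument to the files whose names end with '<instrument>.pdf'."""
    groups = {}
    for instrument in instruments:
        suffix = instrument + ".pdf"
        # endswith is stricter than `in`: the suffix must close the file name
        groups[instrument] = [name for name in pdf_list if name.endswith(suffix)]
    return groups


pdfList = ["01 Violin I.pdf", "02 Violin I.pdf", "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Tenor", "Violin I", "Violin II", "Tenor Saxophone"]

groups = group_by_suffix(pdfList, instruments)
print(groups["Violin I"])
print(groups["Violin II"])
```

This would still mix up two instruments if one name were a suffix of the other (a hypothetical "Saxophone" next to "Tenor Saxophone"), but no such pair appears in the question's list.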


Simple and effective, thanks. – jumbopap 2013-03-23 17:57:40

3

One way is to use a regular expression, like this:

import re

pdfList = ["01 Violin I.pdf", "02 Violin I.pdf", "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello", "Contrabass", "Alto Saxophone I", "Alto Saxophone II", "Tenor Saxophone", "Baritone Saxophone"]


# create arrays for each instrument that can be used for merging/organization
def organizer():
    for fileName in pdfList:
        tempList = []
        for instrument in instruments:
            # \b on both sides: the name must start and end at a word boundary
            if re.search(r'\b{}\b'.format(instrument), fileName):
                tempList.append(fileName)
        print(tempList)


print(pdfList)
organizer()

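This wraps the search term in \b so that it only matches when it both begins and ends at a word boundary. Also, perhaps obvious but worth pointing out: this makes your instrument names part of the regular expression, so be aware that if a name contains any characters that are also regex metacharacters, they will be interpreted as such (right now none of yours do). A more general solution would need some code to find and properly escape those characters.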
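For the escaping step, the standard library's `re.escape` can do the work. The following is a minimal sketch, not part of the original answer; the helper name `matches_instrument` and the instrument "Clarinet (Bb) 1" are made up for illustration. It assumes each name starts and ends with a word character, since `\b` would not behave as intended next to a leading or trailing symbol.

```python
import re


def matches_instrument(instrument, file_name):
    """Return True if file_name contains instrument bounded by word boundaries."""
    # re.escape neutralizes metacharacters such as '(' or '+' in the name
    pattern = r"\b" + re.escape(instrument) + r"\b"
    return re.search(pattern, file_name) is not None


print(matches_instrument("Violin I", "01 Violin I.pdf"))
print(matches_instrument("Violin I", "01 Violin II.pdf"))
# hypothetical name containing parentheses, which are regex metacharacters
print(matches_instrument("Clarinet (Bb) 1", "01 Clarinet (Bb) 1.pdf"))
```

Word boundaries alone still let "Tenor" match a file named for "Tenor Saxophone", because a space follows "Tenor" there. Requiring the name to be followed by `.pdf`, as in the other answer, is one way around that.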
