Python regex: looping over the first line of each file in a directory

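I want to loop over .txt files and use the date (for example, April 1, 1993) from the first line of each file.

This code works, but it matches against the whole file rather than just the first line (note: the code below shows more than just the date-matching loop):

The script below has been updated, and it works: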

import glob
import re
from shutil import move  # assumed: the script's move() is shutil.move

from dateutil import parser

articles = glob.glob("*.txt")
y = 1

for f in articles:
    # First pass: find the "LENGTH:" line and pull the word count out of it.
    with open(f, "r") as content:
        wordcount = "x"  # sentinel meaning "no word count found"
        lines = content.readlines()
        for line in lines:
            if line[0:7] == "LENGTH:":
                lineclean = re.sub(r'[#%&\<>*?:/{}[email protected]+|=]', '', line)
                wordcount = lineclean[7:13]
                # Cut the slice off just before the "w" of "words".
                if wordcount[5] == "w":
                    wordcount = wordcount[0:4]
                elif wordcount[4] == "w":
                    wordcount = wordcount[0:3]
                elif wordcount[3] == "w":
                    wordcount = wordcount[0:2]
                elif wordcount[2] == "w":
                    wordcount = wordcount[0:1]
    # Second pass: read only the first line and search it for a date.
    with open(f, "r") as content:
        first_line = next(content)
        try:
            match = re.search(r'(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}', first_line).group()
        except AttributeError:
            # No date on the first line: `match` keeps its previous value.
            pass
        parsed_pubdate = parser.parse(match).strftime('%Y-%m-%d')
    try:
        if wordcount != "x":
            # `source` is set in a part of the script that is not shown here.
            move(f, "{parsed_pubdate}_{wordcount}_{source}.txt".format(**locals()))
    except OSError:
        pass
    y += 1

To match the date only on the first line of the file, I tried adding ^\s and flags=re.MULTILINE, which gives:

match = re.search(r'^\s(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}', line, flags=re.MULTILINE).group()
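
For reference, a small sketch of how `^`, `\s` and `re.MULTILINE` interact. The sample text is made up, and the month pattern is a shortened stand-in for the one in the question, so treat this as an illustration rather than a drop-in replacement.

```python
import re

# Shortened stand-in for the month alternation used in the question.
DATE = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s+\d{1,2},\s+\d{4}"

text = "April 1, 1993\nUpdated May 2, 1994\n"  # made-up sample input

# An unanchored search finds the first date anywhere in the string.
anywhere = re.search(DATE, text)

# '^\s' requires exactly one whitespace character at the start of a line,
# so a line that begins directly with the month name does not match.
anchored_ws = re.search(r"^\s" + DATE, text, flags=re.MULTILINE)

# Without MULTILINE, '^' matches only at the very start of the string;
# '\s*' allows optional leading whitespace.
start_only = re.search(r"^\s*" + DATE, text)

print(anywhere.group())
print(anchored_ws)
print(start_only.group())
```

Note that `re.MULTILINE` makes `^` match at the start of every line, which is the opposite of restricting the search to the first line.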

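However, the program now uses only one date (the date of the last file in the folder) and applies it to every file, so every file gets the same date even though the dates in the original .txt files differ.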
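
One possible explanation, judging only from the script shown: when `re.search` finds nothing, calling `.group()` on the `None` result raises an exception, the `except` clause swallows it, and `match` keeps the value it had for an earlier file. The sketch below returns `None` explicitly when no date is found, so nothing carries over between files. The helper name `extract_pubdate` and the sample lines are hypothetical, and the standard-library `datetime.strptime` is swapped in for `dateutil` to keep the example self-contained.

```python
import re
from datetime import datetime

DATE_RE = re.compile(
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
    r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|"
    r"Dec(?:ember)?)\s+\d{1,2},\s+\d{4}"
)

def extract_pubdate(first_line):
    """Return the date in first_line as 'YYYY-MM-DD', or None if absent."""
    found = DATE_RE.search(first_line)
    if found is None:
        return None  # nothing is carried over from a previous file
    text = " ".join(found.group().split())  # collapse runs of whitespace
    for fmt in ("%B %d, %Y", "%b %d, %Y"):  # full or abbreviated month name
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None

# Made-up first lines standing in for three different files.
first_lines = ["  April 1, 1993  Some Newspaper", "no date here", "Oct 11, 2017"]
results = [extract_pubdate(line) for line in first_lines]
print(results)
```

Inside the file loop, a file whose first line yields `None` could then be skipped with `continue` instead of being renamed with a stale date.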

I have trimmed part of this loop, but my question only concerns the regex date-matching part. Thanks in advance for your help!

Source

2017-10-11 Rens

Since you strip the ":", shouldn't your wordcount slice start at 6? And if you only want to check the first line, why not first_line = content.readlines()[0]? –

@AlfredoMiranda: I would rather use 'first_line = next(content)', to avoid reading all the lines only to discard everything but the first... –
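
The suggestion above can be sketched as follows. The helper name `read_first_line` and the temporary files are hypothetical and exist only for the demonstration; passing a default to `next` is an addition here, so that an empty file yields an empty string instead of raising `StopIteration`.

```python
import os
import tempfile

def read_first_line(path):
    """Return the first line of the file at path, or '' if it is empty."""
    with open(path, "r") as handle:
        # The file object is an iterator over lines, so only one line is read.
        return next(handle, "")

# Demonstration with throwaway files.
tmpdir = tempfile.mkdtemp()
article = os.path.join(tmpdir, "article.txt")
empty = os.path.join(tmpdir, "empty.txt")
with open(article, "w") as out:
    out.write("April 1, 1993\nLENGTH: 1234 words\nBody text\n")
open(empty, "w").close()

first = read_first_line(article)
nothing = read_first_line(empty)
print(repr(first), repr(nothing))
```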

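I tried 'first_line = content.readlines()[0]', but it gives exactly the same problem as the first-line anchor in the regex: it uses the date from only one .txt file and applies it to every file. Re: word count, that works fine in the current script. – Rens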
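
On the word-count point, a capture group would avoid the fixed slice offsets altogether. This is only a sketch: it assumes the line has the form "LENGTH: 1234 words", which is inferred from the slicing code and not confirmed in the thread, and `extract_wordcount` is a hypothetical helper name.

```python
import re

# Assumed line format: "LENGTH: <digits> words"
LENGTH_RE = re.compile(r"^LENGTH:\s*(\d+)\s*words?")

def extract_wordcount(lines):
    """Return the digits from the first 'LENGTH: N words' line, or None."""
    for line in lines:
        found = LENGTH_RE.match(line)
        if found:
            return found.group(1)
    return None

sample = ["April 1, 1993\n", "LENGTH: 1234 words\n", "Body text\n"]  # made-up
count = extract_wordcount(sample)
print(count)
```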


Source

2017-10-11 18:33:37 Rens
