File naming in Python

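I'm very new to Python, and I've been working from this script:

github.com/whiteShtef/LiteScraper

to scrape images and GIFs.

The problem is that the script saves the images in separate folders.

This is the code that names the folder: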

foldername=self.url[7:]
foldername=foldername.split("/")[0]

extension=time.strftime("iFunny")+"-"+time.strftime("%d-%m-%Y") + "-" + time.strftime("%Hh%Mm%Ss")

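It makes the folder name "iFunny" followed by a timestamp.

What I need is for everything to be downloaded and saved into a single folder called "images".

I tried making it simply save to the "images" folder, but the problem is that, because it scrapes different pages, the images get the same names and overwrite each other.

For example, if it scrapes page 1, it downloads the images from it (I'm fairly sure it's 10 images/GIFs per page) and names them 1, 2, 3, 4, etc.

Then it scrapes page 2 and names those 1, 2, 3, 4, etc. as well, overwriting the old images from page 1.

This is the full code: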

import os
import time
from html.parser import HTMLParser
import urllib.request

#todo: char support for Windows
#deal with triple backslash filter
#recursive parser option


class LiteScraper(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.lastStartTag="No-Tag"
        self.lastAttributes=[]
        self.lastImgUrl=""
        self.Data=[]
        self.acceptedTags=["div","p","h","h1","h2","h3","h4","h5","h6","ul","li","a","img"]
        self.counter=0
        self.url=""


        self.SAVE_DIR=""
        self.Headers=["User-Agent","Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"]

    def handle_starttag(self,tag,attrs):
        #print("Encountered a START tag:",tag)
        self.lastStartTag=tag
        self.lastAttributes=attrs #unnecessary, might come in handy

        if self.lastStartTag=="img":
            attrs=self.lastAttributes

            for attribute in attrs:
                if attribute[0]=="src":
                    self.lastImgUrl=attribute[1]
                    print(attribute[1])

                    #Allow GIF from iFunny to download
                    for attribute in attrs:
                        if attribute[0]=="data-gif":
                            self.lastImgUrl=attribute[1]
                            print(attribute[1])
                            #End Gif Code

            self.handle_picture(self.lastImgUrl)

    def handle_endtag(self,tag):
        #print("Encountered a END tag:",tag)
        pass

    def handle_data(self,data):
        data=data.replace("\n"," ")
        data=data.replace("\t"," ")
        data=data.replace("\r"," ")
        if self.lastStartTag in self.acceptedTags:
            if not data.isspace():
                print("Encountered some data:",data)
                self.Data.append(data)

        else:
            print("Encountered filtered data.") #Debug

    def handle_picture(self,url):
        print("Bumped into a picture. Downloading it now.")
        self.counter+=1
        if url[:2]=="//":
            url="http:"+url

        extension=url.split(".")
        extension="."+extension[-1]

        try:
            req=urllib.request.Request(url)
            req.add_header(self.Headers[0],self.Headers[1])
            response=urllib.request.urlopen(req,timeout=10)
            picdata=response.read()
            file=open(self.SAVE_DIR+"/pics/"+str(self.counter)+extension,"wb")
            file.write(picdata)
            file.close()
        except Exception as e:
            print("Something went wrong, sorry.")


    def start(self,url):
        self.url=url
        self.checkSaveDir()

        try: #wrapped in exception - if there is a problem with url/server
            req=urllib.request.Request(url)
            req.add_header(self.Headers[0],self.Headers[1])
            response=urllib.request.urlopen(req,timeout=10)
            siteData=response.read().decode("utf-8")
            self.feed(siteData)
        except Exception as e:
            print(e)

        self.__init__() #resets the parser/scraper for serial parsing/scraping
        print("Done!")

    def checkSaveDir(self):
        #----windows support
        if os.name=="nt":
            container="\\ " #escaped backslash; container[0] is a single backslash
            path=os.path.normpath(__file__)
            path=path.split(container[0])
            path=container[0].join(path[:len(path)-1])
            path=path.split(container[0])
            path="/".join(path)
        #no more windows support! :P
        #for some reason, os.path.normpath returns path with backslashes
        #on windows, so they had to be substituted with forward slashes.

        else:
            path=os.path.normpath(__file__)
            path=path.split("/")
            path="/".join(path[:len(path)-1])

        foldername=self.url[7:]
        foldername=foldername.split("/")[0]

        extension=time.strftime("iFunny")+"-"+time.strftime("%d-%m-%Y") + "-" + time.strftime("%Hh%Mm%Ss")

        self.SAVE_DIR=path+"/"+foldername+"-"+extension


        if not os.path.exists(self.SAVE_DIR):
            os.makedirs(self.SAVE_DIR)

        if not os.path.exists(self.SAVE_DIR+"/pics"):
            os.makedirs(self.SAVE_DIR+"/pics")

        print(self.SAVE_DIR)

I'm not really sure what to do here; any help would be great!

Source

2016-11-04 Anon

Can you copy the code that names the images here? It sounds like you just want to name them sequentially.

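Hi Anon, welcome to Stack Overflow. A few things can help you use the site more effectively. First, please include the code directly in your question rather than posting it to an external service (such as Pastebin). Second, please make sure your exact problem is clear. I'm not entirely sure what the exact problem is, but I'm guessing you don't want it to overwrite previously downloaded images, which is what it is doing now? Be sure to stop by the [help center](http://stackoverflow.com/help) for more information on how to use this site. Happy asking!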

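As it stands, this really isn't an ideal question. Writing a [minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) that focuses on one very specific problem (with no more code than is needed to reproduce it) would get you better responses. A question focused on file name generation shouldn't need the complexity of an HTML parser. See also http://sscce.org/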
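For illustration, a reduced example along those lines might look like the sketch below. It assumes the underlying problem is a counter that restarts for every page; `names_for_page` is a hypothetical helper that is not part of LiteScraper, and no files or network access are involved.

```python
def names_for_page(image_count):
    """Name images 1..image_count, restarting the counter for every page."""
    counter = 0
    names = []
    for _ in range(image_count):
        counter += 1
        names.append(str(counter) + ".jpg")
    return names


page1 = names_for_page(3)
page2 = names_for_page(3)

# Names shared by both pages; each one would overwrite a file from page 1.
collisions = sorted(set(page1) & set(page2))
print(collisions)
```

Because both pages produce the same list of names, saving them into one folder would replace the earlier files.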

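(After removing the timestamp part) it looks like self.counter is the value that determines the file name. It is set to zero when the LiteScraper object is created. If you reuse the LiteScraper object when moving on to the next page, it should keep counting instead of starting from zero.

Instead, call start() again on the same object. Like this: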

scraper = LiteScraper()
scraper.start("<url/page1>") #images 1, 2, 3, & 4 are created
scraper.start("<url/page2>") #images 5, 6, 7, etc.
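One caveat worth checking against the posted code: start() ends with self.__init__(), which sets self.counter back to 0, so reusing the object may still restart the numbering. Below is a minimal sketch of one way around that, resetting only the parser state with HTMLParser.reset(). `CountingScraper` is a hypothetical stand-in rather than the real LiteScraper: it takes HTML text instead of a URL and only records the names it would have used.

```python
from html.parser import HTMLParser


class CountingScraper(HTMLParser):
    """Hypothetical stand-in for LiteScraper; it only records file names."""

    def __init__(self):
        super().__init__()
        self.counter = 0   # set once, when the object is created
        self.saved = []    # file names that would have been written

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.counter += 1
            self.saved.append(str(self.counter) + ".jpg")

    def start(self, html):
        # The real start() downloads the page; here the HTML is passed in.
        self.feed(html)
        # reset() clears the parser state only; self.counter is left alone,
        # unlike calling self.__init__() again.
        self.reset()


scraper = CountingScraper()
scraper.start("<img src='a.jpg'><img src='b.jpg'>")
scraper.start("<img src='c.jpg'>")
print(scraper.saved)  # ['1.jpg', '2.jpg', '3.jpg']
```

The numbering continues across the two calls because nothing in start() touches self.counter.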

Source

2016-11-04 20:17:04

Sorry, I should have explained better. The site I'm scraping doesn't exactly have "pages"; I'm scraping it like this: http://pastebin.com/53h7GmQJ – Anon

Remove this part: `+"-"+time.strftime("%d-%m-%Y") + "-" + time.strftime("%Hh%Mm%Ss")`. If the object is reused, self.counter won't be reset, so in that case it should keep counting.
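If everything should land in one "images" folder, as the question asks, another option is to pick a name that is not already taken on disk instead of relying on the counter alone. This is a minimal sketch of that alternative, not the approach of the script itself; `next_free_path` is a hypothetical helper, and the demonstration writes placeholder bytes into a temporary directory rather than downloading anything.

```python
import os
import tempfile


def next_free_path(folder, extension, start=1):
    """Return (path, number) for the first numbered name not yet on disk."""
    os.makedirs(folder, exist_ok=True)
    number = start
    while os.path.exists(os.path.join(folder, str(number) + extension)):
        number += 1
    return os.path.join(folder, str(number) + extension), number


# Demonstration in a temporary directory, so nothing real is touched.
with tempfile.TemporaryDirectory() as base:
    images = os.path.join(base, "images")
    names = []
    for _ in range(3):
        path, _number = next_free_path(images, ".gif")
        with open(path, "wb") as fh:
            fh.write(b"placeholder")
        names.append(os.path.basename(path))
    print(names)  # ['1.gif', '2.gif', '3.gif']
```

Checking the disk on every call is simple but slows down as the folder grows; passing the last used number back in as `start` would avoid rescanning from 1.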
