2017-08-07 26 views
-1

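How do I add a picture from a URL to a docx in Python? I am having trouble with the python-docx library: I have scraped images from a website and want to add them to a docx, but I cannot add them directly, and I keep getting this error: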

File "C:\Python27\lib\site-packages\docx\image\image.py", line 46, in from_file
    with open(path, 'rb') as f:
IOError: [Errno 22] invalid mode ('rb') or filename: ' http://upsats.com/Content/Product/img/Product/Thumb/PCB2x8-.jpg '

This is my code:

import urllib
import requests
from bs4 import BeautifulSoup
from docx import Document
from docx.shared import Inches
import os


document = Document()

document.add_heading("Megatronics Items Full Search", 0)


FullPage = ['New-Arrivals-2017-6', 'Big-Sales-click-here', 'Arduino-Development-boards',
            'Robotics-and-Copters', 'Breakout-Boards', 'RC-Wireless-communication', 'GSM,-GPS,-RFID,-Wifi',
            'Advance-Development-boards-and-starter-Kits', 'Sensors-and-IMU', 'Solenoid-valves,-Relays,--Switches',
            'Motors,-drivers,-wheels', 'Microcontrollers-and-Educational-items', 'Arduino-Shields',
            'Connectivity-Interfaces', 'Power-supplies,-Batteries-and-Chargers', 'Programmers-and-debuggers',
            'LCD,-LED,-Cameras', 'Discrete-components-IC', 'Science-Education-and-DIY', 'Consumer-Electronics-and-tools',
            'Mechanical-parts', '3D-Printing-and-CNC-machines', 'ATS', 'UPS', 'Internal-Battries-UPS',
            'External-Battries-UPS']

urlp1 = "http://www.arduinopak.com/Prd.aspx?Cat_Name="
URL = urlp1 + FullPage[0]

for n in FullPage:
    URL = urlp1 + n
    page = urllib.urlopen(URL)
    bsObj = BeautifulSoup(page, "lxml")
    panel = bsObj.findAll("div", {"class": "panel"})

    for div in panel:
        titleList = div.find('div', attrs={'class': 'panel-heading'})
        imageList = div.find('div', attrs={'class': 'pro-image'})
        descList = div.find('div', attrs={'class': 'pro-desc'})

        r = requests.get("http://upsats.com/", stream=True)
        data = r.text

        for link in imageList.find_all('img'):
            image = link.get("src")
            image_name = os.path.split(image)[1]
            r2 = requests.get(image)
            with open(image_name, "wb") as f:
                f.write(r2.content)

            print(titleList.get_text(separator=u' '))
            print(imageList.get_text(separator=u''))
            print(descList.get_text(separator=u' '))
            document.add_heading("%s \n" % titleList.get_text(separator=u' '))
            document.add_picture(image, width=Inches(1.5))
            document.add_paragraph("%s \n" % descList.get_text(separator=u' '))

document.save('megapy.docx')

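This is not all of it, only the main part. The images download fine, but I am stuck on getting the downloaded pictures into the docx. I do not know how to add a picture. Do I have to convert it? I think I have to change its format somehow, but how?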

All I know is that the problem lies in this line of code:

document.add_picture(image, width=Inches(1.0)) 

How do I make the image from the URL show up in the docx? What am I missing?

+0

Possible duplicate of [Add an image in a specific position in the document (.docx) with Python?](https://stackoverflow.com/questions/32932230/add-an-image-in-a-specific-position-in-the-document-docx-with-python) – Veltro

+0

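Sorry, but that question is about positioning. I want to show the images in the docx file. I have downloaded the images from this site: www.arduinopak.com/, but I cannot get them into the docx file. –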

Answers

2

Update

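I ran a test with 10 images and got a docx. When loading many images I hit an error at one point, which I worked around by adding a try/except (see below). The resulting megapy.docx was 165 MB and took about 10 minutes to create.

I changed: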

with open(image_name, "wb") as f: 
    f.write(r2.content) 

To:

image = io.BytesIO(r2.content) 

And added:

try:
    document.add_picture(image, width=Inches(1.5))
except Exception:
    # Skip any image that python-docx cannot load.
    pass
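A silent `pass` hides which images were skipped. As a minimal sketch of one alternative, the hypothetical helper below (the name `add_pictures_from_bytes` and the `(name, raw_bytes)` input shape are my assumptions, not part of python-docx) wraps each download in `io.BytesIO` and collects the failures. It only assumes that `document` has an `add_picture(stream, width=...)` method, which a python-docx `Document` does. Written for Python 3.

```python
import io


def add_pictures_from_bytes(document, images, width=None):
    """Add each (name, raw_bytes) pair to `document`.

    Returns a list of (name, error) pairs for the images that could not
    be added, so they can be reported instead of silently dropped.
    """
    failed = []
    for name, raw in images:
        try:
            # BytesIO gives add_picture a file-like object, not a path.
            document.add_picture(io.BytesIO(raw), width=width)
        except Exception as exc:
            failed.append((name, repr(exc)))
    return failed
```

In the loop from the question this could be called with something like `[(image_name, r2.content)]` and `width=Inches(1.5)`, printing whatever comes back.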


Use the io library to create file-like objects.
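To make "file-like" concrete, here is a small standard-library sketch. The bytes in `raw` are made-up sample data standing in for a downloaded response body; only the first two bytes (the JPEG start-of-image marker) are meaningful.

```python
import io

# Stand-in for downloaded image bytes (assumed sample data, not a real image).
raw = b"\xff\xd8\xff\xe0" + b"\x00" * 16

stream = io.BytesIO(raw)   # in-memory object with the same methods as open(path, "rb")
header = stream.read(2)    # read() consumes bytes, just like a real file
stream.seek(0)             # rewind so the next reader starts at the beginning

is_jpeg = header == b"\xff\xd8"   # JPEG files start with the bytes FF D8
```

Because the object supports `read()` and `seek()`, code that expects an open binary file can usually take it in place of a path.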

An example that works on both Python 2 and 3:

import requests 
import io 
from docx import Document 
from docx.shared import Inches 

url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Usain_Bolt_Rio_100m_final_2016k.jpg/200px-Usain_Bolt_Rio_100m_final_2016k.jpg' 
response = requests.get(url, stream=True) 
image = io.BytesIO(response.content) 

document = Document() 
document.add_picture(image, width=Inches(1.25)) 
document.save('demo.docx') 
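One more thing that may matter when adapting this to the question's scraper: the traceback there shows the URL with a space on each side. If the scraped `src` attribute really contains that whitespace, or is a relative path, it is worth normalising it before the download. A minimal Python 3 sketch, where `clean_image_url` is a hypothetical helper name:

```python
from urllib.parse import urljoin


def clean_image_url(page_url, src):
    """Strip stray whitespace from `src` and resolve it against `page_url`.

    Absolute URLs come back unchanged (apart from the stripping);
    relative ones are resolved against the page they were scraped from.
    """
    return urljoin(page_url, src.strip())
```

The result can then be passed to `requests.get` in place of the raw `src` value.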

+0

Thank you very much. I still have a problem with document.add_picture(image, width=Inches(1.0)): 'File "C:\Python27\lib\site-packages\docx\document.py", line 79, in add_picture return run.add_picture(image_path_or_stream, width, height)' –

+0

@AbbasKhan I wrote a small program; let me check how I did it. –

+0

Sorry to bother you, friend. I am new to this and still trying to get the hang of it. –