Pandas and Python: images to numpy arrays


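I'm currently teaching myself pandas and Python for machine learning. So far I've done fine with text data, but when it comes to image data, my limited knowledge of Python and pandas is frustrating me.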

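I have read a .csv file into a pandas DataFrame, and one of its columns contains image URLs. This is what is displayed when I get the info from the DataFrame: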

import pandas

dataframe = pandas.read_csv("./sample.csv")
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total of 5 columns):
name 5000 non-null object
...
image 5000 non-null object

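The image column contains the URLs of the images. The problem is that I don't know how to import the image data from them and save it as numpy arrays for processing.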

Any help is appreciated. Thanks in advance!


2017-09-15 Ishiro Kusabi

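Could you post a snippet of the CSV? – johnashu

Welcome to SO. Unfortunately, this is not a code-writing service. If you haven't had the chance yet, please read [ask] and [mcve]. With a little research and some study of the Python docs, you should find tools to help you fetch an image from the web given its URL. If you come up with a solution and get stuck, come back and ask. – wwii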

Which version of Python are you using? Are you using the DataFrame for other purposes, or is it just an intermediate step for parsing the csv file? – wwii

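If you want to download the images from the web and then, for example, rotate the images from your DataFrame and save the results, you can use the following code (the "rotation" here is applied to each pixel's RGB colour vector, not to the image geometry):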

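import io
import urllib.request  # Python 3 replacement for urllib2

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from PIL import Image

df = pd.DataFrame({
    "name": ["Butterfly", "Birds"],
    "image": ["https://upload.wikimedia.org/wikipedia/commons/0/0c/Two-tailed_pasha_%28Charaxes_jasius_jasius%29_Greece.jpg",
              "https://upload.wikimedia.org/wikipedia/commons/c/c5/Bat_cave_in_El_Maviri_Sinaloa_-_Mexico.jpg"]})

def rotate_image(image, theta):
    """
    Rotate every pixel's (R, G, B) vector by angle theta around the
    X (red) axis of colour space; the image geometry is unchanged.
    `image` is a float array of shape (height, width, 3).
    """
    rotation_matrix = np.c_[
        [1, 0, 0],
        [0, np.cos(theta), -np.sin(theta)],
        [0, np.sin(theta), np.cos(theta)]
    ]
    return np.einsum("ijk,lk->ijl", image, rotation_matrix)

for i, image_url in enumerate(df["image"]):
    print(image_url)
    # Some hosts (possibly Wikimedia) may reject Python's default User-Agent;
    # the value used here is only a placeholder
    request = urllib.request.Request(image_url, headers={"User-Agent": "image-demo/0.1"})
    with urllib.request.urlopen(request) as fd:
        image_file = io.BytesIO(fd.read())
    # Decode and convert to a float array in [0, 1] with exactly 3 channels
    im = np.asarray(Image.open(image_file).convert("RGB"), dtype=np.float64) / 255.0
    # Rotated channels can be negative; clip to the range imshow() expects
    im_rotated = np.clip(rotate_image(im, np.pi), 0.0, 1.0)
    fig = plt.figure()
    plt.imshow(im_rotated)
    plt.axis("off")
    fig.savefig(df["name"].iloc[i] + ".jpg")  # .ix no longer exists in pandas
    plt.close(fig)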
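To see what `rotate_image` actually does, it can be applied to a tiny synthetic array instead of a downloaded photo. This is a minimal sketch that repeats the `rotate_image` definition so it runs on its own; the 1×2 `pixels` array is made up for illustration. With `theta = np.pi` the red channel is kept and the green and blue channels are (up to floating-point noise) negated, which is why the values have to be clipped or rescaled before display.

```python
import numpy as np

def rotate_image(image, theta):
    """Rotate every pixel's (R, G, B) vector around the X (red) axis by theta."""
    rotation_matrix = np.c_[
        [1, 0, 0],
        [0, np.cos(theta), -np.sin(theta)],
        [0, np.sin(theta), np.cos(theta)]
    ]
    return np.einsum("ijk,lk->ijl", image, rotation_matrix)

# Hypothetical 1x2 image with float channels in [0, 1]
pixels = np.array([[[1.0, 0.5, 0.25], [0.0, 1.0, 0.0]]])
rotated = rotate_image(pixels, np.pi)

# Negative channel values cannot be displayed, so clip before imshow()
displayable = np.clip(rotated, 0.0, 1.0)
```

After clipping, only the red channel survives, so the saved "rotated" images in the answer are essentially red-channel pictures.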

If you would rather display the images than save them, replace the `fig.savefig(...)` line in the loop with:

plt.show()

The resulting images (birds and butterfly):

[Figure: output images for birds and butterfly]


2017-09-15 16:16:08

Thanks Cedric! I used a different approach, but this one also works well and seems much cleaner than mine! Have a nice day. Thanks again. –

Since we don't know your csv file, you will have to adapt the pd.read_csv() call to your situation.
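A minimal sketch of that adaptation, assuming a comma-separated file with `name` and `image` columns like the one described in the question. An in-memory `io.StringIO` stands in for `./sample.csv`, and the example.com URLs are placeholders.

```python
import io

import pandas as pd

# Stand-in for ./sample.csv; the real file's columns and separator may differ
csv_text = io.StringIO(
    "name,image\n"
    "Butterfly,https://example.com/butterfly.jpg\n"
    "Birds,https://example.com/birds.jpg\n"
)

# usecols keeps only the needed columns; sep= or encoding= may also need adjusting
dataframe = pd.read_csv(csv_text, usecols=["name", "image"])

# Drop rows without a URL and turn the column into a plain Python list
image_urls = dataframe["image"].dropna().tolist()
```

The resulting `image_urls` list can then be fed to whichever download loop is used.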

Here I use requests to download some images into memory.

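They are then decoded with the help of scipy (you could also use Pillow if you don't have scipy). Note that `scipy.misc.imread` was deprecated in SciPy 1.0 and removed in SciPy 1.2, so on current versions Pillow (or imageio) is the practical choice.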

The decoded images are then plain numpy arrays, which are displayed with matplotlib.

Keep in mind that no temporary files are used here: everything is kept in memory. Also read this (answered by jfs).
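The in-memory decoding step can be sketched on its own without any network access. This assumes Pillow is installed; a small PNG generated in memory stands in for a downloaded response body, and `bytes_to_array` and `payload` are hypothetical names. The bytes are decoded straight from an `io.BytesIO` buffer into a numpy array, with no temporary file on disk.

```python
import io

import numpy as np
from PIL import Image

def bytes_to_array(payload: bytes) -> np.ndarray:
    """Decode encoded image bytes (JPEG, PNG, ...) into an RGB numpy array."""
    with Image.open(io.BytesIO(payload)) as im:
        return np.asarray(im.convert("RGB"))

# Stand-in for a downloaded image: a 4x3 (width x height) solid red PNG
buffer = io.BytesIO()
Image.new("RGB", (4, 3), color=(255, 0, 0)).save(buffer, format="PNG")
payload = buffer.getvalue()

arr = bytes_to_array(payload)
print(arr.shape, arr.dtype)  # (3, 4, 3) uint8
```

Note that Pillow sizes are (width, height) while the numpy array is indexed (height, width, channels).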

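For people missing some of the required libraries, it should be possible to do the same with other tools (the code has to change, of course):

- requests can be replaced by urllib (standard library)
  - I'm not showing the code, but this SO question should be a good start
  - another relevant SO question talks about in-memory processing with urllib
- pandas can be replaced by csv (standard library)
- scipy can be replaced by Pillow (although the internal storage might then be different)
- matplotlib is only used for demonstration purposes (I wasn't sure whether Pillow can display images; edit: it seems it can)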
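For the standard-library route listed above, a rough sketch might pair `csv.DictReader` with `urllib.request`. The column name `image`, the helper names `read_image_urls` and `fetch_bytes`, and the User-Agent string are assumptions for illustration. `fetch_bytes` returns raw encoded bytes, which still need to be decoded (for example with Pillow) to become an array.

```python
import csv
import io
import urllib.request

def read_image_urls(csv_file, column="image"):
    """Return the non-empty values of `column` from a CSV file object."""
    reader = csv.DictReader(csv_file)
    return [row[column] for row in reader if row.get(column)]

def fetch_bytes(url, timeout=30):
    """Download `url` fully into memory and return the body as bytes."""
    # Placeholder User-Agent; some hosts reject Python's default one
    request = urllib.request.Request(url, headers={"User-Agent": "image-demo/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()

# Offline usage of the CSV part; fetch_bytes(url) needs network access
sample = io.StringIO("name,image\nButterfly,https://example.com/a.jpg\nEmpty,\n")
urls = read_image_urls(sample)
```

Rows with an empty `image` field are skipped, so `urls` holds only the usable URL.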

~~I just picked some random images from some German news page.~~

Edit: free images from Wikipedia are used now!

Code:

import requests                  # downloading images
import numpy as np               # image data as numpy arrays
import pandas as pd              # csv-/data-input
from PIL import Image            # image-decoding (scipy.misc.imread no longer exists)
import matplotlib.pyplot as plt  # only for demo/plotting

# Fake data -> pandas DataFrame
urls_df = pd.DataFrame({'urls': ['https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/Rescue_exercise_RCA_2012.jpg/500px-Rescue_exercise_RCA_2012.jpg',
           'https://upload.wikimedia.org/wikipedia/commons/thumb/3/31/Clinotarsus_curtipes-Aralam-2016-10-29-001.jpg/300px-Clinotarsus_curtipes-Aralam-2016-10-29-001.jpg',
           'https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/US_Capitol_east_side.JPG/300px-US_Capitol_east_side.JPG']})

# Placeholder User-Agent; some hosts may reject the default one
headers = {'User-Agent': 'image-demo/0.1'}

# Download & Decode
imgs = []
for url in urls_df.urls:                 # iterate over column/pandas Series
    r = requests.get(url, stream=True, headers=headers, timeout=30)  # See link for stream=True!
    r.raise_for_status()                 # fail loudly on 403/404 instead of decoding an error page
    r.raw.decode_content = True          # undo Content-Encoding (e.g. gzip)
    imgs.append(np.asarray(Image.open(r.raw)))  # Decoding to numpy-array

# imgs: list of numpy arrays with varying shapes of form (height, width, 3)
#  as we got 3 color channels
# Beware!: downloading PNGs might result in a shape of (height, width, 4)
#  as an alpha channel might be present; Image.open(...).convert('RGB') forces 3 channels

# Plot
f, arr = plt.subplots(len(imgs))
for ax, img in zip(arr, imgs):
    ax.imshow(img)
plt.show()
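Since the downloaded arrays have different shapes (and PNGs may carry an alpha channel), they cannot be stacked directly into a single array for machine learning. One common approach, sketched here under the assumption that a fixed 64×64 RGB size is acceptable for the task, is to convert and resize every image with Pillow before `np.stack`. Synthetic arrays stand in for the downloads, and `to_batch` is a hypothetical helper.

```python
import numpy as np
from PIL import Image

def to_batch(arrays, size=(64, 64)):
    """Resize a list of HxWxC uint8 arrays into one (N, height, width, 3) array."""
    resized = []
    for a in arrays:
        im = Image.fromarray(a).convert("RGB")  # drops any alpha channel
        im = im.resize(size)                    # Pillow size is (width, height)
        resized.append(np.asarray(im))
    return np.stack(resized)

# Stand-ins for downloaded images: an RGB and an RGBA array of different sizes
imgs = [
    np.zeros((120, 80, 3), dtype=np.uint8),
    np.full((50, 200, 4), 255, dtype=np.uint8),
]
batch = to_batch(imgs)
```

The batch can then be written to disk with `np.save("images.npy", batch)` and reloaded later with `np.load("images.npy")`.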

Output:

[Figure: the three downloaded images plotted one above the other]


2017-09-15 15:16:25 sascha

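Thanks sascha! Decoding was really the part I needed help with. Sorry for the lack of information. I think I'm at the stage where I don't know what I don't know, so my questions end up being vague. Thanks again for your help! –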
