2017-04-26 51 views

Python SQLite: updating a column

I want to update the album_title column based on the already-populated artist_name column.

I can either end up overwriting the entire album_title column with the last album_title, because the update repeats inside the loop:

for tag in albums:
    for album in tag:
        cur.execute('INSERT OR IGNORE INTO Albums (album_title) VALUES (?)', (album,))

        for artist in artists:
            artist = artist.string
            cur.execute('INSERT OR IGNORE INTO Artists(artist_name) VALUES (?)', (artist,))
            cur.execute('UPDATE Artists SET album_title=? WHERE artist_name=?', (album, artist))

Or I can get only the last row updated with the correct album_title:

for tag in albums:
    for album in tag:
        cur.execute('INSERT OR IGNORE INTO Albums (album_title) VALUES (?)', (album,))

        for artist in artists:
            artist = artist.string
            cur.execute('INSERT OR IGNORE INTO Artists(artist_name) VALUES (?)', (artist,))

        cur.execute('UPDATE Artists SET album_title=? WHERE artist_name=?', (album, artist))

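I understand why both of these happen, but I can't work out how to achieve what I want: updating each row with the correct album title. The album titles will always be in the same order as the artist names.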
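
Since the two sequences are said to be in the same order, one possible approach is to walk them in step with `zip()` instead of nesting the loops. The following is only a minimal sketch under that assumption: the sample lists, the in-memory database, and the simplified table definition are stand-ins for illustration, not the scraped data or the real schema.

```python
import sqlite3

# Hypothetical sample data standing in for the scraped values;
# both lists are assumed to be in the same order.
album_titles = ["What Now", "Visuals", "Odd Hours"]
artist_names = ["Sylvan Esso", "Mew", "Caddywhompus"]

# In-memory database with a simplified Artists table (an assumption for the demo).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE Artists ("
    "id INTEGER PRIMARY KEY, artist_name TEXT UNIQUE, album_title TEXT)"
)

# zip() yields one (album, artist) pair per position, so every artist
# is updated exactly once, with the album at the same index.
for album, artist in zip(album_titles, artist_names):
    cur.execute("INSERT OR IGNORE INTO Artists (artist_name) VALUES (?)", (artist,))
    cur.execute(
        "UPDATE Artists SET album_title = ? WHERE artist_name = ?", (album, artist)
    )
conn.commit()

rows = cur.execute(
    "SELECT artist_name, album_title FROM Artists ORDER BY id"
).fetchall()
```

Note that `zip()` stops at the shorter sequence, so this only pairs correctly if the page really yields one artist entry per album title.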

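I've seen that updating columns is covered extensively here, but I can't solve this one because of my own uniquely tangled loops. If the problem comes from a badly structured data retrieval, I'd be glad to hear how to fix that.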

Full code:

from urllib.request import Request, urlopen 
from urllib.parse import urlparse 
from urllib.parse import urljoin 
from bs4 import BeautifulSoup 

import urllib.error 
import sqlite3 
import json 
import time 
import ssl 


#connect/create database 
conn = sqlite3.connect('pitchscraper.sqlite') 
#create way to talk to database 
cur = conn.cursor() 

#create table 
cur.execute(''' 
    CREATE TABLE IF NOT EXISTS Master (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, album_title TEXT UNIQUE, artist_name TEXT UNIQUE)''') 

cur.execute(''' 
    CREATE TABLE IF NOT EXISTS Albums (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, album_title TEXT UNIQUE)''') 

cur.execute(''' 
    CREATE TABLE IF NOT EXISTS Artists (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, artist_name TEXT UNIQUE, album_title TEXT, FOREIGN KEY(album_title) REFERENCES Albums(album_title))''') 



#open and read page 
req = Request('http://pitchfork.com/reviews/albums/?page=1', headers={'User-Agent': 'Mozilla/5.0'}) 
pitchpage = urlopen(req).read() 


#parse with beautiful soup 
soup = BeautifulSoup(pitchpage, "lxml") 
albums = soup('h2') 
artists = soup.find_all(attrs={"class" : "artist-list"}) 


for tag in albums:
    for album in tag:
        cur.execute('INSERT OR IGNORE INTO Albums (album_title) VALUES (?)', (album,))

        for artist in artists:
            artist = artist.string
            cur.execute('INSERT OR IGNORE INTO Artists(artist_name) VALUES (?)', (artist,))
            cur.execute('UPDATE Artists SET album_title=? WHERE artist_name=?', (album, artist))


print() 


conn.commit() 

Failing output:

+------+-------------------------------------------+-------------+
| id   | artist_name                               | album_title |
+------+-------------------------------------------+-------------+
| "1"  | "Sylvan Esso"                             | "Odd Hours" |
| "2"  | "Mew"                                     | "Odd Hours" |
| "3"  | "Tara Jane O’Neil"                        | "Odd Hours" |
| "4"  | "Real Life Buildings"                     | "Odd Hours" |
| "5"  | "Bruce Springsteen and the E Street Band" | "Odd Hours" |
| "6"  | "Ravyn Lenae"                             | "Odd Hours" |
| "7"  | "Tee Grizzley"                            | "Odd Hours" |
| "8"  | "Shugo Tokumaru"                          | "Odd Hours" |
| "9"  | "Woods"                                   | "Odd Hours" |
| "10" | "Formation"                               | "Odd Hours" |
| "11" | "Valgeir Sigurðsson"                      | "Odd Hours" |
| "12" | "Caddywhompus"                            | "Odd Hours" |
+------+-------------------------------------------+-------------+

Desired output:

+------+-------------------------------------------+-------------------------------+
| id   | artist_name                               | album_title                   |
+------+-------------------------------------------+-------------------------------+
| "1"  | "Sylvan Esso"                             | "What Now"                    |
| "2"  | "Mew"                                     | "Visuals"                     |
| "3"  | "Tara Jane O’Neil"                        | "Tara Jane O'Neil"            |
| "4"  | "Real Life Buildings"                     | "Significant Weather"         |
| "5"  | "Bruce Springsteen and the E Street Band" | "Hammersmirth Odeon, London"  |
| "6"  | "Ravyn Lenae"                             | "Midnight Moonlight EP"       |
| "7"  | "Tee Grizzley"                            | "My Moment"                   |
| "8"  | "Shugo Tokumaru"                          | "TOSS"                        |
| "9"  | "Woods"                                   | "Love is Love"                |
| "10" | "Formation"                               | "Look at the Powerful People" |
| "11" | "Valgeir Sigurðsson"                      | "Dissonance"                  |
| "12" | "Caddywhompus"                            | "Odd Hours"                   |
+------+-------------------------------------------+-------------------------------+

Show some sample data and the desired result.

@CL. I've added 2 screenshots for you.

Show the desired result. (See [How to format SQL tables in a Stack Overflow post?](https://meta.stackexchange.com/q/96125))

Answer

albums = soup('h2') 
artists = soup.find_all(attrs={"class" : "artist-list"}) 

The problem is that the artists list contains every artist on the page.

You must extract the list of artists for each album inside the loop.
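
To make that concrete, here is a rough sketch of per-album extraction. The HTML snippet and the `review` container class are assumptions made up for illustration (the real page markup may differ); only the `h2` titles and the `artist-list` class come from the question. The idea is to loop over one container per review and look up the artists inside that container, so each artist is paired with its own album.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the "review" wrapper is an assumption, not the real page.
html = """
<div class="review">
  <ul class="artist-list"><li>Sylvan Esso</li></ul>
  <h2>What Now</h2>
</div>
<div class="review">
  <ul class="artist-list"><li>Mew</li></ul>
  <h2>Visuals</h2>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

pairs = []
for review in soup.find_all("div", class_="review"):
    # Search only inside this review block, not the whole page.
    album = review.find("h2").get_text(strip=True)
    for artist_tag in review.find_all(class_="artist-list"):
        pairs.append((artist_tag.get_text(strip=True), album))
```

Each `(artist, album)` tuple in `pairs` could then be passed to the existing INSERT and UPDATE statements, so the UPDATE only ever sees the album that belongs to that artist.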

I'm not sure I understand. Could you elaborate a bit?