2017-09-05 18 views

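Dataframe not appending to MySQL database

I'm new to coding and this is my first project. So far I have pieced everything together from Google searches, tutorials, and Stack Overflow.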

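So far I have managed to get data from the RSS feeds into a SQL database. However, when I run the script, the new data is not appended. The table only ever holds the latest feed entries, and everything that was there before is deleted.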

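I'm not sure what I have written wrong, since I passed if_exists='append' to to_sql. Apologies if this is a very silly question, but I can't see what I'm doing wrong.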

import pandas as pd
from pandas.io import sql
import feedparser
import time

rawrss = ['http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml',
          'https://www.yahoo.com/news/rss/',
          'http://www.huffingtonpost.co.uk/feeds/index.xml',
          'http://feeds.feedburner.com/TechCrunch/',
          ]

time = time.strftime('%a %H:%M:%S')
summary = 'text'

posts = []
for url in rawrss:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((time, post.title, post.link, summary))

df = pd.DataFrame(posts, columns=['article_time', 'article_title', 'article_url', 'article_summary'])  # pass data to init
df.set_index(['article_time'], inplace=True)

import pymysql
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://<username>:<password>@<hostname>:<port>/<dbname>?charset=utf8', encoding='utf-8')
engine.execute("DROP TABLE IF EXISTS rsstracker")
engine.execute("""CREATE TABLE rsstracker(article_time varchar(255),
                                          article_title varchar(255),
                                          article_url varchar(1000),
                                          article_summary varchar(1000))""")

df.to_sql(con=engine, name='rsstracker', if_exists='append', flavor='mysql')

This 'engine.execute("DROP TABLE IF EXISTS rsstracker")' could explain why _However, when I run the script, the new data is not appended_ – RiggsFolly


Does that delete the table and start over every time I run the script? –
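To make the effect discussed in these comments concrete, here is a minimal sketch. It is not the asker's setup: it swaps in Python's built-in sqlite3 with an in-memory database in place of MySQL, uses a cut-down two-column table, and the helper names run_script and count_after_two_runs are made up for illustration. It simulates two runs of a script and counts the rows left afterwards.

```python
import sqlite3

def run_script(conn, rows, drop_first):
    """Simulate one run of the feed script against an open connection."""
    cur = conn.cursor()
    if drop_first:
        # Mirrors the question: the table is removed and rebuilt on every run.
        cur.execute("DROP TABLE IF EXISTS rsstracker")
        cur.execute("CREATE TABLE rsstracker (article_time TEXT, article_title TEXT)")
    else:
        # Create the table only on the first run; later runs leave it alone.
        cur.execute("CREATE TABLE IF NOT EXISTS rsstracker (article_time TEXT, article_title TEXT)")
    cur.executemany("INSERT INTO rsstracker VALUES (?, ?)", rows)
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM rsstracker").fetchone()[0]

def count_after_two_runs(drop_first):
    """Run the simulated script twice and return the final row count."""
    conn = sqlite3.connect(":memory:")
    run_script(conn, [("Tue 18:00:00", "first run post")], drop_first)
    total = run_script(conn, [("Tue 19:00:00", "second run post")], drop_first)
    conn.close()
    return total

print(count_after_two_runs(drop_first=True))   # only the second run's row survives
print(count_after_two_runs(drop_first=False))  # rows from both runs are kept
```

With the drop in place, each run starts from an empty table, so append has nothing to append to. CREATE TABLE IF NOT EXISTS is also available in MySQL and is one way to keep the table creation in the script without wiping earlier rows.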

Answer


It was pointed out to me that I was dropping the table, which made me realize I was also trying to create the table on every run.

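I have now edited the initial code, removing the statements that dropped and created the table, so that each run only inserts rows into the existing table.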

import time

import feedparser
import pandas as pd
from sqlalchemy import create_engine  # the pymysql driver must be installed for the URL below

rawrss = ['http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml',
          'https://www.yahoo.com/news/rss/',
          'http://www.huffingtonpost.co.uk/feeds/index.xml',
          'http://feeds.feedburner.com/TechCrunch/',
          ]

# One timestamp shared by every row of this run; named run_time so it does not shadow the time module.
run_time = time.strftime('%a %H:%M:%S')
summary = 'text'

# Collect one tuple per feed entry across all feeds.
posts = []
for url in rawrss:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((run_time, post.title, post.link, summary))

df = pd.DataFrame(posts, columns=['article_time', 'article_title', 'article_url', 'article_summary'])
df.set_index(['article_time'], inplace=True)

engine = create_engine('mysql+pymysql://<username>:<password>@<hostname>:<port>/<dbname>?charset=utf8')

# The rsstracker table already exists, so nothing is dropped or created here.
# to_sql issues the INSERT statements itself; if_exists='append' keeps the rows from earlier runs.
df.to_sql(con=engine, name='rsstracker', if_exists='append')
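If rows are ever inserted by hand rather than through to_sql, it is safer to pass the values as query parameters than to build the SQL with % string formatting, because feed titles often contain quotes. The sketch below is an illustration under assumptions, not the code from this answer: it uses the built-in sqlite3 module with an in-memory database instead of MySQL, and insert_posts and sample_posts are hypothetical names with made-up rows. sqlite3 uses ? as its placeholder; PyMySQL uses %s passed separately from the values.

```python
import sqlite3

def insert_posts(conn, posts):
    """Insert (time, title, url, summary) tuples using bound parameters."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rsstracker ("
        "article_time TEXT, article_title TEXT, "
        "article_url TEXT, article_summary TEXT)"
    )
    # The driver escapes each value, so quotes in a title cannot break the statement.
    conn.executemany("INSERT INTO rsstracker VALUES (?, ?, ?, ?)", posts)
    conn.commit()

conn = sqlite3.connect(":memory:")

# Made-up sample rows; the titles deliberately contain quote characters.
sample_posts = [
    ("Tue 18:00:00", "Britain's economy grows", "http://example.com/a", "text"),
    ("Tue 18:00:00", 'He said "hello"', "http://example.com/b", "text"),
]
insert_posts(conn, sample_posts)

titles = [row[0] for row in conn.execute(
    "SELECT article_title FROM rsstracker ORDER BY article_url")]
print(titles)
```

The same list of tuples that feeds the DataFrame could be passed to executemany, which also inserts every post instead of only the last one left over from the loop.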