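Consider the following tables. I want to join data from the 4 tables to calculate a few weighted scores.
users holds tens of thousands of Twitter users; their tweets are indexed by sp100_id, which is the id of the company (see sp100) the tweet is talking about. tweets.class holds the assigned sentiment class of each tweet (1 = neutral, 2 = positive, 3 = negative). tweets.rt holds the number of times the tweet has been retweeted. Finally, each user has been given a quality score and a follow score, as follows: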
users
-------------------------
user_id  quality  follow
-------------------------
1        2.50     5.00
2        0.75     1.00

tweets
-----------------------------------------------------
tweet_id  sp100_id  nyse_date   user_id  class  rt
-----------------------------------------------------
1         1         2011-03-12  1        1      0
2         1         2011-03-13  1        2      2
3         1         2011-03-13  1        2      1
4         1         2011-03-13  2        2      0
5         1         2011-03-13  2        3      3
6         2         2011-03-12  2        2      3
7         2         2011-03-12  2        2      0
8         2         2011-03-12  1        3      5
9         2         2011-03-13  2        2      0

daterange
----------------
_date
----------------
2011-03-11
2011-03-12
2011-03-13

sp100
----------------
sp100_id  _name
----------------
1         Alcoa
2         Apple
The desired output lists, for each sp100_id and each _date, the weighted amounts of rt, quality and follow of the positive (class=2) and negative (class=3) tweets:
sp100_id  nyse_date   pos-rt  pos-quality  pos-follow  neg-rt  neg-quality  neg-follow
--------------------------------------------------------------------------------------
1         2011-03-11  0       0            0           0       0            0
1         2011-03-12  0       0            0           0       0            0
1         2011-03-13  5 (1)   5.75 (2)     11.00 (3)   3 (4)   0.75 (5)     1.00 (6)
2         2011-03-11  0       0            0           0       0            0
2         2011-03-12  3 (7)   5.00 (8)     10.00 (9)   5.00    2.50         2.50
2         2011-03-13  0       0.75         1.00        0       0            0
--------------------------------------------------------------------------------------
(1) On 2011-03-13, 3 positive tweets for sp100_id 1: 1 tweet retweeted 2 times,
1 tweet retweeted 1 time and 1 tweet retweeted 0 times = 2x2+1x1+1x0 = 5
(2) On 2011-03-13, 2 positive tweets made by user 1, who has quality 2.50, and
1 positive tweet made by user 2, who has quality 0.75 = 2x2.50+1x0.75 = 5.75
(3) On 2011-03-13, 2 positive tweets made by user 1, who has follow 5.00, and
1 positive tweet made by user 2, who has follow 1.00 = 2x5.00+1x1.00 = 11.00
(4) On 2011-03-13, 1 negative tweet made by user 2, retweeted 3 times = 1x3 = 3
(5) On 2011-03-13, 1 negative tweet made by user 2, who has quality 0.75, thus
1x0.75 = 0.75
(6) On 2011-03-13, 1 negative tweet made by user 2, who has follow 1.00, so
1x1.00 = 1.00
(7) 1 positive tweet which has been retweeted 3 times, 1 positive tweet without
any retweets = 1x3+1x0 = 3
(8) 2 positive tweets from user 2 x quality 2.50 = 5.00
(9) 2 positive tweets x follow 5 = 10.00
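A minimal Python sketch of the weighting rule as the footnotes describe it, useful as a reference to check a query against: each positive or negative tweet adds its own rt, plus its author's quality and follow, to the totals of its (sp100_id, nyse_date) pair, and neutral tweets are ignored. It runs over the sample rows from the tables above; the names USERS, TWEETS and weighted_scores are hypothetical.

```python
# Sample data from the tables above (hypothetical variable names).
USERS = {1: (2.50, 5.00), 2: (0.75, 1.00)}  # user_id -> (quality, follow)
TWEETS = [  # (tweet_id, sp100_id, nyse_date, user_id, class, rt)
    (1, 1, "2011-03-12", 1, 1, 0),
    (2, 1, "2011-03-13", 1, 2, 2),
    (3, 1, "2011-03-13", 1, 2, 1),
    (4, 1, "2011-03-13", 2, 2, 0),
    (5, 1, "2011-03-13", 2, 3, 3),
    (6, 2, "2011-03-12", 2, 2, 3),
    (7, 2, "2011-03-12", 2, 2, 0),
    (8, 2, "2011-03-12", 1, 3, 5),
    (9, 2, "2011-03-13", 2, 2, 0),
]
DATES = ["2011-03-11", "2011-03-12", "2011-03-13"]
COMPANIES = [1, 2]
SIDE = {2: "pos", 3: "neg"}  # class 1 (neutral) has no entry and is skipped


def weighted_scores(users, tweets, companies, dates):
    """Return {(sp100_id, date): {"pos-rt": ..., ...}}, with zeros for days without tweets."""
    keys = [f"{side}-{m}" for side in ("pos", "neg") for m in ("rt", "quality", "follow")]
    # Pre-fill every company x date pair so empty days still appear.
    scores = {(c, d): dict.fromkeys(keys, 0.0) for c in companies for d in dates}
    for _tweet_id, sp100_id, date, user_id, cls, rt in tweets:
        side = SIDE.get(cls)
        if side is None:
            continue  # neutral tweets do not contribute
        quality, follow = users[user_id]
        row = scores[(sp100_id, date)]
        row[f"{side}-rt"] += rt                # the tweet's own retweet count
        row[f"{side}-quality"] += quality      # its author's quality score
        row[f"{side}-follow"] += follow        # its author's follow score
    return scores


scores = weighted_scores(USERS, TWEETS, COMPANIES, DATES)
for (company, date), row in sorted(scores.items()):
    print(company, date, row)
```

On this data the sketch gives pos-rt 3 for sp100_id 1 on 2011-03-13 (not the 5 of footnote (1)), pos-quality 1.50 and pos-follow 2.00 for sp100_id 2 on 2011-03-12 (user 2's scores rather than user 1's), and neg-follow 5.00 for that same row, because tweet 8 was made by user 1.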
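I've tried to explain myself as well as possible. Can anyone help me build the correct query? As you can see, there are also dates without tweets (all values zero), and these need to be included in the result set. This is what I have so far, but I'm having trouble sorting out the rest: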
SELECT
    s.sp100_id,
    d._date,
    COALESCE(c.pos_rt, 0)      AS pos_rt,
    COALESCE(c.pos_quality, 0) AS pos_quality,
    COALESCE(c.pos_follow, 0)  AS pos_follow,
    COALESCE(c.neg_rt, 0)      AS neg_rt,
    COALESCE(c.neg_quality, 0) AS neg_quality,
    COALESCE(c.neg_follow, 0)  AS neg_follow
FROM sp100 s
CROSS JOIN daterange d  -- every company x every date, including dates without tweets
LEFT JOIN (
    SELECT
        sp100_id,
        nyse_date,
        -- [rt], [quality] and [follow] stand for the per-tweet values
        COUNT(CASE class WHEN 2 THEN 1 END) * [rt]      AS pos_rt,
        COUNT(CASE class WHEN 2 THEN 1 END) * [quality] AS pos_quality,
        COUNT(CASE class WHEN 2 THEN 1 END) * [follow]  AS pos_follow,
        COUNT(CASE class WHEN 3 THEN 1 END) * [rt]      AS neg_rt,
        COUNT(CASE class WHEN 3 THEN 1 END) * [quality] AS neg_quality,
        COUNT(CASE class WHEN 3 THEN 1 END) * [follow]  AS neg_follow
    FROM tweets
    GROUP BY sp100_id, nyse_date
) c ON s.sp100_id = c.sp100_id AND d._date = c.nyse_date
ORDER BY s.sp100_id, d._date ASC
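Obviously, [rt], [quality] and [follow] need to be replaced by the correct syntax, and I'm not sure about the COUNT(...) either: as it stands it first counts the number of tweets, but it should take each tweet separately and multiply it by its own retweet count (rt).
Can anyone help me?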
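A minimal sketch of one way to finish the query, assuming a SQLite-compatible engine (the question does not name its database) and exercised through Python's built-in sqlite3 module on the sample data. Instead of COUNT(...) * [value], it joins tweets to users so every tweet row carries its author's quality and follow, then SUMs a CASE expression per class so each tweet contributes its own rt, quality and follow. The CROSS JOIN of sp100 and daterange, the LEFT JOIN and COALESCE keep the all-zero dates. Aliases use underscores because an unquoted pos-rt is parsed as pos minus rt. The column types and the build_sample_db helper are assumptions for illustration.

```python
import sqlite3

# Tables from the question; the column types are assumptions.
SCHEMA = """
CREATE TABLE users     (user_id INTEGER PRIMARY KEY, quality REAL, follow REAL);
CREATE TABLE tweets    (tweet_id INTEGER PRIMARY KEY, sp100_id INTEGER,
                        nyse_date TEXT, user_id INTEGER, class INTEGER, rt INTEGER);
CREATE TABLE daterange (_date TEXT PRIMARY KEY);
CREATE TABLE sp100     (sp100_id INTEGER PRIMARY KEY, _name TEXT);
"""

QUERY = """
SELECT
    s.sp100_id,
    d._date AS nyse_date,
    COALESCE(c.pos_rt, 0)      AS pos_rt,
    COALESCE(c.pos_quality, 0) AS pos_quality,
    COALESCE(c.pos_follow, 0)  AS pos_follow,
    COALESCE(c.neg_rt, 0)      AS neg_rt,
    COALESCE(c.neg_quality, 0) AS neg_quality,
    COALESCE(c.neg_follow, 0)  AS neg_follow
FROM sp100 s
CROSS JOIN daterange d  -- every company x every date, even without tweets
LEFT JOIN (
    SELECT
        t.sp100_id,
        t.nyse_date,
        -- each tweet adds its own rt and its author's quality/follow to its class
        SUM(CASE WHEN t.class = 2 THEN t.rt      ELSE 0 END) AS pos_rt,
        SUM(CASE WHEN t.class = 2 THEN u.quality ELSE 0 END) AS pos_quality,
        SUM(CASE WHEN t.class = 2 THEN u.follow  ELSE 0 END) AS pos_follow,
        SUM(CASE WHEN t.class = 3 THEN t.rt      ELSE 0 END) AS neg_rt,
        SUM(CASE WHEN t.class = 3 THEN u.quality ELSE 0 END) AS neg_quality,
        SUM(CASE WHEN t.class = 3 THEN u.follow  ELSE 0 END) AS neg_follow
    FROM tweets t
    JOIN users u ON u.user_id = t.user_id
    GROUP BY t.sp100_id, t.nyse_date
) c ON s.sp100_id = c.sp100_id AND d._date = c.nyse_date
ORDER BY s.sp100_id, d._date
"""


def build_sample_db():
    """Hypothetical helper: an in-memory database holding the question's sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO users VALUES (?, ?, ?)",
                    [(1, 2.50, 5.00), (2, 0.75, 1.00)])
    con.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, "2011-03-12", 1, 1, 0),
        (2, 1, "2011-03-13", 1, 2, 2),
        (3, 1, "2011-03-13", 1, 2, 1),
        (4, 1, "2011-03-13", 2, 2, 0),
        (5, 1, "2011-03-13", 2, 3, 3),
        (6, 2, "2011-03-12", 2, 2, 3),
        (7, 2, "2011-03-12", 2, 2, 0),
        (8, 2, "2011-03-12", 1, 3, 5),
        (9, 2, "2011-03-13", 2, 2, 0),
    ])
    con.executemany("INSERT INTO daterange VALUES (?)",
                    [("2011-03-11",), ("2011-03-12",), ("2011-03-13",)])
    con.executemany("INSERT INTO sp100 VALUES (?, ?)", [(1, "Alcoa"), (2, "Apple")])
    return con


rows = build_sample_db().execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The inner JOIN to users drops any tweet whose user_id has no users row; if that can happen in the real data, a LEFT JOIN there (with COALESCE around u.quality and u.follow) would keep those tweets' rt in the totals.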
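I'm having some trouble understanding your table footnote (1): the first tweet was retweeted twice; why does it contribute 2 * 2 rather than 1 * 2 to 'pos-rt', while the other two tweets (retweeted once and zero times) contribute 1 * 1 and 1 * 0 respectively? – eggyal 2012-07-31 17:30:07
In footnote (8), I think the relevant user has 'user_id = 2' and quality = 0.75, so 'pos-rt' should be '1.5'? Likewise, for footnote (9) 'follow = 1.00', so 'pos-follow' should be '2.00'? – eggyal 2012-07-31 17:45:44
You're right on both counts :-) – Pr0no 2012-07-31 20:09:34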