2014-05-12 41 views

I started writing this query, and I find it hard to see why a given question should be closed. How can I count the most frequent CloseReasonTypes per post in Data Explorer?

select 
    TOP ##Limit:int?38369## -- The maximum value the hardware can handle. 
    Posts.Id as [Post Link], -- Question title. 
    Count(PendingFlags.PostId) as [Number of pending flags], -- Number of pending flags per question. 
    Posts.OwnerUserId as [User Link], -- Click the column to see whether the same user often asks off-topic questions. 
    Reputation as [User Reputation], -- Interesting to see that such questions are sometimes asked by high rep users. 
    Posts.Score as [Votes], -- Interesting to see that some questions have more than 100 upvotes. 
    Posts.AnswerCount as [Number of Answers], -- I thought we shouldn't answer off-topic posts. 
    Posts.FavoriteCount as [Number of Stars], -- Some questions seem to be very helpful :). 
    Posts.CreationDate as [Asked on], -- The older the question, the higher the chance that its flags can't get reviewed. 
    Posts.LastActivityDate as [last activity], -- Similar effect as with Posts.CreationDate. 
    Posts.LastEditDate as [modified on], 
    Posts.ViewCount 
from posts 
    LEFT OUTER JOIN Users on Users.id = posts.OwnerUserId 
    INNER JOIN PendingFlags on PendingFlags.PostId = Posts.Id 
where ClosedDate IS NULL -- The question is not closed. 
group by Posts.id, Posts.OwnerUserId, Reputation, Posts.Score, Posts.FavoriteCount, Posts.AnswerCount, Posts.CreationDate, Posts.LastActivityDate, Posts.LastEditDate, Posts.ViewCount 
order by Count(PendingFlags.PostId) desc; -- Questions with more flags are more likely to get them handled, and more likely to be off-topic (since several users have already reviewed the question). 

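Since there are several flags per question, I can't use a simple table to show the reason given for each flag, but I think it would be relevant to show the most common value of CloseReasonTypes.Id for each post. This leads me to two problems: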

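  • First: after looking at this query, I should JOIN CloseReasonTypes to PendingFlags to show the reason names instead of their numbers. Since there is no common field between Posts and PendingFlags, and I use from posts as the base for joining tables, I have no idea how to write this JOIN.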

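  • Second: I don't know how to select the most frequently used close reason on each row. Several questions seem to discuss similar cases, but I can't use their answers: they ask how to find the most common value over a whole table, which yields a table with a single column and a single row, whereas I need to do this for the flags on each post.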

Answer


Although it's not exactly what you're looking for, I believe this query will give you a good start.

select 
    PostId as [Post Link], 
    duplicate = sum(case when closereasontypeid = 101 then 1 else 0 end), 
    offtopic = sum(case when closereasontypeid = 102 then 1 else 0 end), 
    unclear = sum(case when closereasontypeid = 103 then 1 else 0 end), 
    toobroad = sum(case when closereasontypeid = 104 then 1 else 0 end), 
    opinion = sum(case when closereasontypeid = 105 then 1 else 0 end), 
    ot_superuser = sum(case when CloseAsOffTopicReasonTypeId = 4 then 1 else 0 end), 
    ot_findexternal = sum(case when CloseAsOffTopicReasonTypeId = 8 then 1 else 0 end), 
    ot_serverfault = sum(case when CloseAsOffTopicReasonTypeId = 7 then 1 else 0 end), 
    ot_lackinfo = sum(case when CloseAsOffTopicReasonTypeId = 12 then 1 else 0 end), 
    ot_typo = sum(case when CloseAsOffTopicReasonTypeId = 11 then 1 else 0 end) 
from pendingflags 
where 
    flagtypeid in (13,14) -- Close flags 
    and creationdate > '2014-04-15' 
group by PostId 

This only looks at posts with close flags created since April 15 of this year, and returns about 23,500 records.

I believe Data Explorer does not contain posts that have been deleted, so they are not included in the results.

This will need to be modified if/when close reasons are added or removed.
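
One possible way to avoid hard-coding the reason ids is to join CloseReasonTypes to PendingFlags and group by the reason name. The following is only a sketch: it assumes CloseReasonTypes exposes Id and Name columns, and it returns one row per post and reason rather than one column per reason.

```sql
-- Sketch: count close flags per post and per reason name,
-- without hard-coding CloseReasonTypeId values.
-- Assumption: CloseReasonTypes has Id and Name columns.
select
    pf.PostId as [Post Link],
    crt.Name as [Close Reason],
    count(*) as [Flags]
from PendingFlags pf
    inner join CloseReasonTypes crt on crt.Id = pf.CloseReasonTypeId
where pf.FlagTypeId in (13, 14) -- Close flags, as in the query above
    and pf.CreationDate > '2014-04-15'
group by pf.PostId, crt.Name
order by pf.PostId, count(*) desc;
```

Because this uses an inner join, flags whose CloseReasonTypeId is NULL are left out of the counts.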


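This is a good start, but it's not what I'm looking for :) ... so upvoted, but not accepted. It solves neither my first problem nor my second *(I know I could do it this way, but I already have other rows: it doesn't show the syntax for selecting the most common result)*. – user2284570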
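
For the per-post "most common value" syntax, a common T-SQL pattern is to count the flags per post and reason, rank the reasons within each post with ROW_NUMBER(), and keep only rank 1. The following is a sketch under assumptions, not a tested Data Explorer query: it assumes CloseReasonTypes has Id and Name columns, and it breaks ties alphabetically by reason name.

```sql
-- Sketch: most frequent close reason per post (per-group mode).
-- Assumption: CloseReasonTypes has Id and Name columns.
with ReasonCounts as (
    select
        pf.PostId,
        crt.Name as ReasonName,
        count(*) as FlagCount,
        -- Rank the reasons inside each post, most flagged first.
        row_number() over (
            partition by pf.PostId
            order by count(*) desc, crt.Name
        ) as rn
    from PendingFlags pf
        inner join CloseReasonTypes crt on crt.Id = pf.CloseReasonTypeId
    group by pf.PostId, crt.Name
)
select
    p.Id as [Post Link],
    rc.ReasonName as [Most common close reason],
    rc.FlagCount as [Flags with that reason]
from Posts p
    inner join ReasonCounts rc on rc.PostId = p.Id and rc.rn = 1
where p.ClosedDate is null -- The question is not closed.
order by rc.FlagCount desc;
```

Since ReasonCounts holds at most one row per post once rn = 1 is applied, the same join on Posts.Id could be added to the original query, with rc.ReasonName added to both its select list and its group by.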