2016-10-10 87 views
0

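Find the second-highest value instead of max - 1

I am creating an R Sweave file that compiles a PDF report of test data from a piece of software. The data is mostly pulled from a SQL Server table that looks like this: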

| FileName | Version | Category | Value |   Date  | TestNum | 
|:--------:|:-------:|:--------:|:-----:|:-------------------:|:-------:| 
| File1 | 1.0.12 | Run Time | 74 | 2016-10-01 12:00:00 | 1  | 
| File1 | 1.0.12 | Totals | 468 | 2016-10-01 12:00:00 | 1  | 
| File1 | 1.0.12 | DB Size | 589 | 2016-10-01 12:00:00 | 1  | 
| File2 | 1.0.12 | Run Time | 81 | 2016-10-01 12:00:00 | 1  | 
| File2 | 1.0.12 | Totals | 351 | 2016-10-01 12:00:00 | 1  | 
| File2 | 1.0.12 | DB Size | 625 | 2016-10-01 12:00:00 | 1  | 
| File1 | 1.0.15 | Run Time | 74 | 2016-10-01 12:00:00 | 2  | 
| File1 | 1.0.15 | Totals | 468 | 2016-10-01 12:00:00 | 2  | 
| File1 | 1.0.15 | DB Size | 589 | 2016-10-01 12:00:00 | 2  | 
| File2 | 1.0.15 | Run Time | 81 | 2016-10-01 12:00:00 | 2  | 
| File2 | 1.0.15 | Totals | 351 | 2016-10-01 12:00:00 | 2  | 
| File2 | 1.0.15 | DB Size | 625 | 2016-10-01 12:00:00 | 2  | 
| File1 | 1.0.17 | Run Time | 74 | 2016-10-01 12:00:00 | 3  | 
| File1 | 1.0.17 | Totals | 468 | 2016-10-01 12:00:00 | 3  | 
| File1 | 1.0.17 | DB Size | 589 | 2016-10-01 12:00:00 | 3  | 
| File2 | 1.0.17 | Run Time | 81 | 2016-10-01 12:00:00 | 3  | 
| File2 | 1.0.17 | Totals | 351 | 2016-10-01 12:00:00 | 3  | 
| File2 | 1.0.17 | DB Size | 625 | 2016-10-01 12:00:00 | 3  | 
| File1 | 1.0.21 | Run Time | 74 | 2016-10-01 12:00:00 | 4  | 
| File1 | 1.0.21 | Totals | 468 | 2016-10-01 12:00:00 | 4  | 
| File1 | 1.0.21 | DB Size | 589 | 2016-10-01 12:00:00 | 4  | 
| File2 | 1.0.21 | Run Time | 81 | 2016-10-01 12:00:00 | 4  | 
| File2 | 1.0.21 | Totals | 351 | 2016-10-01 12:00:00 | 4  | 
| File2 | 1.0.21 | DB Size | 625 | 2016-10-01 12:00:00 | 4  | 

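I use the TestNum column to make it easier to order the versions incrementally, since the versions themselves are strings. In my R script I have a section that is supposed to find the latest version and the one before it.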

vLatest <- unique(df[df[,"TestNum"] == max(df$TestNum), "Version"]) 
vPrevious <- unique(df[df[,"TestNum"] == max(df$TestNum)-1, "Version"]) 

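However, sometimes a version of the software is very buggy and crashes on every test. That is not useful to see in the charts, so I add a comment to those rows in the SQL database, use it to filter them out, and the R data frame ends up looking like this: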

| FileName | Version | Category | Value |   Date  | TestNum | 
|:--------:|:-------:|:--------:|:-----:|:-------------------:|:-------:| 
| File1 | 1.0.12 | Run Time | 74 | 2016-10-01 12:00:00 | 1  | 
| File1 | 1.0.12 | Totals | 468 | 2016-10-01 12:00:00 | 1  | 
| File1 | 1.0.12 | DB Size | 589 | 2016-10-01 12:00:00 | 1  | 
| File2 | 1.0.12 | Run Time | 81 | 2016-10-01 12:00:00 | 1  | 
| File2 | 1.0.12 | Totals | 351 | 2016-10-01 12:00:00 | 1  | 
| File2 | 1.0.12 | DB Size | 625 | 2016-10-01 12:00:00 | 1  | 
| File1 | 1.0.15 | Run Time | 74 | 2016-10-01 12:00:00 | 2  | 
| File1 | 1.0.15 | Totals | 468 | 2016-10-01 12:00:00 | 2  | 
| File1 | 1.0.15 | DB Size | 589 | 2016-10-01 12:00:00 | 2  | 
| File2 | 1.0.15 | Run Time | 81 | 2016-10-01 12:00:00 | 2  | 
| File2 | 1.0.15 | Totals | 351 | 2016-10-01 12:00:00 | 2  | 
| File2 | 1.0.15 | DB Size | 625 | 2016-10-01 12:00:00 | 2  | 
| File1 | 1.0.21 | Run Time | 74 | 2016-10-01 12:00:00 | 4  | 
| File1 | 1.0.21 | Totals | 468 | 2016-10-01 12:00:00 | 4  | 
| File1 | 1.0.21 | DB Size | 589 | 2016-10-01 12:00:00 | 4  | 
| File2 | 1.0.21 | Run Time | 81 | 2016-10-01 12:00:00 | 4  | 
| File2 | 1.0.21 | Totals | 351 | 2016-10-01 12:00:00 | 4  | 
| File2 | 1.0.21 | DB Size | 625 | 2016-10-01 12:00:00 | 4  | 

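But then vPrevious is still looking for TestNum == 3, which no longer exists, so the script breaks. Is there a way to look up the second-highest value instead?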
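The failure can be reproduced with a minimal sketch. The data frame below is a hypothetical miniature of the filtered data (one row per version, TestNum 3 removed), not the real table:

```r
# Hypothetical miniature of the filtered data frame: TestNum 3 is missing.
df <- data.frame(
  Version = c("1.0.12", "1.0.15", "1.0.21"),
  TestNum = c(1, 2, 4),
  stringsAsFactors = FALSE
)

# Same lookups as in the script above.
vLatest   <- unique(df[df[, "TestNum"] == max(df$TestNum), "Version"])
vPrevious <- unique(df[df[, "TestNum"] == max(df$TestNum) - 1, "Version"])

vLatest            # "1.0.21"
length(vPrevious)  # 0, because no row has TestNum == 3
```

Because `max(df$TestNum) - 1` assumes the test numbers are consecutive, any gap left by filtering produces an empty result.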

Edit: As suggested, here is the query I use to create the data frame.

df <- sqlQuery(db, "select FileName, Version, Category, Value, Date, TestNum 
       from Table where Comments != 'Do Not Include in R Chart'", 
       stringsAsFactors = FALSE)
+1

vPrevious <- unique(df[df[,"TestNum"] == sort(unique(df$TestNum), T)[2], "Version"]) – dww
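A slightly expanded sketch of the same idea, using a hypothetical sample data frame with a gap at TestNum 3. It sorts the distinct test numbers in descending order and indexes into them, so gaps do not matter:

```r
# Hypothetical sample data: several rows per version, TestNum 3 filtered out.
df <- data.frame(
  Version = c("1.0.12", "1.0.12", "1.0.15", "1.0.21", "1.0.21"),
  TestNum = c(1, 1, 2, 4, 4),
  stringsAsFactors = FALSE
)

# Distinct test numbers, highest first: 4, 2, 1
testNums <- sort(unique(df$TestNum), decreasing = TRUE)

vLatest   <- unique(df[df$TestNum == testNums[1], "Version"])  # highest TestNum
vPrevious <- unique(df[df$TestNum == testNums[2], "Version"])  # second highest
```

If the data frame holds fewer than two distinct TestNum values, `testNums[2]` is `NA`, so it is worth checking `length(testNums) >= 2` before using `vPrevious`.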

+0

@dww That is the solution I was looking for, thanks! – David

+0

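@David I have updated the SQL based on your query. Since you can get the same result in either SQL or R, could you take a moment to compare the results and performance? It might provide some interesting insights... Cheers –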

Answers

2

You could try using dense_rank() with order by TestNum.

The snippet below gives an example of its usage.

select c.* 
from (
    select *,dense_rank() over (order by [object_id] desc) as [row_number] 
    from sys.columns 
    ) c 
where c.[row_number] in (1,2) 

If you can add the SQL query to the question, it may help in providing a more targeted response.

Edit:

Tailoring the OP's original query:

select FileName, Version, Category, Value, Date, TestNum 
from (
    select FileName, Version, Category, Value, Date, TestNum 
     , dense_rank() over (order by [TestNum] desc) as [row_number] 
    from Table 
    where Comments != 'Do Not Include in R Chart' 
    ) t 
where t.[row_number] in (1,2) 
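For comparison, the same "two most recent test runs" filter can be sketched on the R side in base R. This is an illustrative equivalent under assumed sample data, not part of the query above. The dense rank is computed as the position of each TestNum among the distinct values sorted from high to low:

```r
# Hypothetical sample data standing in for the unfiltered query result.
df <- data.frame(
  FileName = c("File1", "File2", "File1", "File2", "File1", "File2"),
  Version  = c("1.0.12", "1.0.12", "1.0.15", "1.0.15", "1.0.21", "1.0.21"),
  TestNum  = c(1, 1, 2, 2, 4, 4),
  stringsAsFactors = FALSE
)

# Descending dense rank: 1 for the highest TestNum, 2 for the next distinct
# value, and so on, with no gaps in the ranks even if TestNum has gaps.
denseRank <- match(df$TestNum, sort(unique(df$TestNum), decreasing = TRUE))

# Keep only the rows belonging to the two highest distinct test numbers.
latestTwo <- df[denseRank %in% c(1, 2), ]
```

Doing the ranking in SQL keeps the transferred result small, while doing it in R leaves the full history available for other charts; which one is preferable depends on the report.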
+0

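Why the downvote on this answer? Ranking functions are *exactly* what is needed to retrieve the second- or third-best value according to some criterion. –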