Transposing identical objects

Today I got a strange result.

To reproduce it, consider the following data frames:

x <- data.frame(x=1:3, y=11:13) 
y <- x[1:3, 1:2]

They are supposed to be identical, and indeed they are:

identical(x,y) 
# [1] TRUE

Applying t() to both objects should produce the same result, but:

identical(t(x),t(y)) 
# [1] FALSE

The difference is in the column names:

colnames(t(x)) 
# NULL 
colnames(t(y)) 
# [1] "1" "2" "3"

Given this, if you want to stack y by column, you get what you would expect:

stack(as.data.frame(t(y))) 
# values ind 
# 1  1 1 
# 2  11 1 
# 3  2 2 
# 4  12 2 
# 5  3 3 
# 6  13 3

whereas:

stack(as.data.frame(t(x))) 
#  values ind 
# 1  1 V1 
# 2  11 V1 
# 3  2 V2 
# 4  12 V2 
# 5  3 V3 
# 6  13 V3

In the latter case, as.data.frame() does not find the original column names and generates them automatically.

The culprit is as.matrix(), which is called by t():

rownames(as.matrix(x)) 
# NULL 
rownames(as.matrix(y)) 
# [1] "1" "2" "3"

One workaround is to set rownames.force (and to rewrite the stack(...) calls accordingly):

rownames(as.matrix(x, rownames.force=TRUE)) 
# [1] "1" "2" "3" 
rownames(as.matrix(y, rownames.force=TRUE)) 
# [1] "1" "2" "3" 
identical(t(as.matrix(x, rownames.force=TRUE)), 
      t(as.matrix(y, rownames.force=TRUE))) 
# [1] TRUE
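
As a minimal sketch of what the rewritten stack(...) calls might look like, assuming that forcing row names on both objects is acceptable (the names sx and sy are only illustrative):

```r
x <- data.frame(x = 1:3, y = 11:13)
y <- x[1:3, 1:2]

# Force row names when converting to a matrix, so that the transposed
# matrices carry the same column names for both x and y.
sx <- stack(as.data.frame(t(as.matrix(x, rownames.force = TRUE))))
sy <- stack(as.data.frame(t(as.matrix(y, rownames.force = TRUE))))

identical(sx, sy)   # the two stacked results now agree
levels(sx$ind)      # group labels come from the forced row names
```

With the row names forced, both calls go through the same branch of as.matrix(), so the result no longer depends on how the data frame was created.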

My questions are:

Why does as.matrix() treat x and y differently, and
how can you tell the difference between them?

Note that other informational functions do not reveal any difference between x and y:

identical(attributes(x), attributes(y)) 
# [1] TRUE 
identical(str(x), str(y)) 
# ... 
# [1] TRUE  (always TRUE: str() returns NULL invisibly, so this compares NULL with NULL)

Comments on the solutions

Konrad Rudolph gives a concise and effective explanation of the behaviour above (see mt1022's answer for more details).

In short, Konrad shows that:

a) x and y are internally different;
b) identical is "simply too lenient by default" to catch this internal difference.

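Now, if you take a subset T of a set S, and T contains all the elements of S, then S and T are exactly the same object. So, if you take a data frame y that contains all the rows and columns of x, then x and y should be exactly the same object. Unfortunately, x ≠ y!
This behaviour is not only counterintuitive but also obscure: the difference is not self-evident, it is purely internal, and even the default identical function cannot see it.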

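Another natural principle is that transposing two identical (matrix-like) objects produces identical objects. Again, this is broken by the fact that, before transposition, identical is "too lenient"; after transposition, the default identical is sufficient to see the difference.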

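IMHO this behaviour (even if it is not a bug) is a misbehaviour for a scientific language like R.
Hopefully this post will draw some attention and the R team will consider revising it.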

Source

2017-04-04 antonio

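It seems to be down to how the 'row.names' are defined, as they differ between 'dput(x)' and 'dput(y)'. Perhaps they are added explicitly when using '[.data.frame' – user20650

You can use dput(x) and dput(y) and you will see that the row.names are stored differently. I think it is related to the automatic row.names handling (see the Details section of https://stat.ethz.ch/R-manual/R-devel/library/base/html/row.names.html for more information); no idea why subsetting returns different row.names though... To be honest, it smells like unexpected behaviour – digEmAll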

'identical(x, y, attrib.as.set = FALSE)' seems to pick up on the difference (noting "*Note that identical(x, y, FALSE, FALSE, FALSE, FALSE) pickily tests for exact equality.*") – user20650

identical is simply too lenient by default, but you can change that:

> identical(x, y, attrib.as.set = FALSE) 
[1] FALSE

The reason can be found by inspecting the objects in more detail:

> dput(x) 
structure(list(x = 1:3, y = 11:13), .Names = c("x", "y"), row.names = c(NA, 
-3L), class = "data.frame") 
> dput(y) 
structure(list(x = 1:3, y = 11:13), .Names = c("x", "y"), row.names = c(NA, 
3L), class = "data.frame")

Note the different row.names attributes:

> .row_names_info(x) 
[1] -3 
> .row_names_info(y) 
[1] 3

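From the documentation we can glean that a negative number implies automatic row names (for x), whereas the row names of y are not automatic. And as.matrix treats them differently.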
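
To tell the two cases apart in practice, a small helper along these lines may be useful. This is only a sketch: has_automatic_row_names is a hypothetical name, not a base R function, and it relies on the sign convention of .row_names_info described above. Resetting the row names with row.names(y2) <- NULL marks them as automatic again.

```r
x <- data.frame(x = 1:3, y = 11:13)
y <- x[1:3, 1:2]

# Hypothetical helper: TRUE when the row names are flagged as automatic,
# i.e. when .row_names_info() reports a negative row count.
has_automatic_row_names <- function(df) .row_names_info(df) < 0L

has_automatic_row_names(x)   # x was built by data.frame()
has_automatic_row_names(y)   # y was built by `[.data.frame`

# Assigning NULL resets the row names to automatic ones.
y2 <- y
row.names(y2) <- NULL
has_automatic_row_names(y2)
identical(t(x), t(y2))       # the transposes can now be compared again
```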

Source

2017-04-04 16:07:51

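There is no disagreement. The help page for 'row.names' says: "Row names of the form 1:n for n > 2 are stored internally in a compact form, ..", and that as.matrix and "other functions" will "handle [such names] differently". Running trace('row.names') shows that it is called 3 times for the asker's example (at least once by 'print(y)'). It also says: "'row.names' will always return a character vector. (Use attr(x, "row.names") if you need to retrieve an integer-valued set of row names.)" –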

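'row.names = c(NA, 3L)' are still automatically generated row.names, just like 'row.names = c(NA, -3L)'. The question is: why does subsetting the data change the sign (and hence cause the difference)? – digEmAll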

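@digEmAll: 'c(NA, -3L)' seems to mark the object as having no explicit "row.names" (i.e. unset, or set to NULL), which means that functions working with the "row.names" of a data.frame should ignore this attribute. 'c(NA, 3L)' seems to mark the object as having explicit "row.names" which, being of the form '1:nrow(x)', do not need to be materialised. '"[.data.frame"' returns a subset of the data together with a subset of its "row.names" (e.g. the 'row.names' of 'x[2:3, ]' cannot be stored compactly), and it seems that the most consistent behaviour is to always return an object with explicit "row.names". –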

As noted in the comments, x and y are not strictly identical. When we call t on a data.frame, t.data.frame is executed:

function (x) 
{ 
    x <- as.matrix(x) 
    NextMethod("t") 
}

We can see that it calls as.matrix, i.e. as.matrix.data.frame:

function (x, rownames.force = NA, ...) 
{ 
    dm <- dim(x) 
    rn <- if (rownames.force %in% FALSE) 
     NULL 
    else if (rownames.force %in% TRUE) 
     row.names(x) 
    else if (.row_names_info(x) <= 0L) 
     NULL 
    else row.names(x) 
...

As @oropendola noted in a comment, .row_names_info returns different values for x and y, and the function above is where that difference takes effect.
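
To make the branch concrete, the row-name decision from the excerpt above can be isolated in a small function. This is a sketch that copies only the rn logic shown in the excerpt; pick_row_names is a hypothetical name and is not part of base R.

```r
x <- data.frame(x = 1:3, y = 11:13)
y <- x[1:3, 1:2]

# Hypothetical stand-alone copy of the `rn` branch quoted above.
pick_row_names <- function(df, rownames.force = NA) {
  if (rownames.force %in% FALSE) NULL
  else if (rownames.force %in% TRUE) row.names(df)
  else if (.row_names_info(df) <= 0L) NULL   # automatic row names are dropped
  else row.names(df)                         # explicit row names are kept
}

pick_row_names(x)                         # default: nothing kept for x
pick_row_names(y)                         # default: "1" "2" "3" kept for y
pick_row_names(x, rownames.force = TRUE)  # forced: kept for x as well
```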

So why does y have different row names? Let's look at [.data.frame, where I have added comments on the key lines:

{ 
    ... # many lines of code 
    xx <- x #!! this is where xx is defined 
    cols <- names(xx) 
    x <- vector("list", length(x)) 
    x <- .Internal(copyDFattr(xx, x)) # This is where I am not sure about 
    oldClass(x) <- attr(x, "row.names") <- NULL 
    if (has.j) { 
     nm <- names(x) 
     if (is.null(nm)) 
      nm <- character() 
     if (!is.character(j) && anyNA(nm)) 
      names(nm) <- names(x) <- seq_along(x) 
     x <- x[j] 
     cols <- names(x) 
     if (drop && length(x) == 1L) { 
      if (is.character(i)) { 
       rows <- attr(xx, "row.names") 
       i <- pmatch(i, rows, duplicates.ok = TRUE) 
      } 
      xj <- .subset2(.subset(xx, j), 1L) 
      return(if (length(dim(xj)) != 2L) xj[i] else xj[i, 
                  , drop = FALSE]) 
     } 
     if (anyNA(cols)) 
      stop("undefined columns selected") 
     if (!is.null(names(nm))) 
      cols <- names(x) <- nm[cols] 
     nxx <- structure(seq_along(xx), names = names(xx)) 
     sxx <- match(nxx[j], seq_along(xx)) 
    } 
    else sxx <- seq_along(x) 
    rows <- NULL ## this is where rows is defined, as we give numeric i, the following 
    ## if block will not be executed 
    if (is.character(i)) { 
     rows <- attr(xx, "row.names") 
     i <- pmatch(i, rows, duplicates.ok = TRUE) 
    } 
    for (j in seq_along(x)) { 
     xj <- xx[[sxx[j]]] 
     x[[j]] <- if (length(dim(xj)) != 2L) 
      xj[i] 
     else xj[i, , drop = FALSE] 
    } 
    if (drop) { 
     n <- length(x) 
     if (n == 1L) 
      return(x[[1L]]) 
     if (n > 1L) { 
      xj <- x[[1L]] 
      nrow <- if (length(dim(xj)) == 2L) 
       dim(xj)[1L] 
      else length(xj) 
      drop <- !mdrop && nrow == 1L 
     } 
     else drop <- FALSE 
    } 
    if (!drop) { ## drop is False for our case 
     if (is.null(rows)) 
      rows <- attr(xx, "row.names") ## rows changed from NULL to 1,2,3 here 
     rows <- rows[i] 
     if ((ina <- anyNA(rows)) | (dup <- anyDuplicated(rows))) { 
      if (!dup && is.character(rows)) 
       dup <- "NA" %in% rows 
      if (ina) 
       rows[is.na(rows)] <- "NA" 
      if (dup) 
       rows <- make.unique(as.character(rows)) 
     } 
     if (has.j && anyDuplicated(nm <- names(x))) 
      names(x) <- make.unique(nm) 
     if (is.null(rows)) 
      rows <- attr(xx, "row.names")[i] 
     attr(x, "row.names") <- rows ## this is where the rownames of x changed 
     oldClass(x) <- oldClass(xx) 
    } 
    x 
}

We can see that y gets its row names through something like attr(x, 'row.names'):

> attr(x, 'row.names') 
[1] 1 2 3

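So, when we create y with [.data.frame, it receives a row.names attribute that differs from that of x, whose row.names are automatic and are shown with a negative sign in the dput result.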

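Edit

In fact, this is already explained in the Note section of the row.names manual:

Note

row.names is similar to rownames for arrays, and it has a method that calls rownames for an array argument.

Row names of the form 1:n for n > 2 are stored internally in a compact form, which might be seen from C code or by deparsing but never via row.names or attr(x, "row.names"). Additionally, some names of this sort are marked as 'automatic' and handled differently by as.matrix and data.matrix (and potentially other functions).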

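So attr does not distinguish between automatic row.names (like those of x) and explicit integer row.names (like those of y), whereas as.matrix does, through the internal representation exposed by .row_names_info.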
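
A short sketch of this contrast, using only the x and y from the question: attr() reports the same integer row names for both objects, while .row_names_info(, 0L) returns the internal compact form, where the sign differs. The variable names are illustrative only.

```r
x <- data.frame(x = 1:3, y = 11:13)
y <- x[1:3, 1:2]

# attr() expands the compact form, so both objects look the same here.
same_via_attr <- identical(attr(x, "row.names"), attr(y, "row.names"))

# type = 0L returns the internal "row.names" attribute as stored.
internal_x <- .row_names_info(x, 0L)   # compact form with a negative count
internal_y <- .row_names_info(y, 0L)   # compact form with a positive count

same_via_attr
internal_x
internal_y
```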

Source

2017-04-04 15:49:43 mt1022

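It is worth noting that attr(x, "row.names") and attr(x, "row.names") = value do not reveal how R handles "row.names" internally. '.row_names_info' is more accurate. E.g. 'attr(x, "row.names") = 1:3' does not store '1:3' as the "row.names", but rather what '.row_names_info(x, 0)' shows. Nonetheless, any value other than 'NULL' marks the object as having user-defined "row.names", so functions (such as 'as.matrix') need to, or should, take this into account. –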

Sure. 'attr(x, 'row.names')' and 'attr(y, 'row.names')' give the same result! – mt1022
