2016-10-13 16 views

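digest: getting different values in all rows when only one is modified

I'm trying to build a Shiny app that lets users choose columns to encrypt, where the value in each row should always come out the same on subsequent runs. That is, if the customer name is "John", you always get "A" when running this process; if the customer name changes to "Jon", you might get "C"; but if it changes back to "John", you get "A" again. This will be used to 'mask' sensitive data for analysis.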

Also, if anyone knows a way to 'decrypt' these columns by storing a key for later use, that would be much appreciated.

A simplified version of how I'm trying to achieve this (requires the digest library):

library(digest)

test <- data.frame(CustomerName=c("John Snow","John Snow","Daffy Duck","Daffy Duck","Daffy Duck","Daffy Duck","Daffy Duck","Joe Farmer","Joe Farmer","Joe Farmer","Joe Farmer"), 
       LoanNumber=c("12548","45878","45796","45813","45125","45216","45125","45778","45126","32548","45683"), 
       LoanBalance=c("458463","5412548","458463","5412548","458463","5412548","458463","5412548","458463","5412548","2484722"), 
       FarmType=c("Hay","Dairy","Fish","Hay","Dairy","Fish","Hay","Dairy","Fish","Hay","Dairy")) 


test[,1] <- sapply(test[,1],digest,algo="sha1") 

Example output:

        CustomerName LoanNumber LoanBalance FarmType 
1 5c96f777a14f201a6a9b79623d548f7ab61c7a11  12548  458463  Hay 
2 5c96f777a14f201a6a9b79623d548f7ab61c7a11  45878  5412548 Dairy 
3 10bf345ab114c20df2d1eedbbe7e7cd6b969db05  45796  458463  Fish 
4 10bf345ab114c20df2d1eedbbe7e7cd6b969db05  45813  5412548  Hay 
5 10bf345ab114c20df2d1eedbbe7e7cd6b969db05  45125  458463 Dairy 
6 10bf345ab114c20df2d1eedbbe7e7cd6b969db05  45216  5412548  Fish 
7 10bf345ab114c20df2d1eedbbe7e7cd6b969db05  45125  458463  Hay 
8 b0db86a39b9617cef61a8986fd57af7960eec9f4  45778  5412548 Dairy 
9 b0db86a39b9617cef61a8986fd57af7960eec9f4  45126  458463  Fish 
10 b0db86a39b9617cef61a8986fd57af7960eec9f4  32548  5412548  Hay 
11 b0db86a39b9617cef61a8986fd57af7960eec9f4  45683  2484722 Dairy 

Modified data frame (the 'h' removed from "John"):

test <- data.frame(CustomerName=c("Jon Snow","Jon Snow","Daffy Duck","Daffy Duck","Daffy Duck","Daffy Duck","Daffy Duck","Joe Farmer","Joe Farmer","Joe Farmer","Joe Farmer"), 
      LoanNumber=c("12548","45878","45796","45813","45125","45216","45125","45778","45126","32548","45683"), 
      LoanBalance=c("458463","5412548","458463","5412548","458463","5412548","458463","5412548","458463","5412548","2484722"), 
      FarmType=c("Hay","Dairy","Fish","Hay","Dairy","Fish","Hay","Dairy","Fish","Hay","Dairy")) 
test[,1] <- sapply(test[,1],digest,algo="sha1") 

New output:

        CustomerName LoanNumber LoanBalance FarmType 
1 2cabeabb3b50e04d3b46ea2c68ab12c7350cd87f  12548  458463  Hay 
2 2cabeabb3b50e04d3b46ea2c68ab12c7350cd87f  45878  5412548 Dairy 
3 b0187b6ff2322fa86004d4d22cd479f3cdc345d2  45796  458463  Fish 
4 b0187b6ff2322fa86004d4d22cd479f3cdc345d2  45813  5412548  Hay 
5 b0187b6ff2322fa86004d4d22cd479f3cdc345d2  45125  458463 Dairy 
6 b0187b6ff2322fa86004d4d22cd479f3cdc345d2  45216  5412548  Fish 
7 b0187b6ff2322fa86004d4d22cd479f3cdc345d2  45125  458463  Hay 
8 2127453066c45db6ba7e2f6f8c14d22796c3fd54  45778  5412548 Dairy 
9 2127453066c45db6ba7e2f6f8c14d22796c3fd54  45126  458463  Fish 
10 2127453066c45db6ba7e2f6f8c14d22796c3fd54  32548  5412548  Hay 
11 2127453066c45db6ba7e2f6f8c14d22796c3fd54  45683  2484722 Dairy 

I would have expected:

CustomerName LoanNumber LoanBalance FarmType 
1 2cabeabb3b50e04d3b46ea2c68ab12c7350cd87f  12548  458463  Hay 
2 2cabeabb3b50e04d3b46ea2c68ab12c7350cd87f  45878  5412548 Dairy 
3 10bf345ab114c20df2d1eedbbe7e7cd6b969db05  45796  458463  Fish 
4 10bf345ab114c20df2d1eedbbe7e7cd6b969db05  45813  5412548  Hay 
5 10bf345ab114c20df2d1eedbbe7e7cd6b969db05  45125  458463 Dairy 
6 10bf345ab114c20df2d1eedbbe7e7cd6b969db05  45216  5412548  Fish 
7 10bf345ab114c20df2d1eedbbe7e7cd6b969db05  45125  458463  Hay 
8 b0db86a39b9617cef61a8986fd57af7960eec9f4  45778  5412548 Dairy 
9 b0db86a39b9617cef61a8986fd57af7960eec9f4  45126  458463  Fish 
10 b0db86a39b9617cef61a8986fd57af7960eec9f4  32548  5412548  Hay 
11 b0db86a39b9617cef61a8986fd57af7960eec9f4  45683  2484722 Dairy 

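Am I misunderstanding how digest works? If I apply the same logic to multiple columns, I get the same values for the unchanged columns, but the problem persists in the column with the modified value. I also tried vectorizing the digest function to rule out my sapply call as the cause, with the same result. Any ideas?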

Answer


I think I've answered my own question... right after posting it here, of course :).

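The digest function has a serialize argument, documented as follows: a logical variable indicating whether the object should be serialized using serialize (in ASCII form). Setting it to FALSE allows comparing the digest output of given character strings to known control output. It also allows the use of raw vectors, such as the output of non-ASCII serialization.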

Setting serialize to FALSE seems to solve the problem, and I get the expected output.

For example:

test[,1] <- sapply(test[,1],digest,algo="sha1",serialize = FALSE) 
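A likely explanation, assuming CustomerName is a factor (the data.frame default before R 4.0): sapply passes each element to digest as a one-element factor that still carries the full set of levels, so with serialize = TRUE the levels are hashed too, and renaming one customer changes every hash. The sketch below illustrates this and hashes plain strings instead. The helper mask_column and the lookup table key are hypothetical names, not part of digest. Since a hash cannot be reversed, the lookup table is one possible way to cover the 'decrypt' request, provided it is stored privately.

```r
library(digest)

# Each one-element factor keeps ALL levels, so the serialized object differs
# even though the value "Daffy Duck" is the same in both vectors.
f1 <- factor(c("John Snow", "Daffy Duck"))
f2 <- factor(c("Jon Snow", "Daffy Duck"))
same_as_factor <- digest(f1[2], algo = "sha1") == digest(f2[2], algo = "sha1")
same_as_factor  # FALSE: the levels differ

# Hypothetical helper: hash each value as a plain string, so the result
# depends only on the value itself.
mask_column <- function(x, algo = "sha1") {
  x <- as.character(x)  # drop factor levels before hashing
  vapply(x, digest, character(1), algo = algo, serialize = FALSE,
         USE.NAMES = FALSE)
}

before <- c("John Snow", "Daffy Duck", "Joe Farmer")
after  <- c("Jon Snow",  "Daffy Duck", "Joe Farmer")

masked_before <- mask_column(before)
masked_after  <- mask_column(after)

# Unchanged names keep their hash; only the edited name gets a new one.
masked_before[2:3] == masked_after[2:3]
masked_before[1] == masked_after[1]

# Hypothetical lookup table (hash -> original value), kept away from the
# masked data, for mapping hashes back later.
key <- unique(data.frame(hash = masked_before, value = before,
                         stringsAsFactors = FALSE))
recovered <- key$value[match(masked_after[2], key$hash)]
recovered
```

Converting to character first also means the result does not depend on how digest treats factors when serialize = FALSE.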