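R: converting email addresses to unique integers

R beginner here with what seems like a very simple question. I have some email logs, which I have read into R in this format: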
>log1
Date Time From To
1 2000-01-01 00:00:00 [email protected] [email protected]
2 2000-01-02 01:00:00 carolyn@mail.com [email protected]
3 2000-01-03 02:00:00 [email protected] [email protected]
4 2000-01-04 03:00:00 chris@mail.com [email protected]
5 2000-01-05 04:00:00 [email protected] [email protected]
6 2000-01-06 05:00:00 [email protected] [email protected]
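I need to change log1$From and log1$To into globally unique numeric identifiers, so that any given email address read later from another log receives the same identifier it had in the earlier log.
I have tried: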
id <- as.numeric(as.character(log1[,3]))
id <- as.numeric(levels(log1[,3]))
id <- charToRaw(log1[,4]), base=16)
Would some kind soul please help me? Thanks!
Apologies, I should probably have included this:
Date=c("01/01/2000" ,"02/01/2000" ,"03/01/2000", "04/01/2000" ,"05/01/2000" ,"06/01/2000","07/01/2000","08/01/2000",
"09/01/2000","10/01/2000","11/01/2000", "12/01/2000" ,"13/01/2000", "14/01/2000", "15/01/2000","16/01/2000"
,"17/01/2000","18/01/2000","19/01/2000","20/01/2000","01/01/2000","02/01/2000")
Time=c("00:00:00","01:00:00","02:00:00", "03:00:00" ,"04:00:00" ,"05:00:00", "06:00:00" ,"07:00:00", "08:00:00", "09:00:00" ,"10:00:00",
"11:00:00", "12:00:00","13:00:00", "14:00:00","15:00:00","16:00:00","17:00:00","18:00:00","19:00:00","00:00:00" ,"00:00:00")
From=c("[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]")
To=c("[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]",
"[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]")
log<-data.frame(Date=Date,Time=Time,From=From,To=To)
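Here is my attempt at generating globally unique identifiers with MD5. Note how the identifier for [email protected] is matched correctly within ID_to, which was not the case for ID_from with the levels/factor approach: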
library(digest)  # provides hmac()
ID_to<-data.frame()
ID_from<-data.frame()
for (i in 1:nrow(log)){
to<-as.numeric(paste('0x', substr(rep(hmac('secret',log[i,4], algo='md5'), 2), c(1, 9, 17, 25), c(8, 16, 24, 32)),sep=""))
(ID_to<-rbind(ID_to,to))
from<-as.numeric(paste('0x', substr(rep(hmac('secret',log[i,3], algo='md5'), 2), c(1, 9, 17, 25),c(8, 16, 24, 32)),sep=""))
(ID_from<-rbind(ID_from,from))
}
ID_to[,3]<-paste(ID_to[,1],ID_to[,2], sep="")
ID_from[,3]<-paste(ID_from[,1],ID_from[,2], sep="")
edgelist<-data.frame(ID_from[,3],log[,3],ID_to[,3],log[,4],log[,1],log[,2])
print(edgelist)
ID_from...3. log...3. ID_to...3. log...4. log...1. log...2.
27488842661591306920 [email protected] 18727221862165338513 [email protected] 01/01/2000 00:00:00
38124472891255273775 [email protected] 1251903296725454474 [email protected] 02/01/2000 01:00:00
29070047663451376630 [email protected] 17074276751156451031 [email protected] 03/01/2000 02:00:00
8261398433828474582 [email protected] 1563683670909194033 [email protected] 04/01/2000 03:00:00
18727221862165338513 [email protected] 26735368323826533112 [email protected] 05/01/2000 04:00:00
5680838251168988404 [email protected] 2923605896229594830 [email protected] 06/01/2000 05:00:00
2351312285811012730 [email protected] 17171333544033270402 [email protected] 07/01/2000 06:00:00
328278708432069254 [email protected] 33840664403556851587 [email protected] 08/01/2000 07:00:00
1127901879852039037 [email protected] 1973548136161209824 [email protected] 09/01/2000 08:00:00
7349515121496417787 [email protected] 5680838251168988404 [email protected] 10/01/2000 09:00:00
27488842661591306920 [email protected] 328278708432069254 [email protected] 11/01/2000 10:00:00
38124472891255273775 [email protected] 1127901879852039037 [email protected] 12/01/2000 11:00:00
29070047663451376630 [email protected] 27488842661591306920 [email protected] 13/01/2000 12:00:00
8261398433828474582 [email protected] 38124472891255273775 [email protected] 14/01/2000 13:00:00
18727221862165338513 [email protected] 29070047663451376630 [email protected] 15/01/2000 14:00:00
5680838251168988404 [email protected] 8261398433828474582 [email protected] 16/01/2000 15:00:00
2351312285811012730 [email protected] 2351312285811012730 [email protected] 17/01/2000 16:00:00
328278708432069254 [email protected] 7349515121496417787 [email protected] 18/01/2000 17:00:00
1127901879852039037 [email protected] 41762759923562968495 [email protected] 19/01/2000 18:00:00
7349515121496417787 [email protected] 24894056753582090007 [email protected] 20/01/2000 19:00:00
27488842661591306920 [email protected] 18727221862165338513 [email protected] 01/01/2000 00:00:00
27488842661591306920 [email protected] 18727221862165338513 [email protected] 02/01/2000 00:00:00
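One detail of the loop above may be worth checking: paste() joins two decimal numbers of varying width, so two different pairs of chunks can in principle produce the same identifier string. The sketch below uses made-up chunk values (they are not taken from the log) to show the ambiguity and one possible fix, zero-padding each chunk to a fixed width of 10 digits, which is enough for any value below 2^32.

```r
# Two hypothetical pairs of 32-bit chunks (illustrative values only)
a <- c(12, 123)
b <- c(345, 45)

# Plain paste() gives the same string for both pairs
naive <- paste(a, b, sep = "")

# Zero-pad each chunk to 10 digits before joining, so widths are fixed
padded <- paste(sprintf("%010.0f", a), sprintf("%010.0f", b), sep = "")

print(naive)
print(padded)
```

Keeping the identifier as a character string also avoids the loss of precision that would occur if a 20-digit value were converted to a double.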
I also tried the following, and got an error:
log <- union(levels(log[,3]), levels(log[,4]))
>Error in emails[, 3] : incorrect number of dimensions
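The message is consistent with the data frame having been overwritten by the plain character vector that union() returns, after which [, 3] indexing no longer works; that reading is only a guess from the output shown. Separately, since R 4.0.0 data.frame() no longer converts strings to factors by default, so levels() on these columns may return NULL. A minimal base R sketch of the shared-lookup idea follows. The helper assign_ids(), the registry vector, and the placeholder addresses are all hypothetical names introduced for illustration.

```r
# Small hypothetical logs; the addresses are placeholders, not the real data
log_a <- data.frame(From = c("ann@mail.com", "bob@mail.com"),
                    To   = c("bob@mail.com", "cat@mail.com"),
                    stringsAsFactors = FALSE)
log_b <- data.frame(From = c("cat@mail.com", "dan@mail.com"),
                    To   = c("ann@mail.com", "cat@mail.com"),
                    stringsAsFactors = FALSE)

# assign_ids() is a hypothetical helper: it appends any unseen addresses to
# the registry and adds integer ID columns based on position in the registry
assign_ids <- function(log_df, registry = character(0)) {
  seen <- unique(c(as.character(log_df$From), as.character(log_df$To)))
  registry <- c(registry, setdiff(seen, registry))
  log_df$From_id <- match(as.character(log_df$From), registry)
  log_df$To_id   <- match(as.character(log_df$To), registry)
  list(log = log_df, registry = registry)
}

res_a <- assign_ids(log_a)
res_b <- assign_ids(log_b, res_a$registry)  # reuse the registry from log_a

print(res_a$log)
print(res_b$log)
```

Because new addresses are only ever appended, an address keeps its identifier in every later log as long as the same registry is passed along. The registry could be stored between sessions with saveRDS() and reloaded with readRDS().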
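I don't know R well, but from what you describe, you are looking for a unique identifier for the combination of From and To email addresses. You could try creating a hash of their concatenation. R seems to have some hash functions, so you could give that a try. – Gangadhar 2012-03-18 15:09:43
Thanks for the input, guys. Surely there is a simpler solution than implementing a checksum or hashmap?! – 2012-03-18 15:41:18
As long as you get a unique identifier for each input, you can use any algorithm (md5, sha, crc, ...). – blejzz 2012-03-18 16:07:51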
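To illustrate the point made in the comments, here is a minimal base R sketch of a deterministic numeric identifier computed from the address alone, so the same address gets the same number in any log without a stored lookup table. It uses a simple polynomial rolling hash, which is a stand-in chosen for brevity and not MD5, SHA, or CRC. The helper email_hash() and the placeholder addresses are hypothetical. With only about 31 bits of output, collisions become plausible once the number of distinct addresses reaches the tens of thousands, so a wider hash would be the safer choice for large logs.

```r
# email_hash() is a hypothetical helper, not a function from any package.
# It folds the character codes of each string into a number below 2^31 - 1.
email_hash <- function(address) {
  modulus <- 2147483647  # 2^31 - 1; intermediate values stay exact in a double
  vapply(address, function(s) {
    h <- 0
    for (code in utf8ToInt(s)) {
      h <- (h * 31 + code) %% modulus
    }
    h
  }, numeric(1), USE.NAMES = FALSE)
}

# The same address hashes to the same number regardless of which log it is in
ids_first_log  <- email_hash(c("ann@mail.com", "bob@mail.com"))
ids_second_log <- email_hash(c("bob@mail.com", "ann@mail.com"))

print(ids_first_log)
print(ids_second_log)
```

The trade-off against a stored lookup table is that a hash needs no shared state, but it cannot guarantee uniqueness, so checking for duplicated identifiers among distinct addresses is a sensible precaution.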