2017-09-06 163 views
0

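Add a column labelling US states with their US Census region

I have data containing emailaddress and states, and I want to create a column that labels each state with its region. In SQL I did this with a CASE statement, but what is the best way to do it in R? I am using the regions as defined by the US Census (as of 2017).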

My starting data looks like this:

emailaddress  states 
[email protected] NV  
[email protected] CA  
[email protected]  UT  
[email protected] AZ  
[email protected]  IA  

The result I want is:

emails   states regions 
[email protected] NV  West 
[email protected] CA  West 
[email protected]  UT  West 
[email protected] AZ  West 
[email protected]  IA  Midwest 

I then want to write this output to a CSV file.

Any help or a starting point is appreciated.

+1

Maybe you need 'split(df1$states, df1$regions)', or you need a separate column per region and then use 'dcast', i.e. 'library(data.table); dcast(setDT(df1), rowid(regions) ~ regions, value.var = "states")' – akrun

+0

@akrun Thank you, that got me started, but I have a quick question: how would I group these states into regions? The regions column is the output I want. – sim

+0

I think the best option is to build a 'list' using 'split', as mentioned in my comment above – akrun

Answer

2

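As usual, the hard part is gathering the data, but I happened to have it archived from the US Census. First run the "State/Region data" section further below, then run the following lines: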

df <- data.frame(emails=c("[email protected]","[email protected]","[email protected]", 
          "[email protected]","[email protected]"), 
       states=c("NV","CA","UT","AZ","IA")) 

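# Exact lookup: pick the region whose reference vector contains the state. 
# (grep() matches partially, so e.g. "Kansas" would also hit "Arkansas".) 
df$regions <- sapply(as.character(df$states), 
       function(x) names(region.list)[sapply(region.list, function(r) x %in% r)], 
       USE.NAMES=FALSE) 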

#Then write to desktop, for example, with: 
write.csv(df,"~/Desktop/nameHere.csv",row.names=FALSE) 

Output:

  emails states regions 
1 [email protected]  NV West 
2 [email protected]  CA West 
3 [email protected]  UT West 
4 [email protected]  AZ West 
5 [email protected]  IA Midwest 

State/Region data:

NE.name <- c("Connecticut","Maine","Massachusetts","New Hampshire", 
      "Rhode Island","Vermont","New Jersey","New York", 
      "Pennsylvania") 
NE.abrv <- c("CT","ME","MA","NH","RI","VT","NJ","NY","PA") 
NE.ref <- c(NE.name,NE.abrv) 

MW.name <- c("Indiana","Illinois","Michigan","Ohio","Wisconsin", 
      "Iowa","Kansas","Minnesota","Missouri","Nebraska", 
      "North Dakota","South Dakota") 
MW.abrv <- c("IN","IL","MI","OH","WI","IA","KS","MN","MO","NE", 
      "ND","SD") 
MW.ref <- c(MW.name,MW.abrv) 

S.name <- c("Delaware","District of Columbia","Florida","Georgia", 
      "Maryland","North Carolina","South Carolina","Virginia", 
      "West Virginia","Alabama","Kentucky","Mississippi", 
      "Tennessee","Arkansas","Louisiana","Oklahoma","Texas") 
S.abrv <- c("DE","DC","FL","GA","MD","NC","SC","VA","WV","AL", 
      "KY","MS","TN","AR","LA","OK","TX") 
S.ref <- c(S.name,S.abrv) 

W.name <- c("Arizona","Colorado","Idaho","New Mexico","Montana", 
      "Utah","Nevada","Wyoming","Alaska","California", 
      "Hawaii","Oregon","Washington") 
W.abrv <- c("AZ","CO","ID","NM","MT","UT","NV","WY","AK","CA", 
      "HI","OR","WA") 
W.ref <- c(W.name,W.abrv) 

region.list <- list(
    Northeast=NE.ref, 
    Midwest=MW.ref, 
    South=S.ref, 
    West=W.ref) 
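
The lookup above works row by row with sapply. A vectorized alternative is to flatten the region data into a named vector and index it directly. The following is a minimal sketch, assuming the states column holds only two-letter abbreviations; 'region.abrv' and 'region.lookup' are hypothetical names introduced here, not part of the original answer.

```r
# Abbreviation-only copy of the region data (same Census grouping as above)
region.abrv <- list(
  Northeast = c("CT","ME","MA","NH","RI","VT","NJ","NY","PA"),
  Midwest   = c("IN","IL","MI","OH","WI","IA","KS","MN","MO","NE","ND","SD"),
  South     = c("DE","DC","FL","GA","MD","NC","SC","VA","WV","AL",
                "KY","MS","TN","AR","LA","OK","TX"),
  West      = c("AZ","CO","ID","NM","MT","UT","NV","WY","AK","CA",
                "HI","OR","WA"))

# Named character vector: names are abbreviations, values are region names
region.lookup <- setNames(rep(names(region.abrv), lengths(region.abrv)),
                          unlist(region.abrv, use.names = FALSE))

# Indexing by name is vectorized; codes not in the lookup come back as NA
states  <- c("NV","CA","UT","AZ","IA","ZZ")
regions <- unname(region.lookup[states])
```

With a data frame this becomes 'df$regions <- unname(region.lookup[as.character(df$states)])'. Unknown or misspelled codes show up as NA instead of silently breaking the column, which makes them easy to find with 'is.na()'.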
+0

In my data I have a thousand emails. At the start of your answer you typed the emails in, so how would I enter all of mine? – sim

+0

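@sim - What format is your data in? Is it a text file, a CSV? Please search for read.csv() and look at pages like [this one](https://stackoverflow.com/questions/3391880/how-to-get-a-csv-file-into-r) to "read in" your data to R. You don't have to type the emails in manually. The sample data above is only there to demonstrate my answer. – www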

+0

My data is a CSV file – sim
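
For data that already lives in a CSV file, the whole round trip suggested in the comments might look like the sketch below. Everything specific in it is an assumption for illustration: the temporary file paths, the example.com addresses, and the three-state 'lookup' vector, which stands in for the full region data from the answer.

```r
# Hypothetical input file standing in for the real CSV of emails and states
in.file  <- tempfile(fileext = ".csv")
out.file <- tempfile(fileext = ".csv")
writeLines(c("emailaddress,states",
             "a@example.com,NV",
             "b@example.com,IA",
             "c@example.com,NY"), in.file)

# Read every row at once; nothing has to be typed in by hand
df <- read.csv(in.file, stringsAsFactors = FALSE)

# Stand-in lookup covering only the states in this toy file;
# in practice build it from the full region data in the answer
lookup <- c(NV = "West", IA = "Midwest", NY = "Northeast")
df$regions <- unname(lookup[df$states])

# Write the labelled data back out without row names
write.csv(df, out.file, row.names = FALSE)
```

In real use, 'in.file' and 'out.file' would be replaced with the actual paths, and the 'writeLines' step dropped.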