R: using grep to identify values and build a binary indicator from them

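My question is about improving the efficiency/elegance of my code. I have a df with a list of drugs. I want to identify the drugs whose codes begin with C09 or C10. If a person has one of these drugs, I want to give them a binary indicator (1 = yes, 0 = no) of whether they have it. The binary indicator will go in a new column called "statins" in the same data frame. I used this post as a guide: What's the R equivalent of SQL's LIKE 'description%' statement?.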

Here is what I did:

names<-c("tom", "mary", "mary", "john", "tom", "john", "mary", "tom", "mary", "tom", "john") 
drugs<-c("C10AA05", "C09AA03", "C10AA07", "A02BC01", "C10AA05", "C09AA03", "A02BC01", "C10AA05", "C10AA07", "C07AB03", "N02AA01") 
df<-data.frame(names, drugs) 
df 

    names drugs 
1 tom C10AA05 
2 mary C09AA03 
3 mary C10AA07 
4 john A02BC01 
5 tom C10AA05 
6 john C09AA03 
7 mary A02BC01 
8 tom C10AA05 
9 mary C10AA07 
10 tom C07AB03 
11 john N02AA01 


# rows whose drug code starts with C10
ptn = '^C10.*?' 
get_statin = grep(ptn, df$drugs, perl=TRUE) 
stats<-df[get_statin,] 
stats 

names drugs 
1 tom C10AA05 
3 mary C10AA07 
5 tom C10AA05 
8 tom C10AA05 
9 mary C10AA07 


# rows whose drug code starts with C09
ptn2='^C09.*?' 
get_other=grep(ptn2, df$drugs, perl=TRUE) 
other<-df[get_other,] 
other 

    names drugs 
2 mary C09AA03 
6 john C09AA03 

df$statins=ifelse(df$drugs %in% stats$drugs,1,0) 
df 

    names drugs statins 
1 tom C10AA05  1 
2 mary C09AA03  0 
3 mary C10AA07  1 
4 john A02BC01  0 
5 tom C10AA05  1 
6 john C09AA03  0 
7 mary A02BC01  0 
8 tom C10AA05  1 
9 mary C10AA07  1 
10 tom C07AB03  0 
11 john N02AA01  0 


df$statins=ifelse(df$drugs %in% other$drugs,1,df$statins) 
df 

    names drugs statins 
1 tom C10AA05  1 
2 mary C09AA03  1 
3 mary C10AA07  1 
4 john A02BC01  0 
5 tom C10AA05  1 
6 john C09AA03  1 
7 mary A02BC01  0 
8 tom C10AA05  1 
9 mary C10AA07  1 
10 tom C07AB03  0 
11 john N02AA01  0

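So I can get what I want, but I feel there is probably a better, neater way to do this, and I would appreciate any guidance. One obvious solution, which I can sense you shouting at the screen, is to just use '^C' as the pattern and so catch every drug that begins with C. I can't do that in my main analysis, because '^C' would in some cases catch things I don't want, so I need to narrow the pattern down as much as possible.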
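To illustrate why '^C' is too broad, here is a minimal sketch using a few codes taken from the example data above. The variable names `broad` and `narrow` are made up for this illustration and are not part of the original code.

```r
# A few drug codes from the example data
drugs <- c("C10AA05", "C09AA03", "C07AB03", "A02BC01")

# Broad pattern: anything starting with C
broad <- grepl("^C", drugs)

# Narrow pattern: only codes starting with C09 or C10
narrow <- grepl("^C(09|10)", drugs)

# Codes the broad pattern catches but the narrow one does not
drugs[broad & !narrow]
```

With this data the only extra match is C07AB03, which is the kind of unwanted hit described above.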

Thanks in advance.

Source

2013-06-28 user2363642

Here you go:

transform(df, statins=as.numeric(grepl('^C(10|09)', drugs)))
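As a rough step-by-step sketch of what that one-liner does, the same result can be built in base R without `transform`. The intermediate name `is_statin` is an assumption introduced here for readability; the data is the example from the question.

```r
# Rebuild the example data from the question
names <- c("tom", "mary", "mary", "john", "tom", "john", "mary", "tom", "mary", "tom", "john")
drugs <- c("C10AA05", "C09AA03", "C10AA07", "A02BC01", "C10AA05", "C09AA03",
           "A02BC01", "C10AA05", "C10AA07", "C07AB03", "N02AA01")
df <- data.frame(names, drugs)

# grepl() returns one TRUE/FALSE per element,
# whereas grep() returns the indices of the matching elements
is_statin <- grepl("^C(10|09)", df$drugs)

# Convert the logical vector to a 1/0 indicator column
df$statins <- as.integer(is_statin)
df$statins
```

The alternation `(10|09)` sits inside the pattern string, so a single call covers both prefixes and the two separate `grep` passes plus the `ifelse`/`%in%` steps are no longer needed.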

Source

2013-06-28 19:41:30

Is there any added value in using transform rather than 'data.frame'? I'm asking for my own understanding. – dayne

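Nope, I'm just used to using it. You could also do the same thing with 'within' (replacing '=' with '<-'). Of course, 'transform' and 'within' let you modify existing columns, whereas 'data.frame' won't. –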
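For comparison, a minimal sketch of the `within` variant mentioned in the comment above, run on a three-row subset of the example data. The names `df_small`, `via_within` and `via_transform` are invented for this illustration.

```r
# Small subset of the example data
df_small <- data.frame(
  names = c("tom", "mary", "john"),
  drugs = c("C10AA05", "A02BC01", "C09AA03")
)

# within() evaluates an assignment expression inside the data frame, hence '<-'
via_within <- within(df_small, statins <- as.numeric(grepl("^C(10|09)", drugs)))

# transform() takes name = value arguments, hence '='
via_transform <- transform(df_small, statins = as.numeric(grepl("^C(10|09)", drugs)))

# Both approaches should yield the same indicator column
identical(via_within$statins, via_transform$statins)
```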

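Brilliant! Thank you Matthew! Would you mind if I ask you something else? If I wanted to combine drugs beginning with N and A using your code, how would I go about it? I tried transform(df, statins=as.numeric(grepl('^N02.*?'|'A02B.*?', drugs))) ... but it gave me an error: Error in "^N02.*?" | "A02B.*?" : operations are possible only for numeric, logical or complex types – user2363642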
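The error in the comment above most likely arises because `|` is being applied to two character strings as R's logical OR, rather than acting as alternation inside a single regular expression. A minimal sketch of two possible fixes follows, assuming the intent is to flag codes that start with N02 or A02B (the original attempt did not anchor the second pattern, so the `^` on A02B is an assumption). The names `ptn`, `flag` and `either` are illustrative only.

```r
# A few drug codes from the example data
drugs <- c("C10AA05", "C09AA03", "A02BC01", "C07AB03", "N02AA01")

# Option 1: one pattern string, with the | inside the regex
ptn <- "^(N02|A02B)"
flag <- as.numeric(grepl(ptn, drugs))
flag

# Option 2: two grepl() calls combined with R's logical |,
# which works because each call returns a logical vector
either <- grepl("^N02", drugs) | grepl("^A02B", drugs)
```

Either form could be dropped into the `transform` call in place of the original `grepl` expression.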
