Remove rows whose column values contain more than 2 of 4 unique characters

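Hopefully the wording of the title makes sense. I have a data frame made up of the values "A", "B", "C", "D", "" and "A/B". I want to identify which rows contain only 2 of "A", "B", "C" or "D". The frequency of each letter does not matter. I just want to know whether more than 2 of these 4 letters occur in the row.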

Here is a sample data frame:

    df.sample = as.data.frame(rbind(c("A","B","A","A/B","B","B","B","B","","B"),c("A","B","C","A","B","","","B","","B"),c("A","B","D","D","B","B","B","B","","B"),c("A","B","A","A","B","B","B","B","B","B")))
    df.sample

      V1 V2 V3  V4 V5 V6 V7 V8 V9 V10
    1  A  B  A A/B  B  B  B  B      B
    2  A  B  C   A  B        B      B
    3  A  B  D   D  B  B  B  B      B
    4  A  B  A   A  B  B  B  B  B   B

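I would like a function that determines, for each row, which of the 4 letters ("A", "B", "C" or "D") are present: not the frequency of each, but essentially just a 0 or 1 value for each of "A", "B", "C" and "D". If the sum of these 4 values is greater than 2, I would like to assign the index of that row to a new vector, which will then be used to remove those rows from the data frame.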

    myfun <- function(x) {
      # Which rows contain more than 2 different letters out of A, B, C and D?
      # The number of times each letter occurs in a given row does not matter.
      # What matters is whether a row contains more than 2 of the 4 letters.
      # Each row should contain only 2 of them; the combination does not matter.

      out = which(something > 2)  # 'something' is a placeholder for the missing logic
    }

    row.indexes = apply(df.sample,1,function(x) myfun(x)) #Return a vector of row indexes that contain more than 2 of the 4 letters. 

    new.df.sample = df.sample[-row.indexes,] #create new data frame excluding rows containing more than 2 of the 4 letters.

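In df.sample above, rows 2 and 3 contain more than 2 of the 4 letters and should therefore be indexed for removal. After running df.sample through the function and removing the rows in row.indexes, my new.df.sample data frame should look like this: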

      V1 V2 V3  V4 V5 V6 V7 V8 V9 V10
    1  A  B  A A/B  B  B  B  B      B
    4  A  B  A   A  B  B  B  B  B   B

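I tried to think of this as a logical statement for each of the 4 letters: assign a 0 or 1 to each letter, sum them, and then identify which sums are greater than 2. For example, I thought I could try 'grep()', convert the result to a logical for each letter, then convert that to 0 or 1 and add them up. This seemed too verbose, and it did not work the way I tried it. Any ideas?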
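The per-letter 0/1 idea described above can be sketched roughly as follows. This is only one possible reading of it, not the asker's actual attempt: it swaps 'grep()' for 'grepl()' so that each test returns TRUE/FALSE directly, and the names 'letters4', 'has_letter' and 'n_letters' are made up for illustration. Because 'grepl()' matches substrings, a cell such as "A/B" counts toward both A and B.

```r
# Sample data from the question
df.sample <- as.data.frame(rbind(
  c("A","B","A","A/B","B","B","B","B","","B"),
  c("A","B","C","A","B","","","B","","B"),
  c("A","B","D","D","B","B","B","B","","B"),
  c("A","B","A","A","B","B","B","B","B","B")
), stringsAsFactors = FALSE)

letters4 <- c("A", "B", "C", "D")  # hypothetical name for the 4 letters of interest

# One column per letter: TRUE if any cell in the row contains that letter
has_letter <- sapply(letters4, function(l) {
  apply(df.sample, 1, function(row) any(grepl(l, row, fixed = TRUE)))
})

# Summing the TRUE/FALSE values gives the number of distinct letters per row
n_letters <- rowSums(has_letter)   # 2 3 3 2

row.indexes <- which(n_letters > 2)          # rows 2 and 3
new.df.sample <- df.sample[-row.indexes, ]   # rows 1 and 4 remain
```

Note that the negative index in the last line is only safe when at least one row is flagged.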


2014-01-22 SC2

How should 'A/B' be handled? –

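For A/B, ignore the fact that it is "A/B" and only check whether the value contains A, B, C or D. The value in a cell does not have to match exactly; it only has to contain one of the values I am looking for. For example, if the A/B in row 1 were actually A/C, that row would be indexed for removal, but because it is A/B, the row stays. – SC2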
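To make the A/B versus A/C example concrete, here is a small sketch that counts the distinct letters found anywhere in a row's cells. It uses 'regmatches()' with 'gregexpr()' to pull out every A, B, C or D; 'count_letters', 'row1' and 'row1.alt' are hypothetical names introduced only for this illustration.

```r
# Count the distinct letters among A-D that appear anywhere in a row's cells
# (count_letters is a hypothetical helper, not from the thread)
count_letters <- function(row) {
  found <- unlist(regmatches(row, gregexpr("[ABCD]", row)))
  length(unique(found))
}

row1     <- c("A","B","A","A/B","B","B","B","B","","B")  # row 1 of df.sample
row1.alt <- replace(row1, 4, "A/C")                       # same row with A/B swapped for A/C

count_letters(row1)      # 2: only A and B occur, so the row stays
count_letters(row1.alt)  # 3: A, B and C occur, so the row would be removed
```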

Here is a function for this task. It returns a logical value: TRUE indicates a row with more than two different strings:

myfun <- function(x) { 
    sp <- unlist(strsplit(x, "/")) 
    length(unique(sp[sp %in% c("A", "B", "C", "D")])) > 2 
} 

row.indexes <- apply(df.sample, 1, myfun) 
# [1] FALSE TRUE TRUE FALSE 

new.df.sample <- df.sample[!row.indexes, ] # negate the index with '!' 

#   V1 V2 V3  V4 V5 V6 V7 V8 V9 V10
# 1  A  B  A A/B  B  B  B  B      B
# 4  A  B  A   A  B  B  B  B  B   B
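One reason the logical index negated with '!' is a safer choice than the 'which()' plus negative index pattern from the question: if no row is flagged, 'which()' returns an empty vector, and indexing with '-integer(0)' selects zero rows instead of all of them. A minimal sketch, using a made-up data frame ('df.ok' and 'flag' are hypothetical, not from the thread):

```r
# Hypothetical data frame in which no row has more than 2 of the 4 letters
df.ok <- data.frame(V1 = c("A", "B"), V2 = c("B", "B"), stringsAsFactors = FALSE)

# Logical index with nothing flagged for removal
flag <- c(FALSE, FALSE)

kept.logical  <- df.ok[!flag, ]          # keeps both rows
kept.negative <- df.ok[-which(flag), ]   # -integer(0) selects nothing

nrow(kept.logical)    # 2
nrow(kept.negative)   # 0
```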


2014-01-22 15:25:01

See, I knew it was much simpler than what I was trying. Perfect, thanks! – SC2

@SC2 I updated the function. It now also works for the 'A/B' case. –
