2014-02-28
1

How do I remove all NAs from the strings in a dataframe column in R?

I have a CSV file like this:

LocationList,Identity,Category
"New York,New York,United States","42","S"
"NA,California,United States","89","lyt"
"Hartford,Connecticut,United States","879","polo"
"San Diego,California,United States","45454","utyr"
"Seattle,Washington,United States","uytr","69"
"NA,NA,United States","87","tree"

I want to remove every 'NA' from the 'LocationList' column.

Desired result:

LocationList,Identity,Category 
"New York,New York,United States","42","S" 
"California,United States","89","lyt" 
"Hartford,Connecticut,United States","879","polo" 
"San Diego,California,United States","45454","utyr" 
"Seattle,Washington,United States","uytr","69" 
"United States","87","tree" 

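The number of columns is not fixed and may increase or decrease. Also, I want to write the CSV file without quotes and without any escaping for the 'LocationList' column.

How can I do this in R? I'm new to R, so any help is appreciated.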

+0

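Removing the NAs the way you suggest would make the header information wrong. Do you want the NAs replaced by a blank or a space? If you really want to delete them, there are ways to do it, but I would like to know how the result will be used afterwards. If this is a CSV and the desired output is also a CSV, can't you simply use any text processor to replace 'NA,' with '' (nothing) and drop the quotes (")? – Ananta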

+1

@Ananta The format is like 'NA,NA,United States' in the 'LocationList' column. I don't see how that makes the header information wrong? – user3188390

+0

Oops, my bad. Then 'df$LocationList <- gsub("NA,", "", df$LocationList)' should do it, and when writing, use the 'write.table' argument 'quote = FALSE'. – Ananta

Answers

1

Try:

my.data <- read.table(text='LocationList,Identity,Category 
         "New York,New York,United States","42","S" 
         "NA,California,United States","89","lyt" 
         "Hartford,Connecticut,United States","879","polo" 
         "San Diego,California,United States","45454","utyr" 
         "Seattle,Washington,United States","uytr","69" 
         "NA,NA,United States","87","tree"', header=TRUE, sep=",")
my.data$LocationList <- gsub("NA,", "", my.data$LocationList) 
my.data 
#                         LocationList Identity Category
# 1    New York,New York,United States       42        S
# 2           California,United States       89      lyt
# 3 Hartford,Connecticut,United States      879     polo
# 4 San Diego,California,United States    45454     utyr
# 5   Seattle,Washington,United States     uytr       69
# 6                      United States       87     tree
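
One caveat about the plain "NA," pattern: it matches anywhere in the string, so it also clips a name that merely ends in the letters NA. Here is a minimal sketch of that, with a made-up vector x (the ARIZONA and Boston rows are hypothetical, not from the question) and a Perl-style lookbehind that only accepts NA when it starts a comma-separated component:

```r
# Hypothetical values; only the first two come from the question's data.
x <- c("NA,California,United States",
       "NA,NA,United States",
       "Phoenix,ARIZONA,United States",
       "Boston,NA,United States")

# Plain pattern: also eats the "NA," at the end of "ARIZONA,".
naive <- gsub("NA,", "", x)

# (?<![^,]) means "not preceded by a non-comma character", i.e. NA must sit
# at the start of the string or right after a comma.
anchored <- gsub("(?<![^,])NA,", "", x, perl = TRUE)

print(naive)
print(anchored)
```

This still leaves a trailing component such as ",NA" at the end of a string untouched, so treat it as a starting point rather than a complete cleaner.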

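If you drop the quotes when you write a conventional CSV file, you won't be able to read the data back in later. That's because every value of the LocationList variable contains commas, so you would have commas both in the middle of fields and marking the breaks between fields. You could instead use write.csv2(), which uses a semicolon (;) to indicate a new field. You could use: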

write.csv2(my.data, file="myFile.csv", quote=FALSE, row.names=FALSE) 

which yields the following file:

LocationList;Identity;Category 
New York,New York,United States;42;S 
California,United States;89;lyt 
Hartford,Connecticut,United States;879;polo 
San Diego,California,United States;45454;utyr 
Seattle,Washington,United States;uytr;69 
United States;87;tree 
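
To check that a semicolon-separated file really can be read back, here is a small round-trip sketch. The two-row data frame df and the temporary file are stand-ins I made up for illustration; colClasses = "character" is my own choice so that Identity stays text instead of being converted to numbers.

```r
# Small stand-in for my.data (assumption: all columns are plain character).
df <- data.frame(LocationList = c("New York,New York,United States", "United States"),
                 Identity     = c("42", "87"),
                 Category     = c("S", "tree"),
                 stringsAsFactors = FALSE)

tmp <- tempfile(fileext = ".csv")
write.csv2(df, file = tmp, quote = FALSE, row.names = FALSE)

# Inspect the raw file, then read it back with the matching reader.
lines <- readLines(tmp)
back  <- read.csv2(tmp, colClasses = "character")

print(lines)
print(back$LocationList)
```

The commas inside LocationList survive because the field separator is now ";". If a value ever contained a semicolon, the same ambiguity would come back.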

I now notice that the Identity and Category values in row 5 are presumably mixed up. You may want to swap them before writing the file:

# swap the Identity (column 2) and Category (column 3) values in row 5
x             <- my.data[5, 2]
my.data[5, 2] <- my.data[5, 3]
my.data[5, 3] <- x
rm(x)
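
The same swap can be written without a temporary variable, since the right-hand side is evaluated before the assignment happens. This is only a sketch on a made-up two-row data frame df, and it assumes both columns are character; if they are factors (the default for text columns in R before 4.0), a value that is not among the target column's levels becomes NA.

```r
# Hypothetical two-row excerpt; row 2 has Identity and Category mixed up.
df <- data.frame(Identity = c("45454", "uytr"),
                 Category = c("utyr", "69"),
                 stringsAsFactors = FALSE)

# Values are assigned by position, so listing the columns in reverse order
# on the right-hand side swaps them for that row.
df[2, c("Identity", "Category")] <- df[2, c("Category", "Identity")]

print(df)
```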
+0

' )'? – Ananta

+0

Good point, @Ananta. I've changed that; I think I defaulted to 'lapply()'. – gung

+0

Thanks for your answer, it gave me a good idea; I'm still learning R. – user3188390

2

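In this case, you just need to replace 'NA,' with nothing. However, this is not the standard way of removing NA values.

Assuming dat is your data, use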

dat$LocationList <- gsub("^(NA,)+", "", dat$LocationList) 
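
The anchored pattern above only strips NAs at the start of the string. If an NA can show up in the middle or at the end, one alternative is to split on commas and drop the "NA" components. A minimal sketch follows; drop_na_parts is a helper name I made up, and the Boston and Springfield rows are hypothetical.

```r
# Hypothetical helper: remove every comma-separated component equal to "NA".
drop_na_parts <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)
  vapply(parts,
         function(p) paste(p[p != "NA"], collapse = ","),
         character(1))
}

dat <- data.frame(LocationList = c("NA,California,United States",
                                   "NA,NA,United States",
                                   "Boston,NA,United States",
                                   "Springfield,Illinois,NA"),
                  stringsAsFactors = FALSE)

dat$LocationList <- drop_na_parts(dat$LocationList)
print(dat$LocationList)
```

Because it compares whole components, a name that merely contains the letters NA is left alone. It does assume that no component contains a comma of its own.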