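R: Identify and remove columns with invalid column names
Is there a way to identify invalid column names in R, perhaps using a regular expression or some other technique?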
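I am generating a DocumentTermMatrix (DTM) from a text column and then converting this DTM to a data frame. I end up with columns that have invalid names, for example: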
"node", "CLASS", "️️️️", "️️️", " de", " des", "濟devais", "5章夜", "her眼睛", "註冊會計師修德", "鬱鬱蔥蔥cosmétiques", "香港專業教育學院看到"
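When I pass this dataset to mlr::makeClassifTask, I get the following error message:
Error in makeClassifTask(data = dat, target = "CLASS"): Assertion on 'data' failed: Columns must be named according to R's variable naming rules.
So I would like to identify and remove all columns with invalid names. Something like: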
invalidColumnNames <- identify indexes of columns with invalid names
dat <- dat[,-invalidColumnNames]
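One way the pseudocode above could be made concrete is sketched below. It assumes that "invalid" means "contains anything other than letters, digits, dots, underscores, or spaces", which is a guess at the intent rather than the rule mlr actually applies. The toy data frame and the name `is_valid` are made up for illustration, and the `\p{L}` / `\p{N}` classes rely on R's PCRE build supporting Unicode properties.

```r
# Hypothetical toy data frame with a mix of plain, accented, and emoji names.
dat <- as.data.frame(matrix(0, nrow = 2, ncol = 5))
names(dat) <- c("node", "CLASS", "\u2615\u2615", " de", "z\u00e9ro")

# Assumed whitelist: Unicode letters, digits, dot, underscore, space.
is_valid <- grepl("^[\\p{L}\\p{N}._ ]+$", names(dat), perl = TRUE)

# Indexes of the columns whose names fall outside the whitelist.
invalidColumnNames <- which(!is_valid)

# Logical indexing stays correct when nothing is invalid, whereas
# dat[, -integer(0)] would select zero columns.
dat <- dat[, is_valid, drop = FALSE]
names(dat)
```

Filtering with the logical vector instead of negative indexes avoids accidentally dropping every column when `invalidColumnNames` is empty.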
Reproducible example data:
cols <- c("node", "CLASS", "️️️️", "️️️", " de", " des",
" kmh", " points", " zéro", "\u2615️\u2615️", "\u2615️",
"\u2693️\u2693️", "\u26f5️\u2693️", "\u2728\u2728\u2728\u2728\u2728",
"aaliassime", "aaron", "abaixoassinado", "abandono", "abat",
"abattu", "abiertamente", "abierto", "abit", "able", "abomination",
"abonnements", "abonnés", "abonnez", "abraham", "absolutely",
"abstract", "abused", "acaba", "acabar", "acabo", "acadiebathurst",
"acaï", "acc", "accept", "accèsloisirs", "access", "accessible",
"accessories", "accident", "accidentally", "acción", "acciones",
"accommodationsreligious", "accompli", "accomplie", "accomplir",
"accorde", "accordent", "account", "accounts", "accro", "accueil",
"accueille", "accueillir", "accurate", "accusé", "accusent",
"acérées", "acériculteur", "acha", "achat", "achei", "acheté",
"acheter", "acho", "acidités", "acknowledge", "acontecem", "acordei",
"acquis", "across", "action", "activité", "activités", "actresses",
"actualité", "actuel", "adam", "adaptation", "adapter", "added",
"addicive", "addicted", "addition", "additives", "addressed",
"adds", "adeus", "adjoint", "adjointeadministrative", "adjust",
"administratives", "adopción", "adopté", "adorable")
Desired result:
"node", "CLASS", " de", " des",
" kmh", " points", " zéro", "aaliassime", "aaron",
"abaixoassinado", "abandono", "abat",
"abattu", "abiertamente", "abierto", "abit", "able", "abomination",
"abonnements", "abonnés", "abonnez", "abraham", "absolutely",
"abstract", "abused", "acaba", "acabar", "acabo", "acadiebathurst",
"acaï", "acc", "accept", "accèsloisirs", "access", "accessible",
"accessories", "accident", "accidentally", "acción", "acciones",
"accommodationsreligious", "accompli", "accomplie", "accomplir",
"accorde", "accordent", "account", "accounts", "accro", "accueil",
"accueille", "accueillir", "accurate", "accusé", "accusent",
"acérées", "acériculteur", "acha", "achat", "achei", "acheté",
"acheter", "acho", "acidités", "acknowledge", "acontecem", "acordei",
"acquis", "across", "action", "activité", "activités", "actresses",
"actualité", "actuel", "adam", "adaptation", "adapter", "added",
"addicive", "addicted", "addition", "additives", "addressed",
"adds", "adeus", "adjoint", "adjointeadministrative", "adjust",
"administratives", "adopción", "adopté", "adorable"
Any help is greatly appreciated.
Your column names seem to work for me. See here for the restrictions on variable names in R: https://stackoverflow.com/questions/9195718/variable-name-restrictions-in-r
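To check names against R's own rules for syntactic names, a common idiom is to compare each name with the output of `make.names()`. The following is only a sketch on a shortened, made-up subset of the names from the question, and it may not match exactly what mlr checks internally. Note that under this stricter rule a name with a leading space, such as " de", also counts as invalid, and that whether accented letters are accepted depends on the current locale.

```r
# Shortened, illustrative subset of the column names.
cols <- c("node", "CLASS", " de", "\u2615", "abonnements", "if")

# A name is syntactic if make.names() leaves it unchanged.
# Reserved words such as "if" are altered as well.
is_syntactic <- make.names(cols) == cols

# Names that R does not accept as plain variable names.
cols[!is_syntactic]

# Alternative to dropping: repair the names instead.
# For example, " de" becomes "X.de"; unique = TRUE avoids duplicates.
repaired <- make.names(cols, unique = TRUE)
```

Repairing keeps the columns in the data, at the cost of names that no longer match the original terms.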