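Can't convert a character column to the factor data type in R
I'm trying to convert two columns of character data to factors so that I can analyze their levels.
The problem shows up at the end of the code. One of the two columns is handled fine: when I run `levels` on it, I see the strings I expect.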
> levels(austinCrime2014_data_selected_zips$highestOffenseDesc)
[1] "AGG ROBBERY BY ASSAULT" "AGG ROBBERY/DEADLY WEAPON" "BURG NON RESIDENCE SHEDS" "BURGLARY NON RESIDENCE"
[5] "BURGLARY OF RESIDENCE" "ROBBERY BY ASSAULT" "ROBBERY BY THREAT"
When I run `levels` on the other column, it looks like something went wrong converting the data from character to factor.
> levels(austinCrime2014_data_selected_zips$NIBRS_OffenseDesc)
[1] "Burglary/\nBreaking & Entering" "Robbery"
I'm hoping someone can help me understand what is happening here and how to fix it.
Here is the code I'm working with:
library(data.table)
library(readr)
library(dplyr)
####
#### Import 2014 neighborhood economic data
####
# Import data
austin2014_data_raw <- read_csv('https://data.austintexas.gov/resource/hcnj-rei3.csv', na = '')
glimpse(austin2014_data_raw)
nrow(austin2014_data_raw)
# Clean it: Remove NAs
austin2014_data <- na.omit(austin2014_data_raw)
nrow(austin2014_data) # now there's one less row.
columnSelection <- c("Zip Code", "Population below poverty level", "Median household income", "Unemployment", "Median rent", "Percentage of rental units in poor condition")
## Our neighborhood economic data subset
austin2014_data_selection <- subset(austin2014_data, select=columnSelection)
names(austin2014_data_selection)
# Extract the zip codes for mapping & comparison with crime data
zipCodesOfData <- austin2014_data_selection$`Zip Code`
####
#### Import crime data
####
# Import data
austinCrime2014_data_raw <- read_csv('https://data.austintexas.gov/resource/7g8v-xxja.csv', na = '')
glimpse(austinCrime2014_data_raw)
nrow(austinCrime2014_data_raw)
# Select and rename required columns
columnSelection_Crime <- c("GO Location Zip", "GO Highest Offense Desc", "Highest NIBRS/UCR Offense Description")
austinCrime_dataset <- select(austinCrime2014_data_raw, one_of(columnSelection_Crime))
names(austinCrime_dataset) <- c("zipcode", "highestOffenseDesc", "NIBRS_OffenseDesc")
glimpse(austinCrime_dataset)
nrow(austinCrime_dataset)
# Filter crime data by zipcodes available in the neighborhood economic data subset
austinCrime2014_data_selected_zips <- filter(austinCrime_dataset, zipcode %in% zipCodesOfData)
glimpse(austinCrime2014_data_selected_zips)
nrow(austinCrime2014_data_selected_zips)
typeof(austinCrime2014_data_selected_zips)
####
#### Convert our crime data subset from string/char data into factorized data so we can see levels
####
# Convert the character columns c("highestOffenseDesc", "NIBRS_OffenseDesc") into factors so we can check their levels
glimpse(austinCrime2014_data_selected_zips) # characters
cols <- c("highestOffenseDesc", "NIBRS_OffenseDesc") # columns with character datatype to convert to factor datatype
austinCrime2014_data_selected_zips[cols] <- lapply(austinCrime2014_data_selected_zips[cols], factor)
glimpse(austinCrime2014_data_selected_zips) # factors
View(austinCrime2014_data_selected_zips)
levels(austinCrime2014_data_selected_zips$highestOffenseDesc) #--> looks good
levels(austinCrime2014_data_selected_zips$NIBRS_OffenseDesc) # output is weird: "Burglary/\nBreaking & Entering" "Robbery"
The problem is that you need to clean the character data further and get rid of the \n. – Elin
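A minimal sketch of the cleanup suggested in the comment above. The vector `nibrs` is a hypothetical stand-in for the `NIBRS_OffenseDesc` column, with values assumed from the `levels()` output in the question. It suggests that the factor conversion itself succeeds and that the odd-looking level is just a string with an embedded newline, which can be stripped with base R's `gsub` before converting.

```r
# Hypothetical stand-in for the NIBRS_OffenseDesc column
# (values assumed from the levels() output shown in the question)
nibrs <- c("Burglary/\nBreaking & Entering", "Robbery",
           "Burglary/\nBreaking & Entering")

# The conversion already works: this is a factor with two levels,
# one of which happens to contain a newline character
nibrs_factor <- factor(nibrs)
is.factor(nibrs_factor)
nlevels(nibrs_factor)

# Remove the embedded newline as a literal string (fixed = TRUE),
# then convert the cleaned character vector to a factor
nibrs_clean <- gsub("\n", "", nibrs, fixed = TRUE)
nibrs_clean_factor <- factor(nibrs_clean)
levels(nibrs_clean_factor)

# Applied to the data frame from the question (not run here):
# austinCrime2014_data_selected_zips$NIBRS_OffenseDesc <- factor(
#   gsub("\n", "",
#        as.character(austinCrime2014_data_selected_zips$NIBRS_OffenseDesc),
#        fixed = TRUE))
```

Replacing the newline with `" "` instead of `""` would work just as well if a space between "Burglary/" and "Breaking" reads better; which one is preferable depends on how the labels will be displayed.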