I have a dataframe that I am trying to filter. Here is the structure:
'data.frame': 45 obs. of 1450 variables:
$ X01493112 :Factor w/ 47 levels "01493112", "0145769",...
..- attr(*, "names")= chr "510130020" "510360002"
I have a feeling I can't filter it because I have factors and characters but I cannot convert it to numeric. I have tried:
as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}
df2 <- as.numeric.factor(df1)
and numerous other conversions, but I can't figure out why it won't work. When I call the new df I get
>numeric(0)
It would help to have some example data to work with, but try:
df$your_factor_variable_now_numeric <-
as.numeric(as.character(df$your_old_factor_variable))
Use this only to convert a single factor variable, not the complete data frame. You can also have a look at type.convert. If you want to convert every column in the data frame, you can use something along the lines of
df[] <- lapply(df, function(x) as.numeric(as.character(x)))
Note that this converts every column, factor or not, and might not be what you want if some columns do not represent numeric values (their entries become NA). If unnecessary conversion is a problem, or if there are character columns that should stay as they are, restrict the conversion to factors:
numerify <- function(x) if(is.factor(x)) as.numeric(as.character(x)) else x
df[] <- lapply(df, numerify)
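To see why the detour through as.character matters, here is a minimal sketch on made-up data (the factor f and the data frame d are illustrative, not the asker's data). Calling as.numeric directly on a factor returns the internal level codes rather than the numbers the labels spell out.

```r
# Made-up factor whose labels look like numbers
f <- factor(c("10", "20", "10", "5"))

codes  <- as.numeric(f)                # internal level codes, not the labels
values <- as.numeric(as.character(f))  # the numbers the labels spell out

# type.convert() picks a suitable type for the labels (integer here)
guessed <- type.convert(as.character(f), as.is = TRUE)

# numerify as in the answer: only factor columns are converted
numerify <- function(x) if (is.factor(x)) as.numeric(as.character(x)) else x
d <- data.frame(a = f, b = c("w", "x", "y", "z"), stringsAsFactors = FALSE)
d[] <- lapply(d, numerify)

str(d)  # a is now numeric, b is still character
```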
On a more general point, the type of your variables should not prevent you from filtering, if by filtering you mean subsetting the data frame. Either way, the code above should take care of the type conversion.
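As a rough illustration of subsetting without any conversion, the sketch below filters on a factor column by comparing it with a label. The data frame d and its columns id and flow are invented for the example.

```r
# Made-up data: an ID stored as a factor and a numeric measurement
d <- data.frame(id   = factor(c("01493112", "0145769", "01493112")),
                flow = c(3.2, 1.1, 4.8))

# Compare the factor against a label (a string), not a number
one_site <- d[d$id == "01493112", ]

# subset() selects the same rows; droplevels() discards unused levels
same <- droplevels(subset(d, id == "01493112"))
```

Note that the unused level stays attached to one_site$id unless droplevels() is called.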
fun1 <- function(x) as.numeric(as.character(x))
fun2 <- function(x) as.numeric(x)
fac_to_num <- function(y) modifyList(y,lapply(y[sapply(y,is.factor)],fun1))
char_to_num <- function(y) modifyList(y,lapply(y[sapply(y,is.character)],fun2))
Apply fac_to_num to your data frame for factor -> numeric conversion, and char_to_num for character -> numeric conversion.
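If some factor or character columns hold genuine text, a blanket conversion turns them into NA. One possible guard, sketched here with a hypothetical helper to_numeric_if_clean (not part of the answer above), converts a column only when every non-missing value parses as a number.

```r
# Hypothetical helper: convert factor/character columns only if nothing is lost
to_numeric_if_clean <- function(x) {
  if (!is.factor(x) && !is.character(x)) return(x)
  chr <- as.character(x)
  num <- suppressWarnings(as.numeric(chr))
  # keep the original column if conversion would create new NAs
  if (any(is.na(num) & !is.na(chr))) x else num
}

# Made-up data frame with a numeric-looking factor, numeric-looking strings and real text
d <- data.frame(a = factor(c("10", "20")),
                b = c("1.5", "2.5"),
                c = c("x", "y"),
                stringsAsFactors = FALSE)
d[] <- lapply(d, to_numeric_if_clean)

str(d)
```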
Related
Turn the variable trans to a factor variable, of which unique values are “auto” and “manu” (Hint: use the function substr() to extract substrings in a character vector before converting to a factor vector)
unique(mpg$trans)
mpg$trans <- substr(mpg$trans, 1, 234)
mpg$trans <- factor(mpg$trans, levels = c("auto", "manu"))
str(mpg)
However, trans still doesn't work.
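A likely cause, assuming trans holds values such as "auto(l5)" and "manual(m5)" as in ggplot2's mpg data: substr(x, 1, 234) returns the whole string, so no value matches the levels "auto" and "manu" and factor() turns them all into NA. Stopping at the fourth character should give the two intended values. The vector below is a small stand-in so the sketch runs without ggplot2.

```r
# Stand-in for mpg$trans (a few made-up entries in the same format)
trans <- c("auto(l5)", "manual(m5)", "manual(m6)", "auto(av)")

# Keep only the first four characters: "auto" or "manu"
short <- substr(trans, 1, 4)

trans_f <- factor(short, levels = c("auto", "manu"))
table(trans_f)
```

If mpg$trans was already overwritten by the earlier attempt, the data would need to be reloaded before applying this.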
I have around 200 columns in my dataframe.
I am looking to convert the columns that have a data type of character into factors and then into their integer codes.
For example, Man becomes 1.
The code below works manually:
as.factor(df$colName1)
as.integer(df$colName1)
But how can we run that check for all columns in a loop and then convert them?
Thanks.
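# lapply keeps each column's own type; apply() would first coerce df to a matrix
df[] <- lapply(df, function(x){
if(is.character(x)){
x <- as.factor(x)
levels(x) <- seq_along(levels(x))
}
# non-character columns are returned unchanged
x
})
## I believe that this should work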
With tidyverse, the syntax would be
library(tidyverse)
df <- df %>%
mutate_if(is.character, ~ as.integer(factor(.x)))
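Which integer each value receives depends on the factor's level order, which by default is the sorted order of the distinct values. A small sketch with a made-up vector x shows the default coding and how to fix the order explicitly through the levels argument.

```r
x <- c("Man", "Woman", "Man")

# Default: levels are sorted, so Man = 1 and Woman = 2
default_codes <- as.integer(factor(x))

# Explicit level order: Woman = 1 and Man = 2
custom_codes <- as.integer(factor(x, levels = c("Woman", "Man")))
```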
I have a large data frame with over 40 variables of different classes. About half of the variables are characters, however, I would like to coerce those variables to factor while leaving the integers, logicals, etc. as is.
I have tried using an lapply function like the one below, but it coerces all variables instead of just the characters:
aframe2 <- as.data.frame(lapply(aframe1, factor))
I have also tried as.data.frame(aframe1, stringsAsFactors=TRUE) with no success. Is there something I am doing wrong or some other function I can use to do this?
This could be solved by using an if/else statement
aframe1[] <- lapply(aframe1, function(x) if(is.character(x)) factor(x) else x)
or create an index of the character columns and loop only over those columns
i1 <- sapply(aframe1, is.character)
aframe1[i1] <- lapply(aframe1[i1], factor)
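A quick way to check the result is to look at the class of every column afterwards. The data frame below is made up for the purpose; only the character column should change.

```r
# Made-up data frame with one character, one integer and one logical column
aframe1 <- data.frame(name = c("a", "b", "a"),
                      n    = 1:3,
                      flag = c(TRUE, FALSE, TRUE),
                      stringsAsFactors = FALSE)

# Index of character columns, then convert only those
i1 <- sapply(aframe1, is.character)
aframe1[i1] <- lapply(aframe1[i1], factor)

classes <- sapply(aframe1, class)
classes
```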
I've got a frame with a set of different variables - integers, factors, logicals - and I would like to recode all of the "NAs" as a numeric across the whole dataset while preserving the underlying variable class. For example:
frame <- data.frame("x" = rnorm(10), "y" = rep("A", 10))
frame[6,] <- NA
dat <- as.data.frame(apply(frame,2, function(x) ifelse(is.na(x)== TRUE, -9, x) ))
dat
str(dat)
However, here the integers turn into factors; when I include as.numeric(x) in the apply() function, this introduces errors. Thanks for any and all thoughts on how to deal with this.
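apply first coerces the data frame to a matrix, and because column y is not numeric the whole matrix becomes character. as.data.frame then turns these strings into factors by default (in R versions before 4.0.0; since then they stay character). Instead, you could do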
dat <- as.data.frame(lapply(frame, function(x) ifelse(is.na(x), -9, x) ) )
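If the column classes need to survive exactly, one alternative is to assign into the NA positions instead of using ifelse, which returns integer codes for a factor. The helper recode_na below is a hypothetical sketch, not part of the answer; it handles numeric, character and factor columns (a logical column would still be coerced to numeric by the value -9).

```r
# Hypothetical helper: replace NA with `value` while keeping the column's class
recode_na <- function(col, value = -9) {
  if (is.factor(col)) {
    # a factor only accepts existing levels, so add the new label first
    levels(col) <- union(levels(col), as.character(value))
    col[is.na(col)] <- as.character(value)
  } else {
    col[is.na(col)] <- value
  }
  col
}

# Made-up data with one numeric and one factor column
frame <- data.frame(x = c(1.5, NA, 3), y = factor(c("A", NA, "A")))
dat <- frame
dat[] <- lapply(frame, recode_na)

str(dat)
```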
I have a function in R to turn factors to numeric:
as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}
and I have a dataframe that consists of both factors, numeric and other types of data.
I want to apply the functions above at once on the whole dataframe to turn all factors to numeric types columns.
Any idea ?
thanks
You could check which columns are factors with is.factor and sapply. Use the result as an index to select those columns, and convert them to "numeric" with the as.numeric.factor function in an lapply loop.
indx <- sapply(dat, is.factor)
dat[indx] <- lapply(dat[indx], as.numeric.factor)
You could also apply the function without subsetting (although applying it to the subset would be faster). One option would be:
dat[] <- lapply(dat, function(x) if(is.factor(x)) as.numeric(levels(x))[x] else x)
To prevent the columns from being converted to "factor" in the first place, you could specify the stringsAsFactors=FALSE argument or the colClasses argument in read.table/read.csv. I would imagine the columns contain at least one non-numeric value, which causes them to be converted to factor automatically while reading the dataset.
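The effect of the read.csv arguments can be sketched with an inline CSV. The column names id and val and the "n/a" marker are assumptions made for the example; the real file may use a different missing-value code.

```r
# Inline CSV: a zero-padded ID and a value column with one stray "n/a"
csv_text <- "id,val\n01,10\n02,n/a\n"

# With stringsAsFactors = TRUE the stray "n/a" makes val a factor
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Declaring the types up front keeps the leading zeros and reads val as numeric
typed <- read.csv(text = csv_text,
                  colClasses = c(id = "character", val = "numeric"),
                  na.strings = "n/a")

str(typed)
```

Reading the ID as character also preserves leading zeros, which a numeric conversion would drop.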