Converting factors to numeric in R - r

I have hundreds of columns in my database stored as factors. They actually contain numbers, but R treats them as factors. For my project, I need to convert them to numeric.
I can do that in bulk using sapply or a for loop. However, I am not sure how to check whether a variable contains numbers. I cannot just check is.factor(var_name), because the database also contains character variables that are stored as factors.
Is there some other way to perform a check like the one below?
if (is.numeric(var_name)) {
convert the variable to numeric
}
I am looking for something similar to stringsAsFactors = FALSE, which keeps character variables as character instead of converting them to factors.
Any help or pointers would be appreciated.

One way is to convert every column to character and then call type.convert, which turns the columns that contain only numbers into numeric:
df1[] <- lapply(df1, function(x) type.convert(as.character(x), as.is = TRUE))
With as.is = TRUE the non-numeric columns stay character. With as.is = FALSE (the default in older versions of R) they are converted to factor instead, and can be converted back to character with:
df1[] <- lapply(df1, function(x) if (is.factor(x)) as.character(x) else x)
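The question asks how to check that a factor really contains numbers before converting it. A minimal sketch of such a check, assuming a column counts as numeric when every one of its levels parses as a number; the helper name looks_numeric and the sample data are hypothetical:

```r
# Hypothetical sample data: factors that hold numbers and one that holds text
df1 <- data.frame(
  a = factor(c("10", "20", "30")),
  b = factor(c("x", "y", "z")),
  c = factor(c("1.5", NA, "2.5"))
)

# TRUE if x is a factor whose levels all parse as numbers
looks_numeric <- function(x) {
  if (!is.factor(x)) return(FALSE)
  parsed <- suppressWarnings(as.numeric(levels(x)))
  !anyNA(parsed)
}

num_cols <- vapply(df1, looks_numeric, logical(1))

# Go through character first: as.numeric() on a factor returns the level codes
df1[num_cols] <- lapply(df1[num_cols], function(x) as.numeric(as.character(x)))

str(df1)
```

Checking the levels rather than the values keeps the test cheap on long columns, because a factor has far fewer levels than rows in most data sets.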

Related

Coercing multiple time-series columns to factors in large dataframe

I would like to know if there is an "easy/quick" way to convert character variables to factor.
I am aware that one could make a vector with the column names and then use lapply. However, I am working with a large data frame with more than 200 variables, so I would prefer not to write 200+ names into the vector.
I am also aware that I can coerce the entire data frame by using lapply, type.convert and sapply, but as I am working with time series data where some is categorical, and some is numerical, I am not interested in that either.
Is there any way to use the column number in this? I.e. [ ,2:200]? I tried the following, but without any luck:
df[ ,2:30] <- lapply(df[ ,2:30], type.convert)
sapply(df, factor)
With the solution above, I would still have to do multiple of them, but it would still be quicker than writing all the variable names.
I also have a feeling a loop might be usable here, but I would not be sure of how to write it out, or if it is even a way to do it.
df[ ,2:30] <- lapply(df[ ,2:30], as.factor)
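Since you write that you need to convert (all?) character variables to factors, you could use mutate_if from dplyr (in dplyr 1.0.0 and later, mutate with across(where(is.character), as.factor) is the recommended equivalent):
library(dplyr)
df <- mutate_if(df, is.character, as.factor)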
With this you only operate on columns for which is.character returns TRUE, so you don't need to worry about the column positions or names.
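The same idea can be sketched in base R without dplyr, assuming a data frame with mixed column types. The sample data and the name chr_cols are hypothetical:

```r
# Hypothetical sample data with mixed column types
df <- data.frame(
  id    = 1:3,
  group = c("a", "b", "a"),
  value = c(0.1, 0.2, 0.3),
  stringsAsFactors = FALSE
)

# Logical vector marking the character columns
chr_cols <- vapply(df, is.character, logical(1))

# Convert only those columns; the others are left untouched
df[chr_cols] <- lapply(df[chr_cols], as.factor)

sapply(df, class)
```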

R call multiple columns' elements with $ operator

Is there something in R to call like df$col1:df$col5?
I would like to convert the character elements to numeric with as.numeric, so I would like to do something like as.numeric(df$col1:df$col5) to convert all elements in these columns to numeric.
df <- mtcars
To access multiple columns by column number, and store the converted columns back in the data frame:
df[, c(1:3, 5)] <- lapply(df[, c(1:3, 5)], as.numeric)  # or as.character if you want
To access them by column name:
df[, c('mpg', 'cyl')] <- lapply(df[, c('mpg', 'cyl')], as.numeric)
You can use a numeric index to get a range of columns, as suggested in the comments.
But if the columns are not in order, you can construct a vector of names and use that (rather than writing the names out explicitly, as in the other answer):
my_cols <- paste0('col', 1:5)
my_df[, my_cols] <- lapply(my_df[, my_cols], as.numeric)
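R has no df$col1:df$col5 syntax, but a range between two named columns can be built from their positions. A minimal sketch, assuming the columns are adjacent in the data frame; the sample data and the names first, last and rng are hypothetical:

```r
# Hypothetical data frame with numbers stored as character
my_df <- data.frame(
  id   = c("a", "b"),
  col1 = c("1", "2"),
  col2 = c("3", "4"),
  col3 = c("5", "6"),
  stringsAsFactors = FALSE
)

# Positions of the first and last column of the range
first <- match("col1", names(my_df))
last  <- match("col3", names(my_df))
rng   <- first:last

# Assign back so the conversion is stored in the data frame
my_df[rng] <- lapply(my_df[rng], as.numeric)

str(my_df)
```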

In R: Type conversion of data frames with mixed data types

I generally like R, but the type conversion issues are driving me crazy.
Following issue:
I read a data frame from a database connection. The result is a data frame with character columns.
I know that the first column is a date format - all the others are numeric. However, no matter how I tried to convert the character columns of the data frame into the correct types, it didn't work out.
When I converted the data frame into a matrix and then back into a data frame, all columns became factors, and casting the factors to numeric gave wrong results because the indices of the factor levels were converted instead of the real values.
Moreover, if the table is big in size - I do not want to convert each column manually. Isn't there a way to get this done automatically?
We can loop over the columns of the dataset with lapply, convert each column to character and apply type.convert to it. With as.is = TRUE, a column that cannot be parsed as a number stays character; since only the date column falls into that category, we convert it to Date class. The format of the date column is not given in the question, so if it differs from the default, specify the format argument in as.Date.
df1[] <- lapply(df1, function(x) {
  x1 <- type.convert(as.character(x), as.is = TRUE)
  if (is.character(x1)) as.Date(x1) else x1
})
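Because the question states that the first column is the date and all others are numeric, the two cases can also be handled separately. A minimal sketch, assuming ISO-formatted dates; the sample data and the column names day, price and qty are hypothetical:

```r
# Hypothetical result of a database query: every column is character
df1 <- data.frame(
  day   = c("2020-01-01", "2020-01-02"),
  price = c("1.5", "2.5"),
  qty   = c("3", "4"),
  stringsAsFactors = FALSE
)

# First column: parse as Date with an explicit format (assumed ISO here)
df1$day <- as.Date(df1$day, format = "%Y-%m-%d")

# Remaining columns: let type.convert pick integer or double
df1[-1] <- lapply(df1[-1], type.convert, as.is = TRUE)

str(df1)
```

Stating the format explicitly makes a mismatch visible as NA values instead of silently guessing.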

Identifying character variables and changing them to numeric in R

I have a dataset with nearly 30,000 rows and 1935 variables(columns). Among these many are character variables (around 350). Now I can change data type of an individual column using as.numeric on it, but it is painful to search for columns which are character type and then apply this individually on them. I have tried writing a function using a loop but since the data size is huge, laptop is crashing.
Please help.
Something like
take <- sapply(data, is.numeric)
idx <- which(!take)
identifies which variables are not numeric. Those columns can then be converted in one step:
data[idx] <- lapply(data[idx], as.numeric)
Use
sapply(your.data, typeof)
to create a vector of variable types, then use this vector to identify the character vector columns to be converted.
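A minimal sketch of that approach, assuming a character column should only be converted when all of its values parse as numbers. The sample data and the names types, chr_cols and parsed are hypothetical:

```r
# Hypothetical data: two character columns, only one of which holds numbers
data <- data.frame(
  n    = c(1, 2, 3),
  amt  = c("10", "20", "30"),
  name = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Named vector of storage types, one entry per column
types <- sapply(data, typeof)
chr_cols <- names(types)[types == "character"]

for (col in chr_cols) {
  parsed <- suppressWarnings(as.numeric(data[[col]]))
  # Convert only if the conversion introduced no new NAs
  if (sum(is.na(parsed)) == sum(is.na(data[[col]]))) {
    data[[col]] <- parsed
  }
}

str(data)
```

Working column by column avoids copying the whole data frame into a matrix, which matters with about 30,000 rows and 1935 columns.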

Prevent R from coercing non-numeric strings to "NA" when using "as.numeric"

I want to convert a column of numbers to numeric, but there are certain cells that say "New" and "Gone", which I want to retain as characters.
If I use as.numeric(df$col1), the numbers are converted to numeric, but the words are coerced into "NA" values.
Is there any way that I could convert all the numbers to numeric while preventing this coercion?
You can't do it with a vector because vectors can only contain a single type. However, you could do it with a list.
Data <- data.frame(col1=c("1","2","New","3","Gone"), stringsAsFactors=FALSE)
List <- lapply(as.list(Data$col1), type.convert, as.is=TRUE)
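To see what the list approach produces, the sketch below repeats the two lines above and then inspects the result. The names classes and nums are hypothetical additions:

```r
Data <- data.frame(col1 = c("1", "2", "New", "3", "Gone"),
                   stringsAsFactors = FALSE)

# Convert each cell on its own, so every list element gets its own type
List <- lapply(as.list(Data$col1), type.convert, as.is = TRUE)

# Class of every element
classes <- vapply(List, class, character(1))
print(classes)

# Pull out just the numeric elements
nums <- unlist(Filter(is.numeric, List))
print(nums)
```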
A column of a data.frame will always be all of the same type. So you cannot have the string "New" and the number 5 in the same column.
However, an example to get you on your way:
x <- c('New', 1, 'Gone', 2)
ifelse(is.na(as.numeric(x)), x, as.numeric(x))
Depending on what you're doing this can be extended to apply to your specific case.
Per Joshua's comment, you can use functions in the ifelse statement:
ifelse(is.na(as.numeric(x)), sprintf('its a string %s', x), sprintf('its a number %f', as.numeric(x)))
However, the usual technique for dealing with this situation is as Joshua outlined in his answer.
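Another option, not taken from the answers above, is to keep the numbers and the words in two separate columns so that each column has a single type. A minimal sketch, with hypothetical column names value and status:

```r
x <- c("New", "1", "Gone", "2")

# Numbers where the text parses, NA elsewhere (warning suppressed on purpose)
value <- suppressWarnings(as.numeric(x))

# Keep the original text only for the entries that did not parse
status <- ifelse(is.na(value), x, NA_character_)

result <- data.frame(value = value, status = status, stringsAsFactors = FALSE)
print(result)
```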
