R: convert multiple factors to numeric simultaneously

I know how to convert one factor of a dataframe to numeric:
rds$fcv12afa3num <- as.numeric(levels(rds$fcv12afa3))[rds$fcv12afa3]
My two questions:
How can I convert all data frame columns simultaneously if the data frame consists only of factors?
How can I convert several factors simultaneously, based on a pattern in the column name?
I have many NAs, if that matters.
Thanks for your answer, Christian

Without example data, I can't give a completely exact answer, but this should get you started.
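factorVars <- names(YourData)[vapply(YourData, is.factor, logical(1))]
# as.numeric() on a factor returns the integer codes, so go through the levels
YourData[, factorVars] <- lapply(YourData[, factorVars, drop = FALSE],
                                 function(x) as.numeric(levels(x))[x])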
Some notes:
Use drop = FALSE to handle the case of there only being one factor in your data frame.
If you instead assign the result of lapply directly (YourData <- lapply(YourData, ...)), for example when every column is a factor, you get a list object in return. You'd have to run that list through as.data.frame to get your data frame back.
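For the second part of the question (selecting factors by a pattern in the column name), a minimal sketch along the same lines follows. The data frame rds_demo, its column names, the helper factor_to_numeric and the pattern "^fcv" are all made up for illustration. NA values pass through unchanged.

```r
# Hypothetical example data: two factor columns matching "^fcv", one that does not
rds_demo <- data.frame(
  fcv12afa3 = factor(c("10", "20", NA, "10")),
  fcv12afa4 = factor(c("1.5", NA, "2.5", "1.5")),
  other     = factor(c("a", "b", "a", "b"))
)

# Convert a factor to the numbers its labels spell out (not the integer codes)
factor_to_numeric <- function(f) as.numeric(levels(f))[f]

# Select columns that are factors AND whose name matches the pattern
is_fac  <- vapply(rds_demo, is.factor, logical(1))
matches <- grepl("^fcv", names(rds_demo))
target  <- names(rds_demo)[is_fac & matches]

# Replace only the selected columns; everything else is left alone
rds_demo[target] <- lapply(rds_demo[target], factor_to_numeric)
str(rds_demo)
```

Dropping the grepl() condition converts every factor column, which covers the first part of the question.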

Related

How to convert all factor variables into numeric variables (in multiple data frames at once)?

I have n data frames, each corresponding to data from a city.
There are 3 variables per data frame and currently they are all factor variables.
I want to transform all of them into numeric variables.
I have started by creating a vector with the names of all the data frames in order to use in a for loop.
cities <- as.vector(objects())
for ( i in cities){
i <- as.data.frame(lapply(i, function(x) as.numeric(levels(x))[x]))
}
Although the code runs and I get no error, I don't see any changes to my data frames: all three variables remain factor variables.
The strangest thing is that when doing them one by one (as below) it works:
df <- as.data.frame(lapply(df, function(x) as.numeric(levels(x))[x]))
What you're essentially trying to do is change the type of each column that is a factor to numeric. One approach using purrr would be:
library(purrr)
dfs <- mget(cities)  # cities holds object names, so fetch the data frames first
dfs <- map(dfs, ~ modify_if(.x, is.factor, function(f) as.numeric(levels(f))[f]))
Note that modify() is like lapply(), but it doesn't change the underlying data structure of the object you are modifying (here, a data frame). modify_if() simply takes a predicate as an additional argument. The result is a list of converted data frames; the original objects are not changed in place.
For anyone who's interested in my question, I worked out the answer:
for ( i in cities){
assign(i, as.data.frame(lapply(get(i), function(x) as.numeric(levels(x))[x])))
}
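The original loop leaves the data frames untouched because i is only a character string holding an object name: lapply runs over that string, and the assignment overwrites the loop variable rather than the data frame. An alternative to assign() and get() is to keep the data frames in a named list. The sketch below assumes two made-up data frames, city_a and city_b, and a hypothetical helper factor_to_numeric.

```r
# Hypothetical data frames standing in for the per-city objects
city_a <- data.frame(x = factor(c("1", "2")), y = factor(c("3.5", "4.5")))
city_b <- data.frame(x = factor(c("7", "8")), y = factor(c("0.1", "0.2")))

# Convert a factor to the numbers its labels spell out
factor_to_numeric <- function(f) as.numeric(levels(f))[f]

# Collect the data frames in a named list instead of looping over names
city_list <- list(city_a = city_a, city_b = city_b)
city_list <- lapply(city_list, function(d) {
  d[] <- lapply(d, factor_to_numeric)  # d[] keeps the data.frame structure
  d
})

sapply(city_list$city_a, class)
```

Each converted data frame is then available as city_list$city_a, city_list$city_b, and so on.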

Coercing multiple time-series columns to factors in large dataframe

I would like to know if there is an "easy/quick" way to convert character variables to factor.
I am aware that one could make a vector with the column names and then use lapply. However, I am working with a large data frame with more than 200 variables, so it would be preferable not to have to write the 200+ names in the vector.
I am also aware that I can coerce the entire data frame by using lapply, type.convert and sapply, but I am working with time series data, some of which is categorical and some numerical, so I am not interested in that either.
Is there any way to use the column number in this? I.e. [ ,2:200]? I tried the following, but without any luck:
df[ ,2:30] <- lapply(df[ ,2:30], type.convert)
sapply(df, factor)
With the solution above, I would still have to repeat it for several column ranges, but that would be quicker than writing out all the variable names.
I also have a feeling a loop might be usable here, but I am not sure how to write it, or whether it is even a sensible way to do it.
df[ ,2:30] <- lapply(df[ ,2:30], as.factor)
Since you write that you need to convert (all?) character variables to factors, you could use mutate_if from dplyr:
library(dplyr)
df <- mutate_if(df, is.character, as.factor)
With this you only operate on columns for which is.character returns TRUE, so you don't need to worry about the column positions or names.
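The same idea can be sketched in base R without any package. The data frame ts_demo and its columns are invented for illustration.

```r
# Hypothetical data: an id column, two character columns and one numeric column
ts_demo <- data.frame(
  id     = 1:3,
  region = c("north", "south", "north"),
  status = c("ok", "fail", "ok"),
  value  = c(0.5, 1.2, 3.4),
  stringsAsFactors = FALSE
)

# Logical vector marking the character columns
is_chr <- vapply(ts_demo, is.character, logical(1))

# Convert only those columns; numeric columns are left untouched
ts_demo[is_chr] <- lapply(ts_demo[is_chr], as.factor)
str(ts_demo)
```

Because the selection is made by type, no column names or positions need to be written out.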

Bind rows with different data types

I have some dataframes with more than 3000 columns in each, and I want to bind them together.
When I use
library(dplyr)
bind_rows(dataframe1, dataframe2, dataframe3, dataframe4)
I get a lot of warnings:
In bind_rows_(x, .id) : Unequal factor levels: coercing to character
...
I guess it's because a column has data of type factor in one dataframe and data of type character in another dataframe. But how can I solve this problem?
I know I can use
sapply(dataframe1, class)
to get the classes of a dataframe, but as there are many columns, it is impossible to go through them all in all 4 dataframes.
This seems to be a problem about the data, but what does it mean that something has type factor? Is it a number?
Perhaps start with ?factor about what factors are.
To avoid the warnings, you can either use suppressWarnings or convert the factors to character first. For example (untested):
library(tidyverse)
l <- list(dataframe1, dataframe2, dataframe3, dataframe4)
map_dfr(l, ~mutate(., across(where(is.factor), as.character)))
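To find out which columns actually differ in class between the data frames without reading through thousands of them, something like the following base R sketch may help. The data frames d1 and d2 are made up, and the code assumes that all data frames have the same columns.

```r
# Hypothetical stand-ins for two of the data frames
d1 <- data.frame(a = factor(c("x", "y")), b = c(1, 2), stringsAsFactors = FALSE)
d2 <- data.frame(a = c("x", "z"), b = c(3, 4), stringsAsFactors = FALSE)
dfs <- list(d1 = d1, d2 = d2)

# Matrix with one row per data frame and one column per variable,
# holding the first class name of each column
class_table <- t(sapply(dfs, function(d) {
  vapply(d, function(col) class(col)[1], character(1))
}))

# Variables whose class is not the same in every data frame
differs    <- apply(class_table, 2, function(k) length(unique(k)) > 1)
mismatched <- colnames(class_table)[differs]
mismatched
```

Only the columns listed in mismatched need converting before the data frames are bound together.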

Identifying character variables and changing them to numeric in R

I have a dataset with nearly 30,000 rows and 1935 variables (columns). Many of these (around 350) are character variables. I can change the data type of an individual column using as.numeric on it, but it is painful to search for the columns that are of character type and then apply this to each of them individually. I have tried writing a function using a loop, but since the data size is huge, my laptop crashes.
Please help.
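Something like
take <- sapply(data, is.numeric)
which(take == FALSE)
identifies which variables are not numeric. I don't know how to extract them automatically, but given the column numbers you can convert them with
data[, c(putcolumnsnumbershere)] <- lapply(data[, c(putcolumnsnumbershere), drop = FALSE], as.numeric)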
Use
sapply(your.data, typeof)
to create a vector of variable types, then use this vector to identify the character vector columns to be converted.
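Put together, the approach could look like the sketch below. The data frame big_demo is invented for illustration. Text that does not spell out a number would become NA with a warning.

```r
# Hypothetical data: numbers stored as text next to a genuinely numeric column
big_demo <- data.frame(
  n1 = c("1", "2", "3"),
  n2 = c("4.5", NA, "6.5"),
  n3 = c(7, 8, 9),
  stringsAsFactors = FALSE
)

# Names of the character columns, taken from the vector of types
chr_cols <- names(big_demo)[sapply(big_demo, typeof) == "character"]

# Convert them column by column; there is no explicit loop over rows
big_demo[chr_cols] <- lapply(big_demo[chr_cols], as.numeric)
str(big_demo)
```

Working on whole columns like this avoids row-wise loops, which is usually what makes such conversions slow on large data.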

Applying a function to a dataframe to trim empty columns within a list environment R

I am a naive user of R and am attempting to come to terms with the 'apply' series of functions which I now need to use due to the complexity of the data sets.
I have a large, ragged data frame that I wish to reshape before conducting a sequence of regression analyses. It is further complicated by having interlaced rows of descriptive data (characters).
My approach to date has been to use a factor to split the data frame into sets with equal row lengths (i.e. a list), then attempt to remove the trailing empty columns, make two new, matching lists, one of data and one of chars and then use reshape to produce a common column number, then recombine the sets in each list. e.g. a simplified example:
myDF <- as.data.frame(rbind(c("v1", as.character(1:10)),
                            c("v1", letters[1:10]),
                            c("v2", c(as.character(1:6), rep("", 4))),
                            c("v2", c(letters[1:6], rep("", 4)))))
myDF[,1] <- as.factor(myDF[,1])
myList <- split(myDF, myDF[,1])
myList[[1]]
I can remove the empty columns for an individual set and can split the data frame into two sets from the interlacing rows but have been stumped with the syntax in writing a function to apply the following function to the list - though 'lapply' with 'seq_along' should do it?
Thus for the individual set:
DF <- myList[[2]]
DF <- DF[,!sapply(DF, function(x) all(x==""))]
DF
(from an earlier answer to a similar, but simpler example on this site). I have a large data set and would like an elegant solution (I could use a loop but that would not use the capabilities of R effectively). Once I have done that I ought to be able to use the same rationale to reshape the frames and then recombine them.
regards
jac
Try
lapply(split(myDF, myDF$V1), function(x) x[!colSums(x=='')])
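A variant that wraps the question's own per-set filter in a function and applies it to every group is sketched below. The helper name drop_empty_cols is made up. Unlike the colSums version, it drops a column only when it is empty in every row of the group.

```r
# Rebuild the question's example data
myDF <- as.data.frame(rbind(c("v1", as.character(1:10)),
                            c("v1", letters[1:10]),
                            c("v2", c(as.character(1:6), rep("", 4))),
                            c("v2", c(letters[1:6], rep("", 4)))),
                      stringsAsFactors = FALSE)

# Drop columns that are empty in every row of a group
drop_empty_cols <- function(d) {
  d[, !sapply(d, function(x) all(x == "")), drop = FALSE]
}

# Split by the first column and trim each piece
trimmed <- lapply(split(myDF, myDF$V1), drop_empty_cols)
sapply(trimmed, ncol)
```

lapply passes each element of the split list to the function in turn, so seq_along is not needed here.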
