R: Apply a function to data frame columns

I have a function in R to turn factors to numeric:
as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}
and I have a data frame that contains factor, numeric, and other types of columns.
I want to apply the function above to the whole data frame at once, so that every factor column becomes a numeric column.
Any ideas?
Thanks.

You could check whether each column is a factor with is.factor and sapply. Use the result as an index to select those columns, then convert them to "numeric" by applying as.numeric.factor to each one with lapply.
indx <- sapply(dat, is.factor)
dat[indx] <- lapply(dat[indx], as.numeric.factor)
You could also apply the function to every column without subsetting, but applying it to the subset is faster.
To prevent the columns from being read in as "factor" in the first place, you could specify the stringsAsFactors=FALSE argument or the colClasses argument in read.table/read.csv. I would guess the columns have at least one non-numeric entry, which causes them to be converted to factor automatically when the dataset is read.
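A small worked example of the index-based approach, on a made-up data frame (the column names a, b and c are hypothetical, not from the question). It also illustrates why a plain as.numeric() call is not enough for factors: it returns the internal integer codes rather than the values the levels represent.

```r
# Helper from the question: look up each element's level label as a number
as.numeric.factor <- function(x) as.numeric(levels(x))[x]

# Hypothetical data: a numeric-looking factor, a numeric and a character column
dat <- data.frame(a = factor(c("10", "20", "10")),
                  b = c(1.5, 2.5, 3.5),
                  c = c("x", "y", "z"),
                  stringsAsFactors = FALSE)

# Pitfall: as.numeric() on a factor gives the integer codes, not 10/20/10
codes <- as.numeric(dat$a)

# Convert only the factor columns; b and c are left untouched
indx <- sapply(dat, is.factor)
dat[indx] <- lapply(dat[indx], as.numeric.factor)

str(dat)
```

Here codes holds the level positions, while dat$a ends up holding the actual values 10, 20, 10.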

One option would be:
dat[] <- lapply(dat, function(x) if(is.factor(x)) as.numeric(levels(x))[x] else x)

Related

Convert negative matrix values to NA in a loop

I have four matrices which contain positive and negative values. Now I would like to convert all negative values for each matrix to NA. The matrices are called Main_mean, Inn_mean, Isar_mean and Danube_mean.
For a single matrix this would be quite easy:
Main_mean[Main_mean<=0] <- NA
But what should it look like in a loop?
Put the matrices in a list and apply the function to each one using lapply:
list_obj <- mget(ls(pattern = '_mean$'))
#Or make a list individually
#list_obj <- mget(c('Main_mean', 'Danube_mean', 'Inn_mean', 'Isar_mean'))
result <- lapply(list_obj, function(x) {x[x<=0] <- NA;x})
To replace the original objects you can use list2env.
list2env(result, .GlobalEnv)
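The same steps on two tiny stand-in matrices (the values are invented for illustration). Note that the condition x <= 0 replaces zeros as well as negative values; use x < 0 if zeros should be kept.

```r
# Hypothetical small matrices standing in for the real data
Main_mean <- matrix(c(1, -2, 3, 0), nrow = 2)
Inn_mean  <- matrix(c(-1, 5, 6, -7), nrow = 2)

# Collect the matrices in a named list
list_obj <- mget(c("Main_mean", "Inn_mean"))

# Replace every non-positive entry with NA, matrix by matrix
result <- lapply(list_obj, function(x) { x[x <= 0] <- NA; x })

# Overwrite the original objects with the modified copies
invisible(list2env(result, envir = environment()))
```

Keeping the matrices in result and working with the list directly is often simpler than writing them back with list2env.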

Selecting Columns in R using list and names

I found this question which was helpful in subsetting a DF for numeric columns only.
Selecting only numeric columns from a data frame
However I can't figure out how to do numeric columns PLUS any other columns.
I've tried:
nums <- sapply(df, is.numeric)
df <- df[, c(nums, "charcolumn")]
and:
df <- df[,c(sapply(df, is.numeric), "Pop_Size_Group")]
Both of these gave me an "undefined columns selected" error.
I understand that sapply gives me a logical vector of TRUE/FALSE values. How can I subset my df to include all numeric columns PLUS additional columns I identify?
Select the names of the numeric columns and concatenate them with "Pop_Size_Group":
df <- df[,c(names(df)[sapply(df, is.numeric)], "Pop_Size_Group")]
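To see why the original attempts fail, it may help to look at what c() does with a logical vector and a string: everything is coerced to character, so R ends up looking for columns literally named "TRUE" and "FALSE". The sketch below uses a made-up data frame (only Pop_Size_Group comes from the question) and shows two index styles that do work.

```r
# Hypothetical data frame; only the Pop_Size_Group name comes from the question
df <- data.frame(id = 1:3,
                 score = c(2.5, 3.5, 4.5),
                 Pop_Size_Group = c("small", "large", "small"),
                 other = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

nums <- sapply(df, is.numeric)

# Mixing logical and character in c() coerces the logicals to strings
mixed <- c(nums, "Pop_Size_Group")

# Option 1: index by a character vector of column names
res1 <- df[, c(names(df)[nums], "Pop_Size_Group")]

# Option 2: stay with a logical index and OR in the extra column
res2 <- df[, nums | names(df) == "Pop_Size_Group"]
```

Option 1 lets you control the column order; option 2 keeps the columns in their original order.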

R: coerce only character variables to factor while leaving other classes unchanged

I have a large data frame with over 40 variables of different classes. About half of the variables are characters, however, I would like to coerce those variables to factor while leaving the integers, logicals, etc. as is.
I have tried using an lapply call like the one below, but it coerces all variables instead of just the characters:
aframe2 <- as.data.frame(lapply(aframe1, factor))
I have also tried as.data.frame(aframe1, stringsAsFactors=TRUE) with no success. Is there something I am doing wrong or some other function I can use to do this?
This could be solved by using an if/else statement:
aframe1[] <- lapply(aframe1, function(x) if(is.character(x)) factor(x) else x)
Or create an index of the character columns and loop only over those columns:
i1 <- sapply(aframe1, is.character)
aframe1[i1] <- lapply(aframe1[i1], factor)
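A quick check of the index-based version on a made-up three-column data frame (the column names are hypothetical), using sapply(..., class) to confirm that only the character column changed:

```r
# Hypothetical data: one character, one integer and one logical column
aframe1 <- data.frame(name = c("a", "b", "a"),
                      n = 1:3,
                      flag = c(TRUE, FALSE, TRUE),
                      stringsAsFactors = FALSE)

# Convert only the character columns to factor
i1 <- sapply(aframe1, is.character)
aframe1[i1] <- lapply(aframe1[i1], factor)

# Inspect the resulting column classes
classes <- sapply(aframe1, class)
print(classes)
```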

convert factor and character to numeric in a dataframe

I have a dataframe that I am trying to filter. Here is the structure:
'dataframe': 45 obs. of 1450 variables:
$ X01493112 :Factor w/ 47 levels "01493112", "0145769",...
..- attr(*, "names")= chr "510130020" "510360002"
I have a feeling I can't filter it because I have factors and characters but I cannot convert it to numeric. I have tried:
as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}
df2 <- as.numeric.factor(df1)
and numerous other conversions but I can't figure out why it won't work, when I call the new df I get
>numeric(0)
It would help to have some example data to work with, but try:
df$your_factor_variable_now_numeric <-
as.numeric(as.character(df$your_old_factor_variable))
Use this to convert a single factor variable, not the complete data frame. You can also have a look at type.convert. If you want to convert every column in the data frame, you can use something along these lines:
df[] <- lapply(df, function(x) as.numeric(as.character(x)))
Note that this converts every column, including factors and characters that do not represent numeric values, which might not be what you want. If unnecessary conversion is a problem, the following converts only the factor columns and leaves everything else untouched:
numerify <- function(x) if(is.factor(x)) as.numeric(as.character(x)) else x
df[] <- lapply(df, numerify)
On a more general point though, the type of your variables should not prevent you from filtering, if, with filtering, you mean subsetting the dataframe. However, the type conversion should be solved with the above code.
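A short illustration of what the numerify helper does with different kinds of factor, using made-up data (the id labels echo the str() output in the question; the grp column is hypothetical). Two things are worth noticing: labels that are not numbers become NA (with a warning, suppressed here), and numeric conversion drops leading zeros, which matters if the values are really identifiers.

```r
numerify <- function(x) if (is.factor(x)) as.numeric(as.character(x)) else x

# Hypothetical data: a factor of numeric-looking IDs and a factor of labels
df <- data.frame(id = factor(c("01493112", "0145769")),
                 grp = factor(c("a", "b")),
                 n = c(1L, 2L))

# Non-numeric labels cannot be parsed and turn into NA
converted <- suppressWarnings(lapply(df, numerify))

print(converted$id)   # leading zeros are gone after conversion
print(converted$grp)  # all NA
```

If the leading zeros carry meaning, converting the factor to character with as.character() instead of to numeric keeps them.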
fun1 <- function(x) as.numeric(as.character(x))
fun2 <- function(x) as.numeric(x)
fac_to_num <- function(y) modifyList(y, lapply(y[sapply(y, is.factor)], fun1))
char_to_num <- function(y) modifyList(y, lapply(y[sapply(y, is.character)], fun2))
Apply fac_to_num to your data frame for factor -> numeric conversion, and char_to_num for character -> numeric conversion.

Calculate diff in data.frame

I'm trying to calculate the returns from a data.frame of prices.
diff(na.locf(precos_mes))
Some of the columns contain NAs, so I fill them with the na.locf function (from the zoo package), but when I apply diff to the result, it returns the following error:
(list) object cannot be coerced to type 'double'
And when I try to unlist it, I lose all the information from each stock vector.
diff(as.numeric(unlist(na.locf(prices))))
Try
lapply(precos_mes, function(x) diff(na.locf(x)))
Or if you don't need to remove the NA values at the beginning
sapply(precos_mes, function(x) diff(na.locf(x, na.rm=FALSE)))
data
set.seed(24)
precos_mes <- as.data.frame(matrix(sample(c(NA,0:4), 20*5,
replace=TRUE), ncol=5))
