R aggregate function returns unexpected NA

When I use the aggregate function on a data.frame that contains both character and numeric columns, it fails and returns only NAs for every column. How can I solve this? My first idea was to check the class of the values inside FUN, but that did not work.
name <- rep(LETTERS[1:5],each=2)
feat <- paste0("Feat",name)
valuesA <- runif(10)*10
valuesB <- runif(10)*10
daf <- data.frame(ID=name,feature=feat,valueA=valuesA,valueB=valuesB, stringsAsFactors = FALSE)
aggregate(.~ID, data=daf,FUN=mean)
aggregate(.~ID, data=daf,FUN=function(x){
if(is.character(x)){
return(NA)
}else{ return(mean(x))}
})
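A possible explanation, offered as an assumption rather than a confirmed diagnosis: with `. ~ ID` the formula interface appears to combine all remaining columns with cbind(), so the character column feature would coerce valueA and valueB to character before FUN ever sees them. That would also explain why the is.character() check returns NA everywhere. A minimal sketch of two workarounds that keep the character column out of the aggregation (the seed and the names res_formula, num_cols and res_by are illustrative additions):

```r
set.seed(1)  # seed added only to make the sketch reproducible
name <- rep(LETTERS[1:5], each = 2)
daf <- data.frame(ID = name,
                  feature = paste0("Feat", name),
                  valueA = runif(10) * 10,
                  valueB = runif(10) * 10,
                  stringsAsFactors = FALSE)

# Option 1: name the numeric columns explicitly on the left-hand side
res_formula <- aggregate(cbind(valueA, valueB) ~ ID, data = daf, FUN = mean)

# Option 2: pick the numeric columns programmatically and group with `by`
num_cols <- vapply(daf, is.numeric, logical(1))
res_by <- aggregate(daf[num_cols], by = daf["ID"], FUN = mean)
```

Both calls should return one row per ID with the group means of valueA and valueB.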

Related

looping over variables of a data.frame leading one final data.frame in R

I have written a function to change any one variable (i.e., column) in a data.frame to its unique levels and return the changed data.frame.
I wonder how to change multiple variables at once using my function and get one final data.frame with all the changes?
I have tried the following, but this gives multiple data.frames while only the last data.frame is the desired output:
data <- data.frame(sid = c(33,33, 41), pid = c('Bob', 'Bob', 'Jim'))
#== My function for ONE variable:
f <- function(data, what){
data[[what]] <- as.numeric(factor(data[[what]], levels = unique(data[[what]])))
return(data)
}
# Looping over `what`:
what <- c('sid', 'pid')
lapply(seq_along(what), function(i) f(data, what[i]))
Inside the function, we could instead return only the modified column, data[[what]], and assign the results back to the original columns:
f <- function(data, what){
data[[what]] <- as.numeric(factor(data[[what]], levels = unique(data[[what]])))
data[[what]]
}
data[what] <- lapply(seq_along(what), function(i) f(data, what[i]))
Or loop over the column names directly:
data[what] <- lapply(what, function(x) f(data, x))
Or simply
data[what] <- lapply(what, f, data = data)
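If the original version of f (the one that returns the whole data.frame) should stay unchanged, another option is to thread the data.frame through successive calls with Reduce(). This is only a sketch of that alternative; the name result is an illustrative addition:

```r
data <- data.frame(sid = c(33, 33, 41), pid = c('Bob', 'Bob', 'Jim'))

# Original function: recodes ONE column and returns the whole data.frame
f <- function(data, what) {
  data[[what]] <- as.numeric(factor(data[[what]], levels = unique(data[[what]])))
  data
}

what <- c('sid', 'pid')

# Reduce() calls f(data, 'sid'), then feeds that result into f(..., 'pid'),
# so every change accumulates in a single final data.frame
result <- Reduce(f, what, data)
```

Here each step receives the output of the previous one, which is why only one data.frame comes out at the end.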

Different ways of selecting columns inside function resulting in different results, why?

I have written a short function to clean some dataframes that I have in a list. When selecting columns using the df[,1] method, my function doesn't work. However when I select using df$Column it does. Why is this?
columns_1 <- function(x) {
x[,1] <- dmy_hm(x[,1])
x[,2] <- NULL
x[,3] <- as.numeric(x[,3])
x[,4] <- NULL
return(x)
}
MS_ <- lapply(MS_, columns_1)
columns_2 <- function(x) {
x$DateTime <- dmy_hm(x$DateTime)
x$LogSeconds <- NULL
x$Pressure <- as.numeric(x$Pressure)
x$Temperature <- NULL
return(x)
}
MS_ <- lapply(MS_, columns_2)
The function columns_2 produces the desired results (all dataframes in list are cleaned). columns_1 returns the error message:
Error in FUN(X[[i]], ...) :
(list) object cannot be coerced to type 'double'
In addition: Warning message:
All formats failed to parse. No formats found.
The likely cause is that MS_ had already been overwritten by an earlier run, so some columns had been dropped before the function was applied again.
library(lubridate)
MS_ <- lapply(MS_, columns_1)
Instead, assign the result to a different object so that the original list stays intact:
MS2_ <- lapply(MS_, columns_1)
Data:
set.seed(24)
df1 <- data.frame(DateTime = format(Sys.Date() + 1:5, "%d-%m-%Y %H:%M"),
LogSeconds = 1:5,
Pressure = rnorm(5), Temperature = rnorm(5, 25),
stringsAsFactors = FALSE)
MS_ <- list(df1, df1)
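As a hedged illustration of the same advice, the sketch below selects columns by name and writes the result to a new object, so the input list is never modified. It uses base R's as.POSIXct() in place of dmy_hm() purely to stay dependency-free; the helper clean_by_name, the object MS_clean and the sample values are illustrative, not taken from the question:

```r
df1 <- data.frame(DateTime = c("01-02-2020 10:00", "02-02-2020 11:30"),
                  LogSeconds = 1:2,
                  Pressure = c("1.5", "2.5"),
                  Temperature = c(25.1, 24.9),
                  stringsAsFactors = FALSE)
MS_ <- list(df1, df1)

# Hypothetical helper: refers to columns by name, so it does not depend on
# column positions, which shift whenever a column is removed
clean_by_name <- function(x) {
  x$DateTime <- as.POSIXct(x$DateTime, format = "%d-%m-%Y %H:%M", tz = "UTC")
  x$Pressure <- as.numeric(x$Pressure)
  x[c("DateTime", "Pressure")]  # keep only the wanted columns
}

# Store the cleaned list under a new name; MS_ keeps all four columns
MS_clean <- lapply(MS_, clean_by_name)
```

Keeping MS_ untouched means the cleaning step can be rerun from the raw data at any time.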

Calculate the mode of all non-numeric columns in a dataframe

I would like to calculate the mode of each column from a dataframe. I have found similar posts on how to determine the mode of a vector of rows in a dataframe (but most have been with numeric data).
df <- data.frame(c("A","B","C","C"), c("A","A","B","C"),c("A","B","B","C"))
colnames(df) <- c("V1","V2","V3")
rownames(df) <- c(1,2,3,4)
df
I am using the following function:
modefunc <- function(x){
tabresult <- tabulate(x)
themode <- which(tabresult == max(tabresult))
if(sum(tabresult == max(tabresult))>1) themode <- NA
return(themode)
}
mode.vector <- apply(df, 1, modefunc)
Since my dataframe is not numeric, I unfortunately get the following error:
Error in tabulate(x) : 'bin' must be numeric or a factor
Any assistance with this would be helpful. Thanks in advance.
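One way this might be approached, as a sketch rather than a definitive answer: table() counts character values directly, so tabulate() is not needed. Note also that apply(df, 1, ...) iterates over rows, whereas iterating over columns is what the question asks for. The function name mode_or_na is an illustrative addition, and it keeps the question's rule of returning NA when there is a tie:

```r
df <- data.frame(V1 = c("A", "B", "C", "C"),
                 V2 = c("A", "A", "B", "C"),
                 V3 = c("A", "B", "B", "C"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: most frequent value of a vector, or NA if it is not unique
mode_or_na <- function(x) {
  counts <- table(x)                                  # frequency of each distinct value
  top <- names(counts)[as.vector(counts) == max(counts)]
  if (length(top) != 1) NA_character_ else top
}

# A data.frame is a list of columns, so vapply() visits one column at a time
mode_vector <- vapply(df, mode_or_na, character(1))
```

The result is a named character vector with one entry per column.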

Apply a user defined function to a list of data frames

I have a series of data frames structured similarly to this:
df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21))
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60))
In order to clean them I wrote a user defined function with a set of cleaning steps:
clean <- function(df){
colnames(df) <- df[2,]
df <- df[grep('^[0-9]{4}', df$year),]
return(df)
}
I'd now like to put my data frames in a list:
df_list <- list(df,df2)
and clean them all at once. I tried
lapply(df_list, clean)
and
for(df in df_list){
clean(df)
}
But with both methods I get the error:
Error in df[2, ] : incorrect number of dimensions
What's causing this error and how can I fix it? Is my approach to this problem wrong?
You are close, but there is one problem in the code. Because the columns contain text, data.frame() creates them as factors rather than characters (this was the default before R 4.0.0), so the column naming does not give the expected result.
# set stringsAsFactors = FALSE so the text columns stay character
df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21), stringsAsFactors = FALSE)
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60), stringsAsFactors = FALSE)
clean <- function(df){
colnames(df) <- df[2,]
#need to specify the column to select the rows
df <- df[grep('^[0-9]{4}', df$year),]
#convert the columns to numeric values
df[, 1:ncol(df)] <- apply(df[, 1:ncol(df)], 2, as.numeric)
return(df)
}
df_list <- list(df,df2)
lapply(df_list, clean)
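One further point, added here as a sketch: neither lapply() nor the for loop changes df_list in place, so the cleaned data frames have to be stored somewhere. The variant of clean below, along with the names cleaned and cleaned_loop, is illustrative and only lightly adapted from the code above:

```r
df  <- data.frame(x = c('notes', 'year', 1995:2005), y = c(NA, 'value', 11:21),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = c('notes', 'year', 1995:2005), y = c(NA, 'value', 50:60),
                  stringsAsFactors = FALSE)
df_list <- list(df, df2)

clean <- function(df) {
  names(df) <- unname(unlist(df[2, ]))        # second row holds the real headers
  df <- df[grepl('^[0-9]{4}$', df$year), ]    # keep only rows whose year is 4 digits
  df[] <- lapply(df, as.numeric)              # convert every column, keep data.frame shape
  rownames(df) <- NULL
  df
}

# lapply() returns a new list, so capture it
cleaned <- lapply(df_list, clean)

# Equivalent for loop: index into the list and store each result
cleaned_loop <- vector("list", length(df_list))
for (i in seq_along(df_list)) {
  cleaned_loop[[i]] <- clean(df_list[[i]])
}
```

Both approaches should produce the same list of cleaned data frames.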

Call an argument name within a function in R

I would like to create a generic function naTrans that replaces 'NA' and '' by NA.
The problem is that I can't replace the dataframe test in the global environment with the modified dataframe (mydf) created within the function. Here's my best try.
# Example dataframe containing 'NA'
test <- as.data.frame(matrix(sample(c('NA', 1:9), 10*10, TRUE), 10))
# My function
naTrans <- function (mydf) {
mydf[mydf == 'NA' | mydf ==''] <- NA
assign(deparse(substitute(mydf))[1], mydf, envir = globalenv())
}
test <- naTrans(test)
any(is.na(test))
# [1] FALSE
Surely the problem lies in the last line of code assign(print(deparse(substitute(mydf))), mydf, envir = globalenv())
Any idea?
I hope the comments in the code are clear enough
test <- as.data.frame(matrix(sample(c('NA',1:9),10*10,TRUE),10))
naTrans <- function (mydf) {
  mydf[mydf == 'NA' | mydf == ''] <- NA # use the | (or) operator; %in% works on vectors, not on data frames
  return(mydf) # return the modified mydf (return() is optional; a bare mydf on the last line also works)
}
test <- naTrans(test) # overwrite the original object with the returned value
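A small deterministic check of this pattern, as a sketch with made-up values: the function works on a copy of its argument, so the caller's object only changes once the return value is assigned back. The names raw and fixed are illustrative additions:

```r
raw <- data.frame(a = c("1", "NA", "3"),
                  b = c("", "5", "NA"),
                  stringsAsFactors = FALSE)

naTrans <- function(mydf) {
  mydf[mydf == 'NA' | mydf == ''] <- NA  # replace 'NA' strings and empty strings
  mydf                                   # last expression is the return value
}

fixed <- naTrans(raw)  # raw itself is left unchanged

sum(is.na(raw))    # no real NA values in the input
sum(is.na(fixed))  # the three placeholder cells are now NA
```

Returning the value and letting the caller decide where to store it avoids the need for assign() and globalenv() altogether.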
