Apply Function to Specific Column in R List - r

I have seen many questions pretty similar to mine, but none of the answers I've seen have actually solved what I'm trying to do. I have a list of data frames, and I'm trying to apply the digest() function to the same column in each data frame in my list. A couple of the answers I've seen on SO to this have been:
dflist <- list(data.frame(number = 1:10, name = 1:10),
data.frame(number = 2:15, name = 1:14))
dflist <- lapply(dflist, function(x){
x$name <- digest(x$name, algo = "sha256")
return(x)
})
#OR this
dflist <- lapply(dflist, function(x) {
x %>% mutate_each(funs(digest(.,algo = "sha256")), "name")
})
Both of these give the same output - which is simply every row in the name column having the same exact value. The digest() function works but only returns the value of the first row, in every row.
I've also tried:
dflist <- lapply(dflist, function(x) {
digest(x[,"name"], algo = "sha256")
})
But this returns only the first value from each data frame in the list.
Any advice would be much appreciated!

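digest() is not vectorized: it hashes the whole object it receives and returns a single string, which is then recycled into every row. Wrap it in Vectorize() so that each element is hashed separately: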
dflist1 <- lapply(dflist, function(x) {
x$name <- Vectorize(digest::digest)(x$name, algo = "sha256")
x
})
Or use it in transform (passing the same algo so both versions produce the same hashes):
dflist1 <- lapply(dflist, transform, name = Vectorize(digest::digest)(name, algo = "sha256"))
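The same idea can be sketched in base R with vapply(), which calls a function once per element and checks that each call returns one string. To keep the sketch free of package dependencies, it uses scalar_id(), a hypothetical stand-in that, like digest(), collapses whatever it receives into a single string; the data and the prefix argument are assumptions for illustration only.

```r
# Hypothetical stand-in for digest(): returns ONE string for the whole input.
scalar_id <- function(x, prefix = "id") {
  paste0(prefix, ":", paste(x, collapse = "|"))
}

dflist <- list(
  data.frame(number = 1:3, name = c("a", "b", "c")),
  data.frame(number = 4:5, name = c("d", "e"))
)

# Called on a whole column, the function yields a single value,
# which R would recycle into every row on assignment.
whole <- scalar_id(dflist[[1]]$name)

# Called once per element via vapply(), each row gets its own value.
hashed <- lapply(dflist, function(x) {
  x$name <- vapply(as.character(x$name), scalar_id, character(1),
                   prefix = "id", USE.NAMES = FALSE)
  x
})

# With the digest package installed, the analogous call would be:
# vapply(x$name, digest::digest, character(1), algo = "sha256", USE.NAMES = FALSE)
```

Here extra arguments such as prefix (or algo for digest()) are passed through vapply() unchanged to every call.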

Related

looping over variables of a data.frame leading one final data.frame in R

I have written a function to change any one variable (i.e., column) in a data.frame to its unique levels and return the changed data.frame.
I wonder how to change multiple variables at once using my function and get one final data.frame with all the changes?
I have tried the following, but this gives multiple data.frames while only the last data.frame is the desired output:
data <- data.frame(sid = c(33,33, 41), pid = c('Bob', 'Bob', 'Jim'))
#== My function for ONE variable:
f <- function(data, what){
data[[what]] <- as.numeric(factor(data[[what]], levels = unique(data[[what]])))
return(data)
}
# Looping over `what`:
what <- c('sid', 'pid')
lapply(seq_along(what), function(i) f(data, what[i]))
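Each call to f() starts again from the original data, so every result contains only one changed column. Instead, change the function to return just the modified column, data[[what]], and assign the results back to the selected columns: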
f <- function(data, what){
data[[what]] <- as.numeric(factor(data[[what]], levels = unique(data[[what]])))
data[[what]]
}
data[what] <- lapply(seq_along(what), function(i) f(data, what[i]))
Or do
data[what] <- lapply(what, function(x) f(data, x))
Or simply
data[what] <- lapply(what, f, data = data)
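If the original f() (the version that returns the whole data frame) should stay unchanged, one possible alternative is Reduce(), which feeds the result of each call into the next. This is a sketch under that assumption, not part of the answer above; the name result is illustrative.

```r
data <- data.frame(sid = c(33, 33, 41), pid = c("Bob", "Bob", "Jim"))

# Original one-variable function: returns the whole modified data frame.
f <- function(data, what) {
  data[[what]] <- as.numeric(factor(data[[what]], levels = unique(data[[what]])))
  data
}

what <- c("sid", "pid")

# Reduce() calls f(accumulated_data, what[i]) for each name in turn,
# starting from `data`, so the changes accumulate in one data frame.
result <- Reduce(f, what, data)
```

Because each step receives the output of the previous one, only a single final data frame is produced.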

how to use lapply for string replacement

I am trying to use lapply to replace the elements of a string in several data.frames contained in a list. When I attempt to do this, the whole data.frame is replaced, rather than the string contained in the data.frame.
A reproducible example below:
a <- list( a = data.frame(Date = c("1900-08-31"), Val = 1000),
b = data.frame(Date = c("1900-08-31"), Val = 1000) )
lapply(a, function(x){
gsub(".{2}$","01",x$Date)
})
What I would expect to happen is that the elements of a$Date and b$Date get replaced with '1900-08-01'. But what happens is that a and b themselves get replaced with "1900-08-01".
Your lapply function is returning a vector with the replacement instead of a and b with Date modified. Try this:
lapply(a, function(x){
x$Date <- gsub(".{2}$","01",x$Date)
return(x)
})
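Note that lapply() never modifies its input; it returns a new list, so the result has to be assigned to keep it. A minimal sketch of the corrected call with the assignment, where a_fixed is an illustrative name and as.character() is added on the assumption that Date may be a factor in older R versions:

```r
a <- list(a = data.frame(Date = "1900-08-31", Val = 1000),
          b = data.frame(Date = "1900-08-31", Val = 1000))

# Modify the column inside each data frame, then return the data frame.
a_fixed <- lapply(a, function(x) {
  x$Date <- sub(".{2}$", "01", as.character(x$Date))
  x
})

# The original list is untouched; the modified copies live in a_fixed.
a_fixed$a$Date
```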

Apply a user defined function to a list of data frames

I have a series of data frames structured similarly to this:
df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21))
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60))
In order to clean them I wrote a user defined function with a set of cleaning steps:
clean <- function(df){
colnames(df) <- df[2,]
df <- df[grep('^[0-9]{4}', df$year),]
return(df)
}
I'd now like to put my data frames in a list:
df_list <- list(df,df2)
and clean them all at once. I tried
lapply(df_list, clean)
and
for(df in df_list){
clean(df)
}
But with both methods I get the error:
Error in df[2, ] : incorrect number of dimensions
What's causing this error and how can I fix it? Is my approach to this problem wrong?
You are close, but there is one problem in your code. Because your data frames contain text, their columns are created as factors rather than characters (the default for data.frame() before R 4.0.0), so the column naming does not give the expected result.
# set stringsAsFactors = FALSE so the text columns stay character
df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21), stringsAsFactors = FALSE)
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60), stringsAsFactors = FALSE)
clean <- function(df){
# use the values in the second row as the column names
colnames(df) <- unlist(df[2,], use.names = FALSE)
# keep only the rows whose year is a four-digit number
df <- df[grep('^[0-9]{4}', df$year),]
# convert every column to numeric
df[] <- lapply(df, as.numeric)
return(df)
}
df_list <- list(df,df2)
lapply(df_list, clean)
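Both lapply() and a for loop leave df_list itself unchanged unless the result is stored. The following self-contained sketch shows the two ways of keeping the cleaned data frames; cleaned and looped are illustrative names, and clean() here is a compact version of the function above.

```r
df  <- data.frame(x = c("notes", "year", 1995:2005), y = c(NA, "value", 11:21),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = c("notes", "year", 1995:2005), y = c(NA, "value", 50:60),
                  stringsAsFactors = FALSE)

clean <- function(df) {
  names(df) <- unlist(df[2, ], use.names = FALSE)  # second row holds the headers
  df <- df[grep("^[0-9]{4}", df$year), ]           # keep the year rows only
  df[] <- lapply(df, as.numeric)                   # text to numbers
  df
}

df_list <- list(df, df2)

# Option 1: lapply() returns a new list, so assign it.
cleaned <- lapply(df_list, clean)

# Option 2: a for loop has to write each result back by index.
looped <- df_list
for (i in seq_along(looped)) {
  looped[[i]] <- clean(looped[[i]])
}
```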

R df process each column by a different function provided in a list of functions

I guess my problem is very simple, but I could not find a solution on the web yet.
I would like to modify a data frame with a set of functions.
The functions are defined in a list. They may have more than one argument, but one argument is always the value found in the related column of the data frame.
I used the built-in BOD data set just for convenience. The list could be this:
funs <- list(
fn1 = function(x) x+1,
fn2 = function(x) x-1
)
The function call could look like this:
searchedFunc(BOD, funs)
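So after the modification, the Time column values are increased by 1 and the demand column values are decreased by 1.
You can use sapply over the function indices, which applies the i-th function to the i-th column (note that the result is a matrix, not a data frame):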
funs <- list(
fn1 = function(x) x+1,
fn2 = function(x) x-1
)
searchedFunc <- function(df, fns) {
sapply(seq(along.with=fns), function(i) fns[[i]](df[, i]))
}
searchedFunc(BOD, funs)
Hope it helps,
alex
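If a data frame with the original column names is wanted rather than a matrix, one possible variant pairs columns and functions with Map() and assigns the result back into the data frame. This is a sketch, not the answer's method; apply_by_column is a hypothetical name.

```r
funs <- list(
  fn1 = function(x) x + 1,
  fn2 = function(x) x - 1
)

# Hypothetical helper: applies fns[[i]] to the i-th column of df.
apply_by_column <- function(df, fns) {
  stopifnot(length(fns) == ncol(df))
  # Map() walks the columns and the functions in parallel;
  # assigning into df[] keeps the data frame structure and column names.
  df[] <- Map(function(col, f) f(col), df, fns)
  df
}

res <- apply_by_column(BOD, funs)
head(res)
```

Functions that need further arguments could be wrapped in anonymous functions inside the list, so that each entry still takes the column as its only input.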

How to pass variables into split()?

I want to run split() in a for loop, but when I pass it variable text, it just creates a new data.frame containing the text. The idea here is to split CMPD_DF_1, CMPD_DF_2, etc. based on CMPD_DF_1[5], CMPD_DF_2[5], etc. How do I pass in the data.frame and not a string?
for (i in 1:10) {
split(paste("CMPD_DF", i, sep = "_"),
paste(paste("CMPD_DF", i, sep = "_"), "[5]", sep=""))
}
Sorry for the initial confusion. You can put your data frames in a list and then use lapply. This assumes the column you are splitting on is the same in each data frame. I'll update with a more general solution...
d1 <- data.frame(x =1:10, y = rep(letters[1:2], each = 5))
d2 <- d1
l <- list(d1,d2)
myFun <- function(x){
return(split(x,x[,2]))
}
lapply(l,myFun)
And here's a way to do this using mapply that will allow for different splitting columns in each data frame. You just pre-specify the columns in a separate list and pass them to mapply:
l <- list(d1,d2)
splitColumns <- list("y","y")
myFun2 <- function(x,col){
return(split(x,x[,col]))
}
mapply(myFun2,l,splitColumns,SIMPLIFY = FALSE)
Your code doesn't work because you're not passing a data.frame to split. You're passing a character vector that contains a string with the name of your data.frame. Something like this should work, but it's not very R-like. @joran's answer is preferable.
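splits <- vector("list", 10)
for (i in 1:10) {
dfname <- paste("CMPD_DF", i, sep = "_")
# get() looks up the data frame by name; store the result instead of discarding it
splits[[i]] <- split(get(dfname), get(dfname)[[5]])
}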
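When the data frames already exist as separate objects, the two answers can be combined: mget() collects the objects by name into a list, and lapply() then splits each one. A minimal sketch, assuming two small made-up data frames whose fifth column is the grouping variable:

```r
# Made-up example data; the real CMPD_DF_* objects are assumed to exist already.
CMPD_DF_1 <- data.frame(a = 1:4, b = 1:4, c = 1:4, d = 1:4,
                        grp = c("x", "x", "y", "y"))
CMPD_DF_2 <- data.frame(a = 5:8, b = 5:8, c = 5:8, d = 5:8,
                        grp = c("x", "y", "y", "y"))

df_names <- paste("CMPD_DF", 1:2, sep = "_")

# mget() returns a named list of the objects with these names.
df_list <- mget(df_names, envir = environment())

# Split every data frame on its fifth column.
splits <- lapply(df_list, function(d) split(d, d[[5]]))

names(splits)
```

The result is a named list of lists, so the pieces of the first data frame are available as splits$CMPD_DF_1.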
