How to simplify functions in a for loop - r

I have a for loop containing the following:
for (i in 1:100) {
#calculate correlation
correlationList1a[[i]] <- sapply(seq(1,14),
function(x) cor(validationSetsA.list[[i]][,x], medianListA[[i]]))
correlationList2a[[i]] <- sapply(seq(1,14),
function(x) cor(validationSetsA.list[[i]][,x], medianListB[[i]]))
}
How can I simplify this? correlationList1a and correlationList2a do essentially the same thing; the only difference is that correlationList1a uses medianListA and correlationList2a uses medianListB.

It looks like this is a case for mapply.
mapply(function(x, y) apply(x[,seq(1,14)], 2, cor, y=y),
x = validationSetsA.list,
y = medianListA,
SIMPLIFY = FALSE)
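Since the two results differ only in which list of medians is used, one option is to wrap the call in a small helper and invoke it twice. The following is a minimal sketch, not the asker's actual setup: the simulated data and the helper name cor_with_medians are assumptions made for illustration.

```r
set.seed(1)
# Hypothetical stand-ins for the question's data: 3 validation sets of
# 20 rows x 14 columns, plus two lists of length-20 "median" vectors.
validationSetsA.list <- replicate(3, matrix(rnorm(20 * 14), ncol = 14), simplify = FALSE)
medianListA <- replicate(3, rnorm(20), simplify = FALSE)
medianListB <- replicate(3, rnorm(20), simplify = FALSE)

# Illustrative helper: correlate each of the 14 columns of every
# validation set with the matching vector from `medians`.
cor_with_medians <- function(sets, medians) {
  Map(function(x, y) as.vector(cor(x[, 1:14], y)), sets, medians)
}

correlationList1a <- cor_with_medians(validationSetsA.list, medianListA)
correlationList2a <- cor_with_medians(validationSetsA.list, medianListB)
```

Each element of the two result lists is a numeric vector of 14 correlations, one per column.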

Related

Applying a Function to a Data Frame : lapply vs traditional way

I have this data frame in R:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
I also have this function:
some_function <- function(x,y) { return(x+y) }
Basically, I want to create a new column in the data frame based on "some_function". I thought I could do this with the "lapply" function in R:
data_frame$new_column <-lapply(c(data_frame$x, data_frame$y),some_function)
This does not work:
Error in `$<-.data.frame`(`*tmp*`, f, value = list()) :
replacement has 0 rows, data has 8281
I know how to do this in a more "clunky and traditional" way:
data_frame$new_column = x + y
But I would like to know how to do this using "lapply" - in the future, I will have much more complicated and longer functions that will be a pain to write out like I did above. Can someone show me how to do this using "lapply"?
Thank you!
When working within a data.frame you could use apply instead of lapply:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(x,y) { return(x+y) }
data_frame$new_column <- apply(data_frame, 1, \(x) some_function(x["Var1"], x["Var2"]))
head(data_frame)
To apply a function to rows, set MARGIN = 1; to apply a function to columns, set MARGIN = 2.
lapply, as the name suggests, is a list-apply. Since a data.frame is a list of columns, you can use lapply to compute over columns, but with rectangular data apply is often the easiest.
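To make the role of the MARGIN argument concrete, here is a small sketch on a toy matrix (the matrix m is an assumption for illustration only):

```r
# Toy 2 x 3 matrix: rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)  # MARGIN = 1: one result per row
col_sums <- apply(m, 2, sum)  # MARGIN = 2: one result per column
```

Here row_sums has one entry per row (length 2) and col_sums one entry per column (length 3).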
If some_function is written for that specific purpose, it can be written to accept a single row of the data.frame as in
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(row) { return(row[1]+row[2]) }
data_frame$yet_another <- apply(data_frame, 1, some_function)
head(data_frame)
Final comment: Functions written for a single pair of values often turn out to be perfectly vectorized. Probably the best way to call some_function is without any function of the apply family, as in
some_function <- function(x,y) { return(x + y) }
data_frame$last_one <- some_function(data_frame$Var1, data_frame$Var2)
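If a longer function turns out not to be vectorized (for example, because it branches with if), mapply can call it once per row by walking the two columns in parallel. The sketch below assumes a hypothetical, deliberately non-vectorized some_function and names the grid columns x and y for readability.

```r
data_frame <- expand.grid(x = seq(1, 10, 0.1), y = seq(1, 10, 0.1))

# Hypothetical function that is NOT vectorized: `if` needs a single TRUE/FALSE
some_function <- function(x, y) if (x > y) x - y else x + y

# mapply calls some_function(x[1], y[1]), some_function(x[2], y[2]), ...
data_frame$new_column <- mapply(some_function, data_frame$x, data_frame$y)
```

This is typically slower than a truly vectorized call, so it is worth checking first whether the function can be rewritten with ifelse or similar.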

cbind with loop in R

I'm new to R and I have an issue.
I have multiple tables (e.g. 10) and a kmeans result for each of them (also 10). I want to use cbind to add each cluster assignment to its table:
Ex:
NEW_table1<- cbind(table1,kmeans_table1$cluster)
NEW_table2<- cbind(table2,kmeans_table2$cluster)
...
I tried this code, but I get an error:
for (i in 1:10)
{ assign(paste0("NEW_table", i)<-cbind(as.name(paste0("filter_table",i)),Class=(i$cluster) ))
}
> Error in i$cluster : $ operator is invalid for atomic vectors
Without seeing the data, my guess is that this might work:
do.call(cbind, mapply(function(x, y) cbind(x, y), tables, kmeans, SIMPLIFY = FALSE))
where tables is a list of your tables, i.e. list(table1, table2, ...),
and kmeans is a list of your kmeans cluster vectors, i.e. list(kmeans_table1$cluster, kmeans_table2$cluster, ...). For example:
x = 1:10
x2 = list(x, x, x)
y = 10:1
y2 = list(y, y, y)
do.call(cbind, mapply(function(x, y) cbind(x, y), x2, y2, SIMPLIFY = F))
I guess what you want might be something like below
list2env(setNames(lapply(paste0("table",1:10), function(v) cbind(get(v),get(paste0("kmeans_",v))$cluster)),
paste0("NEW_table",1:10)),
envir = .GlobalEnv)
Thank you all, I've fixed it with the following code:
# VAR is a vector of the distinct values from a column in a large table
VAR<- unique(table$column)
for(i in VAR){
assign(
paste0("New_table", i),cbind(get(paste0("filter_table",i)),Class=get(i)$cluster)
)
}
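An alternative that avoids assign and get altogether is to keep the tables and their kmeans results in two parallel lists and pair them up with Map. This is only a sketch: the iris subsets, the choice of two centers, and the names tables, kmeans_results and new_tables are assumptions standing in for the asker's data.

```r
set.seed(42)
# Hypothetical stand-ins for table1, table2, ...: numeric subsets of iris
tables <- list(iris[1:50, 1:4], iris[51:100, 1:4], iris[101:150, 1:4])

# One kmeans fit per table (2 centers chosen arbitrarily for the sketch)
kmeans_results <- lapply(tables, kmeans, centers = 2)

# Pair each table with its own fit and append the cluster as column "Class"
new_tables <- Map(function(tbl, km) cbind(tbl, Class = km$cluster),
                  tables, kmeans_results)
```

new_tables[[1]] then plays the role of NEW_table1, and the whole collection can be processed further with lapply.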

Using for() over variables that need to be changed

I'd like to be able use for() loop to automate the same operation that runs over many variables modifying them.
Here's the simplest example I could design:
varToChange = list( 1:10, iris$Species[1:10], letters[1:10]) # assume that it has many more than just 3 elements
varToChange
for (i in varToChange ) {
if (is.character(i)) i <- as.integer(as.ordered(i))
if (is.factor(i)) i <- as.integer(i)
}
varToChange # <-- Here I want to see my elements as integers now
Here's the actual example that led me to this question, taken from: Best way to plot automatically all data.table columns using ggplot2
In the following function
f <- function(dt, x,y,k) {
if (is.numeric(x)) x <- names(dt)[x]
if (is.numeric(y)) y <- names(dt)[y]
if (is.numeric(k)) k <- names(dt)[k]
ggplot(dt, aes_string(x,y, col=k)) + geom_jitter(alpha=0.1)
}
f(diamonds, 1,7,2)
instead of brutally repeating the same line many times, as a programmer, I would rather have a loop to repeat this line for me.
Something like this one:
for (i in c(x,y,k)) {
if (is.numeric(i)) i <- names(dt)[i]
}
In C/C++ this would have been done using pointers. In R, is it possible at all?
UPDATE: Using Map, as suggested below, is a very nice idea. However, it does not work for this example:
getColName <- function(dt, x) {
if (is.numeric(x)) {
x <- names(dt)[x]
}
x
}
f<- function(dt, x,y,k) {
list(x,y,k) <- Map(getColName, list(x,y,k), dt)
# if (is.numeric(x)) x <- names(dt)[x]
# if (is.numeric(y)) y <- names(dt)[y]
# if (is.numeric(k)) k <- names(dt)[k]
ggplot(dt, aes_string(x,y, col=k)) + geom_jitter(alpha=0.1)
}
f(diamonds, 1,7,2) # Brrr..
No need for a for loop; just Map a function over each of your list items:
varToChange = list( 1:10, iris$Species[1:10], letters[1:10])
myfun <- function(y) {
if (is.character(y)) y <- as.integer(as.ordered(y))
if (is.factor(y)) y <- as.integer(y)
y
}
varToChange <- Map(myfun, varToChange)
UPDATE: Map never modifies variables in place; this is simply not done in R. Use the new values returned by Map:
f<- function(dt, x, y, k) {
args <- Map(function(x) getColName(dt, x), list(x=x,y=y,k=k))
ggplot(dt, aes_string(args$x,args$y, col=args$k)) + geom_jitter(alpha=0.1)
}
f(diamonds, 1,7,2)
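The name-resolution step can be tried on its own, without ggplot2. The sketch below uses the built-in iris data as a stand-in for diamonds, and the argument values are arbitrary choices for illustration.

```r
getColName <- function(dt, x) {
  if (is.numeric(x)) x <- names(dt)[x]  # translate a position into a name
  x                                     # names pass through unchanged
}

# dt is fixed by name; each list element is passed as x in turn
args <- lapply(list(x = 1, y = "Sepal.Width", k = 5), getColName, dt = iris)
```

The result is a named list, so args$x, args$y and args$k can be used wherever the column names are needed.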
You have two choices for iteration in R: iterate over the variables themselves, or over their indices. I generally recommend iterating over indices. This case illustrates a strong advantage of doing so, because your question becomes a non-issue once you use indices.
varToChange = list( 1:10, iris$Species[1:10], letters[1:10])
for (i in seq_along(varToChange)) {
if (is.character(varToChange[[i]])) varToChange[[i]] <- as.integer(as.factor(varToChange[[i]]))
if (is.factor(varToChange[[i]])) varToChange[[i]] <- as.integer(varToChange[[i]])
}
I also replaced as.ordered() with as.factor(). Both produce the same levels and therefore the same integer codes, so as you are just coercing to integer, it doesn't matter which one you use.
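The underlying reason the original loop has no effect is that the loop variable holds a copy of each element. A minimal sketch with a toy list (vals is an assumption for illustration) shows the difference between the two styles:

```r
vals <- list(1:3, letters[1:3])
before <- vals

# Iterating over elements: v is a copy, so assigning to it changes nothing
for (v in vals) v <- "changed"
unchanged <- identical(vals, before)

# Iterating over indices: assignment goes back into the list itself
for (i in seq_along(vals)) vals[[i]] <- length(vals[[i]])
```

After the first loop vals is still identical to before; after the second, each element has been replaced by its length.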

Perform Student's t-test between data.frames contained in two lists

I have two separate lists, each containing 4 data.frames. I need to perform a Student's t-test (t.test) on rainfall between the corresponding data.frames of the two lists.
Here the lists:
lst1 = list(data.frame(rnorm(20), rnorm(20)), data.frame(rnorm(25), rnorm(25)), data.frame(rnorm(16), rnorm(16)), data.frame(rnorm(34), rnorm(34)))
lst1 = lapply(lst1, setNames, c('rainfall', 'snow'))
lst2 = list(data.frame(rnorm(19), rnorm(19)), data.frame(rnorm(38), rnorm(38)), data.frame(rnorm(22), rnorm(22)), data.frame(rnorm(59), rnorm(59)))
lst2 = lapply(lst2, setNames, c('rainfall', 'snow'))
What I would need to do is:
t.test(lst1[[1]]$rainfall, lst2[[1]]$rainfall)
t.test(lst1[[2]]$rainfall, lst2[[2]]$rainfall)
t.test(lst1[[3]]$rainfall, lst2[[3]]$rainfall)
t.test(lst1[[4]]$rainfall, lst2[[4]]$rainfall)
I can do it as above by writing out each of the 4 data.frames (I actually have 40 in my real data), but I would like to know if there is a smarter and quicker way to do it.
Here is what I tried (without success):
myfunction = function(x,y) {
test = t.test(x, y)
return(test)
}
result = mapply(myfunction, x=lst1, y=lst2)
x <- vector("list", length(lst1))  # preallocate one slot per pair of data.frames
for (i in seq_along(lst1)){
x[[i]] <- t.test(lst1[[i]]$rainfall, lst2[[i]]$rainfall)
}
x
This works for me once the function extracts the rainfall column, as below. I would use SIMPLIFY = FALSE to get better-formatted results, though.
lst1 <- list()
lst1[[1]] <- data.frame(rainfall = rnorm(10))
lst1[[2]] <- data.frame(rainfall = rnorm(10))
lst2 <- list()
lst2[[1]] <- data.frame(rainfall = rnorm(10))
lst2[[2]] <- data.frame(rainfall = rnorm(10))
myfunction = function(x,y) {
test = t.test(x$rainfall, y$rainfall)
return(test)
}
mapply(myfunction, x = lst1, y = lst2, SIMPLIFY = FALSE)
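Once the tests are stored in a list, individual components such as the p-value can be pulled out in a second pass. The sketch below uses simulated data and only two pairs of data.frames; the sample sizes and object names are assumptions for illustration.

```r
set.seed(1)
# Simulated stand-ins for the two lists of data.frames
lst1 <- lapply(c(20, 25), function(n) data.frame(rainfall = rnorm(n), snow = rnorm(n)))
lst2 <- lapply(c(19, 38), function(n) data.frame(rainfall = rnorm(n), snow = rnorm(n)))

# One t-test per pair of data.frames, kept as a list of "htest" objects
tests <- Map(function(a, b) t.test(a$rainfall, b$rainfall), lst1, lst2)

# Extract one number per test; vapply checks each result is a single numeric
p_values <- vapply(tests, function(tt) tt$p.value, numeric(1))
```

The same pattern works for other components of the test result, such as tt$statistic or tt$conf.int (the latter with numeric(2)).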

looping through dataframes using 'for' [duplicate]

Here is a simple made up data set:
df1 <- data.frame(x = c(1,2,3),
y = c(4,6,8),
z= c(1, 6, 7))
df2 <- data.frame(x = c(3,5,6),
y = c(3,4,9),
z= c(6, 7, 7))
What I want to do is to create a new variable "a" which is just the sum of all three variables (x,y,z)
Instead of doing this separately for each dataframe I thought it would be more efficient to just create a loop. So here is the code I wrote:
my.list<- list(df1, df2)
for (i in 1:2) {
my.list[i]$a<- my.list[i]$x +my.list[i]$y + my.list[i]$z
}
or alternatively
for (i in 1:2) {
my.list[i]<- transform(my.list[i], a= x+ y+ z)
}
In both cases it does not work and the error "number of items to replace is not a multiple of replacement length" is returned.
What would be the best solution to writing a loop code where I can loop through dataframes?
See ?Extract:
Recursive (list-like) objects
Indexing by [ is similar to atomic vectors and selects a list of the
specified element(s).
Both [[ and $ select a single element of the list.
In short, my.list[i] returns a list of length 1, and you are trying to assign it a data.frame, so that doesn't work; whereas my.list[[i]] returns the data.frame #i in your list, which you can replace with a data.frame.
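The difference between the two operators is easy to check directly. A small sketch with a toy list (the contents are an assumption for illustration):

```r
my.list <- list(data.frame(x = 1:3), data.frame(x = 4:6))

single_bracket <- my.list[1]    # a list of length 1 wrapping the data.frame
double_bracket <- my.list[[1]]  # the data.frame itself

class(single_bracket)  # "list"
class(double_bracket)  # "data.frame"
```

Only the second form gives you an object on which $x or transform behaves as expected.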
So you can use either:
for (i in 1:2) {
my.list[[i]]$a<- my.list[[i]]$x +my.list[[i]]$y + my.list[[i]]$z
}
or
for (i in 1:2) {
my.list[[i]]<- transform(my.list[[i]], a= x+ y+ z)
}
But it would be even simpler to use lapply, where you don't need [[:
my.list <- lapply(my.list, function(df) { df$a <- df$x + df$y + df$z; df })  # return df, not just the new column
Rather than using an explicit loop to extract the data.frames from the list, just use lapply. It takes a list of data.frames (or any object) and a function, applies the function to every element of the list, and returns a list with the results.
# Sample data
df1 <- data.frame(x = c(1,2,3), y = c(4,6,8), z = c(1, 6, 7))
df2 <- data.frame(x = c(3,5,6), y = c(3,4,9), z = c(6, 7, 7))
# Put them in a list
df_list <- list(df1, df2)
# Use lapply to iterate. FUN takes the function you want, and
# then its arguments (a = x + y + z) are just listed after it.
result_list <- lapply(df_list, FUN = transform, a = x + y + z)
