Loop simultaneously over two lists in R

I have written a function that takes three arguments:
create.template <- function(t.list, x, y){
  temp <- cbind(get(t.list[x]), get(t.list[y]), NA)
}
The output of this function is a data.frame with 11 columns and 17 rows.
Now I would like to loop over the function with two lists, one for x and one for y. To that end:
x.list <- list(1,2,3)
y.list <- list(4,5,6)
In the final step I would like to establish something like
for (x in x.list and y in y.list){
  create.template(t.list, x, y)
}
and then combine the resulting data frames (3 data frames with 11 columns each) row-wise into one final data frame.
I know that you can do this in Python with the zip() function and then combine the results with append() and concatenate(), but I have not found an equivalent in R so far. Any help is highly appreciated!

We can get the values of multiple objects with mget, then use either Reduce or do.call to cbind the resulting list of vectors:
Reduce(cbind, c(mget(ls(pattern = "\\.list")), NA))
Or
do.call(cbind, c(mget(c("x.list", "y.list")), NA))
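For the zip()-style pairing the question asks about, base R's Map() is the closest analogue. A minimal sketch, assuming t.list and create.template are as defined in the question:
# Map() walks x.list and y.list in parallel, like Python's zip()
templates <- Map(function(x, y) create.template(t.list, x, y), x.list, y.list)
# row-bind the resulting data frames into one final data frame
final <- do.call(rbind, templates)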

Related

Iterating a Function Using R Lapply and a List of Function Arguments

I have a function, MyFun(a, b, c, d), that returns a list of data frames and plots. The arguments a, b, c, d are just character strings that represent "start date", "end date", "current version #", and "previous version #". For brevity, I will refer to the arguments simply as a, b, c, d.
I want to run MyFun(a, b, c, d) with unique sets of arguments and store all the output into a list. To do so, I created a list of lists:
arg_set1 <- list(a1,b1,c1,d1)
arg_set2 <- list(a2,b2,c2,d2)
arg_set3 <- list(a3,b3,c3,d3)
arg_sets <- list(arg_set1, arg_set2, arg_set3)
This is the part I'm uncertain of: I am attempting to use lapply to get a list of outputs from MyFun(a, b, c, d), using the 3 lists in arg_sets as my input: output <- lapply(arg_sets, MyFun)
To my understanding, the above lapply statement does not work because lapply has no way of knowing that the sets of arguments it should pass to MyFun are contained in arg_sets[1], arg_sets[2], and arg_sets[3]. As an alternative, I have also tried to pass my input arguments to lapply as a data frame, with columns a, b, c, d and each row holding a unique set of parameters I want lapply to pass to MyFun(a, b, c, d). However, I ran into essentially the same issue as with the list of input arguments: I am unable to tell lapply to pass each row of the input data frame as a set of arguments to MyFun(a, b, c, d).
Any advice would be much appreciated!
Use do.call
output <- lapply(arg_sets, function(x) do.call(my_fun, x))
See this simple example,
my_fun <- function(x, y, z) {
  x + y + z
}
arg_sets <- list(a = as.list(1:3), b = as.list(4:6))
lapply(arg_sets, function(x) do.call(my_fun, x))
#$a
#[1] 6
#
#$b
#[1] 15
If, instead of lists, you create vectors of arguments, you can change the above call to:
arg_set1 <- c(a1,b1,c1,d1)
arg_set2 <- c(a2,b2,c2,d2)
arg_set3 <- c(a3,b3,c3,d3)
arg_sets <- list(arg_set1, arg_set2, arg_set3)
lapply(arg_sets, function(x) do.call(my_fun, as.list(x)))
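The question also mentions storing the argument sets as rows of a data frame. A minimal sketch of that variant, reusing the toy my_fun above (the column names x, y, z are just for illustration): Map() walks the columns in parallel, so each row supplies one set of arguments.
arg_df <- data.frame(x = 1:3, y = 4:6, z = 7:9)
# row i of arg_df becomes the call my_fun(arg_df$x[i], arg_df$y[i], arg_df$z[i])
Map(my_fun, arg_df$x, arg_df$y, arg_df$z)
# returns a list containing 12, 15, 18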

Compute 15 rows in parallel (through vectorization) and create df with them

I am creating 15 rows of a data frame, like this. I cannot show my real code, but the row-creation step involves complex calculations that can be put in a function. Any ideas on how I can use lapply, apply, etc. to create all 15 rows in parallel and then concatenate them into a data frame? I think lapply will work (i.e. put all rows in a list, then combine them into a data frame), but I'm not exactly sure how to do it.
my_df <- data.frame()
for (i in 1:15) {
  row <- create_row()
  # row is essentially a dataframe with 1 row
  my_df <- rbind(my_df, row)
}
Something like this should work for you,
create_row <- function(){
  rnorm(10, 0, 1)
}
my_list <- vector(mode = "list", length = 100)
my_list_2 <- lapply(my_list, function(x) create_row())
data.frame(t(sapply(my_list_2, c)))
The create_row function is just there to make the example reproducible: we predefine an empty list, fill it with results from create_row(), and then convert the resulting list to a data frame.
Alternatively, predefine a matrix, use apply over the row margin, and then use t() (transpose) to get the output into the right shape:
df <- data.frame(matrix(ncol = 10, nrow = 100))
t(apply(df, 1, function(x) create_row()))
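If create_row() really returns a one-row data frame, as described in the question, the intermediate list can also be row-bound directly. A minimal sketch (with the numeric-vector version of create_row() above, wrap the result in as.data.frame()):
# build the 15 rows as a list, then stack them once at the end
rows <- lapply(1:15, function(i) create_row())
my_df <- do.call(rbind, rows)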

Looping a rep() function in R

df is a frequency table, where the values in a were reported as many times as recorded in columns x, y, and z. I'm trying to convert the frequency table back to the original data, so I use the rep() function.
How do I loop the rep() function to give me the original data for x, y, z without having to repeat the function several times like I did below?
Also, can I put the result into a data frame, bearing in mind that the outputs will have different lengths?
a <- 1:10
x <- 6:15
y <- 11:20
z <- 16:25
df <- data.frame(a, x, y, z)
df
rep(df[, 1], df[, 2])
rep(df[, 1], df[, 3])
rep(df[, 1], df[, 4])
If you don't want to repeat the rep() call several times, you can use an apply function. Note that you cannot store the result in a data.frame because the outputs are of different lengths, but you can store it in a list and access the elements in a similar way to a data.frame. Something like this works:
df2 <- sapply(df[, 2:4], function(x) rep(df[, 1], x))
What this sapply call is saying is: for each column in df[, 2:4], apply rep(df[, 1], x), where x is one of your columns (df[, 2], df[, 3], or df[, 4]).
The code below just checks that the sapply approach gives the same result as your original calls.
identical(df2$x,rep(df[,1], df[,2]))
[1] TRUE
identical(df2$y,rep(df[,1], df[,3]))
[1] TRUE
identical(df2$z,rep(df[,1], df[,4]))
[1] TRUE
EDIT:
If you want it as a data.frame object you can do this:
res <- as.data.frame(sapply(df2, '[', seq(max(sapply(df2, length)))))
Note this introduces NAs into your data.frame so be careful!
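If you would rather avoid the NA padding, a long-format alternative (a sketch using the df2 list built above) is to stack the list into a two-column data frame:
# stack() turns the named list into a data frame with a 'values' column and
# an 'ind' column recording which of x, y, z each value came from
long <- stack(df2)
head(long)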

How to vectorize a for loop in R

I'm trying to clean this code up and was wondering if anybody has any suggestions on how to run this in R without a loop. I have a dataset called data with 100 variables and 200,000 observations. What I want to do is essentially expand the dataset by multiplying each observation by a specific scalar and then combine the data together. In the end, I need a data set with 800,000 observations (I have four categories to create) and 101 variables. Here's a loop that I wrote that does this, but it is very inefficient and I'd like something quicker and more efficient.
datanew <- c()
for (i in 1:51){
  for (k in 1:6){
    for (m in 1:4){
      sub <- subset(data, data$var1==i & data$var2==k)
      sub[, 4:(ncol(sub)-1)] <- filingstat0711[i,k,m] * sub[, 4:(ncol(sub)-1)]
      sub$newvar <- m
      datanew <- rbind(datanew, sub)
    }
  }
}
Please let me know what you think and thanks for the help.
Below is some sample data with 2K observations instead of 200K
# SAMPLE DATA
#------------------------------------------------#
mydf <- as.data.frame(matrix(rnorm(100 * 20e2), ncol=100, nrow=20e2))  # 2,000 observations of 100 variables
var1 <- c(sapply(seq(41), function(x) sample(1:51)))[1:20e2]
var2 <- c(sapply(seq(2 + 20e2/6), function(x) sample(1:6)))[1:20e2]
#----------------------------------#
mydf <- cbind(var1, var2, round(mydf[3:100]*2.5, 2))
filingstat0711 <- array(round(rnorm(51*6*4)*1.5 + abs(rnorm(2)*10)), dim=c(51,6,4))
#------------------------------------------------#
You can try the following. Notice that we replaced the first two for loops with a call to mapply and the third for loop with a call to lapply.
Also, we are creating two vectors that we will combine for vectorized multiplication.
# create a table of the i-k index combinations using `expand.grid`
ixk <- expand.grid(i=1:51, k=1:6)
# Take a look at what expand.grid does
head(ixk, 60)
# create two vectors for multiplying against our dataframe subset
multpVec <- c(rep(c(0, 1), times=c(4, ncol(mydf)-4-1)), 0)
invVec <- !multpVec
# example of how we will use the vectors
(multpVec * filingstat0711[1, 2, 1] + invVec)
# Instead of for loops, we can use mapply.
newdf <-
  mapply(function(i, k)
    # The function we are `mapply`ing:
    # rbind a list of data frames, which were subsetted by matching var1 & var2
    # and then multiplied by a value in filingstat0711
    do.call(rbind,
      # iterating over m
      lapply(1:4, function(m)
        # the cbind is for adding the newvar=m column at the end of the subtable
        cbind(
          # we transpose twice: first the subset, to multiply by our vector;
          # then the result, to get back our original form
          t( t(subset(mydf, var1==i & mydf$var2==k)) *
             (multpVec * filingstat0711[i,k,m] + invVec)),
          # this is an argument to cbind
          "newvar"=m)
      )),
    # the two vectors you are passing as arguments are the columns of the expanded grid
    ixk$i, ixk$k, SIMPLIFY=FALSE
  )
# flatten the data frame
newdf <- do.call(rbind, newdf)
Two points to note:
Try not to use names like data, table, df, sub, etc., which are also the names of commonly used functions.
In the above code I used mydf in place of data.
You can use apply(ixk, 1, fu..) instead of the mapply that I used, but I think mapply makes for cleaner code in this situation
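A sketch of that apply() variant, under the same assumptions as the code above (the simplify = FALSE argument to apply() needs R >= 4.1):
# each row of ixk arrives as a named numeric vector c(i = ..., k = ...)
newdf2 <- apply(ixk, 1, function(r) {
  i <- r["i"]
  k <- r["k"]
  do.call(rbind,
          lapply(1:4, function(m)
            cbind(
              t(t(subset(mydf, var1 == i & var2 == k)) *
                  (multpVec * filingstat0711[i, k, m] + invVec)),
              newvar = m)))
}, simplify = FALSE)  # return a list instead of simplifying
newdf2 <- do.call(rbind, newdf2)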

R: t tests on rows of 2 dataframes

I have two data frames and I would like to do independent 2-group t-tests on the rows (i.e. t.test(y1, y2), where y1 is a row in dataframe1 and y2 is the matching row in dataframe2).
What's the best way of accomplishing this?
EDIT:
I just found the indexing format dataframe1[i,] and dataframe2[i,], which will work in a loop. Is that the best solution?
The approach you outlined is reasonable; just make sure to preallocate your storage vector. I'd also double-check that you really want to compare the rows instead of the columns: most datasets I work with have each row as a unit of observation and the columns as the separate responses/variables of interest. Regardless, it's your data, so if that's what you need to do, here's an approach:
# Fake data
df1 <- data.frame(matrix(runif(100), 10))
df2 <- data.frame(matrix(runif(100), 10))
# Preallocate results
testresults <- vector("list", nrow(df1))
# For loop
for (j in seq(nrow(df1))){
  testresults[[j]] <- t.test(df1[j,], df2[j,])
}
You now have a list that is as long as you have rows in df1. I would then recommend using lapply and sapply to easily extract things out of the list object.
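For example, a quick sketch of pulling the p-values and t statistics back out of that list with sapply:
# each element of testresults is an htest object, so extract its components
p_values <- sapply(testresults, function(tt) tt$p.value)
t_stats  <- sapply(testresults, function(tt) tt$statistic)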
It would make more sense to have your data stored as columns.
You can transpose a data.frame by
df1_t <- as.data.frame(t(df1))
df2_t <- as.data.frame(t(df2))
Then you can use mapply to cycle through the two data.frames a column at a time
t.test_results <- mapply(t.test, x= df1_t, y = df2_t, SIMPLIFY = F)
Or you could use Map, which is a simple wrapper for mapply with SIMPLIFY = FALSE (thus saving keystrokes!):
t.test_results <- Map(t.test, x = df1_t, y = df2_t)
