lapply, rowSums and passign arguments - r

Consider this code
lapply(lst,rowSums)
lst is a list of five data frames. Each data frame has, for example, four columns and ten rows. I want to add the values of the columns in each row, however, I do not want to include the value of column one in the sum.
I can use a for loop and the code below:
lst_sum = list()
for (ii in c(1,5))
{
dummy <- lst[[ii]]
dummy <- rowSums(dummy[,seq(2,4,by = 1)])
lst_sum[[ii]] = dummy
}
I would like to use lapply or a similar function because I think the for loops looks ugly and inefficient.

With the help of comments I got this solution:
lapply(lst,function(x) rowSums(x[,seq(2,4,by = 1)]))

Related

How do you establish the column number that an apply function is working on to use that number to retrieve a variable indexed by that number

I am converting my for-loops in R for a model that has multiple input datasets. In the for-loop I use the current loop value to retrieve values from other datasets. I am looking to replicate this using an apply function (over columns in a dataset) however I'm struggling to establish index of the apply function in order to retrieve the appropriate variables from other data
The apply function references the column by the variable in the function which is fine and I've tried to use both colname (after having named my various columns by number) but have not had any joy. Below is an example dataset and for loop with what I'd like to achieve (simplified somewhat). The length of the vectors and the number of columns in the tabular dataset will always be equal.
iteration<-1:3
df <- data.frame("column1" = 6:10, "column2" = 12:16, "column3" = 31:35)
variable1<-rnorm(3,mean = 25)
variable2<-rnorm(3, mean = 0.21)
outcome<-numeric()
for (i in iteration) {
intermediate<-(mean(df[,i])*variable1[i])^variable2[i]
outcome<-c(outcome,intermediate)
}
outcome
The expected results are outcome above...trying this in apply
What I imagine it to be is this:
apply(df, 2, function(x) (mean(x)*variable1[colnumber(x)])^variable2[colnumber(x)]
or perhaps
apply(df, 2, function(x) (mean(x)*variable1[x])^variable2[x])
but these two obviously do not work.
first time user so apologies for any etiquette issues but found the answer to my own problem using the purrr package, but maybe this helps someone else
pmap(list(df, variable1, variable2), function(df, variable1, variable2) (mean(df)*variable1)^variable2)

Compute 15 rows in parallel (through vectorization) and create df with them

I am creating 15 rows in a dataframe, like this. I cannot show my real code, but the create row function involves complex calculations that can be put in a function. Any ideas on how I can do this using lapply, apply, etc. to create all 15 in parallel and then concatenate all the rows into a dataframe? I think using lapply will work (i.e. put all rows in a list, then unlist and concatenate, but not exactly sure how to do it).
for( i in 1:15 ) {
row <- create_row()
# row is essentially a dataframe with 1 row
rbind(my_df,row)
}
Something like this should work for you,
create_row <- function(){
rnorm(10, 0,1)
}
my_list <- vector(100, mode = "list")
my_list_2 <- lapply(my_list, function(x) create_row())
data.frame(t(sapply(my_list_2,c)))
The create_row function is just make the example reproducible, then we predefine an empty list, then fill it with the result from the create_row() function, then convert the resulting list to a data frame.
Alternatively, predefine a matrix and use the apply functions, over the row margin, then use the t (transpose) function, to get the output correct,
df <- data.frame(matrix(ncol = 10, nrow = 100))
t(apply(df, 1, function(x) create_row(x)))

assign individual vector element to individual columns by rown and by name - r

I work mostly with data table but a data frame solution would work as well.
I have the result of an apply which returns this data structure
applyres=structure(c(0.0260, 3.6775, 0.92
), .Names = c("a.1", "a.2", "a.3"))
Then I have a data table
coltoadd=c('a.1','a.2','a.3')
dt <- data.table(variable1 = factor(c("what","when","where")))
dt[,coltoadd]=as.numeric(NA)
Now I would like to add the elements of applyres to the corresponding columns, just one row at a time, because applyres is calculated from another function. I have tried different assignments but nothing seems to work. Ideally I would like to assign based on column name, just in case the columns change order in one of the two structures.
This doesn't work
dt[1,coltoadd]=applyres
I also tried
dt[1,coltoadd := applyres]
And tried to change applyrest to a matrix or a data table and transpose.
I would like to do something like this
dt[1,coltoadd[i]]=applyres[coltoadd[i]]
But not sure if it should go in a loop, doesn't seem the best way to do it.
How do I avoid doing single assignments if I have a large number of columns?
1) data.frame Convert to data.frame, perform the assignments and convert back.
DF <- as.data.frame(dt)
DF[1, -1] <- applyres
# perform remaining of assignments
dt <- as.data.table(DF)
2) loop Another possibility is a for loop:
for(i in 2:ncol(dt)) dt[1, i] <- applyres[i-1]

Use apply to add multiple columns (more than 100) of random numbers or other function in R

I would like to build a function that adds many columns of random variables or other function to a a dataframe. Here I am trying to append it to map data.
library(plyr)
add <- function(name, df){
new.df = mutate(df, name = runif(length(df[,1])))
new.df
}
The function works to add a column of data...
add("e", iris)
iris2<- add("f", iris)
The apply does not work...
I am trying to add 26 columns from the list of letters so that df$a, df$b, df$c are all random vectors.
new <- lapply(letters, add, df = tx)
What is the most efficient way to columns from a list of col names?
I would like to later loop through all of the column names in another function.
It's not very clear to me, what you want to achieve. This adds multiple columns of random numbers to a data.frame:
cbind(iris,
matrix(runif(nrow(iris)*5), ncol=5))
I don't see a reason to use an *apply function.

R: t tests on rows of 2 dataframes

I have two dataframes and I would like to do independent 2-group t-tests on the rows (i.e. t.test(y1, y2) where y1 is a row in dataframe1 and y2 is matching row in dataframe2)
whats best way of accomplishing this?
EDIT:
I just found the format: dataframe1[i,] dataframe2[i,]. This will work in a loop. Is that the best solution?
The approach you outlined is reasonable, just make sure to preallocate your storage vector. I'd double check that you really want to compare the rows instead of the columns. Most datasets I work with have each row as a unit of observation and the columns represent separate responses/columns of interest Regardless, it's your data - so if that's what you need to do, here's an approach:
#Fake data
df1 <- data.frame(matrix(runif(100),10))
df2 <- data.frame(matrix(runif(100),10))
#Preallocate results
testresults <- vector("list", nrow(df1))
#For loop
for (j in seq(nrow(df1))){
testresults[[j]] <- t.test(df1[j,], df2[j,])
}
You now have a list that is as long as you have rows in df1. I would then recommend using lapply and sapply to easily extract things out of the list object.
It would make more sense to have your data stored as columns.
You can transpose a data.frame by
df1_t <- as.data.frame(t(df1))
df2_t <- as.data.frame(t(df2))
Then you can use mapply to cycle through the two data.frames a column at a time
t.test_results <- mapply(t.test, x= df1_t, y = df2_t, SIMPLIFY = F)
Or you could use Map which is a simple wrapper for mapply with SIMPLIFY = F (Thus saving key strokes!)
t.test_results <- Map(t.test, x = df1_t, y = df2_t)

Resources