Understanding the sapply function in R

This might be the most stupid question, but I do not seem to grasp the function sapply: Here is my issue
Example:
d = matrix(1:10, 5,2)
d[3] = NA
# [,1] [,2]
#[1,] 1 6
#[2,] 2 7
#[3,] NA 8
#[4,] 4 9
#[5,] 5 10
If I would like to calculate the row means using sapply function I would use something like this:
sapply(d,mean)
#[1] 1 2 NA 4 5 6 7 8 9 10
Should it not give me the mean of the list of the elements? It just spits out the elements of my matrix rather than the mean.
When I use apply, I get the right answer:
apply(d,1,mean, na.rm=T)
#[1] 3.5 4.5 8.0 6.5 7.5
Could anyone give me a very basic explanation? Highly appreciated.
Used the following links before asking the question. link 1 Link 2 Link 3

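Have you read ?sapply? The function takes a vector (atomic or list) or an expression object. A matrix is not handled row by row or column by column; it is treated as a plain vector of its elements. So what happens in your example is that mean is applied to each single number of d, as if you had written: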
sapply(as.numeric(d),mean)
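To make that concrete, here is a small sketch using the matrix d from the question. It shows that sapply visits the ten elements one at a time, and one possible way to get row means out of sapply by iterating over row indices instead. The names per_element and row_means are only for illustration.

```r
d <- matrix(1:10, 5, 2)
d[3] <- NA

# sapply sees d as a plain vector of 10 elements;
# the mean of a single number is that number
per_element <- sapply(d, mean)

# Iterating over the row indices gives one mean per row
row_means <- sapply(seq_len(nrow(d)), function(i) mean(d[i, ], na.rm = TRUE))
print(row_means)
```

The second call returns the same values as apply(d, 1, mean, na.rm = TRUE) in the question.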

sapply (and its friends, like lapply) iterate over the elements of a vector or a list (a data.frame is really a special kind of list, so they iterate over its columns). A matrix is just a vector with dimensions, so they visit its elements one by one. Even if you had turned your matrix into a data frame, it wouldn't have given you row means; it would have given you column means. If you want to understand how these functions work, it might help to look at this function (copied from here: http://adv-r.had.co.nz/Functionals.html), which shows the essence of lapply using base R. (sapply works in the same way; it just tries to simplify the output to a vector or matrix rather than always returning a list.)
lapply2 <- function(x, f, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)
  }
  out
}
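As a rough illustration of the point about data frames, the sketch below repeats the lapply2 definition and applies it to the question's matrix after converting it to a data frame. The variable names df and col_means are assumptions made for this example.

```r
# lapply2 as shown above (from Advanced R)
lapply2 <- function(x, f, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)
  }
  out
}

d <- matrix(1:10, 5, 2)
d[3] <- NA
df <- as.data.frame(d)   # a list of two columns, V1 and V2

# f is called once per column, so the result is column means, not row means
col_means <- lapply2(df, mean, na.rm = TRUE)
print(unlist(col_means))
```

Because length(df) is the number of columns, the loop runs twice, once for each column.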
By the way, if the point of your question is to find the best way of calculating row means, rather than understanding sapply, then there is a function rowMeans that is the fastest way of doing it -- faster than using apply, for example.
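A quick sketch of that suggestion, using the same d as in the question, checks that rowMeans agrees with the apply call:

```r
d <- matrix(1:10, 5, 2)
d[3] <- NA

rm_fast  <- rowMeans(d, na.rm = TRUE)          # dedicated row-mean function
rm_apply <- apply(d, 1, mean, na.rm = TRUE)    # the approach from the question

print(rm_fast)
print(all.equal(rm_fast, rm_apply))
```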

Related

Turning a vector of vectors acquired from a simulation into a matrix. Numeric problem [duplicate]

This question already has answers here:
Using cbind on an arbitrarily long list of objects
I am just trying to do a basic thing but I can't seem to figure out what the problem is and I can't find answers here that are to problems exactly like mine. If anyone already knows of an answer to this elsewhere, feel free to link that.
I have a simulation that generates a vector, and I have set up my simulation such that it grabs the generated vector and makes it an element of another vector. After I run the simulation multiple times, I would like to make the vector of vectors into a matrix, but the console output is always this:
> agx1
[1,] Numeric,7
[2,] Numeric,7
My simulation pretty much does the following:
agx1 = c()
#some stuff happens
agx1[i] = x1
#iteration number two takes place
agx1[i+1] = x1
#etc..
#Now say I have
agx1[1] = c(0.796399, 0.865736, 0.885808, 0.896138, 0.896138, 0.850385, NA)
#and
agx1[2] = c(0.796399, 0.856540, 0.881432, 0.900808, 0.900808, 0.857664, NA)
#and therefore, agx1 is a vector of vectors. But whenever I try something like..
cagx1 = cbind(agx1[1:2])
#or
cagx1 = as.matrix(agx1)
# I just get:
[,1]
[1,] Numeric,7
[2,] Numeric,7
Any suggestions would be helpful.
It's hard to tell without seeing all of the data, but perhaps agx1 is a list. Try using do.call.
do.call(cbind, agx1)
Edit
Base R cbind doesn't have functionality to work on a list. Consider this:
cbind(agx1[[1]],agx1[[2]])
That works because you have unlisted the first and second elements and passed them as vectors to cbind.
You can get around this problem using do.call. help(do.call) says:
Description
do.call constructs and executes a function call from a name or a
function and a list of arguments to be passed to it.
Thus, do.call helps you call cbind(agx1[[1]], agx1[[2]], ...), and so on until the end of the list, by constructing the cbind function call from the list of agx1 arguments.
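A minimal sketch of the difference, assuming agx1 really is a list of two numeric vectors built from the values in the question:

```r
# Assumed structure of agx1: a list of equal-length numeric vectors
agx1 <- list(
  c(0.796399, 0.865736, 0.885808, 0.896138, 0.896138, 0.850385, NA),
  c(0.796399, 0.856540, 0.881432, 0.900808, 0.900808, 0.857664, NA)
)

# cbind(agx1) treats the whole list as a single column of list elements,
# which is what prints as "Numeric,7"
as_list_col <- cbind(agx1)
print(dim(as_list_col))

# do.call passes each list element to cbind as a separate argument
cagx1 <- do.call(cbind, agx1)
print(dim(cagx1))
```

The first result is a 2 x 1 list matrix, while the second is an ordinary 7 x 2 numeric matrix with one column per simulation run.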
Assuming agx1 is something like this :
agx1 <- list(c(0.796399, 0.865736, 0.885808, 0.896138, 0.896138, 0.850385, NA),
c(0.796399, 0.856540, 0.881432, 0.900808, 0.900808, 0.857664, NA))
You could use dplyr::bind_cols
dplyr::bind_cols(agx1)
# V1 V2
# <dbl> <dbl>
#1 0.796 0.796
#2 0.866 0.857
#3 0.886 0.881
#4 0.896 0.901
#5 0.896 0.901
#6 0.850 0.858
#7 NA NA

Replacing for-loops for Excel-like formula filling in dataframe/matrix

I'm trying to perform basic excel-like formula-filling in R. I want to populate the value of a "cell" based on the values of other cells in the same matrix or data.frame. The function is pretty straightforward to do with a single cell, but seems to be more difficult to scale across both rows and columns.
Say I have a simple matrix:
simple <- matrix(c(0,1,2,3,0,4,5,6,7,NA,NA,NA,8,NA,NA,NA), nrow = 4, ncol = 4)
[,1] [,2] [,3] [,4]
[1,] 0 0 7 8
[2,] 1 4 NA NA
[3,] 2 5 NA NA
[4,] 3 6 NA NA
I want to populate the NAs with the sum of columns 1 and 2 in the same row and row 1 in the same column. In Excel, for cell C2 it would be
=$A2 + $B2 + C$1
in R
simple[2,3] <- simple[2,1] + simple[2,2] + simple[1,3]
In Excel, you can simply drag the formula over the remaining cells, and voila. In R, not so easy.
Since R is vectorized, I can fill a whole column pretty easily by giving ranges instead of single cells, like so:
simple[2:4,3] <- simple[2:4,1] + simple[2:4,2] + simple[1,3]
[,1] [,2] [,3] [,4]
[1,] 0 0 7 8
[2,] 1 4 12 NA
[3,] 2 5 14 NA
[4,] 3 6 16 NA
But when I try to vectorize over both rows and columns, it doesn't work because it interprets the last value as the vector c(7,8), and tries to add that in a row-wise fashion, rather than adding it column-wise.
simple[2:4,3:4] <- simple[2:4,1] + simple[2:4,2] + simple[1,3:4]
Warning message:
In simple[2:4, 1] + simple[2:4, 2] + simple[1, 3:4] :
longer object length is not a multiple of shorter object length
[,1] [,2] [,3] [,4]
[1,] 0 0 7 8
[2,] 1 4 12 12
[3,] 2 5 15 15
[4,] 3 6 16 16
As an alternative solution, one could do nested for loops, as below:
for (i in 2:4){
  for (j in 3:4){
    simple[i,j] <- simple[i,1] + simple[i,2] + simple[1,j]
  }
}
[,1] [,2] [,3] [,4]
[1,] 0 0 7 8
[2,] 1 4 12 13
[3,] 2 5 14 15
[4,] 3 6 16 17
This actually works and is pretty easy, but it involves nested for loops, so, enough said.
I feel like the "right" solution would be one using correct vectorization, apply(), or dplyr, but I can't seem to figure out how to make them work, short of rearranging the data from a crosstab format to a flat format, but that can explode your file size pretty quickly.
Any ideas on how to make this work in a more R-ish fashion?
Here's a more R like way to do it, let's convert simple to a data.frame first.
library(tidyverse)
df1 <- as.data.frame(simple)
df1 %>% mutate(V3 = V1 + V2 + first(V3), V4 = V1 + V2 + first(V4))
V1 V2 V3 V4
1 0 0 7 8
2 1 4 12 13
3 2 5 14 15
4 3 6 16 17
first from dplyr is handy because it lets you lock to the first value in the column, like you would in Excel with C$1
In matrix arithmetic, each component must have the same dimensions or be a single-item vector. Therefore, consider aligning by replicating 7 and 8 for each needed row 2-4 (i.e., 3 times). replicate returns a 2 x 3 matrix, so transpose it to get the 3 x 2 dimension of the target block:
simple[2:4,3:4] <- simple[2:4,1] + simple[2:4,2] + t(replicate(length(2:4), simple[1,3:4]))
Alternatively, consider sapply iterating through 7 and 8 values respectively:
simple[2:4,3:4] <- sapply(3:4, function(i) simple[2:4,1] + simple[2:4,2] + simple[1,i])
Slightly more concise with rowSums and leaving out row indexing:
simple[,3:4] <- sapply(3:4, function(i) rowSums(simple[,1:2]) + simple[1,i])
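Another base R option, not mentioned in the answers above, is outer, which builds the whole table of pairwise sums in one step. The following is only a sketch under the assumption that the formula is always "row part plus column part"; row_part and col_part are illustrative names.

```r
simple <- matrix(c(0,1,2,3,0,4,5,6,7,NA,NA,NA,8,NA,NA,NA), nrow = 4, ncol = 4)

row_part <- rowSums(simple[2:4, 1:2])   # $A + $B for rows 2 to 4
col_part <- simple[1, 3:4]              # C$1 and D$1

# outer() adds every row value to every column value, giving a 3 x 2 matrix
simple[2:4, 3:4] <- outer(row_part, col_part, "+")
print(simple)
```

This yields the same filled block as the nested for loops in the question (12, 14, 16 in column 3 and 13, 15, 17 in column 4).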
I may be late to the game but here is a data.table and base R solution which for large data sets is much faster than tidyverse. The syntax may look more confusing at first but breaking it down piece by piece is very logical and straight-forward once you have a good handle on lapply.
To make the cell and the vectors you are adding compatible you should convert the cell to a vector by simply replicating that value as many times as the number of observations or rows of the dataframe. So in your example, V3 = rep(7,4) will yield a vector with all 7s. R will then let you do V3=V1+V2+V3, where V3 on the right-hand side is the rep(7,4).
The data.table has some handy built-in special read-only symbols that will also give you the ability to extend the solution beyond the two columns you provided in the example. The two I use most frequently are .SD and .N. In this example, you can think of .SD as a way to refer to all columns except the first two and .N is always a constant number equal to the number of rows in the data.table. These symbols can be used in the j slot of a data.table which is equivalent to the columns of a matrix or data.frame object. So your code would look like this:
library(data.table)
simple <- data.table(simple)
NAcols <- colnames(simple)[-c(1,2)] ## all columns except the first two; adjust with match or grep to target other columns
simple[,(NAcols):=lapply(.SD,function(i) V1+V2+rep(i[1],.N)),.SDcols=NAcols] ## parentheses make := use the names stored in NAcols
Note that each iteration in the lapply loop is simply the ith column and i[1] selects only the first element of that column and replicates it as many times as the number of rows (.N) before adding the three vectors together. The .SDcols is used to prevent this function from being applied to the first two columns. Though there was no need in this problem to group, data.table also allows you to specify 'by = ' as an argument if you want to group by a particular column or columns in the data.table before applying the function. Finally note that I did not need to assign the last line of code to another R object because data.table updates the old columns of 'simple' using pointers which is why it is so much faster than base R and tidyverse data frame objects. However you can use the copy function of data.table like this instead if you wish to save the original data.table for some reason:
final_result <- copy(simple)[,(NAcols):=lapply(.SD,function(i) V1+V2+rep(i[1],.N)),.SDcols=NAcols]
Anyway I hope that explanation helps and if you need me to clarify anything please let me know! Best of luck!

How to sum a result of the repeated function in R

I have a very complicated function. I need to repeat this function several times and sum the result. This is easy. However, I need to sum them at the same time. Since my function is difficult to show it here, I provide a very simple example just to explain my idea. Please note that (based on the amazing questions from the comments) My function needs to be done pairwise. Also, my matrices are all the same dimensions. Finally, the result is not as a list. I need to assign the result to a new variable. That is,
Res <- myfunc(x[i,j],y[i,j])+myfunc(z[i,j],t[i,j])+..+..
Also, my function must loop over the elements of the matrices. x[i,j].
My matrices are stored in a list.
Mymatrices <- list(x,y,z,t)
For example,
x <- matrix(5,5,5)
x[upper.tri(x,diag=T)] <- 0
y <- matrix(4,5,5)
y[upper.tri(y,diag=T)] <- 0
z <- matrix(3,5,5)
z[upper.tri(z,diag=T)] <- 0
t <- matrix(2,5,5)
t[upper.tri(t,diag=T)] <- 0
myfunc <- function(x,y){
  sum(x,y)
}
I would like it like this:
Res <- myfunc(x[i,j],y[i,j])+myfunc(z[i,j],t[i,j])+..+..
Suppose I have 10 matrices and would like to have the sum as shown above. It is hard to do it manually. I would like to do this automatically. lapply function takes a list and I do not want it as a list.
Any help, please?
I can't tell either whether you need a matrix in the end or a single value. But since you used i,j I presume you need a matrix:
Reduce("+",list(x,y,z,t))
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 14 0 0 0 0
[3,] 14 14 0 0 0
[4,] 14 14 14 0 0
[5,] 14 14 14 14 0
or do you need:
Reduce(sum,list(x,y,z,t))
[1] 140
Let's say your matrices are in a list, paired in the way you want:
input = list(list(x, y), list(z, t))
For convenience, we'll make a version of your function that takes a list as input (we could use do.call or an anonymous function instead, but this is very clear):
myfunc_list = function(x) {
  myfunc(x[[1]], x[[2]])
}
We can then sapply the list function to your input list, and sum:
sum(sapply(input, myfunc_list))
# [1] 140
Glad to have helped. To be honest, I'm still not completely sure what you are asking for though - no one thinks your final answer will be a list, just an intermediate step in order to do the summation effectively. Looking at the answers, I think the Reduce function suggested by Onyambu is what you need - where x, y, z, and t are the results from your function (called pairwise on different matrices).
Is the summation really where you need help, or is it efficiently calling your function pairwise on all those matrices? That is a very different question. If that's the case, check out the map2 function in the purrr package. It takes two lists (of the same length) as inputs, computes a function on each element, and returns a list (which can be fed into Reduce).
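For readers without purrr, base R's Map does a similar pairwise job. The sketch below is an assumption-laden illustration: pair_fun is a hypothetical stand-in for the real (complicated) function and simply adds two matrices element by element, and firsts and seconds are illustrative names for the two lists of partners.

```r
x <- matrix(5, 5, 5); x[upper.tri(x, diag = TRUE)] <- 0
y <- matrix(4, 5, 5); y[upper.tri(y, diag = TRUE)] <- 0
z <- matrix(3, 5, 5); z[upper.tri(z, diag = TRUE)] <- 0
t <- matrix(2, 5, 5); t[upper.tri(t, diag = TRUE)] <- 0

# Hypothetical stand-in for the real pairwise function
pair_fun <- function(a, b) a + b

firsts  <- list(x, z)   # first member of each pair
seconds <- list(y, t)   # second member of each pair

# Map calls pair_fun(x, y) and pair_fun(z, t) and returns a list
pair_results <- Map(pair_fun, firsts, seconds)

# Reduce adds the per-pair matrices together into a single matrix
Res <- Reduce(`+`, pair_results)
print(Res)
```

The list only exists as an intermediate step; Res itself is an ordinary matrix that can be assigned and used like any other.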

Apply an operation to some elements of a vector by using indices

I've got a fairly basic question concerning vector operations in R. I want to apply a certain operation (i.e. increment) to specific elements of a vector by using a vector containing the indices of the elements.
For example:
ind <- c(2,5,8)
vec <- seq(1,10)
I want to add 1 to the 2nd, 5th and 8th element of vec. In the end I'd like to have:
vec <- c(1,3,3,4,6,6,7,9,9,10)
I tried vec[ind] + 1
but that returns only the three elements. I could use a for-loop, of course, but knowing R, I'm sure there's a more elegant way.
Any help would be much appreciated.
We have to assign the result back to those elements:
vec[ind] <- vec[ind] + 1
vec
#[1] 1 3 3 4 6 6 7 9 9 10
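One caveat worth knowing, shown here as a small sketch: with this kind of indexed assignment, an index that appears twice is still only incremented once. The names vec2 and dup are made up for the example.

```r
ind <- c(2, 5, 8)
vec <- seq(1, 10)
vec[ind] <- vec[ind] + 1      # elements 2, 5 and 8 each go up by one

# A repeated index does not accumulate: element 2 becomes 3, not 4
vec2 <- seq(1, 10)
dup  <- c(2, 2, 5)
vec2[dup] <- vec2[dup] + 1
print(vec2)
```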

R: a for statement wanted that allows for the use of values from each row

I'm pretty new to R..
I'm reading in a file that looks like this:
1 2 1
1 4 2
1 6 4
and storing it in a matrix:
matrix <- read.delim("filename",...)
Does anyone know how to make a for statement that adds up the first and last numbers of one row per iteration ?
So the output would be:
2
3
5
Many thanks!
Edit: My bad, I should have made this more clear...
I'm actually more interested in an actual for-loop where I can use multiple values from any column on that specific row in each iteration. The adding up numbers was just an example. I'm actually planning on doing much more with those values (for more than 2 columns), and there are many rows.
So something in the lines of:
for (i in matrix_i) #where i means each row
{
#do something with column j and column x from row i, for example add them up
}
If you want to get a vector out of this, it is simpler (and marginally computationally faster) to use apply rather than a for statement. In this case,
sums = apply(m, 1, function(x) x[1] + x[3])
Also, you shouldn't call your variables "matrix" since that is the name of a built in function.
ETA: There is an even easier and computationally faster way. R lets you pull out columns and add them together (since they are vectors, they will get added elementwise):
sums = m[, 1] + m[, 3]
m[, 1] means the first column of the data.
Something along these lines should work rather efficiently (i.e. this is a vectorised approach):
m <- matrix(c(1,1,1,2,4,6,1,2,4), 3, 3)
# [,1] [,2] [,3]
# [1,] 1 2 1
# [2,] 1 4 2
# [3,] 1 6 4
v <- m[,1] + m[,3]
# [1] 2 3 5
You probably can use an apply function or a vectorized approach --- and if you can you really should, but you ask for how to do it in a for loop, so here's how to do that. (Let's call your matrix m.)
results <- numeric(nrow(m))
for (row in seq_len(nrow(m))) {  # iterate over every row index, not just the last one
  results[row] <- m[row, 1] + m[row, 3]
}
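For the more general case in the edit to the question (using several columns of the current row in each iteration), a sketch along the same lines could pull out the whole row first. The matrix m is rebuilt from the sample file contents, and row_i is an illustrative name.

```r
m <- matrix(c(1,1,1,2,4,6,1,2,4), 3, 3)

results <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  row_i <- m[i, ]                      # all values of row i as a vector
  results[i] <- row_i[1] + row_i[3]    # any expression using row_i's columns can go here
}
print(results)
```

Preallocating results with numeric(nrow(m)) avoids growing the vector inside the loop.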
This is probably one of those 100 ways to skin a cat questions. You are perhaps looking for the rowSums function, although you might also find many answers using the apply function.

Resources