R apply statement does not work with a matrix

I'd like to know the reason why the following does not work on the matrix structure I have posted here (I've used the dput command).
When I try running:
apply(mymatrix, 2, sum)
I get:
Error in FUN(newX[, i], ...) : invalid 'type' (list) of argument
However, when I check to make sure it's a matrix I get the following:
is.matrix(mymatrix)
[1] TRUE
I realize that I can get around this problem by unlisting the data into a temp variable and then just recreating the matrix, but I'm curious why this is happening.

?is.matrix says:
'is.matrix' returns 'TRUE' if 'x' is a vector and has a '"dim"'
attribute of length 2 and 'FALSE' otherwise.
Your object is a list with a dim attribute. A list is a type of vector (even though it is not an atomic type, which is what most people think of as vectors), so is.matrix returns TRUE. For example:
> l <- as.list(1:10)
> dim(l) <- c(10,1)
> is.matrix(l)
[1] TRUE
To convert mymatrix to an atomic matrix, you need to do something like this:
mymatrix2 <- unlist(mymatrix, use.names=FALSE)
dim(mymatrix2) <- dim(mymatrix)
# now your apply call will work
apply(mymatrix2, 2, sum)
# but you should really use (if you're really just summing columns)
colSums(mymatrix2)
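The situation above can be reproduced without the original data. The following is a minimal sketch: the object name cells and its values are made up for illustration and only stand in for mymatrix.

```r
# Hypothetical stand-in for mymatrix: a list that carries a dim attribute.
cells <- as.list(1:6)
dim(cells) <- c(2, 3)

is.matrix(cells)  # TRUE, because a list is a vector and dim has length 2
typeof(cells)     # "list", so sum() cannot add up a column

# apply() fails on the list matrix; capture the error instead of stopping.
apply_result <- tryCatch(apply(cells, 2, sum), error = function(e) "error")

# Flatten to an atomic vector, then restore the original shape.
atomic <- unlist(cells, use.names = FALSE)
dim(atomic) <- dim(cells)
col_totals <- colSums(atomic)
```

Checking typeof() alongside is.matrix() is a quick way to tell an atomic matrix from a list that merely has dimensions.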

The elements of your matrix are not numeric; they are lists. To see this, you can do:
apply(m,2, class) # here m is your matrix
So if you want the column sums, you have to coerce the columns to numeric and then apply colSums, which is a shortcut for apply(x, 2, sum):
colSums(apply(m, 2, as.numeric)) # this will give you the sum you want.


apply() works not as expected

I'm trying to get a hold on how the apply function works. Here is what I tried:
df = data.frame(x=c(1,2,3,4,5), x2=c(1,2,3,4,5))
apply(df$x2, 2, function(x) x*2) #doesn't work
apply(df["x2"], 2, function(x) x*2) #works
apply(df[,2], 2, function(x) (x*2)) #doesn't work
apply(df[2], 2, function(x) x*2) #works (surprisingly)
apply(df[2,], 1, function(x) x*2) #works, but gives me vertical vector
apply(df[2,], 2, function(x) x*2) #works; this gives me the output I expected in line above
Questions (as indicated by comments):
Why doesn't line 2 work although line 3 does?
Why can I use [2,] to refer to row 2 (line 6), but cannot use [,2] to refer to column 2 (line 4), but have to use [2] (line 5) instead?
In line 6 I expected to get what I got from line 7: row 2 (with double values) in a row. Why didn't I get this from line 6, I indicated row with MARGIN=2?
apply needs to be used on an object whose dim attribute has positive length. For simplicity, think of an object that has rows and columns.
That is why MARGIN takes 1 or 2, standing for row-wise and column-wise operation.
Check your input values like this:
dim(df["x2"])
dim(df[,2]) #this is null, so it does not work
df[,2] gives you a vector, the same as df$x2. A vector does not have rows and columns, so it does not work with apply.
To understand what is going wrong:
Type ?"[" into your console and read everything. Also keep playing around, as you are already doing!
Have a closer look at the drop argument.
Lastly, with df[2,] you're subsetting a single row. It's still a data frame.
Check dim(df[2,])
apply(df[2,], 1, function(x) x*2) #works, but gives me vertical vector
apply(df[2,], 2, function(x) x*2) #works; this gives me the output I expected in line above
The reason you don't get the same output is the whole reason why apply exists: MARGIN decides whether the function is called once per row or once per column. Please read ?apply to understand.
If you still have questions after reading the two mentioned resources, feel free to ask more.
Here is a little example:
m <- matrix(1:9,nrow=3)
m
apply(m,1,max) #row-wise max value
apply(m,2,max) #col-wise max value
The problem is subsetting:
First:
df$x2 and df[, 2] are different from df["x2"] and df[2], as the former return a numeric vector, the latter return a data.frame.
Second:
df[2, ] returns the second row of your data.frame. If you use MARGIN = 1 you go through the rows, each row is represented as a (named) vector of length equal to the number of columns in your data.frame.
If you use MARGIN = 2 you go through the columns, again, each column is represented as a (named) vector of length equal to the number of rows in your data.frame.
Why doesn't line 2 work although line 3 does?
df$x2 is a vector, i.e. c(1,2,3,4,5), whereas df["x2"] is a data frame with just one column. The vector has no dim attribute for apply to work over. See ?'[' in R for details of how subsetting works; this isn't really related to the apply function.
Why can I use [2,] to refer to row 2 (line 6), but cannot use [,2] to refer to column 2 (line 4), but have to use [2] (line 5) instead?
Again, see the subsetting help page, but df[,2,drop=FALSE] is probably what you need.
In line 6 I expected to get what I got from line 7: row 2 (with double values) in a row. Why didn't I get this from line 6, I indicated row with MARGIN=2?
The value section of ?apply explains the dimensions that you can expect as output from a call to apply:
If each call to FUN returns a vector of length n, then apply returns an array of dimension c(n, dim(X)[MARGIN]) if n > 1. If n equals 1, apply returns a vector if MARGIN has length 1 and an array of dimension dim(X)[MARGIN] otherwise.
In this case we see that:
> dim(df[2,])
# [1] 1 2
and so:
apply(df[2,], 1, function(x) x*2)
has n=2 and dim(df[2,])[1]=1, so you should expect an output with dimensions c(2,1).
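The dimension rule quoted above can be checked directly. This is a small sketch using the df from the question; the names row2, by_row and by_col are only illustrative.

```r
df <- data.frame(x = c(1, 2, 3, 4, 5), x2 = c(1, 2, 3, 4, 5))
row2 <- df[2, ]  # still a data frame, with dim 1 2

# MARGIN = 1: FUN sees the single row (length 2), so n = 2 and the
# result has dimension c(n, dim(X)[1]) = c(2, 1), a "vertical" result.
by_row <- apply(row2, 1, function(x) x * 2)

# MARGIN = 2: FUN sees each column (length 1), so n = 1 and the
# result is a plain vector with one entry per column.
by_col <- apply(row2, 2, function(x) x * 2)

# Transposing the MARGIN = 1 result gives the 1 x 2 layout.
as_row <- t(by_row)
```

In other words, apply stacks each FUN result as a column, which is why the row-wise call comes back transposed.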
You should look at each type and dimension of the expression
> typeof(df$x2)
[1] "double"
> dim(df$x2)
NULL
> typeof(df["x2"])
[1] "list"
> dim(df["x2"])
[1] 5 1
> typeof(df[, 2])
[1] "double"
> dim(df[, 2])
NULL
> typeof(df[2])
[1] "list"
> dim(df[2])
[1] 5 1
> typeof(df[2, ])
[1] "list"
> dim(df[2,])
[1] 1 2
Line 2 does not work because you are applying the function to an object whose dim is NULL (dim(X) must have positive length). The same reasoning covers the other cases. Pay attention to the type and dimensions of the expression you pass to apply. I recommend simply printing the values to check that they are suitable for apply.
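To tie the answers together, here is a short sketch of how drop = FALSE keeps the dimensions that apply needs. It reuses the df from the question; the names dropped, kept, doubled and doubled_vec are made up for the example.

```r
df <- data.frame(x = c(1, 2, 3, 4, 5), x2 = c(1, 2, 3, 4, 5))

dropped <- df[, 2]             # plain numeric vector, dim is NULL
kept <- df[, 2, drop = FALSE]  # one-column data frame, dim is 5 1

# With dimensions preserved, apply() over columns works.
doubled <- apply(kept, 2, function(x) x * 2)  # 5 x 1 matrix

# For a simple element-wise operation, vectorised arithmetic is enough.
doubled_vec <- df$x2 * 2
```

For something as simple as doubling a column, the vectorised form is probably the more idiomatic choice; apply earns its keep when the function needs a whole row or column at a time.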

R why does do.call not match a direct calculation?

In the process of doing something more complicated, I've found the following:
If I use do.call converting a numeric vector to a list, I am getting a different value from the applied function and I'm not sure why.
x <- rnorm(30)
median(x) # -0.01192347
# does not match:
do.call("median",as.list(x)) # -1.912244
Why?
Note: I'm trying to run various functions using a vector of function names. This works with do.call, but only if I get the correct output from do.call.
Thanks for any suggestions.
So do.call expects the args argument to be a list of arguments, so technically we'd want to pass list(x = x):
> set.seed(123)
> x <- rnorm(10)
> median(x)
[1] -0.07983455
> do.call(median,list(x = x))
[1] -0.07983455
> do.call(median,as.list(x))
[1] -0.5604756
Calling as.list on the vector x turns it into a list of length 10, as though you were going to call median and pass it 10 separate arguments. But really, we're passing just one, x. So in the end it just grabs the first element of the vector and passes that to the argument x.
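Since the goal was to run several functions from a vector of function names, here is a hedged sketch of that pattern. The vector fun_names and its contents are assumptions chosen for illustration; the key point is wrapping x in list() so the whole vector arrives as one argument.

```r
set.seed(123)
x <- rnorm(10)

# as.list(x) spreads the ten values across median()'s arguments,
# so only the first value ends up as the data.
spread <- do.call(median, as.list(x))

# list(x) passes the whole vector as a single argument.
wrapped <- do.call(median, list(x))

# Hypothetical vector of function names, each called on the same data.
fun_names <- c("mean", "median", "sd")
results <- sapply(fun_names, function(f) do.call(f, list(x)))
```

Here results is a named numeric vector with one entry per function name, and the "median" entry matches median(x).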

for loop with if statements (both iterates over a list) gave warnings in r

I have two lists, each list contains two vectors i.e,
x <- list(c(1,2),c(3,4))
y <- list(c(2,4),c(5,6))
z <- list(c(0,0),c(1,1), c(2,3),c(4,5))
I would like to use for loop to iterate over the first list and if statement for the second list as follows:
for (j in 1:seq(x)){
if(y[[j]] == c(2,4))
z[[j]] <- c(0,0)
}
I would like to iterate over the first list and, on each iteration, test a condition on the second list. My real function is complex, so I am posting this example, which is similar to what I am trying to do. That is, I would like to choose the values of z based on the values of y. For x, I just want to run the code as many times as the length of x.
When I run it, I got this message:
Warning messages:
1: In 1:seq(x) : numerical expression has 2 elements: only the first used
2: In if (y[[j]] == c(2, 4)) y[[j]] <- c(0, 0) :
the condition has length > 1 and only the first element will be used
I searched this website and found a similar question (if loop inside a for loop which iterates over a list in R?), but it only covers the first part of my question, so it does not solve my problem.
Any help, please?
The first warning is caused by combining seq(), which here returns [1] 1 2, with the colon operator, which creates a sequence between the LHS and RHS. Both values on the left and right of the colon must be of length 1; otherwise it takes the first element and discards the rest. So 1:seq(x) is the same as writing 1:1.
The second warning is that the if statement gets 2 logical values from your condition:
y[[1]] == c(2, 4)
[1] TRUE TRUE
If you want to test if elements of the vector are the same you can use your notation. If you want to test if the vectors are the same, you can use all.equal.
isTRUE(all.equal(y[[1]], c(2,4)))
[1] TRUE
It returns TRUE if vectors are equal (but not FALSE if they are not, which is why it needs to be used along with isTRUE()).
To get rid of the warnings, you can do:
for (j in seq_along(x)){
if (isTRUE(all.equal(y[[j]], c(2,4)))) {
z[[j]] <- c(0,0)
}
}
Note: seq_along() is a fast primitive for this common use of seq(), and unlike 1:length(x) it returns an empty sequence when x is empty.
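The difference between the element-wise and whole-vector comparisons can be seen in a small sketch. It reuses y from the question; the names target, matches and empty are introduced only for this example.

```r
y <- list(c(2, 4), c(5, 6))
target <- c(2, 4)

elementwise <- y[[1]] == target             # one logical per element
whole <- isTRUE(all.equal(y[[1]], target))  # a single TRUE or FALSE
mismatch <- all.equal(y[[2]], target)       # a character description, not FALSE

# One logical per list element, each safe to use inside if ().
matches <- vapply(y, function(v) isTRUE(all.equal(v, target)), logical(1))

# seq_along() is safe on empty input, 1:length() is not.
empty <- list()
safe_idx <- seq_along(empty)   # integer(0), so a loop body never runs
unsafe_idx <- 1:length(empty)  # c(1, 0), so a loop would run twice
```

all.equal() compares with a numeric tolerance; if exact equality of type and value is wanted, identical() is a stricter alternative that always returns a single TRUE or FALSE.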
For the first part, seq(x) returns [1] 1 2, so you need to use j in seq(x) or j in 1:length(x) rather than 1:seq(x).
For the second part, the comparison you used generates as many TRUE and FALSE values as there are elements in the vectors, so you can use setequal(x, y) instead. This checks whether two vectors contain the same set of values and returns a single TRUE or FALSE. Note that it ignores order and duplicates, so setequal(c(2,4), c(4,2)) is also TRUE.
The final code can be:
for (j in 1:length(x)){
if (setequal(y[[j]], c(2,4)) == TRUE) {
z[[j]] <- c(0,0)
}
}
or:
for (j in seq(x)){
if (setequal(y[[j]], c(2,4)) == TRUE) {
z[[j]] <- c(0,0)
}
}
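One caveat with setequal() is worth seeing in code: it treats its arguments as sets. The following sketch uses made-up vectors to show where it differs from an exact comparison.

```r
same_order <- setequal(c(2, 4), c(2, 4))       # TRUE
reordered <- setequal(c(2, 4), c(4, 2))        # TRUE, order is ignored
duplicated_vals <- setequal(c(2, 4), c(2, 4, 4))  # TRUE, duplicates are ignored

# identical() distinguishes these cases.
exact_reordered <- identical(c(2, 4), c(4, 2))  # FALSE
exact_same <- identical(c(2, 4), c(2, 4))       # TRUE
```

If c(4,2) should not count as a match for c(2,4), an exact comparison such as identical() or isTRUE(all.equal()) is probably the safer condition.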

rep function strange error

When I perform:
a <- seq(1,1.5,0.1)
b <- c(1,1.1,1.4,1.5)
x <- rep(c(a,b),times=c(2,1))
Error in rep(c(a, b), c(2, 1)) : invalid 'times' argument
Why?
When we concatenate two vectors with c(), the result is a single vector (here of length 10), so times = c(2, 1) matches neither length 1 nor the length of the vector. If the idea is to replicate 'a' twice and 'b' once, we can place them in a list and use rep. The output will be a list, which can be unlisted to get a vector.
unlist(rep(list(a,b), c(2,1)))
The accepted answer already works well. Here is an alternative using mapply:
unlist(mapply(function(x,n)rep(x,n),list(a,b),c(2,1)))
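To see why the original call fails and what the list version produces, here is a short sketch using the a and b from the question; the names combined, via_list and direct are only illustrative.

```r
a <- seq(1, 1.5, 0.1)
b <- c(1, 1.1, 1.4, 1.5)

combined <- c(a, b)  # one vector of length 10, so times = c(2, 1) does not fit

# rep() on a list repeats whole elements: list(a, a, b).
via_list <- unlist(rep(list(a, b), times = c(2, 1)))

# The same result written out directly.
direct <- c(rep(a, times = 2), b)
```

Both forms give a vector of length 16; the direct form may be easier to read when there are only two pieces.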

Using apply() over columns to output subsets

I have a data frame in R where the majority of columns are values, but there is one character column. For each column excluding the character column I want to subset the values that are over a threshold and obtain the corresponding value in the character column.
I'm unable to find a built-in dataset that contains the pattern of data I want, so a dput of my data can be accessed here.
When I use subsetting, I get the output I'm expecting:
> df[abs(df$PA3) > 0.32,1]
[1] "SSI_01" "SSI_02" "SSI_04" "SSI_05" "SSI_06" "SSI_07" "SSI_08" "SSI_09"
When I try to iterate over the columns of the data frame using apply, I get a recursion error:
> apply(df[2:10], 2, function(x) df[abs(df[[x]])>0.32, 1])
Error in .subset2(x, i, exact = exact) :
recursive indexing failed at level 2
Any suggestions where I'm going wrong?
The reason your solution didn't work is that the x being passed to your user-defined function is actually a column of df. Therefore, you could get your solution working with a small modification (replacing df[[x]] with x):
apply(df[2:10], 2, function(x) df[abs(x)>0.32, 1])
You could use the ... argument to apply to pass an extra argument. In this case, you would want to pass the first column:
apply(df[2:10], 2, function(x, y) y[abs(x) > 0.32], y=df[,1])
Yet another variation:
apply(abs(df[-1]) > .32, 2, subset, x=df[[1]])
The cute trick here is to "curry" subset by specifying the x parameter. I was hoping I could do it with [ but that doesn't deal with named parameters in the typical way because it is a primitive function :..(
A quick and non-sophisticated solution might be:
sapply(2:10, function(x) df[abs(df[,x])>0.32, 1])
Try:
lapply(df[,2:10],function(x) df[abs(x)>0.32, 1])
Or using apply:
apply(df[2:10], 2, function(x) df[abs(x)>0.32, 1])
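Because the original data is not available here, the following sketch uses a small made-up data frame (the column names id, PA1 and PA2 and all values are assumptions) to show the pattern on something runnable.

```r
# Hypothetical data: one character column followed by numeric columns.
df <- data.frame(
  id  = c("SSI_01", "SSI_02", "SSI_03"),
  PA1 = c(0.5, 0.1, -0.4),
  PA2 = c(0.2, 0.9, 0.0),
  stringsAsFactors = FALSE
)

# x is the column itself, so it is used directly instead of df[[x]].
hits <- lapply(df[-1], function(x) df$id[abs(x) > 0.32])
```

lapply always returns a list, which suits results of different lengths; apply and sapply may simplify to a matrix when every column happens to yield the same number of matches.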
