Axis numbering for R's apply function - r

Given the following simple matrix
mymatrix<-matrix(1:9,nrow=3)
mymatrix
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
Let's do row-wise and column-wise sums:
apply(mymatrix,1,sum)
[1] 12 15 18
apply(mymatrix,2,sum)
[1] 6 15 24
My intuition would have the axes reversed from what we see above. I think of rows as the first dimension of a matrix. So applying the sum operation on axis-1 should give us row sums. What is the proper way to understand the thinking of having the opposite polarity?

I actually misunderstood what matrix(1:9,nrow=3) generates: I had not paid attention to the output. I had presumed it would create
1 2 3
4 5 6
7 8 9
But instead the matrix is filled column-first. So apply does exactly what I expect: it sums rows when MARGIN=1 and sums columns when MARGIN=2.
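To make the fill order concrete, here is a small sketch that compares the default column-wise fill with byrow = TRUE and checks apply against rowSums() and colSums(). The names by_col, by_row, row_sums and col_sums are illustrative only.

```r
# Default fill is column-major: 1:3 goes down the first column.
by_col <- matrix(1:9, nrow = 3)
# byrow = TRUE fills across rows instead (the layout the asker expected).
by_row <- matrix(1:9, nrow = 3, byrow = TRUE)

# MARGIN = 1 applies the function over rows, MARGIN = 2 over columns.
row_sums <- apply(by_col, 1, sum)   # 12 15 18
col_sums <- apply(by_col, 2, sum)   # 6 15 24

# rowSums() and colSums() give the same totals.
stopifnot(all(row_sums == rowSums(by_col)))
stopifnot(all(col_sums == colSums(by_col)))

# The two fill orders are transposes of each other.
stopifnot(all(by_row == t(by_col)))
```

For plain sums, rowSums() and colSums() are usually the more direct choice; apply is the general form for arbitrary functions.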

Related

Crosschecking numbers of a matrix in R

I'm currently working with a large two-column matrix, and I want to check whether each row (each combination of the two columns) is also present in a loaded data frame (which also has two columns).
Example,
(obj_design <- matrix(c(2,5,4,7,6,6,20,12,4,0), nrow = 5, ncol = 2))
[,1] [,2]
[1,] 2 6
[2,] 5 20
[3,] 4 12
[4,] 7 4
[5,] 6 0
(refined_grid <- data.frame(i=1:4, j=1:12))
i j
1 1 1
2 2 2
3 3 3
4 4 4
5 1 5
6 2 6
7 3 7
8 4 8
9 1 9
10 2 10
11 3 11
12 4 12
In this reproducible example, the rows (2,6) and (4,12) would be selected.
I'm wondering if there's a function that checks the whole matrix, tells me whether each row is in the data frame, and (if possible) writes the matching rows of the matrix to a separate new dataset.
Any assistance would be wonderful.
Here is an option with match
i1 <- match(do.call(paste, as.data.frame(obj_design)),
do.call(paste, refined_grid), nomatch = 0)
refined_grid[i1,]
This code will give you which rows of the matrix exist in the dataframe.
which(paste(obj_design[,1], obj_design[,2]) %in%
paste(refined_grid$i, refined_grid$j)
)
Then you can just assign it to a vector!
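If the goal is a separate dataset holding the matching pairs, one possible sketch builds on the %in% approach above. The names keys_design, keys_grid, hits and matched are made up for illustration, and pasting the columns into a key assumes the values themselves contain no spaces.

```r
obj_design <- matrix(c(2, 5, 4, 7, 6, 6, 20, 12, 4, 0), nrow = 5, ncol = 2)
refined_grid <- data.frame(i = 1:4, j = 1:12)

# Build one string key per row so that whole rows can be compared at once.
keys_design <- paste(obj_design[, 1], obj_design[, 2])
keys_grid   <- paste(refined_grid$i, refined_grid$j)

# Logical vector: TRUE where a matrix row also appears in the data frame.
hits <- keys_design %in% keys_grid

# Keep the matching rows as their own matrix; drop = FALSE preserves the
# two-column shape even if only one row matches.
matched <- obj_design[hits, , drop = FALSE]

which(hits)   # row numbers in obj_design: 1 3
matched       # rows (2, 6) and (4, 12)
```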

How to subset a matrix with different column positions for each row? [duplicate]

I want to subset a matrix using a different (but single) column for every row. Probably apply could do the job, or some smart subsetting, but I haven't found a solution. Computation time is an issue: I have a solution with a for loop, but loading the matrix into RAM several times is just too slow.
Here is an example:
Matrix M and vector v are given,
M<-matrix(1:15,nrow=5,ncol=3)
[,1] [,2] [,3]
[1,] 1 6 11
[2,] 2 7 12
[3,] 3 8 13
[4,] 4 9 14
[5,] 5 10 15
v<-c(3,1,1,2,1)
and the result should be:
(11,2,3,9,5)
We can use row/column (matrix) indexing:
M[cbind(1:nrow(M), v)]
#[1] 11 2 3 9 5
Just for fun, here's another solution using vector (linear) indexing on the transposed matrix:
t(M)[v + (seq_len(nrow(M)) - 1) * ncol(M)]
# [1] 11 2 3 9 5
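As a quick check of the matrix-indexing idea, the sketch below compares it with an explicit row-by-row lookup. The names idx, by_index and by_loop are illustrative only.

```r
M <- matrix(1:15, nrow = 5, ncol = 3)
v <- c(3, 1, 1, 2, 1)

# A two-column matrix used as an index is read as (row, column) pairs.
idx <- cbind(seq_len(nrow(M)), v)
by_index <- M[idx]

# Reference version: pick M[i, v[i]] one row at a time.
by_loop <- vapply(seq_len(nrow(M)), function(i) M[i, v[i]], integer(1))

stopifnot(identical(by_index, by_loop))
by_index   # 11 2 3 9 5
```

The indexed version does a single vectorised lookup, so it avoids the per-row overhead of a loop.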

what is the meaning of tapply(x,index) if no FUN?

I know the meaning of tapply(dat$sale,list(dat$year,dat$province),sum) in the code:
> dat=data.frame(
+ year=c(rep(2007,5),rep(2008,3),rep(2009,3)),
+ province=c("a","a","b","c","d","a","c","d","b","c","d"),
+ sale=1:11)
> tapply(dat$sale,list(dat$year,dat$province),sum)
a b c d
2007 3 3 4 5
2008 6 NA 7 8
2009 NA 9 10 11
What is the meaning of tapply(dat$sale,list(dat$year,dat$province)) if no FUN is given?
> tapply(dat$sale,list(dat$year,dat$province))
[1] 1 1 4 7 10 2 8 11 6 9 12
The result is a vector of subscripts. What do 12 and 9 in the result mean?
By what rule are 12 and 9 obtained, and how are they calculated?
From ?tapply:
FUN the function to be applied, or NULL. In the case of functions
like +, %*%, etc., the function name must be backquoted or quoted. If
FUN is NULL, tapply returns a vector which can be used to subscript
the multi-way array tapply normally produces.
FUN defaults to NULL, so you get the subscripts.
Note that in R, matrices/arrays, like those returned by tapply, are just vectors with dimensions. Matrices are column-major by default, so a single (linear) index runs down the first column and then wraps around to the top of the second column:
> mat <- matrix(seq(9),ncol=3)
> mat
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> mat[4]
[1] 4
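Putting the two points together, the subscripts can be reproduced by hand from the factor levels and the column-major rule. This is only a sketch: the names totals, subs, r, k and by_hand are illustrative. For example, the last observation (2009, "d") sits in row 3, column 4 of a 3-row table, so its subscript is 3 + (4 - 1) * 3 = 12.

```r
dat <- data.frame(
  year     = c(rep(2007, 5), rep(2008, 3), rep(2009, 3)),
  province = c("a", "a", "b", "c", "d", "a", "c", "d", "b", "c", "d"),
  sale     = 1:11
)

totals <- tapply(dat$sale, list(dat$year, dat$province), sum)
subs   <- tapply(dat$sale, list(dat$year, dat$province))

# Position of each observation's group within the sorted factor levels.
r <- as.integer(factor(dat$year))       # row of the table (1..3)
k <- as.integer(factor(dat$province))   # column of the table (1..4)

# Column-major linear index: move down the rows first, then across columns.
by_hand <- r + (k - 1L) * nrow(totals)
stopifnot(all(by_hand == subs))

# Subscripting the table with the result gives each observation's group total.
totals[subs]   # 3 3 3 4 5 6 7 8 9 10 11
```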

how do I perform spreadsheet type calculations in R?

I want to perform a calculation in R that would be simple in Excel. Let's say I have a long column of numbers and I want to subtract row 1 from row 2, row 3 from row 4, and so on, so that I am subtracting within successive pairs of numbers. How do I do this?
Or even something simpler. How do I subtract row 1 from row 2, and then row 2 from row 3, and so on? Basically here I'm finding the difference between each two rows.
This should be very easy, but I've spent a few hours trying things and searching for answers to no avail.
Please help.
Thanks!
You are looking for the function diff(). It calculates the differences between consecutive elements of a vector.
set.seed(1)
x<-sample(1:10,10)
x
[1] 3 4 5 7 2 8 9 6 10 1
diff(x)
[1] 1 1 2 -5 6 1 -3 4 -9
To do your original problem of diffing 1st and 2nd, 3rd and 4th, I'd transform the data into a matrix and do diffs along columns.
set.seed(1)
x=sample(1:10,10)
x
[1] 3 4 5 7 2 8 9 6 10 1
So our answer is going to be -1, -2, -6, 3, 9 from (3-4), (5-7), (2-8), (9-6) and (10-1).
This makes our matrix:
> matrix(x,nrow=2)
[,1] [,2] [,3] [,4] [,5]
[1,] 3 5 2 9 10
[2,] 4 7 8 6 1
and then we apply diff to the columns, negating the result because diff computes second minus first (drop the minus sign if that is the order you want):
> -apply(matrix(x,nrow=2),2,diff)
[1] -1 -2 -6 3 9
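An alternative sketch for the pairwise case skips the matrix and relies on recycling of a logical index. The names odd and even are illustrative, and the input is written out literally (the values shown above) so the result does not depend on the random number generator, whose sample() output can differ between R versions.

```r
# Fixed input so the result does not depend on the RNG.
x <- c(3, 4, 5, 7, 2, 8, 9, 6, 10, 1)

# c(TRUE, FALSE) is recycled along x, selecting elements 1, 3, 5, ...
odd  <- x[c(TRUE, FALSE)]
# c(FALSE, TRUE) selects elements 2, 4, 6, ...
even <- x[c(FALSE, TRUE)]

odd - even   # first minus second in each pair: -1 -2 -6 3 9
even - odd   # second minus first, the order diff() uses: 1 2 6 -3 -9

# Same result as the matrix approach.
stopifnot(all(odd - even == -apply(matrix(x, nrow = 2), 2, diff)))
```

This assumes x has an even length; with an odd length the two selections differ in size and R recycles the shorter one with a warning.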

why doesn't reshape() in MATLAB give me the same output as matrix() in R?

The first code is in R:
> matrix(1:6,nrow=3,byrow=T)
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
The second code is in MATLAB:
>> reshape(1:6,3,2)'
ans =
1 2 3
4 5 6
In MATLAB matrices are stored column-wise. What your reshape does is take a row vector of 1:6, and start filling out a new 3-by-2 matrix column-wise:
1 4
2 5
3 6
The apostrophe after the reshape transposes this to produce:
1 2 3
4 5 6
To obtain what you actually want, create a 2-by-3 matrix using reshape first, then transpose it.
reshape(1:6, 2, 3)'
The problem is that you want reshape to fill by rows, which as far as I know isn't possible in MATLAB. So you need a little workaround:
reshape(1:6,2,3)
gives you
1 3 5
2 4 6
Transposing this,
reshape(1:6,2,3)'
gives you this result:
1 2
3 4
5 6
For more detail, see the reshape documentation.
First, you're transposing a 3x2 matrix in MATLAB, which you don't do in R:
>> reshape(1:6,3,2) % NOTE: no apostrophe at the end
ans =
1 4
2 5
3 6
Second, you are filling by row in R, which differs from R's default (and from how MATLAB does it). With the default column-wise fill:
> matrix(1:6,nrow=3,byrow=F)
[,1] [,2]
[1,] 1 4
[2,] 2 5 # yay, results agree!
[3,] 3 6
If you want to fill by row in MATLAB, you'll have to reverse the arguments and transpose:
>> reshape(1:6, 2,3).' % NOTE: arguments for row and column counts reversed
ans =
1 2
3 4
5 6
I believe this will give you the output you are looking for:
reshape(1:6, 2, 3)'
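The same relationship can be checked from the R side. This sketch (R only, it does not run any MATLAB) shows that filling by row is equivalent to filling the transposed shape column-wise and then transposing, which mirrors what reshape(1:6, 2, 3)' does. The names by_row and via_transpose are illustrative.

```r
# Fill by row directly.
by_row <- matrix(1:6, nrow = 3, byrow = TRUE)

# Column-major fill of the transposed shape (2 x 3), then transpose.
# This mirrors MATLAB's reshape(1:6, 2, 3)'.
via_transpose <- t(matrix(1:6, nrow = 2))

stopifnot(all(dim(by_row) == c(3, 2)))
stopifnot(all(by_row == via_transpose))
by_row
```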
