Given a vector of column positions, one for each row of a data frame, fetch those elements in the most efficient way without using the 'apply' functions.
Here's an example because I'm terrible at explaining. Given this matrix (or data frame):
A = matrix(1:9, nrow = 3, ncol = 3)
> A
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
Fetch the elements at the following column positions, one per row: 1, 3, 3
That means I want to fetch 1, 8 and 9: the first, third and third elements of rows 1, 2 and 3 respectively.
I would expect that A[, c(1,3,3)] would do the trick. But it seems like I need to wrap it up in the diag function.
diag(A[, c(1,3,3)])
This looks like killing a fly with a nuke. It's extremely inefficient. I know there has to be a simple way to do this (without using apply or any of its family). Thanks in advance!
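For reference, one base R approach is matrix indexing: a two-column matrix of (row, column) pairs passed to `[` picks exactly one element per pair. The sketch below is a minimal illustration, not necessarily the fastest option for every case; the names cols, idx and picked are made up for the example.

```r
A <- matrix(1:9, nrow = 3, ncol = 3)

# Hypothetical name: the column position wanted in each row.
cols <- c(1, 3, 3)

# Build (row, column) pairs: (1,1), (2,3), (3,3).
idx <- cbind(seq_len(nrow(A)), cols)

# Indexing with a two-column matrix extracts one element per pair,
# without building the intermediate 3 x 3 matrix that diag() needs.
picked <- A[idx]
picked
```

For a data frame, converting with as.matrix() first keeps the behaviour predictable, assuming all columns share one type.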
Let's assume we have a 3-dimensional array like
a <- array(1:24, dim = c(4, 3, 2))
Accessing or indexing an array is usually done via
# fixing the third (=last) dimension and output the corresponding values of the first and second dimension
a[, , 1]
> a
, , 1
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
In this example, I know the number of dimensions is three, so I can manually type two commas followed by a number for the last dimension (here 1). But now I want to build an expression that indexes an array by fixing the last dimension without knowing the number of dimensions in advance. This construct should later be used within a loop, e.g. lapply().
dims <- paste0(paste0(rep("", length(dim(a)) - 1L), collapse = ","), ",idx")
> dims
[1] ",,idx"
Can I convert this dynamically created string ",,idx" into something that can be used for indexing? I know that an index by itself isn't strictly an expression.
a[dims] <- ... # won't work!
Thanks in advance!
Here is a way, but I'm not sure it's the way you want:
write.table(paste0("a[",dims,"]"),"code.R",row.names = FALSE,col.names = FALSE,quote=FALSE)
source("code.R")$value
This will create a separate R script which contains just the a[,,idx] and then run that script to return the value.
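The same idea can be sketched without writing a file. The first variant evaluates the generated string in memory with eval(parse()); the second avoids strings altogether by calling `[` through do.call, passing TRUE for every leading dimension (TRUE selects everything along that dimension). The helper name index_last is hypothetical, and idx is assumed to be defined beforehand.

```r
a <- array(1:24, dim = c(4, 3, 2))
idx <- 1L

# Variant 1: build ",,idx" and evaluate "a[,,idx]" directly in memory.
dims <- paste0(strrep(",", length(dim(a)) - 1L), "idx")
via_parse <- eval(parse(text = paste0("a[", dims, "]")))

# Variant 2 (hypothetical helper): no string manipulation at all.
index_last <- function(arr, idx) {
  n_lead <- length(dim(arr)) - 1L
  # Equivalent to arr[TRUE, TRUE, ..., idx] for any number of dimensions.
  do.call(`[`, c(list(arr), rep(list(TRUE), n_lead), list(idx)))
}
via_call <- index_last(a, idx)
```

Because index_last takes the array as an argument, it can be used inside lapply() over the values of the last dimension, e.g. lapply(seq_len(dim(a)[3]), function(k) index_last(a, k)).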
I would like to substitute a one-row vector for some of the rows of a matrix in R. Here is an example.
I would like to substitute the row "5,6" for the rows in A where the entries are 1. So, I would like to make "A" look like "A_goal"
The method I attempted (see the bottom line) was close, but it seems that it's writing "down the columns" instead of across the rows.
A=matrix(c(1,2,1,3,1,2,1,3),4,2)
B=matrix(c(5,6),1,2)
A_goal = matrix(c(5,2,5,3,6,2,6,3),4,2)
A
B
A_goal
# Here is an attempt that didn't work:
A[A==1]=B
A
Matrix assignment using [<- fills values in column-major order. So you will probably need to use apply on a row basis. This is essentially a for-loop over the rows of A. You will also need to transpose, since apply delivers its results as columns:
t(apply(A, 1, function(x) if(x[1]==1){B}else{x}))
[,1] [,2]
[1,] 5 6
[2,] 2 2
[3,] 5 6
[4,] 3 3
If you were only intending the replacement to occur where the row was c(1,1), then the logical test would need to be modified to all(x == c(1,1)), since if() expects a single TRUE or FALSE.
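As a hedged alternative to the apply call, the column-major fill order can be worked with instead of against: select whole rows with a logical vector, then supply the replacement values already arranged column by column. This sketch assumes, like the answer above, that a row qualifies when its first entry is 1; the name rows is made up for the example.

```r
A <- matrix(c(1, 2, 1, 3, 1, 2, 1, 3), 4, 2)
B <- matrix(c(5, 6), 1, 2)
A_goal <- matrix(c(5, 2, 5, 3, 6, 2, 6, 3), 4, 2)

# Rows to overwrite (assumption: test only the first column).
rows <- A[, 1] == 1

# A[rows, ] is filled column by column, so repeat each element of B
# once per selected row: 5, 5, 6, 6.
A[rows, ] <- rep(B, each = sum(rows))
A
```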
I'm trying to perform basic excel-like formula-filling in R. I want to populate the value of a "cell" based on the values of other cells in the same matrix or data.frame. The function is pretty straightforward to do with a single cell, but seems to be more difficult to scale across both rows and columns.
Say I have a simple matrix:
simple <- matrix(c(0,1,2,3,0,4,5,6,7,NA,NA,NA,8,NA,NA,NA), nrow = 4, ncol = 4)
[,1] [,2] [,3] [,4]
[1,] 0 0 7 8
[2,] 1 4 NA NA
[3,] 2 5 NA NA
[4,] 3 6 NA NA
I want to populate the NAs with the sum of columns 1 and 2 in the same row and row 1 in the same column. In Excel, for cell C2 it would be
=$A2 + $B2 + C$1
in R
simple[2,3] <- simple[2,1] + simple[2,2] + simple[1,3]
In Excel, you can simply drag the formula over the remaining cells, and voila. In R, not so easy.
Since R is vectorized, I can fill a whole column pretty easily by giving ranges instead of single cells, like so:
simple[2:4,3] <- simple[2:4,1] + simple[2:4,2] + simple[1,3]
[,1] [,2] [,3] [,4]
[1,] 0 0 7 8
[2,] 1 4 12 NA
[3,] 2 5 14 NA
[4,] 3 6 16 NA
But when I try to vectorize over both rows and columns, it doesn't work, because R treats the last value as the vector c(7,8) and recycles it element by element down the rows, rather than adding 7 to column 3 and 8 to column 4.
simple[2:4,3:4] <- simple[2:4,1] + simple[2:4,2] + simple[1,3:4]
Warning message:
In simple[2:4, 1] + simple[2:4, 2] + simple[1, 3:4] :
longer object length is not a multiple of shorter object length
[,1] [,2] [,3] [,4]
[1,] 0 0 7 8
[2,] 1 4 12 12
[3,] 2 5 15 15
[4,] 3 6 16 16
As an alternative solution, one could do nested for loops, as below:
for (i in 2:4){
for (j in 3:4){
simple[i,j] <- simple[i,1] + simple[i,2] + simple[1,j]
}
}
[,1] [,2] [,3] [,4]
[1,] 0 0 7 8
[2,] 1 4 12 13
[3,] 2 5 14 15
[4,] 3 6 16 17
This actually works and is pretty easy, but it involves nested for loops, so, enough said.
I feel like the "right" solution would be one using correct vectorization, apply(), or dplyr, but I can't seem to figure out how to make them work, short of rearranging the data from a crosstab format to a flat format, but that can explode your file size pretty quickly.
Any ideas on how to make this work in a more R-ish fashion?
Here's a more R-like way to do it. Let's convert simple to a data.frame first.
library(tidyverse)
df1 <- as.data.frame(simple)
df1 %>% mutate(V3 = V1 + V2 + first(V3), V4 = V1 + V2 + first(V4))
V1 V2 V3 V4
1 0 0 7 8
2 1 4 12 13
3 2 5 14 15
4 3 6 16 17
first from dplyr is handy because it lets you lock to the first value in the column, like you would in Excel with C$1
In matrix arithmetic, the operands should have matching dimensions (or be vectors that recycle cleanly, such as a single-item vector). Therefore, consider aligning by replicating 7 and 8 for each needed row 2-4 (i.e., 3 times), then transposing the resulting 2 x 3 matrix into the 3 x 2 shape of the target block:
simple[2:4,3:4] <- simple[2:4,1] + simple[2:4,2] + t(replicate(length(2:4), simple[1,3:4]))
Alternatively, consider sapply iterating through 7 and 8 values respectively:
simple[2:4,3:4] <- sapply(3:4, function(i) simple[2:4,1] + simple[2:4,2] + simple[1,i])
Slightly more concise with rowSums and leaving out row indexing:
simple[,3:4] <- sapply(3:4, function(i) rowSums(simple[,1:2]) + simple[1,i])
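Another option, sketched here as an alternative to the answers above rather than a replacement for them, is outer(), which builds the full grid of "row total + column header" sums in one call. The name expected is only used to show the intended result.

```r
simple <- matrix(c(0, 1, 2, 3, 0, 4, 5, 6, 7, NA, NA, NA, 8, NA, NA, NA),
                 nrow = 4, ncol = 4)

# Row part of the formula ($A2 + $B2) for rows 2 to 4: 5, 7, 9.
row_part <- rowSums(simple[2:4, 1:2])

# Column part of the formula (C$1, D$1): 7, 8.
col_part <- simple[1, 3:4]

# outer() adds every row value to every column value, giving a 3 x 2 block.
simple[2:4, 3:4] <- outer(row_part, col_part, `+`)

expected <- matrix(c(0, 1, 2, 3, 0, 4, 5, 6, 7, 12, 14, 16, 8, 13, 15, 17),
                   nrow = 4, ncol = 4)
```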
I may be late to the game, but here is a data.table and base R solution which for large data sets can be much faster than tidyverse. The syntax may look more confusing at first, but breaking it down piece by piece is very logical and straightforward once you have a good handle on lapply.
To make the cell and the vectors you are adding compatible you should convert the cell to a vector by simply replicating that value as many times as the number of observations or rows of the dataframe. So in your example, V3 = rep(7,4) will yield a vector with all 7s. R will then let you do V3=V1+V2+V3, where V3 on the right-hand side is the rep(7,4).
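A minimal base R sketch of that replication step, using a fresh copy of the example matrix under the made-up name m (so it does not interfere with simple), and the made-up name anchor for the replicated cell:

```r
m <- matrix(c(0, 1, 2, 3, 0, 4, 5, 6, 7, NA, NA, NA, 8, NA, NA, NA),
            nrow = 4, ncol = 4)

# Turn the single cell m[1, 3] into a vector as long as the column.
anchor <- rep(m[1, 3], nrow(m))

# All three operands now have the same length, so the sum is element-wise.
m[, 3] <- m[, 1] + m[, 2] + anchor
m[, 3]
```

Strictly speaking, R would recycle the single value on its own; the explicit rep() just makes the alignment visible.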
The data.table has some handy built-in special read-only symbols that will also give you the ability to extend the solution beyond the two columns you provided in the example. The two I use most frequently are .SD and .N. In this example, you can think of .SD as a way to refer to all columns except the first two and .N is always a constant number equal to the number of rows in the data.table. These symbols can be used in the j slot of a data.table which is equivalent to the columns of a matrix or data.frame object. So your code would look like this:
library(data.table)
simple <- data.table(simple)
NAcols <- colnames(simple)[-c(1,2)] ## Names of the columns to change; use match or grep here if they are not simply "all but the first two".
simple[,(NAcols):=lapply(.SD,function(i) V1+V2+rep(i[1],.N)),.SDcols=NAcols] ## Parentheses make := use the names stored in NAcols
Note that each iteration in the lapply loop is simply the ith column, and i[1] selects only the first element of that column and replicates it as many times as the number of rows (.N) before adding the three vectors together. The .SDcols argument is used to prevent this function from being applied to the first two columns. Though there was no need to group in this problem, data.table also allows you to specify 'by = ' as an argument if you want to group by a particular column or columns in the data.table before applying the function. Finally, note that I did not need to assign the last line of code to another R object, because data.table updates the columns of 'simple' by reference rather than copying the table, which is a large part of why it tends to be faster than base R and tidyverse data frame operations. However, you can use the copy function of data.table like this instead if you wish to keep the original data.table for some reason:
final_result <- copy(simple)[,(NAcols):=lapply(.SD,function(i) V1+V2+rep(i[1],.N)),.SDcols=NAcols]
Anyway I hope that explanation helps and if you need me to clarify anything please let me know! Best of luck!
The R language doesn't allow vectors to be variables. Why is this feature missing? It would be nice if my data frame could look something like this:
X1 X2
1. [1,2,3] [2,3,4]
2. .... ....
I tried df <- as.data.frame(c(1,2,3),c(1,2,3)), but I keep getting 3 rows of numeric type; instead, I want a single row where each cell holds a vector.
Use an array:
array(rbind(1:3, 2:4), dim = c(1, 2, 3))
#, , 1
#
# [,1] [,2]
#[1,] 1 2
#
#, , 2
#
# [,1] [,2]
#[1,] 2 3
#
#, , 3
#
# [,1] [,2]
#[1,] 3 4
R does not work with row data types; you can only have column data types. Each column can be a vector of a separate data type. You are not thinking the 'R' way. This shortcoming, as you are putting it, is the strength of R. You can use an array, as suggested in the previous answer, if you strictly want to follow your chain of thought. Or just try to see how you can do what you want to do in R, rather than trying to impose your known language on R.
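For completeness, base R data frames can also hold list columns, where each cell stores a whole vector. The sketch below wraps the lists in I() so that data.frame() keeps them as single columns instead of splitting them up; whether this layout is a good idea depends on what you plan to do with the data afterwards.

```r
# One row, two list columns; each cell holds an integer vector.
df <- data.frame(X1 = I(list(1:3)), X2 = I(list(2:4)))

nrow(df)      # a single row
df$X1[[1]]    # the vector stored in the first cell of X1
```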
Consider the following matrix:
MAT <- matrix(nrow=3,ncol=3,1:9)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
I want to retrieve the row number if I provide a vector which exactly matches a row in MAT. So if I provide c(2,5,8), I should get back 2. I'm unsure how to accomplish this; the closest thing I know is using which to find the location of a single number in a matrix. An alternate could be a very slow quadruple for loop checking if the given vector matches a row in the matrix. Is there a one line solution for this problem?
You can use identical to test each row, apply to loop over the rows, and which to identify the match:
which(apply(MAT,1,function(x) identical(x,c(2L,5L,8L))))
[1] 2
Note that the values in the matrix are stored as integers, so you need to specify that in the vector to test.
You can apply a simple matching function to each row, then use which to find the row number:
search_vec = c(2, 5, 8)
vec_matches = apply(MAT, 1, function(row, search_vec) all(row == search_vec), search_vec)
row_num = which(vec_matches)
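If you would rather avoid apply, here is a vectorized sketch of the same test. Transposing MAT turns each row into a column, so comparing against search_vec recycles the vector down each column; a row matches when all of its comparisons are TRUE. This assumes search_vec has exactly ncol(MAT) elements.

```r
MAT <- matrix(nrow = 3, ncol = 3, 1:9)
search_vec <- c(2, 5, 8)

# Each column of t(MAT) is one row of MAT, compared element-wise.
hits <- t(MAT) == search_vec

# A row matches when every one of its ncol(MAT) comparisons is TRUE.
row_num <- which(colSums(hits) == ncol(MAT))
row_num
```

Since == compares values rather than storage types, the integer matrix and the double search vector match here without needing the L suffix.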