I have some arrays that I will need to fill in. The names of the arrays vary, but the same operations will be applied to all of them throughout. Basically, I need a way to replace one "sheet" of an array with another without manually typing the array name. Example below:
big_array_1 <- array(dim = c(5,5,10))
big_array_1[,,1] <- sample(c(1:10), 25, replace=T)
big_array_2 <- array(dim = c(5,5,10))
big_array_2[,,1] <- sample(c(40:50), 25, replace=T)
small_array <- array(dim = c(5,5,2))
small_array[,,] <- sample(c(20:30), 50, replace=T)
So each big array will have to have its second sheet (along the third dimension) replaced by the second sheet of the small array, but I want to be able to just set a number (i.e. big array "1" or "2") to make this work in my code, instead of changing the name manually every time.
# So I know I can do this, but I want to avoid manually changing the "_1" to "_2" when I run the script
big_array_1[,,2] <- small_array[,,2]
# instead, I'm hoping I can use a variable and some kind of assign()
arraynumber <- 1
# but this gives an error for assigning a non-language object
get(paste0("big_array_",arraynumber))[,,2] <- small_array[,,2]
# and this gives an error for invalid first argument.
assign(get(paste0("big_array_",arraynumber))[,,2], small_array[,,2])
# even though get(paste0("big_array_",arraynumber))[,,2] works on its own.
Any suggestions?
In R, you cannot assign values to the result of get(). Additionally, it is not advisable to use assign (or attach, eval + parse, list2env, and other environment-changing, dynamic methods), as these tend to be hard to debug.
As commented, simply use named lists for identically structured objects. Lists can contain any type of object, from arrays to data frames to plots, with no limit on their number. Even better, you avoid flooding the global environment with separately named objects and instead work with a handful of lists containing many underlying elements, which better organises your data for assignment or iterative needs.
Definition
set.seed(8620)
# NAMED LIST OF TWO ARRAYS
big_array_list <- list(big_array_1 = array(dim = c(5,5,10)),
                       big_array_2 = array(dim = c(5,5,10)))
big_array_list$big_array_1[,,1] <- sample(c(1:10), 25, replace=TRUE)
big_array_list$big_array_2[,,1] <- sample(c(40:50), 25, replace=TRUE)
# NAMED LIST OF ONE ARRAY
small_array_list <- list(small_array_1 = array(sample(c(20:30), 50, replace=TRUE),
                                               dim = c(5,5,2)))
Assignment
# ASSIGN BY FIXED NAME
big_array_list$big_array_1[,,2] <- small_array_list$small_array_1[,,2]
big_array_list$big_array_1[,,2]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 30 29 26 24 23
# [2,] 21 20 22 20 24
# [3,] 27 24 26 30 30
# [4,] 30 26 24 29 25
# [5,] 26 21 26 20 30
# ASSIGN BY DYNAMIC NAME
arraynumber <- 1
big_array_list[[paste0("big_array_",arraynumber)]][,,2] <- small_array_list[[paste0("small_array_",arraynumber)]][,,2]
big_array_list[[paste0("big_array_",arraynumber)]][,,2]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 30 29 26 24 23
# [2,] 21 20 22 20 24
# [3,] 27 24 26 30 30
# [4,] 30 26 24 29 25
# [5,] 26 21 26 20 30
# ASSIGN BY INDEX
big_array_list[[1]][,,2] <- small_array_list[[1]][,,2]
big_array_list[[1]][,,2]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 30 29 26 24 23
# [2,] 21 20 22 20 24
# [3,] 27 24 26 30 30
# [4,] 30 26 24 29 25
# [5,] 26 21 26 20 30
Iterative Needs
# RETURN DIMENSIONS OF EACH big_array
lapply(big_array_list, dim)
# SHOW FIRST 5 ELEMENTS OF EACH big_array
sapply(big_array_list, `[`, 1:5)
# RETURN LIST WHERE ALL big_arrays ARE EQUAL TO small_array
mapply(`<-`, big_array_list, small_array_list, SIMPLIFY=FALSE)
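Building on the list setup above, the same sheet replacement can also be applied to every big array in one pass with lapply() (a sketch reusing the definitions from this answer):

```r
set.seed(8620)
big_array_list <- list(big_array_1 = array(dim = c(5,5,10)),
                       big_array_2 = array(dim = c(5,5,10)))
small_array_list <- list(small_array_1 = array(sample(20:30, 50, replace = TRUE),
                                               dim = c(5,5,2)))

# replace the second sheet of every big array with the small array's second sheet
big_array_list <- lapply(big_array_list, function(a) {
  a[,,2] <- small_array_list$small_array_1[,,2]
  a
})
```

This removes the need for one assignment per array entirely and keeps all results inside the list.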
Related
I'm new to R and stuck. I want to reduce the number of columns in a 92x8192 matrix. The matrix consists of 92 observations, and each column represents a data point in a spectrum. The value corresponds to an intensity, which is an integer. I want to reduce the "resolution" (i.e. the number of data points = columns) of the spectrum in a somewhat controlled way.
Example:
[,1] [,2] [,3] [,4] [,5] [,6] [...]
[1,] 1 2 3 4 5 6
[2,] 7 8 9 10 11 12
[3,] 13 14 15 16 17 18
[4,] 19 20 21 22 23 24
[5,] 25 26 27 28 29 30
[6,] 31 32 33 34 35 36
What I would like to do is compare adjacent columns (for each row), e.g. [1,1] and [1,2], and find the max value of those two entries (that would be [1,2] in this case). The smaller value should be dropped, and the next two adjacent columns should be evaluated, so that in the end only ncol/2 columns are left. I know there is something like pmax, but since my knowledge of loops and functions is far too limited at this point, I don't know how to compare not just two columns at a time but all 4096 pairs of values in each row. In the end the matrix should look like this:
[,1] [,2] [,3] [...]
[1,] 2 4 6
[2,] 8 10 12
[3,] 14 16 18
[4,] 20 22 24
[5,] 26 28 30
[6,] 32 34 36
The values I have used are not a good example, because I know that in this case it looks like I could just drop every other column, and I know how to do that.
Apologies if the question is worded in a complicated way, but I think the task isn't really all that complicated.
Thanks for any help or suggestions on how to go about this task.
Example matrix:
> set.seed(101)
> x_full <- matrix(runif(30), nrow=5)
> x_full
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.37219838 0.3000548 0.8797957 0.59031973 0.7007115 0.79571976
[2,] 0.04382482 0.5848666 0.7068747 0.82043609 0.9568375 0.07121255
[3,] 0.70968402 0.3334671 0.7319726 0.22411848 0.2133520 0.38940777
[4,] 0.65769040 0.6220120 0.9316344 0.41166683 0.6610615 0.40645122
[5,] 0.24985572 0.5458286 0.4551206 0.03861056 0.9233189 0.65935508
Now reduce:
> x_reduced <- sapply(seq(1, ncol(x_full), 2), function(colnum) { pmax(x_full[, colnum], x_full[, colnum + 1]) })
> x_reduced
[,1] [,2] [,3]
[1,] 0.3721984 0.8797957 0.7957198
[2,] 0.5848666 0.8204361 0.9568375
[3,] 0.7096840 0.7319726 0.3894078
[4,] 0.6576904 0.9316344 0.6610615
[5,] 0.5458286 0.4551206 0.9233189
How it works: seq(1, ncol(x_full), 2) generates a sequence of integers representing the odd numbers up to the number of columns of x_full. Then sapply() applies a function to this sequence and presents the results in a tidy format (in this case it happens to be a matrix as we require). The function being applied is one that we specify using function: for column numbered colnum it just applies pmax() across that column and the next.
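Since pmax() is vectorised over whole matrices, the same reduction can also be done in a single call by splitting the matrix into its odd and even columns (a sketch that assumes an even number of columns):

```r
set.seed(101)
x_full <- matrix(runif(30), nrow = 5)

# elementwise max of each odd column and the even column that follows it
odd  <- x_full[, seq(1, ncol(x_full), 2), drop = FALSE]
even <- x_full[, seq(2, ncol(x_full), 2), drop = FALSE]
x_reduced <- pmax(odd, even)
```

pmax() on two matrices of the same shape compares them element by element, so each column of the result is the pairwise maximum of one adjacent column pair.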
Example solution
mat <- matrix(1:16, nrow = 4)
m <- matrix(nrow = nrow(mat), ncol = ncol(mat)/2) # preallocate the result matrix to save time
for (i in seq(1, ncol(mat), 2)) { m[, (i+1)/2] <- pmax(mat[,i], mat[,i+1]) }
Your solution is then stored in m.
I have found that read.csv("file.csv")$V1 may help to split an exported table into columns, but my data is organised in a row-by-row fashion, so I would like to record the elements into a vector sequentially, from element[1][1] -> ... -> element[n][n]. Any thoughts on how this could be accomplished in R?
Update:
Once imported mydata looks like:
dat <- matrix(1:27, nrow = 3)
dat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 4 7 10 13 16 19 22 25
[2,] 2 5 8 11 14 17 20 23 26
[3,] 3 6 9 12 15 18 21 24 27
Desired output would be vector: c(1, 2, 3, 4, 5, 6, 7.....)
With the code I provided, a simple solution could be to simply extract the row, but that looks too easy; maybe I missed something.
new_dat <- dat[1, ]
new_dat
[1] 1 4 7 10 13 16 19 22 25
Edit
My solution works well, but it is not efficient. Here I have an improved loop version, so you can store the objects separately in only one command.
First define elements that will be the name of the objects:
val <- c(1:3)
nam <- "new_dat_"
and then extract all elements with the loop.
for(i in 1:nrow(dat)){ assign(paste(nam, val, sep = "")[i], dat[i, ]) }
After that, use ls() and you should see three objects named "new_dat_1", "new_dat_2", "new_dat_3" (plus "val"), each containing one row of your dat. This solution can be very helpful if you have to extract several rows and not just one, and it leads to this output:
new_dat_3
[1] 3 6 9 12 15 18 21 24 27
new_dat_1
[1] 1 4 7 10 13 16 19 22 25
new_dat_2
[1] 2 5 8 11 14 17 20 23 26
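For comparison, the same rows can be kept together without assign() by using base-R splitting and a named list (a sketch; the new_dat_* names are illustrative):

```r
dat <- matrix(1:27, nrow = 3)

# the whole matrix as one vector, column by column, matching the desired output
flat <- as.vector(dat)

# all rows collected in one named list instead of separate new_dat_* objects
row_list <- split(dat, row(dat))
names(row_list) <- paste0("new_dat_", seq_len(nrow(dat)))
```

split(dat, row(dat)) groups the matrix elements by their row index, so row_list$new_dat_1 holds dat[1, ], and nothing is written into the global environment by side effect.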
This question already has answers here:
Mean of each element of a list of matrices
(3 answers)
Closed 6 years ago.
I have a list of matrices:
.list <- list(matrix(1:25, ncol = 5), matrix(11:35, ncol = 5))
I would like to use the Reduce method to find the element-by-element means of the matrices in the list.
In other words, I am looking for the following result:
res = matrix(6:30, ncol = 5)
I tried the following:
res = Reduce(mean, .list)
but I get an error:
Error in mean.default(init, x[[i]]) :
'trim' must be numeric of length one
Note that an element of a matrix can be NA.
Any help would be appreciated! Thank you!!
I just realized that this could be achieved the following way (using the Reduce function):
tmp = Reduce('+', .list)
result = tmp/length(.list)
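One caveat: the question notes that elements can be NA, and `+` propagates NA, so this Reduce() approach returns NA wherever any matrix has one. A sketch that ignores NAs by summing values and non-NA counts separately (the helper name zero_na is illustrative):

```r
.list <- list(matrix(1:25, ncol = 5), matrix(11:35, ncol = 5))
.list[[1]][1, 1] <- NA                          # introduce a missing value

zero_na <- function(m) { m[is.na(m)] <- 0; m }  # treat NA as 0 in the sum
sums   <- Reduce(`+`, lapply(.list, zero_na))
counts <- Reduce(`+`, lapply(.list, function(m) !is.na(m)))
result <- sums / counts                         # elementwise mean, NAs ignored
```

Cells where every matrix is NA come out as NaN (0/0), which you may want to handle explicitly.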
This is probably easier to solve via an array, rather than a list, as R has some inbuilt, vectorised approaches to this problem.
To get an array from .list, unlist it and supply the relevant dimensions (which could be automated by looking up dim(.list[[1]]) and length(.list)):
arr <- array(unlist(.list), dim = c(5,5,2))
Then, the desired result is obtained via rowMeans() (yes, really!)
R> rowMeans(arr, dims = 2)
[,1] [,2] [,3] [,4] [,5]
[1,] 6 11 16 21 26
[2,] 7 12 17 22 27
[3,] 8 13 18 23 28
[4,] 9 14 19 24 29
[5,] 10 15 20 25 30
The na.rm argument handles the NA case too:
R> rowMeans(arr, dims = 2, na.rm = TRUE)
[,1] [,2] [,3] [,4] [,5]
[1,] 6 11 16 21 26
[2,] 7 12 17 22 27
[3,] 8 13 18 23 28
[4,] 9 14 19 24 29
[5,] 10 15 20 25 30
A slower way is to use apply(), which may be more instructive as to what rowMeans() is doing:
R> apply(arr, 1:2, mean, na.rm = TRUE)
[,1] [,2] [,3] [,4] [,5]
[1,] 6 11 16 21 26
[2,] 7 12 17 22 27
[3,] 8 13 18 23 28
[4,] 9 14 19 24 29
[5,] 10 15 20 25 30
i.e. applying the mean function, grouping the data by the row and column dimensions. Think of the array as a box, with the height of the box being the third dimension. This box consists of little cubes, like a Rubik's cube. We want the mean of the little cubes stacked above each row-and-column combination: the mean of the little cubes stacked above (1,1), and so on. This is what the apply() and rowMeans() functions do for you if you treat the multiple matrices in a list as an array.
Here is one way with mapply.
matrix(do.call(mapply, c(function(...) mean(unlist(list(...))), .list)), ncol=5)
As a side note, .list isn't the best way to avoid clashing with the base function name list. In R, a leading period makes a variable hidden: it won't show up when you call ls() (unless you use all.names = TRUE). You could use list. or the easier-to-read list_.
I have a matrix, like the one generated with this code:
> m = matrix(data=c(1:50), nrow= 10, ncol = 5);
> colnames(m) = letters[1:5];
If I filter the columns, and the result have more than one column, the new matrix keeps the names. For example:
> m[, colnames(m) != "a"];
b c d e
[1,] 11 21 31 41
[2,] 12 22 32 42
[3,] 13 23 33 43
[4,] 14 24 34 44
[5,] 15 25 35 45
[6,] 16 26 36 46
[7,] 17 27 37 47
[8,] 18 28 38 48
[9,] 19 29 39 49
[10,] 20 30 40 50
Notice that here, the class is still matrix:
> class(m[, colnames(m) != "a"]);
[1] "matrix"
But when the filter leaves only one column, the result is a vector (an integer vector in this case), and the column name is lost.
> m[, colnames(m) == "a"]
[1] 1 2 3 4 5 6 7 8 9 10
> class(m[, colnames(m) == "a"]);
[1] "integer"
The name of the column is very important.
I would like to keep both the matrix structure (a one-column matrix) and the column's name, but the column's name is more important.
I already know how to solve this the long way (by keeping track of every case); I'm wondering if there is an elegant, enlightening solution.
You need to set drop = FALSE. This is good practice for programmatic use. From the documentation:
drop
For matrices and arrays. If TRUE the result is coerced to the lowest possible dimension (see the examples)
m[,'a',drop=FALSE]
This will retain the names as well.
You can also use subset:
m.a = subset(m, select = colnames(m) == "a")
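A quick check that the drop = FALSE approach keeps the matrix class and the column name, using the example matrix from the question:

```r
m <- matrix(1:50, nrow = 10, ncol = 5)
colnames(m) <- letters[1:5]

m.a <- m[, "a", drop = FALSE]  # still a 10 x 1 matrix
is.matrix(m.a)                 # TRUE
colnames(m.a)                  # "a"
```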
I have an N-by-M matrix X, and I need to calculate an N-by-N matrix Y:
Y[i, j] = sum((X[i,] - X[j,])^2),  1 <= i, j <= N
For now, I have to use nested loops, which is O(N^2). I would like to know if there's a better way, like using matrix operations.
More generally, sum(...) can be any function fun(x1, x2), where x1 and x2 are M-by-1 vectors.
You can use expand.grid() to get a data.frame of all pairs of row indices (note: pairing row indices with row indices, since Y is N-by-N):
X <- matrix(sample(1:5, 50, replace=TRUE), nrow=10)
row.ind <- expand.grid(1:nrow(X), 1:nrow(X))
Then apply along each pair using a function:
myfun <- function(n) {
sum((X[row.ind[n, 1],] - X[row.ind[n, 2],])^2)
}
Y <- matrix(unlist(lapply(1:nrow(row.ind), myfun)), nrow=nrow(X))
Y is then a symmetric 10-by-10 matrix with zeros on the diagonal.
I bet there is a better way, but it's Friday and I'm tired!
sum((X[i,] - X[j,])^2) = sum(X[i,]^2) - 2 * X[i,].X[j,] + sum(X[j,]^2)
The middle part is just the matrix multiplication -2 * X * tran(X), and the other parts are vectors of row sums that you add to each element.
This has roughly O(n^2.8) complexity, or whatever the complexity of your matrix multiplication is.
Pseudocode:
vec = rowSums(X^2)
Y = X * tran(X) * -2
for index [i,j] in Y:
    Y[i,j] = Y[i,j] + vec[i] + vec[j]
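In R, this expansion turns into a few vectorised operations; a sketch of the pseudocode above:

```r
X <- matrix(sample(1:5, 50, replace = TRUE), nrow = 10)

vec <- rowSums(X^2)                          # sum(X[i,]^2) for each row
Y   <- outer(vec, vec, `+`) - 2 * X %*% t(X) # vec[i] + vec[j] - 2 * X[i,].X[j,]
```

outer(vec, vec, `+`) builds the matrix of vec[i] + vec[j], replacing the explicit double loop over indices.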
In MATLAB, for your specific f, you could just do this (pdist() returns a condensed distance vector, so squareform() is needed to expand it to an N-by-N matrix):
Y = squareform(pdist(X)).^2;
For a non-"cheating" version, try something like this (MATLAB):
[N, M] = size(X);
f = @(u, v) sum((u - v).^2);
helpf = @(i, j) f(X(i, :), X(j, :));
Y = arrayfun(helpf, meshgrid(1:N, 1:N), meshgrid(1:N, 1:N)');
There are more efficient ways of doing it for the specific function sum(...), but your question asked for a general way for a general function f. In general this operation will be O(N^2) times the complexity of each vector-pair operation, because that is how many pairs need to be evaluated. If f has a special form, some intermediate results can be reused.
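The same general pattern in R, for an arbitrary pairwise function f of two row vectors (a sketch; f here is just the squared-distance example again):

```r
X <- matrix(sample(1:5, 50, replace = TRUE), nrow = 10)
f <- function(u, v) sum((u - v)^2)  # any function of two M-vectors

N <- nrow(X)
# Vectorize() lets outer() call f once per (i, j) index pair
Y <- outer(1:N, 1:N, Vectorize(function(i, j) f(X[i, ], X[j, ])))
```

This keeps the O(N^2) pair enumeration but hides the loops; swapping in a different f requires no other changes.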