Setting NA in a matrix using another logical matrix - r

I just saw what seemed like a perfectly good question get deleted, and since, like the original questioner, I couldn't find a duplicate, I'm posting it again.
Assume that I have a simple matrix ("m"), which I want to index with another logical matrix ("i"), keeping the original matrix structure intact. Something like this:
# original matrix
m <- matrix(1:12, nrow = 3, ncol = 4)
# logical matrix
i <- matrix(c(rep(FALSE, 6), rep(TRUE, 6)), nrow = 3, ncol = 4)
m
i
# Desired output:
matrix(c(rep(NA,6), m[i]), nrow(m), ncol(m))
# however, this seems like bad programming...
Using m[i] returns a vector and not a matrix. What is the correct way to achieve this?

The original poster added a comment saying he'd figured out a solution, then almost immediately deleted it:
m[ !i ] <- NA
I had started an answer that offered a slightly different solution using the is.na<- function:
is.na(m) <- !i
Both solutions seem to be reasonable R code that rely upon logical indexing. (The i matrix structure is not actually relied upon. A vector of the proper length and entries would also have preserved the matrix structure of m.)
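Here is a short check of that last point, using the m and i from the question. The third variant, which uses a plain logical vector built with as.vector(), is an illustration added here and is not part of either original solution:

```r
# matrix and logical mask from the question
m <- matrix(1:12, nrow = 3, ncol = 4)
i <- matrix(c(rep(FALSE, 6), rep(TRUE, 6)), nrow = 3, ncol = 4)

# 1. logical-matrix subassignment
m1 <- m
m1[!i] <- NA

# 2. the `is.na<-` replacement function
m2 <- m
is.na(m2) <- !i

# 3. a plain logical vector of length(m) gives the same result
m3 <- m
m3[!as.vector(i)] <- NA

identical(m1, m2) && identical(m1, m3)
# [1] TRUE
dim(m3)
# [1] 3 4
```

In all three cases the assignment only changes values, so the dim attribute of m is never touched.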

Both solutions provided above work and are fine. Here is another solution, which produces a new matrix without modifying the original one. Make sure that your matrix of logical values is stored as logical, and not as character.
vm <- as.vector(m)
vi <- as.vector(i)
new_v <- ifelse(vi, vm, NA)
new_mat <- matrix(new_v, nrow = nrow(m), ncol=ncol(m))
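The same new matrix can probably be obtained in a single step. This sketch relies on two base R behaviours. First, ifelse() returns a result carrying the attributes (including dim) of its test argument, so the shape of i is kept without rebuilding the matrix. Second, replace() assigns into a copy of its first argument and returns that copy, so m is left untouched:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)
i <- matrix(c(rep(FALSE, 6), rep(TRUE, 6)), nrow = 3, ncol = 4)

# keep m where i is TRUE, NA elsewhere; the dim comes from i
new_mat1 <- ifelse(i, m, NA)

# set the cells where i is FALSE to NA in a copy of m
new_mat2 <- replace(m, !i, NA)

identical(new_mat1, new_mat2)
# [1] TRUE
```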


Is there a native R syntax to extract rows of an array?

Imagine I have an array in R with N dimensions (a matrix would be an array with 2 dimensions) and I want to select the rows from 1 to n of my array. I was wondering if there was a syntax to do this in R without knowing the number of dimensions.
For instance, I can do
x = matrix(0, nrow = 10, ncol = 2)
x[1:5, ] # to take the 5 first rows of a matrix
x = array(0, dim = c(10, 2, 3))
x[1:5, , ] # to take the 5 first rows of a 3D array
So far I haven't found a way to use this kind of writing to extract rows of an array without knowing its number of dimensions (obviously if I knew the number of dimensions I would just have to put as many commas as needed). The following snippet works but does not seem to be the most native way to do it:
x = array(0, dim = c(10, 2, 3, 4))
apply(x, 2:length(dim(x)), function(y) y[1:5])
Is there a more R way to achieve this?
Your apply solution is the best, actually.
apply(x, 2:length(dim(x)), `[`, 1:5)
or, even better, as @RuiBarradas pointed out (please upvote his comment too!):
apply(x, -1, `[`, 1:5)
Coming from Lisp, I can say that R is very Lispy, and the apply solution is a very Lispy solution.
Therefore it is also very R-ish (a solution following the functional programming paradigm).
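Here is a quick check of what apply(x, -1, `[`, 1:5) returns. It assumes a 4-dimensional array filled with distinct values, because the all-zero array from the question would hide any mismatch. A negative MARGIN means "every dimension except the first", so each call receives one length-10 vector along dimension 1, and the length-5 results are stacked into an array of dim c(5, dim(x)[-1]):

```r
x <- array(seq_len(240), dim = c(10, 2, 3, 4))

# apply over all dimensions except the first, keeping elements 1:5 of each vector
a <- apply(x, -1, `[`, 1:5)

# reference result, written with explicit commas for this known rank
b <- x[1:5, , , ]

dim(a)
# [1] 5 2 3 4
all(a == b)
# [1] TRUE
```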
Function slice.index() is easily overlooked (as I know to my cost! see magic::arow()) but can be useful in this case:
x <- array(runif(60), dim = c(10, 2, 3))
array(x[slice.index(x,1) %in% 1:5],c(5,dim(x)[-1]))
HTH, Robin
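Another base R option, not taken from either answer above, is to build the argument list for `[` programmatically with do.call(). You pass TRUE (which selects everything) for every dimension after the first, plus drop = FALSE. take_rows() is a hypothetical helper name used only for this sketch. Its result is compared against the slice.index() approach:

```r
# Hypothetical helper: select indices `rows` along dimension 1 of an
# array of any rank, without writing the commas by hand.
take_rows <- function(x, rows) {
  nd <- length(dim(x))
  # TRUE as an index selects everything along that dimension
  args <- c(list(x, rows), rep(list(TRUE), nd - 1), list(drop = FALSE))
  do.call(`[`, args)
}

x3 <- array(runif(60), dim = c(10, 2, 3))
x4 <- array(seq_len(240), dim = c(10, 2, 3, 4))

r3 <- take_rows(x3, 1:5)
r4 <- take_rows(x4, 1:5)

# the slice.index() version for the 3D case
s3 <- array(x3[slice.index(x3, 1) %in% 1:5], c(5, dim(x3)[-1]))
all(r3 == s3)
# [1] TRUE
```

Because drop = FALSE is passed, the result keeps every dimension even when a single row is selected.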

In R, is it possible to use a pair, tuple or equivalent in a matrix?

I am trying to create a matrix of coordinates (indices) from which I randomly pick one using the sample function. I then use these to select a cell in another matrix. What is the best way to do this? The trouble is how to store these integers in the matrix so that they are easy to separate. Right now I have them stored as strings with a comma, which I then split. Someone suggested I use a pair, or a string, but I cannot seem to get these to work with a matrix. Thanks!
EDIT: What I currently have looks like this (changed a little to make sense out of context):
probs <- matrix(c(0,0,0.6,0,0,
0,0.7,1,0.7,0,
0.6,1,0,1,0.6,
0,0.7,1,0.7,0,
0,0,0.6,0,0),5,5)
cordsMat <- matrix("",5,5)
for (x in 1:5){
for (y in 1:5){
cordsMat[x,y] = paste(x,y,sep=",")
}
}
cords <- sample(cordsMat,1,,probs)
cordsVec <- unlist(strsplit(cords,split = ","))
cordX <- as.numeric(cordsVec[1])
cordY <- as.numeric(cordsVec[2])
otherMat[cordX,cordY]
It sort of works, but I would be interested in a better way, as this will be repeated a lot.
If you want to set the probabilities, you can simply pass them to sample() via its prob argument:
# creating the matrix
matrix(sample(rep(1:6, 15:20), 25), 5) -> other.mat
# set the probs vec
probs <- c(0,0,0.6,0,0,
0,0.7,1,0.7,0,
0.6,1,0,1,0.6,
0,0.7,1,0.7,0,
0,0,0.6,0,0)
# the coordinates matrix
mat <- as.matrix(expand.grid(1:nrow(other.mat),1:ncol(other.mat)))
# sampling a row index of mat randomly, weighted by probs
sample(nrow(mat), 1, prob = probs) -> rand
# getting the value
other.mat[mat[rand,1], mat[rand,2]]
[1] 6
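If the coordinates are only a means of picking a cell, one simplification (an alternative not taken from either post above) is to sample a linear index directly. R stores matrices column by column, so a probability vector in the same order lines up with the cells. arrayInd() converts the index back to a (row, col) pair when the coordinates themselves are needed. set.seed(1) is there only to make the sketch reproducible:

```r
set.seed(1)
other.mat <- matrix(sample(rep(1:6, 15:20), 25), 5)
probs <- c(0, 0, 0.6, 0, 0,
           0, 0.7, 1, 0.7, 0,
           0.6, 1, 0, 1, 0.6,
           0, 0.7, 1, 0.7, 0,
           0, 0, 0.6, 0, 0)

# draw one cell as a linear (column-major) index, weighted by probs
idx <- sample(length(other.mat), 1, prob = probs)
value <- other.mat[idx]

# convert to a 1 x 2 (row, col) matrix only if the coordinates are needed;
# such a matrix can also be used directly as an index
rc <- arrayInd(idx, dim(other.mat))
value2 <- other.mat[rc]

# many draws at once (with replacement), if this is repeated a lot
idx_many <- sample(length(other.mat), 1000, replace = TRUE, prob = probs)
```

No strings are created or split, and the vectorized draw avoids calling sample() in a loop.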

Subtract each col in a df from every other col

I would like to try out a normalisation method that a friend recommended. In it, each column of a df in turn has every other column subtracted from it: first the first column, then the second, and so on for every column of that df.
e.g.:
df <- data.frame(replicate(9,1:4))
x_df_1 <- df[,1] - df[2:ncol(df)]
x_df_2 <- df[,2] - df[c(1, 3:ncol(df))]
x_df_3 <- df[,3] - df[c(1:2, 4:ncol(df))]
...
x_df_n <- df[, ncol(df)] - df[1:(ncol(df) - 1)]  # n = ncol(df), the last column
As the df has 90 columns, doing this by hand would be terrible (and very bad coding). I am sure there must be an elegant way to solve this and to end up with a list containing all the dfs, but I am totally stuck on how to get there. I would appreciate a dplyr method (for familiarity), but any working solution would be fine.
Thanks a lot for your help!
Sebastian
I may have found a solution, which I am sharing here.
Please correct me if I'm wrong.
This is essentially a pairing task without replacement: with combn() each pair of distinct columns is taken once (the reverse difference is just its negation).
The original df has 90 columns.
Let's first check how many combinations are possible
(from: https://davetang.org/muse/2013/09/09/combinations-and-permutations-in-r/):
comb_with_replacement <- function(n, r){
return( factorial(n + r - 1) / (factorial(r) * factorial(n - 1)) )
}
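comb_with_replacement(90,2) #4095 combinations, but this also counts each column paired with itself
choose(90, 2) #4005 pairs of distinct columns, which is what combn() below produces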
Now using a modified answer from here: https://stackoverflow.com/a/16921442/10342689
(My df has 90 columns; I don't know how to create a proper example df for that here.)
cc_90 <- combn(colnames(df), 2) # each column of cc_90 is one pair of column names
result <- apply(cc_90, 2, function(x) df[[x[1]]]-df[[x[2]]])
dim(result) #nrow(df) rows x 4005 columns
That should work.
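Here is a small runnable version of the combn() approach. It assumes a 9-column stand-in data frame with distinct values per column, since the 90-column data is not available. Each column of the result is named after the pair it comes from:

```r
# stand-in data: columns V1, ..., V9 with different values
df <- as.data.frame(matrix(1:36, nrow = 4))

# every unordered pair of distinct column names: a 2 x choose(9, 2) matrix
cc <- combn(colnames(df), 2)

# one difference vector per pair, stacked as columns
result <- apply(cc, 2, function(x) df[[x[1]]] - df[[x[2]]])
colnames(result) <- paste(cc[1, ], cc[2, ], sep = "-")

dim(result)
# [1]  4 36
```

Note that only one direction of each pair is present (V1-V2 but not V2-V1). This differs from the per-column lists asked for in the question.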
In R one can index using negative indices to represent "all except this index".
So we can re-write the first of your normalization rows:
x_df_1 <- df[,1] - df[2:ncol(df)]
# rewrite as:
x_df_1 <- df[,1] - df[,-1]
From this, it's a pretty easy next step to write a loop to generate the 90 new dataframes that you generated 'by hand':
list_of_dfs=lapply(seq_len(ncol(df)),function(x) df[,x]-df[,-x])
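Here is a sketch of the lapply() version on a small stand-in data frame. It assumes 9 columns with distinct values, because the real 90-column data is not shown. The list elements are named after the x_df_* objects from the question, and the first element is compared with the first line written by hand:

```r
# stand-in data: columns V1, ..., V9 with different values
df <- as.data.frame(matrix(1:36, nrow = 4))

# element x holds column x minus each of the other columns
list_of_dfs <- lapply(seq_len(ncol(df)), function(x) df[, x] - df[, -x])
names(list_of_dfs) <- paste0("x_df_", seq_along(list_of_dfs))

# should match the by-hand version from the question
all.equal(list_of_dfs$x_df_1, df[, 1] - df[2:ncol(df)])
```

Each element is a data frame with ncol(df) - 1 columns, so for the real data the list would hold 90 data frames with 89 columns each.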
This seems to be somewhat different to what you're proposing in your own answer to your question, though...

How can I create Variables A1,A2,...,A100 using a loop in R?

I am trying to create 1000 variables, which I want to name with the index number. I don't know how to create these new variables.
for(i in 1:1000) {
Ui <- rnorm(200,0,1)
}
This is a common sort of thing that people want to do, especially when they are coming from other programming languages. However, there are better ways to accomplish the same thing, and you should not follow recommendations to use assign; that is bad advice that you will likely regret later on.
The way we do this sort of thing in R is to use lists, specifically named lists:
x <- replicate(1000,rnorm(200,0,1),simplify = FALSE)
x <- setNames(x,paste0("A",seq_along(x)))
Now x is a named list of length 1000, and each element is a vector of length 200 drawn from a normal(0,1) distribution.
You can refer to each one via x[[1]] or x[["A1"]] as needed. Additionally, since they are all in the same object, you can easily operate on them as a group using tools like lapply.
Pretty much any time you find yourself wanting to create a sequence of objects with similar names, that should be a signal to you that you should be using a list instead.
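To illustrate operating on the list as a group, here is a short sketch using the x built above (set.seed(42) is added only for reproducibility). vapply() computes one summary per element and lapply() transforms every element, and both keep the names A1, ..., A1000:

```r
set.seed(42)
x <- replicate(1000, rnorm(200, 0, 1), simplify = FALSE)
x <- setNames(x, paste0("A", seq_along(x)))

# one mean per variable, returned as a named numeric vector
means <- vapply(x, mean, numeric(1))
head(names(means), 3)
# [1] "A1" "A2" "A3"

# centre every variable; the result is again a named list
centered <- lapply(x, function(v) v - mean(v))
```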
There is no point in cluttering your environment with so many variables; try storing them in a named list instead:
l1 <- setNames(lapply(1:5, function(x) rnorm(5)), paste0("A", 1:5))
l1
#$A1
#[1] 0.4951453 -1.4278665 0.5680115 0.3537730 -0.7757363
#$A2
#[1] -0.11096037 0.05958700 0.02578168 1.00591996 0.54852030
#$A3
#[1] 0.1058318 0.6988443 -0.8213525 -0.1072289 0.8757669
#$A4
#[1] -0.6629634 0.8321713 -0.3073465 -0.2645550 -1.0064132
#$A5
#[1] 2.2191246 0.2054360 -0.1768357 1.6875302 -1.1495807
Now you can access an individual list element as
l1[["A1"]]
#[1] 0.4951453 -1.4278665 0.5680115 0.3537730 -0.7757363
Another method is to generate all the numbers at once and then split them into a list:
groups = 5
each = 5
setNames(split(rnorm(groups * each), rep(seq_len(groups), each = each)),
paste0("A", seq_len(groups)))
I agree with the others that this is not a good idea. Anyway, to answer your question, this is how you would do it:
k <- 1000 # number of variables
n <- 200 # sample size of each variable
for(i in 1:k){
  assign(paste0("variable", i), rnorm(n, 0, 1))
}
variable1
-0.012947062 0.728284959 -1.627796366 0.003471491 ...
However, personally I would prefer another solution. Both answers so far suggest using lists, and I find lists quite cumbersome, especially if you are new to R. So I would suggest creating a matrix where every column contains one variable.
# creates a matrix
m <- matrix(rep(NA, n*k), ncol= k)
# generates rnorm() in each column
for(i in 1:k){
m[ , i] <- rnorm(n, 0, 1)
}
# now you can name the columns
colnames(m) <- paste0("variable", 1:k)
m
variable1 variable2 ...
[1,] 0.30950749 -2.07388046
[2,] -1.13232330 -0.55511476
...
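Here is a vectorized variant of the matrix idea, sketched as an alternative to the loop. It draws all n * k values with a single rnorm() call, lets matrix() fill them column by column, and names the columns at construction time via the dimnames argument:

```r
set.seed(1)
k <- 1000 # number of variables
n <- 200  # sample size of each variable

# all n * k draws at once, one column per variable
m <- matrix(rnorm(n * k, 0, 1), nrow = n, ncol = k,
            dimnames = list(NULL, paste0("variable", seq_len(k))))

# columns can be accessed by name or by position
head(m[, "variable1"])
```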

Efficiently subtract from matrix columnwise

I have a rather simple problem and was wondering whether some of you guys know a very efficient (=fast) solution for this:
I have two matrices mat and arr and want to accomplish the following: take every column of arr and subtract it from mat, then take the logarithm of one minus the absolute value of the difference. That's it. Right now I'm using sapply (see below), but I'm pretty sure it's possible to do it faster (maybe using sweep?).
Code:
mat <- matrix(.3, nrow=10, ncol = 4)
arr <- matrix(.1, nrow=10, ncol = 10000)
i <- ncol(arr)
result <- sapply(1:i, function(ii) (log(1-abs(mat-arr[,ii]))))
Thanks for any ideas!
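We could replicate and then take the difference in one vectorized step. Each column of arr has to be repeated ncol(mat) times, so that block ii of the result is computed against arr[, ii]. Recycling arr as a whole with rep(arr, ncol(mat)) only gives the same numbers here because the example matrices are constant.
result2 <- matrix(log(1 - abs(c(mat) - arr[, rep(seq_len(i), each = ncol(mat))])),
                  ncol = i)
identical(result, result2)
#[1] TRUE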
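To check a vectorized version on non-constant data, where a column mismatch could not hide behind constant values, here is a sketch with random matrices. The sizes are reduced from the question, and the values are kept below 0.5 so the logarithm stays finite. It compares the sapply() loop with the approach of repeating each column of arr ncol(mat) times:

```r
set.seed(123)
# non-constant test data (sizes reduced from the question)
mat <- matrix(runif(40, 0, 0.5), nrow = 10, ncol = 4)
arr <- matrix(runif(1000, 0, 0.5), nrow = 10, ncol = 100)
i <- ncol(arr)

# reference: the sapply() loop from the question
result <- sapply(1:i, function(ii) log(1 - abs(mat - arr[, ii])))

# vectorized: repeat each column of arr ncol(mat) times, so the
# ii-th block of 40 values is compared against arr[, ii]
arr_rep <- arr[, rep(seq_len(i), each = ncol(mat))]
result2 <- matrix(log(1 - abs(c(mat) - arr_rep)), ncol = i)

all.equal(result, result2)
# [1] TRUE
```

The intermediate arr_rep holds ncol(mat) times as many values as arr. For very large inputs, that memory cost is the trade-off for dropping the R-level loop.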
