Removing elements that are multiples of 4 from a matrix in R

This code is supposed to remove all multiples of 4 from the given vector, but when I run it, only 8 gets removed.
multipleoffour<- function(y){
y2<-y
for (n in y )
{if (n%%4==0)
y2<-y2[-n]
}
return (y2)
}
multipleoffour(c(2,4,6,8,10,12,14))

Since R is vectorized, this is a more idiomatic way to do it:
multipleoffour<- function(y){
y[y %% 4 != 0]
}
multipleoffour(c(2,4,6,8,10,12,14))
## [1] 2 6 10 14

Your code doesn't work because y2<-y2[-n] removes the element at position n of the vector, not the value n itself. In your example, n = 4 removed 8, which is the 4th element of your vector; for n = 8 and n = 12 there is no such position in y2, so nothing was removed. Otherwise, I agree with the other answers about how to do it more efficiently.
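To make the position-versus-value distinction concrete, here is a small sketch using the vector from the question. The variable names are chosen only for illustration.

```r
y <- c(2, 4, 6, 8, 10, 12, 14)

# A negative index drops a position, not a value.
drop_pos4 <- y[-4]   # drops the 4th element, which happens to be 8
drop_pos8 <- y[-8]   # y has no 8th element, so nothing is dropped

# To drop by value, build a logical mask instead.
no_fours <- y[y %% 4 != 0]
```

Here drop_pos8 comes back unchanged, which is why the loop in the question silently skips 4 and 12.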

Related

Why does R not remove elements properly over an empty set of indices?

I have encountered some strange behaviour in R. Suppose I have a matrix and I want to remove a specified set of rows and columns. Here is an example where this works perfectly well.
#Create a matrix
MATRIX <- matrix(1:20, nrow = 4, ncol = 5)
rownames(MATRIX) <- c('a', 'b', 'c', 'd')
colnames(MATRIX) <- c('a', 'b', 'c', 'd', 'e')
#Specify rows and columns to remove
REMOVE.ROW <- 3
REMOVE.COL <- 2
#Print the matrix without these rows or columns
MATRIX[-REMOVE.ROW, -REMOVE.COL]
  a  c  d  e
a 1  9 13 17
b 2 10 14 18
d 4 12 16 20
However, when one or both of the objects REMOVE.ROW or REMOVE.COL are empty, instead of removing nothing (and therefore giving back the original matrix), it gives me back an empty matrix.
#Specify rows and columns to remove
REMOVE.ROW <- integer(0)
REMOVE.COL <- integer(0)
#Print the matrix without these rows or columns
MATRIX[-REMOVE.ROW, -REMOVE.COL]
<0 x 0 matrix>
Intuitively, I would have expected the removal of an empty set of indices to leave me with the original set of indices, and so I would have expected the full matrix back from this command. For some reason, R removes all rows and columns from the matrix in this case. As far as I can make out, this appears to be a bug in R, but perhaps there is some good reason for it that I am unaware of.
Question: Can someone explain why R is doing things this way? Aside from using if-then statements to deal with the special cases, is there any simple adjustment I can make to have R behave as I want it to?
Empty objects have the strange property that they are not NULL and have length 0, but are not subsettable. A possible workaround is to consider every possible combination and use the fact that length(integer(0)) is equal to zero. I understand that this solution might not be ideal.
is.na(integer(0))
#> logical(0)
is.null(integer(0))
#> [1] FALSE
length(integer(0))
#> [1] 0
integer(0)[[1]]
#> Error in integer(0)[[1]]: subscript out of bounds
integer(0)[[0]]
#> Error in integer(0)[[0]]: attempt to select less than one element in get1index <real>
MATRIX <- matrix(1:20, nrow = 4, ncol = 5)
REMOVE.ROW <- integer(0)
REMOVE.COL <- integer(0)
if (length(REMOVE.ROW) > 0 && length(REMOVE.COL) > 0) {
MATRIX[-REMOVE.ROW, -REMOVE.COL]
} else {
if (length(REMOVE.ROW) > 0 && length(REMOVE.COL) == 0) {
MATRIX[-REMOVE.ROW, ]
} else {
if (length(REMOVE.ROW) == 0 && length(REMOVE.COL) > 0) {
MATRIX[, -REMOVE.COL]
} else {
MATRIX
}
}
}
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    5    9   13   17
#> [2,]    2    6   10   14   18
#> [3,]    3    7   11   15   19
#> [4,]    4    8   12   16   20
Created on 2021-11-27 by the reprex package (v2.0.1)
The problem is that R is using arithmetic negation, not set negation
Based on a helpful comment (hat tip to IceCreamToucan) it appears that this is occurring because of the two-step process involved in indexing matrices using negative indices, which are constructed using arithmetic negation instead of set negation. This appears to be one of those cases where the standard mathematical interpretation of an operation is different to the computational interpretation.
In the mathematical interpretation of indexing a matrix over a set of indices we view set negation as producing a new set composed of elements that are in the original 'sample space' but outside the negated set. In the computational interpretation in R the application of the negative sign is instead producing negative arithmetic values, and these are subsequently interpreted as elements to remove when calling the matrix.
What is happening in this case: For the usual case where we have a non-empty set of indices, using the negation sign simply turns the indices into negative values and then when we call the matrix it looks over all the indices other than the negative values.
#Specify rows and columns to remove
REMOVE.ROW <- 3
REMOVE.COL <- 2
#See negatives of the removed indices
identical(MATRIX[-REMOVE.ROW, -REMOVE.COL], MATRIX[-3, -2])
[1] TRUE
However, when we use an empty vector of indices, the negative of that vector is still the empty vector of indices, i.e., the vector integer(0) is identical to its negative -integer(0). Consequently, when we try to remove the empty vector of indices, we are actually asking to call the matrix over the negative of the empty vector, which is still the empty vector.
#The empty vector is equivalent to its negative
identical(integer(0), -integer(0))
[1] TRUE
#Therefore, calling over these vectors is equivalent
identical(MATRIX[-integer(0), -integer(0)], MATRIX[integer(0), integer(0)])
[1] TRUE
So, the problem here is that you are interpreting -REMOVE.ROW and -REMOVE.COL as if they were using set negation when actually they are just taking the initial vectors of values and turning them negative (i.e., multiplying them by negative one).
Fixing the problem: There does not seem to be a standard function to call the matrix in a way that interprets the indices using set negation, so you will need to use conditional logic to construct the solution for a specific case or for a custom function. Here is a custom function sub.matrix to remove particular rows and columns, where these are interpreted in the sense of set negation.
sub.matrix <- function(x, remove.rows = integer(0), remove.cols = integer(0)) {
#Check that input x is a matrix
if (!is.matrix(x)) {
stop('This function is only for objects of class \'matrix\'') }
#Create output matrix from the input x (not from a global object)
R <- length(remove.rows)
C <- length(remove.cols)
if ((R > 0) && (C > 0)) { OUT <- x[-remove.rows, -remove.cols] }
if ((R == 0) && (C > 0)) { OUT <- x[, -remove.cols] }
if ((R > 0) && (C == 0)) { OUT <- x[-remove.rows, ] }
if ((R == 0) && (C == 0)) { OUT <- x }
#Return the output matrix
OUT }
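An alternative sketch that avoids the case analysis is to select the indices to keep rather than negating the indices to remove. setdiff() works on sets, so an empty removal set leaves every index in place. The helper name keep_except is hypothetical, and drop = FALSE is an added assumption so that the result stays a matrix even when only one row or column remains.

```r
keep_except <- function(x, remove.rows = integer(0), remove.cols = integer(0)) {
  # Set difference: all row/column positions minus the ones to remove
  keep.rows <- setdiff(seq_len(nrow(x)), remove.rows)
  keep.cols <- setdiff(seq_len(ncol(x)), remove.cols)
  x[keep.rows, keep.cols, drop = FALSE]
}

MATRIX  <- matrix(1:20, nrow = 4, ncol = 5)
full    <- keep_except(MATRIX)                                   # nothing removed
trimmed <- keep_except(MATRIX, remove.rows = 3, remove.cols = 2) # same as MATRIX[-3, -2]
```

Because the positive index vectors are built explicitly, the empty case needs no special handling.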

How to combine a 'for' loop with an 'if' statement?

I am a newbie, and my question is probably the easiest many of you can imagine. Still, I am stuck and have been looking for answers everywhere, so please help. I am writing a simple loop, consisting of a 'for' and an 'if' statement, evaluating a simple vector. I keep getting wrong results and I do not know where the problem is. Here is the code:
Y <- c(3,5,2,7)
for (n in Y) {
if (n < 6) (Y[n]=0)
}
When evaluated, the result is: 3 0 0 7 0, which is NOT what I am expecting! The code is supposed to evaluate every item in Y and return 0 if it is less than 6. So it should be: 0 0 0 7. What am I doing wrong? Please help!
When you do for (n in Y) you are iterating over the elements of your vector, not over its positions, so when you write Y[n] you are using each value as an index.
You need to do something like this (don't know which language you're using):
Y <- c(3,5,2,7)
for (n in 1:length(Y)) {
if (Y[n] < 6) (Y[n]=0)
}
With your original code what's happening is:
if (3 < 6) (Y[3]=0) # changes position 3
if (5 < 6) (Y[5]=0) # adds a 0 at position 5
if (2 < 6) (Y[2]=0) # changes position 2
if (7 < 6) (Y[7]=0) # nothing happens
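If a loop is wanted, one way to keep values and positions apart is to iterate over positions with seq_along(). Unlike 1:length(Y), it yields an empty sequence for an empty vector instead of c(1, 0). A minimal sketch, using i as the position variable:

```r
Y <- c(3, 5, 2, 7)

# i runs over positions 1..4; Y[i] is the value at that position
for (i in seq_along(Y)) {
  if (Y[i] < 6) Y[i] <- 0
}
```

After the loop, Y still has four elements, because no assignment ever targets a position beyond the end of the vector.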
As R is a vectorized language, for loops are rarely needed, especially for basic operations. Getting acquainted with the vector features of R can save a lot of typing (and headaches). So, for replacing all values in Y which are less than 6 we can write
Y[Y < 6] <- 0
Y
[1] 0 0 0 7
Here, Y < 6 returns a logical vector indicating which elements of Y fulfill the condition
[1] TRUE TRUE TRUE FALSE
which is then used for subsetting.
Furthermore, R has the replace() function which can be used likewise:
replace(Y, Y < 6, 0)
[1] 0 0 0 7

R vector staying the same length after indexing within recursive function

I wrote a recursive binary search function in R which finds the smallest element in a vector that is greater than a given value:
binary_next_biggest <- function(x, vec){
if (length(vec) == 1){
if (x < vec[1]){
return(vec[1])
} else {
return(NA)
}
} else {
mid = ceiling(length(vec)/2)
if (x < vec[mid]){
return(binary_next_biggest(x, vec[1:mid]))
} else {
return(binary_next_biggest(x, vec[mid+1:length(vec)]))
}
}
}
I've written this exact same function in Python with no issues (code below), but in R it does not work.
import numpy as np
def binary_next_biggest(x, arr):
if len(arr)==1:
if x < arr[0]:
return arr[0]
else:
return None
else:
mid = int(np.ceil(len(arr)/2)-1)
if x < arr[mid]:
return binary_next_biggest(x, arr[:mid+1])
else:
return binary_next_biggest(x, arr[mid+1:])
Through debugging in RStudio I discovered the mechanics of why it's not working: indexing the vector in my above function is returning a vector of the same length, so that if
vec <- 1:10
and vec is indexed within the function,
vec[6:10]
the resulting vector passed to the new call of binary_next_biggest() is
6 7 8 9 10 NA NA NA NA NA
where I would expect
6 7 8 9 10
What's going on here? I know I can just rewrite it as a while loop iteratively changing indexes, but I don't understand why vector indexing is behaving this way in the code I've written. Within the interactive R console indexing behaves as expected and changes the vector length, so why would it behave differently within a function, and what would be the appropriate way to index for what I'm trying to do?
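The strange behavior is caused by an indexing error. The part mid+1:length(vec) should be (mid+1):length(vec), because the : operator has higher precedence than +. As written, the expression means mid + (1:length(vec)), which has the same length as vec and runs past its end, so the out-of-range positions come back as NA.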
Here is an illustration of the difference.
5 + 1:10
# [1] 6 7 8 9 10 11 12 13 14 15
(5+1):10
# [1] 6 7 8 9 10
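With the parentheses in place, the function from the question might look like the sketch below. It assumes, as the question does, that vec is sorted in increasing order and is not empty; nothing else about the logic is changed.

```r
binary_next_biggest <- function(x, vec) {
  # Base case: a single candidate is either the answer or there is none
  if (length(vec) == 1) {
    if (x < vec[1]) return(vec[1]) else return(NA)
  }
  mid <- ceiling(length(vec) / 2)
  if (x < vec[mid]) {
    binary_next_biggest(x, vec[1:mid])
  } else {
    # Parentheses force the addition to happen before the sequence is built
    binary_next_biggest(x, vec[(mid + 1):length(vec)])
  }
}

hit  <- binary_next_biggest(49, 1:1000)
miss <- binary_next_biggest(10, 1:10)
```

For a vector of length 2 or more, mid is always smaller than length(vec), so (mid + 1):length(vec) never counts downward.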
There might be a reason why you're doing a binary search (simplified example of more complicated problem?), but there are easier ways to do this in R.
vec <- 1:1000
x <- 49
min(vec[which(vec > x)])
# [1] 50
This also works if vec isn't ordered.
vec <- sample.int(1000)
min(vec[which(vec > x)])
# [1] 50

adding values to the vector inside for loop in R

I have just started learning R and I wrote this code to learn on functions and loops.
squared<-function(x){
m<-c()
for(i in 1:x){
y<-i*i
c(m,y)
}
return (m)
}
squared(5)
NULL
Why does this return NULL? I want the i*i values to be appended to the end of m and the function to return a vector. Can someone please point out what's wrong with this code?
Nothing is ever added to m inside your loop, because the result of c(m, y) is not assigned to anything. So m stays as it was created:
m <- c()
m
# NULL
You can change the function to return the desired values by assigning m in the loop.
squared <- function(x) {
m <- c()
for(i in 1:x) {
y <- i * i
m <- c(m, y)
}
return(m)
}
squared(5)
# [1] 1 4 9 16 25
But this is inefficient because we know the length of the resulting vector will be 5 (or x). So we want to allocate the memory first before looping. This will be the better way to use the for() loop.
squared <- function(x) {
m <- vector("integer", x)
for(i in seq_len(x)) {
m[i] <- i * i
}
m
}
squared(5)
# [1] 1 4 9 16 25
Also notice that I have removed return() from the second function. It is not necessary there, so it can be removed. It's a matter of personal preference to leave it in this situation. Sometimes it will be necessary, like in if() statements for example.
I know the question is about looping, but I also must mention that this can be done more efficiently with seven characters using the primitive ^, like this
(1:5)^2
# [1] 1 4 9 16 25
^ is a primitive function, which means it is implemented in C, and it will be the most efficient of these three methods.
`^`
# function (e1, e2) .Primitive("^")
Here's a general approach:
# Create empty vector
vec <- c()
for(i in 1:10){
# Inside the loop, make one or more elements to add to vector
new_elements <- i * 3
# Use 'c' to combine the existing vector with the new_elements
vec <- c(vec, new_elements)
}
vec
# [1] 3 6 9 12 15 18 21 24 27 30
If you happen to run out of memory (e.g. if your loop has a lot of iterations or vectors are large), you can try vector preallocation which will be more efficient. That's not usually necessary unless your vectors are particularly large though.
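For comparison, a preallocated version of the same loop could look like the sketch below. It assumes the number of iterations n is known in advance and that each iteration produces exactly one value.

```r
n <- 10

# Allocate the full result once, then fill it in place
vec <- numeric(n)
for (i in seq_len(n)) {
  vec[i] <- i * 3
}
```

This avoids copying the growing vector on every iteration, which is what c(vec, new_elements) has to do.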

Vectorized (non-loop) solution returns wrong result (solution with for-loop returns correct result)

I have two theoretically identical solutions: one is vectorized and the other uses a for-loop. But the vectorized solution returns the wrong result and I want to understand why. The logic is simple: replace each NA with the previous non-NA value in the vector.
# vectorized
f1 <- function(x) {
idx <- which(is.na(x))
x[idx] <- x[ifelse(idx > 1, idx - 1, 1)]
x
}
# non-vectorized
f2 <- function(x) {
for (i in 2:length(x)) {
if (is.na(x[i]) && !is.na(x[i - 1])) {
x[i] <- x[i - 1]
}
}
x
}
v <- c(NA,NA,1,2,3,NA,NA,6,7)
f1(v)
# [1] NA NA 1 2 3 3 NA 6 7
f2(v)
# [1] NA NA 1 2 3 3 3 6 7
The two pieces of code are different.
The first one replaces NA with the previous element of the original vector, even if that element is itself NA.
The second one replaces NA with the previous element if that element is not NA, but the previous element can be the result of an earlier NA substitution.
Which one is correct really depends on what you want. The second behaviour is more difficult to vectorize, but there are already implemented functions such as zoo::na.locf.
Or, if you only want to use base packages, you could have a look at this answer.
These two solutions are not equivalent. The first function is rather like:
f2_as_f1 <- function(x) {
y <- x # a copy of x
for (i in 2:length(x)) {
if (is.na(y[i])) {
x[i] <- y[i - 1]
}
}
x
}
Note the usage of the y vector.
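For the carry-forward behaviour of f2, one common base-R idiom is sketched below. It is not taken from the answers above, and the function name fill_forward is hypothetical. The idea is to compute, for every position, the index of the most recent non-NA value and then look all of them up at once.

```r
fill_forward <- function(x) {
  # Position of the last non-NA value seen so far (0 if none yet)
  last_seen <- cummax(seq_along(x) * !is.na(x))
  # Prepend NA so that index 0 maps to NA after shifting by one
  c(NA, x)[last_seen + 1]
}

v <- c(NA, NA, 1, 2, 3, NA, NA, 6, 7)
filled <- fill_forward(v)
```

Leading NA values stay NA because there is no earlier value to carry forward, which matches what f2 returns for v.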

Resources