How to concisely deal with subsets when their lengths become zero? - r

To exclude elements from a vector x,
x <- c(1, 4, 3, 2)
we can index with a negated vector of positions:
excl <- c(2, 3)
x[-excl]
# [1] 1 2
This also works dynamically,
(excl <- which(x[-which.max(x)] > quantile(x, .25)))
# [1] 2 3
x[-excl]
# [1] 1 2
until excl is of length zero:
excl.nolength <- which(x[-which.max(x)] > quantile(x, .95))
length(excl.nolength)
# [1] 0
x[-excl.nolength]
# numeric(0)
I could kind of reformulate that, but I have many objects to which excl is applied, say:
letters[1:4][-excl.nolength]
# character(0)
I know I could use setdiff, but that's rather long and hard to read:
x[setdiff(seq(x), excl.nolength)]
# [1] 1 4 3 2
letters[1:4][setdiff(seq(letters[1:4]), excl.nolength)]
# [1] "a" "b" "c" "d"
Now, I could exploit the fact that nothing is excluded if the element number is greater than the number of elements:
length(x)
# [1] 4
x[-5]
# [1] 1 4 3 2
To generalize that I should probably use .Machine$integer.max:
tmp <- which(x[-which.max(x)] > quantile(x, .95))
excl <- if (!length(tmp) == 0) tmp else .Machine$integer.max
x[-excl]
# [1] 1 4 3 2
Wrapped into a function,
e <- function(x) if (!length(x) == 0) x else .Machine$integer.max
that's quite handy and clear:
x[-e(excl)]
# [1] 1 2
x[-e(excl.nolength)]
# [1] 1 4 3 2
letters[1:4][-e(excl.nolength)]
# [1] "a" "b" "c" "d"
But it seems a little fishy to me...
Is there a better equally concise way to deal with a subset of length zero in base R?
Edit
excl comes out as dynamic result of a function before (as shown with which above) and might be of length zero or not. If length(excl) == 0 nothing should be excluded. Following lines of code, e.g. x[-excl] should not have to be changed at best or as little as possible.
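For reference, a minimal sketch of why the empty case behaves this way (variable names are illustrative): negating a zero-length vector gives back a zero-length vector, so R sees an empty selection rather than an empty exclusion.

```r
x <- c(1, 4, 3, 2)
excl <- integer(0)

# Negation cannot mark an empty vector as "negative": -integer(0) is integer(0)
identical(-excl, excl)

# So both calls below select zero elements
length(x[excl])
length(x[-excl])
```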

You can overwrite [ with your own function.
"[" <- function(x,y) {if(length(y)==0) x else .Primitive("[")(x,y)}
x <- c(1, 4, 3, 2)
excl <- c(2, 3)
x[-excl]
#[1] 1 2
excl <- integer()
x[-excl]
#[1] 1 4 3 2
rm("[") #Go back to normal mode

I would argue this is somewhat opinion based.
For example, I find:
x <- x[-if(length(excl <- which(x[-which.max(x)] > quantile(x, .95))) == 0) .Machine$integer.max else excl]
very unreadable, but some people like one-liners. Reading package code, you'll often find this split up instead, as in one of the suggestions you gave:
excl <- which(x[-which.max(x)] > quantile(x, .95))
if (length(excl) != 0)
  x <- x[-excl]
Alternatively, you could avoid which and simply subset with the logical vector, which most people would likely consider cleaner:
x <- x[!x[-which.max(x)] > quantile(x, .95)]
This would avoid the zero-length index problem, at the cost of some efficiency.
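A minimal sketch of the logical-mask idea, using a simplified condition assumed here for illustration (drop values above the 95% quantile) rather than the exact expression from the question. An all-FALSE mask simply keeps everything, so no special case is needed:

```r
x <- c(1, 4, 3, 2)

# Mask of elements to drop; here only the maximum exceeds the 95% quantile
drop <- x > quantile(x, .95)
x[!drop]
# [1] 1 3 2

# A condition that matches nothing yields an all-FALSE mask
drop_none <- x > max(x)
x[!drop_none]
# [1] 1 4 3 2
```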
As a side note, the example used above and in the question seems somewhat off. First, which.max only returns the first index that is equal to the max value; in addition, the indices will be offset for every value removed. More likely the intended example would be
x <- x[!(x > quantile(x, .95))[-which(x == max(x))]]

How about this?
a <- letters[1:3]
excl1 <- c(1,3)
excl2 <- c()
a[!(seq_along(a) %in% excl1)]
a[!(seq_along(a) %in% excl2)]
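Wrapped into a small helper, this stays close to the concise x[-excl] form. The name drop_at is hypothetical, chosen only for this sketch:

```r
# Hypothetical helper: drop the given positions, tolerating an empty excl
drop_at <- function(v, excl) v[!(seq_along(v) %in% excl)]

x <- c(1, 4, 3, 2)
drop_at(x, c(2, 3))
# [1] 1 2
drop_at(x, integer(0))
# [1] 1 4 3 2
drop_at(letters[1:4], integer(0))
# [1] "a" "b" "c" "d"
```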

Related

How to add possible divisor numbers?

How do I retrieve maximum sum of possible divisors numbers
I have a below function which will give possible divisors of number
Code
divisors <- function(x) {
y <- seq_len(ceiling(x / 2))
y[x %% y == 0]
}
Example
Divisors of 99 will give the below possible values.
divisors(99)
[1] 1 3 9 11 33
My expected Logic :
Go from the last number to the first number in the divisors vector.
The last number is 33. The next number down that divides 33 is 11, so I select 11. From 11, the next number down that divides 11 is 1, so I select 1. Now add all the numbers.
33 + 11 + 1 = 45
Move to the next number, 11. The next number down that divides 11 is 1, so I select 1. Now add all the numbers.
11 + 1 = 12
Move to the next number, 9. The next number down that divides 9 is 3, so I select 3. From 3, the next number down that divides 3 is 1, so I select 1. Now add all the numbers.
9 + 3 + 1 = 13
Move to the next number, 3. The next number down that divides 3 is 1, so I select 1. Now add all the numbers.
3 + 1 = 4
Now maximum among these is 45.
Now I am struggling to write this logic in R . Help / Advice much appreciated.
Note : Prime numbers can be ignored.
update
For large integers, e.g., values near the maximum integer .Machine$integer.max (itself a prime number), you can run the code below (note that I modified the functions divisors and f a bit)
divisors <- function(x) {
y <- seq(x / 2)
y[as.integer(x) %% y == 0]
}
f <- function(y) {
if (length(y) <= 2) {
return(as.integer(sum(y)))
}
l <- length(y)
h <- y[l]
yy <- y[-l]
h + f(yy[h %% yy == 0])
}
and you will see
> n <- .Machine$integer.max - 1
> x <- divisors(n)
> max(sapply(length(x):2, function(k) f(head(x, k))))
[1] 1569603656
You can define a recursive function f that gives successive divisors
f <- function(y) {
if (length(y) == 1) {
return(y)
}
h <- y[length(y)]
yy <- y[-length(y)]
c(f(yy[h %% yy == 0]), h)
}
and you will see all possible successive divisor tuples
> sapply(rev(seq_along(x)), function(k) f(head(x, k)))
[[1]]
[1] 1 11 33
[[2]]
[1] 1 11
[[3]]
[1] 1 3 9
[[4]]
[1] 1 3
[[5]]
[1] 1
Then, we apply f within sapply like below
> max(sapply(rev(seq_along(x)), function(k) sum(f(head(x, k)))))
[1] 45
which gives the desired output.
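For comparison, the same greedy walk can be sketched without recursion. The helper name chain_sum is hypothetical; divisors is the function from the question. From each divisor, the loop repeatedly steps to the largest smaller divisor that divides the current value and accumulates the sum:

```r
divisors <- function(x) {
  y <- seq_len(ceiling(x / 2))
  y[x %% y == 0]
}

# Hypothetical helper: sum of the chain that starts at d
chain_sum <- function(d, divs) {
  total <- d
  repeat {
    smaller <- divs[divs < d & d %% divs == 0]
    if (length(smaller) == 0) break
    d <- max(smaller)        # next divisor down the chain
    total <- total + d
  }
  total
}

divs <- divisors(99)
sums <- sapply(divs, chain_sum, divs = divs)
max(sums)
# [1] 45
```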
You can also use the following solution. It may look a little complicated, and there is surely an easier, more efficient solution, but I thought it could be useful to you. I start from your divisors output:
> x
[1] 1 3 9 11 33
# First I created a list whose first element is our original x and from then on
# I subset the first element till the last element of the list
lst <- lapply(0:(length(x)-1), function(a) x[1:(length(x)-a)])
> lst
[[1]]
[1] 1 3 9 11 33
[[2]]
[1] 1 3 9 11
[[3]]
[1] 1 3 9
[[4]]
[1] 1 3
[[5]]
[1] 1
Then I wrote a custom function to implement your conditions and gather your desired output. For this purpose I created a function factory, that is, a function that creates a function:
As you might have noticed, the outermost function does not take any argument. It only sets up an empty vector out to save our desired elements in. It is created in the execution environment of the outermost function to shield it from any changes that might affect it in the global environment.
The inner function is the one that takes our vector x, so in general we call the whole setup like fnf()(x). The first element of our out vector is the first element of the x passed in (33). Then I found all elements of x that divide it, i.e. those leaving a remainder of 0. After I found them, I took the second one (11), since the first one was 33 itself, and stored it in our out vector. Then I trimmed the x vector so that the max value (33) is omitted and repeated the same process.
Since we were going to repeat the process over again, I thought this might be a good case for recursion. Recursion is a programming technique in which a function calls itself from inside its own body. As you might have noticed, I used fn inside the function to repeat the process, each time with fewer values.
This may sound a bit complicated, but I believe there are some good points here for you to pick up for future exploration. I found them very useful and hope that's the case for you too.
fnf <- function() {
  out <- c()
  fn <- function(x) {
    out <<- c(out, x[1])
    z <- x[out[length(out)] %% x == 0]
    if (length(z) >= 2) {
      out[length(out) + 1] <<- z[2]
    } else {
      return(out)
    }
    x <- x[!duplicated(x)][which(x[!duplicated(x)] == z[2]):length(x[!duplicated(x)])]
    fn(x)
    out[!duplicated(out)]
  }
}
# The result of applying the custom function on `lst` would result in your
# divisor values
lapply(lst, function(x) fnf()(sort(x, decreasing = TRUE)))
[[1]]
[1] 33 11 1
[[2]]
[1] 11 1
[[3]]
[1] 9 3 1
[[4]]
[1] 3 1
[[5]]
[1] 1
In the end we sum each element and extract the max value
Reduce(max, lapply(lst, function(x) sum(fnf()(sort(x, decreasing = TRUE)))))
[1] 45
To test a very large integer, I used @ThomasIsCoding's modified divisors function:
divisors <- function(x) {
y <- seq(x / 2)
y[as.integer(x) %% y == 0]
}
x <- divisors(.Machine$integer.max - 1)
lst <- lapply(0:(length(x)-1), function(a) x[1:(length(x)-a)])
Reduce(max, lapply(lst, function(x) sum(fnf()(sort(x, decreasing = TRUE)))))
[1] 1569603656
You'll need to recurse. If I understand correctly, this should do what you want:
fact <- function(x) {
x <- as.integer(x)
div <- seq_len(abs(x)/2)
factors <- div[x %% div == 0L]
return(factors)
}
maxfact <- function(x) {
factors <- fact(x)
if (length(factors) < 3L) {
return(sum(factors))
} else {
return(max(factors + mapply(maxfact, factors)))
}
}
maxfact(99)
[1] 45

How can I find and replace a specific sequence of numbers in a vector in R?

I need to replace the sequence "1,0,1" with "1,1,1" whenever it is found in a vector. How can I do this?
x <- c(1,2,3,4,1,0,1)
Edit:
This search needs to be dynamic. If after changing from 1,0,1 to 1,1,1 another 1,0,1 occurs, this must also be replaced.
Considering:
x <- c (1,2,3,4,1,0,1,0,1,2)
I want the algorithm to do:
x <- c (1,2,3,4,1,1,1,0,1,2)
And after:
x <- c (1,2,3,4,1,1,1,1,1,2)
A function that deals dynamically with the length of the sub-vector (being sought). Solutions that convert to/from strings are going to be hugely inefficient asymptotically. Solutions that hard-code a sub-vec of length 3 are limited to sub-vecs of length 3. This deals with anything as long as the source vector is as large or larger than the sub-vec to be found.
#' Find a matching sub-vector
#'
#' Given a vector (`invec`) and a no-larger sub-vector (`subvec`),
#' determine if the latter occurs perfectly.
#' @param invec vector
#' @param subvec vector
#' @return integer positions, length 0 or more
find_subvec <- function(invec, subvec) {
sublen <- seq_along(subvec) - 1L
if (length(subvec) > length(invec)) return(integer(0))
which(
sapply(seq_len(length(invec) - length(subvec) + 1L),
function(i) all(subvec == invec[i + sublen]))
)
}
Use:
find_subvec(c(1,2,3,4,1,0,1), c(1,0,1))
# [1] 5
find_subvec(c(1,2,3,4,1,0,1,0,1), c(1,0,1))
# [1] 5 7
A literal replacement.
x <- c(1,2,3,4,1,0,1)
y <- c(1,0,1)
z <- c(1,1,1)
ind <- find_subvec(x, y)
for (i in ind) x[i + seq_along(y) - 1] <- z
x
# [1] 1 2 3 4 1 1 1
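For the dynamic case from the edit, one possible sketch is to re-search after every replacement until nothing matches. The name replace_subvec and the max_iter cap are assumptions of this sketch, not part of the answer above; the cap only guards against a replacement that keeps re-creating the pattern. find_subvec is repeated (with vapply) so the block runs on its own.

```r
find_subvec <- function(invec, subvec) {
  if (length(subvec) > length(invec)) return(integer(0))
  offsets <- seq_along(subvec) - 1L
  starts <- seq_len(length(invec) - length(subvec) + 1L)
  which(vapply(starts, function(i) all(subvec == invec[i + offsets]), logical(1)))
}

# Hypothetical helper: replace the first match, then search again
replace_subvec <- function(x, pattern, replacement, max_iter = 1000L) {
  stopifnot(length(pattern) == length(replacement))
  for (iter in seq_len(max_iter)) {
    ind <- find_subvec(x, pattern)
    if (length(ind) == 0) break
    x[ind[1] + seq_along(pattern) - 1L] <- replacement
  }
  x
}

replace_subvec(c(1, 2, 3, 4, 1, 0, 1, 0, 1, 2), c(1, 0, 1), c(1, 1, 1))
# [1] 1 2 3 4 1 1 1 1 1 2
```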
There could be edge cases, as mentioned by @Onyambu, where the expected results are not clear, but one option could be:
x + (x == 0 & c(NA, head(x, -1)) == 1 & c(tail(x, -1), NA) == 1)
[1] 1 2 3 4 1 1 1
Here, it is not treating x as a string, but it is assessing whether the lag and lead values are 1 and the value in the middle is 0.
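As a quick check on the longer vector from the question's edit (the names prev and nxt are illustrative), the same expression handles both zeros in one pass, because the replacements never change the surrounding 1s:

```r
x <- c(1, 2, 3, 4, 1, 0, 1, 0, 1, 2)

prev <- c(NA, head(x, -1))   # value to the left of each element
nxt  <- c(tail(x, -1), NA)   # value to the right of each element

# TRUE only where a 0 sits between two 1s; adding TRUE turns that 0 into 1
x + (x == 0 & prev == 1 & nxt == 1)
# [1] 1 2 3 4 1 1 1 1 1 2
```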
This should work well enough
library(tidyverse)
x <- c(1,2,3,4,1,0,1,0,1)
x %>%
reduce(str_c) %>%
str_replace_all("(?<=1)0(?=1)","1")
#> [1] "123411111"
Created on 2020-06-14 by the reprex package (v0.3.0)

How does R remember which values were indexed?

Code:
x <- c(1, 1, 3, 2, 3, 5)
x[x < 3] <- x[x < 3] * 100
x
Output:
[1] 100 100 3 200 3 5
I expected to get the output “[1] 100 100 200”. How does R remember the indexes where x < 3? Because when running x[x < 3], you get the output “1,1,2” (and the indexes of those numbers are 1,2,4).
Hence, R remember the values where x < 3, and manipulates those values. But how does R know the indexes of those values?
If you want to get the result of your call you need to assign it to a new variable:
x2 <- (x[x < 3] <- x[x<3] * 100)
x2
#[1] 100 100 200
Note that x will still be changed (as a result of your assignment x[x < 3] <- x[x<3] * 100).
If that is not desired, you can simply do:
x2 <- x[x<3]*100
x2
#[1] 100 100 200
x
#[1] 1 1 3 2 3 5
If you look at what is x < 3 you'll find it yields
[1] TRUE TRUE FALSE TRUE FALSE FALSE
so x[x<3] <- x[x < 3] * 100 means multiply by 100 the first, second and fourth value of x (x[x < 3]*100) and replace the first, second and fourth value of x by these new values (x[x<3] <-).
It's not remembering the locations of the x < 3 values. The left-hand side of your equation is explicitly the locations of the x < 3 values.
So the thing on the right might be (100, 100, 200), but the thing on the left is the placeholders at places 1, 2 and 4.
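Spelled out step by step (the intermediate names mask, idx and values are only for illustration), the assignment amounts to roughly this:

```r
x <- c(1, 1, 3, 2, 3, 5)

mask <- x < 3            # TRUE TRUE FALSE TRUE FALSE FALSE
idx <- which(mask)       # positions 1, 2 and 4
values <- x[idx] * 100   # 100 100 200

# Write the three new values back into positions 1, 2 and 4
x[idx] <- values
x
# [1] 100 100   3 200   3   5
```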
In your code, you ask R to go to x and multiply by 100 each value that is less than 3; that is the x[x < 3]*100 part. R, in this case, looks up the positions of those elements: it checks the values of x one by one and multiplies the ones that satisfy the condition by 100. Then, printing x returns all of its values. To get only the modified values, assign them to a name, as in @Cath's answer.
In addition, it is a good idea to write a general function for your convenience. Here is a function.
myfun <- function(x, n){
res <- lapply(x, function(i) (if (i < n) {i *100}else{0}))
res
}
or simply use the approach from @Cath's answer.
myfun <- function(x, n){
res <- x[x < n]*100
res
}
myfun(x, 3)
With the first, lapply-based definition and a list as input,
x <- list(1, 1, 3, 2, 3, 5)
n <- 3
we get
> myfun(x, 3)
[[1]]
[1] 100
[[2]]
[1] 100
[[3]]
[1] 0
[[4]]
[1] 200
[[5]]
[1] 0
[[6]]
[1] 0
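If the input is a plain numeric vector rather than a list, a vectorized sketch with ifelse gives the same values without lapply. The name scale_below is hypothetical:

```r
# Hypothetical vectorized variant: multiply values below n by 100, set the rest to 0
scale_below <- function(x, n) ifelse(x < n, x * 100, 0)

scale_below(c(1, 1, 3, 2, 3, 5), 3)
# [1] 100 100   0 200   0   0
```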

Find closest value with condition

I have a function that finds me the nearest values for each row in a matrix. It then reports a list with an index of the nearest rows. However, I want it to exclude values if they are +1 in the first AND +1 in the second column away from a particular set of values (-1 in the first and -1 in the second column should also be removed). Moreover, +1 in first column and -1 in second column with respect to the values of interest should also be avoided.
As an example, if I want things closest to c(2, 1), it should accept c(3,1) or c(2,2) or c(1,1), but NOT c(3,2) and not c(1,0).
Basically, for an output to be reported either column 1 or column 2 should be a value of 1 away from a row of interest, but not both.
input looks like this
x
v1 v2
[1,] 3 1
[2,] 2 1
[3,] 3 2
[4,] 1 2
[5,] 8 5
myfunc(x)
The output looks like this. Notice that the closest thing to row 2 ($V2 in output) is row 1,3,4. The answer should only be 1 though.
$V1
[1] 2 3
$V2
[1] 1 3 4
$V3
[1] 1 2
$V4
[1] 2
$V5
integer(0)
Here is myfunc
myfunc = function(t){
d1 <- dist(t[,1])
d2 <- dist(t[,2])
dF <- as.matrix(d1) <= 1 & as.matrix(d2) <= 1
diag(dF) <- NA
colnames(dF) <- NULL
dF2 <- lapply(as.data.frame(dF), which)
return(dF2)
}
Basically, the rows that you want to find should differ from your reference element by +1 or -1 in one column and be identical in the other column. That means that the sum over the absolute values of the differences is exactly one. For your example c(2, 1), this works as follows:
c(3, 1): difference is c(1, 0), thus sum(abs(c(1, 0))) = 1 + 0 = 1
c(1, 1): difference is c(-1, 0), thus sum(abs(c(-1, 0))) = 1 + 0 = 1
etc.
The following function checks exactly this:
myfunc <- function(x) {
do_row <- function(r) {
r_mat <- matrix(rep(r, length = length(x)), ncol = ncol(x), byrow = TRUE)
abs_dist <- abs(r_mat - x)
return(which(rowSums(abs_dist) == 1))
}
return(apply(x, 1, do_row))
}
do_row() does the job for a single row, and then apply() is used to do this with each row. For your example, I get:
myfunc(x)
## [[1]]
## [1] 2 3
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 1
##
## [[4]]
## integer(0)
##
## [[5]]
## integer(0)
Using sweep(), one can write a shorter function:
myfunc2 <- function(x) {
apply(x, 1, function(r) which(rowSums(abs(sweep(x, 2, r))) == 1))
}
But this seems harder to understand, and it turns out to be slower by about a factor of two for your matrix x. (I have also tried it with a large matrix, and there the efficiency seems about the same.)
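Since "sum of absolute differences equals one" is the Manhattan distance, another sketch is to let dist() compute all pairwise distances at once. This assumes integer-valued entries, so the exact comparison with 1 is safe; the matrix below is rebuilt from the question's example.

```r
x <- cbind(v1 = c(3, 2, 3, 1, 8), v2 = c(1, 1, 2, 2, 5))

# Pairwise Manhattan distances between rows, as a full symmetric matrix
d <- as.matrix(dist(x, method = "manhattan"))

# For each row, the indices of rows at distance exactly 1
res <- lapply(seq_len(nrow(x)), function(i) unname(which(d[i, ] == 1)))
res[[1]]
# [1] 2 3
res[[2]]
# [1] 1
```

This holds an n-by-n matrix in memory, so it suits small to moderate inputs.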

R split numeric vector at position

I am wondering about the simple task of splitting a vector into two at a certain index:
splitAt <- function(x, pos){
list(x[1:pos-1], x[pos:length(x)])
}
a <- c(1, 2, 2, 3)
> splitAt(a, 4)
[[1]]
[1] 1 2 2
[[2]]
[1] 3
My question: There must be some existing function for this, but I can't find it? Is maybe split a possibility? My naive implementation also does not work if pos=0 or pos>length(a).
An improvement would be:
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))
which can now take a vector of positions:
splitAt(a, c(2, 4))
# [[1]]
# [1] 1
#
# [[2]]
# [1] 2 2
#
# [[3]]
# [1] 3
And it does behave properly (subjective) if pos <= 1 or pos > length(x), in the sense that it returns the whole original vector in a single list item. If you'd like it to error out instead, use stopifnot at the top of the function.
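A sketch of the stricter variant mentioned above; the name splitAt_strict and the exact bounds checked are assumptions, so adjust them to whatever you consider a valid position:

```r
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

# Hypothetical wrapper: refuse positions that would not produce a split
splitAt_strict <- function(x, pos) {
  stopifnot(length(x) > 0, all(pos > 1), all(pos <= length(x)))
  splitAt(x, pos)
}

a <- c(1, 2, 2, 3)
str(splitAt_strict(a, c(2, 4)))

# An out-of-range position now raises an error instead of returning one piece
try(splitAt_strict(a, 0), silent = TRUE)
```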
I tried to use flodel's answer, but it was too slow in my case with a very large x (and the function has to be called repeatedly). So I created the following function that is much faster, but also very ugly and doesn't behave properly. In particular, it doesn't check anything and will return buggy results at least for pos >= length(x) or pos <= 0 (you can add those checks yourself if you're unsure about your inputs and not too concerned about speed), and perhaps some other cases as well, so be careful.
splitAt2 <- function(x, pos) {
out <- list()
pos2 <- c(1, pos, length(x)+1)
for (i in seq_along(pos2[-1])) {
out[[i]] <- x[pos2[i]:(pos2[i+1]-1)]
}
return(out)
}
However, splitAt2 runs about 20 times faster with an x of length 10^6:
library(microbenchmark)
W <- rnorm(1e6)
splits <- cumsum(rep(1e5, 9))
tm <- microbenchmark(
splitAt(W, splits),
splitAt2(W, splits),
times=10)
tm
Another alternative that might be faster and/or more readable/elegant than flodel's solution:
splitAt <- function(x, pos) {
unname(split(x, findInterval(seq_along(x), pos)))
}
