Function which selects "closest elements" in vector/list - r

Let's say I have a vector of 10 numbers, e.g. a <- 1:10
I want a function where I can give an index, and then it returns the 4 "closest" elements.
In this case:
f(4) = c(2,3,5,6)
f(2) = c(10,1,3,4)
Note that the original vector could be anything, so if a <- c(10,2,6,4,7,9,1)
f(2) = c(1,10,6,4)
f(5) = c(6,4,9,1)

The natural way to do this is with modular arithmetic. This is a case in which R's 1-based indexing is mildly annoying. It is best to subtract 1 from the index, use modular arithmetic to get the 0-based indices that you would want, and then add 1 to get back to 1-based:
f <- function(v, i) {
  m <- length(v)
  j <- i - 1
  indices <- 1 + c(j - 2, j - 1, j + 1, j + 2) %% m
  v[indices]
}
#test:
a <- c(10,2,6,4,7,9,1)
print(f(a,2))
print(f(a,5))
Output:
[1] 1 10 6 4
[1] 6 4 9 1
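The same idea extends to any number of neighbours on each side. The following is a minimal sketch, not part of the original answer: the function name closest and the parameter k are made up for illustration, and it assumes that 2 * k is smaller than the length of the vector, so that no element is returned twice.

```r
# Hypothetical generalisation: return the k elements before and the k
# elements after position i, wrapping around the ends of the vector.
closest <- function(v, i, k = 2) {
  m <- length(v)
  offsets <- c(-rev(seq_len(k)), seq_len(k))  # for k = 2: -2 -1 1 2
  v[1 + (i - 1 + offsets) %% m]               # shift to 0-based, wrap, shift back
}

a <- c(10, 2, 6, 4, 7, 9, 1)
closest(a, 2)         # 1 10 6 4
closest(a, 5)         # 6 4 9 1
closest(1:10, 2)      # 10 1 3 4
closest(a, 1, k = 1)  # 1 2
```

Building the offsets from seq_len(k) keeps the modular step identical to the one in f above; only the window width changes.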

Related

How to add possible divisor numbers?

How do I retrieve the maximum sum over chains of successive divisors of a number?
I have the function below, which returns the proper divisors of a number.
Code
divisors <- function(x) {
  y <- seq_len(ceiling(x / 2))
  y[x %% y == 0]
}
Example
Divisors of 99 will give the below possible values.
divisors(99)
[1] 1 3 9 11 33
My expected logic:
Go from the last number to the first number in the divisors output.
The last number is 33. The next lower number that divides 33 is 11, so I select 11. Moving on from 11, the next lower number that divides 11 is 1, so I select 1. Now add all the selected numbers.
33 + 11 + 1 = 45
Move to the next number, 11. The next lower number that divides 11 is 1, so I select 1. Now add all the selected numbers.
11 + 1 = 12
Move to the next number, 9. The next lower number that divides 9 is 3, so I select 3. Moving on from 3, the next lower number that divides 3 is 1, so I select 1. Now add all the selected numbers.
9 + 3 + 1 = 13
Move to the next number, 3. The next lower number that divides 3 is 1, so I select 1. Now add all the selected numbers.
3 + 1 = 4
The maximum among these sums is 45.
I am struggling to write this logic in R. Help or advice is much appreciated.
Note : Prime numbers can be ignored.
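The procedure described above can also be written as a plain loop. This is only a sketch of the logic as stated in the question, not one of the answers; the helper name chain_sum and the variable divs are invented for illustration.

```r
divisors <- function(x) {
  y <- seq_len(ceiling(x / 2))
  y[x %% y == 0]
}

# Hypothetical helper: starting from `start`, repeatedly jump to the largest
# smaller element of `divs` that divides the current value, summing as we go.
chain_sum <- function(start, divs) {
  total <- start
  current <- start
  repeat {
    smaller <- divs[divs < current & current %% divs == 0]
    if (length(smaller) == 0) break
    current <- max(smaller)
    total <- total + current
  }
  total
}

divs <- divisors(99)                         # 1 3 9 11 33
sums <- sapply(divs, chain_sum, divs = divs)
sums                                         # 1 4 13 12 45
max(sums)                                    # 45
```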
Update
For large integers near the maximum integer .Machine$integer.max (which is itself a prime number, so the example below uses .Machine$integer.max - 1), you can run the code below (note that I modified the functions divisors and f a bit):
divisors <- function(x) {
  y <- seq(x / 2)
  y[as.integer(x) %% y == 0]
}
f <- function(y) {
  if (length(y) <= 2) {
    return(as.integer(sum(y)))
  }
  l <- length(y)
  h <- y[l]
  yy <- y[-l]
  h + f(yy[h %% yy == 0])
}
and you will see
> n <- .Machine$integer.max - 1
> x <- divisors(n)
> max(sapply(length(x):2, function(k) f(head(x, k))))
[1] 1569603656
You can define a recursive function f that gives successive divisors
f <- function(y) {
  if (length(y) == 1) {
    return(y)
  }
  h <- y[length(y)]
  yy <- y[-length(y)]
  c(f(yy[h %% yy == 0]), h)
}
and you will see all possible successive divisor tuples
> sapply(rev(seq_along(x)), function(k) f(head(x, k)))
[[1]]
[1] 1 11 33
[[2]]
[1] 1 11
[[3]]
[1] 1 3 9
[[4]]
[1] 1 3
[[5]]
[1] 1
Then, we apply f within sapply like below
> max(sapply(rev(seq_along(x)), function(k) sum(f(head(x, k)))))
[1] 45
which gives the desired output.
You can also use the following solution. It may look a little complicated, and there is surely an easier, more efficient way, but I thought it could be useful to you. I start from your divisors output:
> x
[1] 1 3 9 11 33
# First I created a list whose first element is our original x and from then on
# I subset the first element till the last element of the list
lst <- lapply(0:(length(x)-1), function(a) x[1:(length(x)-a)])
> lst
[[1]]
[1] 1 3 9 11 33
[[2]]
[1] 1 3 9 11
[[3]]
[1] 1 3 9
[[4]]
[1] 1 3
[[5]]
[1] 1
Then I wrote a custom function to implement your conditions and gather the desired output. For this purpose I created a function factory, that is, a function that creates a function. Its definition follows the explanation below.
As you might notice, the outermost function does not take any arguments. It only sets up an empty vector out to collect the desired elements. The vector is created in the execution environment of the outermost function, which shields it from anything that happens in the global environment.
The inner function is the one that takes our vector x, so in general we call the whole setup as fnf()(x). The first element of out is the first element of x, which is its largest value (33) because x is passed in sorted in decreasing order. Then I found all elements of x that divide this value, i.e. leave a remainder of 0. Of these I took the second one (11), since the first is 33 itself, and stored it in out. Then I dropped from x every value larger than 11 and repeated the same process.
Since the process is repeated over and over, I thought this was a good case for recursion, a technique in which a function calls itself from within its own body. As you might notice, I call fn inside fn to repeat the process, each time on a shorter vector.
This may sound a bit complicated, but I believe there are some good points here to pick up for future exploration. I have found them very useful and hope you will too.
fnf <- function() {
  out <- c()
  fn <- function(x) {
    out <<- c(out, x[1])
    z <- x[out[length(out)] %% x == 0]
    if (length(z) >= 2) {
      out[length(out) + 1] <<- z[2]
    } else {
      return(out)
    }
    x <- x[!duplicated(x)][which(x[!duplicated(x)] == z[2]):length(x[!duplicated(x)])]
    fn(x)
    out[!duplicated(out)]
  }
}
# The result of applying the custom function on `lst` would result in your
# divisor values
lapply(lst, function(x) fnf()(sort(x, decreasing = TRUE)))
[[1]]
[1] 33 11 1
[[2]]
[1] 11 1
[[3]]
[1] 9 3 1
[[4]]
[1] 3 1
[[5]]
[1] 1
In the end we sum each element and extract the max value
Reduce(max, lapply(lst, function(x) sum(fnf()(sort(x, decreasing = TRUE)))))
[1] 45
To test a very large integer, I used @ThomasIsCoding's modified divisors function:
divisors <- function(x) {
  y <- seq(x / 2)
  y[as.integer(x) %% y == 0]
}
x <- divisors(.Machine$integer.max - 1)
lst <- lapply(0:(length(x)-1), function(a) x[1:(length(x)-a)])
Reduce(max, lapply(lst, function(x) sum(fnf()(sort(x, decreasing = TRUE)))))
[1] 1569603656
You'll need to recurse. If I understand correctly, this should do what you want:
fact <- function(x) {
  x <- as.integer(x)
  div <- seq_len(abs(x)/2)
  factors <- div[x %% div == 0L]
  return(factors)
}
maxfact <- function(x) {
  factors <- fact(x)
  if (length(factors) < 3L) {
    return(sum(factors))
  } else {
    return(max(factors + mapply(maxfact, factors)))
  }
}
maxfact(99)
[1] 45

How can I find and replace a specific sequence of numbers in a vector in R?

I need to replace the sequence "1,0,1" with "1,1,1" whenever it is found in a vector. How can I do this?
x <- c(1,2,3,4,1,0,1)
Edit:
This search needs to be dynamic. If after changing from 1,0,1 to 1,1,1 another 1,0,1 occurs, this must also be replaced.
Considering:
x <- c (1,2,3,4,1,0,1,0,1,2)
I want the algorithm to do:
x <- c (1,2,3,4,1,1,1,0,1,2)
And after:
x <- c (1,2,3,4,1,1,1,1,1,2)
A function that deals dynamically with the length of the sub-vector (being sought). Solutions that convert to/from strings are going to be hugely inefficient asymptotically. Solutions that hard-code a sub-vec of length 3 are limited to sub-vecs of length 3. This deals with anything as long as the source vector is as large or larger than the sub-vec to be found.
#' Find a matching sub-vector
#'
#' Given a vector (`invec`) and a no-larger sub-vector (`subvec`),
#' determine if the latter occurs perfectly.
#' @param invec vector
#' @param subvec vector
#' @return integer positions, length 0 or more
find_subvec <- function(invec, subvec) {
  sublen <- seq_along(subvec) - 1L
  if (length(subvec) > length(invec)) return(integer(0))
  which(
    sapply(seq_len(length(invec) - length(subvec) + 1L),
           function(i) all(subvec == invec[i + sublen]))
  )
}
Use:
find_subvec(c(1,2,3,4,1,0,1), c(1,0,1))
# [1] 5
find_subvec(c(1,2,3,4,1,0,1,0,1), c(1,0,1))
# [1] 5 7
A literal replacement:
x <- c(1,2,3,4,1,0,1)
y <- c(1,0,1)
z <- c(1,1,1)
ind <- find_subvec(x, y)
for (i in ind) x[i + seq_along(y) - 1] <- z
x
# [1] 1 2 3 4 1 1 1
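For the "dynamic" case from the question's edit, where a replacement can create a new match, one option is to search again after every replacement until nothing matches. The following is a sketch under assumptions rather than part of the answer above: the name replace_subvec and the max_iter safety cap are invented here, and the cap exists only because a replacement that itself contains the pattern would otherwise loop forever.

```r
# Hypothetical helper: replace the first occurrence of `pattern`, then search
# again, until no occurrence is left or `max_iter` passes have been made.
replace_subvec <- function(x, pattern, replacement, max_iter = 100L) {
  stopifnot(length(pattern) == length(replacement))
  offs <- seq_along(pattern) - 1L
  for (iter in seq_len(max_iter)) {
    n_starts <- length(x) - length(pattern) + 1L
    if (n_starts < 1L) break
    hits <- which(vapply(seq_len(n_starts),
                         function(i) all(x[i + offs] == pattern),
                         logical(1)))
    if (length(hits) == 0L) break
    x[hits[1] + offs] <- replacement   # replace only the first match
  }
  x
}

replace_subvec(c(1, 2, 3, 4, 1, 0, 1, 0, 1, 2), c(1, 0, 1), c(1, 1, 1))
# 1 2 3 4 1 1 1 1 1 2
```

Replacing one match per pass costs more searches than replacing all positions at once, but it mirrors the step-by-step behaviour the question describes.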
There could be edge cases, as mentioned by @Onyambu, where the expected result is not clear, but one option could be:
x + (x == 0 & c(NA, head(x, -1)) == 1 & c(tail(x, -1), NA) == 1)
[1] 1 2 3 4 1 1 1
Here, it is not treating x as a string, but it is assessing whether the lag and lead values are 1 and the value in the middle is 0.
This should work well enough, though note that it returns a single string rather than a numeric vector:
library(tidyverse)
x <- c(1,2,3,4,1,0,1,0,1)
x %>%
reduce(str_c) %>%
str_replace_all("(?<=1)0(?=1)","1")
#> [1] "123411111"
Created on 2020-06-14 by the reprex package (v0.3.0)

Swapping elements between more than 2 arrays

Swapping elements within a single array (x) is a classic problem in computer science. The immediate (but by no means only, e.g., XOR) solution in a low-level language like C is to use a temporary variable:
tmp = x[0]
x[0] = x[1]
x[1] = tmp
The above algorithm swaps the first and second elements of x.
Swapping elements between two arrays, x and y, is similar:
tmp = x[0]
x[0] = y[1]
y[1] = tmp
What about for the case of 3 arrays with the added restriction that an element of Array 1 must be swapped with an element of Array 2 and an element of Array 2 must be swapped with an element of Array 3? Elements in Arrays 1 and 3 are not swapped with one another.
How can such an approach (with the added restriction) be generalized to k arrays?
You could create a for-loop that repeats your set of instructions:
l <- list(x = c(1,2,3,4,5), y = c(5,4,3,2,1), z = c(6,7,8,9,10))
swap_elements <- function(l)
{
  for(i in 1:(length(l)-1))
  {
    tmp = l[[i]][1]
    l[[i]][1] = l[[i+1]][2]
    l[[i+1]][2] = tmp
  }
  return(l)
}
Output:
> swap_elements(l)
$x
[1] 4 2 3 4 5
$y
[1] 7 1 3 2 1
$z
[1] 6 5 8 9 10
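A slightly more general variant lets the caller choose which positions are exchanged. This is a sketch only: the function name swap_chain and the parameters from and to are hypothetical. It uses seq_len() for the loop range, so a list holding a single array is returned unchanged, whereas 1:(length(l)-1) would count down from 1 to 0.

```r
# Hypothetical variant: for each adjacent pair of arrays (i, i + 1), swap
# element `from` of array i with element `to` of array i + 1.
swap_chain <- function(arrays, from = 1, to = 2) {
  for (i in seq_len(length(arrays) - 1)) {
    tmp <- arrays[[i]][from]
    arrays[[i]][from] <- arrays[[i + 1]][to]
    arrays[[i + 1]][to] <- tmp
  }
  arrays
}

l <- list(x = c(1, 2, 3, 4, 5), y = c(5, 4, 3, 2, 1), z = c(6, 7, 8, 9, 10))
res <- swap_chain(l)
res$x  # 4 2 3 4 5
res$y  # 7 1 3 2 1
res$z  # 6 5 8 9 10
```

Because only neighbouring arrays are paired, arrays 1 and 3 never exchange elements directly, which matches the restriction in the question.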
If the arrays are stacked as the rows of a matrix, you can shift (lag) the rows to get the required rearrangement:
k <- 6
#generate dummy data with k rows and 3 columns
mat <- matrix(seq_len(3*k), nrow=k, byrow=TRUE)
mat
#lag the matrix
mat[c(seq_len(k)[-1], 1),]

Is there a general algorithm to identify a numeric series?

I am looking for a general purpose algorithm to identify short numeric series from lists with a max length of a few hundred numbers. This will be used to identify series of masses from mass spectrometry (ms1) data.
For instance, given the following list, I would like to identify that 3 of these numbers fit the series N + 1, N +2, etc.
426.24 <= N
427.24 <= N + 1/x
371.10
428.24 <= N + 2/x
851.47
451.16
The series are all of the format: N, N+1/x, N+2/x, N+3/x, N+4/x, etc, where x is an integer (in the example x=1). I think this constraint makes the problem very tractable. Any suggestions for a quick/efficient way to tackle this in R?
This routine generates series for x from 1 to 10 (you could increase the range) and checks how many terms of each series are contained in the original list of numbers.
N = c(426.24,427.24,371.1,428.24,851.47,451.16)
N0 = N[1]
x = list(1,2,3,4,5,6,7,8,9,10)
L = 20
Series = lapply(x, function(x){seq(from = N0, by = 1/x,length.out = L)})
countCoincidences = lapply(Series, function(x){sum(x %in% N)})
Result:
unlist(countCoincidences)
[1] 3 3 3 3 3 3 3 3 3 2
As you can see, x = 1 gives 3 coincidences, and so does every x up to x = 9, so you have to decide which x is the one you want.
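Exact matching with %in% relies on the computed series values being bit-for-bit equal to the measured masses, which is fragile for floating-point data. A tolerance-based count is one way around that. This is a minimal sketch: the function name count_matches, the tolerance tol = 1e-6 and the default of 20 terms are assumptions chosen for illustration, not values from the answer.

```r
# Hypothetical helper: count how many terms of start, start + 1/x,
# start + 2/x, ... lie within `tol` of some value in `values`.
count_matches <- function(values, start, x, n_terms = 20, tol = 1e-6) {
  series <- start + (seq_len(n_terms) - 1) / x
  sum(vapply(series,
             function(s) any(abs(values - s) < tol),
             logical(1)))
}

values <- c(426.24, 427.24, 371.10, 428.24, 851.47, 451.16)
counts <- sapply(1:10, function(x) count_matches(values, values[1], x))
counts
# 3 3 3 3 3 3 3 3 3 2
```

For real spectra the tolerance would presumably be set from the instrument's mass accuracy rather than a fixed constant.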
Since you're looking for an arithmetic sequence, the difference k between consecutive terms is constant. Thus, you can loop over the vector and subtract each value from the whole vector. If the value is the middle term of a sequence, the differences will include -k, 0, and k, so you can find the sequence by looking for matches between vector - value and its opposite, value - vector:
x <- c(426.24, 427.24, 371.1, 428.24, 851.47, 451.16)
unique(lapply(x, function(y) {
  s <- (x - y) %in% (y - x)
  if (sum(s) > 1) x[s]
}))
# [[1]]
# NULL
#
# [[2]]
# [1] 426.24 427.24 428.24

Perform an operation on a vector using the previous value after an initial value

In Excel, it's easy to perform a calculation on a previous cell by referencing that earlier cell. For example, starting from an initial value of 100 (step = 0), each next step would be 0.9 * previous + 9 simply by dragging the formula bar down from the first cell (step = 1). The next 10 steps would look like:
step value
[1,] 0 100.00000
[2,] 1 99.00000
[3,] 2 98.10000
[4,] 3 97.29000
[5,] 4 96.56100
[6,] 5 95.90490
[7,] 6 95.31441
[8,] 7 94.78297
[9,] 8 94.30467
[10,] 9 93.87420
[11,] 10 93.48678
I've looked around the web and StackOverflow, and the best I could come up with is a for loop (below). Are there more efficient ways to do this? Is it possible to avoid a for loop? It seems like most functions in R (such as cumsum, diff, apply, etc) work on existing vectors instead of calculating new values on the fly from previous ones.
#for loop. This works
value <- 100 #Initial value
for(i in 2:11) {
  current <- 0.9 * value[i-1] + 9
  value <- append(value, current)
}
cbind(step = 0:10, value) #Prints the example output shown above
It seems like you're looking for a way to do recursive calculations in R. Base R has two ways of doing this which differ by the form of the function used to do the recursion. Both methods could be used for your example.
Reduce can be used with recursion equations of the form v[i+1] = function(v[i], x[i]), where v is the calculated vector and x an input vector; i.e. the (i+1)-th output depends only on the i-th values of the calculated and input vectors, and the calculation performed by function(v, x) may be nonlinear. For your case, this would be:
value <- 100
nout <- 10
# v[i+1] = function(v[i], x[i])
v <- Reduce(function(v, x) .9*v + 9, x=numeric(nout), init=value, accumulate=TRUE)
cbind(step = 0:nout, v)
filter is used with recursion equations of the form y[i] = x[i] + filter[1]*y[i-1] + ... + filter[p]*y[i-p], where y is the calculated vector and x an input vector; i.e. the output can depend linearly on lagged values of the calculated vector as well as on the i-th value of the input vector. For your case, this would be:
value <- 100
nout <- 10
# y[i] = x[i] + filter[1]*y[i-1] + ... + filter[p]*y[i-p]
y <- c(value, stats::filter(x=rep(9, nout), filter=.9, method="recursive", sides=1, init=value))
cbind(step = 0:nout, y)
For both functions, the length of the output is given by the length of the input vector x.
Both of these approaches give your result.
Use the closed form of the geometric series:
i <- 0:10
0.9 ^ i * 100 + 9 * (0.9 ^ i - 1) / (0.9 - 1)
#[1] 100.00000 99.00000 98.10000 97.29000 96.56100 95.90490 95.31441 94.78297 94.30467 93.87420 93.48678
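The closed form comes from unrolling the recurrence v[n] = a * v[n-1] + b, which gives v[n] = a^n * v[0] + b * (1 + a + ... + a^(n-1)), and the geometric sum in the parentheses equals (a^n - 1) / (a - 1) for a != 1. The sketch below checks this against the step-by-step calculation; the function name closed_form and its parameter names are chosen here for illustration.

```r
# Hypothetical helper: closed-form value after n steps of v <- a * v + b,
# starting from v0 (valid for a != 1).
closed_form <- function(n, v0 = 100, a = 0.9, b = 9) {
  a^n * v0 + b * (a^n - 1) / (a - 1)
}

# Step-by-step reference using Reduce; the second argument of the function
# is unused and only drives the number of iterations.
iterated <- Reduce(function(v, unused) 0.9 * v + 9, seq_len(10),
                   init = 100, accumulate = TRUE)

all.equal(closed_form(0:10), iterated)  # TRUE
```

Being vectorised over n, the closed form also gives any single step directly, without computing the steps before it.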
You could also use purrr::accumulate:
data.frame(value = purrr::accumulate(1:10, ~ .x * .9 + 9, .init = 100))
       value
1  100.00000
2   99.00000
3   98.10000
4   97.29000
5   96.56100
6   95.90490
7   95.31441
8   94.78297
9   94.30467
10  93.87420
11  93.48678
.init is the initial value and there is also the argument .dir if you want to control the direction ("forward" is the default)
