Recursive map dependent on two vectors - R

Basically;
a<-c(1,2,1,2)
b<-c(1,2,3,4)
I seek a function that returns a vector c with c[n] = b[n] + b[n-1] if a[n] is even, or b[n] + 2*b[n-1] otherwise.
Is there anything easier than a brute force for-loop? Some sort of advanced "Reduce" or equivalent.

x <- c(0, b[-length(b)]) # shifted b, 0 for first element
c <- ifelse((a %% 2) == 0, b + x, b + 2*x)
Be careful: a and b must have the same length.
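For reference, here is a minimal sketch that wraps the shifted-vector idea in a function and checks it against a plain for-loop on the vectors from the question. The names shifted_combine and loop_combine are made up for this illustration, and b[0] is assumed to count as 0, as in the answer above.

```r
# Vectorized version: pair each b[n] with its predecessor b[n-1]
shifted_combine <- function(a, b) {
  stopifnot(length(a) == length(b))
  prev <- c(0, b[-length(b)])          # b shifted right by one, 0 in front
  ifelse(a %% 2 == 0, b + prev, b + 2 * prev)
}

# Reference implementation with an explicit loop
loop_combine <- function(a, b) {
  out <- numeric(length(b))
  for (n in seq_along(b)) {
    prev <- if (n == 1) 0 else b[n - 1]
    out[n] <- if (a[n] %% 2 == 0) b[n] + prev else b[n] + 2 * prev
  }
  out
}

a <- c(1, 2, 1, 2)
b <- c(1, 2, 3, 4)
shifted_combine(a, b)   # 1 3 7 7
```

Both functions should agree on any pair of equal-length vectors, so the loop is only needed if a later element has to depend on an earlier result rather than on b itself.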

Related

Nesting a function in R until a condition is met

I am looking for an efficient way to nest the same function in R until a condition is met. I hope the following example illustrates my problem clearly.
Consider the function
f(x) = x^2 + 1, with x > 1.
Denote
f^{(k)}(x) = f(f(f(...f(x)))),
where the function f is evaluated k times within itself. Let M > 0, with M given.
Is there any efficient routine in R to determine the minimum value of k such that f^{(k)}(2) > M?
Thank you.
Nothing special for that. Just use a loop:
min_k <- function(x, M) {  # name added so the function can be called
  k <- 0
  repeat {
    x <- x^2 + 1
    k <- k + 1
    if (x > M)
      break
  }
  k
}
Not particularly efficient, but often the overhead of evaluating f will be greater than the overhead of the loop. If that's not the case (and it might not be for this particular f), I'd suggest doing the equivalent thing in C or C++ (perhaps using Rcpp).
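As a rough sketch, the same loop can be written for an arbitrary function instead of hard-coding x^2 + 1. The names first_k_exceeding and sq_plus_one, and the max_iter safety cap, are assumptions added for this illustration and are not part of the answer above.

```r
# Count how often f must be applied to x before the result exceeds M.
# max_iter is a safety cap (an assumption) so a non-growing f cannot loop forever.
first_k_exceeding <- function(f, x, M, max_iter = 1000L) {
  for (k in seq_len(max_iter)) {
    x <- f(x)
    if (x > M) return(k)
  }
  NA_integer_   # threshold not reached within max_iter applications
}

sq_plus_one <- function(x) x^2 + 1
first_k_exceeding(sq_plus_one, 2, 700)   # 4
```

Like the loop above, this always applies f at least once, so it returns 1 even when the starting value already exceeds M.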
This would be the recursive approach:
# 2^2 + 1 == 5
# 5^2 + 1 == 26
# 26^2 + 1 == 677
f <- function(x, M, k = 0) {
  if (x <= M) k <- f(x^2 + 1, M = M, k + 1)
  return(k)
}
f(2,3) # 1
f(2,10) # 2
f(2,50) # 3
f(2,700) # 4

Create conditional random sequence

I want to create random sequences for the variables a, b, c, d, e and f with the length of 6000 under specific conditions.
I want to randomly draw from a discrete uniform distribution between 10 and 40 for every sequence, but under the following condition:
a = f < (a+b)/2 < e < c < b < d
Does anyone know how I would code that?
The conditions are somewhat ad-hoc. A hit and miss approach which draws random vectors until the conditions are satisfied could work (though it might not be optimal). Something like:
randvect <- function() {
  v <- sample(10:40, 5)
  while (any(c(v[1] >= v[2],
               mean(v[1:2]) >= v[5],
               v[5] >= v[3],
               v[3] >= v[2],
               v[2] >= v[4]))) {
    v <- sample(10:40, 5)
  }
  v
}
For example,
> randvect()
[1] 16 26 25 36 23
(I don't bother with f since it is the same as a).
To get 6000:
vects <- replicate(6000,randvect())
With all the misses in the hit and miss, that takes about 30 seconds to evaluate on my machine.
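If the run time matters, one possible variation is to do the hit and miss in batches: draw many candidate rows at once and filter them with vectorized comparisons. This is only a sketch; rand_vects and the batch size are made-up choices. It samples with replacement, which should give the same conditional distribution because every valid tuple has distinct values anyway. Unlike replicate(), it returns one draw per row rather than per column.

```r
# Batched rejection sampling: draw many candidate rows at once, keep the valid ones.
# Columns are a, b, c, d, e (f equals a, so it is not drawn separately).
rand_vects <- function(n, batch = 500000L) {
  cols <- c("a", "b", "c", "d", "e")
  keep <- matrix(integer(0), ncol = 5, dimnames = list(NULL, cols))
  while (nrow(keep) < n) {
    m <- matrix(sample(10:40, 5L * batch, replace = TRUE),
                ncol = 5, dimnames = list(NULL, cols))
    avg <- (m[, "a"] + m[, "b"]) / 2
    ok <- m[, "a"] < avg & avg < m[, "e"] & m[, "e"] < m[, "c"] &
      m[, "c"] < m[, "b"] & m[, "b"] < m[, "d"]
    keep <- rbind(keep, m[ok, , drop = FALSE])   # append the accepted rows
  }
  keep[seq_len(n), , drop = FALSE]               # trim to exactly n rows
}

set.seed(1)
vects <- rand_vects(6000)
dim(vects)   # 6000 5
```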
This question isn't really well defined, as there are different implementations that result in different distributions. For instance, take the condition b < d: you can draw b and d independently and throw out every draw where b >= d. That is the most natural interpretation, but also the most computationally expensive. You can improve on it by randomly taking b and d and then, if b > d, switching b and d. I think this logic can be extended to e, c, b, d: randomly choose four numbers between 10 and 40, then assign e to be the smallest, c the second smallest, etc. I think this will produce the same distribution as the "throw out" method, but I'm not sure. So to get e, c, b, and d:
# sample without replacement: ties would violate the strict inequalities e < c < b < d
numbers = sort(sample(10:40, 4))
e = numbers[1]
c = numbers[2]
b = numbers[3]
d = numbers[4]
I'm still thinking about what to do with a, however.
John Coleman's answer will get there, and may be a better way to sample randomly, but it could take a long time depending on how large the allowable space is.
Another option is to figure out the allowable space and sample starting with a.
a has to be between 10 and 34 (to leave room for e, c, b, and d)
the average of a and b has to be < (b - 2) and < 37. This means b has to be at least 5 more than a, and at most 39
a + 4 < b < min((37 * 2) - a, 40)
The rest are a bit more straightforward. These can be wrapped into a function.
I'm going to use data.table, mostly for looking at the results at the end. I'm also using the function resample described in help(sample) to handle cases where there is only a single value to sample from.
library(data.table)
resample <- function(x, ...) x[sample.int(length(x), ...)]
funky <- function() {
  a <- resample(10:34, 1)
  f <- a
  # avg < 37 means b <= 2 * 37 - a - 1; since a <= 34 the cap of 39 always applies
  b <- resample((a + 5):min((37 * 2) - a - 1, 39), 1)
  e <- resample(ceiling((a + b) / 2 + 0.1):min(38, b - 2), 1)
  c <- resample((e + 1):(b - 1), 1)
  d <- resample((b + 1):40, 1)
  c(a, b, c, d, e, f)
}
A few issues found by trial and error. In e, the 0.1 is added so that if the average is currently an integer, it gets increased by 1, but if the value is X.5 it will get rounded up to X + 1.
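To make the 0.1 trick concrete, here is a small illustration on three toy averages (the vector avg is made up for this example). Since a and b are integers, the average is always a multiple of 0.5, and for such values floor(avg) + 1 appears to be an equivalent way to get the smallest integer strictly above the average.

```r
# Smallest integer strictly greater than avg, for avg a multiple of 0.5
avg <- c(12, 12.5, 13)
ceiling(avg)         # 12 13 13: not strict when avg is already an integer
ceiling(avg + 0.1)   # 13 13 14
floor(avg) + 1       # 13 13 14, same result without the 0.1 offset
```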
dat <- data.table(t(replicate(10000, funky())))
setnames(dat, c("a", "b", "c", "d", "e", "f"))
The following will return all rows that fail the tests in the original question. A few iterations with 10k samples and it doesn't look like anything is failing.
dat[!(a == f &
      f < ((a + b) / 2) &
      ((a + b) / 2) < e &
      e < c &
      c < b &
      b < d)]

Is there a general algorithm to identify a numeric series?

I am looking for a general purpose algorithm to identify short numeric series from lists with a max length of a few hundred numbers. This will be used to identify series of masses from mass spectrometry (ms1) data.
For instance, given the following list, I would like to identify that 3 of these numbers fit the series N, N + 1, N + 2, etc.
426.24 <= N
427.24 <= N + 1/x
371.10
428.24 <= N + 2/x
851.47
451.16
The series are all of the format: N, N+1/x, N+2/x, N+3/x, N+4/x, etc, where x is an integer (in the example x=1). I think this constraint makes the problem very tractable. Any suggestions for a quick/efficient way to tackle this in R?
This routine generates series using x from 1 to 10 (you could increase it) and checks how many of their terms are contained in the original list of numbers.
N = c(426.24,427.24,371.1,428.24,851.47,451.16)
N0 = N[1]
x = list(1,2,3,4,5,6,7,8,9,10)
L = 20
Series = lapply(x, function(x){seq(from = N0, by = 1/x,length.out = L)})
countCoincidences = lapply(Series, function(x){sum(x %in% N)})
Result:
unlist(countCoincidences)
[1] 3 3 3 3 3 3 3 3 3 2
As you can see, using x = 1 gives 3 coincidences. The same goes for all x up to x = 9 (with x = 10 and L = 20 the series stops at N0 + 1.9, so 428.24 is never reached). Here you have to decide which x is the one you want.
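Exact matching of doubles with %in% can be fragile for measured masses, so here is a hedged variation that checks, within a tolerance, whether each offset from the first mass is a whole number of steps of size 1/x. The names masses, n0, tol and count_hits are invented for this sketch, and the tolerance value is an arbitrary assumption that would need tuning for real ms1 data.

```r
masses <- c(426.24, 427.24, 371.10, 428.24, 851.47, 451.16)
n0 <- masses[1]    # candidate start of the series
tol <- 1e-6        # assumed tolerance, in units of steps

# For each x, count the masses that sit a whole number of 1/x steps above n0
count_hits <- vapply(1:10, function(x) {
  steps <- (masses - n0) * x
  sum(steps >= 0 & abs(steps - round(steps)) < tol)
}, integer(1))
count_hits
```

Unlike the routine above, this does not cap the series at L terms, so every x from 1 to 10 picks up 426.24, 427.24 and 428.24.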
Since you're looking for an arithmetic sequence, the difference k is constant. Thus, you can loop over the vector and subtract each value from the sequence. If you have a sequence, subtracting the second term from the vector will result in values of -k, 0, and k, so you can find the sequence by looking for matches between vector - value and its opposite, value - vector:
x <- c(426.24, 427.24, 371.1, 428.24, 851.47, 451.16)
unique(lapply(x, function(y) {
  s <- (x - y) %in% (y - x)
  if (sum(s) > 1) x[s]
}))
# [[1]]
# NULL
#
# [[2]]
# [1] 426.24 427.24 428.24

How many values of a vector are divisible by 2? Use R

I have an exercise where I have to see how many values of a vector are divisible by 2. I have this random sample:
set.seed(1)
y <- sample(c(0:99, NA), 400, replace=TRUE)
I created a new variable d to see which of the values are or aren't divisible by 2:
d <- y/2 ; d
What I want to do is to create a logical vector where all whole numbers give TRUE and the rest give FALSE (e.g. 22.0 -> TRUE and 24.5 -> FALSE).
I used this command, but I believe that the answer is wrong since it would only give me the numbers that are in the sample:
sum(d %in% y, na.rm=T)
I also tried this (I found on the internet, but I don't really understand it)
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
sum(is.wholenumber(d),na.rm = T)
Are there other ways that I could use the operator "%%"?
You can sum over the mod operator like so: sum(1 - y %% 2, na.rm = TRUE) or sum(y %% 2 == 0, na.rm = TRUE). Note that y %% 2 is the remainder after dividing by two, which is why this solution works; na.rm = TRUE is needed because y contains NA values.
Here are three different ways:
length(y[which(y %% 2 == 0)])  # which() drops the NAs that plain logical indexing would keep
length(subset(y, y %% 2 == 0))
length(Filter(function(x) x %% 2 == 0, y))
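One thing to watch with the sample from the question is that y contains NA values. The toy vector v below (made up for this illustration) shows how the different counting idioms treat an NA.

```r
v <- c(0L, 1L, 2L, 7L, NA, 10L)   # toy data: three even values and one NA
v %% 2 == 0                       # TRUE FALSE TRUE FALSE NA TRUE
sum(v %% 2 == 0, na.rm = TRUE)    # 3
length(v[v %% 2 == 0])            # 4: logical indexing keeps the NA
length(v[which(v %% 2 == 0)])     # 3: which() drops the NA
length(subset(v, v %% 2 == 0))    # 3: subset() drops the NA as well
```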
Since we're talking about a division by 2, I would actually take it to the bit level and check if the last bit of the number is a 0 or a 1 (a 0 means it would be divisible by 2).
I'm going out on a limb here (I'm not sure how the compiler handles this division by 2), but I think that would likely be more optimized than a division, which is typically fairly expensive.
To do this at the bit level, you can just do an AND operation between the number itself and 1; if the result is 1, the number is not divisible by 2:
bitwAnd(y, 1L)
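As a concrete sketch of the bit-level check, base R's bitwAnd() can be applied to an integer vector and the constant 1L; a result of 0 marks an even value. The vector v is toy data invented for this example, and whether this is actually faster than %% in R is not something the sketch measures.

```r
v <- c(4L, 7L, NA, 10L)            # toy integer data with one NA
low_bit <- bitwAnd(v, 1L)          # lowest bit of each value: 0 1 NA 0
sum(low_bit == 0L, na.rm = TRUE)   # 2 even values
```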

Ifelse() with three conditions

I have two vectors:
a<-rep(1:2,100)
b<-sample(a)
I would like to have an ifelse condition that compares each value of a with the corresponding value of b, and does the following:
if a>b 1
if a<b 0
if a=b sample(1:2,length(a),replace=T)
The first two can be done with:
ifelse(a>b,1,0)
but I'm not sure how to incorporate the case where a and b are equal.
How about adding another ifelse:
ifelse(a>b, 1, ifelse(a==b, sample(1:2, length(a), replace = TRUE), 0))
In this case you get the value 1 if a>b, then, if a is equal to b it is either 1 or 2 (sample(1:2, length(a), replace = TRUE)), and if not (so a must be smaller than b) you get the value 0.
This is an easy way:
(a > b) + (a == b) * sample(2, length(a), replace = TRUE)
This is based on calculations with boolean values which are cast into numerical values.
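To see the arithmetic at work, here is a tiny deterministic illustration. The vectors a_toy, b_toy and r are made up, with r standing in for the random draw so the result can be checked by hand.

```r
a_toy <- c(1, 2, 2, 1)
b_toy <- c(2, 1, 2, 1)
r <- c(2, 1, 1, 2)   # stands in for sample(2, length(a_toy), replace = TRUE)

# TRUE counts as 1 and FALSE as 0, so ties pick up r and the rest get 0 or 1
(a_toy > b_toy) + (a_toy == b_toy) * r   # 0 1 1 2
```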
There is ambiguity in your question. Do you want different random values for all indexes where a==b or one random value for all indexes?
The answer by @Rob will work in the second scenario. For the first scenario I suggest avoiding ifelse:
u<-rep(NA,length(a))
u[a>b] <- 1
u[a<b] <- 0
u[a==b] <- sample(1:2,sum(a==b),replace=TRUE)
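The two readings of the question can be put side by side with the same indexing idea. This is only a sketch: the names ties, u_each and u_shared are invented here, and set.seed(1) is just there to make the run repeatable.

```r
set.seed(1)
a <- rep(1:2, 100)
b <- sample(a)
ties <- a == b

# Scenario 1: an independent draw for every tied index
u_each <- as.numeric(a > b)
u_each[ties] <- sample(1:2, sum(ties), replace = TRUE)

# Scenario 2: one draw shared by all tied indexes
u_shared <- as.numeric(a > b)
u_shared[ties] <- sample(1:2, 1)
```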
