I want to create random sequences for the variables a, b, c, d, e and f with the length of 6000 under specific conditions.
I want to randomly draw from a discrete uniform distribution between 10 and 40 for every sequence, but under the following condition:
a = f < (a+b)/2 < e < c < b < d
Does anyone know how I would code that?
The conditions are somewhat ad-hoc. A hit and miss approach which draws random vectors until the conditions are satisfied could work (though it might not be optimal). Something like:
randvect <- function(){
v <- sample(10:40,5)
while(any(c(v[1] >= v[2],
mean(v[1:2]) >= v[5],
v[5] >= v[3],
v[3] >= v[2],
v[2] >= v[4]))){
v <- sample(10:40,5)
}
v
}
For example,
> randvect()
[1] 16 26 25 36 23
(I don't bother with f since it is the same as a).
To get 6000:
vects <- replicate(6000,randvect())
With all the misses in the hit and miss, that takes about 30 seconds to evaluate on my machine.
This question isn’t really well defined, as there are different implementations that result in different distributions. For instance, taking the condition b=d. The latter is the most natural interpretation, but the most computationally expensive. You can improve it by randomly taking b and d, and then if b > d, then switch b and d. I think this logic can be extended to e,c,b,d: randomly choose four numbers between 10 and 40, then assign e to be the smallest, c the second smallest, etc. I think this will produce the same distribution as the “throw out” method, but I’m not sure. So to get e,c,b, and d:
numbers = sort(sample(10:40,4,replace = TRUE))
e = numbers[1]
c = numbers[2]
b = numbers[3]
d = numbers[4]
I'm still thinking about what to do with a, however.
John Coleman's answer will get there, and is may be a better way to randomly sample, but could potentially take a long time depending on what your allowable space is.
Another option to figure out the allowable space, and sample starting with a.
a has to be between 10 and 34 (to leave room for e, c, b, and d)
the average of a and b has to be =< (b - 2) and < 37. This means b has to be 5 or more than a, and less than 39
a + 4 < b < min((37 * 2) - a, 39)
The rest are a bit more straightforward. These can be wrapped into a function.
I'm going to use data.table more for looking at the results at the end. Also I'm using the function resample described in help(sample) to handle cases where there is only a single value to sample.
library(data.table)
resample <- function(x, ...) x[sample.int(length(x), ...)]
funky <- function() {
a <- resample(10:34, 1)
f <- a
b <- resample((a + 5):min(((37 * 2) - a + 1), 39), 1)
e <- resample(ceiling((a+b)/2 + 0.1):min(38, b - 2), 1)
c <- resample((e + 1):(b - 1), 1)
d <- resample((b + 1):40, 1)
c(a, b, c, d, e, f)
}
A few issues found by trial and error. In e, the 0.1 is added so that if the average is currently an integer, it gets increased by 1, but if the value is X.5 it will get rounded up to X + 1.
dat <- data.table(t(replicate(10000, funky())))
setnames(dat, c("a", "b", "c", "d", "e", "f"))
The following will return all rows that fail the tests in the original question. A few iterations with 10k samples and it doesn't look like anything is failing.
dat[!(a == f &
f < ((a + b) / 2) &
((a + b) / 2) < e &
e < c &
c < b &
b < d)]
Related
I am dealing with a variation of the well-known subset sum problem which I am really in need of some help with. In my problem, I have a matrix m with two columns(a, b) and n rows. I want to find the rows that the sum of the corresponding a and b values equal two target values (a_target, b_target). Some constraints are that a_target, b_target, a,b are all whole positive integers and I am only interested in the first solution that meets the criteria of the two targets being returned or, if no solution meets the criteria, the closest. This closest can be defined as the sum of the error across the two targets. As this method will be run on large datasets, I would need the solution to be optimised.
The problem could be set up as follows:
m <- matrix(data=sample(1:100, 200, replace=T),
ncol=2,
dimnames = list(
NULL,c("a","b")
))
head(m)
a b
[1,] 44 80
[2,] 51 24
[3,] 31 68
[4,] 46 55
[5,] 34 98
[6,] 93 49
a_target <- 500
b_target <- 700
To give some background, the ordinary subset sum problem deals with finding any subset of a set of integers that sums to some target t, which is NP-complete. There are multiple methods to do this with varying time optimisations. One such package in R is subsetsum, documentation. I have taken code form this package with an aim to modify it for use in my problem but I'm not sure if it is possible, for instance this solution requires t to be in increasing order to work so I'm not sure how applicable that would be with two t values. The code where t is a single column i.e. a vector is:
subsetsum <- function(S, t) {
n <- length(S)
inds <- NULL
x <- logical(n)
F <- numeric(t + 1)
G <- logical(t + 1)
G[1] <- TRUE
print(paste("n,inds,x,F,G",n,inds,x,F,G))
for (k in 1:n) {
H <- c(logical(S[k]), G[1:(t + 1 - S[k])])
H <- (G < H)
j <- which(H)
F[j] <- k
G[j] <- TRUE
if (G[t + 1]) break
}
wch <- which(G)
j <- wch[length(wch)]
fmax <- j - 1
while (j > 1) {
k <- F[j]
x[k] <- TRUE
j <- j - S[k]
}
inds <- which(x)
return(list(val = sum(S[inds]), inds = inds))
}
I am looking for an efficient way to nest the same function in R until a condition is met. I hope the following example illustrates my problem clearly.
Consider the function
f(x) = x^2 + 1, with x > 1.
Denote
f^{(k)}(x) = f(f(f(...f(x)))),
where the function f is evaluated k times within itself. Let M > 0, with M given.
Is there any efficient routine in R to determine the minimum value of k such that f^{(k)}(2) > M?
Thank you.
Nothing special for that. Just use a loop:
function(x, M) {
k <- 0
repeat {
x <- x^2 + 1
k <- k + 1
if (x > M)
break
}
k
}
Not particularly efficient, but often the overhead of evaluating f will be greater than the overhead of the loop. If that's not the case (and it might not be for this particular f), I'd suggest doing the equivalent thing in C or C++ (perhaps using Rcpp).
This would be the recursive approach:
# 2^2 + 1 == 5
# 5^2 + 1 == 26
# 26^2 + 1 == 677
f <- function(x,M,k=0){
if(x <= M) k <- f(x^2 + 1,M=M,k+1)
return(k)
}
f(2,3) # 1
f(2,10) # 2
f(2,50) # 3
f(2,700) # 4
I have an ex. where I have to see how many values of a vector are divisible by 2. I have this random sample:
set.seed(1)
y <- sample(c(0:99, NA), 400, replace=TRUE)
I created a new variable d to see which of the values are or aren't divisible by 2:
d <- y/2 ; d
What I want to do is to create a logical argument, where all entire numbers give true and the rest gives false. (ex: 22.0 -> TRUE & 24.5 -> FALSE)
I used this command, but I believe that the answer is wrong since it would only give me the numbers that are in the sample:
sum(d %in% y, na.rm=T)
I also tried this (I found on the internet, but I don't really understand it)
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
sum(is.wholenumber(d),na.rm = T)
Are there other ways that I could use the operator "%%"?
you can sum over the mod operator like so: sum(1-y%%2) or sum(y%%2 == 0). Note that x %% 2 is the remainder after dividing by two which is why this solution works.
Here are three different ways:
length(y[y %% 2 == 0])
length(subset(y, y %% 2 == 0))
length(Filter(function(x) x %% 2 == 0, y))
Since we're talking about a division by 2, I would actually take it to the bit level and check if the last bit of the number is a 0 or a 1 (a 0 means it would be divisible by 2).
Going out on a limb here (not sure how the compiler handles this division by 2) but think that would likely be more optimized than a division, which is typically fairly expensive.
To do this at the bit level, you can just do an AND operation between the number itself and 1, if result it 1 it means won't be divisible by 2:
bitwAnd(a, b)
Basically;
a<-c(1,2,1,2)
b<-c(1,2,3,4)
I seek a function that returns a vector c with c[n]=b[n]+b[n-1] if a[n] even or b[n]+2b[n-1] otherwise.
Is there anything easier than a brute force for-loop? Some sort of advanced "Reduce" or equivalent.
x <- c(0, b[-length(b)]) # shifted b, 0 for first element
c <- ifelse((a %% 2) == 0, b + x, b + 2*x)
Be careful, length of a should be equal to length of b.
I have a small issue with a polynomial:
z²+alpha1*z + alpha2 = 0
I need to fins the values of alpha1 and alpha2 wihtin the roots of |z| < 1. Is there any program in R or Matlab able to do it?
the thing is that the alpha values are not known. I need to find the allowed area where the roots of the polynomial are <= |1|
#Jonel_R, your problem can be solved analytically.
First I'll rename your variables to make it easier to type. I'll also use some notation abuse...
We want to find the values (a, b) such that the roots of z^2 + a z + b == 0 satisfy the property |z|<=1.
The roots are given by (-a +- sqrt(d))/2, where d = a^2 - 4b
There are 3 possibilities. Two real distinct roots, one real root or two complex conjugate roots.
The middle case happens when d = 0, i. e., b = a^2 / 4. This is a parabola in the a vs. b plane. Not all points in this parabola generate polynomials whose root satisfy |z|<=1, however. The root, is this case, is simply -a/2, so we mus add the condition -1 <= a/2 <=1, i. e., -2 <= a <= 2.
Now let's consider the first case. The points in the a vs. b plae that generate polynomials with two distinct real roots lie below the parabola, i. e., they must satisfy b < a^2/4. The additional condition is that |z| = |(-a +- sqrt(d))/2| <= 1.
The condition can be written as -1 <= (-a +- sqrt(d))/2 <= 1, where +- means both roots must satisfy the condition. Working this out we get:
a-2 <= sqrt(d) <= a+2 & a-2 <= -sqrt(d) <= a+2
Since both sqrt(d) and -sqrt(d) must lie in the interval [a-2, a+2], and d > 0, then this interval must contain zero in its interior. This means -2 < a < 2.
The conditions can be joined as:
a-2 <= -sqrt(d) < 0 < sqrt(d) <= a+2
Squaring gives:
(a-2)^2 >= d & d <= (a+2)^2
d <= a^2 - 4a + 4 & d <= a^2 + 4a + 4
-4b <= -4a + 4 & -4b <= +4a + 4
b >= a-1 & b >= -a-1
This means that b must be located above the lines b = a-1 and b=-a-1. Also, a must be in [-2,2]. And, of course, we must have b < a^2/4. Wow...
Now the last case: complex roots. This is easier. Since d < 0, the roots are -a/2 +- i * sqrt(-d)/2. The absolute value of this is a^2/4 - d/4. This equals b, simply. So the condition is b <= 1, and, as always, b lying above that parabola.
That's it... Quite interesting problem. :-)
You can try the following test function: It'll plot the points with real roots in blue and complex roots in red.
test <- function(x=2, n=10000)
{
plot(c(-x,x), c(-x,x), type="n")
plot(function(a) (a^2)/4, from=-x, to=x, add=T)
plot(function(a) a-1, from=-x, to=x, add=T)
plot(function(a) -a-1, from=-x, to=x, add=T)
a <- runif(n, -x, x)
b <- runif(n, -x, x)
for( i in 1:n )
{
if( all(abs(polyroot(c(b[i],a[i],1))) <= 1) )
{
col <- ifelse(b[i] < 0.25*a[i]^2, "blue", "red")
points(a[i], b[i], pch=".", col=col)
}
}
}
BTW: the syntax for polyroot is polyroot(c(C, B, A)) gives the roots of Ax^2 + Bx + C. I believe #agstudy response got it wrong.
Similar to matlab solution in R ,
polyroot(c(1,alpha1,alpha2))
EDIT here a method to get the values of alpha graphically, it can be used to get intution about the plausible values. The idea here is :
choose a range of aplha1
choose a range of alpha2
for each combination of alpha1 and alpha2, compute the roots. Compute the module (||), if > 1 we remove it.
we get a grid of values with 4 column: alpha1,alpha2,norm1,norm2 where All the norms are <1
I plot alpha1 versus alpha2 to get regions...
So the code
## I choose alpha1 in interavl [-1,1]
alpha1 <- seq(-1, 1, length=200)
## I choose alpha2 in interavl [-2,2]
alpha2 <- seq(-2, 2, length=200)
dat <- expand.grid(data.frame(alpha1,alpha2))
## for each combination of (alpha1,alpha2)
## i compute the module of the roots
## I replace |roots|> 1 by NA
ll <- apply(dat,1,function(x) {
rr =Mod(polyroot(c(1,x['alpha1'],x['alpha2'])))
res <- ifelse(rr>1,NA,rr)
if (length(res)==1) res <- rep(res,2)
if (length(res)==0) res <- rep(NA,2)
else res
})
dat <- na.omit(cbind(dat,t(ll)))
## finally i plot the result
library(lattice)
xyplot(alpha2~alpha1,data=dat)
In matlab:
roots(1,alpha1,alpha2)
see http://www.mathworks.se/help/matlab/ref/roots.html