Get a seed to generate a specific set of pseudo-random numbers - r

I was wondering if there is a way in R to find the specific seed that generates a specific set of numbers.
For example:
sample(1:300, 10)
I want to find the seed that gives me, as the output of the previous code:
58 235 243 42 281 137 2 219 284 184

As far as I know there is no elegant way to do this, but you could brute force it:
desired_output <- c(58, 235, 243, 42, 281, 137, 2, 219, 284, 184)
MAX_SEED <- .Machine$integer.max
MIN_SEED <- MAX_SEED * -1
i <- MIN_SEED
while (i <= MAX_SEED) {
  set.seed(i)
  actual_output <- sample(1:300, 10)
  if (identical(actual_output, desired_output)) {
    message("Seed found! Seed is: ", i)
    break
  }
  i <- i + 1
}
This takes 11.5 seconds to run with the first 1e6 seeds on my laptop - so if you're unlucky then it would take about 7 hours to run. Also, this is exactly the kind of task you could run in parallel in separate threads to cut the time down quite a bit.
EDIT: Updated to include negative seeds which I had not considered. So in fact it could take twice as long.
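Since the search is embarrassingly parallel, the splitting idea can be sketched with the base parallel package. The helper names and chunking scheme below are my own, not part of the answer above:

```r
library(parallel)

# Search one chunk of the seed range serially.
find_seed <- function(desired, seeds) {
  for (s in seeds) {
    set.seed(s)
    if (identical(sample(1:300, 10), desired)) return(s)
  }
  NULL
}

search_seeds_parallel <- function(desired, from, to, cores = 2L) {
  # Split the seed range into one chunk per core and search each chunk in
  # its own forked process (mclapply falls back to serial on Windows).
  chunks <- split(from:to, cut(seq_along(from:to), cores))
  hits <- mclapply(chunks, function(ch) find_seed(desired, ch),
                   mc.cores = if (.Platform$OS.type == "windows") 1L else cores)
  found <- Filter(Negate(is.null), hits)
  if (length(found) > 0) found[[1]] else NULL
}
```

One caveat: the stream behind sample() changed in R 3.6.0 (the sample.kind setting), so a seed found this way reproduces the output only under the same R version and RNG settings.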

Related

Number of items to replace is not a multiple of replacement length?

Not sure what is wrong here.
Building the model along with the example
LBsPD <- c()
for (i in 1:5000) {
  FishCaught <- sample(x = c(7, 4, 2), size = 1, prob = c(.1, .6, .3), replace = TRUE)
  YellowPercent <- sample(x = c(0, .25, .35), size = 1, prob = c(.25, .5, .25), replace = TRUE)
  BluePercent <- 1 - YellowPercent
  BlueLBs <- rnorm(n = 365, mean = 35, sd = 18)
  YellowLBs <- rnorm(n = 365, mean = 30, sd = 18)
  LBsPerDay <- FishCaught * ((BluePercent * BlueLBs[BlueLBs > 20]) + (YellowPercent * YellowLBs[YellowLBs > 20]))
  LBsPD[i] <- LBsPerDay
}
I keep getting 50+ warnings of "number of items to replace is not a multiple of replacement length", but in the example it is the same.
Here's a drawn-out explanation of what @AndrewChisholm started.
I'll start by (1) setting the random seed, so you can repeat this in your console, and (2) stepping through the for loop's first iteration.
set.seed(42) # since this is a random process
i <- 1 # first pass in the loop
FishCaught <- sample(x=c(7,4,2), size=1, prob = c(.1,.6,.3),replace = TRUE)
YellowPercent <- sample(x=c(0,.25,.35), size=1, prob = c(.25,.5,.25),replace = TRUE)
BluePercent <- 1-YellowPercent
BlueLBs <- rnorm(n=365, mean=35, sd=18)
YellowLBs <- rnorm(n=365, mean=30, sd=18)
Now, let's look at the components of your next expression:
# FishCaught * ((BluePercent * BlueLBs[BlueLBs > 20]) +
# (YellowPercent * YellowLBs[YellowLBs > 20]))
str(BlueLBs[BlueLBs > 20])
# num [1:291] 24.8 41.5 46.4 42.3 33.1 ...
str(YellowLBs[YellowLBs > 20])
# num [1:255] 64.1 22.5 36.3 59.3 31.6 ...
It doesn't matter for now that BluePercent is 1 and YellowPercent is 0, since 0*somevec still has the length of the vector, so you are effectively trying to add vectors of different lengths. What does that mean in practice?
c(1,3,5,7,9) + c(1,1000,1)
# Warning in c(1, 3, 5, 7, 9) + c(1, 1000, 1) :
# longer object length is not a multiple of shorter object length
# [1] 2 1003 6 8 1009
The bigger problem here is that R does not consider this a problem: it warns you that it is suspect, but it happily "recycles" the values for you. So this is not what is causing your error.
LBsPerDay <- FishCaught * ((BluePercent * BlueLBs[BlueLBs > 20]) + (YellowPercent * YellowLBs[YellowLBs > 20]))
# Warning in (BluePercent * BlueLBs[BlueLBs > 20]) + (YellowPercent * YellowLBs[YellowLBs > :
# longer object length is not a multiple of shorter object length
str(LBsPerDay)
# num [1:291] 174 291 325 296 232 ...
This is not a syntax error, but you should treat it as a "data is now corrupt" error: you really don't know for certain which numbers were added or multiplied with which other numbers (see the previous example with c(1,1000,1): if we think things should be aligned, those recycled results are going to lead to incorrect logical conclusions from this process).
Here's the real problem:
LBsPD[i] <- LBsPerDay
# Warning in LBsPD[i] <- LBsPerDay :
# number of items to replace is not a multiple of replacement length
First, some clarification:
This is a Warning, not an Error. The only way I get an error from it is if I had previously set options(warn=2) (which, btw, is not a bad idea here). Warnings can often be ignored if you expect them, but in this case you should really pay attention to it and treat it as an error.
LBsPerDay is length 291, but you are trying to cram 291 numbers into one position of the vector, LBsPD[i]. That is, the length of the LHS using [i] is always going to be the length of i, which is 1; whereas the length of the RHS is (in this case) 291.
Options:
I'm inferring that your BlueLBs[BlueLBs > 20] might be a filter so that fish caught below 20 (pounds? kilos? grams?) will not be "caught". In that case, let's just replace those under 20 with 0 ... but please check me on this logic ... a blue/yellow that is below 20 will be changed to 0, effectively "not caught":
LBsPerDay <- FishCaught * ((BluePercent * replace(BlueLBs, BlueLBs <= 20, 0)) + (YellowPercent * replace(YellowLBs, YellowLBs <= 20, 0)))
str(LBsPerDay)
# num [1:365] 174 291 325 296 232 ...
(No warning, no error.)
If you intend LBsPD to contain all of the weights for each iteration in your simulation, then start with LBsPD <- list(), in which case you'll eventually use
LBsPD <- list()
for (i in 1:5000) {
  # ...
  LBsPD[[i]] <- LBsPerDay
}
where after 3 (of 5000) iterations, your LBsPD looks like:
str(LBsPD)
# List of 3
# $ : num [1:365] 174 291 325 296 232 ...
# $ : num [1:365] 160.7 97 161.5 145 99.6 ...
# $ : num [1:365] 30.3 121.4 111.7 210.8 139.7 ...
BTW, you might notice that both BlueLBs and YellowLBs have negatives ... not sure if that's a problem, but negative pounds seems suspect. (Because "normal distributions" are by definition unbounded, many things labeled as "normally distributed" are typically not asymptotically compliant. For hasty simulations like this, I often revert to a triangular distribution, which may be normal-enough for some applications, and certainly never gives negative or extremely-positive weights.)
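For reference, a triangular sampler is only a few lines via the inverse CDF; the lo/hi/mode values below are placeholders I picked for illustration, not anything from the question:

```r
# Inverse-CDF sampler for a triangular distribution on [lo, hi] with peak
# at 'mode'; by construction it never returns values outside [lo, hi].
rtri <- function(n, lo, hi, mode) {
  u <- runif(n)
  fc <- (mode - lo) / (hi - lo)   # CDF value at the mode
  ifelse(u < fc,
         lo + sqrt(u * (hi - lo) * (mode - lo)),
         hi - sqrt((1 - u) * (hi - lo) * (hi - mode)))
}

# e.g. replace the rnorm() draws with bounded weights (numbers made up):
BlueLBs <- rtri(365, lo = 0, hi = 80, mode = 35)
```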

Hoping for help to translate a thought experiment into R code, using randomization

I'm more experienced with R than many of my peers, yet it sometimes takes hours to move a novel-to-me concept into the code line, and usually a few more to get a successful output. I don't know how to describe this in R language, so I hope you can help me- either with sample code, or pointing me in the right direction.
I have c(X1,X2,X3,...Xn) for starting variable, a non-random numeric value.
I have c(Y1,Y2,Y3,...Yn) for change variable, a non-random numeric value denoting by how much to change X, give or take, and a value between 0-10.
I have c(Z1,Z2,Z3,...Zn) which is the min and max range of X.
What I want to observe is the random sampling of all the X values after each has randomly had some multiple of its corresponding Y value added or subtracted. What I'm trying to ask is: how many times will I draw X values that are exactly the X values I initially input, or off by only a low multiple of Y?
For instance,
Exes<-c(135,462,579,222)
Whys<-c(1,3,3,2)
Zees<-c(c(115,155),c(450,474),c(510,648),c(200,244))
First iteration: X=c(135,462,579,222), second iteration: X=c(130,471,585,230) <- as you can see, X in the second iteration has changed by (-5*Y1), (+3*Y2), (+2*Y3), and (+4*Y4)
What I want to output is a list of randomized X values which have changed by only a factor of their corresponding Y value, and always fall within the range of given Z values. Further, I want to examine how many times at least one (and only one) X value will be significantly different from the corresponding starting input X.
I feel like I'm not wording the question succinctly, but I also feel that this is why I've posted. I'm not trying to ask for hand-holding, but rather seeking advice.
I am not sure that I understood the question: do you want to repeat the process numerous times? Is it for the purpose of simulation? Here is the start of a solution.
library(dplyr)
x <- c(135,462,579,222)
y <- c(1,3,3,2)
z.lower <- c(115, 450, 510, 200)
z.upper <- c(155, 474, 648, 244)
temp.df <- data.frame(x, y, z.lower, z.upper)
temp.df %>%
  mutate(samp = sample(seq(-10, 10, 1), nrow(temp.df))) %>% ### Sample numbers between -10 and 10
  mutate(new.val = x + samp * y) %>%                        ### Create new X
  mutate(is.bound = new.val < z.upper & new.val > z.lower)  ### Check that it falls in bounds
x y z.lower z.upper samp new.val is.bound
1 135 1 115 155 -10 125 TRUE
2 462 3 450 474 10 492 FALSE
3 579 3 510 648 8 603 TRUE
4 222 2 200 244 6 234 TRUE
For this dataset, this is a possibility:
Exes<-c(135,462,579,222)
Whys<-c(1,3,3,2)
Zees<-c(c(115,155),c(450,474),c(510,648),c(200,244))
n = 10000
x_range_l <- split(Zees, rep(seq_len(length(Zees) / 2), each = 2))
mapply(function(y, x_range) sample(seq(from = x_range[1], to = x_range[2], by = y), size = n, replace = TRUE),
       Whys, x_range_l)
Note that this option depends more on the Zees than the Exes. A more complete way to do it would be:
Exes<-c(135,462,579,222)
Whys<-c(1,3,3,2)
Why_Range <- c(20, 4, 13, 11)
x_range_l <- Map(function(x, y, rng) c(x - y * rng, x + y * rng), Exes, Whys, Why_Range)
n = 10000
mapply(function(y, x_range) sample(seq(from = x_range[1], to = x_range[2], by = y), size = n, replace = TRUE),
       Whys, x_range_l)
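To get at the original "how many draws come back unchanged" question, here is a rough base-R sketch of my own. It ignores the Z bounds, which could be added back with an is.bound-style check as in the dplyr answer above:

```r
Exes <- c(135, 462, 579, 222)
Whys <- c(1, 3, 3, 2)
n <- 10000

# Each column is one iteration: every X moves by a random multiple
# (between -10 and 10) of its corresponding Y.
draws <- replicate(n, Exes + sample(-10:10, length(Exes), replace = TRUE) * Whys)

# Proportion of iterations in which all four X values come back exactly
# as input (multiple == 0 for every element): roughly (1/21)^4, so very rare.
mean(colSums(draws == Exes) == length(Exes))
```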

Filtering a vector on condition

I am trying to filter a vector of integers.
My condition is that the distance between 2 consecutive elements should be at least 100; if not, remove the element and look at the next candidate.
Here is an example :
set.seed(42)
input <- sort(sample(1:1000, 20))
head(input, 20)
[1] 24 49 74 128 146 153 165 228 303 321 356 410 532 561 601 622 634 839 882 997
If I start from the first element 24, I would like to keep the first element that has a distance at least 100 from it.
In this case, it would be 128.
Then, from 128, repeat the same process.
The result should be :
24 128 228 356 532 634 839 997
I managed to create a quick and dirty loop that gives the correct result, but I can guess that it would not be very efficient for very large vectors...
result <- integer(length(input))
result[1] <- input[1]
for (i in seq_along(input)[-1]) {
  if (is.na(input[2])) break
  if (input[2] - input[1] < 100) {
    input <- input[-2]
  } else {
    result[i] <- input[2]
    input <- input[-1]
  }
}
result <- result[result != 0]
What would be an efficient way to get the expected result ? Can it be done using vectorization ?
A one-liner: Reduce() with accumulate = TRUE carries along the last kept value, and unique() then extracts the survivors:
unique(Reduce(function(x, y) ifelse(y - x >= 100, y, x), input, accumulate = TRUE))
[1] 24 128 228 356 532 634 839 997
Not thoroughly tested, but I believe this gets you there. I am using purrr::accumulate. This is a pretty neat problem :-) hoping to see some other solutions/approaches, so maybe leave this open (unanswered) for a bit...
library(purrr)
input <- c(24, 49, 74, 128, 146, 153, 165, 228, 303, 321, 356, 410, 532, 561, 601, 622, 634, 839, 882, 997)
idx <- which(accumulate(diff(input), ~ ifelse(.x >= 100, .y, .x + .y)) >= 100)
input[c(1, idx + 1)]
#> [1] 24 128 228 356 532 634 839 997
And to make this read a little more purrr, I suppose we could do:
accumulate(diff(input), ~ if_else(.x >= 100, .y, .x + .y)) %>%
  map_lgl(~ . >= 100) %>%
  which %>%
  { input[c(1, . + 1)] }
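Not an answer as such, but a quick sanity check (the assertion is mine) that the base-R Reduce() approach reproduces the loop's expected output on the example input:

```r
input <- c(24, 49, 74, 128, 146, 153, 165, 228, 303, 321, 356, 410,
           532, 561, 601, 622, 634, 839, 882, 997)
expected <- c(24, 128, 228, 356, 532, 634, 839, 997)

# Running "last kept" value, then unique() keeps each survivor once.
kept <- unique(Reduce(function(x, y) ifelse(y - x >= 100, y, x),
                      input, accumulate = TRUE))
stopifnot(identical(kept, expected))
```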

Using RGL to plot 3d line segments in R

I'm having some problems with an application of the rgl 3D graphing package.
I'm trying to draw some line segments, and my data is arranged in a dataframe called 'markers' with six columns: one each for the starting x, y, and z values, and one each for the ending x, y, and z values.
startX startY startZ endX endY endZ
69.345 45.732 20 115 39.072 1.92413
80.270 38.480 30 175 44.548 0.36777
99.590 33.596 20 175 35.224 0.06929
32.120 41.218 20 115 39.294 2.81424
11.775 37.000 30 175 35.890 1.38047
76.820 44.104 22 115 44.992 4.14674
85.790 23.384 18 115 36.112 0.40508
80.040 17.464 20 175 31.080 2.59038
103.615 38.850 22 115 39.220 3.18201
41.200 31.006 30 175 36.260 3.48049
88.665 43.956 30 115 39.738 0.50635
109.365 23.976 20 175 33.374 3.99750
This should be a piece of cake. Just feed those values to the segments3d() command and I should get the plot I want. Only I can't figure out how to correctly pass the respective starting and ending pairs into segments3d().
I've tried just about everything possible ($ notation, indexing, concatenating, using a loop, apply and sapply, etc.), including reading the documentation. It's great, it says for the arguments x, y, and z: "Any reasonable way of defining the coordinates is acceptable." Ugh... it does refer you to the xyz.coords utility.
So I went over that documentation. And I think I understand what it does; I can even use it to standardize my data e.g.
starts <- xyz.coords(markers$startX, markers$startY, markers$startZ)
ends <- xyz.coords(markers$endX, markers$endY, markers$endZ)
But then I'm still not sure what to do with those two lists.
segments3d(starts, ends)
segments3d(starts + ends)
segments3d((starts, ends), (starts, ends), (starts, ends))
segments3d(c(starts, ends), c(starts, ends), c(starts, ends))
segments3d(c(starts$x, ends$x), c(starts$y, ends$y), c(starts$z, ends$z))
I mean I know why the above don't work. I'm basically just trying things at this point as this is making me feel incredibly stupid, like there is something obvious—I mean facepalm level obvious—I'm missing.
I went through the rgl documentation itself looking for an example, and the only place I found segments3d() used in any manner resembling what I'm trying to do, they used the '+' notation I tried above. Basically they built 2 matrices and added the second to the first.
Something like this should work.
library(rgl)
open3d(scale=c(1/5,1,1))
segments3d(x = as.vector(t(markers[, c(1, 4)])),
           y = as.vector(t(markers[, c(2, 5)])),
           z = as.vector(t(markers[, c(3, 6)])))
axes3d()
title3d(xlab="X",ylab="Y",zlab="Z")
The problem is that segments3d(...) takes the x (and y and z) values in pairs. So rows 1-2 are the first segment, rows 3-4 are the second segment, etc. You need to interleave, e.g. $startx and $endx, etc. The code above does that.
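An equivalent way to interleave is to rbind() the start columns over the end columns and flatten, since as.vector() reads a matrix column by column. This is a sketch; the two-row 'markers2' frame is just the first two rows of the data so the snippet stands alone:

```r
# First two rows of the data, so the snippet is self-contained.
markers2 <- data.frame(startX = c(69.345, 80.270), startY = c(45.732, 38.480),
                       startZ = c(20, 30),
                       endX = c(115, 175), endY = c(39.072, 44.548),
                       endZ = c(1.92413, 0.36777))

# rbind() puts starts in row 1 and ends in row 2; as.vector() then reads
# the matrix column-wise, giving start, end, start, end, ...
xs <- as.vector(rbind(markers2$startX, markers2$endX))  # 69.345 115 80.270 175
ys <- as.vector(rbind(markers2$startY, markers2$endY))
zs <- as.vector(rbind(markers2$startZ, markers2$endZ))

if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::segments3d(x = xs, y = ys, z = zs)
}
```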
Code for creating the data set:
markers <- data.frame(
  startX = c(69.345, 80.270, 99.590, 32.120, 11.775, 76.820, 85.790, 80.040, 103.615, 41.200, 88.665, 109.365),
  startY = c(45.732, 38.480, 33.596, 41.218, 37.000, 44.104, 23.384, 17.464, 38.850, 31.006, 43.956, 23.976),
  startZ = c(20, 30, 20, 20, 30, 22, 18, 20, 22, 30, 30, 20),
  endX = c(115, 175, 175, 115, 175, 115, 115, 175, 115, 175, 115, 175),
  endY = c(39.072, 44.548, 35.224, 39.294, 35.890, 44.992, 36.112, 31.080, 39.220, 36.260, 39.738, 33.374),
  endZ = c(1.92413, 0.36777, 0.06929, 2.81424, 1.38047, 4.14674, 0.40508, 2.59038, 3.18201, 3.48049, 0.50635, 3.99750)
)

R: bizarre behavior of set.seed()

An odd thing happens in R when I do set.seed(0) and set.seed(1):
set.seed(0)
sample(1:100,size=10,replace=TRUE)
#### [1] 90 27 38 58 91 21 90 95 67 63
set.seed(1)
sample(1:100,size=10,replace=TRUE)
#### [1] 27 38 58 91 21 90 95 67 63 7
When changing the seed from 0 to 1, I get the exact same sequence, but shifted over by 1 cell!
Note that if I do set.seed(2), I do get what appears to be a completely different (random?) vector.
set.seed(2)
sample(1:100,size=10,replace=TRUE)
#### [1] 19 71 58 17 95 95 13 84 47 55
Anyone know what's going on here?
This applies to the R implementation of the Mersenne-Twister RNG.
set.seed() takes the provided seed and scrambles it (in the C function RNG_Init):
for(j = 0; j < 50; j++)
    seed = (69069 * seed + 1);
That scrambled number (seed) is then scrambled 625 times to fill out the initial state for the Mersenne-Twister:
for(j = 0; j < RNG_Table[kind].n_seed; j++) {
    seed = (69069 * seed + 1);
    RNG_Table[kind].i_seed[j] = seed;
}
We can examine the initial state for the RNG using .Random.seed:
set.seed(0)
x <- .Random.seed
set.seed(1)
y <- .Random.seed
table(x %in% y)
You can see from the table that there is a lot of overlap. Compare this to seed = 3:
set.seed(3)
z <- .Random.seed
table(z %in% x)
table(z %in% y)
Going back to the case of 0 and 1, if we examine the state itself (ignoring the first two elements of the vector which do not apply to what we are looking at), you can see that the state is offset by one:
x[3:10]
# 1280795612 -169270483 -442010614 -603558397 -222347416 1489374793 865871222
# 1734802815
y[3:10]
# -169270483 -442010614 -603558397 -222347416 1489374793 865871222 1734802815
# 98005428
Since the values selected by sample() are based on these numbers, you get the odd behavior.
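The offset has a simple arithmetic cause that can be checked directly in R (a sketch; doubles are exact here because every intermediate stays below 2^53): scrambling seed 0 once yields 69069*0 + 1 = 1, so seed 0's scramble trajectory is exactly seed 1's trajectory delayed by one step.

```r
# One step of R's RNG_Init scrambling, reduced mod 2^32.
scramble <- function(seed, times) {
  for (j in seq_len(times)) seed <- (69069 * seed + 1) %% 2^32
  seed
}

scramble(0, 1)                       # 1: seed 0 becomes seed 1 after one step
scramble(0, 50) == scramble(1, 49)   # TRUE: same trajectory, one step apart
```

Every subsequent state word continues that same trajectory, which is why the two initial states overlap element for element.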
As you can see from the other answer, seeds 0 and 1 result in almost-identical initial states. In addition, the Mersenne-Twister PRNG has a severe limitation: "almost similar initial states will take a long time to diverge".
It is therefore advisable to use an alternative such as the WELL PRNG (available in the randtoolbox package).
