R "non-random" number simulation


This question is about R. I would like to generate a "non-random" sample of 1's and 0's in which each value depends on the previous one. For example, if the first value in the vector is 1, then there is a 60% chance that the next value is 1, and if the next value is 0, then there is a 60% chance that the following one will be 0. I have attached the question as it was posed to me. I greatly appreciate any help.

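Given your conditions, I would generate the sequence with a for loop.
set.seed(111)
n <- 10000  # the question asks for at least 10,000 values
seq <- integer(n)  # preallocate instead of growing the vector
seq[1] <- sample(0:1, 1)  # random starting value, as given in the question
for (i in 2:n) {
  if (seq[i - 1] == 0) {
    # previous value was 0: stay at 0 with probability 0.6
    seq[i] <- sample(0:1, 1, prob = c(0.6, 0.4))
  } else {
    # previous value was 1: stay at 1 with probability 0.6
    seq[i] <- sample(0:1, 1, prob = c(0.4, 0.6))
  }
}
table(seq)
seq
0 1
4961 5039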
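As a rough cross-check on the loop above, the same kind of chain can be sketched without a loop by drawing, for each step, whether the value switches. This is only an illustration under the 60% "stay" probability stated in the question; the names p_stay, flip and chain are made up here, and the random draws will not match the loop's output.

```r
set.seed(111)
n <- 10000
p_stay <- 0.6                            # probability that the next value repeats the current one

first <- sample(0:1, 1)                  # random starting value
flip  <- rbinom(n - 1, 1, 1 - p_stay)    # 1 means "switch to the other value"

# Each flip toggles the state, so the parity of the running flip count
# tells us whether we are on the starting value or on its opposite.
chain <- c(first, (first + cumsum(flip)) %% 2)

# Empirical transition table: rows are the current value, columns the following one.
trans <- table(current = head(chain, -1), following = tail(chain, -1))
round(prop.table(trans, margin = 1), 3)
```

The diagonal of the printed table should sit near 0.6 if the chain behaves as intended; the same table can be computed for the vector produced by the for loop.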

Related

While Loops and Midpoints

Recently, I learned how to write a loop that initializes some number, and then randomly generates numbers until the initial number is guessed (while recording the number of guesses it took) such that no number will be guessed twice:
# https://stackoverflow.com/questions/73216517/making-sure-a-number-isnt-guessed-twice
all_games <- vector("list", 100)
for (i in 1:100){
  guess_i = 0
  correct_i = sample(1:100, 1)
  guess_sets <- 1:100 ## initialize a set
  trial_index <- 1
  while(guess_i != correct_i){
    guess_i = sample(guess_sets, 1) ## sample from this set
    guess_sets <- setdiff(guess_sets, guess_i) ## remove it from the set
    trial_index <- trial_index + 1
  }
  ## no need to store `i` and `guess_i` (as same as `correct_i`), right?
  game_results_i <- data.frame(i, trial_index, guess_i, correct_i)
  all_games[[i]] <- game_results_i
}
all_games <- do.call("rbind", all_games)
I am now trying to modify the above code to create the following two loops:
(Deterministic) Loop 1 always guesses the midpoint (rounded up) and is told whether the guess is smaller or bigger than the correct number. It then takes the midpoint again (e.g. between its guess and the floor/ceiling) until it reaches the correct number.
(Semi-Deterministic) Loop 2 first makes a random guess and is told whether the guess is bigger or smaller than the correct number. It then narrows the range accordingly and makes its next guess randomly within the smaller range. It repeats this process until it reaches the correct number.
I tried to write a sketch of the code:
#Loop 2:
correct = sample(1:100, 1)
guess_1 = sample(1:100, 1)
guess_2 = ifelse(guess_1 > correct, sample(50:guess_1, 1), sample(guess_1:100, 1))
guess_3 = ifelse(guess_2 > correct, sample(50:guess_2, 1), sample(guess_2:100, 1))
guess_4 = ifelse(guess_3 > correct, sample(50:guess_3, 1), sample(guess_3:100, 1))
#etc
But I am not sure if I am doing this correctly.
Can someone please help me with this?
Thank you!
Example : Suppose I pick the number 68
Loop 1: first random guess = 51, (100-51)/2 + 51 = 75, (75-50)/2 + 50 = 63, (75 - 63)/2 + 63 = 69, (69 - 63)/2 + 63 = 66, etc.
Loop 2: first random guess = 53, rand_between(53,100) = 71, rand_between(51,71) = 65, rand(65,71) = 70, etc.

I don't think you need a for loop for this; you can build the structures up front with sample, sapply and which:
## correct values can repeat, so we set replace to TRUE
corrects <- sample(1:100, 100, replace = TRUE)
## replace is FALSE by default in sample(), so guesses within a game never repeat
## sapply() creates a matrix with one column of guesses per game
guesses <- sapply(1:100, function(x) sample(1:100, 100))
## constructing game_results_i equal to yours, but could be simplified
game_results_i <- data.frame(
  i = 1:100,
  trial_index = sapply(
    1:100,
    function(x) which(
      ## which() returns every index where the predicate is TRUE; each column is a
      ## permutation of 1:100, so exactly one index matches
      guesses[, x] == corrects[x]
    )
  ),
  guess_i = corrects,
  correct_i = corrects # guess_i and correct_i are obviously equal
)

OK, let's see if I now match question and answer properly :)
If I understood your intentions correctly, both loops set increasingly tighter lower and upper bounds, so each guess reduces the search space. However, this interpretation does not always match your description, so please double-check whether it is acceptable for your purposes.
I wrote two functions, guess_bisect for the deterministic Loop 1 and guess_sample for Loop 2:
guess_bisect <- function(correct, n = 100) {
  lb <- 0
  ub <- n + 1
  trial_index <- 1
  guess <- round((ub - lb) / 2) + lb
  while (guess != correct) {
    # cat(lb, ub, guess, "\n") # uncomment to print the guess iteration
    if (guess < correct)
      lb <- guess
    else
      ub <- guess
    guess <- round((ub - lb) / 2) + lb
    trial_index <- trial_index + 1
  }
  trial_index
}
guess_sample <- function(correct, n = 100) {
  # sample(k, 1) with a single number k draws from 1:k, so index into the
  # candidate pool instead of sampling the pool directly
  pick <- function(lb, ub) {
    pool <- (lb + 1):(ub - 1)
    pool[sample.int(length(pool), 1)]
  }
  lb <- 0
  ub <- n + 1
  trial_index <- 1
  guess <- pick(lb, ub)
  while (guess != correct) {
    # cat(lb, ub, guess, "\n") # uncomment to print the guess iteration
    if (guess < correct)
      lb <- guess
    else
      ub <- guess
    guess <- pick(lb, ub)
    trial_index <- trial_index + 1
  }
  trial_index
}
Obviously, guess_bisect always produces the same result for the same input, whereas guess_sample varies randomly.
Plotting the results in a simple chart suggests that deterministic bisection is much better on average, since random sampling may happen to pick guesses that narrow the range only slightly. The x-axis is the correct number, spanning 1 to 100, and the y-axis is the trial index; guess_bisect gives the red curve, and many runs of guess_sample give the blue curves.
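A minimal sketch of how the two strategies could be compared numerically rather than by eye. The names count_guesses, pick_mid and pick_random are hypothetical and not the functions from this answer; the random picker indexes into the candidate pool so that a one-element pool is handled safely, and the number of repetitions (200) is an arbitrary choice.

```r
# Hypothetical helper: counts guesses until `correct` is hit, narrowing the
# open interval (lb, ub) after each miss. `pick` chooses the next guess.
count_guesses <- function(correct, pick, n = 100) {
  lb <- 0
  ub <- n + 1
  trials <- 1
  guess <- pick(lb, ub)
  while (guess != correct) {
    if (guess < correct) lb <- guess else ub <- guess
    guess <- pick(lb, ub)
    trials <- trials + 1
  }
  trials
}

# Deterministic strategy: always guess the (rounded) midpoint of the interval.
pick_mid <- function(lb, ub) lb + round((ub - lb) / 2)

# Random strategy: guess uniformly among the remaining candidates.
pick_random <- function(lb, ub) {
  pool <- (lb + 1):(ub - 1)
  pool[sample.int(length(pool), 1)]
}

set.seed(1)
targets <- 1:100
bisect_trials <- sapply(targets, count_guesses, pick = pick_mid)
random_trials <- replicate(200, sapply(targets, count_guesses, pick = pick_random))

mean(bisect_trials)   # average number of guesses with bisection
mean(random_trials)   # average number of guesses with random narrowing
```

Here random_trials is a 100 x 200 matrix (one row per target, one column per repetition), so rowMeans(random_trials) gives a per-target average that can be drawn against bisect_trials.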

R Limit matrix in random walk

I have the following code for a random walk, in which I start from i and take the cumulative sum along each row.
However, I need to limit the random walk on each row. One way I thought of doing this is to replace everything from index j onward with null, where j is the first position in the row whose value is less than or equal to 0 or greater than or equal to t.
simulate_binomial = function(cenarios, rodadas, p){
  return(matrix(data=rbinom(cenarios*rodadas, 1, p), nrow=cenarios, ncol=rodadas))
}
i = 2
t = 10
p = 0.8
max_walk = 100
samples = simulate_binomial(1000, max_walk, p)
samples[samples==0] = -1
walk = t(apply(cbind(i, samples), 1, cumsum))
walk1 = apply(walk, 1, function(x) (which((x <= 0) | (x >= t))[1]))
So walk1 holds, for each row, the first index with a value less than or equal to zero or greater than or equal to t. However, I don't know how to assign null from this index onward in the row.
My intention is to assign null so that I can plot each row without this part and see the effect of the ruin on each row / "scenario".
Can anyone help me, please?

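You can change your last apply to the following, which sets everything after the first boundary hit to NA:
walk1 <- t(apply(walk, 1, function(x) {
  inds <- which((x <= 0) | (x >= t))[1]  # first step at which the walk hits a bound
  # skip rows that never hit a bound (inds is NA) or hit it only at the last step
  if (!is.na(inds) && inds < length(x)) x[(inds + 1):length(x)] <- NA
  x
}))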
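For illustration, here is a self-contained sketch of the same masking idea done with a running count of boundary hits, so rows that never reach a bound need no special case. The names start, upper, absorbed, hits_before and masked are made up for this example, and the parameter values simply mirror the question.

```r
set.seed(42)
start <- 2; upper <- 10; p <- 0.8
steps <- 100; scenarios <- 1000

# +1 with probability p, -1 otherwise, one row per scenario
moves <- matrix(ifelse(rbinom(scenarios * steps, 1, p) == 1, 1, -1),
                nrow = scenarios)
walk <- t(apply(cbind(start, moves), 1, cumsum))

# TRUE wherever the walk sits on or beyond a bound
absorbed <- walk <= 0 | walk >= upper

# Number of boundary hits strictly before each position, row by row
hits_before <- t(apply(absorbed, 1, function(r) cumsum(r) - r))

# Keep the first hit itself, blank out everything after it
masked <- walk
masked[hits_before > 0] <- NA

# Plot the first 20 scenarios when running interactively
if (interactive()) {
  matplot(t(masked[1:20, ]), type = "l", lty = 1,
          xlab = "step", ylab = "position")
}
```

Because matplot skips NA values, each line simply stops at the step where that scenario is ruined or reaches the upper bound.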

Monte Carlo Simulation for Pattern

I am writing a Monte Carlo simulation to check how often no y is immediately next to another y. I built a vector of 40 x's and 10 y's placed at random positions. My goal is to estimate the probability of not having any adjacent y's in the vector. Here is what I tried:
nrep = 100000
count = 0
for (i in 1:nrep) {
  x = sample(c(rep('x', 40), c(rep('y', 10))))
  if (x[i]!=x[i+1] && x[i+1]!=x[i+2]) count = count + 1
}
print(count/nrep)
The result is a very small number, which doesn't seem to make sense to me.

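The if condition is not correct: it indexes x with the loop counter i instead of scanning the whole vector. We can use head/tail to compare consecutive elements and check whether there are any two consecutive 'y's in one iteration. Note that count/nrep below estimates the probability of at least one adjacent pair of y's; the probability of no adjacent y's, which the question asks for, is 1 minus that value.
nrep = 100000
count = 0
set.seed(2020)
for (i in 1:nrep) {
  x = sample(rep(c('x', 'y'), c(40, 10)))
  # TRUE if some position and its right-hand neighbour are both 'y'
  if(any(head(x, -1) == 'y' & tail(x, -1) == 'y')) count = count + 1
}
count/nrep
#[1] 0.891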
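As a sanity check on the simulated value, the probability can also be worked out by counting, assuming every arrangement of the 40 x's and 10 y's is equally likely. Placing the 10 y's into the 41 gaps around the x's, at most one per gap, gives choose(41, 10) arrangements with no adjacent y's, out of choose(50, 10) arrangements in total. The variable names below are made up for the sketch.

```r
n_x <- 40
n_y <- 10

# Arrangements with no two y's adjacent, divided by all arrangements
p_no_adjacent <- choose(n_x + 1, n_y) / choose(n_x + n_y, n_y)

p_no_adjacent       # about 0.109
1 - p_no_adjacent   # about 0.891, the "at least one adjacent pair" probability
```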

Monte Carlo for 3 or more consecutive faces

I wrote this code to check for 3 or more consecutive faces in a simulation of 100000 iterations, with five rolls of a fair die. I think it is on the right track, but I am missing something. I keep getting a missing value error:
nrep = 500000
count = 0
for (i in 1:nrep) {
  roll = sample(6, 5)
  print(roll)
  if (roll[i] == roll[i+1] & roll[i+1] == roll[i+2]) count = count + 1
}
print(count)
Please advise on a correction using base R only.

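Adding to my comment: the missing-value error comes from indexing roll[i] with the loop counter, which runs past the five rolls, and sample(6, 5) draws without replacement, so no face can ever repeat. You can use the function rle() to compute the lengths and values of runs of equal values in a vector, and do something like the following:
nrep = 500000
count = 0
for (i in 1:nrep) {
  roll = sample(6, 5, replace = TRUE)  # five independent rolls
  roll_rle = rle(roll)
  if (any(roll_rle$lengths >= 3)) {  # some face appears 3+ times in a row
    print(roll)
    count = count + 1
  }
}
count / nrep  # estimated probability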
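Since five rolls give only 6^5 = 7776 outcomes, the simulation can be checked against a full enumeration. The sketch below assumes "3 or more consecutive faces" means the same face three or more times in a row, as in the rle() approach above; has_run, outcomes, exact and estimate are names invented for the example.

```r
# TRUE if the vector contains a run of at least k equal values
has_run <- function(roll, k = 3) any(rle(roll)$lengths >= k)

# Every possible sequence of five rolls, one per row
outcomes <- as.matrix(expand.grid(rep(list(1:6), 5)))

exact <- mean(apply(outcomes, 1, has_run))
exact      # 576 / 7776, about 0.074

# A smaller simulation for comparison
set.seed(1)
estimate <- mean(replicate(20000, has_run(sample(6, 5, replace = TRUE))))
estimate
```

Using replicate() with a helper function also avoids the manual counter and keeps the loop index out of the roll indexing.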

how to generate a vector which satisfy some conditions?

Hi everyone!
How can I generate a vector that satisfies some conditions?
Problem: generate a vector a such that length(a)=400000, made up of 8 elements: 0, 5, 10, 50, 500, 5000, 50000, 300000. Each element appears a set number of times, namely 290205, 100000, 8000, 1600, 160, 32, 2, 1, respectively. Further, a is blocked into 4,000 "groups" of 100 consecutive elements; call them a_k, k=1,...,4000. These groups must satisfy the following:
The sum of every group exceeds 150, i.e. sum_i a_k_i>150 for all k.
The elements 5, 10 and 50 appear between 25 and 29 times in each group, i.e. for all k, the set {i|a_k_i in (5,10,50)} has magnitude between 25 and 29.
0 never appears more than 8 times in a row in any group.
I have tried this many times, but it does not seem to work.
My current code is as follows:
T <- 4*10^(5) # data size
x <- c(0, 5, 10, 50, 500, 5000, 50000, 300000) #seed vector
t <- c(290205, 100000, 8000, 1600, 160, 32, 2, 1) #frequency
A <- matrix(0, 4000, 100) #4000 groups
k <- rep(0, times = 8) #record the number of seeds
for(m in 1:4000) {
  p <- (t - k)/(T - 100*(m - 1)) #seed probability
  A[, m] <- sample(x, 100, replace = TRUE, prob = p) #group m
  sm <- 0
  i <- 0
  for(j in 1:92) {
    if(sum(A[m,j:j + 8])==0){
      if(A[m,j] > 0 & A[m,j] < 500) {i <- i+1}
      sm <- sm+A[100*m+j]
    }
    else j <- 0
  }
  if (sm >= 150 & i > 24 & i < 30 & j != 0) {
    m <- m + 1
    for (n in seq_len(x)) {
      k[n] <- sum(A[, m+1] == x[n]) + k[n]
    }
  }
}

How about just doing it by construction? For example:
amat<-matrix(rep(c(rep(rep(c(0,5),c(8,3)),8),
rep(c(0,NA),c(8,4))),4000),nrow=100)
amat[97:100,1:2205]<-c(rep(10,3),0)
amat[97:98,2206:4000]<-c(5,5)
amat[99:100,2206:2897]<-c(10,10)
amat[99:100,2898]<-c(5,50)
amat[99:100,2899:3307]<-c(5,50)
amat[99:100,3308:3902]<-c(50,50)
amat[which(is.na(amat))]<-rep(c(10,500,5000,5e4,3e5),c(1,160,32,2,1))
a<-c(amat)
This satisfies all your conditions:
Element counts:
>sapply(c(0,5,10,50,500,5000,50000,300000),function(x)length(which(a==x)))
[1] 290205 100000 8000 1600 160 32 2 1
Group sums:
> table(colSums(amat)>=150)
TRUE
4000
5,10,50 frequency:
> table(sapply(1:4000,function(x)abs(sum(amat[,x] %in% c(5,10,50))-27)<=2))
TRUE
4000
Runs of 0:
> table(sapply(1:4000,function(x)max(rle(amat[,x])$lengths[rle(amat[,x])$values==0])<=8))
# If this is slow, we can just use max(rle(amat[,x])$lengths)<=8
# because there aren't many valid groups with strings of 9+
# non-0 elements
TRUE
4000
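If in fact we're never allowed to have strings of nine 0s anywhere in a, even across group boundaries, we'll need to make a slight adjustment to groups 2:2206, because, e.g., a[100:108]==0: group 1 ends with a 0 and group 2 starts with eight 0s.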
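To make the difference between the two readings concrete, here is a small sketch that measures the longest run of zeros both inside each group and over the flattened vector. The toy matrix is made up for illustration (3 groups of 10 instead of 4000 groups of 100), and max_zero_run is a hypothetical helper name.

```r
# Longest run of zeros in a vector (0 if the vector has no zeros)
max_zero_run <- function(v) {
  r <- rle(v)
  zero_runs <- r$lengths[r$values == 0]
  if (length(zero_runs) == 0) 0L else max(zero_runs)
}

# Toy data: each column is one group, filled column by column
toy <- matrix(c(5, 0, 0, 0, 0, 5, 0, 0, 0, 0,    # group 1 ends with four zeros
                0, 0, 0, 0, 0, 0, 5, 5, 5, 5,    # group 2 starts with six zeros
                5, 0, 5, 0, 5, 0, 5, 0, 5, 0),   # group 3 alternates
              nrow = 10)

per_group <- apply(toy, 2, max_zero_run)   # longest zero run inside each group
overall   <- max_zero_run(c(toy))          # longest zero run ignoring group boundaries

per_group   # 4 6 1
overall     # 10
```

In the toy data every group passes a "no more than 8 zeros in a row" check on its own, while the flattened vector contains a run of 10 that straddles groups 1 and 2.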

I can start it off and maybe someone can help get to the next step. My approach is to start with the constraints and let sample work out the numbers.
set.seed(77)
choose <- c(0,5,10,50,500,5000,50000,300000)
freqs <- c(290205,100000,8000,1600,160,32,2,1)
probs <- freqs/sum(freqs)
check.sum <- function(vec) sum(vec) >= 150
check.interval <- function(vec) abs(sum(vec %in% c(5,10,50))-27)<=2
check.runs <- function(vec, runmax=8) max(rle(vec)$lengths[rle(vec)$values==0]) <= runmax
check.all <- function(vector) {
  logicals <- c(check.sum(vector),       # group sum is at least 150
                check.interval(vector),  # 5, 10 and 50 appear 25 to 29 times
                check.runs(vector)       # no run of zeros longer than 8
  )
  return(all(logicals))
}
nums <- NULL
res <- list()
for(i in 1:4000) {
  nums <- numeric(100)  # all zeros, fails the checks so the while loop runs
  while(!check.all(nums)) {nums <- sample(choose, 100, replace=TRUE, prob=probs)}
  res[i] <- list(nums)
}
str(res)
List of 4000
$ : num [1:100] 1e+01
So this gets you a list of 4,000 groups of 100 numbers that fit the constraints. It only took about two seconds of system time.
Next step is for someone to get a way to build something similar except eliminate 300000 once it is used, and 50000 once it is used twice and so on.

Inspired by @plafort's approach, I've come up with the following, which runs each attempt very quickly and should be capable of generating all vectors satisfying your conditions:
elts<-c(0,5,10,50,500,5000,50000,300000)
freq<-c(290205,100000,8000,1600,160,32,2,1)
ngrp<-4000L
grp.cond1<-function(x)sum(x)>=150
grp.cond2<-function(x)abs(sum(x %in% c(5,10,50))-27)<=2
grp.cond3<-function(x)max(rle(x)$lengths[rle(x)$values==0])<=8
check.all<-function(mat){
  all(sapply(1:ncol(mat),function(y)grp.cond1(mat[,y])),
      sapply(1:ncol(mat),function(y)grp.cond2(mat[,y])),
      sapply(1:ncol(mat),function(y)grp.cond3(mat[,y])))}
# initial candidate, so that amat exists when the loop condition is first evaluated
amat<-matrix(sample(rep(elts,freq)),ncol=ngrp)
while(!check.all(amat)){amat<-matrix(sample(rep(elts,freq)),ncol=ngrp)}
a<-c(amat)
I've also written the code in a way that should be easy to generalize to other element sets/counts, group numbers, and group-wise conditions.
Unfortunately, it seems these conditions are pretty stringent, and it may take a long time to generate an acceptable a. I let the while loop run ~1300 times with no success...

Thanks, everyone! I have figured out my problem.
rm(list = ls())
# template: every group gets 72 zeros, 25 fives and 3 open slots
media <- matrix(rep(rep(c(0,5,NA),c(72,25,3)),4000),nrow=100)
media[98:100,1:2400] <-c(10,10,10)
media[98:99,2401:3200] <-c(50,10)
media[98:99,3201:4000] <-c(50,0)
media[100,2401:4000] <-rep(c(0,500,5000,50000,300000),c(1405,160,32,2,1))
obj1 <- matrix(0,100L,4000)
obj2 <-obj1
grp.cond<-function(x) max(rle(x)$lengths[rle(x)$values==0])<=8
elts<-c(0,5,10,50,500,5000,50000,300000)
for(i in 1:4000){
  # element counts of group i in the template
  freq<-c(sapply(elts, function(x) length(which(media[,i]==x))))
  # reshuffle group i until no run of zeros is longer than 8
  while(!grp.cond(obj1[,i])){obj1[,i]<-c(sample(rep(elts,freq)))}
}
# put the groups in random order
elts1<-c(1:4000)
freq1<-rep(1,times=4000)
a1<-sample(rep(elts1,freq1))
for(i in 1:4000){obj2[,i]<-obj1[,a1[i]]}
a <- c(obj2)
