I am working with the R programming language. I am trying to build a loop that performs the following :
Step 1: Keep generating two random numbers "a" and "b" until both "a" and "b" are greater than 12
Step 2: Track how many random numbers had to be generated before Step 1 was completed
Step 3: Repeat Step 1 and Step 2 100 times
Since I do not know how to keep generating random numbers until a condition is met, I tried to generate a large amount of random numbers hoping that the condition is met (there is probably a better way to write this):
results <- list()
for (i in 1:100){
# do until break
repeat {
# repeat many random numbers
a = rnorm(10000,10,1)
b = rnorm(10000,10,1)
# does any pair meet the requirement
if (any(a > 12 & b > 12)) {
# put it in a data.frame
d_i = data.frame(a,b)
# end repeat
break
}
}
# select all rows until the first time the requirement is met
# it must be met, otherwise the loop would not have ended
d_i <- d_i[1:which(d_i$a > 10 & d_i$b > 10)[1], ]
# prep other variables and only keep last row (i.e. the row where the condition was met)
d_i$index = seq_len(nrow(d_i))
d_i$iteration = as.factor(i)
e_i = d_i[nrow(d_i),]
results[[i]] <- e_i
}
results_df <- do.call(rbind.data.frame, results)
Problem: When I look at the results, I notice that the loop incorrectly considers the condition to be met. For example:
head(results_df)
          a        b index iteration
4  10.29053 10.56263     4         1
5  10.95308 10.32236     5         2
3  10.74808 10.50135     3         3
13 11.87705 10.75067    13         4
1  10.17850 10.58678     1         5
14 10.14741 11.07238     1         6
In each of these rows, both "a" and "b" are smaller than 12.
Does anyone know why this is happening and can someone please show me how to fix this problem?
Thanks!
How about this approach? Since you tagged the question with while-loop, I used one.
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:100){
a <- rnorm(1, 10, 1)
b <- rnorm(1, 10, 1)
i <- 1
while(a < 12 | b < 12) {
a <- rnorm(1, 10, 1)
b <- rnorm(1, 10, 1)
i <- i + 1
}
x <- c(a,b,i)
res <- rbind(res, x)
}
head(res)
      [,1]     [,2] [,3]
x 12.14232 12.08977  399
x 12.27158 12.01319 1695
x 12.57345 12.42135  302
x 12.07494 12.64841  600
x 12.03210 12.07949   82
x 12.34006 12.00365  782
dim(res)
[1] 100 3
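As for why the original code reports rows below 12: the repeat loop tests a > 12 & b > 12, but the later subsetting line uses which(d_i$a > 10 & d_i$b > 10), so it stops at the first row where both values exceed 10. Below is a minimal sketch that keeps the chunked approach from the question and uses a single threshold throughout. The names first_hit, threshold and chunk are made up for this illustration.

```r
# Draw pairs in chunks until one pair exceeds the threshold in both values.
# first_hit(), threshold and chunk are hypothetical names for this sketch.
first_hit <- function(threshold = 12, chunk = 10000) {
  offset <- 0  # number of pairs drawn in earlier chunks
  repeat {
    a <- rnorm(chunk, 10, 1)
    b <- rnorm(chunk, 10, 1)
    idx <- which(a > threshold & b > threshold)  # same threshold as the stopping rule
    if (length(idx) > 0) {
      k <- idx[1]
      return(data.frame(a = a[k], b = b[k], index = offset + k))
    }
    offset <- offset + chunk
  }
}

# Repeat 100 times and stack the results
results_df <- do.call(rbind, lapply(1:100, function(i) cbind(first_hit(), iteration = i)))
head(results_df)
```

Here index counts pairs, so the number of individual random numbers generated is twice that value.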
I just saw a YouTube video from Numberphile on the Yellowstone sequence (A098550). It's based on a sequence starting with 1, 2 and 3, with subsequent terms generated by the rules:
no repeated terms
always pick the lowest integer
gcd(a_n, a_(n-1)) = 1
gcd(a_n, a_(n-2)) > 1
The first 15 terms would be: 1 2 3 4 9 8 15 14 5 6 25 12 35 16 7
A quick-and-dirty approach in R could be something like this, but understandably, it becomes very slow when generating longer sequences. It also makes an assumption about the highest number that can occur within the sequence (for information: the sequence of 10,000 items never goes higher than 5000).
What can we do to make this faster?
library(DescTools)
a <- c(1, 2, 3)
p <- length(a)
# all natural numbers
all_ints <- 1:5000
for (n in p:1000) {
# rule 1 - remove all number that are in sequence already
next_a_set <- all_ints[which(!all_ints %in% a)]
# rule 3 - search the remaining set for numbers that have gcd == 1
next_a_option <- next_a_set[which(
sapply(
next_a_set,
function(x) GCD(a[n], x)
) == 1
)]
# rule 4 - search the remaining number for gcd > 1
next_a <- next_a_option[which(
sapply(
next_a_option,
function(x) GCD(a[n - 1], x)
) > 1
)]
# select the lowest
a <- c(a, min(next_a))
n <- n + 1
}
Here's a version that's about 20 times faster than yours, with comments about the changes:
# Set a to the final length from the start.
a <- c(1, 2, 3, rep(NA, 997))
p <- 3
# Define a vectorized gcd() function. We'll be testing
# lots of gcds at once. This uses the Euclidean algorithm.
gcd <- function(x, y) { # vectorized gcd
while (any(y != 0)) {
x1 <- ifelse(y == 0, x, y)
y <- ifelse(y == 0, 0, x %% y)
x <- x1
}
x
}
# Guess at a reasonably large vector to work from,
# but we'll grow it later if not big enough.
allnum <- 1:1000
# Keep a logical record of what has been used
used <- c(rep(TRUE, 3), rep(FALSE, length(allnum) - 3))
for (n in p:1000) {
# rule 1 - remove all number that are in sequence already
# nothing to do -- used already records that.
repeat {
# rule 3 - search the remaining set for numbers that have gcd == 1
keep <- !used & gcd(a[n], allnum) == 1
# rule 4 - search the remaining number for gcd > 1
keep <- keep & gcd(a[n-1], allnum) > 1
# If we found anything, break out of this loop
if (any(keep))
break
# Otherwise, make the set of possible values twice as big,
# and try again
allnum <- seq_len(2*length(allnum))
used <- c(used, rep(FALSE, length(used)))
}
# select the lowest
newval <- which.max(keep)
# Assign into the appropriate place
a[n+1] <- newval
# Record that it has been used
used[newval] <- TRUE
}
If you profile it, you'll see it spends most of its time in the gcd() function. You could probably make that a lot faster by redoing it in C or C++.
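When trying variations, it can help to wrap the approach above in a function and compare its output with the 15 terms quoted in the question. This is only a sketch: gcd_vec, yellowstone and the starting range of 64 candidates are names and values chosen here for illustration.

```r
# Elementwise Euclidean algorithm; x may be a scalar, y a vector.
gcd_vec <- function(x, y) {
  while (any(y != 0)) {
    r <- ifelse(y == 0, 0, x %% y)  # remainder, kept at 0 where already finished
    x <- ifelse(y == 0, x, y)
    y <- r
  }
  x
}

yellowstone <- function(n_terms) {
  stopifnot(n_terms >= 3)
  a <- integer(n_terms)
  a[1:3] <- 1:3
  candidates <- 1:64           # assumed starting search range, doubled on demand
  used <- candidates <= 3
  for (n in seq_len(n_terms)[-(1:3)]) {
    repeat {
      ok <- !used &
        gcd_vec(a[n - 1], candidates) == 1 &
        gcd_vec(a[n - 2], candidates) > 1
      if (any(ok)) break
      # nothing found: double the search range and try again
      candidates <- seq_len(2 * length(candidates))
      used <- c(used, logical(length(used)))
    }
    a[n] <- which(ok)[1]       # lowest admissible value
    used[a[n]] <- TRUE
  }
  a
}

yellowstone(15)
```

The result can be compared against 1 2 3 4 9 8 15 14 5 6 25 12 35 16 7 from the question.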
The biggest change here is pre-allocation and restricting the search to numbers that have not yet been used.
library(numbers)
N <- 5e3
a <- integer(N)
a[1:3] <- 1:3
b <- logical(N) # which numbers have been used already?
b[1:3] <- TRUE
NN <- 1:N
system.time({
for (n in 4:N) {
a1 <- a[n - 1L]
a2 <- a[n - 2L]
for (k in NN[!b]) {
if (GCD(k, a1) == 1L && GCD(k, a2) > 1L) {
a[n] <- k
b[k] <- TRUE
break
}
}
if (!a[n]) {
a <- a[1:(n - 1L)]
break
}
}
})
#> user system elapsed
#> 1.28 0.00 1.28
length(a)
#> [1] 1137
For a fast C++ algorithm, see here.
I know how to generate 100 random numbers in R (without replacement):
random_numbers = sample.int(100, 100, replace = FALSE)
I am now curious about how to generate 100 "non-random" numbers (without replacement). The first idea that comes to mind is to generate a random number; the next number is then the old number + 1 with probability 0.5, or a new random number with probability 0.5. Thus, these numbers are not "fully random".
This was my attempt to write this code for numbers in a range of 0 to 100 (suppose I want to repeat this procedure 100 times):
library(dplyr)
all_games <- vector("list", 100)
for (i in 1:100){
index_i = i
guess_sets <- 1:100
prob_i = runif(n=1, min=1e-12, max=.9999999999)
guess_i = ifelse(prob_i> 0.5, sample.int(1, 100, replace = FALSE), guess_i + 1)
guess_sets_i <- setdiff(guess_sets_i, guess_i)
all_games_i = as.list(index_i, guess_i, all_games_i)
all_games[[i]] <- all_games_i
}
all_games <- do.call("rbind", all_games)
I tried to make a list that stores all guesses such that the range for the next guess automatically excludes numbers that have already been guessed, but I get this error:
Error in sample.int(1, 100, replace = FALSE) :
cannot take a sample larger than the population when 'replace = FALSE'
Ideally, I am trying to get the following results (format doesn't matter):
index_1 : 5,6,51,4,3,88,87,9 ...
index_2 77,78,79,2,65,3,1,99,100,4...
etc.
Can someone please show me how to do this? Are there easier ways in R to generate "non-random numbers"?
Thank you!
Note: I think an extra line of logic needs to be added - Suppose I guess the number 100, after guessing the number 100 I must guess a new random number since 100+1 is not included in the original range. Also, if I guess the number 5, 17 then 4 - and after guessing 4, the loop tells me to guess 4+1, this is impossible because 5 has already been guessed. In such a case, I would also have to guess a new random number?
It would be tricky to make your algorithm very efficient in R... it doesn't lend itself nicely to vectorization. Here's how I'd write it directly as a for loop:
semirandom = function(n) {
safe_sample = function(x, ...) {
if(length(x) == 1) return(x)
sample(x, ...)
}
result = numeric(n)
result[1] = sample.int(n, size = 1)
for(i in 2:length(result)) {
if(runif(1) < .5 &&
result[i - 1] < n &&
!((result[i - 1] + 1) %in% result)) {
result[i] = result[i - 1] + 1
} else {
result[i] = safe_sample(x = setdiff(1:n, result), size = 1)
}
}
result
}
# generate 10 semirandom numbers 5 times
replicate(semirandom(10), n = 5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 6 4 4 2 6
# [2,] 3 5 5 3 7
# [3,] 4 3 6 4 5
# [4,] 5 1 2 5 2
# [5,] 7 9 3 6 3
# [6,] 9 10 10 1 1
# [7,] 10 2 8 9 4
# [8,] 2 8 1 8 10
# [9,] 1 7 9 10 9
# [10,] 8 6 7 7 8
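The safe_sample() helper above matters on the last draw, when only one unused value is left. As documented in ?sample, a single number x >= 1 is treated as the population 1:x. A small sketch of the difference (pick_one is a hypothetical helper name, not part of the answer):

```r
remaining <- 7  # pretend 7 is the only value left to draw

# sample() with a single number samples from 1:7, not from the set {7}
draws <- replicate(1000, sample(remaining, size = 1))
table(draws)

# Indexing with sample.int() always returns an element of x itself
pick_one <- function(x) x[sample.int(length(x), 1L)]
safe <- replicate(1000, pick_one(remaining))
unique(safe)
```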
You get the error "cannot take a sample larger than the population when 'replace = FALSE'" because sample.int(1, 100, replace = FALSE) asks for 100 values from the population 1:1: the first argument is the population size and the second is the number of draws.
The following loop draws the numbers 1 to 100, each exactly once. If the previous number + 1 has not been drawn yet, it is chosen with probability 0.5 and a random unused number is drawn otherwise. If the previous number + 1 has already been drawn (or would exceed 100), a random unused number is always drawn.
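i <- sample.int(100, 1)
j <- i
for(x in 1:99) {
  # unused values; pick by index with sample.int(), because sample(rest, 1L)
  # would sample from 1:rest once only a single value is left
  rest <- (1:100)[-j]
  if((i + 1L) %in% j) {
    i <- rest[sample.int(length(rest), 1L)]
  } else {
    if(runif(1L) > 0.5 || i == 100L) {
      i <- rest[sample.int(length(rest), 1L)]
    } else {
      i <- i + 1L
    }
  }
  j <- c(j, i)
}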
A survey shows an average score of 4.2 out of 5 with a sample size of 14. How do I create a data frame of the combinations of individual results that achieve a score of 4.2?
I tried this, but it got too big:
library(tidyverse)
n <- 14
avg <- 4.2
df <- expand.grid(rep(list(c(1:5)),n))
df <- df %>%
rowwise() %>%
mutate(avge = mean(c_across())) %>%
filter(avge >= 4)
The aim is this: given the limited information above, I want to know the distribution of combinations of individual scores, see which combinations are more likely to occur, and see how many low and high scores are needed to reach that average.
Thanks!
If you can tolerate doing this randomly, then
set.seed(42) # only so that you get the same results I show here
n <- 14
iter <- 1000000
scores <- integer(0)
while (iter > 0) {
tmp <- sample(1:5, size = n, replace = TRUE)
if (mean(tmp) > 4) {
scores <- tmp
break
}
iter <- iter - 1
}
mean(scores)
# [1] 4.142857
scores
# [1] 5 3 5 5 5 3 3 5 5 2 5 5 4 3
Notes:
The reason I use iter in there is to preclude the possibility of an "infinite" loop. While here it reacts rather quickly and is highly unlikely to go that far, if you change the conditions then it is possible your conditions could be infeasible or just highly improbable. If you don't need this, then remove iter and use instead while (TRUE) ...; you can always interrupt R with Escape (or whichever mechanism your IDE provides).
The reason I prefill scores with an empty vector and use tmp is so that you won't accidentally assume that scores having values means you have your average. That is, if the constraints are too tight, then you should find nothing, and therefore scores should not have values.
FYI: if you're looking for an average of 4.2, two things to note:
change the conditional to be what you need, such as looking for 4.2 ... but ...
testing for floating-point equality is going to bite you hard (see Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754). I suggest looking within a tolerance, perhaps
tol <- 0.02
# ...
if (abs(mean(tmp) - 4.2) < tol) {
scores <- tmp
break
}
# ...
where tol is some meaningful number. Unfortunately, no combination of 14 votes (of 1 to 5) produces a mean within tol = 0.01 of 4.2: the mean of 14 integers is a multiple of 1/14, and the closest one, 59/14 ≈ 4.214, is about 0.014 away. The search therefore runs into the iter limit:
set.seed(42)
n <- 14
iter <- 100000
scores <- integer(0)
tol <- 0.01
while (iter > 0) {
tmp <- sample(1:5, size = n, replace = TRUE)
# if (mean(tmp) > 4) {
if (abs(mean(tmp) - 4.2) < tol) {
scores <- tmp
break
}
iter <- iter - 1
}
iter
# [1] 0 # <-- this means the loop exited on the iteration-limit, not something found
scores
# integer(0)
If you instead set tol = 0.02, then you will find something:
tol <- 0.02
# ...
scores
# [1] 4 4 4 4 4 5 4 5 5 5 3 4 3 5
mean(scores)
# [1] 4.214286
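The floating-point caveat mentioned above is easy to see with a standard example that is independent of the survey data:

```r
x <- 0.1 + 0.2
x == 0.3                    # FALSE: the two doubles differ in the last bits
abs(x - 0.3) < 1e-8         # TRUE: comparison within a tolerance
isTRUE(all.equal(x, 0.3))   # TRUE: all.equal() applies a tolerance by default
```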
You can try the code below
n <- 14
avg <- 4.2
repeat{
x <- sample(1:5, n, replace = TRUE)
if (sum(x) == round(avg * n)) break
}
and you will see
> x
[1] 5 5 5 5 5 5 4 5 5 4 1 5 1 4
> mean(x)
[1] 4.214286
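If the goal is the distribution of combinations rather than one random example, one possible sketch is to enumerate how many 1s, 2s, ..., 5s were given instead of all 5^14 orderings. It assumes, as in the code above, that the target is a total of round(4.2 * 14) = 59 points. The column names n1 to n5, orderings and share are invented for this illustration.

```r
n <- 14
avg <- 4.2
target <- round(avg * n)  # 59 points in total (assumption carried over from above)

# All ways to choose counts of the scores 1..4; the count of 5s is what remains
counts <- expand.grid(n1 = 0:n, n2 = 0:n, n3 = 0:n, n4 = 0:n)
counts$n5 <- n - rowSums(counts)
counts <- counts[counts$n5 >= 0, ]

# Keep the compositions whose total score hits the target
total <- drop(as.matrix(counts) %*% 1:5)
counts <- counts[total == target, ]

# Number of distinct orderings per composition (multinomial coefficient)
counts$orderings <- apply(counts[, 1:5], 1,
                          function(k) factorial(n) / prod(factorial(k)))
counts$share <- counts$orderings / sum(counts$orderings)
counts <- counts[order(-counts$orderings), ]
head(counts)
```

Under the additional assumption that every score from 1 to 5 is equally likely for each respondent, share is the probability of each composition given that the total is 59.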
Good morning,
I have the following problem.
My Data.frame "data" has the format:
Type amount
1 2
2 0
3 3
I would like to create a vector with the format:
1
1
3
3
3
This means I would like to transform my data.
I created a vector and wrote the following code for my transformation in R:
vector <- numeric(5)
for (i in 1:3){
k <- 1
while (k <= data[i,2]){
vector[k] <- data[i,1]
k <- k+1
}
}
The problem is that I get the following result, and I have no idea where I go wrong:
3
3
3
0
0
There might be many different ways in solving this particular problem in R but I am curious why my solution doesn't work. I am thankful for alternatives, but really would like to know what my mistake is.
Thanks for your help!
Try this solution:
df <- data.frame(type = c(1, 2, 3), amount = c(2, 0, 3))
result <- unlist(mapply(function(x, y) rep.int(x, y), df[, "type"], df[, "amount"]))
result
The output is as follows:
# [1] 1 1 3 3 3
Your code itself is buggy. The corrected code should look like this:
df <- data.frame(type = c(1, 2, 3), amount = c(2, 0, 3))
vector <- numeric(5)
k <- 1
for (i in 1:3) {
j <- 1
while (j <= df[i, 2]) {
vector[k] <- df[i, 1]
k <- k + 1
j <- j + 1
}
}
vector
# [1] 1 1 3 3 3
Probably the fastest and most elegant way to obtain this result was posted earlier in a comment by @akrun:
with(data, rep(Type, amount))
[1] 1 1 3 3 3
However, if you want to do this with for/while loops, it could be helpful to use a list for such cases, where the number of entries is not known at the beginning.
Here is an example with minimal modifications of your code:
my_list <- vector("list", 3)
for (i in 1:3) {
k <- 1
while (k <= data[i,2]){
my_list[[i]][k] <- data[i,1]
k <- k + 1
}
}
vector <- unlist(my_list)
#> vector
#[1] 1 1 3 3 3
The reason why your code didn't work is essentially that you were trying to put too much information into a single variable, k. It cannot serve both as an index into your output vector and as a counter for the repetitions of each entry in the first column of data, because that counter is reset to 1 before each while loop starts.
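To see this concretely, here is the loop from the question with one added cat() line that reports every write. The data frame is rebuilt from the table in the question.

```r
data <- data.frame(Type = c(1, 2, 3), amount = c(2, 0, 3))
vector <- numeric(5)
for (i in 1:3) {
  k <- 1  # reset for every row, so writing always restarts at position 1
  while (k <= data[i, 2]) {
    cat("row", i, "writes", data[i, 1], "to position", k, "\n")
    vector[k] <- data[i, 1]
    k <- k + 1
  }
}
vector
# [1] 3 3 3 0 0
```

Row 1 writes to positions 1 and 2, and row 3 then overwrites positions 1 to 3, which leaves positions 4 and 5 at their initial value of 0.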
I am writing a function that takes the largest elements of a vector V.Size and outputs them into an N by (N+1) matrix. My problem arises when V.Size has fewer than N*(N+1) elements. When this happens, the matrix starts over from the top of the vector, whereas I want it to output NAs.
For example:
# vector V.size is
V.size <- c(1,2,3,4,5,6)
# and N is
N <- 2
# then, the output matrix should be
c1 c2 c3
r1 6 5 4
r2 3 2 1
And when N*(N+1) > V.Size, I want V.Size to fill it until V.Size runs out then return NAs instead of starting over.
My attempt to solve this problem is to search for an element that is larger than the previous one and replace it with an NA. My attempted solution returns the error:
Error in if (is.na(m)[(i - 1), (y + 1)]) { : argument is of length zero
Here's my code:
# Function Name: one
# Input: V.Size (a vector) and N
# Output: Matrix size N by N+1
# Code:
one <- function(x,y) {
# sort the data, largest to smallest with N.A. last
temp <- sort(x, decreasing = TRUE, na.last = TRUE)
#creating the matrix
m <- matrix(head(temp, (y*(y+1))), # only takes elements that fit in the matrix
nrow = y, # number of rows = N
ncol = (y+1), # number of columns = N+1
byrow = TRUE) # filling it by row as instructed
if (length(x) < (y*(y+1))) { # if V.Size is smaller than the outputted matrix
for (i in seq_len(y)) { # for loop over rows
for (j in seq_len(y+1)) { # for loop over columns
if (m[i, j] > m[i,1]) { # if the element is larger than the first in the row
m[i, j] = NA # return NA
}
# HERE IS WHERE THINGS FAIL:
if (is.na(m)[(i-1), (y+1)]) { # if the last element in the previous row is NA
m[i, ] = NA # make current row NA
}
}
}
}
# print the output
m
}
# creating dummy data
V.Size <- c(1:10)
# choosing a dummy N
N = 5
one(V.Size, N)
I get the error: Error in if (is.na(m)[(i - 1), (y + 1)]) { : argument is of length zero
How about this?
V.size <- 1:6
N <- 3
matrix(sort(V.size, decreasing=TRUE)[1:(N*(N+1))], nrow=N, byrow=TRUE)
[,1] [,2] [,3] [,4]
[1,] 6 5 4 3
[2,] 2 1 NA NA
[3,] NA NA NA NA
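The reason this works: matrix() recycles its data when it receives fewer values than cells (which is why the original output starts over), whereas indexing a vector past its end returns NA. A sketch of the function from the question rewritten along these lines, keeping the name one and the argument names x and y:

```r
one <- function(x, y) {
  # sort largest to smallest; sort() drops NAs by default
  temp <- sort(x, decreasing = TRUE)
  # indexing beyond length(temp) yields NA, so the matrix is padded, not recycled
  matrix(temp[seq_len(y * (y + 1))], nrow = y, ncol = y + 1, byrow = TRUE)
}

V.Size <- 1:10
N <- 5
one(V.Size, N)
```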
I think the problem is in the first run of the loop. When i = 1, i - 1 = 0, and indexing row 0 of a matrix returns a zero-length result, so if() has nothing to test. There is no previous row on the first pass, so the check should be skipped when i = 1.
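A small sketch of what happens on the first pass, and of one possible guard. The matrix m here is a made-up 2 by 3 example, not the one from the question.

```r
m <- matrix(c(6, 5, 4, 3, NA, NA), nrow = 2, byrow = TRUE)
i <- 1

length(m[i - 1, 3])  # 0: row 0 selects nothing

# if() on a zero-length condition fails with the message from the question
msg <- tryCatch(
  if (is.na(m[i - 1, 3])) "previous row ended in NA",
  error = function(e) conditionMessage(e)
)
msg

# Guard: only look at the previous row when there is one.
# && short-circuits, so m[i - 1, 3] is never evaluated for i = 1.
if (i > 1 && is.na(m[i - 1, 3])) {
  m[i, ] <- NA
}
```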