R replicate function without repetition of drawn numbers - r

I'm looking for a lotto-function, meaning the drawn numbers aren't repeated. If I try either
y <- replicate(39,sample(1:39,1,replace=FALSE))
or
y <- replicate(39,sample(1:39,1,replace=TRUE))
the drawn numbers are repeating.
How can I prevent this?

Try sample(1:39, 39, replace = FALSE). Check ?sample.

Don't use replicate for that. To get 39 draws without repeats, use
sample(1:39, size = 39, replace = FALSE) (or, making use of defaults,
sample(39)).

Work from the inside out.
sample(1:39, 1, replace = FALSE)
picks one number from 1:39 uniformly at random. The replace = FALSE serves no purpose as you are only drawing one number anyway.
Now
replicate(39, sample(1:39, 1, replace = FALSE))
just replicates that 39 times. So there's no reason to expect there to be no duplicates.
You don't say exactly what lotto game you want to simulate. But the usual one is something like a Lotto 6/39 game, where 6 numbers are drawn from 1:39. To do this, use:
sample(1:39, 6, replace = FALSE)
If you want to simulate many plays, say 1000 of them, that's when you use replicate:
replicate(1000, sample(1:39, 6, replace = FALSE))
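As a quick check of the point above, here is a minimal sketch. The names draws and has_repeat are illustrative, and set.seed is only there to make the run reproducible. Each column of the result is one 6/39 play, and the check confirms that no single play contains a repeated number.

```r
set.seed(42)  # assumption: fixed seed, only for reproducibility of this sketch

# Each call to sample() draws 6 distinct numbers; replicate() repeats the
# whole draw, so the result is a 6 x 1000 matrix with one play per column.
draws <- replicate(1000, sample(1:39, 6, replace = FALSE))

# Check every play (column) for repeated numbers.
has_repeat <- apply(draws, 2, function(play) any(duplicated(play)))

dim(draws)
any(has_repeat)
```

Numbers can still recur across different plays, as in a real lottery; only repeats within a single play are ruled out.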

Related

Poisson Process algorithm in R (renewal processes perspective)

I have the following MATLAB code and I'm working to translating it to R:
nproc=40
T=3
lambda=4
tarr = zeros(1, nproc);
i = 1;
while (min(tarr(i,:))<= T)
tarr = [tarr; tarr(i, :)-log(rand(1, nproc))/lambda];
i = i+1;
end
tarr2=tarr';
X=min(tarr2);
stairs(X, 0:size(tarr, 1)-1);
It is the Poisson Process from the renewal processes perspective. I've done my best in R but something is wrong in my code:
nproc<-40
T<-3
lambda<-4
i<-1
tarr=array(0,nproc)
lst<-vector('list', 1)
while(min(tarr[i]<=T)){
tarr<-tarr[i]-log((runif(nproc))/lambda)
i=i+1
print(tarr)
}
tarr2=tarr^-1
X=min(tarr2)
plot(X, type="s")
The loop prints a random number of arrays, and only the last one is kept in tarr afterwards.
The result has to look like...
Thank you in advance. All interesting and supportive comments will be rewarded.
Adding on to the previous comment, there are a few things which are happening in the matlab script that are not in the R:
In [tarr; tarr(i, :)-log(rand(1, nproc))/lambda], as I understand it, you are appending another row to your matrix and populating it with tarr(i, :)-log(rand(1, nproc))/lambda.
You will need to use a different method, as MATLAB and R handle this kind of thing differently.
One glaring thing that stands out to me is that you seem to be treating R's tarr[i] and MATLAB's tarr(i, :) as equivalent, but they are very different. I think what you are trying to get is all the columns in a given row i, which in R is tarr[i, ].
The use of min is also different: R's min() returns the minimum of the whole matrix (just one number), whereas MATLAB's min() returns the minimum of each column. For this in R you can use colMins from the Rfast package (Rfast::colMins).
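To make the difference concrete without extra packages, here is a small base R sketch. The matrix m is made up for illustration, and apply() is used as a base R stand-in for a column-wise minimum; it is not taken from the question.

```r
# Illustrative 2 x 3 matrix (filled column by column):
#      [,1] [,2] [,3]
# [1,]    3    2    4
# [2,]    1    5    0
m <- matrix(c(3, 1, 2, 5, 4, 0), nrow = 2)

whole_min <- min(m)            # one number for the whole matrix
col_mins  <- apply(m, 2, min)  # one minimum per column, like MATLAB's min(m)
row_mins  <- apply(m, 1, min)  # one minimum per row

first_row  <- m[1, ]           # all columns of row 1, like MATLAB's m(1, :)
first_elem <- m[1]             # a single element only
```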
The stairs part is something I am not familiar with much but something like ggplot2::qplot(..., geom = "step") may work.
Now I have tried to create something that works in R but am not sure really what the required output is. But nevertheless, hopefully some of the basics can help you get it done on your side. Below is a quick try to achieve something!
nproc <- 40
T0 <- 3
lambda <- 4
i <- 1
tarr <- matrix(rep(0, nproc), nrow = 1, ncol = nproc)
while(min(tarr[i, ]) <= T0){
# Major alteration, create a temporary row from previous row in tarr
temp <- matrix(tarr[i, ] - log((runif(nproc))/lambda), nrow = 1)
# Join temp row to tarr matrix
tarr <- rbind(tarr, temp)
i = i + 1
}
# I am not sure what was meant by tarr' in the matlab script I took it as inverse of tarr
# which in matlab is tarr.^(-1)??
tarr2 = tarr^(-1)
library(ggplot2)
library(Rfast)
min_for_each_col <- colMins(tarr2, value = TRUE)
qplot(seq_along(min_for_each_col), sort(min_for_each_col), geom="step")
As you can see, I have sorted min_for_each_col so that the plot is actually a stair plot and not some random stepwise plot. I think there is a problem: in the MATLAB code, 0:size(tarr, 1)-1 gives the number of rows less 1, but I can't figure out why, if we are grabbing colMins (and there are 40 columns), we would get around 20 steps. But I might be completely misunderstanding! Also, I have changed T to T0, since in R T exists as a shorthand for TRUE and it is best not to overwrite it.
Hope this helps!
I downloaded GNU Octave today to actually run the MATLAB code. After watching the code run, I made a few tweaks to the great answer by @Croote:
nproc <- 40
T0 <- 3
lambda <- 4
i <- 1
tarr <- matrix(rep(0, nproc), nrow = 1, ncol = nproc)
while(min(tarr[i, ]) <= T0){
temp <- matrix(tarr[i, ] - log(runif(nproc))/lambda, nrow = 1) #fixed paren
tarr <- rbind(tarr, temp)
i = i + 1
}
tarr2 = t(tarr) #takes transpose
library(ggplot2)
library(Rfast)
min_for_each_col <- colMins(tarr2, value = TRUE)
qplot(seq_along(min_for_each_col), sort(min_for_each_col), geom="step")
Edit: Some extra plotting tweaks -- seems to be closer to the original
qplot(seq_along(min_for_each_col), c(1:length(min_for_each_col)), geom="step", ylab="", xlab="")
#or with ggplot2
df1 <- as.data.frame(cbind(min_for_each_col, 1:length(min_for_each_col))) # no pipe needed, so no extra package
colnames(df1)[2] <- "index"
ggplot() +
geom_step(data = df1, mapping = aes(x = min_for_each_col, y = index), color = "blue") +
labs(x = "", y = "")
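For readers without ggplot2 or Rfast, the same translation can be sketched in base R alone. This assumes, as in the code above, that the MATLAB tarr' is a transpose, so min(tarr') is the minimum of each row of tarr. The seed is an addition for reproducibility and is not part of the original scripts.

```r
set.seed(1)          # assumption: fixed seed so the sketch is reproducible
nproc  <- 40
T0     <- 3
lambda <- 4

tarr <- matrix(0, nrow = 1, ncol = nproc)
i <- 1
while (min(tarr[i, ]) <= T0) {
  # next arrival time of each process: previous time plus an Exp(lambda) gap
  tarr <- rbind(tarr, tarr[i, ] - log(runif(nproc)) / lambda)
  i <- i + 1
}

# MATLAB's min(tarr') corresponds to the minimum over each row of tarr
X <- apply(tarr, 1, min)

# counterpart of stairs(X, 0:size(tarr, 1)-1)
plot(X, 0:(nrow(tarr) - 1), type = "s", xlab = "", ylab = "")
```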
I'm not too familiar with renewal processes or matlab so bear with me if I misunderstood the intention of your code. That said, let's break down your R code step by step and see what is happening.
The first 4 lines assign numbers to variables.
The fifth line creates an array with 40 (nproc) zeros.
The sixth line (which doesn't seem to be used later) creates an empty vector with mode 'list'.
The seventh line starts a while loop. I suspect this line is supposed to say while the min value of tarr is less than or equal to T ...
or it's supposed to say while i is less than or equal to T ...
It actually takes the minimum of a single boolean value (tarr[i] <= T). Now this can work because TRUE and FALSE are treated like numbers. Namely:
TRUE == 1 # returns TRUE
FALSE == 0 # returns TRUE
TRUE == 0 # returns FALSE
FALSE == 1 # returns FALSE
However, since the value of tarr[i] depends on a random number (see line 8), the same code can behave differently each time it is executed. This might explain why the loop prints a varying number of arrays.
The eighth line overwrites tarr with the result of the computation on the right. It takes the single value tarr[i] and subtracts from it the natural log of runif(nproc)/lambda (lambda is 4), which gives 40 different values. Only the forty values from the last pass through the loop remain stored in tarr.
If you want to store all forty values from each pass through the loop, I'd suggest storing them in, say, a matrix or data frame instead. If that's what you want to do, here's an example of storing them in a matrix:
for(i in 1:nrow(yourMatrix)){
# computations
yourMatrix[i,] <- rowCreatedByComputations
}
See this answer for more info about that. Also, since it's a set number of values per run, you could keep them in a vector and simply append to the vector each loop like this:
vector <- c(vector,newvector)
The ninth line increases i by one.
The tenth line prints tarr.
The eleventh line closes the loop statement.
Then, after the loop, tarr2 is assigned 1/tarr. Again, this will be the 40 values from the last pass through the loop (line 8).
Then X is assigned the min value of tarr2.
This single value is plotted in the last line.
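The effect of the parenthesis placement in the while condition can be seen in a tiny sketch. The vector x and the name limit are made up for illustration and are not from the question.

```r
x <- c(0.5, 2.0, 4.0)   # illustrative values, not from the question
limit <- 3

# Compares first, then takes the minimum of TRUE/TRUE/FALSE (coerced to 1/1/0)
inside <- min(x <= limit)

# Takes the minimum first, then compares that single number with the limit
outside <- min(x) <= limit

inside
outside
```

Here inside is 0, which a while loop treats as FALSE, while outside is TRUE, so the two conditions stop a loop at different times.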
Also note that runif samples from the uniform distribution -- if you're looking for a Poisson distribution see: Poisson
Hope this helped! Let me know if there's more I can do to help.

Completely stumped about this error: Error in seq.default(start.at, NROW(data), by = by) : wrong sign in 'by' argument

When I run this code, it works for about 100 iterations of the for loop and then throws this error:
Error in seq.default(start.at, NROW(data), by = by) : wrong sign in 'by' argument
Here is the data that I used, and here is my code...
library(igraph)
library(zoo)
#import network data as edgelist
fake.raw.data <- read.csv("fakedata.csv")
fake.raw.data <- fake.raw.data[,2:3]
as.matrix(fake.raw.data)
#create igraph object from edglist data
fgraph <- graph_from_data_frame(fake.raw.data, directed = TRUE)
#finding the shortest paths that go through "special chain"
POI <- list()
df.vertices <- get.data.frame(fgraph, what = "vertices")
list.vertices <- as.list(df.vertices[,1])
AverageEBForPath <- function(graph = fgraph, from, to, mode = "out", chain){
browser()
asp <- all_shortest_paths(graph, from = from, to = to, mode)$res
for(i in seq_along(asp)){
if(sum(rollapply(names(asp[[i]]), length(chain), identical, chain)) == 1){
print(names(asp[[i]]))
}
}
}
AverageEBForPath(from = 32, to = V(fgraph), chain = c(32, 15, 9))
If anybody could help that would be extremely appreciated. I have been working on this for days, and I am really stuck.
Looking through the code of rollapply, there's a bit where it works out where in the array to start the rolling. The code it uses is:
start.at <- if (partial < 0)
max(-min(width[[1]]), 0) + 1
else 1
Note that in the function itself, width is a list generated from the window width that you're trying to use and the alignment you want... Given that you're passing a window width of 3 and a default alignment of "centre", the width list the function has generated for the code above is a list of three integers: [-1, 0, 1]
Which means that, using the code above, it has decided that given you're after a centre aligned window of width 3, the place to start is the second value in the data (because max(-min(width[[1]]),0) + 1 in the above code evaluates to 2).
All very reasonable, but whilst all the other instances of asp[[i]] have either 2 or 3 vertices, asp[[100]] has only one vertex (as you rightly pointed out), so it throws a bit of a fit trying to find the second one in order to start rolling through it!
I'm not entirely sure what your function is eventually going to do, so the ball's a bit in your court to work out how best to handle this. I think you've got two options given what you're seeing:
Option 1
Use the partial = TRUE setting on your rollapply, which will just always start at the first vertex no matter what (see the code snippet above!)
Option 2
Use align="left" in your rollapply. In this case, the width list we saw in the rollapply function itself would be [0, 1, 2] for a window width of 3 and start.at would evaluate to 1.
Hope that rambling and convoluted attempt at an answer helps!
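A third possibility, sketched here in base R rather than with rollapply, is to guard against paths that are shorter than the chain before scanning them. The helper contains_chain is hypothetical (it is not part of zoo or igraph), and it compares vertex names as character strings, which is an assumption about how the paths are labelled.

```r
# Hypothetical helper: does `path` contain `chain` as a consecutive run?
contains_chain <- function(path, chain) {
  path  <- as.character(path)
  chain <- as.character(chain)
  n <- length(path)
  k <- length(chain)
  # A path shorter than the chain can never contain it, so return early
  if (k == 0L || n < k) return(FALSE)
  starts <- seq_len(n - k + 1L)
  any(vapply(starts,
             function(s) all(path[s:(s + k - 1L)] == chain),
             logical(1)))
}

contains_chain(c("32", "15", "9"), c(32, 15, 9))  # full match
contains_chain("32", c(32, 15, 9))                # single-vertex path
```

Inside the loop this would be called as contains_chain(names(asp[[i]]), chain), so a one-vertex path simply yields FALSE instead of an error.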

Issues with nested while loop in for loop for R

I am using R to code simulations for a research project I am conducting in college. After creating relevant data structures and generating data, I seek to randomly modify a proportion P of observations (in increments of 0.02) in a 20 x 20 matrix by some effect K. In order to randomly determine the observations to be modified, I sample a number of integers equal to P*400 twice to represent row (rRow) and column (rCol) indices. In order to guarantee that no observation will be modified more than once, I perform this algorithm:
I create a matrix, alrdyModded, that is 20 x 20 and initialized to 0s.
I take the first value in rRow and rCol, and check whether alrdyModded[rRow[1]][rCol[1]]==1; WHILE alrdyModded[rRow[1]][rCol[1]]==1, I randomly select new integers for the indices until it ==0
When alrdyModded[rRow[1]][rCol[1]]==0, modify the value in a treatment matrix with same indices and change alrdyModded[rRow[1]][rCol[1]] to 1
Repeat for the entire length of rRow and rCol vectors
I believe a good method to perform this operation is a while loop nested in a for loop. However, when I enter the code below into R, I receive the following error code:
R CODE:
propModded<-1.0
trtSize<-2
numModded<-propModded*400
trt1<- matrix(rnorm(400,0,1),nrow = 20, ncol = 20)
cont<- matrix(rnorm(400,0,1),nrow = 20, ncol = 20)
alrdyModded1<- matrix(0, nrow = 20, ncol = 20)
## data structures for computation have been intitialized and filled
rCol<-sample.int(20,numModded,replace = TRUE)
rRow<-sample.int(20,numModded,replace = TRUE)
## indices for modifying observations have been generated
for(b in 1:numModded){
while(alrdyModded1[rRow[b]][rCol[b]]==1){
rRow[b]<-sample.int(20,1)
rCol[b]<-sample.int(20,1)}
trt1[rRow[b]][rCol[b]]<-'+'(trt1[rRow[b]][rCol[b]],trtSize)
alrdyModded[rRow[b]][rCol[b]]<-1
}
## algorithm for guaranteeing no observation in trt1 is modified more than once
R OUTPUT
" Error in while (alrdyModded1[rRow[b]][rCol[b]] == 1) { :
missing value where TRUE/FALSE needed "
When I take out the for loop and run the code, the while loop evaluates the statement just fine, which implies an issue with accessing the correct values from the rRow and rCol vectors. I would appreciate any help in resolving this problem.
It appears you're not indexing right within the matrix. Instead of having a condition like while(alrdyModded1[rRow[b]][rCol[b]]==1){, it should read like this: while(alrdyModded1[rRow[b], rCol[b]]==1){. Matrices are indexed like this: matrix[1, 1], and it looks like you're forgetting your commas. The for-loop should be something closer to this:
for(b in 1:numModded){
while(alrdyModded1[rRow[b], rCol[b]]==1){
rRow[b]<-sample.int(20,1)
rCol[b]<-sample.int(20,1)}
trt1[rRow[b], rCol[b]]<-'+'(trt1[rRow[b], rCol[b]],trtSize)
alrdyModded1[rRow[b], rCol[b]]<-1
}
On a side note, why not make alrdyModded1 a boolean matrix (populated with just TRUE and FALSE values) with alrdyModded1<- matrix(FALSE, nrow = 20, ncol = 20) in line 7, and have the condition be just while(alrdyModded1[rRow[b], rCol[b]]){ instead?
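If the only goal is that no cell is modified twice, the retry loop can be avoided altogether by sampling cell positions without replacement. This is an alternative to the approach above, not a correction of it. The names cells, original and pos are illustrative, and propModded is set to 0.5 here only as an example value.

```r
set.seed(123)            # assumption: fixed seed for a reproducible sketch
propModded <- 0.5        # example value; the question used 1.0
trtSize    <- 2
trt1       <- matrix(rnorm(400, 0, 1), nrow = 20, ncol = 20)
original   <- trt1       # copy kept only to check the result afterwards

numModded <- round(propModded * length(trt1))

# sample() without replacement returns distinct linear (column-major) indices,
# so each selected cell is guaranteed to be modified exactly once
cells <- sample(length(trt1), numModded)
trt1[cells] <- trt1[cells] + trtSize

# Row/column positions of the modified cells, in case they are needed
pos <- arrayInd(cells, dim(trt1))
```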

Dynamically adding values to dynamically created vectors

I just started learning to code in R. I have a requirement where I have to keep adding unknown number of values to different vectors (number of vectors is not known). So, I tried to implement this using -
clust_oo = c()
clust_oo[k] = c(clust_oo[k],init_dataset[k,1])
Without the [k], the above code works, but since I don't know the number of vectors/lists, I have to use [k] as a differentiator. clust_oo[1] could have the values, say, 1, 23, 45; clust_oo[2] could have other values, 4, 40; and clust_oo[3] could have the values 44, 67, 455, 885. The values are added dynamically.
Is this the right way to proceed for this?
Try:
clust_oo = c()
for(i in 1:3)
clust_oo[length(clust_oo)+1] = i
clust_oo
[1] 1 2 3
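If each k needs its own growing vector, as the question describes, a list keyed by cluster is one way to hold them. The following is a minimal sketch; add_value is a hypothetical helper, and using as.character(k) as the list name is an assumption made so that a cluster can be created on first use.

```r
# Hypothetical helper: append `value` to the vector stored for cluster `k`
add_value <- function(clusters, k, value) {
  key <- as.character(k)
  # clusters[[key]] is NULL for a key that does not exist yet,
  # and c(NULL, value) is just value
  clusters[[key]] <- c(clusters[[key]], value)
  clusters
}

clust_oo <- list()
clust_oo <- add_value(clust_oo, 1, 1)
clust_oo <- add_value(clust_oo, 1, 23)
clust_oo <- add_value(clust_oo, 2, 4)
clust_oo <- add_value(clust_oo, 1, 45)

clust_oo[["1"]]
clust_oo[["2"]]
```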

Generating new random variable with for loop

A TA wants me to create a new random variable Y_n=sum(X_i), where X_i are n binomial random variables, with N = 4 and p = 1/3. This wasn't too bad; I just use the following for loop:
for(i in 1:100){yn[i] <- c(sum(rbinom(i, 4, (1/3))))}
However, he then wants me to recreate Y_n for every tenth number from 1 to 10,000 (i.e., 10, 20, 30,...,9990,10000). I tried to use this code:
yseq <- seq(10, 10000, by=10)
for(i in yseq){
Y2[i] <- c(sum(rbinom(i,4,(1/3))))}
It sorta works, but not really. It returns a list (I checked its class) with seemingly correct values, but a bunch of NAs. This has created two problems for me: 1) R won't let me reclass the list as a vector, and 2) R tells me that the list is length 1, which is a bunch of rubbish.
Can someone please tell me where I am going wrong? I've said it before: programming is not my forte, but I am always doing my best to learn!
Thanks!
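One likely source of the gaps is that Y2[i] is indexed by the value of i (10, 20, 30, ...), so positions 1 to 9, 11 to 19 and so on are never filled. A minimal sketch that stores one result per element of yseq instead is given here. It assumes a plain numeric vector is wanted, and the seed is only for reproducibility.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility of this sketch

yseq <- seq(10, 10000, by = 10)

# One value of Y_n per n in yseq; vapply() returns a plain numeric vector
# whose k-th element belongs to yseq[k], so there are no unfilled positions
Y2 <- vapply(yseq,
             function(n) sum(rbinom(n, size = 4, prob = 1/3)),
             numeric(1))

length(Y2)
anyNA(Y2)
```

The same idea works inside a for loop by iterating over positions, for example for (k in seq_along(yseq)), and using yseq[k] as the sample size.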
