Generating new random variable with for loop - r

A TA wants me to create a new random variable Y_n = sum(X_i), where the X_i are n binomial random variables with N = 4 and p = 1/3. This wasn't too bad; I just used the following for loop:
for(i in 1:100){yn[i] <- c(sum(rbinom(i, 4, (1/3))))}
However, he then wants me to recreate Y_n for every tenth number from 1 to 10,000 (i.e., 10, 20, 30, ..., 9990, 10000). I tried to use this code:
yseq <- seq(10, 10000, by=10)
for(i in yseq){
Y2[i] <- c(sum(rbinom(i,4,(1/3))))}
It sorta works, but not really. It returns a list (I checked its class) with seemingly correct values, but a bunch of NAs. This has created two problems for me: 1) R won't let me reclass the list as a vector, and 2) R tells me that the list is length 1, which is a bunch of rubbish.
Can someone please tell me where I am going wrong? I've said it before: programming is not my forte, but I am always doing my best to learn!
Thanks!
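One likely source of the NAs is that Y2[i] uses the sample size i (10, 20, 30, ...) as the storage position, so every position in between stays empty. A minimal sketch, assuming Y2 is meant to be a plain numeric vector with one entry per value in yseq (the counter k and the seed are illustrative, not from the question):

```r
set.seed(1)                              # illustrative seed, for reproducibility only
yseq <- seq(10, 10000, by = 10)

# Preallocate one slot per sample size instead of indexing by the size itself
Y2 <- numeric(length(yseq))
for (k in seq_along(yseq)) {
  # k is the storage position, yseq[k] is the number of binomial draws
  Y2[k] <- sum(rbinom(yseq[k], size = 4, prob = 1/3))
}

length(Y2)   # one value per element of yseq
anyNA(Y2)    # no gaps, because positions run 1, 2, 3, ...
```

Because the sum of n independent Binomial(4, 1/3) variables is Binomial(4n, 1/3), rbinom(1, 4 * n, 1/3) would draw from the same distribution without summing.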

Related

Replacing the first n values of each R dataframe column according to function

I'm trying to compare a "regular" data-set to a contaminated one, but I'm having trouble creating the contaminated data-set.
Each list contains 25 data-frames, one for each sample size n; each data-frame contains m = 850 samples of size n = {100, 200, ..., 2500} from an exponential distribution.
I have tried replacing the first n/4 items of each sample in each data-frame.
The current way I am doing it adds extra entries to the contaminated data-frames, which I do not want - I merely wish to replace them.
However, if I switch c(j) with c(1:n/4), an error pops up saying replacement has 25 rows, data has 100.
What could I do better?
set.seed(915)
n_lst <- seq(from = 100, to = 2500, by=100)
m_lst <- seq(from=1, to=850, by=1)
l = list()
lCont = list()
i=1
for (n in n_lst) {
l[[i]] = lCont[[i]] = data.frame(replicate(850, rexp(n, 0.73)))
for (j in m_lst) {
lCont[[i]][c(j), c(1:n/4)] = rexp(n/4, 0.01)
}
i <-i+1
}
Below are the original list and the contaminated list (sorry about the formatting issues; I was having trouble with the formatting verification).
Original List
Contaminated List
The main problem is that you are indexing using [columns, rows], which is backwards. R indexes data frames and matrices as [rows, columns]. Switching to lCont[[i]][1:(n / 4), j] will solve that.
Also note that : comes early in R's order of operations, so you want 1:(n / 4), not 1:n / 4 (which computes (1:n) / 4).
And a last comment, c() is only needed if you're combining more than one thing, like c(1:5, 12). c(j) is a long way to write j.
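Putting those three points together, the loop might look like the sketch below. The sizes are shortened here (5 sample sizes and m = 10 samples rather than 25 and 850) purely to keep the illustration quick; that reduction is an assumption of this example, not part of the question.

```r
set.seed(915)
n_lst <- seq(from = 100, to = 500, by = 100)  # shortened for illustration
m <- 10                                       # the question uses 850

l <- list()
lCont <- list()
for (i in seq_along(n_lst)) {
  n <- n_lst[i]
  # n rows (observations) by m columns (samples)
  l[[i]] <- data.frame(replicate(m, rexp(n, 0.73)))
  lCont[[i]] <- l[[i]]
  for (j in seq_len(m)) {
    # [rows, columns]: overwrite the first n/4 rows of column j
    lCont[[i]][1:(n / 4), j] <- rexp(n / 4, 0.01)
  }
}

# The precedence pitfall: 1:n / 4 is (1:n) / 4, which still has n elements
length(1:8 / 4)    # 8
length(1:(8 / 4))  # 2
```

With the indices in [rows, columns] order the contaminated data frames keep the same dimensions as the originals, and only the first n/4 rows of each column change.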

In R, is it possible to use a pair, tuple or equivalent in a matrix?

I am trying to create a matrix of coordinates (indexes) from which I randomly pick one using the sample function. I then use these to select a cell in another matrix. What is the best way to do this? The trouble is how to store these integers in the matrix so that they are easy to separate. Right now I have them stored as strings with a comma, which I then split. Someone suggested I use a pair, or a string, but I cannot seem to get these to work with a matrix. Thanks!
EDIT: What I currently have looks like this (changed a little to make sense out of context):
probs <- matrix(c(0,0,0.6,0,0,
0,0.7,1,0.7,0,
0.6,1,0,1,0.6,
0,0.7,1,0.7,0,
0,0,0.6,0,0),5,5)
cordsMat <- matrix("",5,5)
for (x in 1:5){
for (y in 1:5){
cordsMat[x,y] = paste(x,y,sep=",")
}
}
cords <- sample(cordsMat,1,,probs)
cordsVec <- unlist(strsplit(cords,split = ","))
cordX <- as.numeric(cordsVec[1])
cordY <- as.numeric(cordsVec[2])
otherMat[cordX,cordY]
It sort of works, but I would also be interested in a better way, as this will get repeated a lot.
If you want to set the probabilities, this can easily be done by passing them to sample through its prob argument:
# creating the matrix
matrix(sample(rep(1:6, 15:20), 25), 5) -> other.mat
# set the probs vec
probs <- c(0,0,0.6,0,0,
0,0.7,1,0.7,0,
0.6,1,0,1,0.6,
0,0.7,1,0.7,0,
0,0,0.6,0,0)
# the coordinates matrix
mat <- as.matrix(expand.grid(1:nrow(other.mat),1:ncol(other.mat)))
# sampling a row index randomly (sample over the rows of mat, not its 50 elements)
sample(nrow(mat), 1, prob=probs) -> rand
# getting the value
other.mat[mat[rand,1], mat[rand,2]]
[1] 6
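A variation on the same idea skips the coordinate matrix entirely: sample a single linear (column-major) index and, only if the row and column are needed separately, convert it with arrayInd. This is a sketch under the assumption that probs has the same shape as the matrix being indexed; other.mat below is a hypothetical stand-in filled with 1:25.

```r
set.seed(42)                      # illustrative seed
probs <- matrix(c(0,   0,   0.6, 0,   0,
                  0,   0.7, 1,   0.7, 0,
                  0.6, 1,   0,   1,   0.6,
                  0,   0.7, 1,   0.7, 0,
                  0,   0,   0.6, 0,   0), 5, 5)
other.mat <- matrix(1:25, 5, 5)   # hypothetical stand-in for the target matrix

# Draw one linear index, weighted by the matching cell of probs
idx <- sample.int(length(probs), 1, prob = as.vector(probs))

# A matrix can be indexed directly by a linear index ...
other.mat[idx]

# ... or the index can be converted to a (row, column) pair
rc <- arrayInd(idx, dim(probs))
other.mat[rc[1], rc[2]]
```

Both lookups return the same cell, so no string splitting or pair type is needed.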

Indexing variables in R

I am normally a Maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In Maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of different length. My incorrect code looks like this
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)
I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
require(MASS)
sims<-10
# Because R is not zero-indexed, add one (so that 1:M[k] never becomes 1:0)
M<-rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create a list
x <- list()
X <- list()
for(k in 1:sims){
x[[k]]<-rep(NA,M[k])
X[[k]]<-rep(NA,M[k])
for(i in 1:M[k]){
x[[k]][i]<-runif(1,min=0,max=1)
if(x[[k]][i]>=0 & x[[k]][i]<=0.1056379){
X[[k]][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[[k]][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
This will work, and I think it is what you were trying to do, BUT it is not great R code. I strongly recommend using the lapply family instead of for loops, and learning data.table and parallelisation if you need things to scale. Additionally, if you want to read more about indexing and subsetting in R, Hadley Wickham has a comprehensive breakdown here.
Hope this helps!
Let me start with a few remarks and then show you how your problem can be solved using R.
In R, there is usually no need for a for loop in order to assign several values to a vector. For example, to fill a vector of length 100 with uniformly distributed random variables, you could do something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead:
set.seed(1234)
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
The reason that x[k]<-rep(NA,M[k]) does not work is that indeed x[k] is not a valid variable name in R. [ is used for indexing, so x[k] extracts the element k from a vector x. Since you try to assign a vector of length larger than 1 to a single element, the assignment cannot work as intended: R warns that the number of items to replace does not match, or stops with an error if x does not exist yet. What you probably want to use is a list, as you will see in the example below.
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
  x <- runif(m, min = 0, max = 1)
  # as in the question: x <= 0.1056379 selects the first log-normal distribution
  X <- ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
              rlnorm(m, 8.910837, 1.1890874))
  X
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: It creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores that in x. Then it creates a vector X that contains log normally distributed random variables. The parameters of the distribution depend on the value x.
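To see what the returned list looks like, here is a small self-contained sketch. The vector M is a hypothetical stand-in with tiny counts so the output is easy to inspect, and the branch condition follows the question (x <= 0.1056379 selects the first log-normal).

```r
set.seed(1234)

# Same idea as calculate_X above, written out so this block runs on its own
calculate_X <- function(m) {
  x <- runif(m, min = 0, max = 1)
  ifelse(x <= 0.1056379,
         rlnorm(m, 6.228244, 0.3565041),
         rlnorm(m, 8.910837, 1.1890874))
}

M <- c(3, 5, 2)               # hypothetical counts, instead of rnegbin() draws
X <- lapply(M, calculate_X)   # one vector per element of M

lengths(X)   # each vector has length M[k]
X[[3]]       # the third vector, of length M[3]
X[[3]][2]    # its second element, the analogue of Maple's v[3][2]
```

The double brackets pick a vector out of the list, and a following single bracket picks an element out of that vector.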

Issues with nested while loop in for loop for R

I am using R to code simulations for a research project I am conducting in college. After creating relevant data structures and generating data, I seek to randomly modify a proportion P of observations (in increments of 0.02) in a 20 x 20 matrix by some effect K. In order to randomly determine the observations to be modified, I sample a number of integers equal to P*400 twice to represent row (rRow) and column (rCol) indices. In order to guarantee that no observation will be modified more than once, I perform this algorithm:
I create a matrix, alrdyModded, that is 20 x 20 and initialized to 0s.
I take the first value in rRow and rCol, and check whether alrdyModded[rRow[1]][rCol[1]]==1; WHILE alrdyModded[rRow[1]][rCol[1]]==1, I randomly select new integers for the indices until it ==0
When alrdyModded[rRow[1]][rCol[1]]==0, modify the value in a treatment matrix with same indices and change alrdyModded[rRow[1]][rCol[1]] to 1
Repeat for the entire length of rRow and rCol vectors
I believe a good method to perform this operation is a while loop nested in a for loop. However, when I enter the code below into R, I receive the following error code:
R CODE:
propModded<-1.0
trtSize<-2
numModded<-propModded*400
trt1<- matrix(rnorm(400,0,1),nrow = 20, ncol = 20)
cont<- matrix(rnorm(400,0,1),nrow = 20, ncol = 20)
alrdyModded1<- matrix(0, nrow = 20, ncol = 20)
## data structures for computation have been intitialized and filled
rCol<-sample.int(20,numModded,replace = TRUE)
rRow<-sample.int(20,numModded,replace = TRUE)
## indices for modifying observations have been generated
for(b in 1:numModded){
while(alrdyModded1[rRow[b]][rCol[b]]==1){
rRow[b]<-sample.int(20,1)
rCol[b]<-sample.int(20,1)}
trt1[rRow[b]][rCol[b]]<-'+'(trt1[rRow[b]][rCol[b]],trtSize)
alrdyModded[rRow[b]][rCol[b]]<-1
}
## algorithm for guaranteeing no observation in trt1 is modified more than once
R OUTPUT
" Error in while (alrdyModded1[rRow[b]][rCol[b]] == 1) { :
missing value where TRUE/FALSE needed "
When I take out the for loop and run the code, the while loop evaluates the statement just fine, which implies an issue with accessing the correct values from the rRow and rCol vectors. I would appreciate any help in resolving this problem.
It appears you're not indexing right within the matrix. Instead of having a condition like while(alrdyModded1[rRow[b]][rCol[b]]==1){, it should read like this: while(alrdyModded1[rRow[b], rCol[b]]==1){. Matrices are indexed like this: matrix[1, 1], and it looks like you're forgetting your commas. The for-loop should be something closer to this:
for(b in 1:numModded){
while(alrdyModded1[rRow[b], rCol[b]]==1){
rRow[b]<-sample.int(20,1)
rCol[b]<-sample.int(20,1)}
trt1[rRow[b], rCol[b]]<-'+'(trt1[rRow[b], rCol[b]],trtSize)
alrdyModded1[rRow[b], rCol[b]]<-1
}
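The reason the original condition produced a missing value can be seen on a small example. With a single index, a matrix is treated as one long vector, so m[i] returns a single element, and applying [j] to that length-one result gives NA for any j greater than 1. The matrix m below is an illustrative stand-in for alrdyModded1.

```r
m <- matrix(0, nrow = 20, ncol = 20)

m[3]       # one element: the third entry in column-major order
m[3][1]    # still that element, so the condition happens to work when the column is 1
m[3][5]    # NA: a length-one vector has no fifth element
m[3, 5]    # 0: the element in row 3, column 5
```

An NA inside while() is what triggers "missing value where TRUE/FALSE needed".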
On a side note, why not make alrdyModded1 a boolean matrix (populated with just TRUE and FALSE values) with alrdyModded1<- matrix(FALSE, nrow = 20, ncol = 20) in line 7, and have the condition be just while(alrdyModded1[rRow[b], rCol[b]]){ instead?
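If the only goal is that no cell is modified twice, the retry loop can be avoided altogether by sampling distinct cells in one step. This is a sketch of that alternative, not the approach from the answer above; it assumes the matrix has 400 cells as in the question, and the names cells and orig are introduced here for illustration.

```r
set.seed(1)                                # illustrative seed
propModded <- 0.5
trtSize <- 2
numModded <- round(propModded * 400)       # round guards against floating-point error

trt1 <- matrix(rnorm(400, 0, 1), nrow = 20, ncol = 20)
orig <- trt1                               # copy kept only to check the result

# sample.int without replacement returns distinct linear indices,
# so each chosen cell is modified exactly once
cells <- sample.int(400, numModded)
trt1[cells] <- trt1[cells] + trtSize

sum(trt1 != orig)   # number of modified cells
```

Sampling without replacement also avoids the slowdown the while loop runs into as propModded approaches 1 and free cells become rare.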

Select random element in a list of R?

a<-c(1,2,0,7,5)
Some languages have a picker function that chooses one random element from a. How can this be done in R?
# Sample from the vector 'a' 1 element.
sample(a, 1)
The above answers are technically correct:
sample(a, 1)
However, if you would like to repeat this process many times, say to imitate throwing a die, then you need to sample with replacement:
a <- c(1,2,3,4,5,6)
sample(a, 12, replace=TRUE)
Hope it helps.
Be careful when using sample!
sample(a, 1) works great for the vector in your example, but when the vector has length 1 it may lead to undesired behavior: if that single element is a number of at least 1, sample draws from 1:a instead.
So if you are trying to pick a random item from a vector of varying length, check for the case of length 1!
sampleWithoutSurprises <- function(x) {
if (length(x) <= 1) {
return(x)
} else {
return(sample(x,1))
}
}
This method avoids that surprise when your vector has length one, and it's simple.
a[sample(1:length(a), 1)]
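The length-one pitfall and the index-based workaround can be compared side by side. The helper name pick_one is hypothetical, and sample.int(length(x), 1) is used here as a close equivalent of sample(1:length(x), 1).

```r
set.seed(7)                       # illustrative seed

# Pick a random position, then look the element up by position
pick_one <- function(x) x[sample.int(length(x), 1)]

a <- c(1, 2, 0, 7, 5)
pick_one(a)                       # one of the elements of a

# With a single number of at least 1, sample() draws from 1:x instead
draws <- replicate(200, sample(10, 1))
range(draws)                      # spread over 1..10, not just 10

# The index-based version always returns the single element
picks <- replicate(200, pick_one(10))
unique(picks)                     # only 10
```

Note that pick_one still fails on an empty vector, because there is no position to draw.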
Read this article about generating random numbers in R.
http://blog.revolutionanalytics.com/2009/02/how-to-choose-a-random-number-in-r.html
You can use sample in this case
sample(a, 1)
The second argument specifies that you want only one random element.
To generate a number within some range, the runif function is useful.
An alternative is to select an item from the vector using runif, i.e.
a <- c(1,2,0,7,5)
a[runif(1,1,6)]
Let's say you want a function that picks one each time it is run (useful in a simulation, for example). So
a <- c(1,2,0,7,5)
sample_fun_a <- function() sample(a, 1)
runif_fun_a <- function() a[runif(1,1,6)]
microbenchmark::microbenchmark(sample_fun_a(),
runif_fun_a(),
times = 100000L)
Unit: nanoseconds
sample_fun_a() - 4665
runif_fun_a() - 1400
runif seems to be quicker in this example.
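The runif approach works because R truncates a fractional index toward zero, so a value drawn from the interval between 1 and 6 always maps to a position from 1 to 5. A short sketch of that mechanism, where the upper bound is written as length(a) + 1 so it adapts to the vector (an adjustment made here, not in the answer above):

```r
set.seed(1)                          # illustrative seed
a <- c(1, 2, 0, 7, 5)

u <- runif(1000, 1, length(a) + 1)   # real numbers strictly between 1 and 6
idx <- floor(u)                      # the positions R effectively uses

range(idx)                           # stays within 1..length(a)
identical(a[u], a[idx])              # fractional indexing equals explicit floor()
table(idx)                           # roughly equal counts per position
```

Hard-coding the upper bound as 6 ties the function to vectors of length 5, which is worth keeping in mind when weighing the speed difference.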

Resources