The experiment involves rolling a fair die to get a number x, then tossing a fair coin x times and recording the number of tails. I need to run this experiment 100 times and record the outcomes in a vector (which I'll then use to plot a histogram).
This is my code so far:
for (i in 1:100)
{X <- sample(6,1,replace=TRUE,c(1,1,1,1,1,1)/6)
Y <- sample(2,1,replace=TRUE,c(1,1)/2)}
Youtcomes <- c(sum(Y))
Youtcomes
But instead of giving me a vector with 100 elements, I keep getting just a single number. Where am I going wrong?
Note: I have to use a for loop.
Use the fact that R is vectorized. You can then use a binomial distribution to replicate the coin toss.
heads <- rbinom(size = sample(6,100, replace = TRUE), n=100, prob = 0.5)
sum(heads)
Perhaps I have missed something, but what is wrong with one call to sample() to do the 100 rolls of the die, and then plugging that into rbinom() to do the coin tosses? We pass the output from sample() to the size argument:
> set.seed(1)
> rbinom(100, size = sample(6, 100, replace = TRUE), prob = 0.5)
[1] 1 1 1 6 1 2 2 2 3 1 2 1 2 1 1 0 3 1 1 3 6 1 2 0 2 1 1 1 2 2 2 1 0 1 4 3 3
[38] 1 5 2 3 2 2 1 3 2 0 2 1 4 2 3 1 1 1 0 1 1 1 1 2 2 1 2 3 1 0 2 1 2 2 4 2 1
[75] 1 5 3 2 3 5 1 2 3 1 4 0 3 1 2 1 1 0 1 5 2 3 0 2 2 3
Disclaimer: this is a very inefficient solution; see mnel's/Gavin's solutions.
As the many comments under each answer show, this answer attempts to address the OP's specific question (however inefficient the requirements may be). In the spirit of maintaining the decorum of the forum, some have (rightly) pointed out that the question is in bad taste and that my answer doesn't do justice to the forum's requirements. I accept all criticism and leave the answer here only for the obvious reasons (it is marked as the answer, and for continuity). I suggest you look at mnel's/Gavin's answers for a vectorised solution to this specific problem. If you're interested in a for-loop implementation, refer to the bottom of this post, but use it to learn the structure of a for-loop rather than to solve this specific problem. Thank you.
Your code has quite a few problems, apart from the main problem @Joshua already mentioned:
First, you overwrite the values of X and Y on every pass through the loop, so at the end of the loop only the last value of Y is summed.
Second, your code for Y is not correct. You say you need x coin tosses, yet you use sample(2, 1, ...). The 1 must be replaced with X, the number from the die roll.
Try out this code instead:
Youtcomes <- sapply(1:100, function(x) {
X <- sample(1:6, 1, replace=TRUE, rep(1,6)/6)
Y <- sample(c("H", "T"), X, replace=TRUE, rep(1,2)/2)
sum(Y == "T")
})
Here, we loop 100 times; each time, we sample a value between 1 and 6 and store it in X. Then we sample either a head (H) or a tail (T) X times and store the result in Y.
Now, sum(Y == "T") gives the number of tails for the current iteration (x runs from 1 to 100). So, at the end, Youtcomes holds the simulated tail counts.
Then, you can do a hist(Youtcomes).
Edit: If it's a for-loop solution that's desired, then:
# Always preallocate the vector you'll index inside the for-loop;
# otherwise the object grows on every iteration and a copy of the
# entire object is made for every i, which is extremely
# slow/inefficient.
Youtcomes <- rep(0, 100)
for (i in 1:100) {
    X <- sample(1:6, 1, replace=TRUE, rep(1,6)/6)
    Y <- sample(c("H", "T"), X, replace=TRUE, rep(1,2)/2)
    # assign output inside the loop with [i] indexing
    Youtcomes[i] <- sum(Y == "T")
    # Since Youtcomes was filled with 100 zeros beforehand,
    # each value replaces the 0 at position i. Thus the object
    # is not copied every time, which is faster/more efficient.
}
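As a rough sanity check on the loop above (an added sketch, not part of the original answer): given a roll of x, the expected number of tails is x/2, so the long-run average should be close to 3.5/2 = 1.75. The seed, the repetition count n_rep and the variable names below are arbitrary choices made for illustration.

```r
# Sketch: same loop structure as above, with more repetitions so the
# sample mean can be compared against the theoretical value 1.75.
set.seed(42)                  # arbitrary seed, only for reproducibility
n_rep <- 10000                # assumed repetition count for the check
tails <- numeric(n_rep)       # preallocate the result vector
for (i in seq_len(n_rep)) {
    rolls <- sample(6, 1)                               # one die roll
    tosses <- sample(c("H", "T"), rolls, replace = TRUE) # that many coin tosses
    tails[i] <- sum(tosses == "T")                      # count the tails
}
mean(tails)   # should land near 1.75
```

With only 100 repetitions, as in the question, the average will fluctuate more around that value.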
Arun beat me to it. But another of the many, many ways could be (if I understand your desired outcome correctly):
X <- sample(6, 100, replace=TRUE, c(1,1,1,1,1,1)/6)
# factor() keeps both levels, so a run with no tails still reports T = 0
Y <- lapply(X, function(x){ res <- sample(c("H", "T"), x, replace=TRUE, c(1,1)/2); table(factor(res, levels=c("H", "T"))) })
To plot a histogram of the results:
res <- unlist(Y)
hist( res[names( res )=="T"] )
100 people are watching a show at a theater. At the end of the show, all of them go to the cloakroom to collect their coats. The attendant working there hands the coats back completely at random. Those who receive the right coat leave. Those who receive the wrong one give it back, and the attendant again hands out the coats at random. The process ends when every customer has received their own coat.
I want to simulate this martingale process in R in order to find the expected time at which the process ends.
But I don't know how. Any help?
Something like:
# 100 customers
x = seq(1,100,by=1);x
# random sample from x
y = sample(x,100,replace=FALSE)
x==y
# for the next iteration, exclude those who are TRUE and run it again until everyone is TRUE
The expected time is how many iterations were needed.
Or something like this :
n = 100
X = seq(1,100,by=1)
martingale = rep(NA,n)
iterations = 0
accept = 0
while (X != n) {
iterations = iterations + 1
y = sample(1:100,100,replace=FALSE)
if (X = y){
accept = accept + 1
X = X+1
martingale [X] = y
}
}
accept
iterations
One way to do this is as follows (using 10 people as an example; the print statement is unnecessary and only shows what happens in each iteration):
set.seed(0)
x <- 1:10
count <- 0
while(length(x) > 0){
x <- x[x != sample(x)]
print(x)
count <- count + 1
}
# [1] 1 2 3 4 5 6 7 9 10
# [1] 3 4 5 6 7 9
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 6
# integer(0)
count
# [1] 10
At each step of the loop, it removes the values of x for the customers who were randomly handed their own coat, until none are left.
To use this code to get the expected time taken for 100 people, you could extend it to:
set.seed(0)
nits <- 1000 #simulate the problem 1000 times
count <- 0
for (i in 1:nits){
x <- 1:100
while(length(x) > 0){
x <- x[x != sample(x)]
count <- count + 1/nits
}
}
count
# [1] 99.901
I hypothesise without proof that the expected time for n people is n iterations - it seems pretty close when I tried with 50, 100 or 200 people.
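One observation consistent with this hypothesis (an added note, not part of the original answer): in a uniformly random permutation of m coats, each person receives their own coat with probability 1/m, so the expected number of correct coats in a round is m * (1/m) = 1, whatever m is. The sketch below checks this empirically; the helper name matches_per_round, the seed and the sizes tried are assumptions made for illustration.

```r
# Sketch: average number of people who get their own coat in ONE round,
# for several group sizes. Theory says the expectation is 1 for any size.
set.seed(1)        # arbitrary seed, only for reproducibility
n_sim <- 10000     # assumed number of simulated rounds per group size

# Hypothetical helper: number of fixed points in a random permutation of 1..m
matches_per_round <- function(m) sum(sample(m) == seq_len(m))

sizes <- c(5, 20, 100)
avg_matches <- sapply(sizes, function(m) mean(replicate(n_sim, matches_per_round(m))))
avg_matches        # each entry should be close to 1
```

If roughly one person leaves per round on average, needing about n rounds for n people is what one would expect.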
I didn't follow your discussion above and I'm not entirely sure if that's what you want, but my rationale was as follows:
You have N people and queue them.
In the first round the first person has a chance of 1/N to get their clothes right.
At this point there are two options: either person 1 gets their clothes right or not.
If person 1 gets their clothes right, then person 2 has a chance of 1/(N-1) to get their clothes right. If person 1 didn't get the correct clothes, person 1 remains in the pool (at the end), and person 2 also has a 1/N probability to get their clothes right.
You continue to assign these probabilities until all N persons have seen the clerk once. Then you sort out those who have the right clothes and repeat from step 1 until everyone has their clothes right.
For simulation purposes, you'd of course repeat the whole thing 1000 or 10000 times.
If I understand you correctly, you are interested in the number of iterations, i.e. how often the clerk has to go through the whole queue (or what remains of it) until everyone has their clothes.
library(tidyverse)
people <- 100
results <- data.frame(people = 1:people,
iterations = NA)
counter <- 0
finished <- 0
while (finished < people)
{
loop_people <- results %>%
filter(is.na(iterations)) %>%
pull(people)
loop_prob <- 1/length(loop_people)
loop_correct <- 0
for (i in 1:length(loop_people))
{
correct_clothes_i <- sample(c(0,1), size = 1, prob = c(1-loop_prob, loop_prob))
if (correct_clothes_i == 1)
{
results[loop_people[i], 2] <- counter + 1
loop_correct <- loop_correct + 1
loop_prob <- 1/(length(loop_people) - loop_correct)
}
}
counter <- counter + 1
finished <- length(which(!is.na(results$iterations)))
}
max(results$iterations)
[1] 86
head(results)
people iterations
1 1 7
2 2 42
3 3 86
4 4 67
5 5 2
6 6 9
The results$iterations column contains the iteration number where each person has gotten their clothes right, thus max(results$iterations) gives you the total number of loops.
I have no proof, but empirically and intuitively the number of required iterations should approach N.
I'd like to use the uniform distribution to randomly assign the value 1 or 2 in five groups (i.e. generate 5 sets of uniform random numbers), with each group containing 10 samples.
I tried to write:
for(i in 1:5){
rf <- runif(10)
result[rf<=0.5]=1
result[rf>0.5]=2
}
However this will replace the previously assigned values when the loop goes on.
The code produces only 10 results:
1 2 1 2 2 1 1 1 2 1
But I want a total of 50 randomized values:
1 2 1 2 ...... 2 1 1
How to do this? Thank you
Since you are drawing random numbers from the same distribution every time, it is better to generate all 50 numbers at once and assign the values using the ifelse function.
Try this:
a <- ifelse(runif(50) <= 0.5, 1, 2)
dim(a) <- c(10,5) # optional: reshape the result into a 10 x 5 matrix
To add to Gregor Thomas' advice, sample... You can also convert the stream into a matrix of 5 columns (groups) of 10.
nums <- sample(1:2, 50, replace = TRUE)
groups <- matrix(nums, ncol = 5)
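If the loop from the question has to be kept, a minimal sketch of the fix is to preallocate a vector of length 50 and write each group's 10 values into its own slice, so later iterations no longer overwrite earlier ones. The names n_groups, n_per_group and idx are introduced here for illustration only.

```r
# Sketch: keep the for-loop, but give every group its own index range.
n_groups <- 5
n_per_group <- 10
result <- numeric(n_groups * n_per_group)   # preallocate 50 slots

for (i in seq_len(n_groups)) {
    rf <- runif(n_per_group)
    # positions 1-10 for group 1, 11-20 for group 2, and so on
    idx <- ((i - 1) * n_per_group + 1):(i * n_per_group)
    result[idx] <- ifelse(rf <= 0.5, 1, 2)
}
result
```

The vectorised answers above remain the simpler option; this only shows why the original loop returned 10 values instead of 50.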
I would like to use the vector:
time.int<-c(1,2,3,4,5) # vector to be used as a "guide"
and the database:
time<-c(1,1,1,1,5,5,5)
value<-c("s","s","s","t","d","d","d")
dat1<- as.data.frame(cbind(time,value))
to create the following vector, which I can then add to the first vector "time.int" into a second database.
freq<-c(4,0,0,0,3) #wished result
This vector is the count of events that fall in each time interval: there are four 1s in "time", so the first value is 4, and so on.
Ideally I would like to generalize this so that I can choose the interval width, for example counting in a new vector the events in "time" for every 3 values of time.int.
EDIT for generalization
time.int<-c(1,2,3,4,5,6)
time<-c(1,1,1,2,5,5,5,6)
value<-c("s","s","s","t", "t","d","d","d")
dat1<- data.frame(time,value)
let's say I want it every 2 seconds (every 2 time.int)
freq<-c(4,0,4) #wished result
or every 3
freq<-c(4,4) #wished result
I know how to do that in excel, with a pivot table.
Sorry if this is a duplicate; I could not find a fitting question on this website, and I do not even know how to ask this or where to start.
The following will produce vector freq.
freq <- sapply(time.int, function(x) sum(x == time))
freq
[1] 4 0 0 0 3
BTW, don't use the construct as.data.frame(cbind(.)). Use instead
dat1 <- data.frame(time, value)
In order to generalize the code above to segments of time.int of any length, I believe the following function will do it. Note that since you've changed the data the output for n == 1 is not the same as above.
fun <- function(x, y, n){
inx <- lapply(seq_len(length(x) %/% n), function(m) seq_len(n) + n*(m - 1))
sapply(inx, function(i) sum(y %in% x[i]))
}
freq1 <- fun(time.int, time, 1)
freq1
[1] 3 1 0 0 3 1
freq2 <- fun(time.int, time, 2)
freq2
[1] 4 0 4
freq3 <- fun(time.int, time, 3)
freq3
[1] 4 4
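An alternative sketch of the same generalization, assuming every value of time appears in time.int: map each event to its position in the guide vector, turn that position into a block number, and count the blocks with tabulate. The function name count_by_block and its argument names are hypothetical.

```r
time.int <- c(1, 2, 3, 4, 5, 6)
time <- c(1, 1, 1, 2, 5, 5, 5, 6)

# Hypothetical helper: count events per block of n consecutive guide values
count_by_block <- function(guide, events, n) {
    pos <- match(events, guide)      # position of each event in the guide
    block <- ceiling(pos / n)        # block number that position falls into
    tabulate(block, nbins = length(guide) %/% n)
}

count_by_block(time.int, time, 2)  # 4 0 4
count_by_block(time.int, time, 3)  # 4 4
```

Like the function above, this drops a trailing partial block when the length of the guide is not a multiple of n.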
We can use the table function to count the event number and use merge to create a data frame summarizing the information. event_dat is the final output.
# Create example data
time.int <- c(1,2,3,4,5)
time <- c(1,1,1,1,5,5,5)
# Count the event using table and convert to a data frame
event <- as.data.frame(table(time))
# Convert the time.int to a data frame
time_dat <- data.frame(time = time.int)
# Merge the data
event_dat <- merge(time_dat, event, by = "time", all = TRUE)
# Replace NA with 0
event_dat[is.na(event_dat)] <- 0
# See the result
event_dat
time Freq
1 1 4
2 2 0
3 3 0
4 4 0
5 5 3
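If only the count vector is needed, a shorter sketch is to turn time into a factor whose levels are the guide values; table then reports a zero for every level that never occurs. This is an added variation on the table approach, not part of the original answer.

```r
time.int <- c(1, 2, 3, 4, 5)
time <- c(1, 1, 1, 1, 5, 5, 5)

# Levels come from the guide vector, so unused time points are counted as 0
freq <- as.vector(table(factor(time, levels = time.int)))
freq  # 4 0 0 0 3
```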
x <- 1
y <- 1
for (y in 1:2){
for (x in 1:2){
z <- x+y
zresults <- data.frame(x, y, z)
}
}
Hello everyone,
sorry for my dumb question, but I am new to R and this is actually my first attempt at coding.
I created a for-loop with the indices x and y, and I want to save the output values (z) together with the corresponding x and y values in a data.frame.
The code I posted is obviously wrong, but I can't see why.
The data.frame should look like that:
x y z
1 1 1 2
2 2 1 3
3 1 2 3
4 2 2 4
Thank you guys a lot in advance!
Greetings from Germany
Here's one way to do what you want to do:
zresults <- expand.grid(x=1:2,y=1:2);
zresults$z <- zresults$x + zresults$y;
zresults;
## x y z
## 1 1 1 2
## 2 2 1 3
## 3 1 2 3
## 4 2 2 4
Notes on your attempt:
The initial assignments to x and y are not necessary. The values are overwritten on the first iteration of each respective loop with the first value of the RHS vector (1 in each case). Also worth noting is that, unlike languages like C/C++ and Java, in R you don't have to declare variables; any variable name can be assigned a value at any time.
In your inner loop you're assigning zresults. After the first iteration, you are overwriting the previous value that existed for zresults. If you want to "build up" a data.frame one row at a time, you can use the following solutions, although note that performance will not be ideal with these approaches:
zresults[nrow(zresults)+1L,] <- c(x,y,z);
or
zresults <- rbind(zresults,c(x,y,z));
Also note that zresults would have to be initialized first, prior to the build-up loop; for example:
zresults <- data.frame(x=integer(),y=integer(),z=integer());
In general, try to avoid for-loops in R. Instead, vectorization is preferred. There are many good sources on this; for example, see http://www.noamross.net/blog/2014/4/16/vectorization-in-r--why.html and http://alyssafrazee.com/vectorization.html.
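If the loops must stay, one common middle ground is to collect each row in a preallocated list and bind everything once at the end, which avoids copying the growing data.frame on every iteration. The sketch below assumes the same 2 x 2 grid as the question; the names rows and k are introduced here for illustration.

```r
# Sketch: store one small data.frame per iteration, then rbind them once.
rows <- vector("list", 4)   # preallocate one slot per (x, y) combination
k <- 0
for (y in 1:2) {
    for (x in 1:2) {
        k <- k + 1
        rows[[k]] <- data.frame(x = x, y = y, z = x + y)
    }
}
zresults <- do.call(rbind, rows)   # combine all rows in a single step
zresults
```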
Here is another solution
x = 1
y = 1
result = NULL
for (y in 1:2) {
for (x in 1:2) {
z = x + y
if (is.null(result)) {
result = data.frame(x,y,z)
} else {
result = rbind(result, data.frame(x,y,z))
}
}
}
result
I have Valence Category for word stimuli in my psychology experiment.
1 = Negative, 2 = Neutral, 3 = Positive
I need to sort the thousands of stimuli with a pseudo-randomised condition.
Val_Category cannot have more than 2 of the same valence stimuli in a row i.e. no more than 2x negative stimuli in a row.
for example - 2, 2, 2 = not acceptable
2, 2, 1 = ok
I can't sequence the data i.e. decide the whole experiment will be 1,3,2,3,1,3,2,3,2,2,1 because I'm not allowed to have a pattern.
I tried the dplyr package and functions such as sample, order and sort, but nothing so far solves the problem.
I think there's a thousand ways to do this, none of which are probably very pretty. I wrote a small function that takes care of the ordering. It's a bit hacky, but it appeared to work for what I tried.
To explain what I did, the function works as follows:
1. Take the vector of valences and sample from it.
2. If sequences are found that are longer than the desired length, then (for each such sequence) take the last value of that sequence and place it "somewhere else".
3. Check if the problem is solved. If so, return the reordered vector. If not, go back to step 2.
# some vector of valences
val <- rep(1:3,each=50)
pseudoRandomize <- function(x, n){
# take an initial sample
out <- sample(x)  # sample the argument x, not the global 'val'
# check if the sample is "bad" (containing sequences longer than n)
bad.seq <- any(rle(out)$lengths > n)
# length of the whole sample
l0 <- length(out)
while(bad.seq){
# get lengths of all subsequences
l1 <- rle(out)$lengths
# find the bad ones
ind <- l1 > n
# take the last value of each bad sequence, and...
for(i in cumsum(l1)[ind]){
# take it out of the original sample
tmp <- out[-i]
# pick new position at random
pos <- sample(2:(l0-2),1)
# put the value back into the sample at the new position
out <- c(tmp[1:(pos-1)],out[i],tmp[pos:(l0-1)])
}
# check if bad sequences (still) exist
# if TRUE, then 'while' continues; if FALSE, then it doesn't
bad.seq <- any(rle(out)$lengths > n)
}
# return the reordered sequence
out
}
Example:
The function may be used on a vector with or without names. If the vector was named, then these names will still be present on the pseudo-randomized vector.
# simple unnamed vector
val <- rep(1:3,each=5)
pseudoRandomize(val, 2)
# gives:
# [1] 1 3 2 1 2 3 3 2 1 2 1 3 3 1 2
# when names assigned to the vector
names(val) <- 1:length(val)
pseudoRandomize(val, 2)
# gives (first row shows the names):
# 1 13 9 7 3 11 15 8 10 5 12 14 6 4 2
# 1 3 2 2 1 3 3 2 2 1 3 3 2 1 1
This property can be used for randomizing a whole data frame. To achieve that, the "valence" vector is taken out of the data frame, and names are assigned to it either by row index (1:nrow(dat)) or by row names (rownames(dat)).
# reorder a data.frame using a named vector
dat <- data.frame(val=rep(1:3,each=5), stim=rep(letters[1:5],3))
val <- dat$val
names(val) <- 1:nrow(dat)
new.val <- pseudoRandomize(val, 2)
new.dat <- dat[as.integer(names(new.val)),]
# gives:
# val stim
# 5 1 e
# 2 1 b
# 9 2 d
# 6 2 a
# 3 1 c
# 15 3 e
# ...
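To confirm that a reordered vector really satisfies the constraint, a small check based on rle can be run on the result. The helper name respects_max_run is hypothetical and not part of the answer above.

```r
# Hypothetical helper: TRUE if no value occurs more than n times in a row
respects_max_run <- function(x, n) {
    all(rle(unname(x))$lengths <= n)
}

respects_max_run(c(2, 2, 1, 3, 3, 1), 2)  # TRUE
respects_max_run(c(2, 2, 2, 1), 2)        # FALSE
```

It can be applied to the output of any of the approaches in this thread.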
I believe this loop will set the valence categories appropriately. I've called the valence category column treat.
#Generate example data
s1 = data.frame(id=c(1:10),treat=NA)
#Setting the first two rows
s1[1,"treat"] <- sample(1:3,1)
s1[2,"treat"] <- sample(1:3,1)
#Looping through the remainder of the rows
for (i in 3:length(s1$id))
{
s1[i,"treat"] <- sample(1:3,1)
#Check if the treat value is equal to the previous two values.
if (s1[i,"treat"]==s1[i-1,"treat"] & s1[i-1,"treat"]==s1[i-2,"treat"])
#If so draw one of the values not equal to that value
{
a = 1:3
remove <- s1[i,"treat"]
a=a[!a==remove]
s1[i,"treat"] <- sample(a,1)
}
}
This solution is not particularly elegant. There may be a much faster way to accomplish this by sorting several columns or something.