Probability using R

I have set up a game that rolls a pair of dice, using the code below, but the final part of the code is not working. That part is supposed to calculate the winning percentage by taking the average of the throws vector. The player wins if double sixes come up.
throws <- NULL
for( i in 1:24 ){
Die1 <-sample(c(1,2,3,4,5,6),size=1,replace = TRUE,prob = NULL)
Die2 <-sample(c(1,2,3,4,5,6),size=1,replace = TRUE,prob = NULL)
throw <- Die1+Die2
throws[i] <- throw
}
throws
game <- any( throws == 12 )
game
for (i in 24){
if (Die1==6 & Die2==6){
throws/2 * 100}

R is vectorized, and your problem can be solved with vectorized instructions only:
to repeat the same code a certain number of times, use replicate, not a loop (neither for nor *apply loops);
now that the throws are stored in a matrix, use the optimized colSums: it is coded in C, simple and fast;
a proportion can be computed as the mean value of a logical vector.
set.seed(2023) # make the results reproducible
nthrows <- 24L
throws <- replicate(nthrows, sample(6L, 2L, replace = TRUE))
throws
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,] 5 3 4 1 5 1 5 3 4 1 6 6 5 2
#> [2,] 1 2 2 1 1 5 2 5 5 1 2 6 1 6
#> [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
#> [1,] 6 5 6 6 6 2 3 6 5 2
#> [2,] 1 4 1 4 6 2 4 1 6 4
game <- any(colSums(throws) == 12L)
game
#> [1] TRUE
mean(colSums(throws) == 12L) * 100
#> [1] 8.333333
Created on 2023-02-16 with reprex v2.0.2
Edit
The following code gives the matrix throws row and column names. The rest of the code in the original answer remains the same.
set.seed(2023) # make the results reproducible
nthrows <- 24L
throws <- replicate(nthrows, sample(6L, 2L, replace = TRUE))
dimnames(throws) <- list(c("Die1", "Die2"),
paste0("throw", seq.int(nthrows)))
throws
#> throw1 throw2 throw3 throw4 throw5 throw6 throw7 throw8 throw9 throw10
#> Die1 5 3 4 1 5 1 5 3 4 1
#> Die2 1 2 2 1 1 5 2 5 5 1
#> throw11 throw12 throw13 throw14 throw15 throw16 throw17 throw18 throw19
#> Die1 6 6 5 2 6 5 6 6 6
#> Die2 2 6 1 6 1 4 1 4 6
#> throw20 throw21 throw22 throw23 throw24
#> Die1 2 3 6 5 2
#> Die2 2 4 1 6 4
Created on 2023-02-17 with reprex v2.0.2
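Note that the percentage computed above is the share of individual throws that came up double six within a single game. If the goal is instead the share of games won (at least one double six in 24 throws), one possible sketch is to repeat the whole game many times and average the outcomes. The names play_game, ngames, win_pct and exact_pct are illustrative and not part of the original answer. The closed-form value 1 - (35/36)^24 is included only as a sanity check.

```r
set.seed(2023)   # make the results reproducible
ngames  <- 10000L
nthrows <- 24L

# Hypothetical helper: play one game and report whether it was won,
# i.e. whether at least one of the throws was a double six.
play_game <- function(nthrows) {
  throws <- replicate(nthrows, sample(6L, 2L, replace = TRUE))
  any(colSums(throws) == 12L)
}

wins <- replicate(ngames, play_game(nthrows))

# Estimated winning percentage: mean of a logical vector, times 100.
win_pct <- mean(wins) * 100

# Exact winning percentage for comparison.
exact_pct <- (1 - (35 / 36)^nthrows) * 100

win_pct
exact_pct
```

The simulated value should land close to the exact one, which is a little over 49%.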

Related

Wrong number of data frames when identifying all possible combinations for a certain variable

I have the following data frame
Name <- c("Jhon", "Lee", "Suzan", "Abhinav",
"Brain", "Ron","Cat","Mike","Bob","Sue","Carl")
Vote <- rep(letters[1:21],each=10, len=230)
z <- as.data.frame (cbind(Name, Vote))
I want to create a list of data frames that represent all possible combinations of 6 names (out of the 11 that I have) with their respective votes, and that also have the 5 other names appended.
The following gives me all possible combinations of 6 names, which is 462
comb<-combn(unique(as.character(z$Name)), 6)
comb has 462 columns, so it is the correct output.
The following code creates the list of all data frames across the combinations.
combdf <- apply(comb, 2, function(vec) z[ z$Name %in% vec, ] )
The following code should create the output that I want
output <- z %>%
pull(Name) %>%
unique %>%
combn(., 3, FUN = function(vec)
z %>%
filter(Name %in% vec) %>%
bind_rows(z %>%
filter(!Name %in% vec) %>%
rename(Name2 = Name, Vote2 = Vote)) %>%
mutate(across(c(Name2, Vote2),
~ .[order(is.na(.))])), simplify = FALSE)
My problem is that output has 165 data frames and I expect 462. In addition, each data frame in combdf , if I am not wrong, should have 230 rows (as does my original data frame - z). However, this is not the case. For ex., number one has 226, number four has 229, number 18 has 228 (checked at random).
The m in combn determines the number of combinations as well
> combn(1:6, 2)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] 1 1 1 1 1 2 2 2 2 3 3 3 4 4 5
[2,] 2 3 4 5 6 3 4 5 6 4 5 6 5 6 6
> combn(1:6, 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 4
[2,] 2 2 2 2 3 3 3 4 4 5 3 3 3 4 4 5 4 4 5 5
[3,] 3 4 5 6 4 5 6 5 6 6 4 5 6 5 6 6 5 6 6 6
Note the difference in the number of columns. Similarly, in the OP's post, 'combdf' was created with m = 6, while the tidyverse code uses m = 3. That is what changes the count from 462 to 165.
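As a quick check of the counts mentioned above, the number of columns returned by combn equals choose(n, m). This is a small sketch, and the vector eleven_names is only a stand-in for the 11 unique names.

```r
# Stand-in for the 11 unique names (illustrative only).
eleven_names <- letters[1:11]

n_m6 <- ncol(combn(eleven_names, 6))  # combinations of 6 out of 11
n_m3 <- ncol(combn(eleven_names, 3))  # combinations of 3 out of 11

n_m6
n_m3

# The same numbers without building the combinations.
choose(11, 6)
choose(11, 3)
```

So changing the 3 to a 6 in the combn call of the tidyverse pipeline should bring the list length in line with combdf.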

How to count the number of columns with certain elements in it?

I have a matrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 4
[2,] 2 2 2 2 3 3 3 4 4 5 3 3 3 4 4 5 4 4 5 5
[3,] 3 4 5 6 4 5 6 5 6 6 4 5 6 5 6 6 5 6 6 6
I need to count the number of columns that contain both a 2 and a 3. Eyeballing it, these are columns 1, 11, 12 and 13, so the answer is 4.
For matrices, this one also works (assume mat is your matrix):
Old (this will not work if any column has two or more 2s or 3s)
sum(colSums(mat == 2L | mat == 3L) > 1L)
Revised (this one should work)
sum(colSums(mat == 2L) > 0L & colSums(mat == 3L) > 0L)
A base R option
sum(sapply(as.data.frame(M),function(v) all(c(2,3)%in%v)))
or a method by Allan Cameron
sum(apply(M, 2, function(v) all(2:3 %in% v)))
where M is the matrix
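To see the difference between the old and the revised versions, here is a small sketch. The matrix in the question matches combn(1:6, 3). The extra column c(2, 2, 5) is a made-up case added only to show where the old version overcounts.

```r
M <- combn(1:6, 3)   # reproduces the matrix from the question

# Columns that contain at least one 2 and at least one 3.
both <- colSums(M == 2L) > 0L & colSums(M == 3L) > 0L
which(both)
n_both <- sum(both)

# Made-up extra column with two 2s and no 3.
M2 <- cbind(M, c(2L, 2L, 5L))

# The old version counts the extra column, the revised one does not.
old_count <- sum(colSums(M2 == 2L | M2 == 3L) > 1L)
new_count <- sum(colSums(M2 == 2L) > 0L & colSums(M2 == 3L) > 0L)
```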

find max value's column index and row index individually in R

I find people use
which(matrix==max(matrix, na.rm=FALSE))
to show both row and column index.
But my question is: how do I extract the row index and the column index individually and then store these two values in two separate variables?
like matrix=
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 6 7 7 2 4 3 7 1 4
[2,] 1 9 8 7 2 6 10 9 5 2
[3,] 7 10 8 4 10 5 4 8 4 4
[4,] 4 3 1 1 3 3 9 7 4 2
[5,] 1 8 1 9 9 8 1 3 7 7
[6,] 2 6 7 5 6 10 4 6 15 1
The max value is matrix[6,9] = 15. How can I find row = 6 and column = 9 separately, and store 6 in a variable A and 9 in a variable B?
Thank you guys very much.
For a large matrix which.max should be more efficient than which. So, for a matrix m, we can use
A = row(m)[d <- which.max(m)]
B = col(m)[d]
Maybe a roundabout way but if the matrix is called "mat":
idx <- which(mat == max(mat))
# subtract 1 before dividing so that entries in the last row map correctly
colmax <- (idx - 1) %/% nrow(mat) + 1
rowmax <- (idx - 1) %% nrow(mat) + 1
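Another option, sketched here on the matrix from the question, is the arr.ind argument of which, which returns the row and column positions directly. The variable name pos is illustrative. If the maximum occurs more than once, pos has several rows and only the first is used below.

```r
# The 6 x 10 matrix from the question, entered row by row.
m <- matrix(c(3,  6, 7, 7,  2,  4,  3, 7,  1, 4,
              1,  9, 8, 7,  2,  6, 10, 9,  5, 2,
              7, 10, 8, 4, 10,  5,  4, 8,  4, 4,
              4,  3, 1, 1,  3,  3,  9, 7,  4, 2,
              1,  8, 1, 9,  9,  8,  1, 3,  7, 7,
              2,  6, 7, 5,  6, 10,  4, 6, 15, 1),
            nrow = 6, byrow = TRUE)

# arr.ind = TRUE returns a matrix with columns "row" and "col".
pos <- which(m == max(m), arr.ind = TRUE)

A <- pos[1, "row"]   # row index of the (first) maximum
B <- pos[1, "col"]   # column index of the (first) maximum
```

For this matrix, A is 6 and B is 9.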

List of all combinations of a minimum value using combn

Here is my data:
[,1] [,2] [,3]
[1,] 2 3 4
[2,] 2 3 5
[3,] 2 3 6
[4,] 2 4 5
[5,] 2 4 6
[6,] 2 4 2
[7,] 2 4 4
[8,] 2 4 9
[9,] 2 4 10
[10,] 2 4 3
How would I find all combinations of the values in column 3 whose sum is greater than 25? I am struggling with how to use the combn function, as its help page doesn't seem very intuitive.
If you want a non-loop version:
x <- read.table(text="2 3 4
2 3 5
2 3 6
2 4 5
2 4 6
2 4 2
2 4 4
2 4 9
2 4 10
2 4 3",stringsAsFactors=FALSE, header=FALSE)
res <- Map(combn, list(x[,3]), seq_along(x[,3]), simplify = FALSE)
unlist(res, recursive = FALSE)[lapply(unlist(res, recursive = FALSE),sum)>=25]
[[1]]
[1] 6 9 10
[[2]]
[1] 6 9 10
[[3]]
[1] 4 5 6 10
...
[[613]]
[1] 4 6 5 6 2 4 9 10 3
[[614]]
[1] 5 6 5 6 2 4 9 10 3
[[615]]
[1] 4 5 6 5 6 2 4 9 10 3
EDIT
To return rownames instead of the number vector:
rownames(x) <- paste0("row",1:10)
res <- list(Map(combn, list(x[,3]), seq_along(x[,3]), simplify = FALSE),
Map(combn, list(rownames(x)), seq_along(rownames(x)), simplify = FALSE))
unlist(res[[2]], recursive = FALSE)[lapply(unlist(res[[1]], recursive = FALSE),sum)>=25]
[[1]]
[1] "row3" "row8" "row9"
[[2]]
[1] "row5" "row8" "row9"
[[3]]
[1] "row1" "row2" "row3" "row9"
...
[[613]]
[1] "row1" "row3" "row4" "row5" "row6" "row7" "row8" "row9" "row10"
[[614]]
[1] "row2" "row3" "row4" "row5" "row6" "row7" "row8" "row9" "row10"
[[615]]
[1] "row1" "row2" "row3" "row4" "row5" "row6" "row7" "row8" "row9" "row10"
EDIT2 To get the elements of the list that match the minimum sum, in this case 25. This gives you the 42 combinations that sum to 25.
res <- Map(combn, list(x[,3]), seq_along(x[,3]), simplify = FALSE)
res3 <- unlist(res, recursive = FALSE)[lapply(unlist(res, recursive = FALSE),sum)>=25]
res3[which(rapply(res3,sum)==min(rapply(res3,sum)))]
To get the corresponding rownames as asked before:
rownames(x) <- paste0("row",1:10)
res4 <- list(Map(combn, list(x[,3]), seq_along(x[,3]), simplify = FALSE),
Map(combn, list(rownames(x)), seq_along(rownames(x)), simplify = FALSE))
unlist(res4[[2]], recursive = FALSE)[lapply(unlist(res4[[1]], recursive = FALSE),sum)>=25][which(rapply(res3,sum)==min(rapply(res3,sum)))]
The following should work for a fixed length; for all combinations with variable length one would need something more advanced (EDIT: see @PLapointe's post (which should be the accepted answer) or just a simple loop):
x <- c(4, 5, 6, 5, 6, 2, 4, 9, 10, 3)
res <- combn(x, 3)
This will return a matrix that looks like this (I only show the first entries):
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
[1,] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[2,] 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 5 5 5 5 5 5 6 6
[3,] 6 5 6 2 4 9 10 3 5 6 2 4 9 10 3 6 2 4 9 10 3 2 4
From there, you can then just select the combinations where the column sum is larger than your threshold:
res[, colSums(res) >= 25]
This will then give
[,1] [,2]
[1,] 6 6
[2,] 9 9
[3,] 10 10
As you now have duplicate entries (not sure if they are desired or not), you can simply do the following (or a simple loop):
res2 <- combn(unique(x), 3)
res2[, colSums(res2) >= 25]
which would then return
[1] 6 9 10
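The "simple loop" mentioned above for variable-length combinations could look roughly like this. It is only a sketch: the names threshold, hits, combos and keep are illustrative, and combinations are formed by position, so repeated values in x give repeated results.

```r
x <- c(4, 5, 6, 5, 6, 2, 4, 9, 10, 3)
threshold <- 25

hits <- list()
for (m in seq_along(x)) {
  # All combinations of size m, returned as a list of vectors.
  combos <- combn(x, m, simplify = FALSE)
  # Keep those whose sum reaches the threshold.
  keep <- vapply(combos, sum, numeric(1)) >= threshold
  hits <- c(hits, combos[keep])
}

length(hits)   # number of qualifying combinations over all sizes
hits[[1]]      # first qualifying combination
```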

How to calculate the number of triplets in the rows of matrix in r?

I have this matrix:
m
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 3 1 6 8 8 8
[2,] 2 2 5 7 9 7 4
[3,] 1 2 3 4 5 6 7
[4,] 1 2 3 4 5 6 7
and I want to calculate the number of triplets in each row.
So I expect a vector such as [1,0,0,0] as the result, since only the first row contains three adjacent identical values.
Is there any function in R to accomplish this, and that doesn't involve writing a long function?
OK, I am risking here, but, reflecting the comments, and also because it doesn't make much sense to split the question in two (debatable), let me ask what I am really after: Detecting 4 triplets (or the absence thereof) in each row of a matrix such as:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,] 0 1 2 3 8 4 4 5 6 7 7 7 8 8 8 9 9 9
[2,] 0 1 2 0 2 3 3 3 4 5 5 5 6 7 7 7 8 9
[3,] 0 1 1 1 2 7 2 3 4 4 4 5 6 7 7 7 8 9
[4,] 0 1 1 1 2 3 4 9 4 5 5 5 6 6 6 7 8 9
[5,] 0 0 0 1 1 1 2 3 4 5 6 6 6 7 8 8 8 9
[6,] 0 1 2 3 4 5 5 5 6 6 6 7 8 8 8 9 9 9
[7,] 0 1 2 3 3 3 4 5 5 5 6 6 6 7 8 9 9 9
[8,] 0 1 2 3 4 5 5 5 6 6 6 7 7 7 8 9 9 9
We can use data.table
library(data.table)
apply(m, 1, function(x) any(tabulate(rleid(x))==3))
#[1] TRUE FALSE FALSE FALSE
If we need to find whether there are 4 triplets in a row (based on the new dataset)
apply(m1, 1, function(x) sum(tabulate(rleid(x))==3))==4
#[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
data
library(psych)
m <- `dimnames<-`(as.matrix(read.clipboard()), NULL)
m1 <- `dimnames<-`(as.matrix(read.clipboard()), NULL)
NOTE: The datasets were read after copying each of the data outputs shown in the OP's post and using read.clipboard from psych.
One solution is to use the lag operator from dplyr package as follows:
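# dplyr::lag pads with NA, so na.rm = TRUE is needed to get FALSE rather than NA
apply(m, 1, function(x) any((x == dplyr::lag(x)) & (x == dplyr::lag(x, 2)), na.rm = TRUE))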
A more general sequence of numbers perhaps can be calculated as follows:
apply(m, 1, function(x) all(diff(which(diff(x) == 0)) == 1) & (length(which(diff(x) == 0)) == 2))
Where that last 2 is the (n - 1) where n = 3 in this case. You can also optimize it some by not computing that which(diff(x....) part twice.
Output for your example is:
[1] TRUE FALSE FALSE FALSE
Seems like there is this function in base called rle that computes run lengths of each value in a vector. You can use it as follows:
apply(m, 1, function(x) any(rle(x)$lengths == 3))
Giving you the same output:
[1] TRUE FALSE FALSE FALSE
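Since the question asks for a count per row rather than TRUE/FALSE, the rle approach can be adapted by summing instead of using any. This is a sketch on the first matrix from the question. The helper count_runs and its argument n are illustrative names.

```r
# The 4 x 7 matrix from the question, entered row by row.
m <- matrix(c(1, 3, 1, 6, 8, 8, 8,
              2, 2, 5, 7, 9, 7, 4,
              1, 2, 3, 4, 5, 6, 7,
              1, 2, 3, 4, 5, 6, 7),
            nrow = 4, byrow = TRUE)

# Hypothetical helper: number of runs of exactly n identical adjacent values.
count_runs <- function(x, n = 3L) sum(rle(x)$lengths == n)

triplets <- apply(m, 1, count_runs)
triplets
```

Comparing the result with 4, as in triplets == 4, would then flag rows that contain exactly four triplets.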
