Creating a Basic R Dice Rolling Function to Sum Dice Values

I'm trying to write a function that combines up to 4 (fair, 6-sided) dice rolls to form a specific value (named 'target.mountain') as many times as possible given the numbers shown on the dice, and then returns these values along with any dice that aren't used in such a combination. If the leftover numbers that aren't used to form 'target.mountain' sum to a value within the range 5-10, they should be returned as that sum.
As an example, say I roll 4, 3, 2, 5 and my target.mountain value is 9. I would do
4 + 5 -> 9, and as 2 + 3 = 5 (which falls in the 5-10 range) my function would return 9, 5.
Another example could be
Roll = (2,3,6,4) --> (6 + 3), (4 + 2) --> 9, 6
Once these values have been found, list them so the output appears like
[1] 9, 5 (example 1)
[1] 9, 6 (example 2)
How do I go about doing this?
If you have ever played the board game 'Mountain Goats' then that may shed some light on how I need the dice to work as I just cannot figure it out!

Let's make the problem a bit harder, say 5 dice.
library(tidyverse)
rolls <- sample(1:6,replace = TRUE, size = 5)
target.mountain <- 7
#Make all possible combinations of the dice:
map_dfr(seq_along(rolls),
        ~ combn(seq_along(rolls), .x, simplify = FALSE) %>%
          map(~ tibble(dice = list(.), sum = sum(rolls[.]),
                       rolls = list(rolls[.]), length = length(.)))) %>%
  #filter to only those combinations which equal the target
  filter(sum == target.mountain) %>%
  #Now make all possible combinations of the sets that equal the target
  {map2(.x = list(.),
        .y = nrow(.) %>%
          map(.x = seq(.), .f = combn, x = ., simplify = FALSE) %>%
          unlist(recursive = FALSE),
        ~ .x[unlist(.y), ])} %>%
  #Subset to non-overlapping sets
  subset(map_lgl(., ~ length(reduce(.x$dice, union)) == length(unlist(.x$dice)))) -> part1
map(part1, as.data.frame)
#[[1]]
# dice sum rolls length
#1 1, 3 7 3, 4 2
#
#[[2]]
# dice sum rolls length
#1 4, 5 7 6, 1 2
#
#[[3]]
# dice sum rolls length
#1 2, 3, 5 7 2, 4, 1 3
#
#[[4]]
# dice sum rolls length
#1 1, 3 7 3, 4 2
#2 4, 5 7 6, 1 2
From here you can apply whatever rules you want:
part1 %>%
  #subset to the largest number of sets
  subset(map_dbl(., nrow) == max(map_dbl(., nrow))) %>%
  #subset to the fewest number of total dice
  subset(map_dbl(., ~ sum(.x$length)) == min(map_dbl(., ~ sum(.x$length)))) %>%
  #if there are still ties, pick the first
  `[[`(1) -> part2
as.data.frame(part2)
# dice sum rolls length
#1 1, 3 7 3, 4 2
#2 4, 5 7 6, 1 2
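To finish with the "9, 5"-style display the question asks for, one option is to collapse each winning set in part2 to the target and then apply the 5-10 rule to the dice left over. A minimal sketch, assuming part2, rolls, and target.mountain from above (for the example rolls this yields 7 7 2, since the single leftover die sums below 5):
used <- unlist(part2$dice)
leftover <- rolls[setdiff(seq_along(rolls), used)]
# report the leftover dice as a single sum only when that sum falls in 5-10
leftover_out <- if (sum(leftover) >= 5 && sum(leftover) <= 10) sum(leftover) else leftover
c(rep(target.mountain, nrow(part2)), leftover_out)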

A possible solution to the problem:
target.mountain = 9
dice <- c(4,3,2,5)
library(tidyverse)
fn <- function(target.mountain, dice){
  fltr <- map(seq_along(dice), ~ combn(dice, .x, sum) == target.mountain)
  out <- map(seq_along(dice), ~ combn(dice, .x))
  sum_target <- map2(out, fltr, ~ .x[, .y]) %>%
    purrr::discard(.x = ., function(x) length(x) == 0) %>%
    keep(.x = ., .p = function(x) length(x) == min(lengths(.))) %>%
    flatten_dbl()
  no_sum_target <- dice[!(dice %in% sum_target)]
  result <- toString(c(sum(sum_target), no_sum_target))
  return(result)
}
fn(target.mountain = target.mountain, dice = dice)
#> [1] "9, 3, 2"
Created on 2021-03-29 by the reprex package (v1.0.0)
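The output above is "9, 3, 2" rather than the "9, 5" the question asks for, because fn() does not apply the 5-10 rule to the unused dice. A small wrapper can layer that rule on top (a sketch; fn_range is a hypothetical name):
fn_range <- function(target.mountain, dice, lo = 5, hi = 10) {
  vals <- as.numeric(strsplit(fn(target.mountain, dice), ", ")[[1]])
  made <- vals[1]
  rest <- vals[-1]
  # collapse the unused dice to their sum only when it falls within [lo, hi]
  if (length(rest) > 0 && sum(rest) >= lo && sum(rest) <= hi) rest <- sum(rest)
  toString(c(made, rest))
}
fn_range(target.mountain = 9, dice = c(4, 3, 2, 5))
#> [1] "9, 5"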


R: pass multiple arguments to accumulate/reduce

This is related to R: use the newly generated data in the previous row
I realized the actual problem I was faced with is a bit more complicated than the example I gave in the thread above: it seems I have to pass 3 arguments to the recursive calculation to achieve what I want, so accumulate2 or reduce may not work. I'm opening a new question here to avoid possible confusion.
I have the following dataset grouped by ID:
ID <- c(1, 2, 2, 3, 3, 3)
pw <- c(1:6)
add <- c(1, 2, 3, 5, 7, 8)
x <- c(1, 2, NA, 4, NA, NA)
df <- data.frame(ID, pw, add, x)
df
ID pw add x
1 1 1 1 1
2 2 2 2 2
3 2 3 3 NA
4 3 4 5 4
5 3 5 7 NA
6 3 6 8 NA
Within each group, for column x, I want to keep the value of the first row as it is, and fill each remaining row with the lagged value raised to the power stored in pw, plus the value in add. I want to update the lagged values as I proceed. So I would like to have:
ID pw add x
1 1 1 1 1
2 2 2 2 2
3 2 3 3 2^3 + 3
4 3 4 5 4
5 3 5 7 4^5 + 7
6 3 6 8 (4^5 + 7)^6 + 8
I have to apply this calculation to a large dataset, so it would be perfect if there is a fast way to do this!
If we want to use accumulate2, then specify the arguments correctly, i.e. it takes two input arguments ('pw' and 'add') and an initialization argument, which would be the first value of 'x'. As the data is grouped by 'ID', do the grouping before the accumulate2, extract the lambda default arguments ..1, ..2 and ..3 respectively in that order, and create the recursive function based on this:
library(dplyr)
library(purrr)
out <- df %>%
  group_by(ID) %>%
  mutate(x1 = accumulate2(pw[-1], add[-1], ~ ..1^..2 + ..3,
                          .init = first(x)) %>%
           flatten_dbl) %>%
  ungroup
out$x1
#[1] 1 2 11
#[4] 4 1031 1201024845477409792
With more than 3 arguments, a for loop would be better
# // initialize an empty vector
out <- c()
# // loop over the `unique` ID
for(id in unique(df$ID)) {
  # // create a temporary subset of data based on that id
  tmp_df <- subset(df, ID == id)
  # // initialize a temporary storage output
  tmp_out <- numeric(nrow(tmp_df))
  # // initialize first value with the first element of x
  tmp_out[1] <- tmp_df$x[1]
  # // if the number of rows is greater than 1
  if(nrow(tmp_df) > 1) {
    # // loop over the rows
    for(i in 2:nrow(tmp_df)) {
      # // do the recursive calculation and update
      tmp_out[i] <- tmp_out[i - 1]^tmp_df$pw[i] + tmp_df$add[i]
    }
  }
  out <- c(out, tmp_out)
}
out
#[1] 1 2 11
#[4] 4 1031 1201024845477409792
In base R we could use the following solution for more than two arguments.
In this solution I first subset the original data set on ID values.
Then I choose row ids through seq_len(nrow(tmp))[-1], omitting the first row id since it is supplied by init.
In the anonymous function used in Reduce, the b argument represents the accumulated/previous value (starting from init) and c represents the new/current value of our vector, which holds row numbers.
So in every iteration our previous value (starting from init) is raised to the power of the new value from pw and summed with the new value from add.
cbind(df[-length(df)], unlist(lapply(unique(df$ID), function(a) {
  tmp <- subset(df, df$ID == a)
  Reduce(function(b, c) {
    b ^ tmp$pw[c] + tmp$add[c]
  }, init = tmp$x[1],
  seq_len(nrow(tmp))[-1], accumulate = TRUE)
}))) |> setNames(c(names(df)))
ID pw add x
1 1 1 1 1.000000e+00
2 2 2 2 2.000000e+00
3 2 3 3 1.100000e+01
4 3 4 5 4.000000e+00
5 3 5 7 1.031000e+03
6 3 6 8 1.201025e+18
Data
structure(list(ID = c(1, 2, 2, 3, 3, 3), pw = 1:6, add = c(1,
2, 3, 5, 7, 8), x = c(1, 2, NA, 4, NA, NA)), class = "data.frame", row.names = c(NA,
-6L))
Base R, not using Reduce() but rather a while() loop:
# Split-apply-combine while loop: res => data.frame
res <- do.call(rbind, lapply(with(df, split(df, ID)), function(y){
  # While there are any NAs in x:
  while(any(is.na(y$x))){
    # Store the index of the first NA value: idx => integer scalar
    idx <- with(y, head(which(is.na(x)), 1))
    # Calculate x at that index using the business rule provided:
    # x => numeric vector
    y$x[idx] <- with(y, x[(idx-1)] ** pw[idx] + add[idx])
  }
  # Explicitly define the return object: y => GlobalEnv
  y
}))
Or a recursive function:
# Recursive function: estimation_func => function()
estimation_func <- function(value_vec, exponent_vec, add_vec){
  # Specify the termination condition; when all elements
  # of value_vec are no longer NA:
  if(all(!(is.na(value_vec)))){
    # Return value_vec: numeric vector => GlobalEnv
    return(value_vec)
    # Otherwise recursively apply the below:
  }else{
    # Store the index of the first NA value: idx => integer vector
    idx <- Position(is.na, value_vec)
    # Calculate the value of value_vec at that index,
    # using the provided business logic: value_vec => numeric vector
    value_vec[idx] <- (value_vec[(idx-1)] ** exponent_vec[idx]) + add_vec[idx]
    # Recursively apply the function: function => Local Env
    return(estimation_func(value_vec, exponent_vec, add_vec))
  }
}
# Split data.frame into a list on ID;
# Overwrite x values, applying recursive function;
# Combine list into a data.frame
# res => data.frame
res <- data.frame(
  do.call(
    rbind,
    Map(function(y){y$x <- estimation_func(y$x, y$pw, y$add); y}, split(df, df$ID))
  ),
  row.names = NULL
)
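For the example data, both res objects above should reproduce the same x values as the accumulate2 and for-loop versions (the whole column prints in scientific notation because of the huge final value):
res$x
#> [1] 1.000000e+00 2.000000e+00 1.100000e+01 4.000000e+00 1.031000e+03
#> [6] 1.201025e+18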

R Looping through two vectors

Good day,
I need a function that creates increasing IDs for two parameters. I came up with this function, which works fine, but I want it to be vectorized, and I cannot seem to avoid O(N²) complexity. Are there any better ways to do this?
Standard function:
threshold <- 3
calculateID <- function(p, r) {
return((p-1) * threshold + r)
}
calculateID(1, 1) #returns 1
calculateID(1, 2) #returns 2
calculateID(1, 3) #returns 3
calculateID(2, 1) #returns 4
#.....
calculateID(5, 3) #returns 15
Vectorized function: I would like to pass the two parameters as vectors so the function only has to be called once:
threshold <- 3
calculateIDVectorized <- function(p, r) {
  return(unlist(
    lapply(p, function(x) {
      lapply(r, function(y) {
        (x - 1) * threshold + y
      })
    })
  ))
}
calculateIDVectorized(c(1, 2, 3, 4, 5), c(1, 2, 3)) # should return 1-15
To clarify: I want every combination of p and r to be used, so you should always get a result of length(p) * length(r) values.
You can use outer:
calculateIDVectorized <- function(p, r) as.vector(t(outer(p, r, calculateID)))
calculateIDVectorized(c(1, 2, 3, 4, 5), c(1, 2, 3))
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
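The t() matters here: outer() fills a length(p)-by-length(r) matrix, and as.vector() flattens it column-wise, so without the transpose the IDs would come out interleaved. A quick illustration:
outer(c(1, 2), c(1, 2, 3), calculateID)
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6
as.vector(outer(c(1, 2), c(1, 2, 3), calculateID))    # column-wise: interleaved
#> [1] 1 4 2 5 3 6
as.vector(t(outer(c(1, 2), c(1, 2, 3), calculateID))) # row-wise: 1..6
#> [1] 1 2 3 4 5 6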
Since the OP was interested in fast computation, I compared the solutions:
library(microbenchmark)
p <- c(1:500) # using a larger data set
r <- c(1:20)
threshold <- length(r) # parameterizing threshold
m <- microbenchmark(
  tidy = crossing(p, r) %>%
    rowwise %>%
    transmute(out = calculateID(p, r)) %>%
    pull(out),
  dcv = do.call(Vectorize(calculateID), unname(rev(expand.grid(r, p)))),
  numbering = rev(expand.grid(r, p)) %>%
    arrange(Var2, Var1) %>%
    transmute(out = row_number()) %>%
    pull(out),
  hybrid = rev(expand.grid(r, p)) %>%
    rowwise() %>%
    transmute(out = calculateID(Var2, Var1)) %>%
    pull(out),
  outer = as.vector(t(outer(p, r, calculateID))),
  outer_c = c(t(outer(p, r, calculateID))),
  david = rep((p - 1), each = length(r)) * threshold + r
)
m
m
# Unit: microseconds
# expr min lq mean median uq max neval
# tidy 45441.869 47370.776 52123.6770 49482.1970 54158.4285 116780.840 100
# dcv 16259.935 17156.225 19641.6731 17897.8885 21576.0865 55489.586 100
# numbering 5947.147 6379.337 7127.5125 6576.3560 6952.3205 12005.854 100
# hybrid 44124.099 45856.210 51531.9480 47642.5405 52225.0600 175778.380 100
# outer 106.655 120.711 141.1137 128.9665 143.2465 265.072 100
# outer_c 117.811 137.446 152.5958 142.1315 155.9650 327.101 100
# david 223.125 230.711 257.5622 241.8675 260.6100 920.164 100
So it looks like the options using outer() are fastest, with as.vector() edging out c(). @DavidArenburg's solution is also right up there with the solutions using outer().
I added a hybrid option using dplyr::transmute() because rev(expand.grid()) was significantly faster than crossing(); it appears to be marginally faster than the straight dplyr route, but still not as fast as the do.call(Vectorize(...)) option or the others.
Another option (added above) would be to arrange the data frame and create IDs using dplyr::row_number() or 1:nrow(). This option works if all the combinations of p and r are present and unique, but would fail with non-sequential values.
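For completeness, here is the rep()-based recycling approach benchmarked above as david (credited to David Arenburg), written out as a standalone function. The name calculateIDRecycled is illustrative, and the threshold default mirrors the benchmark's threshold <- length(r) parameterization:
calculateIDRecycled <- function(p, r, threshold = length(r)) {
  # repeat each (p - 1) * threshold offset once per r value,
  # then let R recycle r across the result
  rep((p - 1) * threshold, each = length(r)) + r
}
calculateIDRecycled(c(1, 2, 3, 4, 5), c(1, 2, 3))
#> [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15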
Another base R option using do.call + Vectorize + expand.grid
> do.call(Vectorize(calculateID),unname(rev(expand.grid(r,p))))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Data
p <- c(1, 2, 3, 4, 5)
r <- c(1, 2, 3)
An option with tidyverse
library(dplyr)
library(tidyr)
crossing(p, r) %>%
  rowwise %>%
  transmute(out = calculateID(p, r)) %>%
  pull(out)
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Recoding a nominal variable with multiple categories into a dummy variable

I am trying to condense a party ID seven point scale variable (pid_x) from the ANES 2012 data to a dummy variable (democrat = 1 and republican = 0). This entails removing all missing values and excluding independents (4). I can remove NAs, but how would I filter out independents and properly mutate the new variable? Yes, I am very new to R. Much appreciated!
The code below provides the following error:
"Error: Problem with mutate() input party_id_recode.
x Can't recycle ..1 (size 2054) to match ..2 (size 3).
i Input party_id_recode is `ifelse(pid_x == 1:3, 1, ifelse(pid_x == 5:7, 0))"
library(tidyverse)
anesnew <- anes %>%
na.omit(anes$pid_x) %>%
mutate(party_id_recode = ifelse(pid_x == 1:3, 1,
ifelse(pid_x == 5:7, 0)))
Reproducible data and expected output would be very useful, but it looks like your ifelse() statement hasn't been constructed properly, and could be simplified:
anesnew <- anes %>%
  filter(!is.na(pid_x), pid_x != 4) %>%
  mutate(party_id_recode = case_when(pid_x < 4 ~ 1,
                                     pid_x > 4 ~ 0))
With the following sample data:
anes <- tibble(pid_x = c(1, 2, 3, 4, 5, 6, 7, NA))
The results are:
# A tibble: 6 x 2
pid_x party_id_recode
<dbl> <dbl>
1 1 1
2 2 1
3 3 1
4 5 0
5 6 0
6 7 0
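As for why the original ifelse() failed: pid_x == 1:3 does not test membership; it recycles c(1, 2, 3) along pid_x and compares elementwise, hence the size-2054-versus-size-3 recycling error. %in% is the membership test the OP was reaching for. A sketch on the sample data above:
anes %>%
  filter(!is.na(pid_x), pid_x != 4) %>%
  mutate(party_id_recode = ifelse(pid_x %in% 1:3, 1, 0))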

Iterate over combinations from one row in each index by group in r

I have a dataset (example) as below,
data <- data.frame(pc = c("A","A","A","A","A","A", "B","B","B","B"), # categorical
                   index = c(1, 1, 2, 2, 2, 3, 4, 5, 5, 5),          # categorical
                   g = c(1, 2, 4, 3, 6, 7, 8, 5, 9, 3),              # numeric
                   h = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4))              # categorical
I want to group by 'pc', iterate over all combinations of rows based on 'index', compute the sum of the values in 'g' and the number of categories in 'h' for each combination, and keep the rows of the combination that yields the highest (sum of 'g' + number of categories in 'h').
For example, in the pc=A group, index=1 has two rows, index=2 has three, and index=3 has one, so in total I have 2x3x1 = 6 combinations (each combination has three rows: one with index=1, one with index=2, one with index=3). I want to keep the rows (one row from each unique index) that yield the highest (sum of values from 'g' + number of categories from 'h'). The number of indices and the length of each index differ in each pc group.
Just an example to visualise the combination for pc=A group,
combination sum_of_values_in_g number_of_categories_in_h
#1 12 2
#2 11 3
#3 14 3
#4 13 2
#5 12 3
#6 15 3
My desired result in this example will be
pc index g h
A 1 2 1
A 2 6 2
A 3 7 3
B 4 8 3
B 5 9 3
I have done some research on how to get combinations
(Iterate over unique combination of groups in a data frame, How to iterate through all combinations of columns and apply function by group in R? and
Combinations by group in R)..
but I couldn't figure out how to get the right combinations in each group and run further operations on each combination... Any input or direction will be appreciated!
Here is a brute force solution. The run time could be really long given a large dataset.
We need functions from these packages:
library(tidyr)
library(dplyr)
library(purrr)
This is the first step: we need a function to first split your data into several groups (split(transpose(df), df[[split_by]])), then find all possible row combinations across them (cross(...)), and finally merge each of them into a single dataframe (lapply(..., bind_rows)).
perm_all <- function(df, split_by){
  lapply(cross(split(transpose(df), df[[split_by]])), bind_rows)
}
(transpose turns an n-row dataframe into an n-element list of single-row dataframes)
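As a quick sanity check on the example data, the pc == "A" rows should yield the 2 x 3 x 1 = 6 candidate combinations described in the question:
length(perm_all(data[data$pc == "A", ], "index"))
#> [1] 6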
This is the second step: we loop through all dataframes in that list to see which one satisfies your requirements.
which_max <- function(ls_of_df, numer, categ) {
  test_stats <- vapply(
    ls_of_df,
    function(df) {
      temp <- length(unique(df[[categ]]))
      c(sum(df[[numer]]) + temp, temp)
    },
    double(2L)
  )
  # You could have multiple maxima for those sums,
  out <- which(test_stats[1L, ] == max(test_stats[1L, ]))
  # but after the second test (i.e. find the greatest number of categories),
  # you should have one and only one combination left
  out[[which.max(test_stats[2L, out])]]
}
Now, we use a single function to perform these two steps.
max_of_all_perm <- function(df, group_var, numer, categ) {
  l <- perm_all(df, group_var)
  l[[which_max(l, numer, categ)]]
}
And run it across all groups defined by pc
data %>%
  nest(data = -pc) %>%
  mutate(data = lapply(data, max_of_all_perm, "index", "g", "h")) %>%
  unnest(data)
Output
# A tibble: 5 x 4
pc index g h
<chr> <dbl> <dbl> <dbl>
1 A 1 2 1
2 A 2 6 2
3 A 3 7 3
4 B 4 8 3
5 B 5 9 3

Function to find sub ID's of an ID in a data frame

I have a data frame that contains two columns, an ID column and a column with sub ID's that are related to the corresponding ID. The sub ID's can again have sub ID's (in this case the previous sub ID is now an ID).
library(tibble)
df <- tibble(id = c(1, 1, 2, 2, 3, 7), sub_id = c(2, 3, 4, 5, 6, 8))
df
# A tibble: 6 x 2
id sub_id
<dbl> <dbl>
1 1 2
2 1 3
3 2 4
4 2 5
5 3 6
6 7 8
I would like to write a function that finds all sub ID's that are related to an ID. It should return a vector with all sub ID's.
find_all_sub_ids <- function (data, id) {
data %>% ...
}
find_all_sub_ids(df, id = 1)
[1] 2 3 4 5 6
find_all_sub_ids(df, id = 2)
[1] 4 5
find_all_sub_ids(df, id = 9)
[1] NULL
This is very different from everything I have done in R so far and it was hard for me to phrase a good title for this question. So it is possible that with the right phrasing I could have already found an answer by just googling.
My first intuition for solving this was while loops. Since I also do not know how many sublevels there could be the function should continue until all are found. I never used while loops though and don't really know how I could implement them here.
Maybe someone knows a good solution for this problem. Thanks!
Edit: Forgot to assign the tibble to df and to use this argument in the function call.
With igraph:
library(igraph)
g <- graph_from_data_frame(df, directed = TRUE)
find_all_subs <- function(g, id){
  # find child nodes, the first one being the origin
  r <- igraph::subcomponent(g, match(id, V(g)$name), "out")$name
  # remove the origin
  as.numeric(r[-1])
}
find_all_subs(g,1)
[1] 2 3 4 5 6
find_all_subs(g, 2)
[1] 4 5
I think it's easiest to formulate this as a graph problem.
Your data.frame describes a directed graph (with edges going from id to sub_id), and you are interested in which nodes are reachable from a certain vertex.
Using tidygraph, this can be achieved as such:
library(tidyverse)
library(tidygraph)
df <- tibble(id = c(1, 1, 2, 2, 3, 7), sub_id = c(2, 3, 4, 5, 6, 8))
find_all_sub_ids <- function(id) {
  if (!(id %in% df$id)) {
    return(NULL)
  }
  grph <- df %>%
    as_tbl_graph(directed = TRUE)
  id <- which(grph %>% pull(name) == as.character(id))
  grph %>%
    activate(nodes) %>%
    mutate(reachable = !is.na(bfs_dist(id))) %>%
    as_tibble() %>%
    filter(reachable) %>%
    pull(name) %>%
    as.numeric()
}
To see which nodes are reachable (they have a non-NA distance from your given node), we use bfs_dist.
This gives
> find_all_sub_ids(1)
[1] 1 2 3 4 5 6
> find_all_sub_ids(2)
[1] 2 4 5
> find_all_sub_ids(9)
NULL
The advantage of such an approach is that it can search many levels deep without you needing to write a loop explicitly.
Edit
There was a bug in my code: tidygraph::bfs_dist uses a different id than I expected. Fixed it now.
On the new example:
> find_all_sub_ids(10)
[1] 10 200 300
I did it using a data frame. The following works for looking up direct sub ID's (note that it does not recurse into nested levels):
x <- c(1, 1, 2, 2, 3, 7)
y <- c(2, 3, 4, 5, 6, 8)
df <- data.frame(id = x, sub_id = y)
find_all_sub_ids <- function(df, id_requested) {
  si <- df[df$id == id_requested, ]$sub_id
  return(si)
}
find_all_sub_ids(df, id = 2)
find_all_sub_ids(df,id=2)
[1] 4 5
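Building on this, the fully nested lookup the question asks for can be done with the while-loop idea the OP mentioned: keep expanding the set of found sub ID's until a pass adds nothing new. A base R sketch (find_all_sub_ids_nested is an illustrative name):
find_all_sub_ids_nested <- function(data, id) {
  # start with the direct children
  found <- data$sub_id[data$id %in% id]
  repeat {
    # look up the children of everything found so far
    nxt <- data$sub_id[data$id %in% found]
    new <- setdiff(nxt, found)
    # stop once a pass turns up no new sub ID's
    if (length(new) == 0) break
    found <- c(found, new)
  }
  if (length(found) == 0) NULL else found
}
find_all_sub_ids_nested(df, id = 1)
[1] 2 3 4 5 6
find_all_sub_ids_nested(df, id = 9)
NULL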
