How to do map2_if or pmap_if in purrr - r

I am trying to map a function over multiple arguments and, at the same time, apply the function only when a condition holds, using the purrr package. I can map over multiple arguments with map2 or pmap, and I can map based on a condition with map_if.
The following code is a basic example of what I want to do, but the function map2_if doesn't exist yet. I know that this specific problem can be solved without functional programming, but my goal is to apply the approach to more complex functions.
x = rnorm(n = 5, mean = 0, sd = 1)
y = rnorm(n = 5, mean = 1, sd = 2)
func = function(x, y){x + y}
map2_if(.x = x, .y = y, .p = (x > 0 & y > 0), func)

map2_if can't really exist, because map_if returns the element unchanged if it doesn't satisfy the predicate, and if there are two inputs, what map2_if should return if the predicate is not satisfied is unclear.
Instead, it's simpler to put the data in a data frame and break the task down into steps:
library(tidyverse)
set.seed(0)
tibble(x = rnorm(n = 5, mean = 0, sd = 1),
       y = rnorm(n = 5, mean = 1, sd = 2)) %>%
  filter(x > 0, y > 0) %>%
  mutate(z = map2_dbl(x, y, `+`))
#> # A tibble: 3 x 3
#> x y z
#> <dbl> <dbl> <dbl>
#> 1 1.3297993 0.4105591 1.740358
#> 2 1.2724293 0.9884657 2.260895
#> 3 0.4146414 5.8093068 6.223948
If func is vectorized (like +), you don't even really need map2_dbl.
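If the goal is to keep all five positions rather than filter rows out, one option is to decide up front what the "predicate not satisfied" case returns. The sketch below is only an illustration: map2_if_sketch and its .else argument are hypothetical names, not part of purrr, and returning NA for pairs that fail the condition is an assumption.

```r
library(purrr)

set.seed(0)
x <- rnorm(n = 5, mean = 0, sd = 1)
y <- rnorm(n = 5, mean = 1, sd = 2)
func <- function(x, y) x + y

# Hypothetical helper (not a purrr function): apply .f where .p is TRUE,
# otherwise return .else. .p is a logical vector as long as .x and .y.
map2_if_sketch <- function(.x, .y, .p, .f, .else = NA_real_) {
  pmap_dbl(list(.x, .y, .p), function(a, b, keep) {
    if (keep) .f(a, b) else .else
  })
}

z <- map2_if_sketch(x, y, x > 0 & y > 0, func)

# For a vectorized function such as `+`, ifelse() gives the same result
z_vec <- ifelse(x > 0 & y > 0, x + y, NA_real_)
```

Both z and z_vec have length 5, with NA wherever x > 0 & y > 0 is FALSE.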

Related

{purrr}: how to replace a simple loop with `purrr::map()` while using up all arguments of a function?

Hi all, I've read the help file and quite a few questions on the topic here but couldn't find an answer. I believe it is a very simple question, so I am likely missing something obvious.
I would like to replace a basic for loop such as
lst <- list()
for (i in 1:10) {
lst[[i]] <- rnorm(n = 3, mean = 5, sd = 3)
}
lst
with purrr::map(), while using up all arguments of the function (here: rnorm())
The following doesn't work and returns an unused argument error.
1:10 %>%
map(rnorm, n = 3, mean = 5, sd = 3)
# Error in .f(.x[[i]], ...) : unused argument (.x[[i]])
The following runs, but map passes each element of 1:3 to the one argument I did not supply, i.e. rnorm(sd = .x), so the result is not what I expected.
library(purrr)
1:3 %>%
map(rnorm, n = 3, mean = 5)
#> [[1]]
#> [1] 5.133702 6.456135 5.041438
#>
#> [[2]]
#> [1] 5.722486 4.614772 2.640809
#>
#> [[3]]
#> [1] 1.445749 5.826666 6.096497
Is there a simple way of achieving that?
I appreciate your time and attention.
You can do this with a for loop or with map, but I think this is more of a job for replicate or purrr::rerun.
In base R :
replicate(10, rnorm(n = 3, mean = 5, sd = 3), simplify = FALSE)
Or using purrr :
purrr::rerun(10, rnorm(n = 3, mean = 5, sd = 3))
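As a rough check that replicate and map draw the same values, the sketch below fixes the seed before each call and compares the results. It assumes purrr is installed; rerun() is left out because, to my knowledge, recent purrr releases (1.0.0 onward) mark it as deprecated in favour of map.

```r
library(purrr)

# Same seed before each approach, so both consume the same random stream
set.seed(42)
via_replicate <- replicate(10, rnorm(n = 3, mean = 5, sd = 3), simplify = FALSE)

set.seed(42)
via_map <- map(1:10, ~ rnorm(n = 3, mean = 5, sd = 3))

# Both are lists of 10 numeric vectors of length 3
same_draws <- isTRUE(all.equal(via_replicate, via_map))
same_draws
```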
There are two ways to do this with map: wrap the call in an anonymous function or in a formula. In both cases the element of 1:10 is simply ignored. Refer to the examples in ?map for more information.
Anonymous Function
library(purrr)
1:10 %>%
map(function(x) rnorm(n = 3, mean = 5, sd = 3))
Formula
1:10 %>%
map(~ rnorm(n = 3, mean = 5, sd = 3))
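To see why the original attempts behaved as they did, it can help to look at where map puts each element. The sketch below uses show_args, a hypothetical stand-in with the same argument names as rnorm, which only reports the values it received.

```r
library(purrr)

# Hypothetical stand-in for rnorm(): same argument names, but it just
# returns the values it was called with.
show_args <- function(n, mean = 0, sd = 1) c(n = n, mean = mean, sd = sd)

# n and mean are supplied by name, so each element of 1:3 is matched
# positionally to the first remaining argument, sd
res <- map(1:3, show_args, n = 3, mean = 5)
res[[2]]

# Wrapping the call in an anonymous function lets us ignore the element
ignored <- map(1:3, function(i) show_args(n = 3, mean = 5, sd = 3))
ignored[[2]]
```

When all three arguments are supplied by name, as in the first attempt, there is no argument left for the element, which is what triggers the unused argument error.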

How to automatically set-up and add functions to a model in R?

I am setting up a model, and I am trying to reduce the amount of writing I have to do.
Concretely, I am using the coala R-package to do coalescent simulations, and I am trying to easily implement a stepping-stone migration model.
A reproducible example: 4 linearly distributed populations exchange migrants according to stepping-stone pattern (only the adjacent populations).
model <- coal_model(sample_size = c(5, 5, 5, 5),
loci_number = 1,
loci_length = 10,
ploidy = 1) +
feat_mutation(rate = mut_rate, # e.g. 0.1
model = "HKY",
base_frequencies = c(0.25,0.25,0.25,0.25),
tstv_ratio = 4) +
feat_migration(mig_rate, 1, 2) + # mig_rate can be e.g. 0.5
feat_migration(mig_rate, 2, 1) +
feat_migration(mig_rate, 2, 3) +
feat_migration(mig_rate, 3, 2) +
feat_migration(mig_rate, 3, 4) +
feat_migration(mig_rate, 4, 3) +
sumstat_dna(name = "dna", transformation = identity)
This example works, but the downside is that I have to write many 'feat_migration' lines, although there is a clear pattern that could be automated. That is fine for a small number of populations, but I want to run a large simulation with about 70 populations. Does anyone have a good idea of how to automate this? The documentation has not helped me so far.
I tried two things that didn't work:
feat_migration(mig_rate, c(1,2,2,3,3,4), c(2,1,3,2,4,3))
and something like this:
migration_model <- function(){
for(i in 1:n_pops){
feat_migration(mig_rate, i, i+1) +
feat_migration(mig_rate, i+1, i))
}
In the latter case, I don't really know how to create all the function calls and add them to my model correctly.
Good ideas are very welcome! :)
Consider the higher-order functions Map (a wrapper around mapply) and Reduce: build a list of feature objects with Map, then add them to the model one by one with Reduce. Reduce fits accumulating tasks like this one, where the result of each iteration is passed into the next until a single final result remains.
n_pops <- 4
start_pts <- as.vector(sapply(seq(n_pops-1), function(x) c(x, x+1)))
start_pts
# [1] 1 2 2 3 3 4
end_pts <- as.vector(sapply(seq(n_pops-1), function(x) c(x+1, x)))
end_pts
# [1] 2 1 3 2 4 3
# LIST OF feat_migration()
feats <- Map(function(x, y) feat_migration(mig_rate, x, y), start_pts, end_pts)
# LIST OF FUNCTIONS
funcs <- c(coal_model(sample_size = c(5, 5, 5, 5),
loci_number = 1,
loci_length = 10,
ploidy = 1),
feat_mutation(rate = mut_rate, # e.g. 0.1
model = "HKY",
base_frequencies = c(0.25,0.25,0.25,0.25),
tstv_ratio = 4),
feats,
sumstat_dna(name = "dna", transformation = identity)
)
# MODEL CALL
model <- Reduce(`+`, funcs)
As an aside, a chain of ggplot + calls can be written in functional form with Reduce:
gp <- ggplot(df) + aes_string(x='Time', y='Data') +
geom_point() + scale_x_datetime(limits=date_range)
# EQUIVALENTLY
gp <- Reduce(ggplot2:::`+.gg`, list(ggplot(df), aes_string(x='Time', y='Data'),
geom_point(), scale_x_datetime(limits=date_range)))
This answer is a slight edit of the solution proposed by Parfait: the components other than feat_migration are wrapped in list() before being combined with feats. The model initializes without errors and runs in the simulator without errors.
n_pops <- 4
start_pts <- as.vector(sapply(seq(n_pops-1), function(x) c(x, x+1)))
end_pts <- as.vector(sapply(seq(n_pops-1), function(x) c(x+1, x)))
# LIST OF feat_migration()
feats <- Map(function(x, y) feat_migration(mig_rate, x, y), start_pts, end_pts)
# LIST OF FUNCTIONS
funcs <- c(list(coal_model(sample_size = c(5, 5, 5, 5),
loci_number = 1,
loci_length = 10,
ploidy = 1),
feat_mutation(rate = mut_rate, # e.g. 0.1
model = "HKY",
base_frequencies = c(0.25,0.25,0.25,0.25),
tstv_ratio = 4),
sumstat_dna(name = "dna", transformation = identity)),
feats)
# MODEL CALL
model <- Reduce(`+`, funcs)
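The Map and Reduce pattern can be tried without coala installed. The sketch below is a toy illustration only: toy_model, toy_feature and the `+` method are hypothetical stand-ins that mimic adding features to a model, not the coala API.

```r
# Toy stand-ins for coala objects (hypothetical; not the coala API)
toy_model <- function() {
  structure(list(features = character()), class = "toy_model")
}
toy_feature <- function(from, to) {
  structure(list(label = paste0(from, "->", to)), class = "toy_feature")
}

# `+` method: adding a feature appends its label to the model
"+.toy_model" <- function(e1, e2) {
  e1$features <- c(e1$features, e2$label)
  e1
}

n_pops <- 4
start_pts <- as.vector(sapply(seq(n_pops - 1), function(x) c(x, x + 1)))
end_pts   <- as.vector(sapply(seq(n_pops - 1), function(x) c(x + 1, x)))

# One feature object per migration direction
feats <- Map(toy_feature, start_pts, end_pts)

# Wrap the model in list() so that c() yields a flat list of whole objects
parts <- c(list(toy_model()), feats)

# Fold the list with `+`, carrying the growing model from step to step
model <- Reduce(`+`, parts)
model$features
```

Wrapping the model in list() matters because c() applied directly to a list-based object splices in its elements instead of keeping the object intact.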

using the uniroot function with dplyr pipes

I'm trying to utilize the uniroot function inside a piping scheme. I have root data by depth, and I fit a model for each crop-year set and put the fitted parameter (A in this example) into a tibble. A simplified dataset is below:
mydat <- tribble(
~crop, ~year, ~A,
"corn", 2011, 4,
"corn", 2012, 8.5,
"soy", 2011, 4.2
)
I want to add a column that tells me the x value of my function at y = 0.5. The following code works as a stand-alone.
myfunc <- function(x, y, A) {2 + A * x - y}
uniroot(myfunc, y = 0.5, A = 4, lower = 0, upper = 10, extendInt = "yes")
If I try to put it into a piping scheme using dplyr's mutate or do, it doesn't work.
mydat %>%
mutate(x50 = uniroot(myfunc, y = 0.5, A = .$A, lower = 0, upper = 10,
extendInt = "yes"))
mydat %>%
do(x50 = uniroot(myfunc, y = 0.5, A = .$A, lower = 0, upper = 10,
extendInt = "yes"))
The uniroot function is not vectorised over its arguments. Functions like sqrt are:
> sqrt(c(1,2,3))
[1] 1.000000 1.414214 1.732051
but uniroot isn't:
> uniroot(myfunc, y = 0.5, A = c(1,2,3), lower = 0, upper = 10, extendInt = "yes")
Error in uniroot(myfunc, y = 0.5, A = c(1, 2, 3), lower = 0, upper = 10, :
did not succeed extending the interval endpoints for f(lower) * f(upper) <= 0
In addition: Warning messages:
1: In if (is.na(f.lower)) stop("f.lower = f(lower) is NA") :
the condition has length > 1 and only the first element will be used
2: In if (is.na(f.upper)) stop("f.upper = f(upper) is NA") :
the condition has length > 1 and only the first element will be used
and mutate relies on having vectorised computation.
Use lapply to iterate over any vector and call a function like this:
> lapply(mydat$A, function(a){uniroot(myfunc, y = 0.5, A = a, lower = 0, upper = 10, extendInt = "yes")$root})
[[1]]
[1] -0.375
[[2]]
[1] -0.1764706
[[3]]
[1] -0.3571429
Then use standard R functions to put that data back in your data frame if that's where you want it.
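One way to put the roots back, sketched here in base R only, is to use vapply instead of lapply so the result is a plain numeric vector that can be assigned as a column. The plain data.frame below stands in for the tibble from the question, and the column name x50 is taken from the question's attempt.

```r
mydat <- data.frame(
  crop = c("corn", "corn", "soy"),
  year = c(2011, 2012, 2011),
  A    = c(4, 8.5, 4.2)
)

myfunc <- function(x, y, A) 2 + A * x - y

# vapply() calls uniroot() once per value of A and returns a numeric vector
mydat$x50 <- vapply(
  mydat$A,
  function(a) {
    uniroot(myfunc, y = 0.5, A = a, lower = 0, upper = 10,
            extendInt = "yes")$root
  },
  numeric(1)
)

mydat
```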
You could use purrr::map to build a list column with the results (coercing it to a data.frame), then tidyr::unnest to spread it out into columns...
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)
mydat <- tribble(
~crop, ~year, ~A,
"corn", 2011, 4,
"corn", 2012, 8.5,
"soy", 2011, 4.2
)
myfunc <- function(x, y, A) {2 + A * x - y}
mydat %>%
mutate(x50 = map(A, function(x) {
as.data.frame(uniroot(myfunc, y = 0.5, A = x, lower = 0, upper = 10,
extendInt = "yes"))
})) %>%
unnest(x50)
# # A tibble: 3 x 8
# crop year A root f.root iter init.it estim.prec
# <chr> <dbl> <dbl> <dbl> <dbl> <int> <int> <dbl>
# 1 corn 2011. 4.00 -0.375 0. 20 19 52439.
# 2 corn 2012. 8.50 -0.176 2.22e-16 20 18 0.0000610
# 3 soy 2011. 4.20 -0.357 2.22e-16 21 19 0.0000610
With dplyr, a general pattern is to evaluate uniroot row by row (the names below are placeholders):
data |>
rowwise() |>
mutate(var_name = uniroot(f, c(lower_limit, upper_limit), vars_from_data)$root)
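Applied to the data from the question, the rowwise pattern might look like the sketch below. It assumes dplyr and tibble are installed and R 4.1 or later for the native pipe; the interval c(0, 10) and extendInt = "yes" are carried over from the question.

```r
library(tibble)
library(dplyr)

mydat <- tribble(
  ~crop,  ~year, ~A,
  "corn", 2011,  4,
  "corn", 2012,  8.5,
  "soy",  2011,  4.2
)

myfunc <- function(x, y, A) 2 + A * x - y

result <- mydat |>
  rowwise() |>                       # evaluate mutate() one row at a time
  mutate(x50 = uniroot(myfunc, c(0, 10), y = 0.5, A = A,
                       extendInt = "yes")$root) |>
  ungroup()                          # drop the rowwise grouping again

result
```

Inside rowwise(), A refers to the single value in the current row, so uniroot receives a scalar each time.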

Achieving t random variables with each different df and ncp in R?

I'm trying to generate 5 random t variates using rt(), with each of the 5 having a particular df (respectively, from 1 to 5) and a particular ncp (respectively, seq(0, 1, l = 5)). So, 5 random t-variables each having a different df and a different ncp.
To achieve the above, I tried the below with no success. What could be the efficient R code to achieve what I described above?
vec.rt = Vectorize(function(n, df, ncp) rt(n, df, ncp), c("n", "df", "ncp"))
vec.rt(n = 5, df = 1:5, ncp = seq(0, 1, l = 5))
Or
mapply(FUN = rt, n = 5 , df = 1:5, ncp = seq(0, 1, l = 5))
Notice for:
rt(n = 5, df = 1:5, ncp = seq(0, 1, l = 5))
R gives the following warning:
Warning message:
In if (is.na(ncp)) { :
the condition has length > 1 and only the first element will be used
Rephrasing your question helps to find an answer: you want a sample of size 1 (n = 1) from each of 5 random variables, each with different parameters.
mapply(FUN = rt, n = 1 , df = 1:5, ncp = seq(0, 1, l = 5))
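A short sketch of how the shape of the result depends on n: with n = 1, mapply returns one draw per (df, ncp) pair, while a larger n gives a matrix with one column per pair. The seed and the choice of n = 3 are arbitrary assumptions for illustration.

```r
set.seed(123)

dfs  <- 1:5
ncps <- seq(0, 1, length.out = 5)

# One draw from each of the 5 distributions: a numeric vector of length 5
one_each <- mapply(FUN = rt, n = 1, df = dfs, ncp = ncps)

# Three draws from each distribution: a 3 x 5 matrix, one column per pair
three_each <- mapply(FUN = rt, n = 3, df = dfs, ncp = ncps)

length(one_each)
dim(three_each)
```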

how to apply functions on data frame in r

How can I apply the following function rt to each value l in df?
x and y have the following values.
x<-9
y<-1
rt<-function(x,y,l) min(x,max(0,l-y))
df
a b c
5 6 7
1 4 1
2 4 3
If you'd like to stick with data frames, probably the simplest approach is to use apply with the MARGIN parameter set to c(1,2), which applies the function over both rows and columns (i.e., to every cell).
x <- 9
y <- 1
rt <- function(x, y, l) min(x, max(0, l-y))
df <- data.frame(a = c(5, 1, 2),
b = c(6, 4, 4),
c = c(7, 1, 3))
rt_df <- as.data.frame(apply(df, c(1,2), rt, x = x, y = y))
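For larger data frames, a vectorized variant may be worth considering. The sketch below swaps in pmin and pmax, the elementwise counterparts of min and max, and applies them one column at a time; rt_vec is a hypothetical name, not part of the original answer.

```r
x <- 9
y <- 1
df <- data.frame(a = c(5, 1, 2),
                 b = c(6, 4, 4),
                 c = c(7, 1, 3))

# Cell-by-cell version from the answer above, repeated for comparison
rt <- function(x, y, l) min(x, max(0, l - y))
rt_df <- as.data.frame(apply(df, c(1, 2), rt, x = x, y = y))

# Vectorized variant: pmin()/pmax() work elementwise on a whole column
rt_vec <- function(x, y, l) pmin(x, pmax(0, l - y))

# lapply() passes each column as l, the only argument not supplied by name
rt_df_vec <- as.data.frame(lapply(df, rt_vec, x = x, y = y))

rt_df_vec
```

Both versions clip l - y to the range from 0 to x, so they should agree cell for cell.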
