Using the uniroot function with dplyr pipes in R

I'm trying to use the uniroot function inside a piping scheme. I have root data by depth; I fit a model for each crop-year set and put the fitted parameter (A in this example) into a tibble. A simplified dataset is below:
mydat <- tribble(
  ~crop,  ~year, ~A,
  "corn", 2011,  4,
  "corn", 2012,  8.5,
  "soy",  2011,  4.2
)
I want to add a column that tells me the x value of my function at y = 0.5. The following code works as a stand-alone.
myfunc <- function(x, y, A) {2 + A * x - y}
uniroot(myfunc, y = 0.5, A = 4, lower = 0, upper = 10, extendInt = "yes")
If I try to put it into a piping scheme using dplyr's mutate or do, it doesn't work.
mydat %>%
  mutate(x50 = uniroot(myfunc, y = 0.5, A = .$A, lower = 0, upper = 10,
                       extendInt = "yes"))
mydat %>%
  do(x50 = uniroot(myfunc, y = 0.5, A = .$A, lower = 0, upper = 10,
                   extendInt = "yes"))

The uniroot function is not vectorised over its arguments. Functions like sqrt are:
> sqrt(c(1,2,3))
[1] 1.000000 1.414214 1.732051
but uniroot isn't:
> uniroot(myfunc, y = 0.5, A = c(1,2,3), lower = 0, upper = 10, extendInt = "yes")
Error in uniroot(myfunc, y = 0.5, A = c(1, 2, 3), lower = 0, upper = 10, :
did not succeed extending the interval endpoints for f(lower) * f(upper) <= 0
In addition: Warning messages:
1: In if (is.na(f.lower)) stop("f.lower = f(lower) is NA") :
the condition has length > 1 and only the first element will be used
2: In if (is.na(f.upper)) stop("f.upper = f(upper) is NA") :
the condition has length > 1 and only the first element will be used
and mutate relies on having vectorised computation.
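One workaround, sketched here with base R's Vectorize() (find_x50 is just an illustrative name), is to build a wrapper that loops over A for you, so the call looks vectorised from mutate's point of view:
# Vectorize() wraps the function so it is called once per element of A
find_x50 <- Vectorize(function(A) {
  uniroot(myfunc, y = 0.5, A = A, lower = 0, upper = 10,
          extendInt = "yes")$root
})
mydat %>%
  mutate(x50 = find_x50(A))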
Alternatively, use lapply to iterate over any vector, calling the function once per element:
> lapply(mydat$A, function(a){uniroot(myfunc, y = 0.5, A = a, lower = 0, upper = 10, extendInt = "yes")$root})
[[1]]
[1] -0.375
[[2]]
[1] -0.1764706
[[3]]
[1] -0.3571429
Then use standard R functions to put that data back in your data frame if that's where you want it.
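For instance, a minimal sketch with sapply, which simplifies the list of roots into a numeric vector that can be assigned directly as a new column:
# sapply simplifies the result to a plain numeric vector
mydat$x50 <- sapply(mydat$A, function(a) {
  uniroot(myfunc, y = 0.5, A = a, lower = 0, upper = 10,
          extendInt = "yes")$root
})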

You could use purrr::map to build a list column with the results (coercing each uniroot result to a data.frame), then tidyr::unnest to spread it out into columns:
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)
mydat <- tribble(
  ~crop,  ~year, ~A,
  "corn", 2011,  4,
  "corn", 2012,  8.5,
  "soy",  2011,  4.2
)
myfunc <- function(x, y, A) {2 + A * x - y}
mydat %>%
  mutate(x50 = map(A, function(x) {
    as.data.frame(uniroot(myfunc, y = 0.5, A = x, lower = 0, upper = 10,
                          extendInt = "yes"))
  })) %>%
  unnest(x50)  # with tidyr >= 1.0, name the list column explicitly
# # A tibble: 3 x 8
# crop year A root f.root iter init.it estim.prec
# <chr> <dbl> <dbl> <dbl> <dbl> <int> <int> <dbl>
# 1 corn 2011. 4.00 -0.375 0. 20 19 52439.
# 2 corn 2012. 8.50 -0.176 2.22e-16 20 18 0.0000610
# 3 soy 2011. 4.20 -0.357 2.22e-16 21 19 0.0000610

Another option, with current dplyr, is rowwise(), which makes mutate operate one row at a time:
data |>
  rowwise() |>
  mutate(var_name = uniroot(f, c(lower_limit, upper_limit), vars_from_data)$root)
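Applied to the question's data, a minimal sketch (ungroup() drops the row-wise grouping afterwards):
mydat |>
  rowwise() |>
  mutate(x50 = uniroot(myfunc, y = 0.5, A = A, lower = 0, upper = 10,
                       extendInt = "yes")$root) |>
  ungroup()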

Related

Replace Value based on Value Before

I have a tibble with variables that use 99 or 999 as a catch-all for all values above the previous level. How would I change that to the previous value + 2?
Example below.
level <- c(1,2,3,4,99)
variable <- c('age', 'age','age','age','age')
value <- c(.5, .75, 1, 1.25, 1.89)
d <- data.frame(Variable = variable, Level = level, value = value)
I would like to end up with
Variable Level Value
age 1 .5
age 2 .75
age 3 1
age 4 1.25
age 6 1.89
I'm not even sure where to start in picking the value before the 99 based on the condition that the starting value is 99.
Maybe
d$Level <- if(d$Level = 99, nrow(-1) + 2, d$Level)
I would use data.table::shift():
with(d, ifelse(Level %in% c(99, 999), data.table::shift(Level) + 2, Level))
[1] 1 2 3 4 6
But to do this in base R you could define a helper function:
baseShiftBy1 <- function(x) c(NA, x[-length(x)])
with(d, ifelse(Level %in% c(99, 999), baseShiftBy1(Level) + 2, Level))
Create a new vector with lagged values
temp <- c(0, d$Level[-length(d$Level)]) + 2
d$Level <- ifelse(d$Level == 99, temp, d$Level)
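If dplyr is already loaded, its lag() function expresses the same idea; a sketch:
# dplyr::lag() shifts the vector down by one, filling the first slot with NA
with(d, ifelse(Level %in% c(99, 999), dplyr::lag(Level) + 2, Level))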

Filter_all with differing condition for each column

I have the following vector
vec1 = c(0.001, 0.05, 0.003, 0.1)
and a data frame
df = data_frame(x = seq(0.001, 0.1, length.out = 10),
                y = seq(0.03, 0.07, length.out = 10),
                z = seq(0, 0.005, length.out = 10),
                w = seq(0.05, 0.25, length.out = 10))
I would like to filter df so that the output contains only the rows where, for each column, the value lies between the corresponding entry of vec1 - 0.05 and vec1 + 0.05.
So in this example, only the first 4 rows satisfy this condition (in x I allow -0.049 to 0.051 based on the first entry of vec1, in y I allow 0 to 0.1 based on the second entry, and so on).
I am sure this can be done with filter_all and (.), something along the lines of
filter_all(df, all_vars(. >= (vec1(.) - 0.05) & . <= (vec1(.) + 0.05))))
But this doesn't work.
What am I doing wrong?
Your filter_all attempt fails because vec1 is a vector, not a function, so vec1(.) is an error; all_vars() also has no way to pick out the matching entry of an external vector for each column. Instead, we can use mapply on the data frame, passing it along with vec1, to test which values satisfy the criterion, and keep only the rows where every column is TRUE:
df[rowSums(mapply(function(x, y) x > (y - 0.05) & x < (y + 0.05),
                  df, vec1)) == ncol(df), ]
# x y z w
# <dbl> <dbl> <dbl> <dbl>
#1 0.0120 0.0344 0.000556 0.0722
#2 0.0230 0.0389 0.00111 0.0944
#3 0.0340 0.0433 0.00167 0.117
#4 0.0450 0.0478 0.00222 0.139
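An alternative sketch with purrr: map2() walks over the columns of df and the entries of vec1 in parallel, and reduce() combines the per-column tests into a single row filter (abs(.x - .y) < 0.05 is equivalent to the strict inequalities above):
library(purrr)
# one logical vector per column: TRUE where the value is within 0.05 of vec1
in_range <- map2(df, vec1, ~ abs(.x - .y) < 0.05)
# keep the rows where every column passes
df[reduce(in_range, `&`), ]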

How to do map2_if or pmap_if in purrr

I am trying to map a function over multiple arguments and, at the same time, only apply the function based on a condition, using the purrr package. I can map over multiple arguments with map2 or pmap, and I can map based on a condition with map_if.
The following code is a basic example of what I want to do, but the function map2_if doesn't exist yet. I know that this specific problem can be solved without functional programming, but my goal is to apply the approach to more complex functions.
x = rnorm(n = 5, mean = 0, sd = 1)
y = rnorm(n = 5, mean = 1, sd = 2)
func = function(x, y){x + y}
map2_if(.x = x, .y = y, .p = (x > 0 & y > 0), func)
map2_if can't really exist: map_if returns an element unchanged when it fails the predicate, and with two inputs it is unclear what map2_if should return for pairs that fail.
Instead, it's simpler to put the data in a data frame and break the task down into steps:
library(tidyverse)
set.seed(0)
data_frame(x = rnorm(n = 5, mean = 0, sd = 1),
           y = rnorm(n = 5, mean = 1, sd = 2)) %>%
  filter(x > 0, y > 0) %>%
  mutate(z = map2_dbl(x, y, `+`))
#> # A tibble: 3 x 3
#> x y z
#> <dbl> <dbl> <dbl>
#> 1 1.3297993 0.4105591 1.740358
#> 2 1.2724293 0.9884657 2.260895
#> 3 0.4146414 5.8093068 6.223948
If func is vectorized (like +), you don't even really need map2_dbl.
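For instance, a sketch of the same pipeline that relies on ordinary vectorised addition:
data_frame(x = x, y = y) %>%
  filter(x > 0, y > 0) %>%
  mutate(z = x + y)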

Interpolate missing values of a data frame

I have a dataset like this:
x y z
1 1 0.954
1 3 0.134
1 30 0.123
2 1 0.425
2 3 0.123
2 30 0.865
5 1 0.247
5 3 0.654
5 30 0.178
Let's think of this as the height of a surface sampled at 9 points over a 4x29 field. Suppose I want to fill in the missing values by interpolating (linear is fine), so that I end up with a z value for every (integer) x in [1,5] and every y in [1,30]. I want the result to still be a data frame with the same structure.
How can I do this in R?
I'll take the previous lack of answer as a gift :)
#akima_0.5-12
library(akima)
my_df <- data.frame(
  x = c(rep(1, 3), rep(2, 3), rep(5, 3)),
  y = rep(c(1, 3, 30), 3),
  z = c(0.954, 0.134, 0.123, 0.425, 0.123, 0.865, 0.247, 0.654, 0.178)
)
my_op <- interp(
  x = my_df$x,
  y = my_df$y,
  z = my_df$z,
  xo = 1:5,      # vector of x coordinates to use in interpolation
  yo = 1:30,     # vector of y coordinates to use in interpolation
  linear = TRUE  # default interpolation method
)
my_op$z # matrix of interpolated z coordinates, (row, col) correspond to (x, y)
# indices of the non-NaN grid points (all of them here, since the field's corners are sampled)
ind <- which(!is.nan(my_op$z), arr.ind = TRUE)
desired_output <- data.frame(
  x = ind[, 1],
  y = ind[, 2],
  z = my_op$z[ind]  # subset with the same indices so x, y, z stay aligned
)
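A quick sanity check: linear interpolation should reproduce the sampled heights at the original nine points.
# the interpolated surface should match the input at the sampled (x, y) pairs
check <- merge(my_df, desired_output, by = c("x", "y"))
all.equal(check$z.x, check$z.y)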

Conditional change to data frame column(s) based on values in other columns

Within the simulated data set
n = 50
set.seed(378)
df <- data.frame(
  age = sample(c(20:90), n, rep = T),
  sex = sample(c("m", "f"), n, rep = T, prob = c(0.55, 0.45)),
  smoker = sample(c("never", "former", "active"), n, rep = T, prob = c(0.4, 0.45, 0.15)),
  py = abs(rnorm(n, 25, 10)),
  yrsquit = abs(rnorm(n, 10, 2)),
  outcome = as.factor(sample(c(0, 1), n, rep = T, prob = c(0.8, 0.2)))
)
I need to introduce some imbalance between the outcome groups (1=disease, 0=no disease). For example, subjects with the disease are older and more likely to be male. I tried
df1 <- within(df, sapply(length(outcome), function(x) {
  if (outcome[x] == 1) {
    age[x] <- age[x] + 15
    sex[x] <- sample(c("m", "f"), prob = c(0.8, 0.2))
  }
}))
but there is no difference as shown by
tapply(df$sex, df$outcome, length)
tapply(df1$sex, df$outcome, length)
tapply(df$age, df$outcome, mean)
tapply(df1$age, df$outcome, mean)
The use of sapply inside within does not work as you expect: within only uses the value returned by sapply, and in your code sapply returns NULL, so within does not modify the data frame.
Here is an easier way to modify the data frame without a loop or sapply:
idx <- df$outcome == "1"
df1 <- within(df, {
  age[idx] <- age[idx] + 15
  sex[idx] <- sample(c("m", "f"), sum(idx),
                     replace = TRUE, prob = c(0.8, 0.2))
})
Now, the data frames are different:
> tapply(df$age, df$outcome, mean)
0 1
60.46341 57.55556
> tapply(df1$age, df$outcome, mean)
0 1
60.46341 72.55556
> tapply(df$sex, df$outcome, summary)
$`0`
f m
24 17
$`1`
f m
2 7
> tapply(df1$sex, df$outcome, summary)
$`0`
f m
24 17
$`1`
f m
1 8
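If you prefer dplyr, the same modification can be sketched with mutate() and ifelse() (assuming sex is a factor, hence the as.character()):
library(dplyr)
df1 <- df %>%
  mutate(age = ifelse(outcome == "1", age + 15, age),
         sex = ifelse(outcome == "1",
                      sample(c("m", "f"), n(), replace = TRUE, prob = c(0.8, 0.2)),
                      as.character(sex)))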
