Fitting curves with DRC package in R?

I'm trying to fit curves with the DRC package in R.
Example:
library(drc)
x_yrs<-c(2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014,
2015, 2016, 2017)
y<-c(1.89, 0.34, 0.47, 2.46, 2.13, 7.49, 47.24, 117.84, 202.8, 322.7,
540.72, 744.22, 1148.7)
MaxPop<-110000
Y_Adj<-y/MaxPop
EV<-drm(y~ x_yrs,fct = LL.3(fixed = c(NA, NA, NA)))
plot(EV, broken = TRUE, type = "all")
EV<-drm(y~ x_yrs,fct = LL.5(fixed = c(NA, NA, NA, NA, NA)))
plot(EV, broken = TRUE, type = "all")
x_yrs_Adj<- x_yrs-2004
EV<-drm(Y_Adj~ x_yrs_Adj,fct = LL.5(fixed = c(NA, NA, NA, NA, NA)))
plot(EV, broken = TRUE, type = "all",xlim = c(0, 40), ylim = c(0, 1))
I would like the maximum value of the curve, i.e. the upper asymptote, to be 1 (or MaxPop on the unscaled data).
How would I go about changing the drm model to accomplish this?

"I would like to set the future population size to reach 110,000." I don't think it will be possible to fit a model with that constraint based on the data you give. The response that you have for the support of the function doesn't even get near to that (potentially?) asymptotic region. So I think you need to rethink your approach.
That aside, in drc you can realise constraints by specifying values for specific parameters through the fixed function argument.
EV <- drm(Y_Adj ~ x_yrs_Adj, fct = LL.5(fixed = c(NA, 0, 1, NA, NA)))
You can find out about the individual parameters if you do e.g. ?LL.5:
LL.5(fixed = c(NA, NA, NA, NA, NA), names = c("b", "c", "d", "e", "f"), ...)
[...]
The five-parameter logistic function is given by the expression
f(x) = c + \frac{d-c}{(1+\exp(b(\log(x)-\log(e))))^f}
So in this case, we set c to zero and then fix d = 1.
Let's show the plot
plot(EV, broken = TRUE, type = "all", xlim = c(0, 40000), ylim = c(0, 1))
You can see the issue here. As you don't have any support of values x_yrs_Adj closer to the function's asymptotic behaviour, your fit (and the resulting estimated parameters) will be poor.
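To see what fixing c and d does, the five-parameter formula quoted from ?LL.5 can be written out as a plain R function. This is only a sketch of the formula, not drc's own code; the name ll5 and the values chosen for b, e and f are made up for illustration (b is taken negative so that the curve increases).

```r
# Five-parameter log-logistic curve, written directly from the formula in ?LL.5.
# ll5 and the parameter values below are illustrative assumptions, not fitted estimates.
ll5 <- function(x, b, c, d, e, f) {
  c + (d - c) / (1 + exp(b * (log(x) - log(e))))^f
}

x <- c(1, 10, 20, 40, 1e6)
# c = 0 and d = 1 pin the lower and upper asymptotes,
# mirroring fixed = c(NA, 0, 1, NA, NA) in the drm() call
vals <- ll5(x, b = -3, c = 0, d = 1, e = 20, f = 1)
print(round(vals, 4))
```

With c = 0 and d = 1 the curve stays between 0 and 1, and for f = 1 it passes through 0.5 at x = e, so only b, e and f remain to be estimated from the data.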

Related

How to do the Whitney U test (wilcox.test) across several columns?

So I have data looking a little something like this:
Data:
Area  Al     Cd   Cu
A     10000  0.2  30
A     15000  0.5  25
A     NA     NA   NA
B     8000   1.1  55
B     11000  0.2  40
B     13000  0.1  40
etc.
And I want to do a Mann Whitney U test between group A and B separately for each element/column.
I have managed to do this manually for each element individually according to this:
#Data is the above dataframe
Area_A <- subset(Data, Group %in% c("A"))
Area_B <- subset(Data, Group %in% c("B"))
WhitneyU_Al <- wilcox.test(Area_A$Al, Area_B$Al, na.rm = TRUE, paired = FALSE, exact = FALSE)
(I couldn't figure out how to do it based on the rows in the column "Areas" in one data frame, which is why I divided it into two subsets).
Now, I have a lot more columns than just these three (43 to be exact), and I was wondering if there was some way to do this across all columns without changing it manually each time?
I tried a few variations of this:
WhitneyU <- wilcox.test(Area_A, Area_B, na.rm = TRUE, paired = FALSE, exact = FALSE)
#OR
WhitneyU <- wilcox.test(Area_A[2:43], Area_B[2:43], na.rm = TRUE, paired = FALSE, exact = FALSE)
But they both return the error that "'x' must be numeric".
I suspect the answer isn't this easy and that I am barking up the wrong tree? Either that, or the question/answer is too obvious and I am just not seeing it.
When I tried looking up multiple tests most answers deal with how to do multiple tests if you have multiple "groups" (as in, they have area A, B, C and D). Sorry if this has been asked before and I didn't find it (or I didn't understand it). I did look.
Any help is appreciated.
Edit:
Upon request, using dput() on part of my data it looks a bit like this:
structure(list(Group = c("A", "A", "A", "A",
"A", "B", "B", "B", "B", "B", "B"
), Al = c(NA, NA, NA, 18100, 18400, 32500, 33200, 31200,
17400, 13900, 14400), As = c(NA, NA, NA, 16.9, 14.6, 8.83, 8.59,
8.42, 13.4, 13.5, 13.7), B = c(NA, NA, NA, 18, 16, 14, 14, 11,
53, 87, 58), Bi = c(NA, NA, NA, 0.13, 0.12, 0.57, 0.55, 0.52, 0.22,
0.18, 0.21), Ca = c(NA, NA, NA, 5950, 5480, 6220, 6230, 5950,
6850, 8170, 7000), Cd = c(NA, NA, NA, 0.2, 0.2, 0.2, 0.2, 0.18,
0.31, 0.36, 0.46)), row.names = c(1L, 2L, 3L, 4L, 5L, 40L, 41L,
42L, 43L, 44L, 45L), class = c("tbl_df", "tbl", "data.frame"))
wilcox.test requires the first input (x) to be numeric. In R, factors have an integer value assigned to them “under the hood” (i.e., A = 1, B = 2, …), so you can convert the group variable in your data frame (called df here) to a factor. This should then perform the test for every other column:
df$Group <- as.factor(df$Group)
lapply(df[-1], function(x){
wilcox.test(x ~ df$Group)
})
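If only the p-values are needed, the same loop can return them as a named vector. The following is a minimal sketch using a subset of the dput() data from the question; the object name p_values is an assumption for illustration, and exact = FALSE is carried over from the question's own call because the data contain ties.

```r
# Subset of the question's dput() output (Group, Al and Cd only)
df <- data.frame(
  Group = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  Al = c(NA, NA, NA, 18100, 18400, 32500, 33200, 31200, 17400, 13900, 14400),
  Cd = c(NA, NA, NA, 0.2, 0.2, 0.2, 0.2, 0.18, 0.31, 0.36, 0.46)
)
df$Group <- as.factor(df$Group)

# One Mann-Whitney U test per element column, keeping only the p-value;
# rows with NA are dropped by the formula interface's default na.action
p_values <- sapply(df[-1], function(x) {
  wilcox.test(x ~ df$Group, exact = FALSE)$p.value
})
print(p_values)
```

With 43 element columns, an adjustment for multiple testing such as p.adjust(p_values, method = "holm") may be worth considering.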

plotly error_y values applied to incorrect y-values when NA present

I have a table of rates and confidence intervals I want to plot. Some of the rates (and their CI's) are suppressed according to data quality rules. When I plot series with missing values, the error bar values are assigned to the incorrect rate and the preceding rate is given (+ 0 / - 0) error bar values.
Reprex:
library(dplyr)
library(tibble)
library(forcats)
library(plotly)
reprex <- tibble(year = as_factor(c(2016, 2017, 2018, 2019)),
rate = c(NA, 0.153, 0.123, NA),
lcl = c(NA, 0.0813, 0.0612, NA),
ucl = c(NA, 0.261, 0.219, NA)) %>%
mutate(difflow = rate-lcl,
diffhi = ucl-rate)
plot <- plot_ly()
plot <- add_trace(plot,
data = reprex,
connectgaps = F,
x = ~year,
y = ~rate, mode = 'markers+lines', type = "scatter",
error_y = ~list(type = "data",
array = ucl-rate,
arrayminus = rate-lcl,
color = "black"))
In the above plot, rates for 2016 and 2019 are correctly missing. The error confidence limits for 2017 are (+0 / -0) and the limits plotted for 2018 (+0.108 / -0.0717) match the values for diffhi and difflow of 2017. How do I correct this?
EDIT: I tried wrapping the array and arrayminus values in na.omit(). This works for the above reprex, but fails when additional NA's and data are introduced. Below, the 2017 rate now has 2018's confidence intervals and 2018 has no error bars. This is different from before when Plotly assigned error bar values of zero. Now they are just missing.
reprex <- tibble(year = as_factor(c(2013, 2014, 2015, 2016, 2017, 2018, 2019)),
rate = c(3, 2, NA, NA, 0.153, 0.123, NA),
lcl = c(2, 1, NA, NA, 0.0813, 0.0612, NA),
ucl = c(4, 5, NA, NA, 0.261, 0.219, NA)) %>%
mutate(difflow = rate-lcl,
diffhi = ucl-rate)
plot <- plot_ly()
plot <- add_trace(plot,
data = reprex,
connectgaps = F,
x = ~year,
y = ~rate, mode = 'markers+lines', type = "scatter",
error_y = ~list(type = "data",
array = na.omit(ucl-rate),
arrayminus = na.omit(rate-lcl),
color = "black"))
plot
We could replace the NA with 0
library(dplyr)
library(tidyr)
reprex <- reprex %>%
mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
Applying the OP's plotting code to this data then gives the plot (image not included).
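Note that replacing every numeric NA also turns the suppressed rates into zeros. If that is not wanted, one variation is to zero-fill only the error-bar columns and leave rate as NA. This is a sketch under the assumption that the misalignment comes from NA entries in the array and arrayminus vectors; it only prepares the data in base R, the name reprex2 is made up for illustration, and the result has not been checked against any particular plotly version.

```r
# Data preparation only: keep rate as NA, zero-fill just the error-bar lengths.
# reprex2 is a hypothetical name; the values are those of the question's first reprex.
reprex2 <- data.frame(
  year = factor(c(2016, 2017, 2018, 2019)),
  rate = c(NA, 0.153, 0.123, NA),
  lcl  = c(NA, 0.0813, 0.0612, NA),
  ucl  = c(NA, 0.261, 0.219, NA)
)
reprex2$difflow <- reprex2$rate - reprex2$lcl
reprex2$diffhi  <- reprex2$ucl - reprex2$rate

# Replace NA by 0 in the two error columns only
reprex2$difflow[is.na(reprex2$difflow)] <- 0
reprex2$diffhi[is.na(reprex2$diffhi)]   <- 0
print(reprex2)
```

The two columns can then be passed as array = diffhi and arrayminus = difflow in the OP's add_trace() call.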

Interpolate Time Series Using Weighted Loess in R

I have a dataset of multiple lakes with water level elevations through time. The observations are not regularly spaced and have many large gaps. Further, some of the older observations may be of lower or unknown quality. I created a separate model that does a reasonably good job of predicting water levels across time, but still misses the actual observations by varying amounts.
I would like to create a third imputed/interpolated set of data in which the solution:
is informed by the modeled values where observations are missing
crosses the highly weighted observations
and is informed by the lower weighted observations
So far, I have used the fable package's TSLM -> interpolate to perform this. It works reasonably well, but I cannot see a way to introduce weighting to the process. Further, it relies too heavily on the global coefficients and intercept, making it a bit too volatile when the modeled value significantly misses the observed one. I think I need some sort of weighted loess that relies on local coefficients and can accommodate weighting.
library(dplyr)
library(tsibble)
library(fable)
library(ggplot2)
test_data <- data.frame(obs_year = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009),
site_name = c("Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2"),
observed = c(100,200,NA, NA, NA, NA, 220, NA, NA, 125, NA,NA,425, NA, 475, NA, 450, 450, 475, 500),
weights = c(1,1,NA, NA, NA, NA, 2, NA, NA, 2, NA,NA,2, NA, 1, NA, 2, 2, 2, 2),
modeled = c(110,120,165,150, 200, 225, 240, 250, 150, 130, 450,430,415,400, 425, 450, 460, 460, 470, 490))
test_tsibble <- as_tsibble(test_data, key = site_name, index = obs_year)
tslm_interpolate <- test_tsibble %>%
group_by(site_name) %>%
model(lm = TSLM(observed~modeled)) %>%
fabletools::interpolate(test_tsibble)
tslm_interpolate <- left_join(tslm_interpolate, test_data, by = c("site_name", "obs_year")) %>%
dplyr::select(obs_year, site_name, observed = observed.y, imputed = observed.x, modeled, weights)
tslm_interpolate %>%
ggplot(aes(x=obs_year))+
geom_line(aes(y = imputed), color = "blue")+
geom_line(aes(y = modeled), color = "red")+
geom_point(aes(y = observed), color = "green")+
facet_wrap(~site_name, scales = "free_y")

custom rmeta - forest plot generation does not work: " 'x' and 'units' must have length > 0"

I tried to generate a "forest plot" without summary estimates using the rmeta package. However, using ?forestplot and then starting from the description or the example does not help; I always get the same error. I assume it is a simple one that has to do with the matrix/vector lengths somehow not lining up, but I kept changing and adjusting and still cannot find the error...
Here is the example code:
tabletext<-cbind(c(NA, NA, NA, NA, NA, NA),
c(NA, NA, NA, NA, NA, NA),
c("variable1","subgroup","2nd", "3rd", "4th", "5th"),
c(NA,"mean","1.8683639", "2.5717301", "4.4966049, 9.0008054")
)
tabletext
png("forestplot.png")
forestplot(tabletext, mean = c(NA, NA, 1.8683639, 2.5717301, 4.4966049, 9.0008054), lower = c(NA, NA, 1.4604643, 2.0163468, 3.5197956, 6.9469213), upper = c(NA, NA, 2.3955105, 3.2897459, 5.7672966, 11.7288609),
is.summary = c(rep(FALSE, 6)), zero = 1, xlog=FALSE, boxsize=0.75, xticks = NULL, clip = c(0.9, 12))
dev.off()
Error message:
clip = c(0.9, 12))
Error in unit(rep(1, sum(widthcolumn)), "grobwidth", labels[[1]][widthcolumn]) :
'x' and 'units' must have length > 0
dev.off()
Any help is very much appreciated!
This works with the forestplot-package although you need to remove the xticks=NULL:
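library(forestplot)
# the last two values must be separate strings so that every column has six rows
tabletext<-cbind(c(NA, NA, NA, NA, NA, NA),
c(NA, NA, NA, NA, NA, NA),
c("variable1","subgroup","2nd", "3rd", "4th", "5th"),
c(NA,"mean","1.8683639", "2.5717301", "4.4966049", "9.0008054")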
)
png("forestplot.png")
forestplot(tabletext,
mean = c(NA, NA, 1.8683639, 2.5717301, 4.4966049, 9.0008054),
lower = c(NA, NA, 1.4604643, 2.0163468, 3.5197956, 6.9469213),
upper = c(NA, NA, 2.3955105, 3.2897459, 5.7672966, 11.7288609),
is.summary = c(rep(FALSE, 6)), zero = 1,
xlog=FALSE, boxsize=0.75, clip = c(0.9, 12))
dev.off()
This gives the plot (image not included); I recommend some polishing before submitting it for publication.

Adding different vertical lines for each panel in xyplot using lattice in R

I have a graph of plant species frequency by year for several sites that I am plotting using xyplot in the lattice package. I've figured out how to get the scatterplot for each species-site combo. However, I want to add an abline representing each year in which a chemical treatment was done. Chemical treatments were added in different years at each site, and I'd like to add a vertical abline for each species-site graph in which a chemical treatment was performed at that site. Here is my xyplot code:
library(lattice)
library(plyr)
sp.1 <- data.frame(site=rep('a', 10), year=seq(2001, 2010, 1), year.trt=c(NA, NA, NA, NA, 2005, NA, NA, 2008, NA, NA), pl.1=rnorm(10, 4, 1), pl.2=rnorm(10, 6, 2))
sp.2 <- data.frame(site=rep('b', 10), year=seq(2001, 2010, 1), year.trt=c(2001, NA, NA, NA, NA, 2006, NA, NA, NA, NA), pl.1=rnorm(10, 5, 2), pl.2=rnorm(10, 4, 1))
sp.3 <- data.frame(site=rep('c', 10), year=seq(2001, 2010, 1), year.trt=c(NA, NA, NA, 2004, NA, NA, NA, NA, 2009, NA), pl.1=rnorm(10, 8, 1), pl.2=rnorm(10, 3, 3))
data <- rbind.fill(sp.1, sp.2, sp.3)
xy.plot <- xyplot(pl.1 + pl.2 ~ year | site, data=data, outer=T, type='l',
as.table=T, xlab=c('Year'), ylab=c('Spp. Frequency (%)'),
panel=function(x, y,...){
panel.xyplot(x,y, type='l')
panel.abline(v=data$year.trt)
})
print(xy.plot)
So, the important line of code in this block is 'panel.abline(v=data$year.trt)'. Currently, this draws every treatment year in my dataset in all panels; however, I'd like each panel to show only the years in which a treatment was done at that specific site.
Any insight would be greatly appreciated.
Thanks,
Paul
