simple curve fitting in R

I am trying to find a fit for my data, but so far I have had no luck.
I tried the logarithmic and different ones from the drc package, but I am sure there must be a better one; I just don't know the type.
On a different note, I would be grateful for advice on how to go about curve hunting in general.
library(drc)
df<-structure(list(x = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55), y = c(0.1066, -0.6204, -0.2028, 0.2621, 0.4083,
0.4497, 0.6343, 0.7762, 0.8809, 1.0029, 0.8089, 0.7845, 0.8009,
0.9319, 0.9414, 0.9505, 0.9323, 1.0321, 0.9381, 0.8975, 1.0929,
1.0236, 0.9589, 1.0644, 1.0411, 1.0763, 0.9679, 1.003, 1.142,
1.1049, 1.2868, 1.1569, 1.1952, 1.0802, 1.2125, 1.3765, 1.263,
1.2507, 1.2125, 1.2207, 1.2836, 1.3352, 1.1311, 1.2321, 1.4277,
1.1645), w = c(898, 20566, 3011, 1364, 1520, 2376, 1923, 1934,
1366, 1010, 380, 421, 283, 262, 227, 173, 118, 113, 95, 69, 123,
70, 80, 82, 68, 83, 76, 94, 101, 97, 115, 79, 98, 84, 92, 121,
97, 102, 93, 92, 101, 74, 124, 64, 52, 63)), row.names = c(NA,
-46L), class = c("tbl_df", "tbl", "data.frame"), na.action = structure(c(`47` = 47L), class = "omit"))
fit <- drm(data = df, y ~ x, fct = LL.4(), weights = w)
plot(fit)

1) If we ignore the weights then y = a + b * x + c/x^2 seems to fit, and it is linear in the coefficients, so it is easy to fit. The data seem upward sloping, so we started with a line, but then we needed to dampen that, so we added a reciprocal term. A reciprocal quadratic worked slightly better than a plain reciprocal based on the residual sum of squares, so we switched to that.
fm <- lm(y ~ x + I(1 / x^2), df)
coef(summary(fm))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.053856e+00 0.116960752 9.010341 1.849238e-11
## x 4.863077e-03 0.002718613 1.788808 8.069195e-02
## I(1/x^2) -1.460443e+02 16.518887452 -8.841049 3.160306e-11
The coefficient of the x term is not significant at the 5% level -- the p value is 8% in the table above -- so we can remove it, and the model will fit nearly as well with only two parameters. In the plot below the fm fit with 3 parameters is solid and the fm2 fit with 2 parameters is dashed.
fm2 <- lm(y ~ I(1 / x^2), df)
plot(y ~ x, df)
lines(fitted(fm) ~ x, df)
lines(fitted(fm2) ~ x, df, lty = 2)
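The reciprocal vs reciprocal-quadratic comparison can be made concrete with deviance(), which returns the residual sum of squares of an lm fit. Here is a standalone sketch on synthetic data generated from the reciprocal-quadratic model itself (so that model's RSS is essentially zero); with the question's df you would compare fm against lm(y ~ x + I(1/x), df) the same way:

```r
# synthetic stand-in with the same qualitative shape: rising, then flattening
d <- data.frame(x = 10:55)
d$y <- 1 + 0.005 * d$x - 146 / d$x^2

fm_r  <- lm(y ~ x + I(1 / x),   d)  # plain reciprocal term
fm_r2 <- lm(y ~ x + I(1 / x^2), d)  # reciprocal quadratic term

# residual sums of squares; the smaller, the better the fit
c(recip = deviance(fm_r), recip_sq = deviance(fm_r2))
```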
2) Another approach is to use two straight lines. This is still continuous but has one non-differentiable point at the point of transition. The model has 4 parameters, the intercepts and slopes of each line. Below we do use the weights. It has the advantage of an obvious motivation based on the appearance of the data. The break point at the intersection of the two lines may have significance as the transition point between the higher sloping initial growth and the lower sloping subsequent growth.
# starting values use lines fitted to 1st ten and last 10 points
fm_1 <- lm(y ~ x, df, subset = 1:10)
fm_2 <- lm(y ~ x, df, subset = seq(to = nrow(df), length = 10))
st <- list(a = coef(fm_1)[[1]], b = coef(fm_1)[[2]],
c = coef(fm_2)[[1]], d = coef(fm_2)[[2]])
fm3 <- nls(y ~ pmin(a + b * x, c + d * x), df, start = st, weights = w)
# point of transition
X <- with(as.list(coef(fm3)), (a - c) / (d - b)); X
## [1] 16.38465
Y <- with(as.list(coef(fm3)), a + b * X); Y
## [1] 0.8262229
plot(y ~ x, df)
lines(fitted(fm3) ~ x, df)

The basic idea is to understand how the selected function performs. Take a function you know (e.g. the logistic) and modify it. Or (even better) go to the literature and see which functions people use in your specific domain. Then create a user-defined model, play with it to understand the parameters, define good start values and then fit it.
Here is a quick & dirty example of a user-defined function (with the growthrates package). Something similar can surely be done with drc.
library("growthrates")
grow_userdefined <- function (time, parms) {
with(as.list(parms), {
y <- (K * y0)/(y0 + (K - y0) * exp(-mumax * time)) + shift
return(as.matrix(data.frame(time = time, y = y)))
})
}
fit <- fit_growthmodel(FUN=grow_userdefined,
p = c(y0 = -1, K = 1, mumax = 0.1, shift = 1),
time = df$x, y = df$y)
plot(fit)
summary(fit)
It can of course be made better. As we have no exponential start at the onset, one could for example start with a simple saturation function instead of a logistic, e.g. something Monod-like. As said above, the preferred way is to use a function related to the application domain.
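To illustrate the Monod-like idea, here is a hypothetical user-defined function in the same shape that fit_growthmodel() expects; the parameter names (K, Khalf, shift) are my own choice, and the evaluation below just sanity-checks the curve's shape before any fitting:

```r
# hypothetical Monod-like saturation with a vertical shift (no exponential onset)
grow_monod_shifted <- function(time, parms) {
  with(as.list(parms), {
    y <- K * time / (Khalf + time) + shift
    as.matrix(data.frame(time = time, y = y))
  })
}

# evaluate on a few time points: rises from ~shift and saturates towards K + shift
out <- grow_monod_shifted(time = c(10, 20, 40, 80),
                          parms = c(K = 2, Khalf = 10, shift = -0.5))
out[, "y"]
# [1] 0.5000000 0.8333333 1.1000000 1.2777778
```

This could then be handed to fit_growthmodel(FUN = grow_monod_shifted, p = c(K = 2, Khalf = 10, shift = -0.5), time = df$x, y = df$y) exactly as in the logistic example above.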

Related

R:mgcv add colorbar to 2D heatmap of GAM

I'm fitting a gam with mgcv and plot the result with the default plot.gam() function. My model includes a 2D-smoother and I want to plot the result as a heatmap. Is there any way to add a colorbar for the heatmap?
I've previously looked into other GAM plotting packages, but none of them provided the necessary visualisation. Please note, this is just a simplification for illustration purposes; the actual model (and reporting needs) are much more complicated.
Edited: I had initially swapped y and z in my tensor product; updated to reflect the correct version in both the code and the plot.
df.gam<-gam(y~te(x,z), data=df, method='REML')
plot(df.gam, scheme=2, hcolors=heat.colors(999, rev =T), rug=F)
sample data:
structure(list(x = c(3, 17, 37, 9, 4, 11, 20.5, 11.5, 16, 17,
18, 15, 13, 29.5, 13.5, 25, 15, 13, 20, 20.5, 17, 11, 11, 5,
16, 13, 3.5, 16, 16, 5, 20.5, 2, 20, 9, 23.5, 18, 3.5, 16, 23,
3, 37, 24, 5, 2, 9, 3, 8, 10.5, 37, 3, 9, 11, 10.5, 9, 5.5, 8,
22, 15.5, 18, 15, 3.5, 4.5, 20, 22, 4, 8, 18, 19, 26, 9, 5, 18,
10.5, 30, 15, 13, 27, 19, 5.5, 18, 11.5, 23.5, 2, 25, 30, 17,
18, 5, 16.5, 9, 2, 2, 23, 21, 15.5, 13, 3, 24, 17, 4.5), z = c(144,
59, 66, 99, 136, 46, 76, 87, 54, 59, 46, 96, 38, 101, 84, 64,
92, 56, 69, 76, 93, 109, 46, 124, 54, 98, 131, 89, 69, 124, 105,
120, 69, 99, 84, 75, 129, 69, 74, 112, 66, 78, 118, 120, 103,
116, 98, 57, 66, 116, 108, 95, 57, 41, 20, 89, 61, 61, 82, 52,
129, 119, 69, 61, 136, 98, 94, 70, 77, 108, 118, 94, 105, 52,
52, 38, 73, 59, 110, 97, 87, 84, 119, 64, 68, 93, 94, 9, 96,
103, 119, 119, 74, 52, 95, 56, 112, 78, 93, 119), y = c(96.535,
113.54, 108.17, 104.755, 94.36, 110.74, 112.83, 110.525, 103.645,
117.875, 105.035, 109.62, 105.24, 119.485, 107.52, 107.925, 107.875,
108.015, 115.455, 114.69, 116.715, 103.725, 110.395, 100.42,
108.79, 110.94, 99.13, 110.935, 112.94, 100.785, 110.035, 102.95,
108.42, 109.385, 119.09, 110.93, 99.885, 109.96, 116.575, 100.91,
114.615, 113.87, 103.08, 101.15, 98.68, 101.825, 105.36, 110.045,
118.575, 108.45, 99.21, 109.19, 107.175, 103.14, 94.855, 108.15,
109.345, 110.935, 112.395, 111.13, 95.185, 100.335, 112.105,
111.595, 100.365, 108.75, 116.695, 110.745, 112.455, 104.92,
102.13, 110.905, 107.365, 113.785, 105.595, 107.65, 114.325,
108.195, 96.72, 112.65, 103.81, 115.93, 101.41, 115.455, 108.58,
118.705, 116.465, 96.89, 108.655, 107.225, 101.79, 102.235, 112.08,
109.455, 111.945, 104.11, 94.775, 110.745, 112.44, 102.525)), row.names = c(NA,
-100L), class = "data.frame")
It would be easier (IMHO) to do this reliably within the ggplot2 ecosystem.
I'll show a canned approach using my {gratia} package, but also check out {mgcViz}. I'll also suggest a more generic solution using tools from {gratia} to extract information about your model's smooths and then plot them yourself using ggplot().
library('mgcv')
library('gratia')
library('ggplot2')
library('dplyr')
# load your snippet of data via df <- structure( .... )
# then fit your model (note you have y as the response & also in the tensor
# product; I assume below that z is the response and x and y are coordinates)
m <- gam(z ~ te(x, y), data=df, method='REML')
# now visualise the model using {gratia}
draw(m)
This produces:
{gratia}'s draw() methods can't plot everything yet, but where it doesn't work you should still be able to evaluate the data you need using tools in {gratia}, which you can then plot with ggplot() itself by hand.
To get values for your smooths, i.e. the data behind the plots that plot.gam() or draw() display, use gratia::smooth_estimates()
# dist controls what we do with covariate combinations too far
# from support of the data. 0.1 matches mgcv:::plot.gam behaviour
sm <- smooth_estimates(m, dist = 0.1)
yielding
r$> sm
# A tibble: 10,000 × 7
smooth type by est se x y
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 te(x,y) Tensor NA 35.3 11.5 2 94.4
2 te(x,y) Tensor NA 35.5 11.0 2 94.6
3 te(x,y) Tensor NA 35.7 10.6 2 94.9
4 te(x,y) Tensor NA 35.9 10.3 2 95.1
5 te(x,y) Tensor NA 36.2 9.87 2 95.4
6 te(x,y) Tensor NA 36.4 9.49 2 95.6
7 te(x,y) Tensor NA 36.6 9.13 2 95.9
8 te(x,y) Tensor NA 36.8 8.78 2 96.1
9 te(x,y) Tensor NA 37.0 8.45 2 96.4
10 te(x,y) Tensor NA 37.2 8.13 2 96.6
# … with 9,990 more rows
In the output, x and y are a grid of values over the range of both covariates (the number of points in the grid in each covariate is controlled by n such that the grid for a 2d tensor product smooth is of size n by n). est is the estimated value of the smooth at the values of the covariates and se its standard error. For models with multiple smooths, the smooth variable uses the internal label that {mgcv} gives each smooth - these are the labels used in the output you get from calling summary() on your GAM.
We can add a confidence interval if needed using add_confint().
Now you can plot your smooth(s) by hand using ggplot(). At this point you have two options
if draw() can handle the type of smooth you want to plot, you can use the draw() method for that object and then build upon it, or
plot everything by hand.
Option 1
# evaluate just the smooth you want to plot
smooth_estimates(m, smooth = "te(x,y)", dist = 0.1) %>%
draw() +
geom_point(data = df, alpha = 0.2) # add a point layer for original data
This pretty much gets you what draw() produced when given the model object itself, and you can add to it as if it were a ggplot object (which is not the case for the object returned by gratia:::draw.gam(), which is wrapped by {patchwork} and needs other ways to interact with the plots).
Option 2
Here you are in full control
sm <- smooth_estimates(m, smooth = "te(x,y)", dist = 0.1)
ggplot(sm, aes(x = x, y = y)) +
geom_raster(aes(fill = est)) +
geom_point(data = df, alpha = 0.2) + # add a point layer for original data
scale_fill_viridis_c(option = "plasma")
which produces
A diverging palette is likely better for this, along the lines of the one gratia:::draw.smooth_estimates uses
sm <- smooth_estimates(m, smooth = "te(x,y)", dist = 0.1)
ggplot(sm, aes(x = x, y = y)) +
geom_raster(aes(fill = est)) +
geom_contour(aes(z = est), colour = "black") +
geom_point(data = df, alpha = 0.2) + # add a point layer for original data
scale_fill_distiller(palette = "RdBu", type = "div") +
expand_limits(fill = c(-1,1) * abs(max(sm[["est"]])))
which produces
Finally, if {gratia} can't handle your model, I'd appreciate you filing a bug report here so that I can work on supporting as many model types as possible. But do try {mgcViz} as well for an alternative approach to visualising GAMs fitted using {mgcv}.
A base plot solution would be to use fields::image.plot directly. Unfortunately, it requires data in a classic wide format, not the long format needed by ggplot.
We can facilitate plotting by grabbing the object returned by plot.gam(), and then do a little manipulation of the object to get what we need for image.plot()
Following on from @Anke's answer, then, instead of plotting with plot.gam() and then using image.plot() to add the legend, we use plot.gam() only to get what we need to plot, and do everything in image.plot()
plt <- plot(df.gam)
plt <- plt[[1]] # plot.gam returns a list of n elements, one per plot
# extract the `$fit` variable - this is est from smooth_estimates
fit <- plt$fit
# reshape fit (which is a 1 column matrix) to have dimension 40x40
dim(fit) <- c(40,40)
# plot with image.plot
image.plot(x = plt$x, y = plt$y, z = fit, col = heat.colors(999, rev = TRUE))
contour(x = plt$x, y = plt$y, z = fit, add = TRUE)
box()
This produces:
You could also use the fields::plot.surface() function
l <- list(x = plt$x, y = plt$y, z = fit)
plot.surface(l, type = "C", col = heat.colors(999, rev = TRUE))
box()
This produces:
See ?fields::plot.surface for other arguments to modify the contour plot etc.
As shown, these all have the correct range on the colour bar. It would appear that in @Anke's version the colour-bar mapping is off in all of the plots, but mostly just a little bit, so it wasn't as noticeable.
Following Gavin Simpson's answer and this thread (How to add colorbar with perspective plot in R), I think I've come up with a solution that uses plot.gam() (though I really love that {gratia} takes it into the ggplot universe and will definitely look more into that).
require(fields)
df.gam<-gam(y~te(x,z), data=df, method='REML')
sm <- as.data.frame(smooth_estimates(df.gam, dist = 0.1))
plot(df.gam, scheme=2, hcolors=heat.colors(999, rev =T), contour.col='black', rug=F, main='', cex.lab=1.75, cex.axis=1.75)
image.plot(legend.only=T, zlim=range(sm$est), col=heat.colors(999, rev =T), legend.shrink = 0.5, axis.args = list(at =c(-10,-5,0,5, 10, 15, 20)))
I hope I understood correctly that gratia::smooth_estimates() actually pulls out the partial effects.
For my model with multiple terms (and multiple tensor products), this seems to work nicely by indexing the sections of the respective terms in sm, except for one, where the colorbar and the heatmap don't quite match up. I can't provide the actual underlying data, but I add that plot for illustration in case anyone has an idea. I'm using the same approach as outlined above. In the colorbar, dark red is at 15-20, but in the heatmap the isolines just above 0 already correspond to dark red (while 0 is a dark yellow in the colorbar).

SVD function in R. I want to get the singular values $d from a list of datasets. I want to put it in a table form

I want to use the svd function to get the singular values of large datasets stored in a list.
When I use the svd function on a single matrix, I am able to use $d and get the values, but for the list I cannot get the output.
Here is the code for a matrix and the output.
tb = matrix(c(64, 112, 59, 174, 111, 37,
39, 135, 115, 92, 161, 70,
93, 119, 50, 142, 20, 114,
149, 191, 62, 17, 145, 21,
60, 37, 29, 74, 42, 242), nrow = 5, ncol = 6, byrow = TRUE)
## Compute SVD of tb
#
my_svd = svd(tb)
## Retrieve (save) singular values
#
sv = my_svd$d
## Compute ratio between "1st/2nd" & "2nd/3rd" singular values
#
ratios = matrix(c(sv[1]/sv[2], sv[2]/sv[3]), nrow = 1)
colnames(ratios) = c("sv1/sv2", "sv2/sv3")
## Print ratios
ratios
How do I apply this to the list of datasets?
my current code
svdresult <- lapply(d1,svd)
svdresult
d1 is my list of datasets.
How do I get svdresult$d for each dataset in the list?
Thanks in advance
Maybe something like the following?
get_svd_ratios <- function(data) {
sv = svd(data)$d
n = length(sv)
ratios = matrix(sv[1:(n - 1)] / sv[2:n] , nrow = 1)
names = paste(
paste0("sv", 1:(n - 1)),
paste0("sv", 2:n),
sep = "/"
)
colnames(ratios) = names
return(ratios)
}
lapply(list(tb), get_svd_ratios)
# [[1]]
# sv1/sv2 sv2/sv3 sv3/sv4 sv4/sv5
# [1,] 2.261771 1.680403 1.29854 2.682195
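To get everything into the table form you asked for, rbind the per-matrix results. Here is a standalone sketch with toy diagonal matrices of my own choosing (so the singular values are known in advance); with your data you would pass your list d1 instead:

```r
get_svd_ratios <- function(data) {
  sv <- svd(data)$d
  n <- length(sv)
  ratios <- matrix(sv[1:(n - 1)] / sv[2:n], nrow = 1)
  colnames(ratios) <- paste(paste0("sv", 1:(n - 1)),
                            paste0("sv", 2:n), sep = "/")
  ratios
}

# toy list: a diagonal matrix has its diagonal entries as singular values
d1 <- list(A = diag(c(8, 4, 2)),   # singular values 8, 4, 2
           B = diag(c(9, 3, 1)))   # singular values 9, 3, 1

# one row per dataset; rownames are taken from the list names
tab <- do.call(rbind, lapply(d1, get_svd_ratios))
tab
#   sv1/sv2 sv2/sv3
# A       2       2
# B       3       3
```

Note this assumes every matrix in the list yields the same number of singular values; otherwise the rows would have different widths and rbind would fail.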

geeglm giving error "incompatible types (from language to character) in subassignment type fix"

I have a problem using geeglm from the geepack library. I don't understand what is wrong:
df <- structure(list(ID = c(3, 3, 10, 11, 11, 13),
VO2max = c(3584.2, 2235.7, 4191.6, 2025.1, 3184.6, 2134.9),
HRR120 = c(56, 34, 60, 57, 30, 67),
Age = c(16, 24, 29, 11, 27, 34),
Weight = c(63.8, 72, 81, 41, 108, 55),
Height = c(176, 167, 178, 150, 176, 156 )),
class = c("data.frame"))
library(geepack)
geeglm(formula = "VO2max ~ HRR120 ", family = "gaussian", corstr = "independence", id = ID, data = df)
Error in mcall$formula[3] <- switch(match(length(sformula), c(0, 2, 3)), :
incompatible types (from language to character) in subassignment type fix
I tried
df$ID <- as.character(df$ID)
df$ID <- as.factor(df$ID)
without any luck. Does anyone have an explanation ?
Just unquote the formula and everything works:
geeglm(formula = VO2max ~ HRR120, family = "gaussian",corstr = "independence",id = ID,data = df)
Call:
geeglm(formula = VO2max ~ HRR120, family = "gaussian",
data = df, id = ID, corstr = "independence")
Coefficients:
(Intercept) HRR120
2764.311798 2.533649
Degrees of Freedom: 6 Total (i.e. Null); 4 Residual
Scale Link: identity
Estimated Scale Parameters: [1] 666987
Correlation: Structure = independence
Number of clusters: 4 Maximum cluster size: 2
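The error apparently arises because geeglm() manipulates the formula expression in the call, which breaks when it is a character string. If you do need to build the formula programmatically (say, looping over response variables), convert it first with as.formula(). A minimal sketch; the geeglm() call itself is commented out since it needs geepack and the data:

```r
# build the formula as a string, then convert before the call
response  <- "VO2max"
predictor <- "HRR120"
f <- as.formula(paste(response, "~", predictor))
class(f)  # "formula"
# geeglm(formula = f, family = "gaussian", corstr = "independence",
#        id = ID, data = df)
```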

Quick way to print the regression line slope of two vectors

Let's say I have the following two vectors:
years <- seq(1991, 2000, by = 1)
height <- c(30, 34, 40, 45, 66, 70, 81, NA, 90, 110)
I now want to perform a simple linear regression:
lm(formula = height ~ years)
All I need is the slope of my regression line to get a trend value.
Is there any quick way or function to give me the regression-line slope of two vectors?
You'd do:
model <- lm(formula = height ~ years)
model$coefficients[2]
Output:
years
8.857353
A somewhat less readable way is to do it directly:
lm(formula = height ~ years)$coefficients[2]
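Equivalently with coef(), the standard accessor for fitted coefficients; a standalone version using the question's vectors:

```r
years <- seq(1991, 2000, by = 1)
height <- c(30, 34, 40, 45, 66, 70, 81, NA, 90, 110)

# lm() drops the NA row by default (na.action = na.omit);
# unname() strips the "years" label from the slope
slope <- unname(coef(lm(height ~ years))[2])
slope
# [1] 8.857353
```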
Maybe this too:
years <- seq(1991, 2000, by = 1)
height <- c(30, 34, 40, 45, 66, 70, 81, NA, 90, 110)
df1 <- data.frame(Yr = years, Ht = height)
lmfun <- function(df, yname, xname) {
  f <- as.formula(paste0(yname, "~", xname))
  lm.fit <- do.call("lm", list(data = quote(df), f))
  coef(lm.fit)
}
lmfun(df1, yname = "Ht", "Yr")[2] # or [1] depending on the need
Output:
Yr
8.857353

Given points on a circle, find the minimum number of points which can be separated with a line through the center

Assume you are given a vector of degrees representing points on the unit circle. How could you formally find the minimum number of points that can be isolated in one semicircle by a diameter? I understand there may be multiple diameters which satisfy this property for a given set of data points; that is okay. I am solely interested in the minimum number of points that can be isolated, not the diameter in particular. It also needs to be computationally efficient, so it works for a large number of points. I have written the following based on @d.b's suggestion, but the algorithm fails for tst4.
In R,
# Plots the points on a circle and attempts to find the minimum m (algorithm incorrect for tst4)
min_dia <- function(degs, plot = T){
library(dplyr)
plot_circle <- function(x, y, r) {
angles <- seq(0, 2*pi,length.out = 360)
lines(r*cos(angles) + x, r*sin(angles) + y)
}
deg <- degs
plot_boo <- plot
# @d.b's suggested method for finding m
temp <- abs((deg - min(deg) + 180) %% 360 - 180)
m <- min(table(cut(temp, breaks = c(-180, 90, 180))))
if(plot_boo == T){
tm_deg <- c(0, 30, 45, 60, 90, 120, 135, 150, 180, 210, 225, 240, 270, 300, 315, 330)
tm_rad <- (tm_deg * pi) / 180
th <- (deg*pi)/180
r <- 1
x <- r*cos(th)
y <- r*sin(th)
windows.options(width = 600, height = 600)
plot(x, y, xlim = c(-1.1, 1.1), ylim = c(-1.1, 1.1), pch = 20, xlab = "", ylab = "", main = "Plot of Given Data Points by Degrees")
plot_circle(0, 0, 1)
points(0, 0)
text(r*cos(tm_rad), r*sin(tm_rad), labels = paste0(tm_deg), cex= 0.5, pos = 3)
}
return(m)
}
# Function to plot diameter by degrees
plot_dia <- function(deg){
deg1 <- deg
deg2 <- deg + 180
th1 <- (deg1*pi)/180
th2 <- (deg2*pi)/180
x1 <- cos(th1)
y1 <- sin(th1)
x2 <- cos(th2)
y2 <- sin(th2)
lines(c(x1, x2), c(y1, y2))
}
# Testing
tst1 <- c(15, 45, 20) # m = 0
tst2 <- c(15, 45, 200) # m = 1
tst3 <- c(15, 46, 114, 137, 165, 187, 195, 215, 271, 328) # m = 3
tst4 <- c(36, 304, 281, 254, 177, 59, 47, 158, 244, 149, 317, 235, 345, 209, 204,
156, 325, 95, 215, 267)
# Implementation
min_dia(tst1)
plot_dia(90) # eyeball and plot to check
min_dia(tst2)
plot_dia(190) # eyeball and plot to check
min_dia(tst3)
plot_dia(110) # eyeball and plot to check
min_dia(tst4)
plot_dia(150) # m is probably 2
For the three points I have provided in the code with degrees 15, 45, and 225, the minimum number of points (say m) I could separate with a line would be 1.
For the points with degrees 15, 20, 25, the answer would obviously be 0.
Any help or guidance on an efficient algorithm to solve this minimization problem would be greatly appreciated.
Update:
Here is the plot if you were to run through the R code along with an example of a line which illustrates the minimum number of points you could separate, being 1.
Update:
I have also updated the code above which allows one to plot the data points, speculate a diameter which minimizes m, and the plot the diameter by degree.
If the points are not sorted, sort them by angle.
Walk through the list using a two-pointer approach: increment the right index while the angle difference is <180, increment the left index when it is >180. The minimum of (right - left) and (length - (right - left)) is your desired value.
Note that the scan should be performed cyclically (you may append a copy of the list with 360 added, e.g. 15, 45, 225 becomes 15, 45, 225, 375, 405, 585).
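A sketch of that two-pointer idea in R (O(n log n) after sorting; it assumes generic positions, i.e. no point lying exactly on a candidate diameter):

```r
min_split <- function(degs) {
  a <- sort(degs %% 360)
  n <- length(a)
  a2 <- c(a, a + 360)                      # cyclic extension of sorted angles
  best <- n
  j <- 1
  for (i in seq_len(n)) {
    while (a2[j] < a[i] + 180) j <- j + 1  # advance right pointer
    cnt <- j - i                           # points in semicircle [a[i], a[i] + 180)
    best <- min(best, cnt, n - cnt)
  }
  best
}

min_split(c(15, 45, 20))   # 0
min_split(c(15, 45, 200))  # 1
tst4 <- c(36, 304, 281, 254, 177, 59, 47, 158, 244, 149, 317, 235,
          345, 209, 204, 156, 325, 95, 215, 267)
min_split(tst4)            # 5
```

This agrees with the brute-force scan over half-degree angles for all four test vectors.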
Here is a brute force method: just draw a line at every angle (0.5, 1.5, ..., 359.5) and see which angle gives the least value.
bar = function(degs){
CUTS = sapply(0:359 + 0.5, function(D){
temp = ((degs - D + 180) %% 360 - 180)
min(table(cut(temp, breaks = c(-180, 0, 180))))
})
D = (0:359 + 0.5)[which.min(CUTS)]
m = min(CUTS)
plot(0, 0, type = "n",
xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5),
ann = FALSE, axes = FALSE, asp = 1)
plotrix::draw.circle(0, 0, 1)
degs = degs * pi/180
xs = cos(degs)
ys = sin(degs)
x1 = cos(D * pi/180)
y1 = sin(D * pi/180)
x2 = cos((D * pi/180) + pi)
y2 = sin((D * pi/180) + pi)
lines(c(x1, x2), c(y1, y2))
points(xs, ys, pch = 19)
points(0, 0)
return(c(min_num = m, angle = D))
}
tst4 <- c(36, 304, 281, 254, 177, 59, 47, 158, 244, 149, 317, 235,
345, 209, 204, 156, 325, 95, 215, 267)
bar(degs = tst4)
# min_num angle
# 5.0 145.5