Suppose I have data like this:
df<-data.frame(x=c(1100,800,600,550,500,350),y=c(0.05,0.17,0.91,0.95,1,0.13))
How can I fit a curve through it based on a log-normal shape/distribution?
I tried an nls model, but I always get an error:
fit <-nls(y ~ a*dlnorm(x, mean, sd), data = df,
start = list(mean =0, sd = 10,a=1e4))
Thanks a lot!
I'm not sure why nls behaves like that, but you can use optim directly to minimize the sum of squared residuals:
opt <- optim(c(1, 1, 1), function(p) sum((dlnorm(df$x, p[1], p[2]) * p[3] - df$y)^2))
opt$par
# [1] 6.3280753 0.2150322 299.3154123
plot(x = df$x, y = df$y, type = 'b', ylim = c(0, 1), xlim = c(0, 1100))
curve(opt$par[3] * dlnorm(x, opt$par[1], opt$par[2]), from = 0, to = 1100, add = TRUE, col = 'red')
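A likely cause of the nls error (an assumption, since the error text isn't shown) is the starting values: with meanlog = 0 and sdlog = 10, the curve is roughly proportional to 1/x over the observed x range, has no peak there, and small changes in the three parameters have nearly the same effect, which typically makes nls stop with a singular-gradient error. A sketch that derives rough starting values from the data (the guess sigma0 = 0.3 is an assumption), refines them with optim() as above, and then hands them to nls():

```r
df <- data.frame(x = c(1100, 800, 600, 550, 500, 350),
                 y = c(0.05, 0.17, 0.91, 0.95, 1, 0.13))

# Rough starting values from the data: centre the log-normal at the x with
# the largest y, and choose a so the curve height there equals max(y)
mu0    <- log(df$x[which.max(df$y)])
sigma0 <- 0.3                                  # assumed spread on the log scale
a0     <- max(df$y) / dlnorm(exp(mu0), mu0, sigma0)

# Least-squares refinement with optim (Nelder-Mead)
sse <- function(p) sum((p[3] * dlnorm(df$x, p[1], p[2]) - df$y)^2)
opt <- optim(c(mu0, sigma0, a0), sse)

# nls now starts close to the optimum and should converge quickly
fit <- nls(y ~ a * dlnorm(x, meanlog, sdlog), data = df,
           start = list(meanlog = opt$par[1], sdlog = opt$par[2], a = opt$par[3]))
coef(fit)
```

Starting nls from the optim solution mainly buys you the standard nls machinery (summary(), confint(), predict()) on top of the least-squares fit.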
Find the maximum likelihood estimate of p in a binomial distribution characterized by 9 successes in 20 trials. Show graphically that this is the maximum. Include your R code with your answers.
This is a prompt I've been given for a homework assignment, but the teacher never explained how to do it. I understand the basic concept, but I'm not sure how to find the estimate of p in R or how to graph it. Any advice or help would be greatly appreciated!
The exercise is not that difficult.
Write down two functions: the textbook likelihood and the log-likelihood;
Use optimize to maximize them;
In the code below, the tol argument is set smaller than its default (.Machine$double.eps^0.25). Both functions are continuous on (0, 1) and have a single interior maximum (the log-likelihood is strictly concave), so optimize will find it, and a smaller tol locates it more precisely;
Finally, plot the functions and the maxima found earlier.
ll <- function(x, n, k) choose(n, k) * x^k * (1 - x)^(n - k)
log_ll <- function(x, n, k) lchoose(n, k) + k*log(x) + (n - k)*log(1 - x)
xmax <- optimize(ll, c(0, 1), n = 20, k = 9, maximum = TRUE, tol = .Machine$double.eps^0.5)
xmax$maximum
#> [1] 0.45
xmax_log <- optimize(log_ll, c(0, 1), n = 20, k = 9, maximum = TRUE, tol = .Machine$double.eps^0.5)
xmax_log$maximum
#> [1] 0.45
# save the default graphics parameters
old_par <- par(mfrow = c(2, 1), mai = c(0.8, 1, 0.1, 1))
#
curve(ll(x, n = 20, k = 9), from = 0, to = 1, xlab = "")
segments(x0 = xmax$maximum, y0 = -1, y1 = xmax$objective, lty = "dashed")
segments(x0 = -1, x1 = xmax$maximum, y0 = xmax$objective, lty = "dashed")
points(xmax$maximum, xmax$objective, col = "red", pch = 16)
#
curve(log_ll(x, n = 20, k = 9), from = 0, to = 1, xlab = "Binomial proportion")
segments(x0 = xmax_log$maximum, y0 = -50, y1 = xmax_log$objective, lty = "dashed")
segments(x0 = -1, x1 = xmax_log$maximum, y0 = xmax_log$objective, lty = "dashed")
points(xmax_log$maximum, xmax_log$objective, col = "red", pch = 16)
#
# restore the default graphics parameters
par(old_par)
Created on 2022-10-11 with reprex v2.0.2
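The optimize() results can be checked analytically. Setting the derivative of the log-likelihood (the same expression as log_ll above) to zero gives the textbook MLE k/n, and the second derivative is negative on (0, 1), so this stationary point is the maximum:

$$\ell(p) = \log\binom{n}{k} + k\log p + (n-k)\log(1-p)$$

$$\ell'(p) = \frac{k}{p} - \frac{n-k}{1-p} = 0 \;\Longrightarrow\; k(1-p) = (n-k)\,p \;\Longrightarrow\; \hat p = \frac{k}{n} = \frac{9}{20} = 0.45$$

$$\ell''(p) = -\frac{k}{p^2} - \frac{n-k}{(1-p)^2} < 0 \quad \text{for } 0 < p < 1$$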
I have data with a continuous independent variable and a binary dependent variable, so I tried to apply logistic regression. However, in contrast to the classical case with a single S-shaped transition, my data have two transitions.
Here is an example of what I mean:
library(ggplot2)
library(visreg)
classic.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
y = c(rep(0, times = 14), 1, 0, rep(1, times = 14)))
model.classic = glm(formula = y ~ x,
data = classic.data,
family = "binomial")
summary(model.classic)
visreg(model.classic,
partial = FALSE,
scale = "response",
alpha = 0)
my.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))
model.my = glm(formula = y ~ x,
data = my.data,
family = "binomial")
summary(model.my)
visreg(model.my,
partial = FALSE,
scale = "response",
alpha = 0)
The blue line in each plot is the outcome of glm, while the red line is what I want to get.
Is there any way to apply logistic regression to such data? Or should I apply some other type of regression analysis?
In your second model, y is not a monotonic function of x. When you write y ~ x in a logistic regression, you assume that as x increases, the probability that y = 1 increases or decreases, depending on whether the coefficient is positive or negative. That is not the case here: it increases and then decreases, so the average effect of x is about zero (hence the nearly flat, straight line). You therefore need a non-linear function. You could do that with a GAM from the mgcv package, where the effect of x is modelled as a smooth function:
library(mgcv)
my.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))
m = gam(y ~ s(x), data = my.data, family = binomial)
plot(m)
plot(m) shows the smooth on the link (log-odds) scale. On the response (probability) scale, the fit can be plotted like this:
my.data$prediction = predict(m, type = "response")
plot(my.data$x, my.data$y)
lines(my.data$x, my.data$prediction, col = "red")
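If you want to stay within ordinary logistic regression, one common alternative to the GAM (not part of the answer above) is to add a quadratic term, so the logit is a parabola in x and the fitted probability rises and then falls. The question's my.data is perfectly separable by such a parabola, so glm would likely warn that fitted probabilities numerically 0 or 1 occurred; the sketch below therefore uses hypothetical simulated data with a peak near x = 7.25 (an assumption chosen for illustration):

```r
set.seed(42)
# Hypothetical simulated data: P(y = 1) rises and then falls with x
x <- runif(300, min = 0, max = 14.5)
p_true <- plogis(3 - 0.5 * (x - 7.25)^2)
y <- rbinom(300, size = 1, prob = p_true)
sim <- data.frame(x, y)

# Logistic regression with a quadratic term: logit(p) = b0 + b1*x + b2*x^2
fit_quad <- glm(y ~ x + I(x^2), family = binomial, data = sim)
b <- coef(fit_quad)

# The fitted probability peaks where d(logit)/dx = 0, i.e. at -b1 / (2 * b2)
peak_x <- -b[["x"]] / (2 * b[["I(x^2)"]])

# Fitted curve on the probability scale
grid <- data.frame(x = seq(0, 14.5, length.out = 200))
grid$p <- predict(fit_quad, newdata = grid, type = "response")
plot(sim$x, sim$y, xlab = "x", ylab = "y")
lines(grid$x, grid$p, col = "red")
```

A quadratic forces the rise and fall to be symmetric around the peak; the GAM does not, which is why it is the more flexible choice.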
I'm a statistics student and would kindly like to request some assistance. I would like to plot predicted values together with actual values over the course of the 100 days in my dataset:
Sample Data:
set.seed(1010)
count<-rpois(100, lambda = 5)
mood<- rbinom(100, size = 1, prob = .7)
temp<-rnorm(100, mean = 20, sd = 5)
wind<-rbinom(100, size = 3, prob = .7)
days<-seq(1,100,by=1)
df<-data.frame(count,mood,temp,wind,days)
Plotting the actual values over the 100 days:
plot(count~days,type="l")
Regression:
poisson <- glm(count ~ mood + wind + temp, family = poisson(), data = df)
Conditioning on my predictors and obtaining a predicted value:
hyp<- c(1,1,3,20)
coeff.p1 <- poisson$coefficients
XB <- hyp%*%coeff.p1
predv.y <- exp(XB)
predv.y
Maybe there is a way to predict values for all observations, for example:
coeff.p1 <- poisson$coefficients
XB <- c(,2:4)%*%t(coeff.p1)
I intend to multiply by columns 2:4, but I always get:
Error in c(, 2:4) : argument 1 is empty
This is where I'm stuck. In the end, I would like to show the predicted and actual values for the 100 days on one plot.
Thank you
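Your object poisson is of class glm, so it has a predict method that works on any data frame containing the predictor variables. Use type = "response" to get predicted counts rather than values on the log (link) scale:
poisson <- glm(count ~ mood + temp + wind, family = poisson(), data = df)
df$pred <- predict(poisson, newdata = df, type = "response")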
plot(df$days,df$count)
lines(df$days, df$pred,type='l',col='blue')
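The matrix approach from the question also works: instead of c(,2:4), build the design matrix with model.matrix(), which adds the column of 1s for the intercept, multiply it by the coefficients, and apply exp() (the inverse of the log link). A sketch, assuming the model is fitted with family = poisson() as intended (the object is called fit here to avoid masking the poisson() family function):

```r
set.seed(1010)
count <- rpois(100, lambda = 5)
mood  <- rbinom(100, size = 1, prob = .7)
temp  <- rnorm(100, mean = 20, sd = 5)
wind  <- rbinom(100, size = 3, prob = .7)
days  <- seq(1, 100, by = 1)
df    <- data.frame(count, mood, temp, wind, days)

fit <- glm(count ~ mood + wind + temp, family = poisson(), data = df)

# Design matrix: intercept column of 1s, then mood, wind, temp
X  <- model.matrix(fit)
XB <- X %*% coef(fit)           # linear predictor, one value per day
pred_manual <- exp(drop(XB))    # back-transform from the log scale

# Same numbers as predict() on the response scale
pred_builtin <- predict(fit, type = "response")
all.equal(unname(pred_manual), unname(pred_builtin))

# Observed and predicted counts over the 100 days
plot(df$days, df$count, type = "l", xlab = "Day", ylab = "Count")
lines(df$days, pred_manual, col = "blue")
legend("topright", legend = c("observed", "predicted"),
       col = c("black", "blue"), lty = 1)
```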
Try this:
set.seed(1010)
count<-rpois(100, lambda = 5)
mood<- rbinom(100, size = 1, prob = .7)
temp<-rnorm(100, mean = 20, sd = 5)
wind<-rbinom(100, size = 3, prob = .7)
days<-seq(1,100,by=1)
df<-data.frame(count,mood,temp,wind,days)
poisson <- glm(count ~ mood+wind+temp
, family = poisson() #specify your model type
, data=df)
# Calculate the predicted probabilities; predprob() comes from the pscl package
library(pscl)
phat.poisson <- predprob(poisson) # for each subject, prob of observing each count
phat.poisson.mn <- apply(phat.poisson, 2, mean) # mean predicted probs
k <- seq_along(phat.poisson.mn) - 1 # counts 0, 1, 2, ... (assumes columns start at 0)
# your plot of observed vs. predicted
hist(count, prob = TRUE, col = "grey60", breaks = seq(-0.5, max(count) + 0.5, 1), xlab = "Counts", main = NULL, ylim = c(0, .20))
lines(x = k, y = phat.poisson.mn, lty = 2, lwd = 2, col = "red")
points(x = k, y = phat.poisson.mn, cex = 1, pch = 16, col = "red")
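If pscl isn't available, the mean predicted probabilities can be computed in base R: for each count k, average dpois(k, mu_i) over the fitted means mu_i. As far as I know this is what predprob() returns for a Poisson glm, but treat that equivalence as an assumption. A sketch:

```r
set.seed(1010)
count <- rpois(100, lambda = 5)
mood  <- rbinom(100, size = 1, prob = .7)
temp  <- rnorm(100, mean = 20, sd = 5)
wind  <- rbinom(100, size = 3, prob = .7)
df    <- data.frame(count, mood, temp, wind)

fit <- glm(count ~ mood + wind + temp, family = poisson(), data = df)
mu  <- fitted(fit)               # predicted mean count for each observation
k   <- 0:max(df$count)           # count values to evaluate

# Mean over observations of P(Y = k | mu_i), for each k
phat_mn <- sapply(k, function(kk) mean(dpois(kk, mu)))

# Observed proportions per count (bin width 1), used to size the y-axis
obs_prop <- tabulate(df$count + 1, nbins = length(k)) / nrow(df)
hist(df$count, prob = TRUE, col = "grey60", breaks = seq(-0.5, max(k) + 0.5, 1),
     xlab = "Counts", main = NULL, ylim = c(0, max(phat_mn, obs_prop)))
lines(k, phat_mn, lty = 2, lwd = 2, col = "red")
points(k, phat_mn, pch = 16, col = "red")
```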
I feel that I am close to finding the answer to my problem, but somehow I just cannot manage it. I have used the nls function to fit 3 parameters of a rather complicated function describing the fertilization success of eggs (y-axis) over a range of sperm concentrations (x-axis) (Styan's model [1], [2]). Fitting the parameters works fine, but I cannot manage to plot a smoothed, extrapolated curve using the predict function (see the end of this post). I guess it is because the variable on the x-axis, S0, is not one of the fitted parameters. My question is: how can I plot a smoothed and extrapolated curve from a model fitted with nls when the x-axis variable is not a fitted parameter?
Here is an example:
library(ggplot2)
data.nls <- structure(list(S0 = c(0.23298, 2.32984, 23.2984, 232.98399, 2329.83993,
23298.39926), fert = c(0.111111111111111, 0.386792452830189,
0.158415841584158, 0.898648648648649, 0.616, 0.186440677966102
), speed = c(0.035161615379406, 0.035161615379406, 0.035161615379406,
0.035161615379406, 0.035161615379406, 0.035161615379406), E0 = c(6.86219803476946,
6.86219803476946, 6.86219803476946, 6.86219803476946, 6.86219803476946,
7.05624476582978), tau = c(1800, 1800, 1800, 1800, 1800, 1800
), B0 = c(0.000102758645352932, 0.000102758645352932, 0.000102758645352932,
0.000102758645352932, 0.000102758645352932, 0.000102758645352932
)), .Names = c("S0", "fert", "speed", "E0", "tau", "B0"), row.names = c(NA,
6L), class = "data.frame")
## Model S
modelS <- function(Fe, tb, Be) with (data.nls,{
x <- Fe*(S0/E0)*(1-exp(-B0*E0*tau))
b <- Fe*(S0/E0)*(1-exp(-B0*E0*tb))
x*exp(-x)+Be*(1-exp(-x)-(x*exp(-x)))*exp(-b)})
## Define starting values
start <- list(Fe = 0.2, tb = 0.1, Be = 0.1)
## Fit the model using nls
modelS.fitted <- nls(formula = fert ~ modelS(Fe, tb, Be), data = data.nls, start = start,
control=nls.control(warnOnly=TRUE,minFactor=1e-5),trace = T, lower = c(0,0,0),
upper = c(1, Inf, 1), algorithm = "port")
## Combine model parameters
model.data <- cbind(data.nls, data.frame(pred = predict(modelS.fitted)))
## Plot
ggplot(model.data) +
geom_point(aes(x = S0, y = fert), size = 2) +
geom_line(aes(x = S0, y = pred), lwd = 1.3) +
scale_x_log10()
I have tried following joran's example, but it has no effect, perhaps because S0 is not a fitted parameter:
r <- range(model.data$S0)
S0.ext <- seq(r[1],r[2],length.out = 200)
predict(modelS.fitted, newdata = list(S0 = S0.ext))
# [1] 0.002871585 0.028289057 0.244399948 0.806316161 0.705116868 0.147974213
Your function should take all the variables as arguments: (S0, E0, B0, tau, Fe, tb, Be). nls takes the variables from the data.frame passed to its data argument and fits the parameters named in start. There is no need for the with() call in your function (with() shouldn't be used inside functions anyway; it's meant for interactive use). Because your current function always takes S0, E0, B0, and tau from data.nls, predict ignores newdata and simply returns the six fitted values. In predict, newdata must contain all the variables, that is S0, E0, B0, and tau.
Try this:
modelS <- function(S0,E0,B0,tau,Fe, tb, Be) {
x <- Fe*(S0/E0)*(1-exp(-B0*E0*tau))
b <- Fe*(S0/E0)*(1-exp(-B0*E0*tb))
x*exp(-x)+Be*(1-exp(-x)-(x*exp(-x)))*exp(-b)}
## Define starting values
start <- list(Fe = 0.2, tb = 0.1, Be = 0.1)
## Fit the model using nls
modelS.fitted <- nls(formula = fert ~ modelS(S0,E0,B0,tau,Fe, tb, Be), data = data.nls, start = start,
control=nls.control(warnOnly=TRUE,minFactor=1e-5),trace = T, lower = c(0,0,0),
upper = c(1, Inf, 1), algorithm = "port")
## Combine model parameters
model.data <- data.frame(
S0=seq(min(data.nls$S0),max(data.nls$S0),length.out=1e5),
E0=seq(min(data.nls$E0),max(data.nls$E0),length.out=1e5),
B0=seq(min(data.nls$B0),max(data.nls$B0),length.out=1e5),
tau=seq(min(data.nls$tau),max(data.nls$tau),length.out=1e5))
model.data$pred <- predict(modelS.fitted,newdata=model.data)
## Plot
ggplot(data.nls) +
geom_point(aes(x = S0, y = fert), size = 2) +
geom_line(data=model.data,aes(x = S0, y = pred), lwd = 1.3) +
scale_x_log10()
Obviously, this might not be what you want, since the function has several variables and more than one of them varies in newdata. For such a plot, one would normally vary only one variable and hold the others constant.
So this might be more appropriate:
S0 <- seq(min(data.nls$S0),max(data.nls$S0),length.out=1e4)
E0 <- seq(1,20,length.out=20)
B0 <- unique(data.nls$B0)
tau <- unique(data.nls$tau)
model.data <- expand.grid(S0,E0,B0,tau)
names(model.data) <- c("S0","E0","B0","tau")
model.data$pred <- predict(modelS.fitted,newdata=model.data)
## Plot
ggplot(model.data) +
geom_line(aes(x = S0, y = pred, color = interaction(E0, B0, tau)), lwd = 1.3) +
geom_point(data=data.nls,aes(x = S0, y = fert), size = 2) +
scale_x_log10()
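Since the question also asks for extrapolation and the x-axis is on a log scale, a log-spaced grid of S0 is arguably a better choice than seq(), which puts almost all of its points in the last decade. A sketch under stated assumptions: E0 is held at its mean, B0 and tau at their (constant) observed values, the grid extends one decade beyond the data on each side, and base graphics are used instead of ggplot2 to keep it dependency-free:

```r
data.nls <- data.frame(
  S0   = c(0.23298, 2.32984, 23.2984, 232.98399, 2329.83993, 23298.39926),
  fert = c(0.111111111111111, 0.386792452830189, 0.158415841584158,
           0.898648648648649, 0.616, 0.186440677966102),
  E0   = c(rep(6.86219803476946, 5), 7.05624476582978),
  tau  = 1800,
  B0   = 0.000102758645352932
)

modelS <- function(S0, E0, B0, tau, Fe, tb, Be) {
  x <- Fe * (S0 / E0) * (1 - exp(-B0 * E0 * tau))
  b <- Fe * (S0 / E0) * (1 - exp(-B0 * E0 * tb))
  x * exp(-x) + Be * (1 - exp(-x) - x * exp(-x)) * exp(-b)
}

modelS.fitted <- nls(fert ~ modelS(S0, E0, B0, tau, Fe, tb, Be), data = data.nls,
                     start = list(Fe = 0.2, tb = 0.1, Be = 0.1),
                     control = nls.control(warnOnly = TRUE, minFactor = 1e-5),
                     lower = c(0, 0, 0), upper = c(1, Inf, 1), algorithm = "port")

# Log-spaced S0 grid, one decade beyond the data on each side (assumed range);
# only S0 varies, the other variables are held constant
r <- range(data.nls$S0)
pred.data <- data.frame(S0  = 10^seq(log10(r[1]) - 1, log10(r[2]) + 1, length.out = 200),
                        E0  = mean(data.nls$E0),
                        B0  = data.nls$B0[1],
                        tau = data.nls$tau[1])
pred.data$pred <- predict(modelS.fitted, newdata = pred.data)

plot(data.nls$S0, data.nls$fert, log = "x", pch = 16,
     xlim = range(pred.data$S0), ylim = c(0, max(1, pred.data$pred)),
     xlab = "S0", ylab = "fert")
lines(pred.data$S0, pred.data$pred, lwd = 2)
```

Outside the observed S0 range, the curve reflects only the assumed model form, so the extrapolated parts deserve some caution.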
In R, how can I use ellipses to represent error bars (standard deviations) for x and y variables when only summary data, i.e. the mean and SD for each data set, are available? Any feedback is appreciated.
You can write your own function like this one:
draw_ellipse = function (mean_x, mean_y, sd_x, sd_y)
{
ellipse <- function (x) { sin(acos(x)) }
t = seq(-1, 1, length.out = 100)
el_y = sd_y*ellipse(t)
newx = mean_x + sd_x * t
polygon(c(newx, rev(newx)), c(mean_y + el_y, rev(mean_y - el_y)), col = "grey", border = NA)
}
You can then draw one ellipse per row with apply():
x = runif(10)
y = runif(10)
sd_x = abs(rnorm(10, 0.1, 0.02))
sd_y = abs(rnorm(10, 0.05, 0.01))
plot(x, y)
df = data.frame(x, y, sd_x, sd_y)
apply(df, 1, function (x) { draw_ellipse(x[1], x[2], x[3], x[4]) })
points(x, y, pch = 3)
Solution for plotting ellipses with different colors:
draw_ellipse = function (mean_x, mean_y, sd_x, sd_y, colidx)
{
ellipse <- function (x) { sin(acos(x)) }
t = seq(-1, 1, length.out = 100)
el_y = sd_y*ellipse(t)
newx = mean_x + sd_x * t
polygon(c(newx, rev(newx)), c(mean_y + el_y, rev(mean_y - el_y)), col = as.character(colors[colidx]), border = NA)
}
x = runif(10)
y = runif(10)
sd_x = abs(rnorm(10, 0.1, 0.02))
sd_y = abs(rnorm(10, 0.05, 0.01))
plot(x, y)
colors = rainbow(length(x))
df = data.frame(x, y, sd_x, sd_y, colidx = 1:length(x))
apply(df, 1, function (x) { draw_ellipse(x[1], x[2], x[3], x["sd_y"], x["colidx"]) })
points(x, y, pch = 3)
You might like car::ellipse, i.e., the ellipse() function in the car package.
The ellipse function in the ellipse package will take summary information (including correlation) and provide the ellipse representing the confidence region.
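If you would rather avoid a dependency, the same idea (an SD ellipse that allows for a correlation between x and y) can be sketched in base R. sd_ellipse() is a hypothetical helper, the correlations rho are made up for illustration, and the curve is the 1-SD contour (Mahalanobis distance 1), not a 95% region; multiplying u and v by sqrt(qchisq(0.95, 2)) would give the 95% contour of a bivariate normal.

```r
# Hypothetical helper: 1-SD ellipse from summary statistics,
# optionally with correlation rho between x and y
sd_ellipse <- function(mean_x, mean_y, sd_x, sd_y, rho = 0, n = 100) {
  theta <- seq(0, 2 * pi, length.out = n)
  # Map the unit circle through the Cholesky factor of the correlation matrix
  u <- cos(theta)
  v <- rho * cos(theta) + sqrt(1 - rho^2) * sin(theta)
  cbind(x = mean_x + sd_x * u, y = mean_y + sd_y * v)
}

set.seed(1)
x    <- runif(10)
y    <- runif(10)
sd_x <- abs(rnorm(10, 0.1, 0.02))
sd_y <- abs(rnorm(10, 0.05, 0.01))
rho  <- runif(10, -0.8, 0.8)   # assumed: a known correlation for each data set

plot(x, y, xlim = range(x - 2 * sd_x, x + 2 * sd_x),
     ylim = range(y - 2 * sd_y, y + 2 * sd_y))
cols <- rainbow(length(x), alpha = 0.4)
for (i in seq_along(x)) {
  polygon(sd_ellipse(x[i], y[i], sd_x[i], sd_y[i], rho[i]), col = cols[i], border = NA)
}
points(x, y, pch = 3)
```

With rho = 0 this traces the same axis-aligned shape as draw_ellipse() above; the loop also avoids apply() coercing each data-frame row to a vector.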