R nonlinear regression of cumulative X and Y data - r

I'm trying to figure how to make a nonlinear regression of some cumulative data of X and Y values. The dataset is based on cumulative items and their respective cumulated demand. I have a plot that looks like this
based on the following observation of 5299 items, which is available here: abc.csv datafile
and I would like to fit a model that can explain it quite neatly. Given the plot, I reckon that there is a high degree of detail. Hence, I would believe that it would be possible to find a function that would explain the data with very high accuracy.
The problem is, however, that I find myself trying to fit a model with nls() by trial and error. Furthermore, some of the functions that I've tried give me some explanation, but not in full detail. For instance
nlm <- nls(abc$Cumfreq ~c*(1-exp(-a*abc$noe))+b, data=abc,
start = list(a=4.14, b=0.21, c=0.79))
Yields me:
My question is: how do I obtain a regression with a better fit? Is there a function in R or another way of achieving this? (fingers crossed for a math genius out there)

Related

Extracting linear term from a polynomial predictor in a GLM

I am relatively new to both R and Stack overflow so please bear with me. I am currently using GLMs to model ecological count data under a negative binomial distribution in brms. Here is my general model structure, which I have chosen based on fit, convergence, low LOOIC when compared to other models, etc:
My goal is to characterize population trends of study organisms over the study period. I have created marginal effects plots by using the model to predict on a new dataset where all covariates are constant except year (shaded areas are 80% and 95% credible intervals for posterior predicted means):
I am now hoping to extract trend magnitudes that I can report and compare across species (i.e. say a certain species declined or increased by x% (+/- y%) per year). Because I use poly() in the model, my understanding is that R uses orthogonal polynomials, and the resulting polynomial coefficients are not easily interpretable. I have tried generating raw polynomials (setting raw=TRUE in poly()), which I thought would produce the same fit and have directly interpretable coefficients. However, the resulting models don't really run (after 5 hours neither chain gets through even a single iteration, whereas the same model with raw=FALSE only takes a few minutes to run). Very simplified versions of the model (e.g. count ~ poly(year, 2, raw=TRUE)) do run, but take several orders of magnitude longer than setting raw=FALSE, and the resulting model also predicts different counts than the model with orthogonal polynomials. My questions are (1) what is going on here? and (2) more broadly, how can I feasibly extract the linear term of the quartic polynomial describing response to year, or otherwise get at a value corresponding to population trend?
I feel like this should be relatively simple and I apologize if I'm overlooking something obvious. Please let me know if there is further code that I should share for more clarity–I didn't want to make the initial post crazy long, but happy to show specific predictions from different models or anything else. Thank you for any help.

Type of regression for large dataset, nonlinear, skewed in R

I'm researching moth biomass in different biotopes, and I want to find a model that estimates the biomass. I have measured the length and width of the forewing, abdomen and thorax of 37088 specimens, and I have weighed them individually (dried).
First, I wanted to a simple linear regression of each variable on the biomass. The problem is, none of the assumptions are met. The data is not linear, biomass (and some variables) don't follow a normal distribution, there is heteroskedasticity, and a lot of outliers. Now I have tried to transform my data using log, x^2, 1/x, and boxcox, but none of them actually helped. I have also tried Thiel-Sen regression (not possible because of too much data) and Siegel regression (biomass is not a vector). Is there some other form of non-parametric or median-based regression that I can try? Because I am really out of ideas.
Here is a frequency histogram for biomass:
Frequency histogram dry biomass
So what I actually want to do is to build a model that accurately estimates the dry biomass, based on the measurements I performed. I have a power function (Rogers et al.) that is general for all insects, but there is a significant difference between this estimate and what I actually weighed. Therefore, I just want to build to build a model with all significant variables. I am not very familiar with power functions, but maybe it is possible to build one myself? Can anyone recommend a method? Thanks in advance.
To fit a power function, you could perhaps try nlsLM from the minpack.lm package
library(minpack.lm)
m <- nlsLM( y ~ a*x^b, data=your.data.here )
Then see if it performs satisfactory.

Probit Regression in R

My question might seem very basic but since am new to R, I am struggling with it. Shall be grateful if I could get some insight...
I need to conduct a probit regression with respect to temperature and people's preference (desire warmer or desire cooler). The preference is therefore my dependent binary variable. I basically desire a plot in which on the y axis there is percentage of respondents desiring warmer/cooler and the x axis has temperature. The plot must therefore have two probit curves- one representing desire warmer and the other curve indicating desire cooler. I am interested at the intersection of the two curves.
Shall be grateful if someone could help me in the R code for the same.

Survival Curves For Cox PH Models. Checking My Understanding About Plotting Them

Im using the book Applied Survival Analysis Using R by Moore to try and model some time-to-event data. The issue I'm running into is plotting the estimated survival curves from the cox model. Because of this I'm wondering if my understanding of the model is wrong or not. My data is simple: a time column t, an event indicator column (1 for event 0 for censor) i, and a predictor column with 6 factor levels p.
I believe I can plot estimated surival curves for a cox model as follows below. But I don't understand how to use survfit and baseplot, nor functions from survminer to achieve the same end. Here is some generic code for clarifying my question. I'll use the pharmcoSmoking data set to demonstrate my issue.
library(survival)
library(asaur)
t<-pharmacoSmoking$longestNoSmoke
i<-pharmacoSmoking$relapse
p<-pharmacoSmoking$levelSmoking
data<-as.data.frame(cbind(t,i,p))
model <- coxph(Surv(data$t, data$i) ~ p, data=data)
As I understand it, with the following code snippets, modeled after book examples, a baseline (cumulative) hazard at my reference factor level for p may be given from
base<-basehaz(model, centered=F)
An estimate of the survival curve is given by
s<-exp(-base$hazard)
t<-base$time
plot(s~t, typ = "l")
The survival curve associated with a different factor level may then be given by
beta_n<-model$coefficients #only one coef in this case
s_n <- s^(exp(beta_n))
lines(s_n~t)
where beta_n is the coefficient for the nth factor level from the cox model. The code above gives what I think are estimated survival curves for heavy vs light smokers in the pharmcoSmokers dataset.
Since thats a bit of code I was looking to packages for a one-liner solution, I had a hard time with the documentation for Survival ( there weren't many examples in the docs) and also tried survminer. For the latter I've tried:
library(survminer)
ggadjustedcurves(model, variable ="p" , data=data)
This gives me something different than my prior code, although it is similar. Is the method I used earlier incorrect? Or is there a different methodology that accounts for the difference? The survminer code doesn't work from my data (I get a 'can't allocated vector of size yada yada error, and my data is ~1m rows) which seems weird considering I can make plots using what I did before no problem. This is the primary reason I am wondering if I am understanding how to plot survival curves for my model.

Plotting backtransformed data with LS means plot

I have used the package lsmeans in R to get the average estimate for all observations for my treatment factor (across the levels of a block factor in the experimental design that has been included with systematic effect because it only had 3 levels). I have used a sqrt transformation for my response variable.
Thus I have used the following commands in R.
First defining model
model<-sqrt(response)~treatment+block
Then applying lsmeans
model_lsmeans<-lsmeans(model,~treatment)
Then plotting this
plot(model_lsmeans,ylab="treatment", xlab="response(with 95% CI)")
This gives a very nice graph with estimates and 95% confidense intervals for the different treatment.
The problems is just that this graph is for the transformed response.
How do I get this same plot with the backtransformed response (so the squared response)?
I have tried to create a new data frame and extract the lsmean, lower.CL, and upper.CL:
a<-summary(model_lsmeans)
New_dataframe<-as.data.frame(a[c("treatment","lsmean","lower.CL","upper.CL")])
And then make these squared
New_dataframe$lsmean<-New_dataframe$lsmean^2
New_dataframe$lower.CL<-New_dataframe$lower.CL^2
New_dataframe$upper.CL<-New_dataframe$upper.CL^2
New_dataframe
This gives me the estimates and CI boundaries squared that I need.
The problem is that I cannot make the same graph for thise estimates and CI as the one that I did in LS means above.
How can I do this? The reason that I ask is that I want to have graphs that are all of a similar style for my article. Since I very much like this LSmeans plot, and it is very convenient for me to use on the non-transformed response variables, I would like to have all my graphs in this style.
Thank you very much for your help! Hope everything is clear!
Kind regards
Ditlev

Resources