LOESS smoothing - geom_smooth() vs loess() - r

I have some data that I would like to fit with a model. For this example we have been using LOESS smoothing (fewer than 1,000 observations), applied with the geom_smooth() function from the ggplot2 package. So far, so good.
The next step was to obtain the first derivative of the smoothed curve, and as far as we know this cannot be extracted from geom_smooth(). We therefore fitted the model manually with loess() so that we could compute the first derivative from it.
Strangely, however, the plotted geom_smooth() curve differs from the manually constructed loess() curve. This can be seen in the figure below: the geom_smooth() curve is in red and the loess() curve is in orange.
If anyone is interested, a minimal reproducible example can be found here.
Can anyone pinpoint why the curves differ? Is it because the two use different optimization settings? To obtain a meaningful derivative we need to make sure the curves are identical.
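The manual route can be sketched in base R. This is a minimal sketch, not the original poster's code: the data frame df and its columns x and y are hypothetical, and it assumes geom_smooth() was called with method = "loess" and its default span = 0.75, which as far as I know matches loess()'s own default (check the stat_smooth() documentation for your ggplot2 version). If the curves still differ, one thing worth checking is any argument that differs between the two calls, such as span, degree, or a scale transformation like scale_y_log10(), which ggplot2 applies before the smoother is fitted. The derivative here is a finite-difference approximation on a prediction grid, not an analytic derivative.

```r
# Hypothetical example data; substitute your own x and y columns
set.seed(1)
df <- data.frame(x = sort(runif(200, 0, 10)))
df$y <- sin(df$x) + rnorm(200, sd = 0.3)

# span = 0.75 and degree = 2 are loess()'s defaults, stated explicitly here
fit <- loess(y ~ x, data = df, span = 0.75, degree = 2)

# Evaluate the smooth on an evenly spaced grid (80 points, assumed to mirror
# geom_smooth()'s default n)
grid <- data.frame(x = seq(min(df$x), max(df$x), length.out = 80))
grid$fit <- predict(fit, newdata = grid)

# Approximate dy/dx by finite differences between neighbouring grid points,
# reporting each slope at the midpoint of its interval
deriv <- data.frame(
  x = (head(grid$x, -1) + tail(grid$x, -1)) / 2,
  slope = diff(grid$fit) / diff(grid$x)
)
head(deriv)
```

If ggplot2 is installed, calling ggplot2::layer_data() on the plot object should return the values geom_smooth() actually drew, which can be compared against grid$fit to confirm the two curves agree before differentiating.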

Related

How to produce this figure of thin-plate splines fit to observed data with contours plotted?

How does one reproduce this figure from Elements of Statistical Learning page 166?
I understand that a regression is fitted to the data using age and obesity as features, but I am wondering how to represent the fitted surface with contours, as shown in the figure. I would prefer an R implementation, because I believe that is what was used in this case.
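One possible approach, sketched under stated assumptions: fit a two-dimensional smooth with mgcv (a recommended package that ships with standard R installations), predict it on a regular grid, and draw the result with contour(). Note that s(age, obesity) in mgcv is a thin-plate regression spline, a low-rank approximation of the full thin-plate spline, so the surface may not match the book's figure exactly. The data frame dat and its response y are hypothetical stand-ins for the real data.

```r
library(mgcv)  # recommended package shipped with standard R installations

set.seed(42)
n <- 300
# Hypothetical data standing in for the age/obesity features in the question
dat <- data.frame(age = runif(n, 15, 65), obesity = runif(n, 15, 45))
dat$y <- 120 + 0.5 * dat$age + 0.8 * dat$obesity + rnorm(n, sd = 8)

# s(age, obesity) defaults to bs = "tp", a thin-plate regression spline
fit <- gam(y ~ s(age, obesity), data = dat)

# Predict the fitted surface on a regular grid; expand.grid varies age fastest,
# so filling a matrix column-wise gives z[i, j] = f(age_seq[i], ob_seq[j])
age_seq <- seq(min(dat$age), max(dat$age), length.out = 50)
ob_seq  <- seq(min(dat$obesity), max(dat$obesity), length.out = 50)
grid <- expand.grid(age = age_seq, obesity = ob_seq)
z <- matrix(predict(fit, newdata = grid), nrow = length(age_seq))

# Contours of the fitted surface, with the observations overlaid
contour(age_seq, ob_seq, z, xlab = "age", ylab = "obesity")
points(dat$age, dat$obesity, pch = 20, cex = 0.5)
```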

How to plot different ROC curves with different symbols on the line using ROCR package?

I am trying to plot average ROC curves from different models using ROCR package.
I got it working with each curve in a different color. However, for black-and-white printing I need to distinguish the curves by symbol rather than by color. I tried the type="o" and pch options in plot. However, I guess because the ROCR performance object contains so many points (needed for an accurate ROC curve), the curves just look like very thick solid lines, and you cannot tell which symbol belongs to which curve.
And here is the code that I used:
pred_our_update<-prediction(prob_our_update,label)
perf_our_update<-performance(pred_our_update,"tpr","fpr")
plot(perf_our_update,avg="vertical",spread.estimate="stderror",type="o", pch=1,add=TRUE)
Does anyone know how to resolve this?
One easy solution is to use the downsampling option to reduce the number of points actually plotted, which may make the symbols stand out more without materially changing the shape of the curves. I don't know the size of your data set, but perhaps start with:
plot(perf_our_update,avg="vertical",spread.estimate="stderror",downsampling=0.1,type="o", pch=1,add=TRUE)
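A different technique that does not rely on ROCR's downsampling is to draw each curve as a plain line and overlay symbols at only every k-th point. The sketch below computes empirical ROC points in base R for two hypothetical models; the helper names roc_points() and add_roc() and the mark_every spacing are illustrative, not part of ROCR. Ties in the scores are not handled specially, which is fine for continuous scores.

```r
# Empirical ROC points (FPR, TPR) obtained by sweeping the threshold downwards
roc_points <- function(scores, labels) {
  lab <- labels[order(scores, decreasing = TRUE)]
  data.frame(
    fpr = c(0, cumsum(lab == 0) / sum(lab == 0)),
    tpr = c(0, cumsum(lab == 1) / sum(lab == 1))
  )
}

# Draw the full curve as a line, but place symbols only at every k-th point
add_roc <- function(roc, pch, lty, mark_every = 50) {
  lines(roc$fpr, roc$tpr, lty = lty)
  idx <- seq(1, nrow(roc), by = mark_every)
  points(roc$fpr[idx], roc$tpr[idx], pch = pch)
}

set.seed(7)
labels <- rbinom(500, 1, 0.5)
# Two hypothetical models with different amounts of class separation
roc_a <- roc_points(labels + rnorm(500), labels)
roc_b <- roc_points(0.5 * labels + rnorm(500), labels)

plot(c(0, 1), c(0, 1), type = "n",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 3)  # chance line
add_roc(roc_a, pch = 1, lty = 1)
add_roc(roc_b, pch = 2, lty = 2)
legend("bottomright", legend = c("model A", "model B"), pch = 1:2, lty = 1:2)
```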

Why does my linear regression fit line look wrong?

I have plotted a 2-D histogram in a way that lets me add lines, points, etc. to the plot.
Now I want to fit a linear regression to the region of dense points, but my regression line seems to be nowhere near where it should be.
To demonstrate, here is my plot on the left with both a lowess fit and a linear fit.
ok <- complete.cases(a, b)  # drop pairs with an NA in either a or b so x and y stay aligned
lines(lowess(a[ok], b[ok], iter = 10), col = 'gray', lwd = 3)
abline(lm(b[cc] ~ a[cc]), lwd = 3)
Here a and b are my values, and cc indexes the points within the densest parts (where most points lie: the red, yellow and blue bins).
Why doesn't my regression line look more like the one on the right (a hand-drawn fit)?
If I were plotting a line of best fit, wouldn't it be there?
I have numerous plots similar to this, but I still get the same results.
Are there any alternative linear fits that could work better for me?
A linear regression fits a linear function y = f(x) to a set of points (observations) by minimizing the sum of squared vertical errors, i.e. the residuals in y only.
Now imagine your heatmap showing a shape for which a vertical line would fit best; just rotate your heatmap 10 degrees counter-clockwise and you have it.
How would you define a linear function that is vertical? Exactly: it is not possible.
The conclusion of this little thought experiment is that you are confusing the purpose of linear regression. What you most likely want, as Gavin Simpson already indicated, is the first principal component vector.
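The difference can be illustrated with simulated data. This is a sketch under an assumption: the data are hypothetical points scattered along the line b = 2a with equal noise in both coordinates. In that setting the least-squares slope of b on a is pulled towards zero (here it should come out near 1.6), while the first principal component of the centred data recovers a slope close to 2. The principal axis only recovers the underlying line when the noise is of similar size in both directions and the variables are on comparable scales.

```r
set.seed(3)
n <- 1000
latent <- rnorm(n)
# Hypothetical data: points along b = 2 * a, with noise in both coordinates
a <- latent + rnorm(n, sd = 0.5)
b <- 2 * latent + rnorm(n, sd = 0.5)

# Ordinary least squares: minimises vertical (b-direction) residuals only
fit_ols <- lm(b ~ a)
ols_slope <- coef(fit_ols)[["a"]]

# First principal component: the direction of maximum variance of the centred
# points, which minimises perpendicular distances instead
pc <- prcomp(cbind(a, b))
v <- pc$rotation[, 1]
pc_slope <- unname(v[2] / v[1])
pc_intercept <- mean(b) - pc_slope * mean(a)

plot(a, b, pch = ".", col = "grey50")
abline(fit_ols, lwd = 3)                        # OLS line (solid)
abline(pc_intercept, pc_slope, lwd = 3, lty = 2)  # principal axis (dashed)
```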

plotting cox proportional hazard model in R

I'm trying to plot a Cox proportional hazards model (or a logit model) in R.
I used the following code (which I copied from https://sites.google.com/site/daishizuka/toolkits/plotting-logistic-regression-in-r)
c<-coxph(formula=Surv(year, promo)~prov.yrs, data=cul)
curve(predict(c, data.frame(prov.yrs=x), type="risk"), add=TRUE)
I get the error message
Error in plot.xy(xy.coords(x, y), type = type, ...) :
invalid graphics state
I suspect something is wrong with how I am plotting this; is there a way to plot it? I get the same error message when I use glm. Any help would be appreciated!
The example you copied from shows a logistic regression, but you are fitting a coxph model; the two are handled very differently.
If you just want a plot of the hazard ratio, then your code will basically work, except that you are adding to a plot that does not exist, which may be what generates the error. Try changing add to FALSE.
If you want to plot the survival curve(s), then use the survfit function to get the predicted survival information and plot that.
The error message suggests that you did not have a graphics device open, or perhaps there was some other problem with the plot you were trying to add to. That code produces a plot over the input range [0, 1] with a toy example I built from the coxph help page. Perhaps your range for prov.yrs differs from that of the existing plot, or there is no device open? Try plot.new(), plot whatever else you were going to draw, and then rerun. (add=TRUE suppresses plotting of the box, axes and labels.)
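Both suggestions can be sketched with the survival package (a recommended package that ships with R). The lung data set and its age covariate are stand-ins for the question's cul data and prov.yrs, which are not available here; the ages 50 and 70 are arbitrary illustrative values. The first plot uses add = FALSE with an explicit from/to range so curve() opens its own plot; the second uses survfit() for predicted survival curves.

```r
library(survival)  # recommended package that ships with R

# Toy model on the bundled lung data, standing in for Surv(year, promo) ~ prov.yrs
fit <- coxph(Surv(time, status) ~ age, data = lung)

# Relative risk exp(linear predictor) across the observed age range;
# add = FALSE makes curve() start a new plot instead of adding to a missing one
curve(predict(fit, newdata = data.frame(age = x), type = "risk"),
      from = min(lung$age), to = max(lung$age),
      xlab = "age", ylab = "relative risk", add = FALSE)

# Predicted survival curves for two illustrative ages
sf <- survfit(fit, newdata = data.frame(age = c(50, 70)))
plot(sf, lty = 1:2, xlab = "days", ylab = "survival probability")
legend("topright", legend = c("age 50", "age 70"), lty = 1:2)

rr <- predict(fit, newdata = data.frame(age = c(50, 70)), type = "risk")
rr
```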

How to implement histfit in R?

MATLAB has a histfit function that plots a histogram and fits a distribution based on the bin values.
The distribution's parameters have to be estimated.
How can I implement histfit in R? I searched for a long time, but had no luck.
This post has mentioned this before, but there is no satisfactory solution. The sn package seems to support several distributions, but not many.
I explored the data with the hist function, and the histogram looks roughly like a gamma distribution in general.
But if I use more bins and plot it again, the graph shows more detail, and the gamma distribution no longer fits.
fitdistr also fails to find the parameters.
So I want to fit the distribution using only the coarse data from the histogram. That is my question; thank you for your help.
The fitdistr function in the MASS package can be used to find parameters for a given distribution (including gamma). The function density and the logspline package (and others) can be used to estimate the density function of the data without assuming a specific distribution.
The lines and curve functions can be used to add an estimated density curve to a plotted histogram (use prob=TRUE when creating the histogram).
If you want to compare your data to a specific distribution then tools like qqplots (qqplot function or others) or visual tests (vis.test in the TeachingDemos package) will probably be better than a histogram and density plot.
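The fitdistr-plus-curve approach described above can be sketched as follows. The vector obs is hypothetical gamma-distributed data standing in for the asker's data; on real data fitdistr may still need starting values or bounds. A kernel density estimate from density() is overlaid for comparison, and a Q-Q plot against the fitted gamma is included, as suggested above.

```r
library(MASS)  # recommended package providing fitdistr()

set.seed(10)
# Hypothetical positive data standing in for the question's data
obs <- rgamma(500, shape = 2, rate = 0.5)

# Maximum-likelihood fit of a gamma distribution
fit <- fitdistr(obs, "gamma")
shape <- fit$estimate[["shape"]]
rate  <- fit$estimate[["rate"]]

# prob = TRUE puts the histogram on the density scale so the curves are comparable
hist(obs, breaks = 30, prob = TRUE, main = "Histogram with fitted gamma density")
curve(dgamma(x, shape = shape, rate = rate), add = TRUE, lwd = 2)
lines(density(obs), lty = 2)  # nonparametric kernel density estimate

# Q-Q plot of the data against the fitted gamma quantiles
qqplot(qgamma(ppoints(length(obs)), shape = shape, rate = rate), obs,
       xlab = "fitted gamma quantiles", ylab = "sample quantiles")
abline(0, 1)
```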
I'll answer this myself: the 'bda' package can fit several distributions to binned data, although it can only bin data by rounding.
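For fitting from the histogram alone, a general technique (not the bda package's method) is maximum likelihood on the binned counts: each bin's probability is the difference of the distribution's CDF at the bin edges, and the counts follow a multinomial likelihood. The sketch below assumes a gamma distribution and uses hypothetical data binned with hist(plot = FALSE), so only breaks and counts are used in the fit; the helper name binned_nll is illustrative.

```r
set.seed(11)
# Hypothetical data; only the histogram's breaks and counts are used below
raw <- rgamma(400, shape = 3, rate = 1)
h <- hist(raw, breaks = seq(0, ceiling(max(raw)), by = 1), plot = FALSE)
breaks <- h$breaks
counts <- h$counts

# Negative multinomial log-likelihood of the bin counts under a gamma model.
# Parameters are on the log scale so shape and rate stay positive; the mass
# above the last break acts as an implicit bin with zero observations.
binned_nll <- function(log_par) {
  shape <- exp(log_par[1])
  rate  <- exp(log_par[2])
  p <- diff(pgamma(breaks, shape = shape, rate = rate))
  p <- pmax(p, 1e-12)  # guard against log(0) in sparse tail bins
  -sum(counts * log(p))
}

opt <- optim(c(0, 0), binned_nll)
est <- setNames(exp(opt$par), c("shape", "rate"))
est
```

The same pattern should work for other distributions by swapping pgamma() for the corresponding CDF and adjusting the parameter transform.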
