Fit a logistic curve through three points in R

I have three points: (1,4), (3,6), and (2,5), where (2,5) is the midpoint. How can I fit a particular kind of logistic curve, like the one in the following figure?

I fit a modified logistic with two parameters, "y = a / (1.0 + b*exp(-X))", with parameters a = 6.5094194977264266E+00 and b = 1.7053273551626427E+00 - see the attached graph. I recommend not using this, as it is effectively a perfect, zero-error fit of two parameters to two data points.
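For reference, a fit of this kind can be sketched in R with nls; the starting values below are guesses near the reported parameters, and with only two parameters the curve cannot pass exactly through all three points:

```r
# The three points from the question; (2, 5) is the desired midpoint
x <- c(1, 2, 3)
y <- c(4, 5, 6)

# Least-squares fit of y = a / (1 + b * exp(-x))
fit <- nls(y ~ a / (1 + b * exp(-x)), start = list(a = 6.5, b = 1.7))
coef(fit)

# Draw the data and the fitted curve
xs <- seq(0, 5, length.out = 100)
plot(x, y, pch = 19, xlim = c(0, 5), ylim = c(0, 8))
lines(xs, predict(fit, newdata = data.frame(x = xs)))
```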

Related

Using ROC curve to find optimum cutoff for my weighted binary logistic regression (glm) in R

I have built a binary logistic regression model for churn prediction in RStudio. Because the data used for this model is unbalanced, I also included weights. I then tried to find the optimum cutoff by trial and error; however, to complete my research I have to use ROC curves to find the optimum cutoff. Below is the script I used to build the model (fit2). The weight is stored in W; it states that the cost of wrongly identifying a churner is 14 times as large as the cost of wrongly identifying a non-churner.
#CH1 logistic regression
library(caret)
W = 14
lvl = levels(trainingset$CH1)
print(lvl)
#if positive we give it the defined weight, otherwise set it to 1
fit_wts = ifelse(trainingset$CH1==lvl[2],W,1)
fit2 = glm(CH1 ~ RET + ORD + LVB + REVA + OPEN + REV2KF + CAL + PSIZEF + COM_P_C + PEN + SHOP, data = trainingset, weights = fit_wts, family = binomial(link='logit'))
# we test it on the test set
predlog1 = ifelse(predict(fit2,testset,type="response")>0.5,lvl[2],lvl[1])
predlog1 = factor(predlog1,levels=lvl)
predlog1
confusionMatrix(predlog1, testset$CH1, positive = lvl[2])
For this research I have also built ROC curves for decision trees using the pROC package. However, the same script does not work for a logistic regression. I created a ROC curve for the logistic regression using the script below.
prob = predict(fit2, testset, type = "response")
testset$prob=prob
library(pROC)
g <- roc(CH1 ~ prob, data = testset)
g
plot(g)
Which resulted in the ROC curve below.
How do I get the optimum cut off from this ROC curve?
Getting the "optimal" cutoff is totally independent of the type of model, so you can get it like you would for any other type of model with pROC. With the coords function:
coords(g, "best", transpose = FALSE)
Or directly on a plot:
plot(g, print.thres=TRUE)
Now, the above simply maximizes the sum of sensitivity and specificity. This is often too simplistic, and you probably need a clear definition of "optimal" that is adapted to your use case. That's mostly beyond the scope of this question, but as a starting point you should take a look at the Best Thresholds section of the documentation of the coords function for some basic options.
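As an aside, what coords(g, "best") maximizes by default is Youden's J statistic (sensitivity + specificity - 1). The same search can be sketched in base R on simulated scores, which makes the definition concrete (the data here is made up purely for illustration):

```r
set.seed(1)
# Simulated classifier scores: positives tend to score higher
labels <- c(rep(0, 50), rep(1, 50))
scores <- c(rnorm(50, mean = 0), rnorm(50, mean = 1.5))

# Evaluate Youden's J at every candidate cutoff
cutoffs <- sort(unique(scores))
youden <- sapply(cutoffs, function(t) {
  sens <- mean(scores[labels == 1] >= t)  # sensitivity at cutoff t
  spec <- mean(scores[labels == 0] <  t)  # specificity at cutoff t
  sens + spec - 1
})
best_cutoff <- cutoffs[which.max(youden)]
best_cutoff
```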

Fitting a Gaussian to a dataset using geom_density

I'm pretty new to R and am trying to analyse some data and fit a Gaussian to it using the ggplot2 package.
I am able to plot a smooth curve using geom_smooth and the results are as expected. However, using geom_density (see code below) the result is not as expected.
ggplot(All_Wavelengths_LabVIEW_selected_)+
geom_smooth(mapping = aes(Actual_Wavelength, B), se = FALSE)+
geom_density(kernel = "gaussian", Actual_Wavelength, B)
Instead of a Gaussian fit, I get:
'Error in fortify(data) : object 'B' not found'
I don't understand how this can occur given it uses B to plot the smooth curve without any issue.
In addition, I would like to do the following:
Extract FWHM value of the peak
Overlay multiple of these Gaussian fits for other sets of data (similar to B) with the same X axis
Is this possible?
Any help on this would be greatly appreciated.
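For context, the error occurs because the unnamed arguments to geom_density are matched positionally (to mapping and data), so ggplot2 tries to treat B itself as a data frame. More fundamentally, geom_density estimates the distribution of a single variable; fitting a Gaussian curve to x/y pairs is usually done with nls instead. A sketch on simulated stand-in data (the peak parameters below are invented for illustration), which also yields the FWHM:

```r
set.seed(42)
# Simulated spectrum standing in for the Actual_Wavelength / B columns
Actual_Wavelength <- seq(400, 500, by = 1)
B <- 2 * exp(-(Actual_Wavelength - 450)^2 / (2 * 8^2)) +
  rnorm(length(Actual_Wavelength), sd = 0.05)
d <- data.frame(Actual_Wavelength, B)

# Fit y = A * exp(-(x - mu)^2 / (2 * sigma^2)) by nonlinear least squares
fit <- nls(B ~ A * exp(-(Actual_Wavelength - mu)^2 / (2 * sigma^2)),
           data = d, start = list(A = 1, mu = 440, sigma = 10))

# For a Gaussian, FWHM = 2 * sqrt(2 * log(2)) * sigma
fwhm <- 2 * sqrt(2 * log(2)) * abs(coef(fit)[["sigma"]])
fwhm
```

Overlaying several such fits would then be a matter of calling predict(fit, ...) on a common wavelength grid for each dataset and adding one geom_line per fitted curve.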

The 4 outputs of a linear model in R

In R, after creating a linear model with model <- lm(...) and plotting it with plot(model), you get back 4 graphs, each displaying your model differently. Can anyone explain what these graphs mean?
plot.lm can produce 6 different diagnostic plots, controlled by the which parameter. These are:
1. a plot of residuals against fitted values
2. a Normal Q-Q plot
3. a Scale-Location plot of sqrt(|residuals|) against fitted values
4. a plot of Cook's distances versus row labels
5. a plot of residuals against leverages
6. a plot of Cook's distances against leverage/(1-leverage)
By default it will produce numbers 1, 2, 3 and 5, pausing between plots in interactive mode.
You can see them all in one go if you set up the graphics device for multiple plots, eg:
mdl <- lm(hp~disp,mtcars)
par(mfrow=c(3,2))
plot(mdl,which=1:6)
Interpretation of these plots is a question for Cross Validated, though ?plot.lm gives some basic information.

Interpolation and Curve fitting with R

I am a chemical engineer and very new to R. I am attempting to build a tool in R (and eventually a shiny app) for analysis of phase boundaries. Using a simulation I get output that shows two curves which can be well represented by a 4th order polynomial. The data is as follows:
https://i.stack.imgur.com/8Oa0C.jpg
The procedure I have to follow uses the difference between the two curves to produce a third. In order to compare the curves, the data has to increase as a function of pressure in set increments, for example 0.2. As can be seen, the data from the simulation is not incremental, and there is no way to compare the curves based on the raw output.
To resolve this, in excel I carried out the following steps on each curve:
I plotted the data with pressure on the x axis and temperature on the y axis
Found the line of best fit using a 4th order polynomial
Used the equation of the curve to calculate the temperature at set increments of pressure
From this, I was able to compare the curves mathematically and produce the required output.
Does anyone have any suggestions on how to carry this out in R, or is there a more statistical or simplified approach that I have missed (extracting Bezier curve points, etc.)?
As a bit of further detail, I have taken the data and merged it using tidyr so that the graphs (4 in total) are displayed in just three columns: the graph title, temperature, and pressure. I did this after following a course on ggplot2 on DataCamp, but I am not sure whether this format is suitable when carrying out regression, etc. The head of my dataset can be seen here:
https://i.stack.imgur.com/WeaPz.jpg
I am very new to R, so apologies if this is a stupid question and I am using the wrong terms.
Though I agree with @Jaap's comment, polynomial regression is very easy in R. I'll get you started:
x <- c(0.26,3.33,5.25,6.54,7.38,8.1,8.73,9.3,9.81,10.28,10.69,11.08,11.43,11.75,12.05,12.33)
y <- c(16.33,24.6,31.98,38.38,43.3,48.18,53.08,57.99,62.92,67.86,72.81,77.77,82.75,87.75,92.77,97.81)
fit <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4))
Now your polynomial coefficients are in coef(fit); you can extract them and easily plot the fitted line, e.g.:
coefs <- coef(fit)
plot(x, y)
lines(x, coefs[1] + coefs[2] * x + coefs[3] * x^2 + coefs[4] * x^3 + coefs[5] * x^4)
The fitted values are also directly available via fitted(fit). Build the same polynomial for the second curve and compare the coefficients, not just the "lines".
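To then get temperature at set pressure increments (the step done in Excel), predicting from the fitted model on a regular grid is simpler than assembling the polynomial by hand. A sketch with the same data, assuming increments of 0.2:

```r
x <- c(0.26, 3.33, 5.25, 6.54, 7.38, 8.1, 8.73, 9.3, 9.81, 10.28,
       10.69, 11.08, 11.43, 11.75, 12.05, 12.33)
y <- c(16.33, 24.6, 31.98, 38.38, 43.3, 48.18, 53.08, 57.99, 62.92,
       67.86, 72.81, 77.77, 82.75, 87.75, 92.77, 97.81)

# Same 4th-order polynomial, written with poly(..., raw = TRUE)
fit <- lm(y ~ poly(x, 4, raw = TRUE))

# Evaluate the fitted curve at fixed pressure increments of 0.2
grid <- data.frame(x = seq(0.4, 12.2, by = 0.2))
grid$y_hat <- predict(fit, newdata = grid)
head(grid)
```

Doing the same for the second curve on the same grid makes the two directly comparable; the difference curve is then just a subtraction of the two y_hat columns.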

How to plot and analyze multi variable SVM regression in R

I am new to R and am having some trouble plotting SVM models.
1) How can we plot and analyze multi-variable SVM regression model results?
library(e1071)
set.seed(3)
data = data.frame(matrix(rnorm(100*5), nrow=100))
train=data[1:70,]
test=data[71:100,]
fit = svm(X1 ~ ., data=train)
summary(fit)
pred=predict(fit,test)
2) Assume one of the variables (e.g. X2) contains qualitative data (e.g. high, low and medium) instead of quantitative data; how should we plot then?
In short: you cannot. There is no way to directly visualize an object in more than 3 dimensions.
What you can do is work with some simplification or approximation: you often visualize characteristics of the model rather than the model itself. For example, one might plot:
the relation between an error metric (like R2) and some hyperparameter (regularization strength, kernel width, size of the training set, etc.)
the two most significant dimensions of the dataset, with your model drawn as a 3D surface on top of these two dimensions only
pairplots, if your dimensionality is not very high: visualizing each pair of dimensions requires d(d-1)/2 plots, so for d=5 it is just 10 plots
many other characteristics important from the perspective of your experiment
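The pairplot idea can be sketched in base R on the same simulated data as the question; one way to fold the model's output into the 2-D panels is to color points by the predicted value. An lm stands in for the svm here so the sketch has no package dependency:

```r
set.seed(3)
data <- data.frame(matrix(rnorm(100 * 5), nrow = 100))

# Stand-in model: lm instead of svm, to keep the sketch self-contained
fit <- lm(X1 ~ ., data = data)
pred <- predict(fit, data)

# d = 5 dimensions -> choose(5, 2) = 10 pairwise panels; point color
# encodes the model's prediction
cols <- heat.colors(10)[cut(pred, 10)]
pairs(data, col = cols, pch = 19)
```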
