How to draw my function to plot with data in R - r

I have data about response time at web site according users that hit at the same time.
For example:
10 users hit the same time have (average) response time 300ms
20 users -> 450ms etc
I import the data in R and I make the plot from 2 columns data (users, response time).
Also I use the function loess to draw a line about those points, at the plot.
Here's the code that I have wrote:
users <- seq(5,250, by=5)
responseTime <- c(179.5,234.0,258.5,382.5,486.0,679.0,594.0,703.5,998.0,758.0,797.0,812.0,804.5,890.5,1148.5,1182.5,1298.0,1422.0,1413.5,1209.5,1488.0,1632.0,1715.0,1632.5,2046.5,1860.5,2910.0,2836.0,2851.5,3781.0,2725.0,3036.0,2862.0,3266.0,3175.0,3599.0,3563.0,3375.0,3110.0,2958.0,3407.0,3035.5,3040.0,3378.0,3493.0,3455.5,3268.0,3635.0,3453.0,3851.5)
data1 <- data.frame(users,responseTime)
data1
plot(data1, xlab="Users", ylab="Response Time (ms)")
lines(data1)
loess_fit <- loess(responseTime ~ users, data1)
lines(data1$users, predict(loess_fit), col = "green")
Here's my plot's image:
My questions are:
How to draw my nonlinear function at the same plot to compare it with the other lines?
example: response_time (f(x)) = 30*users^2.
Also how to make predictions for the line of function loess and for my function and show them to the plot, example: if I have data until 250 users, make prediction until 500 users

If you know the equation of the line that you want to draw, then just define a variable for your prediction:
predictedResponseTime <- 30 * users ^ 2
lines(users, predictedResponseTime)
If the problem is that you want to fit a line, then you need to call a modelling function.
Since loess is a non-parametric model, is isn't appropriate to use it to make predictions outside of the range of your data.
In this case, a simple (ordinary least squares) linear regression using lm provides a reasonable fit.
model <- lm(responseTime ~ users)
prediction <- data.frame(users = 1:500)
prediction$responseTime <- predict(model, prediction)
with(prediction, lines(users, responseTime))

Another solution to plot your curve knowing the underlying function is function curve.
In your example of f(x)=30x^2:
plot(data1, xlab="Users", ylab="Response Time (ms)")
lines(data1)
lines(data1$users, predict(loess_fit), col = "green")
curve(30*x^2,col="red", add=TRUE) #Don't forget the add parameter.

Related

How to deal with a quadratic model that has too many fitted values?

I'm trying to fit a quadratic regression model to a dataset and then plot the curve on a scatterplot. The dataset is about number of episodes and screentime for characters in a TV show.
I plotted a scatterplot with episodes on x axis and screentime on y axis this worked fine.
Then I create the model as follows:
#ordering
gottemp <- got[order(got$episodes),]
#plotting
plot(screentime~episodes, data = gottemp, xlab ="Number of episodes", ylab = "Screentime (minutes)", col=c("blue","red")[gender], pch=c(1,2)[gender])
legend("topleft",pch = c(1,2),col=c("blue","red"),c("female","male"))
title("Plot of Screentimes vs Number of Episodes")
#creating 3model and plotting line
model <- lm(screentime~episodes+I(episodes^2), data = got)
lines(fitted(model))
This gives me a model with correct coeefficients however the line that is plotted is not what would be expected. When I view the model i see that there are 113 fitted values, which I think is due to some characters having the same number of episodes so to fix this I think there should only be one fitted value for each number of episodes.
Something like
nd <- data.frame(episodes=seq(min(episodes), max(episodes), length=51)
nd$screentime <- predict(model, newdata=nd)
with(nd, lines(episodes, screentime))
should do what you want. There's probably a duplicate around somewhere ...

R: How to read Nomograms to predict the desired variable

I am using Rstudio. I have created nomograms using function nomogram from package rms using following code (copied from the example code of the documentation):
library(rms)
n <- 1000 # define sample size
set.seed(17) # so can reproduce the results
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c('female','male'), n,TRUE))
# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
(log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist='ddist')
f <- lrm(y ~ lsp(age,50)+sex*rcs(cholesterol,4)+blood.pressure)
nom <- nomogram(f, fun=function(x)1/(1+exp(-x)), # or fun=plogis
fun.at=c(.001,.01,.05,seq(.1,.9,by=.1),.95,.99,.999),
funlabel="Risk of Death")
#Instead of fun.at, could have specified fun.lp.at=logit of
#sequence above - faster and slightly more accurate
plot(nom, xfrac=.45)
Result:
This code produces a nomogram but there is no line connecting each scale (called isopleth) to help predict the desired variable ("Risk of Death") from the plot. Usually, nomograms have the isopleth for prediction (example from wikipedia). But here, how do I predict the variable value?
EDIT:
From the documentation:
The nomogram does not have lines representing sums, but it has a
reference line for reading scoring points (default range 0--100). Once
the reader manually totals the points, the predicted values can be
read at the bottom.
I don't understand this. It seems that predicting is supposed to be done without the isopleth, from the scale of points. but how? Can someone please elaborate with this example on how I can read the nomograms to predict the desired variable? Thanks a lot!
EDIT 2 (FYI):
In the description of the bounty, I am talking about the isopleth. When starting the bounty, I did not know that nomogram function does not provide isopleth and has points scale instead.
From the documentation, the nomogram is used to manualy obtain prediction:
In the top of the plot (over Total points)
you draw a vertical line for each of the variables of your patient (for example age=40, cholesterol=220 ( and sex=male ), blood.pressure=172)
then you sum up the three values you read on the Points scale (40+60+3=103) to obtain Total Points.
Finally you draw a vertical line on the Total Points scale (103) to read the Risk of death (0.55).
These are regression nomograms, and work in a different way to classic nomograms. A classic nomogram will perform a full calculation. For these nomograms you drop a line from each predictor to the scale at the bottom and add your results.
The only way to have a classic 'isopleth' nomogram working on a regression model would be 1 have just two predictors or 2 have a complex multi- step nomogram.

Obtaining confidence interval for npreg as values, not as plot

I am using the well known "np" package of Hayfield & Racine for non-parametric regressions. It allows plotting confidence bands for the estimated coefficient based on bootstrap procedures. See the code below for an example.
Question: I am wondering how to obtain these confidence intervalls in numerical form? One, but not the only reason for this question is that I really don't like the presentation of the ci's. More generally speaking, I would like to use and further process the confidence band within my analysis.
library(np)
# generate random variables:
x <- 1:100 + rnorm(100)/2
y <- (1:100)^(0.25) + rnorm(100)/2
mynp <- npreg(y~x)
plot(mynp, plot.errors.method="bootstrap")`
when executing plot, the function is calling to the plot method of np package which is the function npplot
npplot exepts an argument plot.behavior which equals to plot by default which plots the results and returns NULL. you should set plot.behavior = "plot-data", and the function will plot and return the data of the object.
dat <- plot(mynp, plot.errors.method="bootstrap",plot.behavior = "plot-data")
Than the values in the line can be accesed through dat$r1$mean and the values to be added to the mean to get the upper and lower ci accesed through dat$r1$merr.
notice that not all value are plotted. only half of them (every other value and than the last).
read the 'help' on npplot for more options.
further is an example of the use of the code and the results:
library(np)
# generate random variables:
x <- 1:100 + rnorm(100)/2
y <- (1:100)^(0.25) + rnorm(100)/2
mynp <- npreg(y~x)
dat <- plot(mynp, plot.errors.method="bootstrap",plot.behavior = "plot-data")
Then recreating the results:
z <- unlist(dat$r1$eval,use.names = F)
CI.up = as.numeric(dat$r1$mean)+as.numeric(dat$r1$merr[,2])
CI.dn = as.numeric(dat$r1$mean)+as.numeric(dat$r1$merr[,1])
plot(dat$r1$mean~z, cex=1.5,xaxt='n', ylim=c(1.0,3.5),xlab='',ylab='lalala!', main='blahblahblah',col='blue',pch=16)
arrows(z,CI.dn,z,CI.up,code=3,length=0.2,angle=90,col='red')
we will get:
As you can see, theresults are the same (only I have calculated the intervals for each point and not only for half of them).
note the plot.errors.type attribute for npplot which gets "standard" and "quantiles" and is "standard" at default. When you specify "standard" dat$r1$merr will keep the standard errors and the plot will include mean+std err as intervals. Alternatively the plot will include the quantiles as the intervals and the quantiles will be saved at dat$r1$merr. which quntiles to use are specified by plot.errors.quantiles quantiles and it's only relevant if plot.errors.type = "quantiles"

Solution of varying coefficients ODE

I have a set of observed raw data and use 2nd order ODE to fit the data
y''+b1(t)y'+b0(t)y = 0
The b1 and b0 are time-dependent and I use principal differential analysis(PDA) (R-package: fda, function: pda.fd)to get the estimate of b1(t) and b0(t) .
To check the validity of the estimates of b1(t) and b0(t), I use collocation method (R-package bvpSolve, function:bvpcol) to get the numerical solution of the ODE and compare the solution with the smoothing curve fitting of the raw data.
My question is that my numerical solution from bvpcol can caputure the shape of the fitting curve but not for the value of the function. There are different in term of some constant multiples.
(Since I am not allowed to post images,please see the link for figure)
See the figure of my output. The gray dot is my raw data, the red line is Fourier expansion of the raw data, the green line is numerical solution of bvpcol function and the blue line the green-line/1.62. We can see the green line can capture the shape but with values that are constant times of fourier expansion.
I fit several other data and have similar situation but different constant. I am wondering it is the problem of numerical solution of ODE or some other reasons and how to solve this problem to get a good accordance between numerical solution(green) and true Fourier expansion?
Any help and idea is appreciated!
Here is a raw data and code:
RData is here
library(fda)
library(bvpSolve)
# load the data
load('y.RData')
tvec = 1:length(y)
tvec = (tvec-min(tvec))/(max(tvec)-min(tvec))
# create basis
fbasis = create.fourier.basis(c(0,1),nbasis=nbasis)
bbasis = create.bspline.basis(c(0,1),norder=8,nbasis=47)
bfdPar = fdPar(bbasis)
yfd = smooth.basis(tvec,y,fbasis)$fd
yfdlist = list(yfd)
bwtlist = rep(list(bfdPar),2)
# PDA fit
bwt = pda.fd(yfdlist,bwtlist)$bwtlist
# output of estimated coefficients
beta0.fd<-bwt[[1]]$fd
beta1.fd<-bwt[[2]]$fd
# define the vary-coef function in terms of t
fbeta0<-function(t)eval.fd(t,beta0.fd)
fbeta1<-function(t)eval.fd(t,beta1.fd)
# define 2nd order ODE
fun2 <- function(t,y,pars) {
with(as.list(c(y,pars)),{
beta0 = pars[[1]];
beta1 = pars[[2]];
dy1 = y[2]
dy2 = -beta1(t)*y[2]-beta0(t)*y[1]
return(list(c(dy1,dy2)))
})
}
# BVP
yinit<-c(p1[1],NA)
yend<-c(p1[length(p1)],NA)
t<-seq(tvec[1],tvec[length(tvec)],0.005)
col<-bvpcol(yini=yinit,yend=yend,x=t,func=fun2,parms=c(fbeta0,fbeta1),atol=1e-5,islin=T)
# plot output
plot(col[,1],col[,2],col='green',type='l')
points(tvec,p1,col='darkgray')
lines(yfd,col='red',lwd=2)
lines(col[,1],col[,2],col='green',type='l')
lines(col[,1],col[,2]/1.62,col='blue',type='l',lwd=2,lty=4)
legend('topleft',col=c('green','darkgray','red','blue'),
legend=c('ODE solution','raw data','basis curve fitting','ODE solution/1.62'),lty=1)

Plot "regression line" from multiple regression in R

I ran a multiple regression with several continuous predictors, a few of which came out significant, and I'd like to create a scatterplot or scatter-like plot of my DV against one of the predictors, including a "regression line". How can I do this?
My plot looks like this
D = my.data; plot( D$probCategorySame, D$posttestScore )
If it were simple regression, I could add a regression line like this:
lmSimple <- lm( posttestScore ~ probCategorySame, data=D )
abline( lmSimple )
But my actual model is like this:
lmMultiple <- lm( posttestScore ~ pretestScore + probCategorySame + probDataRelated + practiceAccuracy + practiceNumTrials, data=D )
I would like to add a regression line that reflects the coefficient and intercept from the actual model instead of the simplified one. I think I'd be happy to assume mean values for all other predictors in order to do this, although I'm ready to hear advice to the contrary.
This might make no difference, but I'll mention just in case, the situation is complicated slightly by the fact that I probably will not want to plot the original data. Instead, I'd like to plot mean values of the DV for binned values of the predictor, like so:
D[,'probCSBinned'] = cut( my.data$probCategorySame, as.numeric( seq( 0,1,0.04 ) ), include.lowest=TRUE, right=FALSE, labels=FALSE )
D = aggregate( posttestScore~probCSBinned, data=D, FUN=mean )
plot( D$probCSBinned, D$posttestScore )
Just because it happens to look much cleaner for my data when I do it this way.
To plot the individual terms in a linear or generalised linear model (ie, fit with lm or glm), use termplot. No need for binning or other manipulation.
# plot everything on one page
par(mfrow=c(2,3))
termplot(lmMultiple)
# plot individual term
par(mfrow=c(1,1))
termplot(lmMultiple, terms="preTestScore")
You need to create a vector of x-values in the domain of your plot and predict their corresponding y-values from your model. To do this, you need to inject this vector into a dataframe comprised of variables that match those in your model. You stated that you are OK with keeping the other variables fixed at their mean values, so I have used that approach in my solution. Whether or not the x-values you are predicting are actually legal given the other values in your plot should probably be something you consider when setting this up.
Without sample data I can't be sure this will work exactly for you, so I apologize if there are any bugs below, but this should at least illustrate the approach.
# Setup
xmin = 0; xmax=10 # domain of your plot
D = my.data
plot( D$probCategorySame, D$posttestScore, xlim=c(xmin,xmax) )
lmMultiple <- lm( posttestScore ~ pretestScore + probCategorySame + probDataRelated + practiceAccuracy + practiceNumTrials, data=D )
# create a dummy dataframe where all variables = their mean value for each record
# except the variable we want to plot, which will vary incrementally over the
# domain of the plot. We need this object to get the predicted values we
# want to plot.
N=1e4
means = colMeans(D)
dummyDF = t(as.data.frame(means))
for(i in 2:N){dummyDF=rbind(dummyDF,means)} # There's probably a more elegant way to do this.
xv=seq(xmin,xmax, length.out=N)
dummyDF$probCSBinned = xv
# if this gives you a warning about "Coercing LHS to list," use bracket syntax:
#dummyDF[,k] = xv # where k is the column index of the variable `posttestScore`
# Getting and plotting predictions over our dummy data.
yv=predict(lmMultiple, newdata=subset(dummyDF, select=c(-posttestScore)))
lines(xv, yv)
Look at the Predict.Plot function in the TeachingDemos package for one option to plot one predictor vs. the response at a given value of the other predictors.

Resources