I'm using the last column (V3) of the galaxy data that is read in by the code below.
I'm trying to apply a kernel density estimator to this data set, which is defined by

$$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)$$

where K is a kernel (often the standard normal density, though not necessarily), h is the bandwidth, n is the size of the data set, X_i are the individual data points and x is the point at which the density is estimated. Using this equation, I have the following code:
AstroData=read.table(paste0("http://www.stat.cmu.edu/%7Elarry",
"/all-of-nonpar/=data/galaxy.dat"),
header=FALSE)
x=AstroData$V3
xsorted=sort(x)
x_i=xsorted[1:1266]
hist(x_i, nclass=308)
n=length(x_i)
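h1=.002                          # bandwidth h
t=seq(min(x_i),max(x_i),0.01)    # grid of evaluation points x
M=length(t)
fhat1=rep(0,M)
for (i in 1:M){
# fhat(t[i]) = sum of Gaussian kernels K((t[i]-X_j)/h), divided by n*h
fhat1[i]=sum(dnorm((t[i]-x_i)/h1))/(n*h1)}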
lines(t, fhat1, lwd=2, col="red")
This produces the following plot, which is actually close to what I want; once the histogram is removed, the final result should look like this:
Notice that it is more finely resolved, whereas the red line, which should represent the density, is rather rough and its peaks are not as high. That final plot was made with R's density function:
plot(density(x=x_i, bw=.002))
That is what I want to reproduce without using any additional packages.
Thank you
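For comparison, the loop above can be vectorized in base R with outer(). This is a minimal sketch rather than the poster's code: kde() is a hypothetical helper name, the data are simulated with rnorm() instead of the galaxy file, and it assumes a Gaussian kernel, in which case h plays the same role as density()'s bw (the standard deviation of the kernel).

```r
# Hypothetical helper: kernel density estimate evaluated at every point of x at once.
# Implements fhat(x) = 1/(n*h) * sum_i K((x - X_i)/h), with K = dnorm by default.
kde <- function(x, data, h, kernel = dnorm) {
  # Matrix of kernel values: one row per evaluation point, one column per data point
  Kmat <- outer(x, data, function(xx, Xi) kernel((xx - Xi) / h))
  rowSums(Kmat) / (length(data) * h)
}

set.seed(42)
d <- rnorm(500)   # simulated data standing in for the galaxy column
h <- 0.3

dd   <- density(d, bw = h)   # kernel = "gaussian" by default
fhat <- kde(dd$x, d, h)      # evaluate on the same grid density() returns

# Largest pointwise difference; small, because density() uses a binned FFT approximation
print(max(abs(fhat - dd$y)))

plot(dd, main = "density() vs. hand-rolled KDE")
lines(dd$x, fhat, col = "red", lty = 2)
```

The outer() version trades memory (an M by n matrix) for speed; for very long grids the original loop, or evaluating in chunks, keeps memory bounded.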
After talking it over with my roommate, I decided to decrease the spacing of the t values (the evaluation points x), changing it from 0.01 to 0.001. The bandwidth h1 = 0.002 is smaller than the old spacing of 0.01, so the coarse grid stepped over many of the narrow peaks. The final code for this plot is:
AstroData=read.table(paste0("http://www.stat.cmu.edu/%7Elarry",
"/all-of-nonpar/=data/galaxy.dat"),
header=FALSE)
x=AstroData$V3
xsorted=sort(x)
x_i=xsorted[1:1266]
hist(x_i, nclass=308)
n=length(x_i)
h1=.002
t=seq(min(x_i),max(x_i),0.001)
M=length(t)
fhat1=rep(0,M)
for (i in 1:M){
fhat1[i]=sum(dnorm((t[i]-x_i)/h1))/(n*h1)}
lines(t, fhat1, lwd=2, col="blue")
This in turn gives the following plot, which is the one I wanted:
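To see why the grid spacing matters, here is a minimal sketch with simulated data (two narrow clusters chosen purely for illustration, not the galaxy data). It uses the same estimator as the loop above and compares the highest value found on a 0.01 grid with the highest value found on a 0.001 grid when h1 = 0.002.

```r
set.seed(1)
# Two narrow clusters (simulated, for illustration only)
x_i <- c(rnorm(300, mean = 0.02, sd = 0.001), rnorm(300, mean = 0.05, sd = 0.002))
n  <- length(x_i)
h1 <- 0.002

# Same estimator as the loop in the question, applied to a vector of grid points
fhat_at <- function(pts) sapply(pts, function(p) sum(dnorm((p - x_i) / h1)) / (n * h1))

t_coarse <- seq(min(x_i), max(x_i), 0.01)    # spacing 0.01 > h1: peaks can fall between grid points
t_fine   <- seq(min(x_i), max(x_i), 0.001)   # spacing 0.001 < h1: each bump is sampled several times

peaks <- c(coarse = max(fhat_at(t_coarse)), fine = max(fhat_at(t_fine)))
print(peaks)

plot(t_fine, fhat_at(t_fine), type = "l", col = "blue", xlab = "x", ylab = "density")
lines(t_coarse, fhat_at(t_coarse), col = "red", lwd = 2)
```

As a rough guide (a rule of thumb, not something stated in the post), keeping the grid spacing a few times smaller than h avoids clipping the peaks.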
Why did I get lines instead of the standard bubbles in my Q-Q plot?
My code:
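# fileEncoding = "UTF-8-BOM" drops the byte-order mark that produced the "ï.." column-name prefix
data <- read.csv("C:\\Users\\anton\\SanFrancisco.csv", fileEncoding = "UTF-8-BOM")
x <- data$San.Francisco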
head(x)
library("fitdistrplus")
fitnor <- fitdist(x, "norm")
fitlogis <- fitdist(x, "logis")
qqcomp(list(fitnor, fitlogis), legendtext=c("Normal", "Logistic"))
From the documentation for qqcomp (open it with ?qqcomp):
qqcomp provides a plot of the quantiles of each theoretical
distribution (x-axis) against the empirical quantiles of the data
(y-axis), by default defining probability points as (1:n - 0.5)/n for
theoretical quantile calculation (data are assumed continuous). For
large dataset (n > 1e4), lines are drawn instead of points and
customized with the fitpch parameter.
This is by design: your data must have more than 10,000 values. With that many observations the bubbles on the Q-Q plot would be difficult to distinguish individually, and they are large enough that the bubbles for one model would cover those for the other.
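The behaviour described in the documentation can be reproduced in base R for a single normal fit. This is a sketch under stated assumptions, not qqcomp's actual source: the data are simulated, the normal parameters are the maximum-likelihood estimates (sample mean and n-denominator standard deviation, which is what fitdist's default "mle" method targets for "norm"), and the points-versus-line switch at n > 1e4 only mimics what ?qqcomp describes.

```r
set.seed(123)
x <- rgamma(200, shape = 5, rate = 1)   # simulated data standing in for the CSV column

n <- length(x)
p <- (1:n - 0.5) / n                     # probability points, as described in ?qqcomp

# Normal fitted by maximum likelihood: mean and n-denominator standard deviation
mu    <- mean(x)
sigma <- sqrt(mean((x - mu)^2))

theo <- qnorm(p, mean = mu, sd = sigma)  # theoretical quantiles (x-axis)
emp  <- sort(x)                          # empirical quantiles (y-axis)

# Mimic the documented switch: points for moderate n, a line for very large n
plot(theo, emp, type = if (n > 1e4) "l" else "p",
     xlab = "Theoretical quantiles", ylab = "Empirical quantiles")
abline(0, 1, lty = 2)                    # reference line y = x
```

For n > 10, these probability points are the same as base R's ppoints(n).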
So I'm asked to obtain an estimate of theta for the variable Length in the muscle data set from the MASS package. The code I used is shown below, as well as the resulting curve. Somehow I don't end up with a smooth curve but with a very "blocky" one, with some extra lines between points on the curve. Can anyone help me get a smooth curve?
utils::data(muscle,package = "MASS")
Length.fit<-nls(Length~t1+t2*exp(-Conc/t3),muscle,
start=list(t1=3,t2=-3,t3=1))
plot(Length~Conc,data=muscle)
lines(muscle$Conc, predict(Length.fit))
Image of the plot:
Edit: as a follow-up question:
If I want to predict the curve more accurately, I use nonlinear regression to fit a separate curve for each of the 21 muscle strips (the Strip factor). This gives me a vector
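$\theta = (\theta_{1,1}, \ldots, \theta_{1,21}, \theta_{2,1}, \ldots, \theta_{2,21}, \theta_3)$, i.e. theta[1:21] holds the per-strip t1 values, theta[22:42] the per-strip t2 values and theta[43] the shared t3.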
I can write a for loop that plots all of the curves, but as before they come out blocky. I plot them as follows:
for(i in 1:21) {
  lines(muscle$Conc,theta[i]+theta[i+21]*
    exp(-muscle$Conc/theta[43]), col=color[i])
  # no i = i+1 needed: for() advances i on its own
}
I don't see how I can use the same trick to smooth out these curves, since muscle$Conc still only has 4 values.
Edit 2:
I figured it out, and changed it to the following:
lines(seq(0,4,0.1),theta[i]+theta[i+21]*exp(-seq(0,4,0.1)/theta[43]), col=color[i])
If you look at the output of cbind(muscle$Conc, predict(Length.fit)), you'll see that many points are repeated and that they're not sorted by Conc. lines() connects the points in the order they are given, so you get multiple back-and-forth lines. The code below runs predict on a set of unique, ordered values of Conc.
plot(Length ~ Conc,data=muscle)
lines(seq(0,4,0.1),
predict(Length.fit, newdata=data.frame(Conc=seq(0,4,0.1))))
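An alternative that also fits the per-strip loop is base R's curve(), which evaluates an expression in x on an evenly spaced grid (101 points by default) and can add it to an existing plot. A minimal sketch, assuming the same model; the start values here are chosen near the solution only so that the sketch converges reliably.

```r
data(muscle, package = "MASS")

# Same model as in the question (start values are an assumption, picked near the fit)
fit <- nls(Length ~ t1 + t2 * exp(-Conc / t3), data = muscle,
           start = list(t1 = 29, t2 = -34, t3 = 0.6))
cf <- coef(fit)

plot(Length ~ Conc, data = muscle)

# curve() supplies its own evenly spaced x values, so the line is smooth
# no matter how few distinct Conc values the data contain
res <- curve(cf[["t1"]] + cf[["t2"]] * exp(-x / cf[["t3"]]),
             from = min(muscle$Conc), to = max(muscle$Conc),
             add = TRUE, col = "blue")
```

Inside the per-strip loop, the same call would use theta[i], theta[i+21] and theta[43] in place of the three coefficients.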
So... I'm looking at an example in a book that goes something like this:
library(daewr)
mod1 <- aov(height ~ time, data=bread)
summary(mod1)
...
par(mfrow=c(2,2))
plot(mod1, which=5)
plot(mod1, which=1)
plot(mod1, which=2)
plot(residuals(mod1) ~ loaf, main="Residuals vs Exp. Units", font.main=1, data=bread)
abline(h = 0, lty = 2)
That all works, but the text is a little vague about the purpose of the parameter 'which='. I dug around in the help (in RStudio) on plot() and par() and looked around online; I found some references to a different function, which(), but nothing explaining the purpose or syntax of the parameter 'which=' inside plot().
A bit later (on the next page, with the figures) I found a mention of using names(mod1) to view the list of quantities calculated by aov, which I presume is what which= refers to, i.e. which item in the list to plot where in the 2x2 matrix of plots. Yay. Now where the heck is that buried in the docs?!?
which selects which plots are displayed, by number:
1. A plot of residuals against fitted values
2. A normal Q-Q plot
3. A Scale-Location plot of sqrt(| residuals |) against fitted values
4. A plot of Cook's distances versus row labels
5. A plot of residuals against leverages
6. A plot of Cook's distances against leverage/(1-leverage)
By default, plots 1, 2, 3 and 5 are provided.
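Check ?plot.lm in R for more details: an aov fit also has class "lm", so plot(mod1) dispatches to plot.lm, which is where which= is documented. It is not an index into names(mod1).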
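A short sketch of choosing plots by number. It uses the built-in PlantGrowth data as a stand-in for the book's bread data (the daewr package may not be installed), and picks Cook's distance (plot 4) instead of plot 5.

```r
# PlantGrowth (built into R) stands in for the bread data from daewr
mod <- aov(weight ~ group, data = PlantGrowth)
print(class(mod))   # "aov" "lm": plot() therefore uses the lm diagnostics

op <- par(mfrow = c(2, 2))
# Plots are selected by their number in ?plot.lm; here plot 4 replaces plot 5
plot(mod, which = c(1, 2, 3, 4))
par(op)             # restore the previous layout
```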
I'm simply trying to display the fit I've generated using lm(), but the lines function is giving me a weird result in which there are multiple lines coming out of one point.
Here is my code:
library(ISLR)
data(Wage)
lm.mod<-lm(wage~poly(age, 4), data=Wage)
Wage$lm.fit<-predict(lm.mod, Wage)
plot(Wage$age, Wage$wage)
lines(Wage$age, Wage$lm.fit, col="blue")
I've tried resetting my plot with dev.off(), but I've had no luck. I'm using RStudio. FWIW, the line shows up fine if I make the regression linear, but as soon as I make it quadratic or higher (using I(age^2) or poly()), I get a weird graph. Also, the points() function works fine with poly().
Thanks for the help.
Because you didn't order the points by age first, lines() connects them in data-set order, jumping back and forth between ages. This happens for the linear regression too; the reason it looks fine there is that traveling back and forth between points on a straight line... stays on the line!
plot(Wage$age, Wage$wage)
lines(sort(Wage$age), Wage$lm.fit[order(Wage$age)], col = 'blue')
Consider increasing the line width for a better view:
lines(sort(Wage$age), Wage$lm.fit[order(Wage$age)], col = 'blue', lwd = 3)
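Since ISLR may not be installed, here is a self-contained sketch of the same fix on simulated age/wage data (made up for illustration). Computing one permutation with order() and applying it to both x and y keeps each age paired with its own fitted value; sort(age) is the same as age[order(age)].

```r
set.seed(2)
# Simulated stand-in for ISLR's Wage data: ages in no particular order
age  <- sample(18:80, 300, replace = TRUE)
wage <- 50 + 3 * age - 0.03 * age^2 + rnorm(300, sd = 15)
fit  <- lm(wage ~ poly(age, 4))
pred <- fitted(fit)

o <- order(age)   # permutation that sorts age ascending
plot(age, wage, col = "grey")
lines(age[o], pred[o], col = "blue", lwd = 3)   # x and y reordered together
```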
Just to add another more general tip on plotting model predictions:
An often used strategy is to create a new data set (e.g. newdat) which contains a sequence of values for your predictor variables across a range of possible values. Then use this data to show your predicted values. In this data set, you have a good spread of predictor variable values, but this may not always be the case. With the new data set, you can ensure that your line represents evenly distributed values across the variable's range:
Example
newdat <- data.frame(age=seq(min(Wage$age), max(Wage$age),length=1000))
newdat$pred <- predict(lm.mod, newdata=newdat)
plot(Wage$age, Wage$wage, col=8, ylab="Wage", xlab="Age")
lines(newdat$age, newdat$pred, col="blue", lwd=2)
I have a simple data set with two columns of data, K and SwStr:
res = data.frame(K = c(.259, .215, .224, .223, .262, .233),
                 SwStr = c(.130, .117, .117, .114, .113, .111))
I plotted the data using:
plot(res$K, res$SwStr)
I want to plot the result of a linear model, using SwStr to predict K. I try to do that using:
graphic<-lm(K~SwStr-1, data=res)
P=predict(graphic)
plot(res$K, res$SwStr)
lines(P, lty="dashed", col="green", lwd=3)
But when I do this, I don't get any line plotted. What am I doing wrong?
(1) You are swapping the axes of the original plot. If you want SwStr on the x axis and K on the y axis, you need
plot(res$SwStr, res$K)
or
with(res,plot(K~SwStr))
If you check the actual values of the plotted points on the graph, this might be obvious (especially if K and SwStr have different magnitudes) ...
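For lm fits you can also use abline(graphic, ...). Because this model has no intercept, abline() takes its single coefficient as the slope of a line through the origin.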
edit: (2) You also have to realize that predict gives just the predicted y values, not the x values, so lines(P) plots them against their index 1:6, which lies far outside the x range of the plot. So you want something like this:
K=c(.259, .215, .224, .223, .262, .233)
SwStr=c(.130, .117, .117, .114, .113, .111)
g <- lm(K~SwStr-1)
par(las=1,bty="l") ## my favourites
plot(K~SwStr)
P <- predict(g)
lines(SwStr,P)
Depending on the situation, you may also want to use the newdata argument to predict to specify a set of evenly spaced x values ...
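A minimal sketch of the newdata approach for this data set. The 50-point grid is an arbitrary choice; for a straight line two points would do, but the same pattern carries over unchanged to curved fits.

```r
K     <- c(.259, .215, .224, .223, .262, .233)
SwStr <- c(.130, .117, .117, .114, .113, .111)
g <- lm(K ~ SwStr - 1)   # regression through the origin, as in the question

# Evenly spaced SwStr values spanning the observed range
grid <- data.frame(SwStr = seq(min(SwStr), max(SwStr), length.out = 50))
grid$K <- predict(g, newdata = grid)

plot(K ~ SwStr)
lines(grid$SwStr, grid$K, lty = "dashed", col = "green", lwd = 3)
```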