I want to plot a histogram of real data and compare it with a theoretical normal distribution in a single plot, but the two plots end up on different scales.
# ystar holds the real data (for a reproducible example, fill it with random data, e.g. ystar <- rnorm(1000))
x<-seq(-4,4,length=200)
y<-dnorm(x,mean=0, sd=1)
plot(x,y, type = "l", lwd = 2, xlim = c(-3.5,3.5),ylim=c(0,0.7))
par(new = TRUE)
hist(ystar,xlim = c(-10,10),freq = FALSE,ylim=c(0,0.7),breaks = 50)
[Image: desired output]
Assuming that ystar is a vector, you should change this:
y<-dnorm(x,mean=0, sd=1)
To:
y<-dnorm(x,mean=mean(ystar), sd=sd(ystar))
This will produce a normal density curve that approximately matches the histogram.
You should then be able to use the same x-limits for both the histogram and the theoretical distribution, which will eliminate the strange overlapping axis labels you have in your current version.
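A minimal sketch of the full fix, assuming ystar is a numeric vector (it is replaced here by simulated stand-in data, which is an assumption for illustration only). Instead of overlaying two independent plots with par(new = TRUE), it draws the histogram on the density scale and adds the curve to the same axes with lines(), so both share one x- and y-scale:

```r
# Stand-in for the real data (assumption): replace with your own ystar vector
set.seed(1)
ystar <- rnorm(1000, mean = 2, sd = 3)

# Normal density on a grid spanning the data, using the sample moments
x <- seq(min(ystar), max(ystar), length.out = 200)
y <- dnorm(x, mean = mean(ystar), sd = sd(ystar))

# Compute the histogram first so the y-limit can cover both bars and curve
h <- hist(ystar, breaks = 50, plot = FALSE)
plot(h, freq = FALSE, ylim = c(0, max(h$density, y)),
     main = "ystar with fitted normal density")

# Add the curve to the existing axes (no par(new = TRUE) needed)
lines(x, y, lwd = 2)
```

On the density scale the bar areas sum to 1, so the bars and the dnorm() curve are directly comparable.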
Related
I'm trying to use the inverse cumulative distribution method to plot a histogram from the standard Cauchy distribution, and I'm getting a strange plot that doesn't look like the textbook standard Cauchy. I think my inverse function is correct (x = tan(pi*(u - 1/2))), so I would appreciate some help. Here is the R code I used:
n <- 10000
u <- runif(n)
c.samp <- tan(pi*(u - 1/2))  # tan() is vectorised, so sapply() is not needed
hist(c.samp, breaks = 90, col = "blue",
main = "Hist of Cauchy")
The resulting plot just doesn't look correct:
Any help is appreciated, thank you.
The histogram and the sampling technique are correct.
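The inverse function follows from the standard Cauchy CDF; solving u = F(x) for x gives exactly the sampler used in the question, and because u is Uniform(0, 1), F^{-1}(u) is standard Cauchy:

$$
F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan x,
\qquad
u = F(x) \;\Longleftrightarrow\; x = F^{-1}(u) = \tan\!\left(\pi\left(u - \tfrac{1}{2}\right)\right),
\quad u \in (0, 1).
$$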
Compare the results with the following (which uses the R Cauchy sampling function).
c.samp2 <- rcauchy(n)
hist(c.samp2, breaks = 90, col = "blue",
main = "Hist of Cauchy 2")
This output also looks incorrect, but it is not.
First, note that by default the x-axis range is chosen from the most extreme values you happen to draw. As you probably know, the Cauchy distribution is extremely heavy-tailed, so very large but rare values are expected. With 10000 samples, those few extreme observations stretch the x-axis and squeeze the bulk of the distribution into a narrow spike, yet they are themselves barely visible because each bin in the tails holds only one or a handful of observations.
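A quick numerical check of that explanation (a sketch; the seed and the cut-off of 50 are arbitrary choices). It compares the share of draws beyond ±50 with the theoretical tail probability 2 * pcauchy(-50), which is about 1.3%, and prints the sample range that hist() uses by default:

```r
set.seed(42)
n <- 10000
u <- runif(n)
c.samp <- tan(pi * (u - 1/2))   # inverse-CDF sampler from the question

# Share of draws outside [-50, 50]: empirical vs. theoretical
tail.emp <- mean(abs(c.samp) > 50)
tail.theory <- 2 * pcauchy(-50)
print(c(empirical = tail.emp, theoretical = tail.theory))

# The extreme draws set the default x-axis range of hist()
print(range(c.samp))
```

So roughly 1 in 80 draws lies outside ±50, and a few lie much farther out; they dominate the axis range while each one sits in an almost empty bin.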
The default way hist chooses the bins is also poorly suited to distributions like the Cauchy. Try, e.g.:
# note: hist() has no bins argument; binning is controlled through breaks
hist(c.samp2, breaks = "FD", col = "blue",
main = "Hist of Cauchy 2",
xlim = c(-500, 500))
I suggest reading the help("hist") page carefully and experimenting with the parameters to get a useful histogram.
By narrowing the x-axis range, using a probability (density) scale on the y-axis, and adding the theoretical density and a "rug", you get something more useful:
hist(c.samp, breaks = "FD", col = "blue",
main = "Hist of Cauchy distribution",
xlim = c(-50, 50),
freq = FALSE)
curve(dcauchy, add = TRUE, col = "red")
rug(c.samp)
Note that using c.samp or c.samp2 now hardly changes the plot.
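To confirm numerically that the inverse-CDF sampler and rcauchy() agree, here is a sketch comparing a few sample quantiles with the theoretical qcauchy() values (the probabilities and the seed are arbitrary choices):

```r
set.seed(1)
n <- 10000
c.samp  <- tan(pi * (runif(n) - 1/2))  # inverse-CDF sampler
c.samp2 <- rcauchy(n)                  # R's built-in sampler

p <- c(0.10, 0.25, 0.50, 0.75, 0.90)
print(round(rbind(theory      = qcauchy(p),
                  inverse.cdf = quantile(c.samp, p),
                  rcauchy     = quantile(c.samp2, p)), 2))
```

Quantiles are used rather than the mean and standard deviation because the Cauchy distribution has neither.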
Could someone explain why I get different lines when I plot? I thought the two lines should be the same.
library(sm)  # the aircraft data set is in the sm package
data(aircraft)
help(aircraft)
attach(aircraft)
lgWeight <- log(Weight)
library(KernSmooth)
# a) Fit a nonparametric regression to the data (xi, yi) and save the estimated values m̂(xi).
# Local polynomial regression of degree 2 of lgWeight on Yr
op <- par(mfrow=c(2,1))
lpr1 <- locpoly(Yr,lgWeight, bandwidth=7, degree = 2, gridsize = length(Yr))
plot(Yr,lgWeight,col="grey", ylab="Log(Weight)", xlab = "Year")
lines(lpr1,lwd=2, col="blue")
lines(lpr1$y, col="black")
How can I get the values from the model? If I print the model, it gives me the values in $x and $y, but if I plot them, they are not the same as the blue line. I need the values of the fitted model (blue) for every x. Could someone help me?
The fitted model (blue curve) is correctly stored in lpr1. As you said, the y-values are in lpr1$y and the x-values are in lpr1$x.
The second plot looks like a straight line because you give lines() only one variable, lpr1$y. Since you don't specify the x-coordinates, R plots the values against their index, from 1 to the length of lpr1$y.
The following are two explicit and equivalent ways to plot the curve and line:
lines(x = lpr1$x, y = lpr1$y,lwd=2, col="blue") # plots curve
lines(x = 1:length(lpr1$y), y = lpr1$y, col="black") # plots lpr1$y against its index
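One more detail for the follow-up question: locpoly() returns the fit on an equally spaced grid over the range of the x-values (lpr1$x), not at the observed years. A minimal sketch that interpolates the fit back onto each observed x with approx(); it uses simulated stand-in data (an assumption) so that it runs with KernSmooth alone, so replace Yr and lgWeight with the real vectors:

```r
library(KernSmooth)  # locpoly()

# Stand-in data (assumption): replace with the real Yr and lgWeight vectors
set.seed(7)
Yr <- sort(sample(1914:1984, 200, replace = TRUE))
lgWeight <- 8 + 0.04 * (Yr - 1914) + rnorm(length(Yr), sd = 0.5)

lpr1 <- locpoly(Yr, lgWeight, bandwidth = 7, degree = 2, gridsize = length(Yr))

# lpr1$x is an equally spaced grid over range(Yr), not the observed years,
# so linearly interpolate the fitted curve at each observed Yr
fitted.lgWeight <- approx(lpr1$x, lpr1$y, xout = Yr)$y
print(head(cbind(Yr, lgWeight, fitted.lgWeight)))
```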
I'm trying to log-transform the x axis of a density plot and get unexpected results. The code without the transformation works fine:
library(ggplot2)
data = data.frame(x=c(1,2,10,11,1000))
dens = density(data$x)
densy = sapply(data$x, function(x) { dens$y[findInterval(x, dens$x)] })
ggplot(data, aes(x = x)) +
geom_density() +
geom_point(y = densy)
If I add scale_x_log10(), I get the following result:
Apart from the y values having been rescaled, something seems to have happened to the x values as well: the peaks of the density function are not quite where the points are.
Am I using the log transformation incorrectly here?
The shape of the density curve changes after the transformation because scale transformations in ggplot2 are applied before the statistics are computed: geom_density() now estimates the density of log10(x), which is a different distribution with a different default bandwidth, while your points still carry heights from the untransformed density. If you set a bandwidth of bw = 1000 before the transformation and 10 afterwards, you will get two normal-looking densities (with different y-axis values, because the support is much larger in the first case). Here is an example showing how varying the bandwidth changes the shape of the density:
data = data.frame(x=c(1,2,10,11,1000), y=0)
## Examine how changing bandwidth changes the shape of the curve
par(mfrow=c(2,1))
greys <- colorRampPalette(c("black", "red"))(10)
plot(density(data$x), main="No Transform")
points(data, pch=19)
plot(density(log10(data$x)), ylim=c(0,2), main="Log-transform w/ varying bw")
points(log10(data$x), data$y, pch=19)
for (i in 1:10)
  lines(density(log10(data$x), bw=0.02*i), col=greys[i])
legend("topright", paste(0.02*1:10), col=greys, lty=1, cex=0.8, title="bw")
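A sketch of a ggplot2 fix, assuming the goal is to keep the points on the curve: since scale_x_log10() transforms x before stat_density() runs, compute the point heights from the density of log10(x) as well. Using approx() instead of the findInterval() lookup is a choice made here, not a requirement:

```r
library(ggplot2)

data <- data.frame(x = c(1, 2, 10, 11, 1000))

# Density estimated on the log10 scale, matching what stat_density() sees
dens.log <- density(log10(data$x))
# Heights of that density at each (log10) data point
data$densy <- approx(dens.log$x, dens.log$y, xout = log10(data$x))$y

p <- ggplot(data, aes(x = x)) +
  geom_density() +
  geom_point(aes(y = densy)) +
  scale_x_log10()
print(p)
```

The points should then sit on, or very close to, the curve, because both estimates use the default "nrd0" bandwidth on the same log10 values.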
I am facing a probably easy-to-solve issue: adding a log curve to a scatter plot.
I have already created the corresponding model and now only need to add the respective curve/line.
The current model is as follows:
### DATA
SpStats_urbanform <- c (0.3702534,0.457769,0.3069843,0.3468263,0.420108,0.2548158,0.347664,0.4318018,0.3745645,0.3724192,0.4685135,0.2505839,0.1830535,0.3409849,0.1883303,0.4789871,0.3979671)
co2 <- c (6.263937,7.729964,8.39634,8.12979,6.397212,64.755192,7.330138,7.729964,11.058834,7.463414,7.196863,93.377393,27.854284,9.081405,73.483949,12.850917,12.74407)
### Plot initial plot
plot (log10 (1) ~ log10 (1), col = "white", xlab = "PUSHc values",
ylab = "Corrected GHG emissions [t/cap]", xlim =c(0,xaxes),
ylim =c(0,yaxes), axes =F)
axis(1, at=seq(0.05, xaxes, by=0.05), cex.axis=1.1)
axis(2, at=seq(0, yaxes, by=1), cex.axis=1.1 )
### FIT
fit_co2_urbanform <- lm (log10(co2) ~ log10(SpStats_urbanform))
### Add data points (used points() instead of simple plot() bc. of other code parts)
points (co2_cap~SpStats_urbanform, axes = F, cex =1.3)
I already have all the fit parameters but am still unable to construct the corresponding fit curve for co2_cap (y-axis) ~ SpStats_urbanform (x-axis).
Can anyone help me finalize this little piece of code?
First, if you want to plot in log-log space, you have to specify it with the argument log="xy":
plot (co2~SpStats_urbanform, log="xy")
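Then, to add your regression line, use abline. Because the model was fitted on log10(co2) versus log10(SpStats_urbanform) and the axes are log10-scaled, abline() draws the line in the transformed coordinates, which is exactly the fitted relation: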
abline(fit_co2_urbanform)
Edit: If you don't want to plot on a log-log scale, you'll have to translate your equation log10(y) = a*log10(x) + b into y = 10^(a*log10(x) + b) and plot it with curve:
f <- coefficients(fit_co2_urbanform)
curve(10^(f[1]+f[2]*log10(x)),ylim=c(0,100))
points(SpStats_urbanform,co2)
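Equivalently, the fit is a power law, y = 10^b * x^a. A sketch using the data from the question that draws this curve over the points on linear axes; the helper power_law() is a hypothetical name introduced here for illustration:

```r
SpStats_urbanform <- c(0.3702534, 0.457769, 0.3069843, 0.3468263, 0.420108,
                       0.2548158, 0.347664, 0.4318018, 0.3745645, 0.3724192,
                       0.4685135, 0.2505839, 0.1830535, 0.3409849, 0.1883303,
                       0.4789871, 0.3979671)
co2 <- c(6.263937, 7.729964, 8.39634, 8.12979, 6.397212, 64.755192, 7.330138,
         7.729964, 11.058834, 7.463414, 7.196863, 93.377393, 27.854284,
         9.081405, 73.483949, 12.850917, 12.74407)

fit_co2_urbanform <- lm(log10(co2) ~ log10(SpStats_urbanform))
f <- coef(fit_co2_urbanform)   # f[[1]] = intercept b, f[[2]] = slope a

# log10(y) = b + a*log10(x)  <=>  y = 10^b * x^a  (hypothetical helper)
power_law <- function(x) 10^f[[1]] * x^f[[2]]

plot(co2 ~ SpStats_urbanform, xlab = "PUSHc values",
     ylab = "Corrected GHG emissions [t/cap]")
curve(power_law, add = TRUE, col = "red", lwd = 2)
```

With add = TRUE, curve() evaluates the function across the current x-axis range, so it never hits x = 0, where log10() would be -Inf.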
Possible Duplicate:
Fitting a density curve to a histogram in R
I'd like to plot a histogram and several pdfs on the same graph. I've tried with just one pdf using the following code (adapted from code I found on the web):
hist(data, freq = FALSE, col = "grey", breaks = "FD")
.x <- seq(0, 0.1, length.out=100)
curve(dnorm(.x, mean=a, sd=b), col = 2, add = TRUE)
It gives me an error. Can you advise me?
What's the trick for multiple pdfs?
I've also noticed that the histogram seems to plot the density (on the y-axis) instead of the number of observations. How can I change this?
Many thanks!
It plots the density instead of the frequency because you specified freq=FALSE. It is not very fair to complain about it doing exactly what you told it to do.
The curve function expects an expression involving x (not .x) and it does not require you to precompute the x values. You probably want something like:
a <- 5
b <- 2
hist( rnorm(100, a, b), freq=FALSE )
curve( dnorm(x,a,b), add=TRUE )
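For the "multiple pdfs" part, the trick is simply to call curve() once per density with add = TRUE. A sketch with two candidate densities, a normal and a logistic whose scale b*sqrt(3)/pi gives it the same variance b^2 (the choice of candidates and the simulated data are assumptions for illustration):

```r
set.seed(123)
a <- 5
b <- 2
obs <- rnorm(100, a, b)          # simulated data (assumption)
s <- b * sqrt(3) / pi            # logistic scale giving variance b^2

# Histogram on the density scale; widen ylim so the curves fit
h <- hist(obs, plot = FALSE)
plot(h, freq = FALSE,
     ylim = c(0, max(h$density, dnorm(a, a, b), dlogis(a, a, s))),
     main = "Histogram with two candidate pdfs")

# One curve() call per candidate density, each added to the same plot
curve(dnorm(x, mean = a, sd = b), add = TRUE, col = "red", lwd = 2)
curve(dlogis(x, location = a, scale = s), add = TRUE, col = "blue", lwd = 2)
legend("topright", legend = c("normal", "logistic"),
       col = c("red", "blue"), lwd = 2)
```

The data vector is named obs rather than x because curve() binds x to its own evaluation grid.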
To head off your next question: if you specify freq=TRUE (or just leave it out, since that is the default for equally spaced bins) and add the curve, the curve just runs along the bottom (avoiding that is the whole purpose of plotting the histogram as a density rather than as frequencies). You can work around this by scaling the expression given to curve by the bin width and the total number of points:
out <- hist( rnorm(100, a, b) )
curve( dnorm(x,a,b)*100*diff(out$breaks[1:2]), add=TRUE )
Personally, though, the first option (density scale) without tick-mark labels on the y-axis makes more sense to me.
h <- hist(data, breaks="FD", col="red", xlab="xTitle", main="Normal pdf and histogram")
xfit <- seq(min(data), max(data), length=100)
# Evaluate the normal pdf directly; simulating rnorm() draws first is unnecessary
yfit <- dnorm(xfit, mean=a, sd=b)
# Rescale the density to counts: bin width times number of observations
yfit <- yfit*diff(h$mids[1:2])*length(data)
lines(xfit, yfit, col="blue", lwd=2)