R: overlay a normal curve on a probability histogram

In R I'm able to overlay a normal curve on a density histogram.
I can also convert the density histogram to a probability histogram:
a <- rnorm(1:100)
test <-hist(a, plot=FALSE)
test$counts=(test$counts/sum(test$counts))*100 # Probability
plot(test, ylab="Probability")
curve(dnorm(x, mean=mean(a), sd=sd(a)), add=TRUE)
But then I can no longer overlay the normal curve, since it goes off scale.
Is there a solution? Maybe a second y-axis?

Now the question is clear to me. Indeed a second y-axis seems to be the best choice for this as the two data sets have completely different scales.
In order to do this you could do:
set.seed(2)
a <- rnorm(100)
test <- hist(a, plot=FALSE)
test$counts <- (test$counts/sum(test$counts))*100 # percentage of observations per bin
plot(test, ylab="Probability")
# start a new plot on top of the existing one
par(new=TRUE)
# instead of using curve, just use plot and create the data yourself
# (this is how curve works internally anyway);
# build the x grid from the histogram breaks so both plots share the same x-axis
x_vals <- seq(min(test$breaks), max(test$breaks), length.out=401)
curve_data <- dnorm(x_vals, mean=mean(a), sd=sd(a))
# plot the line with no axes or labels
plot(x_vals, curve_data, axes=FALSE, xlab='', ylab='', type='l', col='red', xlim=range(test$breaks))
# add the second y-axis on the right-hand side
axis(4, at=pretty(range(curve_data)))

First, save your rnorm data; otherwise you get different data each time.
seed = rnorm(100)
Next, draw the histogram on the density scale and add the curve:
hist(seed, probability = TRUE)
curve(dnorm(x, mean=mean(na.omit(seed)), sd=sd(na.omit(seed))), add=TRUE)
Now you have the expected result: a histogram with a density curve.

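The y-axis isn't a "probability" as you have labeled it. It is rescaled count data (the percentage of observations per bin), which is not on the same scale as a density. If you draw the histogram on the density scale (freq=FALSE), you shouldn't have a problem:
# store the data as a, not x, because curve() uses x as its plotting variable
a <- rnorm(1000)
hist(a, freq=FALSE, ylab="Density")
curve(dnorm(x, mean=mean(a), sd=sd(a)), add=TRUE)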
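
If a second y-axis is not wanted, another possibility is to rescale the normal density onto the percentage scale of the histogram. The following is only a sketch: it assumes equal-width bins (the hist default), and the names h and bin_width are illustrative. A density multiplied by the bin width approximates the probability of landing in one bin, and multiplying by 100 matches the percentage scaling used in the question.

```r
set.seed(2)
a <- rnorm(100)
h <- hist(a, plot = FALSE)

# Assumption: all bins have the same width (true for the default breaks)
bin_width <- diff(h$breaks)[1]

# Percentage of observations per bin, as in the question
h$counts <- h$counts / sum(h$counts) * 100
plot(h, ylab = "Probability (%)")

# density * bin width ~ probability mass of one bin; * 100 converts to percent
curve(dnorm(x, mean = mean(a), sd = sd(a)) * bin_width * 100,
      add = TRUE, col = "red")
```

With this scaling the curve and the bars share one axis, so the curve no longer goes off scale.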

Related

R shift of the plots

I have this code in R:
x=rnorm(1000,1,1)
quantile(x,0.05)
x1=rnorm(1000,-10,1)
sum(x1>quantile(x,0.05))/length(x1)
y=hist(x,plot=FALSE)$density
plot(y)
plot(y,type="l")
y1=hist(x1,plot=FALSE)$density
matplot(y1,type="l",add=TRUE)
I want to change it so that the plots do not overlap but sit next to each other. Is it enough to change the values of the mean and sd, or do I have to change something else in the code? I am new to this, so please help me.
To plot both histograms, you need to set the x and y limits of the plot window up front, because base R graphics will not resize the window after the first set of data has been drawn. Here's one way to do that:
x <- rnorm(1000,1,1)
x1 <- rnorm(1000,-10,1)
y <- hist(x,plot=FALSE)
y1 <- hist(x1, plot=FALSE)
plot(0,0,
ylim=range(c(y$counts, y1$counts)),
xlim=range(c(y$breaks, y1$breaks)),
xlab="x", ylab="counts", type="n")
plot(y, add=TRUE)
plot(y1, add=TRUE)
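
If "next to each other" means two separate panels rather than one shared axis, a possible alternative is the mfrow graphics parameter. This is a sketch under that assumption; the seed and the name old_par are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the example is reproducible
x  <- rnorm(1000, 1, 1)
x1 <- rnorm(1000, -10, 1)

# Split the device into 1 row and 2 columns; keep the old settings to restore them
old_par <- par(mfrow = c(1, 2))
hist(x,  main = "x")
hist(x1, main = "x1")
par(old_par)  # back to a single panel
```

Each histogram then gets its own axes, so no shared xlim or ylim has to be worked out.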

How to add a Poisson distribution curve that approaches 3?

I want to add a curve to an existing plot.
This curve should be a poisson distribution curve that approaches the mean 3.
I've tried this code
# points is a vector with 1000 values
plot(c(1:1000), points,type="l")
abline(h=3)
x = 0:1000
curve(dnorm(x, 3, sqrt(3)), lwd=2, col="red", add=TRUE)
I am getting a plot, but without any curve.
I would like to see a curve that approaches 3.
You can do something like this:
plot(0:20, 3+dpois( x=0:20, lambda=3 ), xlim=c(-2,20))
normden <- function(x){3+dnorm(x, mean=3, sd=sqrt(3))}
curve(normden, from=-4, to=20, add=TRUE, col="red")
Is that what you intended?
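
One possible reason the curve in the question stays invisible (an assumption, since the values in points are not shown): a normal density with mean 3 and sd sqrt(3) never rises above about 0.23, so it sits far below a series that hovers around 3. A quick check of the numbers:

```r
# Height of the N(mean = 3, sd = sqrt(3)) density at its peak, x = 3
peak <- dnorm(3, mean = 3, sd = sqrt(3))
peak < 3   # the curve never reaches the line drawn by abline(h = 3)

# Shifting the density up by 3, as in the answer above, gives values
# that settle towards 3 as x moves away from the mean
shifted <- 3 + dnorm(c(3, 20), mean = 3, sd = sqrt(3))
```

If the y-range of the existing plot does not include values near zero, the unshifted curve is drawn outside the visible region.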

Multiple plots using curve() function (e.g. normal distribution)

I am trying to plot multiple functions using curve(). My example tries to plot multiple normal distributions with different means and the same standard deviation.
png("d:/R/standardnormal-different-means.png",width=600,height=300)
#First normal distribution
curve(dnorm,
from=-2,to=2,ylab="d(x)",
xlim=c(-5,5))
abline(v=0,lwd=4,col="black")
#Only second normal distribution is plotted
myMean <- -1
curve(dnorm(x,mean=myMean),
from=myMean-2,to=myMean+2,
ylab="d(x)",xlim=c(-5,5), col="blue")
abline(v=-1,lwd=4,col="blue")
dev.off()
As the curve() function creates a new plot each time, only the second normal distribution is plotted.
I reopened this question because the ostensible duplicates focus on plotting two different functions or two different y-vectors with separate calls to curve. But since we want the same function, dnorm, plotted for different means, we can automate the process (although the answers to the other questions could also be generalized and automated in a similar way).
For example:
my_curve = function(m, col) {
curve(dnorm(x, mean=m), from=m - 3, to=m + 3, col=col, add=TRUE)
abline(v=m, lwd=2, col=col)
}
plot(NA, xlim=c(-10,10), ylim=c(0,0.4), xlab="Mean", ylab="d(x)")
mapply(my_curve, seq(-6,6,2), rainbow(7))
Or, to generalize still further, let's allow multiple means and standard deviations and provide an option regarding whether to include a mean line:
my_curve = function(m, sd, col, meanline=TRUE) {
curve(dnorm(x, mean=m, sd=sd), from=m - 3*sd, to=m + 3*sd, col=col, add=TRUE)
if(meanline==TRUE) abline(v=m, lwd=2, col=col)
}
plot(NA, xlim=c(-10,10), ylim=c(0,0.4), xlab="Mean", ylab="d(x)")
mapply(my_curve, rep(0,4), 4:1, rainbow(4), MoreArgs=list(meanline=FALSE))
You can also use line segments that start at zero and stop at the top of the density distribution, rather than extending all the way from the bottom to the top of the plot. For a normal distribution the mean is also the point of highest density. However, I've used the which.max approach below as a more general way of identifying the x-value at which the maximum y-value occurs. I've also added arguments for line width (lwd) and line end cap style (lend=1 means flat rather than rounded):
my_curve = function(m, sd, col, meanline=TRUE, lwd=1, lend=1) {
xy = curve(dnorm(x, mean=m, sd=sd), from=m - 3*sd, to=m + 3*sd, col=col, add=TRUE)
i = which.max(xy$y) # index of the point with the highest density
if(meanline) segments(xy$x[i], 0, xy$x[i], xy$y[i], col=col, lwd=lwd, lend=lend)
}
plot(NA, xlim=c(-10,20), ylim=c(0,0.4), xlab="Mean", ylab="d(x)")
mapply(my_curve, seq(-5,5,5), c(1,3,5), rainbow(3))
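
For just the two curves in the question, a smaller change may be enough: pass add=TRUE to the second curve() call so that it draws onto the existing plot instead of starting a new one. A minimal sketch (the names my_mean and second are illustrative, and the png()/dev.off() calls are left out):

```r
# First normal distribution sets up the plot region
curve(dnorm(x), from = -2, to = 2, ylab = "d(x)", xlim = c(-5, 5))
abline(v = 0, lwd = 4, col = "black")

# Second normal distribution is added to the same plot
my_mean <- -1
second <- curve(dnorm(x, mean = my_mean),
                from = my_mean - 2, to = my_mean + 2,
                col = "blue", add = TRUE)
abline(v = my_mean, lwd = 4, col = "blue")
```

curve() invisibly returns the x and y values it evaluated, which is what second holds here.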

Setting Trend Line Length in R

I have managed to create a scatterplot with two datasets on a single plot. One set of data has an x-axis that ranges from 0 to 40 (green), while the other only ranges from 0 to 15 (red).
I used this code to add trend lines to the red and green data separately (using par(new)).
plot( x1,y1, col="red", axes=FALSE, xlab="",ylab="",ylim= range(0:1), xlim= range(0:40))
f <- function(x1,a,b,d) {(a*x1^2) + (b*x1) + d}
fit <- nls(y1 ~ f(x1,a,b,d), start = c(a=1, b=1, d=1))
co <- coef(fit)
curve(f(x, a=co[1], b=co[2], d=co[3]), add = TRUE, col="red", lwd=1)
My issue is that I can't find a way to stop the red trend line at 15 on the x-axis. I googled around and nothing seemed to come up for my issue (lots on Excel trend lines, though). I tried adding an end= argument to the fit <- line, and that did not work either.
Please help,
I hope I have posted enough information. Thanks in advance.
Try using ggplot. The following example uses the mtcars data:
library(ggplot2)
ggplot(mtcars, aes(qsec, wt, color=factor(vs))) + geom_point() + stat_smooth(se=FALSE)
You can do this in base graphics with the from and to arguments of the curve function (see the help for curve for more details). For example:
# Your function
f <- function(x1,a,b,d) {(a*x1^2) + (b*x1) + d}
# Plot the function from x=-100 to x=100
curve(f(x, a=-2, b=3, d=0), from=-100, to=100, col="red", lwd=1, lty=1)
# Same curve going from x=-100 to x=0 (and shifted down by 1000 units so it's
# easy to see)
curve(f(x, a=-2, b=3, d=-1000), from=-100, to=0,
add=TRUE, col="blue", lwd=2, lty=1)
If you want to set the curve x-limits programmatically, you can do something like this (assuming your data frame is called df and your x-variable is called x):
curve(f(x, a=-2, b=3, d=0), from=range(df$x)[1], to=range(df$x)[2],
add=TRUE, col="red", lwd=1, lty=1)
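
Putting this together with the nls fit from the question, a sketch could look as follows. The data here are simulated stand-ins for the asker's red points (an assumption, since the real x1 and y1 are not shown), and the name trend is illustrative.

```r
set.seed(42)  # arbitrary seed for the simulated data
x1 <- seq(0, 15, length.out = 30)
y1 <- 0.9 - 0.05 * x1 + 0.002 * x1^2 + rnorm(30, sd = 0.02)

f <- function(x, a, b, d) a * x^2 + b * x + d
fit <- nls(y1 ~ f(x1, a, b, d), start = c(a = 1, b = 1, d = 1))
co <- coef(fit)

# The plot window spans 0..40, but the trend line is limited to the data range
plot(x1, y1, col = "red", xlim = c(0, 40), ylim = c(0, 1))
trend <- curve(f(x, a = co[["a"]], b = co[["b"]], d = co[["d"]]),
               from = min(x1), to = max(x1),
               add = TRUE, col = "red", lwd = 1)
```

Because from and to are taken from the data, the red line stops at 15 even though the x-axis runs to 40.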

Plot a log-curve to a scatter plot

I am facing a probably easy-to-solve issue: adding a log curve to a scatter plot.
I have already created the corresponding model and now only need to add the respective curve/line.
The current model is as follows:
### DATA
SpStats_urbanform <- c (0.3702534,0.457769,0.3069843,0.3468263,0.420108,0.2548158,0.347664,0.4318018,0.3745645,0.3724192,0.4685135,0.2505839,0.1830535,0.3409849,0.1883303,0.4789871,0.3979671)
co2 <- c (6.263937,7.729964,8.39634,8.12979,6.397212,64.755192,7.330138,7.729964,11.058834,7.463414,7.196863,93.377393,27.854284,9.081405,73.483949,12.850917,12.74407)
### Plot initial plot
plot (log10 (1) ~ log10 (1), col = "white", xlab = "PUSHc values",
ylab = "Corrected GHG emissions [t/cap]", xlim =c(0,xaxes),
ylim =c(0,yaxes), axes =F)
axis(1, at=seq(0.05, xaxes, by=0.05), cex.axis=1.1)
axis(2, at=seq(0, yaxes, by=1), cex.axis=1.1 )
### FIT
fit_co2_urbanform <- lm (log10(co2) ~ log10(SpStats_urbanform))
### Add data points (used points() instead of simple plot() bc. of other code parts)
points (co2_cap~SpStats_urbanform, axes = F, cex =1.3)
Now I have all the fit parameters but am still not able to construct the corresponding fit curve for co2_cap (y-axis) ~ SpStats_urbanform (x-axis).
Can anyone help me finalize this little piece of code?
First, if you want to plot in a log-log space, you have to specify it with argument log="xy":
plot (co2~SpStats_urbanform, log="xy")
Then if you want to add your regression line, then use abline:
abline(fit_co2_urbanform)
Edit: If you don't want to plot in a log-log scale then you'll have to translate your equation log10(y)=a*log10(x)+b into y=10^(a*log10(x)+b) and plot it with curve:
f <- coefficients(fit_co2_urbanform)
curve(10^(f[1]+f[2]*log10(x)),ylim=c(0,100))
points(SpStats_urbanform,co2)
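
A self-contained sketch of the back-transformation on a linear scale, using the data from the question. The helper pred is a hypothetical name introduced here, and restricting the curve to the observed x-range is a choice made for this example (it avoids evaluating log10(0)).

```r
SpStats_urbanform <- c(0.3702534, 0.457769, 0.3069843, 0.3468263, 0.420108,
                       0.2548158, 0.347664, 0.4318018, 0.3745645, 0.3724192,
                       0.4685135, 0.2505839, 0.1830535, 0.3409849, 0.1883303,
                       0.4789871, 0.3979671)
co2 <- c(6.263937, 7.729964, 8.39634, 8.12979, 6.397212, 64.755192, 7.330138,
         7.729964, 11.058834, 7.463414, 7.196863, 93.377393, 27.854284,
         9.081405, 73.483949, 12.850917, 12.74407)

fit <- lm(log10(co2) ~ log10(SpStats_urbanform))
cf <- coef(fit)  # cf[[1]] is the intercept b, cf[[2]] is the slope a

# log10(y) = a * log10(x) + b  is equivalent to  y = 10^(b + a * log10(x))
pred <- function(x) 10^(cf[[1]] + cf[[2]] * log10(x))

plot(SpStats_urbanform, co2, xlab = "PUSHc values",
     ylab = "Corrected GHG emissions [t/cap]")
curve(pred(x), from = min(SpStats_urbanform), to = max(SpStats_urbanform),
      add = TRUE, col = "red")
```

Taking log10 of pred at the observed x-values reproduces the fitted values of the model, which is a quick way to confirm the back-transformation.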

Resources