Reverse Statistics with R - r

What I want to do sounds simple. I want to plot a normal IQ curve with R with a mean of 100 and a standard deviation of 15. Then, I'd like to be able to overlay a scatter plot of data on top of it.
Anybody know how to do this?

I'm guessing what you want to do is this: you want to plot the model normal density with mean 100 and sd = 15, and you want to overlay on top of that the empirical density of some set of observations that purportedly follow the model normal density, so that you can visualize how well the model density fits the empirical density. The code below should do this (here, x would be the vector of actual observations but for illustration purposes I'm generating it with a mixed normal distribution N(100,15) + 15*N(0,1), i.e. the purported N(100,15) distribution plus noise).
require(ggplot2)
x <- round( rnorm( 1000, 100, 15 )) + rnorm(1000)*15
dens.x <- density(x)
empir.df <- data.frame( type = 'empir', x = dens.x$x, density = dens.x$y )
norm.df <- data.frame( type = 'normal', x = 50:150, density = dnorm(50:150,100,15))
df <- rbind(empir.df, norm.df)
m <- ggplot(data = df, aes(x,density))
m + geom_line( aes(linetype = type, colour = type))

Well, it's more like a histogram, since I think you are expecting the scores to be rounded to integers:
x<-round(rnorm(1000, 100, 15))
y<-table(x)
plot(y)
par(new=TRUE)
plot(density(x), yaxt="n", ylab="", xlab="", xaxt="n")
If you want the theoretical dnorm density superimposed, use one of these:
lines(sort(x), dnorm(sort(x), 100, 15), col="red")
or
points(x, dnorm(x, 100, 15))

You can generate IQ scores PDF with:
curve(dnorm(x, 100, 15), 50, 150)
But why would you like to overlay scatter over density curve? IMHO, that's very unusual...

In addition to the other good answers, you might be interested in plotting a number of panels, each with its own graph. Something like this.
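A minimal sketch of the multi-panel idea in base graphics, assuming simulated scores in place of real observations (the variable name iq, the seed, and the choice of two panels are illustrative, not taken from the original answer):

```r
# Simulated IQ-like scores (assumption); replace with real observations
set.seed(1)
iq <- round(rnorm(1000, mean = 100, sd = 15))

# Two panels side by side; keep the old settings so they can be restored
op <- par(mfrow = c(1, 2))

# Panel 1: density histogram with the model N(100, 15) curve
hist(iq, freq = FALSE, main = "Histogram + model density", xlab = "IQ")
curve(dnorm(x, mean = 100, sd = 15), add = TRUE, col = "red")

# Panel 2: empirical kernel density with the same model curve
plot(density(iq), main = "Kernel density + model density", xlab = "IQ")
curve(dnorm(x, mean = 100, sd = 15), add = TRUE, col = "red")

par(op)  # restore the previous layout
```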

Related

How can I portray my lm() model across different ggplot scatterplot differently?

I am currently regressing GDP on multiple factors (7 different variables, to be exact). My x variable is quarterly dates (2006-Q1 to 2020-Q4). I need to draw a scatter plot of GDP against date and plot my lm() fitted line on top of it. I cannot use geom_smooth() because it won't include all the regression coefficients, and I can't find any other way, so I am stuck. I attempted to use predict(), but when plotted it gives a non-linear line. To sum up, I need to take my lm() model and put it on my scatterplot.
What you can do is plot the data as a scatter plot, then create a set of points from predefined x values, calculate their y values from the coefficients of the lm() model, and draw those points as a line. If you choose enough x values (100, for example), it will look like a smooth curve.
Here is an example:
x0 <- seq(0,1,length.out = 25)
y0 <- x0^2 + 2*x0 + 3 + runif(n = 25, min = 0, max = 1)
library(plotly)
plot_ly(x=x0, y=y0, type="scatter", mode = "markers")
Now we can calculate the quadratic regression and plot a set of points, using the coefficients to obtain the curves.
x0_2 <- x0^2
linreg <- lm(y0~x0+x0_2)
x1 <- seq(min(x0), max(x0), length.out = 100)
y1 <- x1*x1*linreg[["coefficients"]][3] + x1*linreg[["coefficients"]][2] + linreg[["coefficients"]][1]
library(plotly)
plot_ly(x=x0, y=y0, type="scatter", mode = "markers") %>% add_lines(x=x1, y=y1, type="scatter", mode = "lines")
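The same curve can also be obtained with predict() instead of multiplying the coefficients by hand. This is a minimal sketch in base graphics, assuming the simulated data from the example above; the names dat, fit, grid and y_hat, and the seed, are illustrative:

```r
# Simulated data as in the example above (seed added for reproducibility)
set.seed(1)
x0 <- seq(0, 1, length.out = 25)
y0 <- x0^2 + 2 * x0 + 3 + runif(n = 25, min = 0, max = 1)
dat <- data.frame(x0 = x0, y0 = y0)

# I() keeps the squared term inside the formula, so predict() can rebuild it
fit <- lm(y0 ~ x0 + I(x0^2), data = dat)

# Predict on a fine grid; the column name must match the predictor name
grid <- data.frame(x0 = seq(min(x0), max(x0), length.out = 100))
grid$y_hat <- predict(fit, newdata = grid)

# Scatter plot of the data with the fitted curve on top
plot(y0 ~ x0, data = dat)
lines(grid$x0, grid$y_hat, col = "red")
```

With several predictors, as in the GDP question, newdata needs a column for every predictor in the model. Fitted values plotted against date alone will generally not form a straight line, because the other regressors change from quarter to quarter.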

Overlay a Normal curve to Histogram

I repeat rnorm 50 times with n=100, mean=100 and sd=25. Then I plot the histogram of all the sample means, but now I need to overlay a normal curve over the histogram.
x <- replicate(50, rnorm(100, 100, 25), simplify = FALSE)
x
sapply(x, mean)
sapply(x, sd)
hist(sapply(x, mean))
Do you know how to overlay a normal curve over the histogram of the means?
Thanks
When we plot the density rather than the frequency histogram by setting freq=FALSE, we can overlay the curve of a normal distribution whose mean is the mean of the sample means and whose standard deviation is the standard deviation of the sample means (close to 25/sqrt(100) = 2.5 here). For the xlim of the curve we use the range of the means.
means <- sapply(x, mean)
mean.of.means <- mean(means)
sd.of.means <- sd(means)
r <- range(means)
v <- hist(means, freq=FALSE, xlim=r, ylim=c(0, .2))
curve(dnorm(x, mean=mean.of.means, sd=sd.of.means), r[1], r[2], add=TRUE, col="red")
Alternatively, draw a large sample from that normal distribution and overlay its estimated density on the histogram.
lines(density(rnorm(1e6, mean.of.means, sd.of.means)))
Note that I have used 500 mean values in my answer, since the comparison with a normal distribution may become meaningless with too few values. You can also adjust the bins with the breaks= option of the histogram function.
Data
set.seed(42)
x <- replicate(500, rnorm(100, 100, 25), simplify = FALSE)
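As a sanity check on the width of the overlaid curve: the standard deviation of a mean of n = 100 draws with sd = 25 is 25/sqrt(100) = 2.5. The sketch below compares that value with the spread of the simulated means; the names means, se.theory and se.empir are illustrative:

```r
set.seed(42)
x <- replicate(500, rnorm(100, 100, 25), simplify = FALSE)
means <- sapply(x, mean)

se.theory <- 25 / sqrt(100)  # sigma / sqrt(n) = 2.5
se.empir  <- sd(means)       # spread of the 500 simulated means

# Density histogram of the means with the theoretical N(100, 2.5) curve
hist(means, freq = FALSE)
curve(dnorm(x, mean = 100, sd = se.theory), add = TRUE, col = "red")
```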

How do you implement rgamma and dgamma in a single plot

For an assignment I was asked this:
For the values of
(shape=5,rate=1),(shape=50,rate=10),(shape=.5,rate=.1), plot the
histogram of a random sample of size 10000. Use a density rather than
a frequency histogram so that you can add in a line for the population
density (hint: you will use both rgamma and dgamma to make this plot).
Add an abline for the population and sample mean. Also, add a subtitle
that reports the population variance as well as the sample variance.
My current code looks like this:
library(ggplot2)
set.seed(1234)
x = seq(1, 1000)
s = 5
r = 1
plot(x, dgamma(x, shape = s, rate = r), rgamma(x, shape = s, rate = r), sub =
paste0("Shape = ", s, "Rate = ", r), type = "l", ylab = "Density", xlab = "", main =
"Gamma Distribution of N = 1000")
After running it I get this error:
Error in plot.window(...) : invalid 'xlim' value
What am I doing incorrectly?
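plot() does not take two y arguments (see ?plot). The third positional argument, here the rgamma() vector, is matched to xlim, which is why the error complains about an invalid 'xlim' value. Instead, plot one y variable first (e.g., a histogram of the rgamma() draws), then add the second (e.g., the dgamma() density) using something like lines().
Here's one way to get what you want: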
#specify parameters
s = 5
r = 1
# plot histogram of random draws
set.seed(1234)
N = 1000
hist(rgamma(N, shape=s, rate=r), breaks=100, freq=FALSE)
# add true density curve
x = seq(from=0, to=20, by=0.1)
lines(x=x, y=dgamma(x, shape=s, rate=r))
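The remaining parts of the assignment (mean lines and a variance subtitle) could be added along these lines. This is a sketch for the first parameter pair only, using the standard results that a gamma distribution with shape s and rate r has mean s/r and variance s/r^2; the names draws, pop.mean and pop.var and the colours are illustrative:

```r
s <- 5
r <- 1
N <- 10000
set.seed(1234)
draws <- rgamma(N, shape = s, rate = r)

pop.mean <- s / r    # population mean of Gamma(shape = s, rate = r)
pop.var  <- s / r^2  # population variance

# Density histogram of the sample with the population density on top
hist(draws, breaks = 100, freq = FALSE, xlab = "",
     main = paste0("Gamma(shape = ", s, ", rate = ", r, "), N = ", N))
curve(dgamma(x, shape = s, rate = r), add = TRUE, col = "blue")

# Vertical lines: population mean (solid) and sample mean (dashed)
abline(v = pop.mean, col = "red", lwd = 2)
abline(v = mean(draws), col = "darkgreen", lty = 2, lwd = 2)

# Subtitle reporting both variances
title(sub = sprintf("Population variance = %.2f, sample variance = %.2f",
                    pop.var, var(draws)))
```

Repeating the block with (shape = 50, rate = 10) and (shape = .5, rate = .1) covers the other two cases.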

Chi-Square Density Graph in R

How can I plot a chi-square density graph in R?
I found the following code, but I'm not sure how to adapt it:
curve( dchisq(x, df=28), col='red', main = "Chi-Square Density Graph",
from=0,to=60)
xvec <- seq(7.5,60,length=101)
pvec <- dchisq(xvec,df=28)
polygon(c(xvec,rev(xvec)),c(pvec,rep(0,length(pvec))),
col=adjustcolor("black",alpha=0.3))
Could someone explain what the code means?
The package ggplot2 provides an easy way to plot chi-square distributions. You simply specify a stat_function with dchisq as the function and pass a list to args that gives the degrees of freedom.
For example, here is sample code for a chi-square distribution with 4 degrees of freedom:
library(ggplot2)
ggplot(data.frame(x = c(0, 20)), aes(x = x)) +
stat_function(fun = dchisq, args = list(df = 4))
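As for what the base-graphics code in the question does, here is the same code with a comment on each step. The final line, which computes the shaded probability with pchisq(), is an addition for illustration and was not part of the question:

```r
# Draw the chi-square density with 28 degrees of freedom on [0, 60]
curve(dchisq(x, df = 28), col = "red",
      main = "Chi-Square Density Graph", from = 0, to = 60)

# 101 x values covering the region to shade, and the density at each one
xvec <- seq(7.5, 60, length = 101)
pvec <- dchisq(xvec, df = 28)

# polygon() needs a closed outline: left to right along the curve,
# then right to left back along the x axis (y = 0)
polygon(c(xvec, rev(xvec)), c(pvec, rep(0, length(pvec))),
        col = adjustcolor("black", alpha.f = 0.3))  # semi-transparent fill

# The shaded area is the probability P(7.5 <= X <= 60)
shaded <- pchisq(60, df = 28) - pchisq(7.5, df = 28)
```

To shade a different region, change the endpoints in seq(); to change the distribution, change df in both dchisq() calls.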

Create sample vector data in R with a skewed distribution with limited range

I want to create a sample vector of data in R in which I can control the range of values selected. I think I want to use sample() to limit the range of values generated, rather than an rnorm-type command that generates a range of values based on the type of distribution, variance, SD, etc.
So I'm looking to do a sample with a specified range (e.g. 1-5) for a skewed distribution something like this:
x=rexp(100,1/10)
Here's what I have but does not provide a skewed distribution:
y=sample(1:5,234, replace=T)
How can I have my cake (limited range) and eat it too (skewed distribution), so to speak.
Thanks
set.seed(3)
hist(sample(1:10, size = 100, replace = TRUE, prob = 10:1))
The beta distribution takes values from 0 to 1. If you want your values to run from 0 to 5, for instance, you can multiply them by 5. The beta distribution can also give you the skewness you want.
Depending on its two shape parameters, it can be left-skewed, right-skewed, or symmetrical.
Using R and the beta distribution you can get these three shapes as follows. Note that the green vertical line marks the mean and the red vertical line marks the median:
x= rbeta(10000,5,2)
hist(x, main="Negative or Left Skewness", freq=FALSE)
lines(density(x), col='red', lwd=3)
abline(v = c(mean(x),median(x)), col=c("green", "red"), lty=c(2,2), lwd=c(3, 3))
x= rbeta(10000,2,5)
hist(x, main="Positive or Right Skewness", freq=FALSE)
lines(density(x), col='red', lwd=3)
abline(v = c(mean(x),median(x)), col=c("green", "red"), lty=c(2,2), lwd=c(3, 3))
x= rbeta(10000,5,5)
hist(x, main="Symmetrical", freq=FALSE)
lines(density(x), col='red', lwd=3)
abline(v = c(mean(x),median(x)), col=c("green", "red"), lty=c(2,2), lwd=c(3, 3))
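To map the beta draws onto the 1 to 5 range from the question, one option is a linear rescale, optionally followed by binning into integers. This is a sketch under those assumptions; the names b, y.cont and y.int and the shape parameters (2, 5) are illustrative:

```r
set.seed(3)
b <- rbeta(234, 2, 5)  # right-skewed draws between 0 and 1

# Continuous values between 1 and 5: shift and stretch the unit interval
y.cont <- 1 + 4 * b

# Integer values 1..5: split the unit interval into five equal bins
y.int <- pmax(ceiling(5 * b), 1)

hist(y.cont, freq = FALSE, main = "Right-skewed values on [1, 5]")
barplot(table(factor(y.int, levels = 1:5)))
```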
To better see what the sample function is doing with integers, use the barplot function, not the histogram function:
set.seed(3)
barplot(table(sample(1:10, size = 100, replace = TRUE, prob = 10:1)))
