I draw 50 samples with rnorm, each with n=100, mean=100 and sd=25. I then plot a histogram of all the sample means, and now I need to overlay a normal curve on that histogram.
x <- replicate(50, rnorm(100, 100, 25), simplify = FALSE)
x
sapply(x, mean)
sapply(x, sd)
hist(sapply(x, mean))
Do you know how to overlay a normal curve over the histogram of the means?
Thanks
When we plot the density rather than the frequency histogram by setting freq=FALSE, we may overlay the curve of a normal distribution with the mean and the standard deviation of the means. For the xlim of the curve we use the range of the means.
means <- sapply(x, mean)
mean.of.means <- mean(means)
sd.of.means <- sd(means)  # close to the standard error 25/sqrt(100) = 2.5
r <- range(means)
v <- hist(means, freq=FALSE, xlim=r, ylim=c(0, .25))
curve(dnorm(x, mean=mean.of.means, sd=sd.of.means), r[1], r[2], add=TRUE, col="red")
Also possible is to draw a sufficiently large sample from that normal distribution and overlay the histogram with the lines of its density estimate.
lines(density(rnorm(1e6, mean.of.means, sd.of.means)))
Note that I have used 500 mean values in my answer, since the comparison with a normal distribution may become meaningless with too few values. You can also play with the breaks= option of the histogram function to change the binning.
Data
set.seed(42)
x <- replicate(500, rnorm(100, 100, 25), simplify = FALSE)
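The spread of the sample means can also be checked against theory: for samples of size n drawn with standard deviation sigma, the standard error of the mean is sigma/sqrt(n), here 25/sqrt(100) = 2.5. The following is only a minimal sketch of that check; the names means and se.theory are illustrative and not part of the answer above.

```r
set.seed(42)
n <- 100
sigma <- 25

# 500 sample means, each from a sample of size n
means <- replicate(500, mean(rnorm(n, mean = 100, sd = sigma)))

se.theory <- sigma / sqrt(n)  # theoretical standard error of the mean
sd(means)                     # empirical spread, should be close to se.theory

# Density histogram with the theoretical normal curve on top
hist(means, freq = FALSE, main = "Sample means")
curve(dnorm(x, mean = 100, sd = se.theory), add = TRUE, col = "red")
```

With add=TRUE and no explicit limits, curve draws over the x-range of the existing plot, so the range does not have to be passed by hand.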
I want to create 100 samples from a normal distribution for each of two classes. For the first class, the mean is (0,0) and the covariance matrix is [(1,0),(0,1)]. For the second class, the mean is (5,0) and the covariance matrix is the same as for the first class. Finally, I would like to visualize all 200 instances in a single plot, with a different color for each class.
My problem is: when I generate this plot, I am unsure whether it actually shows all 200 samples.
My approach:
a1 <- c(1,0)
a2 <- c(0,1)
M <- cbind(a1, a2)
x <- cov(M)
dev <- sd(x, na.rm = FALSE)
C0 <- sample(rnorm(100, mean=0, sd=dev), size=100, replace=T)
C1 <- sample(rnorm(100, mean=5, sd=dev), size=100, replace=T)
plot(C0,C1, col=c("red","blue"), main = '200 samples, with mean 0 and 5 and S.D=0.5')
legend("topright", 95, legend=c("C0", "C1"),
col=c("red", "blue"), lty=1:2, cex=0.8)
I would like to know the corrections in my code.
Aside from the plotting issue mentioned in the other answer, it seems from your description that you want to sample from two 2D multivariate normal distributions with different means.
If so, you can simply use the mvtnorm package, whose rmvnorm function samples from a multivariate normal distribution.
library(mvtnorm)
C0 <- rmvnorm(100, c(0,0), M) # 100 samples, means (0, 0), covariance mtx M
C1 <- rmvnorm(100, c(5,0), M)
Right now, you take the covariance of the covariance matrix you have by typing x <- cov(M). This doesn't make much sense unless I'm misunderstanding what you're trying to accomplish.
EDIT: This is the full code for what I think you're trying to accomplish:
a1 <- c(1, 0)
a2 <- c(0, 1)
M <- cbind(a1, a2)
C0 <- rmvnorm(100, c(0, 0), M)
C1 <- rmvnorm(100, c(5, 0), M)
plot(C0, col = "red", xlim = c(-5, 10), ylim = c(-5, 5), xlab = "X", ylab = "Y")
points(C1, col = "blue")
legend("topright", inset = .05, c("Class 1", "Class 2"), fill = c("red", "blue"))
which outputs the plot
Your x and y axes demonstrate that you're plotting C1 against C0. That's why your y-axis has its midpoint at 5 and the x-axis has it at 0. What you've done is plot 100 points with their x-coordinate from C0 and y-coordinate from C1.
Short of counting them, proving that you have 100 points on the screen is difficult. I know of no way to access the data that R has used to display your plot. However, one trick is to call text(C0,C1,label=1:150) after your code. This adds the numbers 1:150 to your plot, with each number having a corresponding label. If you had 200 points, this would be a tidy plot. However, since you have 100, many are labelled twice, making the plot unreadable.
If we make a new plot and use text(C0,C1,label=1:100) instead, things are much more clear:
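Independently of the labelling trick, the number of points can be checked on the data itself before anything is drawn. This is a minimal base-R sketch, assuming the identity covariance matrix from the question, in which case the two coordinates are independent and can be drawn separately with rnorm. The names pts and cls are illustrative only.

```r
set.seed(1)
n <- 100

# Each class is an n x 2 matrix: one row per point, columns are the coordinates
C0 <- cbind(x = rnorm(n, mean = 0), y = rnorm(n, mean = 0))  # class mean (0, 0)
C1 <- cbind(x = rnorm(n, mean = 5), y = rnorm(n, mean = 0))  # class mean (5, 0)

pts <- rbind(C0, C1)                      # all points stacked in one matrix
cls <- rep(c("red", "blue"), each = n)    # one colour per row of pts

nrow(pts)  # number of points that will be drawn

plot(pts, col = cls, xlab = "X", ylab = "Y")
legend("topright", legend = c("C0", "C1"), col = c("red", "blue"), pch = 1)
```

Because every row of pts is one point, nrow(pts) gives the count directly, with no need to read it off the plot.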
I would like to plot the line and the 95% confidence interval from a linear model where the response has been logit transformed back on the original scale of the data. So the result should be a curved line including the confidence intervals on the original scale, where it would be a straight line on the logit transformed scale. See code:
# Data
dat <- data.frame(c(45,75,14,45,45,55,65,15,3,85),
c(.37, .45, .24, .16, .46, .89, .16, .24, .23, .49))
colnames(dat) <- c("age", "bil.")
# Logit transformation
dat$bb_logit <- log(dat$bil./(1-dat$bil.))
# Model
modelbb <- lm(bb_logit ~ age + I(age^2), data=dat)
summary(modelbb)
# Back-transform
dat$bb_back <- exp(predict.lm(modelbb))/ (1 + exp(predict.lm(modelbb)))
# Plot
plot(dat$age, dat$bb_back)
abline(modelbb)
What I am trying to do here is plot the curved regression line and add the confidence interval. Within ggplot2 there is the geom_smooth function, where the linear model can be specified, but I could not find a way of plotting the predictions from predict.lm for my model.
I would also like to know how to add a coloured polygon which will represent the confidence interval as in the image below. I know I have to use the polygon function with coordinates, but I do not know how.
You may use predict on an age range, say 1:100, and specify the interval= option for the CIs. Plotting with type="l" will draw a smooth curve. The confidence intervals can then be added using lines.
p <- predict(modelbb, data.frame(age=1:100), interval="confidence")
# Backtransform
p.tr <- exp(p) / (1 + exp(p))
plot(1:100, p.tr[,1], type="l", ylim=range(p.tr), xlab="age", ylab="bil.")
for (i in 2:3) lines(1:100, p.tr[,i], lty=2)  # lower and upper confidence limits
legend("topleft", legend=c("fit", "95%-CI"), lty=1:2)
Yields
Edit
To get shaded confidence bands use polygon. Since you want two confidence levels you probably need to make one prediction for each. The line will get covered by the polygons, so it's better to make an empty plot first using type="n" and draw the lines at the end. (Note that I'll also show you some hints for custom axis labeling.) The trick for the polygons is to express the values back and forth using rev.
p.95 <- predict(modelbb, data.frame(age=1:100), interval="confidence", level=.95)
p.99 <- predict(modelbb, data.frame(age=1:100), interval="confidence", level=.99)
# Backtransform
p.95.tr <- exp(p.95) / (1 + exp(p.95))
p.99.tr <- exp(p.99) / (1 + exp(p.99))
plot(1:100, p.99.tr[,1], type="n", ylim=range(p.99.tr), xlab="Age", ylab="",
main="", yaxt="n")
mtext("Tree biomass production", 3, .5)
mtext("a", 2, 2, at=1.17, xpd=TRUE, las=2, cex=3)
axis(2, (1:5)*.2, labels=FALSE)
mtext((1:5)*2, 2, 1, at=(1:5)*.2, las=2)
mtext(bquote(Production ~(kg~m^-2~year^-1)), 2, 2)
# CIs
polygon(c(1:100, 100:1), c(p.99.tr[,2], rev(p.99.tr[,3])), col=rgb(.5, 1, .2),
border=NA)
polygon(c(1:100, 100:1), c(p.95.tr[,2], rev(p.95.tr[,3])), col=rgb(0, .8, .5),
border=NA)
# fit
lines(1:100, p.99.tr[,1], lwd=2)
#legend
legend("topleft", legend=c("fit", "99%-CI", "95%-CI"), lty=c(1, NA, NA), lwd=2,
pch=c(NA, 15, 15), bty="n",
col=c("#000000", rgb(.5, 1, .2), rgb(0, .8, .5)))
Yields
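As a side note on the back-transformation: exp(p) / (1 + exp(p)) is the inverse logit, which base R provides as plogis(), and the logit itself is qlogis(). The sketch below refits the model from the question with these helpers. The column name bil and the object new.age are renamed or introduced here for illustration only.

```r
dat <- data.frame(age = c(45, 75, 14, 45, 45, 55, 65, 15, 3, 85),
                  bil = c(.37, .45, .24, .16, .46, .89, .16, .24, .23, .49))

dat$bb_logit <- qlogis(dat$bil)  # same as log(bil / (1 - bil))
modelbb <- lm(bb_logit ~ age + I(age^2), data = dat)

new.age <- data.frame(age = 1:100)
p <- predict(modelbb, new.age, interval = "confidence")  # columns fit, lwr, upr

p.tr <- plogis(p)  # same as exp(p) / (1 + exp(p)), applied elementwise

plot(new.age$age, p.tr[, "fit"], type = "l", ylim = range(p.tr),
     xlab = "age", ylab = "bil")
lines(new.age$age, p.tr[, "lwr"], lty = 2)
lines(new.age$age, p.tr[, "upr"], lty = 2)
```

Because the inverse logit is monotone increasing, transforming the lower and upper limits separately keeps them in the right order on the original scale.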
I have the following data:
I plotted the points of that data and then smoothed it on the plot using the following code:
scatter.smooth(x=1:length(Ticker$ROIC[!is.na(Ticker$ROIC)]),
y=Ticker$ROIC[!is.na(Ticker$ROIC)],col = "#AAAAAA",
ylab = "ROIC Values", xlab = "Quarters since Feb 29th 2012 till Dec 31st 2016")
Now I want to find the point-wise slope of this smoothed curve and also fit a trend line to the smoothed graph. How can I do that?
There are some interesting R packages that implement nonparametric derivative estimation. The short review of Newell and Einbeck can be helpful: http://maths.dur.ac.uk/~dma0je/Papers/newell_einbeck_iwsm07.pdf
Here we consider an example based on the pspline package (smoothing splines with penalties on order m derivatives):
The data-generating process is a negative logistic model with additive noise (hence the y values are all negative, like the ROIC variable of #ForeverLearner):
set.seed(1234)
x <- sort(runif(200, min=-5, max=5))
y = -1/(1+exp(-x))-1+0.1*rnorm(200)
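We start by plotting the nonparametric estimate of the curve (the black line is the true curve and the red one the estimated curve):
library(pspline)
pspl <- smooth.Pspline(x, y, df=5, method=3)
f0 <- predict(pspl, x, nderiv=0)
# true curve (black) and its estimate (red), following the pattern of the derivative plots
curve(-1/(1+exp(-x))-1, -5, 5, lwd=2, ylim=c(-2,-1))
lines(x, f0, lwd=3, lty=2, col="red")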
Then, we estimate the first derivative of the curve:
f1 <- predict(pspl, x, nderiv=1)
curve(-exp(-x)/(1+exp(-x))^2,-5,5, lwd=2, ylim=c(-.3,0))
lines(x, f1, lwd=3, lty=2, col="red")
And here the second derivative:
f2 <- predict(pspl, x, nderiv=2)
curve((exp(-x))/(1+exp(-x))^2-2*exp(-2*x)/(1+exp(-x))^3, -5, 5,
lwd=2, ylim=c(-.15,.15), ylab="")
lines(x, f2, lwd=3, lty=2, col="red")
#DATA
set.seed(42)
x = rnorm(20)
y = rnorm(20)
#Plot the points
plot(x, y, type = "p")
#Obtain points for the smooth curve
temp = loess.smooth(x, y, evaluation = 50) #Use higher evaluation for more points
#Plot smooth curve
lines(temp$x, temp$y, lwd = 2)
#Obtain slope of the smooth curve
slopes = diff(temp$y)/diff(temp$x)
#Add a trend line
abline(lm(y~x))
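The finite differences above give one slope per interval between neighbouring grid points, so there is one fewer slope than there are points. A hedged sketch of two refinements, using only base R: attach each difference to the midpoint of its interval, or pass an interpolating spline through the smoothed points with splinefun() and evaluate its first derivative on the grid itself. The names sm, mid.x and sfun are illustrative.

```r
set.seed(42)
x <- rnorm(20)
y <- rnorm(20)

# Smoothed curve evaluated on a grid of 50 points
sm <- loess.smooth(x, y, evaluation = 50)

# Finite differences: slope of each segment, located at the segment midpoint
slopes <- diff(sm$y) / diff(sm$x)
mid.x  <- sm$x[-1] - diff(sm$x) / 2

# Alternative: spline through the smoothed points, first derivative at each grid point
sfun <- splinefun(sm$x, sm$y)
slopes.spline <- sfun(sm$x, deriv = 1)

plot(mid.x, slopes, type = "l", xlab = "x", ylab = "slope of smoothed curve")
lines(sm$x, slopes.spline, lty = 2, col = "red")
```

The two estimates should be close wherever the smoothed curve is not strongly curved. The spline version has the advantage of returning a slope at every grid point.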
This question is related to two different questions I have asked previously:
1) Reproduce frequency matrix plot
2) Add 95% confidence limits to cumulative plot
I wish to reproduce this plot in R:
I have got this far, using the code beneath the graphic:
#Set the number of bets and number of trials and % lines
numbet <- 36
numtri <- 1000
#Fill a matrix where the rows are the cumulative bets and the columns are the trials
xcum <- matrix(NA, nrow=numbet, ncol=numtri)
for (i in 1:numtri) {
x <- sample(c(0,1), numbet, prob=c(5/6,1/6), replace = TRUE)
xcum[,i] <- cumsum(x)/(1:numbet)
}
#Plot the trials as transparent lines so you can see the build up
matplot(xcum, type="l", xlab="Number of Trials", ylab="Relative Frequency", main="", col=rgb(0.01, 0.01, 0.01, 0.02), las=1)
My question is: How can I reproduce the top plot in one pass, without plotting multiple samples?
Thanks.
You can produce this plot by using this code:
boring <- function(x, occ) occ/x
boring_seq <- function(occ, length.out){
x <- seq(occ, length.out=length.out)
data.frame(x = x, y = boring(x, occ))
}
numbet <- 31
odds <- 6
plot(1, 0, type="n",
xlim=c(1, numbet + odds), ylim=c(0, 1),
yaxp=c(0,1,2),
main="Frequency matrix",
xlab="Successive occasions",
ylab="Relative frequency"
)
axis(2, at=c(0, 0.5, 1))
for(i in 1:odds){
xy <- boring_seq(i, numbet+1)
lines(xy$x, xy$y, type="o", cex=0.5)
}
for(i in 1:numbet){
xy <- boring_seq(i, odds+1)
lines(xy$x, 1-xy$y, type="o", cex=0.5)
}
You can also use Koshke's method by limiting the combinations of values to those with s<6. At Andrie's request, I added the condition on the difference between ps$n and ps$s to get a "pointed" configuration.
library(plyr)  # provides ldply
ps <- ldply(0:35, function(i)data.frame(s=0:i, n=i))
plot.new()
plot.window(c(0,36), c(0,1))
apply(ps[ps$s<6 & ps$n - ps$s < 30, ], 1, function(x){
s<-x[1]; n<-x[2];
lines(c(n, n+1, n, n+1), c(s/n, s/(n+1), s/n, (s+1)/(n+1)), type="o")})
axis(1)
axis(2)
lines(6:36, 6/(6:36), type="o")
# need to fill in the unconnected points on the upper frontier
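Another one-pass option is to enumerate every attainable relative frequency directly instead of simulating trials. This is only a rough base-R sketch, assuming the target figure is the lattice of all values s/n for s successes in n bets. It draws the points but not the connecting segments, and the name grid is illustrative.

```r
numbet <- 36

# All (n, s) pairs with 0 <= s <= n: s successes after n bets
grid <- expand.grid(n = 1:numbet, s = 0:numbet)
grid <- grid[grid$s <= grid$n, ]
grid$freq <- grid$s / grid$n  # attainable relative frequency

plot(grid$n, grid$freq, pch = 20, cex = 0.5, las = 1,
     xlab = "Number of Trials", ylab = "Relative Frequency")
```

Each n contributes n + 1 points, so the whole lattice is built without a single random draw.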
A weighted frequency matrix is also called a position weight matrix (PWM) in bioinformatics.
It can be represented in the form of a sequence logo.
This is at least how I plot a weighted frequency matrix.
library(cosmo)
data(motifPWM); attributes(motifPWM) # Loads a sample position weight matrix (PWM) containing 8 positions.
plot(motifPWM) # Plots the PWM as sequence logo.
What I want to do sounds simple. I want to plot a normal IQ curve with R with a mean of 100 and a standard deviation of 15. Then, I'd like to be able to overlay a scatter plot of data on top of it.
Anybody know how to do this?
I'm guessing what you want to do is this: you want to plot the model normal density with mean 100 and sd = 15, and you want to overlay on top of that the empirical density of some set of observations that purportedly follow the model normal density, so that you can visualize how well the model density fits the empirical density. The code below should do this (here, x would be the vector of actual observations but for illustration purposes I'm generating it with a mixed normal distribution N(100,15) + 15*N(0,1), i.e. the purported N(100,15) distribution plus noise).
require(ggplot2)
x <- round( rnorm( 1000, 100, 15 )) + rnorm(1000)*15
dens.x <- density(x)
empir.df <- data.frame( type = 'empir', x = dens.x$x, density = dens.x$y )
norm.df <- data.frame( type = 'normal', x = 50:150, density = dnorm(50:150,100,15))
df <- rbind(empir.df, norm.df)
m <- ggplot(data = df, aes(x,density))
m + geom_line( aes(linetype = type, colour = type))
Well, it's more like a histogram, since I think you are expecting the scores to come from an integer-rounded process:
x<-round(rnorm(1000, 100, 15))
y<-table(x)
plot(y)
par(new=TRUE)
plot(density(x), yaxt="n", ylab="", xlab="", xaxt="n")
If you want the theoretical value of dnorm superimposed, then use one of these:
lines(sort(x), dnorm(sort(x), 100, 15), col="red")
or
points(x, dnorm(x, 100, 15))
You can plot the PDF of IQ scores with:
curve(dnorm(x, 100, 15), 50, 150)
But why would you like to overlay scatter over density curve? IMHO, that's very unusual...
In addition to the other good answers, you might be interested in plotting a number of panels, each with its own graph. Something like this.