I have data cdecn:
set.seed(0)
cdecn <- sample(1:10,570,replace=TRUE)
a <- rnorm(cdecn,mean(cdecn),sd(cdecn))
I have created a plot which displays the cumulative probabilities.
aprob <- ecdf(a)
plot(aprob)
I am wondering how I can switch the x-axis and y-axis to get a new plot, i.e., the inverse of ECDF.
Also, for the new plot, is there a way to add a vertical line through where the my curve intersects 0?
We can do the following. My comments along the code is very explanatory.
## reproducible example
set.seed(0)
cdecn <- sample(1:10,570,replace=TRUE)
a <- rnorm(cdecn,mean(cdecn),sd(cdecn)) ## random samples
a <- sort(a) ## sort samples in ascending order
e_cdf <- ecdf(a) ## ecdf function
e_cdf_val <- 1:length(a) / length(a) ## the same as: e_cdf_val <- e_cdf(a)
par(mfrow = c(1,2))
## ordinary ecdf plot
plot(a, e_cdf_val, type = "s", xlab = "ordered samples", ylab = "ECDF",
main = "ECDF")
## switch axises to get 'inverse' ECDF
plot(e_cdf_val, a, type = "s", xlab = "ECDF", ylab = "ordered sample",
main = "'inverse' ECDF")
## where the curve intersects 0
p <- e_cdf(0)
## [1] 0.01578947
## highlight the intersection point
points(p, 0, pch = 20, col = "red")
## add a dotted red vertical line through intersection
abline(v = p, lty = 3, col = "red")
## display value p to the right of the intersection point
## round up to 4 digits
text(p, 0, pos = 4, labels = round(p, 4), col = "red")
cdecn <- sample(1:10,570,replace=TRUE)
a <- rnorm(cdecn,mean(cdecn),sd(cdecn))
aprob <- ecdf(a)
plot(aprob)
# Switch the x and y axes
x <- seq(0,1,0.001754386)
plot(y=knots(aprob), x=x, ylab = "Fn(y)")
# Add a 45 degree straight line at 0, 0
my_line <- function(x,y,...){
points(x,y,...)
segments(min(x), y==0, max(x), max(y),...)
}
lines(my_line(x=x, y = knots(aprob)))
The "straight line at x==0" bit makes me suspect that you want a QQplot:
qqnorm(a)
qqline(a)
Related
I am planning to reproduce the attached figure, but I have no clue how to do so:
Let´s say I would be using the CO2 example dataset, and I would like to plot the relative change of the Uptake according to the Treatment. Instead of having the three variables in the example figure, I would like to show the different Plants grouped for each day/Type.
So far, I managed only to get this bit of code, but this is far away from what it should look like.
aov1 <- aov(CO2$uptake~CO2$Type+CO2$Treatment+CO2$Plant)
plot(TukeyHSD(aov1, conf.level=.95))
Axes should be switched, and I would like to add statistical significant changes indicated with letters or stars.
You can do this by building it in base R - this should get you started. See comments in code for each step, and I suggest running it line by line to see what's being done to customize for your specifications:
Set up data
# Run model
aov1 <- aov(CO2$uptake ~ CO2$Type + CO2$Treatment + CO2$Plant)
# Organize plot data
aov_plotdata <- data.frame(coef(aov1), confint(aov1))[-1,] # remove intercept
aov_plotdata$coef_label <- LETTERS[1:nrow(aov_plotdata)] # Example labels
Build plot
#set up plot elements
xvals <- 1:nrow(aov_plotdata)
yvals <- range(aov_plotdata[,2:3])
# Build plot
plot(x = range(xvals), y = yvals, type = 'n', axes = FALSE, xlab = '', ylab = '') # set up blank plot
points(x = xvals, y = aov_plotdata[,1], pch = 19, col = xvals) # add in point estimate
segments(x0 = xvals, y0 = aov_plotdata[,2], y1 = aov_plotdata[,3], lty = 1, col = xvals) # add in 95% CI lines
axis(1, at = xvals, label = aov_plotdata$coef_label) # add in x axis
axis(2, at = seq(floor(min(yvals)), ceiling(max(yvals)), 10)) # add in y axis
segments(x0=min(xvals), x1 = max(xvals), y0=0, lty = 2) #add in midline
legend(x = max(xvals)-2, y = max(yvals), aov_plotdata$coef_label, bty = "n", # add in legend
pch = 19,col = xvals, ncol = 2)
I am analyzing difference scores with polynomial regression in R. Based on [Edwards and Parry's (1993)][1] recommendations I have been trying to combine a persp() plot with a contour() plot. I would also need to plot the first two principal axes on the contour plot. My attempts so far have only provided me with each individual plot, but I don't know how to combine them. An example for the end-result is :
Edwards & Parry (1993) example difference score visualisation
I manage to get the persp() plot just fine. I have also obtained the contour plot. I can't seem to find any way to combine the two. I have managed to make the plot in plotly using the add_surface() option in the pipeline. My problem with the output is that the surface is smooth, and the contourplot lacks the values in the plot. Basically: persp() and contour() are visualised in a way that is extremely similar to the look I'm aiming for, per the example in the source.
My current attempt (in minimalistic code) is as follows:
surface <- function(e, i){
y <- .2*e + .14*i + .08*e^2 + + .1*e*i + .2*i^2
}
e <- i <- seq(-3, 3, length= 20)
y <- outer(e, i, surface)
persp(e, i, y,
xlab = 'Explicit',
ylab = 'Implicit',
zlab = 'Depression',
theta = 45)
contour(e,i,y)
So basically my question is: how can I make a plot like Edwards and Parry (1993) make, with a similar visual style, in R. It does not have to be base-R, I'm happy with any method. I've been stuck on this problem for a week now.
My attempt in plotly (to compare it to my desired end-result) is:
if(!"plotly" %in% installed.packages){install.packages('plotly')}
library(plotly)
plot_ly(z = ~y) %>% add_surface(x = ~e, y= ~i, z= ~y,
contours = list(
z = list(
show=TRUE,
usecolormap=FALSE,
highlightcolor="#ff0000",
project=list(z=TRUE)
)
)
) %>%
layout(
scene=list(
xaxis = list(title = "Explicit"),
yaxis = list(title = "Implicit"),
zaxis = list(title = "Depression")
)
)
[1]: Edwards, J. R., & Parry, M. E. (1993). On the use of polynomial regression as an alternative to difference scores. Academy of Management Journal, 36(6), 1577–1613. https://doi.org/10.2307/256822
I have found an answer and I will share it here. It seems it cannot be done in base-R. But the RSM-package allows for the addition of contour lines to the base of the plot.
In this answer I will give a minimal example of:
the persp() plot
the contour lines in the base
addition of x=y and x=-y axis
calculation and addition of the first and second principal axis
The only thing I could not solve is that the lines now are drawn over the surface. I don't know how to solve it.
library(rsm)
x <- seq(-3,3,by=0.25)
y <- seq(-3,3,by=0.25)
d <- expand.grid(x=x,y=y)
z <- c(data=NA,1089)
b0 = .140; b1 = -.441; b2 = -.154; b3 = .161 ; b4 =-.106; b5 = .168
k=1
for (i in 1:25) {
for (j in 1:25) {
z[k]=b0+b1*x[i]+b2*y[j]+b3*x[i]*x[i]+b4*x[i]*y[j]+ b5*y[j]*y[j]
k=k+1
} }
data.lm <- lm(z~poly(x,y,degree=2),data=d)
res1 <- persp(data.lm,x~y,
zlim=c(-2,max(z)),
xlabs = c('X','Y'),
zlab = 'Z',
contour=list(z="bottom"),
theta=55,
phi=25)
# draw x=y line (lightly dotted)
xy_pos <- matrix(c(-3,-3,3,3),ncol=2,byrow = T)
lines(trans3d(xy_pos[,2], xy_pos[,1], z=-2, pmat = res1$`y ~ x`$transf),
lty = 3,
col = 'darkgrey')
# draw x=-y line (lightly dotted)
xy_neg <- matrix(c(-3,3,3,-3),ncol=2,byrow = T)
lines(trans3d(xy_neg[,2], xy_neg[,1], z=-2, pmat = res1$`y ~ x`$transf),
lty = 3,
col = 'darkgrey')
# Find stationary points:
X0 <- (b2*b4 - 2*b1*b5) / (4*b3*b5 - b4^2)
Y0 <- (b1*b4 - 2*b2*b3) / (4*b3*b5 - b4^2)
# First Principal Axis
p11 = (b5-b3+sqrt((b3-b5)^2+b4^2))/b4
p10 = Y0 - p11*X0
Ypaf1 = p10 + p11*x
# plot first principal axis (full line)
xypaf1 <- matrix(c(Ypaf1[1], -3, Ypaf1[25], 3),ncol=2, byrow=T)
lines(trans3d(xypaf1[,2], xypaf1[,1], z=-2, pmat = res1$`y ~ x`$transf),
lty = 1,
col = 'black')
# Second Principal Axis
p21 = (b5-b3-sqrt((b3-b5)^2+b4^2))/b4
p20 = Y0 - p21*X0
Ypaf2 = p20 + p21*x
# plot second principal axis (dashed line)
xypaf2 <- matrix(c(Ypaf2[1], -3, Ypaf2[25], 3),ncol=2, byrow=T)
lines(trans3d(xypaf2[,2], xypaf2[,1], z=-2, pmat = res1$`y ~ x`$transf),
lty = 2,
col = 'black')
Background
I have a function called TPN. When you run this function, it produces two plots (see picture below). The bottom-row plot samples from the top-row plot.
Question
I'm wondering how I could fix the ylim of the bottom-row plot to be always (i.e., regardless of the input values) the same as ylim of the top-row plot?
R code is provided below the picture (Run the entire block of code).
############## Input Values #################
TPN = function( each.sub.pop.n = 150,
sub.pop.means = 20:10,
predict.range = 10:0,
sub.pop.sd = .75,
n.sample = 2 ) {
#############################################
par( mar = c(2, 4.1, 2.1, 2.1) )
m = matrix( c(1, 2), nrow = 2, ncol = 1 ); layout(m)
set.seed(2460986)
Vec.rnorm <- Vectorize(function(n, mean, sd) rnorm(n, mean, sd), 'mean')
y <- c( Vec.rnorm(each.sub.pop.n, sub.pop.means, sub.pop.sd) )
set.seed(NULL)
x <- rep(predict.range, each = each.sub.pop.n)
plot(x, y) ## Plot #1
sample <- lapply(split(y, x), function(z) sample(z, n.sample, replace = TRUE))
sample <- data.frame(y = unlist(sample),
x = as.numeric(rep(names(sample), each = n.sample)))
plot(sample$x, sample$y) ## Plot # 2
}
## TEST HERE:
TPN()
You can get the ylim using par("yaxp")[1:2]. So, you can change the second plot code to have its ylim as the first plot's:
plot(sample$x, sample$y, ylim = par("yaxp")[1:2]) ## Plot # 2
or as mentioned in the comments, you can simply set the ylim for both plots to be range of both data-sets and add that to both plots:
ylim = range(c(y, sample$y))
Another option: Produce the same plot again but with type = "n" and then filling the points with points(). For example, change your plot 2 to
plot(x, y, type = "n")
points(sample$x, sample$y)
A benefit of this approach is that everything in the plot will be exactly the same, not just the y-axis (which may or may not matter for your function).
If all my data is a long row of comma separated values (duration times), then R can give me a lot of info by
summary(dat)
Question
Can R also make a plot of the data with confidence intervals?
As the first google match stated, you can try doing as:
n<-50
x<-sample(40:70,n,rep=T)
y<-.7*x+rnorm(n,sd=5)
plot(x,y,xlim=c(20,90),ylim=c(0,80))
mylm<-lm(y~x)
abline(mylm,col="red")
newx<-seq(20,90)
prd<-predict(mylm,newdata=data.frame(x=newx),interval = c("confidence"),
level = 0.90,type="response")
lines(newx,prd[,2],col="red",lty=2)
lines(newx,prd[,3],col="red",lty=2)
Of course it is one of possibilities
Another example would be:
x <- rnorm(15)
y <- x + rnorm(15)
new <- data.frame(x = seq(-3, 3, 0.5))
pred.w.clim <- predict(lm(y ~ x), new, interval="confidence")
# Just create a blank plot region with axes first. We'll add to this
plot(range(new$x), range(pred.w.clim), type = "n", ann = FALSE)
# For convenience
CI.U <- pred.w.clim[, "upr"]
CI.L <- pred.w.clim[, "lwr"]
# Create a 'loop' around the x values. Add values to 'close' the loop
X.Vec <- c(new$x, tail(new$x, 1), rev(new$x), new$x[1])
# Same for y values
Y.Vec <- c(CI.L, tail(CI.U, 1), rev(CI.U), CI.L[1])
# Use polygon() to create the enclosed shading area
# We are 'tracing' around the perimeter as created above
polygon(X.Vec, Y.Vec, col = "grey", border = NA)
# Use matlines() to plot the fitted line and CI's
# Add after the polygon above so the lines are visible
matlines(new$x, pred.w.clim, lty = c(1, 2, 2), type = "l", col =
c("black", "red", "red"))
Example 1
Example 2
I am using the following code to create a standard normal distribution in R:
x <- seq(-4, 4, length=200)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type="l", lwd=2)
I need the x-axis to be labeled at the mean and at points three standard deviations above and below the mean. How can I add these labels?
The easiest (but not general) way is to restrict the limits of the x axis. The +/- 1:3 sigma will be labeled as such, and the mean will be labeled as 0 - indicating 0 deviations from the mean.
plot(x,y, type = "l", lwd = 2, xlim = c(-3.5,3.5))
Another option is to use more specific labels:
plot(x,y, type = "l", lwd = 2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
Using the code in this answer, you could skip creating x and just use curve() on the dnorm function:
curve(dnorm, -3.5, 3.5, lwd=2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
But this doesn't use the given code anymore.
If you like hard way of doing something without using R built in function or you want to do this outside R, you can use the following formula.
x<-seq(-4,4,length=200)
s = 1
mu = 0
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
plot(x,y, type="l", lwd=2, col = "blue", xlim = c(-3.5,3.5))
An extremely inefficient and unusual, but beautiful solution, which works based on the ideas of Monte Carlo simulation, is this:
simulate many draws (or samples) from a given distribution (say the normal).
plot the density of these draws using rnorm. The rnorm function takes as arguments (A,B,C) and returns a vector of A samples from a normal distribution centered at B, with standard deviation C.
Thus to take a sample of size 50,000 from a standard normal (i.e, a normal with mean 0 and standard deviation 1), and plot its density, we do the following:
x = rnorm(50000,0,1)
plot(density(x))
As the number of draws goes to infinity this will converge in distribution to the normal. To illustrate this, see the image below which shows from left to right and top to bottom 5000,50000,500000, and 5 million samples.
In general case, for example: Normal(2, 1)
f <- function(x) dnorm(x, 2, 1)
plot(f, -1, 5)
This is a very general, f can be defined freely, with any given parameters, for example:
f <- function(x) dbeta(x, 0.1, 0.1)
plot(f, 0, 1)
I particularly love Lattice for this goal. It easily implements graphical information such as specific areas under a curve, the one you usually require when dealing with probabilities problems such as find P(a < X < b) etc.
Please have a look:
library(lattice)
e4a <- seq(-4, 4, length = 10000) # Data to set up out normal
e4b <- dnorm(e4a, 0, 1)
xyplot(e4b ~ e4a, # Lattice xyplot
type = "l",
main = "Plot 2",
panel = function(x,y, ...){
panel.xyplot(x,y, ...)
panel.abline( v = c(0, 1, 1.5), lty = 2) #set z and lines
xx <- c(1, x[x>=1 & x<=1.5], 1.5) #Color area
yy <- c(0, y[x>=1 & x<=1.5], 0)
panel.polygon(xx,yy, ..., col='red')
})
In this example I make the area between z = 1 and z = 1.5 stand out. You can move easily this parameters according to your problem.
Axis labels are automatic.
This is how to write it in functions:
normalCriticalTest <- function(mu, s) {
x <- seq(-4, 4, length=200) # x extends from -4 to 4
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2)) # y follows the formula
of the normal distribution: f(Y)
plot(x,y, type="l", lwd=2, xlim = c(-3.5,3.5))
abline(v = c(-1.96, 1.96), col="red") # draw the graph, with 2.5% surface to
either side of the mean
}
normalCriticalTest(0, 1) # draw a normal distribution with vertical lines.
Final result: