Drawing uniform distributions with ggplot in R

I want to draw different uniform distributions in R, preferably with ggplot. When attempting to draw the pdf of U(0.35,0.55), it looks like there are values around the edges (i.e. the parameter values, in this case 0.35 and 0.55) that have a probability that is different from what it should be. The output can be seen here:
This is consistent across different parameters as inputs of the uniform distribution, and does not seem to be a scale issue.
A code example that reproduces similar results:
library(ggplot2)
# parameters
alpha_1 <- 0.35
beta_1 <- 0.55
alpha_2 <- 0.5
beta_2 <- 0.7
alpha_3 <- 0.1
beta_3 <- 0.3
base <- ggplot() + xlim(-1, 2)
base +
geom_function(aes(colour = "state 1"), fun = dunif, args = list(alpha_1, beta_1)) +
geom_function(aes(colour = "state 2"), fun = dunif, args = list(alpha_2, beta_2)) +
geom_function(aes(colour = "state 3"), fun = dunif, args = list(alpha_3, beta_3))
Using stat_function instead of geom_function does not change anything.
However, the following works:
curve(dunif(x, min = alpha_1, max = beta_1),
from = 0, to = 1,
n = 100000,
col = "blue",
lwd = 2,
add = F,
yaxt = "n",
ylab = 'probability')
curve(dunif(x, min = alpha_2, max = beta_2),
from = 0, to = 1,
n = 100000,
col = "red",
lwd = 2,
add = T,
yaxt = "n",
ylab = 'probability')
curve(dunif(x, min = alpha_3, max = beta_3),
from = 0, to = 1,
n = 100000,
col = "black",
lwd = 2,
add = T,
yaxt = "n",
ylab = 'probability')
How can I get this to work in ggplot? It seems to me the issue might be this:
n values along the x-axis are evaluated to get coordinates for the graph. Around the parameters (a,b) these are, e.g. at a+0.0001 and a-0.0001. The straight line between these coordinates would not be perfectly vertical. I am not sure whether that is the correct diagnosis, but in any case I would appreciate any help. Thanks!

This is easily solved by increasing the number of evaluation points, as Gregor Thomas has helpfully pointed out in a comment. The following adjusted code works just as intended:
library(ggplot2)
#parameters
alpha_1 <- 0.35
beta_1 <- 0.55
alpha_2 <- 0.5
beta_2 <- 0.7
alpha_3 <- 0.1
beta_3 <- 0.3
base <- ggplot() + xlim(-1, 2)
base +
stat_function(aes(colour = "state 1"), fun = dunif, args = list(alpha_1, beta_1), n = 10001) +
stat_function(aes(colour = "state 2"), fun = dunif, args = list(alpha_2, beta_2), n = 10001) +
stat_function(aes(colour = "state 3"), fun = dunif, args = list(alpha_3, beta_3), n = 10001)
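The key change is adding n = 10001 to every stat_function() call. The default of n = 101 evaluation points is too coarse to draw the jumps at the interval bounds as vertical lines.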
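To see why the default resolution produces slanted edges, it helps to look at the spacing of the evaluation grid. The sketch below assumes the default of n = 101 points for stat_function()/geom_function() and the xlim(-1, 2) range from the question; the variable names are illustrative only.

```r
# Spacing of the evaluation grid over xlim(-1, 2)
x_range <- c(-1, 2)
step_default <- diff(x_range) / (101 - 1)    # default n = 101
step_fine    <- diff(x_range) / (10001 - 1)  # n = 10001, as in the answer

# Each jump of the density is drawn as a straight segment between two
# neighbouring grid points, so its horizontal extent is at most one step.
step_default
step_fine

# The default grid, evaluated explicitly for U(0.35, 0.55)
x_default <- seq(x_range[1], x_range[2], length.out = 101)
y_default <- dunif(x_default, min = 0.35, max = 0.55)
range(y_default)  # the density is either 0 or 1 / (0.55 - 0.35)
```

With the finer grid the slanted segment becomes a hundred times narrower, which is why the edges then look vertical at normal plot sizes.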

Related

How to produce overlapping QQ plots in R?

I would like to make an overlapping QQ-plot from the GWAS results similar to the attached figure. I have run two GWAS analyses and want to generate a figure where the QQ-plot from both GWAS are overlaid on one another. I am using the R-package "qqman" for that.
Can someone please tell me how to do that in R?
Thank you.
Sample figure
You can use the R package CMplot for producing overlapping QQ plots in R.
library(CMplot)
data(pig60K)
pig60K$trait1[sample(1:nrow(pig60K), round(nrow(pig60K)*0.80))] <- NA
pig60K$trait2[sample(1:nrow(pig60K), round(nrow(pig60K)*0.25))] <- NA
CMplot(pig60K,plot.type="q",col=c("dodgerblue1", "olivedrab3", "darkgoldenrod1"),threshold=1e-6,
ylab.pos=2,signal.pch=c(19,6,4),signal.cex=1.2,signal.col="red",conf.int=TRUE,box=FALSE,multracks=
TRUE,cex.axis=2,file="jpg",memo="",dpi=300,file.output=TRUE,verbose=TRUE,ylim=c(0,8),width=5,height=5)
This works:
library(ggplot2)
library(magrittr) # provides the %>% pipe
(abd = data.frame(a = runif(100, 0, 3),
b = runif(100, 346, 455),
d = runif(100, 3952, 4903)) %>%
ggplot() +
geom_qq(aes(sample = a, color = "a")) +
geom_qq(aes(sample = b, color = "b")) +
geom_qq(aes(sample = d, color = "d")) +
theme_minimal())
I'm not sure what your objective is exactly, but more than likely the data will overlap and hide important information. Perhaps what you really need is a bi-variate or multi-variate version of a qq plot? A chi-square qq plot using the Mahalanobis distance might be a better bet, if that's the case.
MVN::mvn() draws this plot when called with the following arguments:
abd = data.frame(a = runif(100, 0, 3),
b = runif(100, 346, 455),
d = runif(100, 3952, 4903))
MVN::mvn(abd, multivariatePlot = "qq", multivariateOutlierMethod = "quan")
You can manually create the chi-square qq plot this way:
(cvAbd = cov(abd)) # covariance matrix
(dif = scale(abd, scale = FALSE)) # centering: subtract the column means, no rescaling
# solve() used in next call calculates the inverse matrix
(d = diag(dif %*%
solve(cvAbd) %*% # matrix multiplication
t(dif)))
(r = rank(d)) # ranking
n = dim(abd)[1] # number of observations
p = dim(abd)[2] # number of variables
(ch <- qchisq((r - 0.5)/n, p)) # determine chi-square quantiles
ggplot(data = data.frame(d = d, ch = ch),
aes(d, ch)) +
geom_point() +
ggtitle("Chi-Square Q-Q Plot") +
xlab("Squared Mahalanobis Distance") +
ylab("Chi-Square Quantile") +
theme_minimal()
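As a sanity check on the manual computation above, the squared distances can be compared with base R's mahalanobis() function. This is only a sketch: the seed and the names d_manual and d_builtin are introduced here for illustration and are not part of the original answer.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
abd <- data.frame(a = runif(100, 0, 3),
                  b = runif(100, 346, 455),
                  d = runif(100, 3952, 4903))

# Manual route: centre the data, then take the diagonal of the quadratic form
centered <- scale(abd, scale = FALSE)
d_manual <- diag(centered %*% solve(cov(abd)) %*% t(centered))

# Built-in route: mahalanobis() returns the squared distances directly
d_builtin <- mahalanobis(abd, center = colMeans(abd), cov = cov(abd))

all.equal(unname(d_manual), unname(d_builtin))
```

The built-in version avoids forming the full 100 x 100 matrix just to read off its diagonal.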
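If you were looking for something different, try this.
You can replace runif(10000, .5, 1) with your data:
- o and e are the observed and expected values for the first vector (same data in both)
- o2 and e2 are the observed and expected values for a second vector or second model (same data in both)
This uses plotly:
library(plotly) # provides plot_ly(), add_trace(), layout() and the %>% pipe
# first model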
o = -log10(sort(runif(10000,.5,1), decreasing = FALSE))
e = -log10(ppoints(length(runif(10000,.5,1))))
# second model
o2 = -log10(sort(runif(10000,0,1), decreasing = FALSE))
e2 = -log10(ppoints(length(runif(10000,0,1))))
plotly::plot_ly(x = ~e, y = ~o, name = "p1",
type = "scatter", mode = "markers") %>%
add_trace(x = ~e2, y = ~o2, name = "p2") %>%
add_trace(x = c(0, max(e,e2)), y = c(0, max(e,e2)),
mode="lines", name = "Log normal") %>%
layout(xaxis = list(title = 'Expected -log[10](<i>p</i>)'),
yaxis = list(title = 'Observed -log[10](<i>p</i>)'))
You requested a ggplot2 version:
ggplot(data.frame(o = o, e = e, o2 = o2, e2 = e2),
aes(e, o, color = "First Model")) +
geom_point() +
geom_point(aes(e2, o2, color = "Second Model")) + # expected on x, observed on y, as in the first layer
geom_abline(intercept = 0, slope = 1, color = "darkred") +
scale_color_discrete("") +
xlab(expression(paste("Expected ", -log[10], "(", italic(p), ")"))) +
ylab(expression(paste("Observed ", -log[10], "(", italic(p), ")"))) +
theme_minimal()
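When real p-values are used, the observed and expected vectors for each analysis can be built by one small helper and stacked into a long data frame. The following is a minimal sketch; qq_points() is a hypothetical helper name, and the runif() calls are placeholders for the p-value columns of the two GWAS results.

```r
# Hypothetical helper: expected vs. observed -log10(p) for one set of p-values
qq_points <- function(p, label) {
  p <- sort(p[!is.na(p)])  # drop missing values, smallest p first
  data.frame(expected = -log10(ppoints(length(p))),
             observed = -log10(p),
             model = label)
}

set.seed(1)
p1 <- runif(1000)          # placeholder for the first GWAS p-values
p2 <- runif(1000, 0.5, 1)  # placeholder for the second GWAS p-values

qq <- rbind(qq_points(p1, "First GWAS"), qq_points(p2, "Second GWAS"))
head(qq)
```

The stacked data frame can then be drawn with a single layer, for example ggplot(qq, aes(expected, observed, colour = model)) + geom_point(), plus geom_abline() for the reference line.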

lines function returning too many random lines

I have a weird problem with drawing a graph with confidence intervals for predictions.
Here is my code:
rm(list = ls())
cat("\014")
set.seed(1)
file.name <- "testRegresji.pdf"
count = 20
pdf(file.name)
x <- runif(count, 0, 2)
y <- x + rnorm(count)
model <-lm(y ~ x)
xlab.label <- paste("y = ", format(model$coeff[1], digits = 4),
"+", format(model$coeff[2], digits = 4),
"* x + e")
plot(x, y, xlab = xlab.label, ylab = "", main = paste("n = ", count), col = 8)
matlines(x, predict(model, interval = "confidence"),
type = 'l', lty = c(1, 2, 2), col = "black")
abline(0, 1, col = grey(0.4), lwd = 3)
dev.off()
shell.exec(paste(getwd(), "/", file.name, sep = ""))
The resulting graph looks very weird with too many random lines for confidence intervals, although the result of predict function are correct.
Here's the screenshot of the graph:
What could be the issue? Thanks a lot for any help!
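A likely cause, offered as a sketch rather than a confirmed answer: matlines() joins the points in the order they are given, and x from runif() is not sorted, so the lines zigzag back and forth across the plot. Ordering x and the prediction rows the same way before drawing should give three clean curves. The name ord is introduced here for illustration.

```r
set.seed(1)
count <- 20
x <- runif(count, 0, 2)
y <- x + rnorm(count)
model <- lm(y ~ x)

pred <- predict(model, interval = "confidence")  # columns: fit, lwr, upr
ord <- order(x)                                  # indices that sort x increasingly

plot(x, y, ylab = "", main = paste("n = ", count), col = 8)
# Draw the fit and both bounds against x in increasing order
matlines(x[ord], pred[ord, ], type = "l", lty = c(1, 2, 2), col = "black")
abline(0, 1, col = grey(0.4), lwd = 3)
```

An alternative under the same assumption is to predict on a sorted grid, passing newdata = data.frame(x = seq(0, 2, length.out = 100)) to predict().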

Plotting the 95% confidence interval for means drawn from a normal distribution

I have drawn 100 samples of size 10 from a normal distribution with a mean of 10 and standard deviation of two. Code below:
n <- 10
nreps<-100
sample.mean<-numeric(nreps)
for (i in 1:nreps) {
sample <- rnorm(n=n, mean = 10, sd = 2)
sample.mean[i] <- mean(sample)
a <- qnorm(0.95*2/sqrt(n))
ci <- a
}
plot(sample.mean, 1:100)
I want to create a graph that looks like this
This is what I currently have
I know I need to interpret the left hand and right hand bounds of each mean and then insert a horizontal line between them. Means that fall outside the 95% confidence interval are supposed to be colored differently than the rest. I am just beginning to learn R, so a helpful walk-through would be very appreciated.
Try it this way:
library(ggplot2)
set.seed(1321)
n <- 10
sd <- 2
n.reps <- 100
my.mean <- 10
alpha <- 0.05
mydata <- matrix(rnorm(n = n.reps * n, mean = my.mean, sd =sd), ncol = n.reps)
sample.means <- apply(mydata, 2, mean)
# half-width of the t-based confidence interval for each sample (column)
error <- apply(mydata, 2, function(x) qt(p = 1 - alpha / 2, df = length(x) - 1) * sd(x) / sqrt(length(x)))
dfx <- data.frame(sample.means, error, lcl = sample.means - error, ucl = sample.means + error, trial = 1:n.reps)
# TRUE when the interval does not cover the true mean
dfx$miss <- dfx$ucl < my.mean | dfx$lcl > my.mean
ggplot(dfx, aes(x = sample.means, y = trial, xmin = lcl, xmax = ucl, color = miss)) +
geom_errorbarh() +
geom_point(pch = 1) +
geom_vline(xintercept = my.mean, lty = 2, color = "blue") +
xlab("True Mean in Blue and 95% Confidence Intervals") +
ylab("Trial") +
ggtitle(paste("Successful CI's:", 100 * mean(!dfx$miss), "%")) +
scale_color_manual(values = c("green", "red")) +
theme_bw()
or use base:
oldpar <- par(xpd=FALSE)
par(mar=c(8.1, 3.1, 3.1, 4.1))
with(subset(dfx, !miss), plot(sample.means, trial,
xlab = "Sample Mean",
ylab = "Trial",
col = "forestgreen",
xlim=c(min(dfx$lcl), max(dfx$ucl))))
with(subset(dfx, miss), points(sample.means, trial,
col = "red"))
with(subset(dfx, miss), segments(lcl, trial, ucl, trial, col = "red"))
with(subset(dfx, !miss), segments(lcl, trial, ucl, trial, col = "forestgreen"))
abline(v = my.mean, lty = 2, lwd = 2, col = "blue")
par(xpd=TRUE)
legend("bottomright", c("Successful CI", "Miss"), lty = c(1,1), col = c("forestgreen", "red"),
inset=c(-0.1,-0.45))
title(main = paste("Successful CI's:", 100*mean(!dfx$miss), "%"),
sub = "True mean (in blue) and CI's")
par(oldpar)
HTH
James
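For a single sample, the interval used in the answer above can be cross-checked against t.test(), which computes the same t-based confidence interval. This is a small sketch; the names half_width, manual_ci and builtin_ci are illustrative.

```r
set.seed(1321)
alpha <- 0.05
x <- rnorm(n = 10, mean = 10, sd = 2)  # one sample of size 10

# Manual interval: mean +/- t quantile * standard error
half_width <- qt(p = 1 - alpha / 2, df = length(x) - 1) * sd(x) / sqrt(length(x))
manual_ci <- c(mean(x) - half_width, mean(x) + half_width)

# Built-in interval from a one-sample t-test
builtin_ci <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)

all.equal(manual_ci, builtin_ci)
```

Because the intervals use the sample standard deviation and a t quantile, roughly 95% of them should cover the true mean over many repetitions, which is what the plot title reports.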

ggplot2 grey-scale-schemes: suggestions for clarifying changepoint plot

I can plot multiple simultaneous time series that undergo changepoints and regimes using ggplot2, and I can use colour to make the regimes clear (plotting different sections in different colors using geom_rect). I need to produce a plot that makes it clear where the regimes are without the use of color. With three regimes it is possible to distinguish between the regimes using white, black and gray for shading, but it is difficult to tell them apart if more than three regimes are present.
I've put an example of a plot that I can make using color, I'd be very grateful if someone can suggest a plot that conveys the same information without the use of color.
library(ggplot2)
library(scales)
# generate 3 time series and store them in a data frame
generate_cp_ts <- function(tau, params) {
# one AR(1) segment of length n whose innovations have standard deviation s
segment <- function(n, s) arima.sim(model = list(ar = 0.2), n = n, rand.gen = function(n) s * rnorm(n))
ts(c(segment(tau[1], params[1]),
segment(tau[2] - tau[1], params[2]),
segment(tau[3] - tau[2], params[3]),
segment(tau[4] - tau[3], params[4])))
}
tau <- 100 * (1:4)
ts1 <- generate_cp_ts(tau, c(1.7, 0.3, 1.7, 1.7))
ts2 <- generate_cp_ts(tau, c(0.3, 2, 0.3, 0.9))
ts3 <- generate_cp_ts(tau, c(2, 2, 0.1, 0.7))
tsframe <- data.frame(ts = c(ts1, ts2, ts3), ts_level = factor(paste("Time Series", rep(1:3, each = 400))), time = rep(1:400, 3))
# Work out which colors are needed to color the plot and store in a data frame
CPs <- c(0, tau)
colour.frame <- data.frame(regime.from = rep(CPs[-length(CPs)], each = 3), regime.to = rep(CPs[-1], each = 3), ts_level = factor(paste("Time Series", rep(c(1:3), length(CPs) - 1))), regime = factor(c(0,0,0, 1,1,0, 0,0,1, 0,2,2) + 1))
# Plotting
qplot(x = time, y = ts, data = tsframe, facets = ts_level ~ ., alpha = I(1), geom = "line", ylab = "Time Series", xlab = "Time") +
geom_rect(aes(NULL, NULL, xmin = regime.from, xmax = regime.to, fill = regime), ymin = -6, ymax = 6, data = colour.frame) +
scale_fill_manual(values = alpha(c("blue", "red", "green"), 0.2))
Plot generated by the above code
After you created colour.frame you can insert this code:
tdf <- colour.frame
tdf$xval <- (tdf$regime.from + tdf$regime.to)/2
tdf$yval <- max(tsframe$ts) * 0.8 # increase the factor (e.g. to 0.9) to place the labels higher
ggplot(tsframe, aes(x = time, y = ts)) +
geom_line() +
facet_grid(ts_level~.) +
geom_vline(xintercept = CPs) + # maybe play around with linetype
geom_text(aes(x = xval, y = yval, label = regime), data = tdf)
which gives this plot:

Density Dependent Growth

I'm trying to create a graph in R to show the carrying capacity of a population using an example given to me which is:
install.packages("deSolve", dependencies = TRUE)
clogistic <- function(times, y, parms){
n <- y[1]
r <- parms[1]
alpha <- parms [2]
dN.dt <- r * n * (1 - alpha * n)
return(list(c(dN.dt)))
}
prms <- c(r = 1, alpha = 0.01)
init.N <- c(1)
t.s <- seq(0.1, 10, by = 0.1)
library(deSolve)
out <- ode(y = init.N, times = t.s, clogistic, parms = prms)
plot(out[,1], out[,2], type="l", xlab = "Time", ylab = "N", col = "blue", lwd = 2)
Now I'm using this to try and show a starting population of 178 with an increase of 21 for 15 time steps. But when I try to change the formula it decreases and bottoms out after one time step and stays bottom for the remainder of the time.
I've tried changing init.N <- c(1) to c(178) which it does but then bottoms out. I've tried changing prms <- c(r = 1, alpha = 0.01) to (r = 21, along with the change in initial population change and without but it just doesn't increase. What is it that I am missing? Knowing R it's going to be something small but I just keep missing it.
Any help will be greatly appreciated.
This is the differential equation that is being integrated:
dN.dt <- r * n * (1 - alpha * n)
If you want an asymptote of n= 200 then set alpha to 1/200 so that the rate of change will go to zero when n gets to 200:
prms <- c(r = 1, alpha = .005)
init.N <- 178
t.s <- seq(0.1, 10, by = 0.1)
library(deSolve)
out <- ode(y = init.N, times = t.s, clogistic, parms = prms)
plot(out[,1], out[,2], type="l", xlab = "Time", ylab = "N", col = "blue", lwd = 2)
With a starting value of 178, the rate of change will be negative when alpha is greater than 1/178, will be flatline with alpha == 1/178, and will be logistic when alpha is less than 1/178.
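The three cases can be checked directly by evaluating the right-hand side of the differential equation at n = 178. In this sketch growth_rate() is a hypothetical helper that simply restates dN.dt from clogistic().

```r
# Right-hand side of the logistic equation, as in clogistic()
growth_rate <- function(n, r, alpha) r * n * (1 - alpha * n)

rate_decline <- growth_rate(178, r = 1, alpha = 0.01)     # capacity 100 < 178
rate_flat    <- growth_rate(178, r = 1, alpha = 1 / 178)  # capacity 178
rate_growth  <- growth_rate(178, r = 1, alpha = 1 / 200)  # capacity 200 > 178

c(decline = rate_decline, flat = rate_flat, growth = rate_growth)
```

This also explains the behaviour in the question: with alpha = 0.01 the carrying capacity is 1 / 0.01 = 100, so a population started at 178 falls towards 100 instead of growing.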
To go from 300 to 200 you would keep alpha = 1/200 and start at 300:
prms <- c(r = 1, alpha = 1/200)
init.N <- c(300)
t.s <- seq(0.1, 10, by = 0.1)
out <- ode(y = init.N, times = t.s, clogistic, parms = prms)
plot(out[,1], out[,2], type="l", xlab = "Time", ylab = "N", col = "blue", lwd = 2)
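The logistic equation also has a well-known closed-form solution, which gives a way to check the ode() output without deSolve. The sketch below assumes carrying capacity K = 1/alpha and treats the first element of t.s as the starting time, matching how ode() applies the initial value; logistic_closed_form() is a hypothetical helper name.

```r
# Closed-form solution of dN/dt = r * N * (1 - alpha * N) with N(t0) = n0
logistic_closed_form <- function(t, n0, r, alpha, t0 = 0) {
  k <- 1 / alpha  # carrying capacity
  k / (1 + (k - n0) / n0 * exp(-r * (t - t0)))
}

t.s <- seq(0.1, 10, by = 0.1)
n_up   <- logistic_closed_form(t.s, n0 = 178, r = 1, alpha = 1 / 200, t0 = t.s[1])
n_down <- logistic_closed_form(t.s, n0 = 300, r = 1, alpha = 1 / 200, t0 = t.s[1])

plot(t.s, n_up, type = "l", ylim = range(n_up, n_down),
     xlab = "Time", ylab = "N", col = "blue", lwd = 2)
lines(t.s, n_down, col = "red", lwd = 2)
abline(h = 200, lty = 2)  # carrying capacity
```

Both curves approach 200, one from below and one from above, which matches the two ode() runs shown above.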
