I want to create 3 plots for illustration purposes:
- normal distribution
- right skewed distribution
- left skewed distribution
This should be an easy task, but I found only this link, which only shows a normal distribution. How do I do the rest?
If you are not too tied to the normal distribution, I suggest you use the beta distribution, which can be symmetrical, right-skewed, or left-skewed depending on its shape parameters.
hist(rbeta(10000, 5, 2))  # left skewed
hist(rbeta(10000, 2, 5))  # right skewed
hist(rbeta(10000, 5, 5))  # symmetric
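If you want the three illustration panels in one figure with titles, a small sketch reusing the same calls (note that Beta(5,2) gives the left-skewed panel and Beta(2,5) the right-skewed one):
par(mfrow = c(1, 3))
hist(rbeta(10000, 5, 5), main = "Symmetric", xlab = "")
hist(rbeta(10000, 2, 5), main = "Right skewed", xlab = "")
hist(rbeta(10000, 5, 2), main = "Left skewed", xlab = "")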
Finally I got it working with the help of both of you, but I was mostly relying on this site.
N <- 10000
x <- rnbinom(N, 10, .5)
hist(x,
xlim=c(min(x),max(x)), probability=T, nclass=max(x)-min(x)+1,
col='lightblue', xlab=' ', ylab=' ', axes=F,
main='Positive Skewed')
lines(density(x,bw=1), col='red', lwd=3)
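A left-skewed ("negative skewed") panel can be built the same way, for example by mirroring the negative binomial draws (my own variation, not from the linked site):
y <- max(x) - x  # reflect the right-skewed sample to get a left-skewed one
hist(y,
xlim=c(min(y),max(y)), probability=T, nclass=max(y)-min(y)+1,
col='lightblue', xlab=' ', ylab=' ', axes=F,
main='Negative Skewed')
lines(density(y,bw=1), col='red', lwd=3)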
This is also a valid solution:
curve(dbeta(x,8,4),xlim=c(0,1))
title(main="posterior distribution of p")
Just use the fGarch package and these functions:
dsnorm(x, mean = 0, sd = 1, xi = 1.5, log = FALSE)
psnorm(q, mean = 0, sd = 1, xi = 1.5)
qsnorm(p, mean = 0, sd = 1, xi = 1.5)
rsnorm(n, mean = 0, sd = 1, xi = 1.5)
mean, sd, xi: location parameter mean, scale parameter sd, and skewness parameter xi.
Examples
## snorm -
# Random Numbers:
par(mfrow = c(2, 2))
set.seed(1953)
r = rsnorm(n = 1000)
plot(r, type = "l", main = "snorm", col = "steelblue")
# Plot empirical density and compare with true density:
hist(r, n = 25, probability = TRUE, border = "white", col = "steelblue")
box()
x = seq(min(r), max(r), length = 201)
lines(x, dsnorm(x), lwd = 2)
# Plot df and compare with true df:
plot(sort(r), (1:1000/1000), main = "Probability", col = "steelblue",
ylab = "Probability")
lines(x, psnorm(x), lwd = 2)
# Compute quantiles:
round(qsnorm(psnorm(q = seq(-1, 5, by = 1))), digits = 6)
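Tying this back to the original question, the xi parameter controls the direction of the skew (as I understand it, xi = 1 is symmetric, xi > 1 skews right, and xi < 1 skews left), so the three panels could be drawn roughly like this:
library(fGarch)
par(mfrow = c(1, 3))
hist(rsnorm(10000, xi = 1), probability = TRUE, main = "Symmetric", xlab = "")
hist(rsnorm(10000, xi = 2.5), probability = TRUE, main = "Right skewed", xlab = "")
hist(rsnorm(10000, xi = 0.4), probability = TRUE, main = "Left skewed", xlab = "")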
I am comparing two treatments A and B. The objective is to show that A is not inferior to B. The non-inferiority margin is delta = -2.
After comparing Treatment A - Treatment B, I have these results:
Mean difference and 95% CI = -0.7 [-2.1, 0.8]
I would like to plot this either with a package or manually. I have no idea how to do it.
Welch Two Sample t-test
data: mydata$outcome[mydata$traitement == "Bras S"] and mydata$outcome[mydata$traitement == "B"]
t = 0.88938, df = 258.81, p-value = 0.3746
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.133224 0.805804
sample estimates:
mean of x mean of y
8.390977 9.054688
I want to create this kind of plot:
You could extract the relevant data from the t.test results and then plot in base R, using segments and points to plot the data and abline to draw the relevant vertical lines. Since there were no reproducible data, I made some up, but the process is generally the same.
#sample data
set.seed(123)
tres <- t.test(runif(10), runif(10))
# get values to plot from t test results
ci <- tres$conf.int
ests <- tres$estimate[1] - tres$estimate[2]
# plot
plot(x = ci, ylim = c(0,2), xlim = c(-4, 4), type = "n", # blank plot
bty = "n", xlab = "Treatment A - Treatment B", ylab = "",
axes = FALSE)
points(x = ests, y = 1, pch = 20) # dot for point estimate
segments(x0 = ci[1], x1 = ci[2], y0 = 1) #CI line
abline(v = 0, lty = 2) # vertical line, dashed
abline(v = 2, lty = 1, col = "darkblue") # vertical line, solid, blue
axis(1, col = "darkblue") # add in x axis, blue
EDIT:
If you want to more accurately recreate your figure, with the x-axis in descending order and using your statement "Mean difference and 95% CI = -0.7 [-2.1, 0.8]", you can make the following changes to the approach above:
diff <- -0.7
ci <- c(-2.1, 0.8)
# plot
plot(1, xlim = c(-4, 4), type = "n",
bty = "n", xlab = "Treatment A - Treatment B", ylab = "",
axes = FALSE)
points(x = -diff, y = 1, pch = 20)
segments(x0 = -ci[2], x1 = -ci[1], y0 = 1)
abline(v = 0, lty = 2)
abline(v = 2, lty = 1, col = "darkblue")
axis(1, at = seq(-4,4,1), labels = seq(4, -4, -1), col = "darkblue")
Let X_1, ..., X_n be independent, identically distributed random variables with cumulative distribution function F(x), and let F_n(x) denote the empirical distribution function.
Introduce the value Dn (the statistic of the Kolmogorov-Smirnov criterion), Dn = sqrt(n) * sup_x |F_n(x) - F(x)|.
I need to show with plots that
- Dn has a limit distribution as n → ∞, and
- the asymptotic distribution of Dn does not depend on the distribution function F(x).
I tried this, but I don't understand why I get wrong plots (I need to use base graphics functions or lattice):
if (!require("latex2exp")) install.packages("latex2exp")
library("latex2exp")
# 1. Dn has a limit distribution for n -> inf
DNorm <- function(x, mean = 0, sd = 1) {
emp.cdf <- ecdf(x)
n = length(x)
df <- data.frame(emp.cdf = emp.cdf(x), pnorm = pnorm(x, mean, sd))
vec <- (abs((df$emp.cdf - df$pnorm)))
res <- max(vec)* sqrt(n)
}
DnNorm <- function(n, mean = 0, sd = 1) {
x <- sapply(10:n, rnorm, mean, sd)
res <- sapply(x, RNorm, mean, sd)
}
pdf(file="1.pdf")
par(mfrow=c(2,2))
hist(DnNorm(100), breaks = 10, xlim = c(0, 3), col = "cyan1", main = "n = 100", xlab = "Dn")
hist(DnNorm(1000), breaks = 15, xlim = c(0, 3), col = "cyan1", main = "n = 1000", xlab = "Dn")
hist(DnNorm(5000), breaks = 15, xlim = c(0, 3), col = "cyan1", main = "n = 5000", xlab = "Dn")
dev.off()
# 2. Asymptotic distribution of Dn is independent of the distribution function F(x).
pdf(file="2.pdf")
par(mfrow=c(3,1))
hist(DnNorm(3000), breaks = 15, xlim = c(0, 3), col = "cyan1", main = "N(0, 1)", xlab = "Dn")
hist(DnNorm(3000, 50, 4), breaks = 15, xlim = c(0, 3), col = "cyan1", main = "N(50, 4)", xlab = "Dn")
hist(DnNorm(3000, 1), breaks = 15, xlim = c(0, 3), col = "cyan1", main = "EXP(1)", xlab = "Dn")
dev.off()
@Yana Sal Maybe you need to correct the line in the function DnNorm to:
res <- sapply(x, DNorm, mean, sd)  # replace RNorm with your function DNorm()
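With that one change the simulation runs; a minimal sketch of the corrected function, assuming the DNorm() helper from the question:
DnNorm <- function(n, mean = 0, sd = 1) {
  x <- lapply(10:n, rnorm, mean, sd)  # one sample of each size 10, 11, ..., n
  sapply(x, DNorm, mean, sd)          # sqrt(k) * max|F_k(x) - F(x)| for each sample
}
hist(DnNorm(1000), breaks = 15, xlim = c(0, 3), col = "cyan1", main = "n = 1000", xlab = "Dn")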
I am plotting the density of F(1,49) in R. It seems that the simulated plot does not match the theoretical plot when values approach zero.
set.seed(123)
val <- rf(1000, df1=1, df2=49)
plot(density(val), yaxt="n",ylab="",xlab="Observation",
main=expression(paste("Density plot (",italic(n),"=1000, ",italic(df)[1],"=1, ",italic(df)[2],"=49)")),
lwd=2)
curve(df(x, df1=1, df2=49), from=0, to=10, add=T, col="red",lwd=2,lty=2)
legend("topright",c("Theoretical","Simulated"),
col=c("red","black"),lty=c(2,1),bty="n")
Using density(val, from = 0) gets you much closer, although still not perfect. Densities near boundaries are notoriously difficult to calculate in a satisfactory way.
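For reference, that one-argument change looks like this (reusing val from the question):
# Restrict the kernel density estimate to non-negative support:
plot(density(val, from = 0), lwd = 2, xlab = "Observation", main = "density(val, from = 0)")
curve(df(x, df1 = 1, df2 = 49), from = 0, to = 10, add = TRUE, col = "red", lwd = 2, lty = 2)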
By default, density uses a Gaussian kernel to estimate the probability density at a given point. Effectively, this means that at each point an observation was found, a normal density curve is placed there with its center at the observation. All these normal densities are added up, then the result is normalized so that the area under the curve is 1.
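To make that concrete, here is a small sketch that rebuilds the estimate by hand, summing one normal density per observation (using the bandwidth that density() picked):
d <- density(val)  # R's estimate, for comparison
manual <- sapply(d$x, function(x0) mean(dnorm(x0, mean = val, sd = d$bw)))
plot(d, lwd = 4, col = "grey", main = "Hand-rolled Gaussian KDE vs density()")
lines(d$x, manual, col = "red", lty = 2, lwd = 2)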
This works well if observations have a central tendency, but gives unrealistic results when there are sharp boundaries (Try plot(density(runif(1000))) for a prime example).
When you have a very high density of points close to zero, but none below zero, the left tails of all the normal kernels "spill over" into the negative values, giving a Gaussian-type left tail that doesn't match the theoretical density.
This means that if you have a sharp boundary at 0, you should remove values of your simulated density that are between zero and about two standard deviations of your smoothing kernel - anything below this will be misleading.
Since we can control the standard deviation of our smoothing kernel with the bw parameter of density, and easily control which x values are plotted using ggplot, we will get a more sensible result by doing something like this:
library(ggplot2)
ggplot(as.data.frame(density(val, bw = 0.1)[c("x", "y")]), aes(x, y)) +
geom_line(aes(col = "Simulated"), na.rm = TRUE) +
geom_function(fun = ~ df(.x, df1 = 1, df2 = 49),
aes(col = "Theoretical"), lty = 2) +
lims(x = c(0.2, 12)) +
theme_classic(base_size = 16) +
labs(title = expression(paste("Density plot (",italic(n),"=1000, ",
italic(df)[1],"=1, ",italic(df)[2],"=49)")),
x = "Observation", y = "") +
scale_color_manual(values = c("black", "red"), name = "")
The kde1d and logspline packages are not bad for such densities.
sims <- rf(1500, 1, 49)
library(kde1d)
kd <- kde1d(sims, bw = 1, xmin = 0)
plot(kd, col = "red", xlim = c(0, 2), ylim = c(0, 2))
curve(df(x, 1, 49), add = TRUE)
library(logspline)
fit <- logspline(sims, lbound = 0, knots = c(0, 0.5, 1, 1.5, 2))
plot(fit, col = "red", xlim = c(0, 2), ylim = c(0, 2))
curve(df(x, 1, 49), add = TRUE)
I am trying to compare two measurement methods with a Bland-Altman plot, which is basically this:
method.1 <- rnorm(20)
method.2 <- rnorm(20)
plot((method.1 + method.2)/2, method.1 - method.2)
I've found a package that I like:
devtools::install_github("deepankardatta/blandr")
library(blandr)
blandr.draw(method.1, method.2, plotter = "rplot")
Which gives me the following result:
Bland-Altman plot with blandr package
The upper band is Mean + 1.96 SD (+/- 95% CI)
The lower band is Mean - 1.96 SD (+/- 95% CI)
The middle band is Mean +/- 95% CI
I like the way it is, although I wish I could change the bands' colours, the line types, and the point shapes, or include a legend.
I wish I could override the blandr.draw() function, or just create my own plot (same as blandr.draw()) using base R so I can customize it the way I want. I failed to contact the package author...
Additionally, a ggplot version of a similar plot (as produced by blandr.draw(method.1, method.2)) would be appreciated.
So here is my self-made Bland-Altman plot - maybe it will be useful for others.
Sample Bland-Altman plot
All calculations (limits of agreement and 95% confidence intervals) are based on Bland and Altman's 1999 paper: Measuring agreement in method comparison studies.
I still don't know how to shade the bands between the confidence intervals; rect() can probably do it (see the sketch after the code below).
# Sample data:
method.1 <- rnorm(100)
method.2 <- rnorm(100)
df <- data.frame(
X = (method.1 + method.2)/2,
Y = (method.1 - method.2)
)
# Number of measurements to calculate degrees of freedom for t-distribution:
n = length(df$Y)
t = qt(0.975, df = n - 1) # t-distribution
mean <- mean(df$Y)
LoA <- 1.96*sd(df$Y) # Lines of Agreement
# 95% Confidence Intervals:
LoA_CI <- t * sqrt( (1/n + 3.8416/(2*(n - 1))) ) * sd(df$Y)
mean_CI <- t * sd(df$Y)/sqrt(n)
# To calculate position of partition lines:
LoA_up_plus <- mean + LoA + LoA_CI
LoA_up <- mean + LoA
LoA_up_minus <- mean + LoA - LoA_CI
mean_plus <- mean + mean_CI
mean_minus <- mean - mean_CI
LoA_down_plus <- mean - LoA + LoA_CI
LoA_down <- mean - LoA
LoA_down_minus <- mean - LoA - LoA_CI
# Save PNG file:
png(filename = "BA_norm.png",
width = 3000, height = 2100, units = "px", res = 300)
# Plot:
plot(Y ~ X, df,
# When I have a lot of data my points are overlapping each other
# that's why I make them semi-transparent with 'alpha':
col = rgb(0, 0, 0, alpha = 0.5), pch = 16, cex = 0.75,
main = "Bland-Altman plot for Mathod 1 and Method 2",
xlab = "Mean of results",
ylab = "Method 1 - Method 2 difference"
)
# Background colour for your plot, if you don't want it
# just skip following four lines of code:
rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4],
col = "#c2f0f0") #here you can put desired background colour hex
points(Y ~ X, df,
col = rgb(0, 0, 0, alpha = 0.5), pch = 16, cex = 0.75)
# Adding lines:
abline(h = 0, lwd = 0.7) # solid line for Y = 0
# Display rounded values of partition lines positions:
text(x = 1.5, y = LoA_up_plus, # x and y position of text
paste(round(LoA_up, 2), "\u00B1", round(LoA_CI, 2)), pos = 1)
abline(h = LoA_up_plus, col = "#68cbf8", lty = "dotted")
abline(h = LoA_up, col = "blue", lty = "dashed")
abline(h = LoA_up_minus, col = "#68cbf8", lty = "dotted")
text(x = 1.5, y = mean_plus,
paste(round(mean, 2), "\u00B1", round(mean_CI, 2)), pos = 3)
abline(h = mean_plus, col = "#ff9e99", lty = "dotted")
abline(h = mean, col = "red", lty = "longdash")
abline(h = mean_minus, col = "#ff9e99", lty = "dotted")
text(x = 1.5, y = LoA_down_plus,
paste(round(LoA_down, 2), "\u00B1", round(LoA_CI, 2)), pos = 3)
abline(h = LoA_down_plus, col = "#68cbf8", lty = "dotted")
abline(h = LoA_down, col = "blue", lty = "dashed")
abline(h = LoA_down_minus, col = "#68cbf8", lty = "dotted")
# Close saving PNG file function:
dev.off()
I guess it is possible to condense all those abline() calls.
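As for the shaded bands, a minimal sketch with rect(): insert these lines right after the background rect() call and before the second points() call, so the bands sit behind the data. It assumes the LoA_* and mean_* objects computed above; the 8-digit hex codes are just the band colours with added transparency:
usr <- par("usr")  # current axis limits of the open plot
rect(usr[1], LoA_up_minus, usr[2], LoA_up_plus, col = "#68cbf855", border = NA)
rect(usr[1], mean_minus, usr[2], mean_plus, col = "#ff9e9955", border = NA)
rect(usr[1], LoA_down_minus, usr[2], LoA_down_plus, col = "#68cbf855", border = NA)
The abline() calls can also be condensed by passing vectors of heights, e.g.:
abline(h = c(LoA_up_plus, LoA_up_minus, LoA_down_plus, LoA_down_minus), col = "#68cbf8", lty = "dotted")
abline(h = c(LoA_up, LoA_down), col = "blue", lty = "dashed")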
I want to compute a posterior density plot with a conjugate prior.
I have data with known parameters (mean = 30, sd = 10).
I have two priors: one is a normal distribution with known parameters (mean = 10, sd = 5), and the other is a t distribution with the same mean and sd but 4 degrees of freedom.
I want a graph with density plots for the prior, the data, and the posterior.
Can you help me with R code for this problem?
Also, I think I am getting the wrong density function for the posterior. Here is my code so far:
x=seq(from=-90, to=90, by= 1)
data=dnorm(x,mean=30,sd =10)
prior = dnorm(x,mean=10,sd =5)
posterior = dnorm(x,mean=10,sd =5)*dnorm(x,mean=30,sd =10) # prior * data
plot(x,data , type="l", col="blue")
lines(x,prior, type="l", col="red")
lines(x,posterior , type="l", col="green")
You need to add the two distributions together, not multiply them. I attach an example below that uses equal weights for the two distributions:
x <- seq(from = -90, to = 90, by = 1)
data <- dnorm(x, mean = 30, sd = 10)
prior <- dnorm(x, mean = 10, sd = 5)
posterior <- 0.5 * dnorm(x, mean = 10, sd = 5) + 0.5 * dnorm(x, mean = 30, sd = 10)
plot(x, prior, type = "l", col = "red")
lines(x, posterior, type = "l", col = "green")
lines(x, data , type = "l", col = "blue")