I am trying to compare two measurement methods with a Bland-Altman plot, which is basically this:
method.1 <- rnorm(20)
method.2 <- rnorm(20)
plot((method.1 + method.2)/2, method.1 - method.2)
I've found a package that I like:
devtools::install_github("deepankardatta/blandr")
library(blandr)
blandr.draw(method.1, method.2, plotter = "rplot")
Which gives me the following result:
[Image: Bland-Altman plot with blandr package]
The upper band is Mean + 1.96 SD (+/- 95% CI)
The lower band is Mean - 1.96 SD (+/- 95% CI)
The middle band is Mean +/- 95% CI
I like the way it looks, although I wish I could change the band colours, line types and point shapes, or include a legend.
I would like to either override the blandr.draw() function or create my own plot (the same as blandr.draw()) in base R so I can customize it the way I want. I was unable to contact the package author.
Additionally, a ggplot version of a similar plot (blandr.draw(method.1, method.2)) would be appreciated.
So here is my self-made Bland-Altman plot - maybe it will be useful for others.
[Image: Sample Bland-Altman plot]
All calculations (Lines of agreement and 95% Confidence Intervals) are based on Bland and Altman's 1999 paper, Measuring agreement in method comparison studies.
I still don't know how to shade the bands between the Confidence Intervals; probably with the rect() function.
# Sample data:
method.1 <- rnorm(100)
method.2 <- rnorm(100)
df <- data.frame(
X = (method.1 + method.2)/2,
Y = (method.1 - method.2)
)
# Number of measurements to calculate degrees of freedom for t-distribution:
n = length(df$Y)
t = qt(0.975, df = n - 1) # t-distribution
mean <- mean(df$Y)
LoA <- 1.96*sd(df$Y) # Lines of Agreement
# 95% Confidence Intervals:
LoA_CI <- t * sqrt( (1/n + 3.8416/(2*(n - 1))) ) * sd(df$Y)
mean_CI <- t * sd(df$Y)/sqrt(n)
# To calculate position of partition lines:
LoA_up_plus <- mean + LoA + LoA_CI
LoA_up <- mean + LoA
LoA_up_minus <- mean + LoA - LoA_CI
mean_plus <- mean + mean_CI
mean_minus <- mean - mean_CI
LoA_down_plus <- mean - LoA + LoA_CI
LoA_down <- mean - LoA
LoA_down_minus <- mean - LoA - LoA_CI
# Save PNG file:
png(filename = "BA_norm.png",
width = 3000, height = 2100, units = "px", res = 300)
# Plot:
plot(Y ~ X, df,
# When I have a lot of data my points are overlapping each other
# that's why I make them semi-transparent with 'alpha':
col = rgb(0, 0, 0, alpha = 0.5), pch = 16, cex = 0.75,
main = "Bland-Altman plot for Method 1 and Method 2",
xlab = "Mean of results",
ylab = "Method 1 - Method 2 difference"
)
# Background colour for your plot, if you don't want it
# just skip following four lines of code:
rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4],
col = "#c2f0f0") #here you can put desired background colour hex
points(Y ~ X, df,
col = rgb(0, 0, 0, alpha = 0.5), pch = 16, cex = 0.75)
# Adding lines:
abline(h = 0, lwd = 0.7) # solid line for Y = 0
# Display rounded values of partition lines positions:
text(x = 1.5, y = LoA_up_plus, # x and y position of text
paste(round(LoA_up, 2), "\u00B1", round(LoA_CI, 2)), pos = 1)
abline(h = LoA_up_plus, col = "#68cbf8", lty = "dotted")
abline(h = LoA_up, col = "blue", lty = "dashed")
abline(h = LoA_up_minus, col = "#68cbf8", lty = "dotted")
text(x = 1.5, y = mean_plus,
paste(round(mean, 2), "\u00B1", round(mean_CI, 2)), pos = 3)
abline(h = mean_plus, col = "#ff9e99", lty = "dotted")
abline(h = mean, col = "red", lty = "longdash")
abline(h = mean_minus, col = "#ff9e99", lty = "dotted")
text(x = 1.5, y = LoA_down_plus,
paste(round(LoA_down, 2), "\u00B1", round(LoA_CI, 2)), pos = 3)
abline(h = LoA_down_plus, col = "#68cbf8", lty = "dotted")
abline(h = LoA_down, col = "blue", lty = "dashed")
abline(h = LoA_down_minus, col = "#68cbf8", lty = "dotted")
# Close saving PNG file function:
dev.off()
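The shading mentioned earlier can indeed be done with rect(). The following is a minimal sketch under some assumptions: it recomputes the same quantities on fresh simulated data, the helper name shade_band() and the fill colours are illustrative and not part of the original plot, and the bands are drawn before the points so they do not cover them.

```r
# Sketch: shade the 95% CI bands of a Bland-Altman plot with rect().
set.seed(1)                       # assumption: seed added for reproducibility
method.1 <- rnorm(100)
method.2 <- rnorm(100)
avg <- (method.1 + method.2) / 2
dif <- method.1 - method.2

n       <- length(dif)
t_crit  <- qt(0.975, df = n - 1)
bias    <- mean(dif)
s       <- sd(dif)
LoA     <- 1.96 * s
LoA_CI  <- t_crit * sqrt(1/n + 1.96^2 / (2 * (n - 1))) * s
mean_CI <- t_crit * s / sqrt(n)

# Hypothetical helper: fill the horizontal band [centre - half, centre + half]
# across the full width of the current plot region.
shade_band <- function(centre, half, col) {
  usr <- par("usr")
  rect(usr[1], centre - half, usr[2], centre + half, col = col, border = NA)
}

# Set up an empty plot first so that par("usr") is defined.
plot(avg, dif, type = "n",
     ylim = range(dif, bias + c(-1, 1) * (LoA + LoA_CI)),
     xlab = "Mean of results", ylab = "Method 1 - Method 2 difference")
shade_band(bias + LoA, LoA_CI,  rgb(0, 0, 1, alpha = 0.15))
shade_band(bias,       mean_CI, rgb(1, 0, 0, alpha = 0.15))
shade_band(bias - LoA, LoA_CI,  rgb(0, 0, 1, alpha = 0.15))
points(avg, dif, pch = 16, cex = 0.75, col = rgb(0, 0, 0, alpha = 0.5))
abline(h = bias + c(-1, 0, 1) * LoA,
       col = c("blue", "red", "blue"), lty = "dashed")
```

Drawing the rectangles with border = NA keeps the band edges from competing with the dashed lines.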
I guess it is possible to easily condense all those abline() functions.
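One way to condense them: abline() accepts a vector for h, and col and lty are recycled along it, so the nine calls can collapse into one. This is a sketch with made-up line positions; the numbers below are placeholders, not results from the data above, and the name ba_lines is illustrative.

```r
# Placeholder values standing in for mean, LoA, LoA_CI and mean_CI above.
bias <- 0.1; LoA <- 2.0; LoA_CI <- 0.35; mean_CI <- 0.2

# One row per horizontal line: position, colour and line type.
ba_lines <- data.frame(
  h   = c(bias + LoA + c(1, 0, -1) * LoA_CI,
          bias + c(1, 0, -1) * mean_CI,
          bias - LoA + c(1, 0, -1) * LoA_CI),
  col = c("#68cbf8", "blue", "#68cbf8",
          "#ff9e99", "red",  "#ff9e99",
          "#68cbf8", "blue", "#68cbf8"),
  lty = rep(c("dotted", "dashed", "dotted"), times = 3),
  stringsAsFactors = FALSE
)
ba_lines$lty[5] <- "longdash"     # the mean line uses "longdash" in the original

# Empty plot, then all nine lines in a single call.
plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-3, 3),
     xlab = "Mean of results", ylab = "Difference")
abline(h = ba_lines$h, col = ba_lines$col, lty = ba_lines$lty)
```

Keeping the line styles in a data frame also makes it straightforward to build a legend() from the same columns.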
I am comparing two treatments, A and B. The objective is to show that A is not inferior to B. The non-inferiority margin is delta = -2.
After comparing Treatment A - Treatment B, I have these results:
Mean difference and 95% CI = -0.7 [-2.1, 0.8]
I would like to plot this either with a package or manually. I have no idea how to do it.
Welch Two Sample t-test
data: mydata$outcome[mydata$traitement == "Bras S"] and mydata$outcome[mydata$traitement == "B"]
t = 0.88938, df = 258.81, p-value = 0.3746
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.133224 0.805804
sample estimates:
mean of x mean of y
8.390977 9.054688
I want to create this kind of plot:
You could extract the relevant data from the t.test results and then plot in base R, using segments and points to plot the data and abline to draw the relevant vertical lines. Since no reproducible data were provided, I made some up, but the process is generally the same.
#sample data
set.seed(123)
tres <- t.test(runif(10), runif(10))
# get values to plot from t test results
ci <- tres$conf.int
ests <- tres$estimate[1] - tres$estimate[2]
# plot
plot(x = ci, ylim = c(0,2), xlim = c(-4, 4), type = "n", # blank plot
bty = "n", xlab = "Treatment A - Treatment B", ylab = "",
axes = FALSE)
points(x = ests, y = 1, pch = 20) # dot for point estimate
segments(x0 = ci[1], x1 = ci[2], y0 = 1) #CI line
abline(v = 0, lty = 2) # vertical line, dashed
abline(v = 2, lty = 1, col = "darkblue") # vertical line, solid, blue
axis(1, col = "darkblue") # add in x axis, blue
EDIT:
If you wanted to more accurately recreate your figure with the x axis in descending order and using your statement "Mean difference and 95% CI = -0.7 [-2.1, 0.8]", you can do the following manipulations to the above approach:
diff <- -0.7
ci <- c(-2.1, 0.8)
# plot
plot(1, xlim = c(-4, 4), type = "n",
bty = "n", xlab = "Treatment A - Treatment B", ylab = "",
axes = FALSE)
points(x = -diff, y = 1, pch = 20)
segments(x0 = -ci[2], x1 = -ci[1], y0 = 1)
abline(v = 0, lty = 2)
abline(v = 2, lty = 1, col = "darkblue")
axis(1, at = seq(-4,4,1), labels = seq(4, -4, -1), col = "darkblue")
I have created the following fan chart using the fanplot package. I'm trying to add axis ticks and labels to the y axis, but it only shows the decimals and not the full number. I'm looking for a way to display the full number (e.g. 4.59 and 4.61) on the y axis.
I am also unsure how to specify the breaks and the number of decimal places for the y-axis labels using plot(). In ggplot2 it would look something like scale_y_continuous(breaks = seq(min(data.ts$Index),max(data.ts$Index),by=0.02)). Any ideas on how to specify the breaks on the y axis, as well as the number of decimal places, using the base plot() function in R?
Here is a reproducible version of my dataset data.ts:
data.ts <- structure(c(4.6049904235401, 4.60711076016453, 4.60980084146652,
4.61025389170935, 4.60544515681515, 4.60889021700954, 4.60983993107244,
4.61091608826696, 4.61138799159174, 4.61294431148318, 4.61167545843765,
4.61208284263432, 4.61421991328081, 4.61530485425155, 4.61471465043043,
4.6155992084451, 4.61195799200607, 4.61178486640435, 4.61037927954796,
4.60744590947049, 4.59979957741728, 4.59948551500254, 4.60078678080182,
4.60556092645471, 4.60934962087565, 4.60981147563749, 4.61060477704678,
4.61158365084251, 4.60963435263623, 4.61018215733317, 4.61209710959768,
4.61231368335184, 4.61071363571141, 4.61019496497916, 4.60948652606191,
4.61068813487859, 4.6084092003352, 4.60972706132393, 4.60866915174087,
4.61192565195909, 4.60878767339377, 4.61341471281265, 4.61015272152397,
4.6093479714315, 4.60750965935653, 4.60768790690338, 4.60676463096309,
4.60746490411374, 4.60885670935448, 4.60686846708382, 4.60688947889575,
4.60867708110485, 4.60448791268212, 4.60387348166032, 4.60569806689426,
4.6069320880709, 4.6087143894128, 4.61059688801283, 4.61065399116698,
4.61071421014339), .Tsp = c(2004, 2018.75, 4), class = "ts")
And here is a reproducible version of the code I'm using:
# # Install and Load Packages
## pacman::p_load(forecast,fanplot,tidyverse,tsbox,lubridate,readxl)
# Create an ARIMA Model using the auto.arima function
model <- auto.arima(data.ts)
# Simulate forecasts for 4 quarters (1 year) ahead
forecasts <- simulate(model, n=4)
# Create a data frame with the parameters needed for the uncertainty forecast
table <- ts_df(forecasts) %>%
rename(mode=value) %>%
mutate(time0 = rep(2019,4)) %>%
mutate(uncertainty = sd(mode)) %>%
mutate(skew = rep(0,4))
y0 <- 2019
k <- nrow(table)
# Set Percentiles
p <- seq(0.05, 0.95, 0.05)
p <- c(0.01, p, 0.99)
# Simulate a qsplitnorm distribution
fsval <- matrix(NA, nrow = length(p), ncol = k)
for (i in 1:k)
fsval[, i] <- qsplitnorm(p, mode = table$mode[i],
sd = table$uncertainty[i],
skew = table$skew[i])
# Create Plot
plot(data.ts, type = "l", col = "#75002B", lwd = 4,
xlim = c(y0 - 2,y0 + 0.75), ylim = range(fsval, data.ts),
xaxt = "n", yaxt = "n", ylab = "",xlab='',
main = '')
title(ylab = 'Log AFSI',main = 'Four-Quarter Ahead Forecast Fan - AFSI',
xlab = 'Date')
rect(y0 - 0.25, par("usr")[3] - 1, y0 + 2, par("usr")[4] + 1,
border = "gray90", col = "gray90")
fan(data = fsval, data.type = "values", probs = p,
start = y0, frequency = 4,
anchor = data.ts[time(data.ts) == y0 - .25],
fan.col = colorRampPalette(c("#75002B", "pink")),
ln = NULL, rlab = NULL)
# Add axis labels and ticks
axis(1, at = (y0 - 2):(y0 + 2), tcl = 0.5)  # parentheses needed: ':' binds tighter than '-' and '+'
axis(1, at = seq(y0-2, y0 + 2, 0.25), labels = FALSE, tcl = 0.25)
abline(v = y0 - 0.25, lty = 1)
abline(v = y0 + 0.75, lty = 2)
axis(2, at = range(fsval, data.ts), las = 2, tcl = 0.5)
range(blah) will only return two values (the minimum and maximum). The at parameter of axis() requires a sequence of points at which you require axis labels. Hence, these are the only two y values you have on your plot. Take a look at using pretty(blah) or seq(min(blah), max(blah), length.out = 10).
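A small sketch of that suggestion, combining pretty() for the tick positions with sprintf() for the number of decimals. The toy series y below is made up to sit on a similar scale to data.ts; it is not the asker's data, and the names y and ticks are illustrative.

```r
# Toy series on a similar scale to data.ts (values are made up).
y <- ts(c(4.605, 4.607, 4.610, 4.612, 4.608, 4.600, 4.606, 4.611),
        start = c(2017, 1), frequency = 4)

op <- par(mar = c(5, 5, 4, 2) + 0.1)  # wider left margin for long labels
plot(y, yaxt = "n", ylab = "")

# pretty() returns evenly spaced "round" tick positions covering the range;
# sprintf() fixes the number of decimals shown in the labels.
ticks <- pretty(range(y), n = 5)
axis(2, at = ticks, labels = sprintf("%.3f", ticks), las = 2)
par(op)                               # restore the previous margins
```

For a fixed step instead of "round" values, seq(min(y), max(y), by = 0.002) could replace the pretty() call.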
The suggestions of @Feakster are worth looking at, but the problem here is that the y-axis margin isn't wide enough. You could do either of two things. First, you could round the labels so they fit within the margin; for example, you could replace this
axis(2, at = range(fsval, data.ts), las = 2, tcl = 0.5)
with this
axis(2, at = range(fsval, data.ts),
labels = sprintf("%.3f", range(fsval, data.ts)), las = 2, tcl = 0.5)
Or, alternatively you could increase the y-axis margin before you make the plot by specifying:
par(mar=c(5,5,4,2)+.1)
plot(data.ts, type = "l", col = "#75002B", lwd = 4,
xlim = c(y0 - 2,y0 + 0.75), ylim = range(fsval, data.ts),
xaxt = "n", yaxt = "n", ylab = "",xlab='',
main = '')
Then everything below that should work. The mar element of par() sets the margin size, in lines of text, on the bottom, left, top and right sides of the plot. The default is c(5, 4, 4, 2) + 0.1.
I'm trying to figure out how to modify a scatter-plot that contains two groups of data along a continuum separated by a large gap. The graph needs a break on the x-axis as well as on the regression line.
This R code using the ggplot2 library accurately presents the data, but is unsightly due to the vast amount of empty space on the graph. Pearson's correlation is -0.1380438.
library(ggplot2)
p <- ggplot(, aes(x = dis, y = result[, 1])) + geom_point(shape = 1) +
xlab("X-axis") +
ylab("Y-axis") + geom_smooth(color = "red", method = "lm", se = F) + theme_classic()
p + theme(plot.title = element_text(hjust = 0.5, size = 14))
This R code uses gap.plot to produce the breaks needed, but the regression line doesn't contain a break and doesn't reflect the slope properly. As you can see, the slope of the regression line isn't as sharp as the graph above and there needs to be a visible distinction in the slope of the line between those disparate groups.
library(plotrix)
gap.plot(
x = dis,
y = result[, 1],
gap = c(700, 4700),
gap.axis = "x",
xlab = "X-Axis",
ylab = "Y-Axis",
xtics = seq(0, 5575, by = 200)
)
abline(v = seq(700, 733) , col = "white")
abline(lm(result[, 1] ~ dis), col = "red", lwd = 2)
axis.break(1, 716, style = "slash")
Using MS Paint, I created an approximation of what the graph should look like. Notice the break marks at the top as well as the discontinuity in the regression line between the two groups.
One solution is to plot the regression line in two pieces, using ablineclip to limit what's plotted each time. (Similar to @tung's suggestion, although it's clear that you want the appearance of a single graph rather than the appearance of facets.) Here's how that would work:
library(plotrix)
# Simulate some data that looks roughly like the original graph.
dis = c(rnorm(100, 300, 50), rnorm(100, 5000, 100))
result = c(rnorm(100, 0.6, 0.1), rnorm(100, 0.5, 0.1))
# Store the location of the gap so we can refer to it later.
x.axis.gap = c(700, 4700)
# gap.plot() works internally by shifting the location of the points to be
# plotted based on the gap size/location, and then adjusting the axis labels
# accordingly. We'll re-compute the second half of the regression line in the
# same way; these are the new values for the x-axis.
dis.alt = dis - diff(x.axis.gap)  # shift by the gap size, as gap.plot() does
# Plot (same as before).
gap.plot(
x = dis,
y = result,
gap = x.axis.gap,
gap.axis = "x",
xlab = "X-Axis",
ylab = "Y-Axis",
xtics = seq(0, 5575, by = 200)
)
abline(v = seq(700, 733), col = "white")
axis.break(1, 716, style = "slash")
# Add regression line in two pieces: from 0 to the start of the gap, and from
# the end of the gap to infinity.
ablineclip(lm(result ~ dis), col = "red", lwd = 2, x2 = x.axis.gap[1])
ablineclip(lm(result ~ dis.alt), col = "red", lwd = 2, x1 = x.axis.gap[1] + 33)
I have created a barplot using barplot() and now want to show the linear trend. I use abline(), but the trend line does not show in the figure. I wonder what the problem is. Thanks.
set.seed(100)
Mydata=rnorm(65)
Year=1950:2014
barplot(Mydata)
fit=lm(Mydata~Year)
abline(fit)
As @G5W points out, the fix is fit=lm(Mydata~I(Year-1950)). But the new problem is that the trend line is too "long". As shown in the second figure, the trend line extends beyond the barplot. Is there any advice?
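The underlying reason is that barplot() places the bars at its own x positions (the bar midpoints, which it returns invisibly), not at the Year values. A possible sketch that fits the trend against Year and then draws the fitted values at those midpoints with lines() instead of abline(), so the line stops at the first and last bar. The names bar_mid and trend are illustrative.

```r
set.seed(100)
Mydata <- rnorm(65)
Year   <- 1950:2014

# barplot() returns the x coordinates of the bar midpoints.
bar_mid <- barplot(Mydata, names.arg = Year)

# Fit against Year, then draw the fitted values at the bar midpoints.
# The midpoints are evenly spaced, so the fitted values still form a straight line.
fit   <- lm(Mydata ~ Year)
trend <- fitted(fit)
lines(bar_mid, trend, col = "red", lwd = 2)
```

Because the model is fitted on Year itself, the coefficients keep their usual interpretation (change per year).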
If you can use ggplot:
library(ggplot2)
df <- data.frame(Mydata, Year)
ggplot(df, aes(x = Year, y = Mydata)) +
geom_bar(stat = "identity") +
geom_smooth(method = "lm")
To expand on @bouncyball's comment, use a larger line width (lwd) so that the vertical lines resemble a barplot, if you want.
plot(Year, Mydata, type = 'h',lwd=5,col = "grey")
abline(fit, lty =2)
EDIT
First copy this function
barplot2 <- function(x, y, lty = 1, lwd = 1, col = "grey", border = "black"){
w = ((max(x) - min(x))/length(x)) * 0.75
plot(x, y, type = 'p', pch = NA, yaxt = "n", xaxt = "n", xlab = "", ylab = "")
for (i in 1:length(x)){
x1 = x[i] - w/2
x2 = x[i] + w/2
y1 = 0
y2 = y[i]
polygon(x = c(x1,x2,x2,x1), y = c(y1,y1,y2,y2), lty = lty, lwd = lwd, col = col, border = border)
}
}
Then make the barplot
barplot2(Year,Mydata)
Then add the trend line with ablineclip() from the plotrix library
library(plotrix)
ablineclip(fit, x1 = min(Year), x2 = max(Year), y1 = min(Mydata), y2 = max(Mydata))
I want to create 3 plots for illustration purposes:
- normal distribution
- right skewed distribution
- left skewed distribution
This should be an easy task, but I found only this link, which only shows a normal distribution. How do I do the rest?
If you are not too tied to normal, then I suggest you use beta distribution which can be symmetrical, right skewed or left skewed based on the shape parameters.
hist(rbeta(10000,5,2)) # left skewed
hist(rbeta(10000,2,5)) # right skewed
hist(rbeta(10000,5,5)) # symmetric
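To check which shape parameters give which skew, the theoretical densities can be drawn side by side with curve() and the sample skewness computed. This is only a sketch: the skewness() helper below is a hand-rolled moment estimator defined here for illustration, not a base R function, and the panel titles are my own labels.

```r
set.seed(42)

# Hand-rolled sample skewness (third standardized moment); illustrative helper.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

shapes <- list("Left skewed"  = c(5, 2),
               "Symmetric"    = c(5, 5),
               "Right skewed" = c(2, 5))

# Draw the three beta densities in one row.
op <- par(mfrow = c(1, 3))
for (nm in names(shapes)) {
  a <- shapes[[nm]][1]
  b <- shapes[[nm]][2]
  curve(dbeta(x, a, b), from = 0, to = 1, main = nm, ylab = "Density")
}
par(op)

# Negative = left skewed, about zero = symmetric, positive = right skewed.
skews <- sapply(shapes, function(s) skewness(rbeta(10000, s[1], s[2])))
round(skews, 2)
```

Using curve() gives smooth illustration-quality shapes, while the simulated skewness values confirm the direction of each tail.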
I finally got it working with help from both of you, relying on this site.
N <- 10000
x <- rnbinom(N, 10, .5)
hist(x,
xlim=c(min(x),max(x)), probability=TRUE, nclass=max(x)-min(x)+1,
col='lightblue', xlab=' ', ylab=' ', axes=FALSE,
main='Positive Skewed')
lines(density(x,bw=1), col='red', lwd=3)
This is also a valid solution:
curve(dbeta(x,8,4),xlim=c(0,1))
title(main="Posterior distribution of p")
Just use the fGarch package and these functions:
dsnorm(x, mean = 0, sd = 1, xi = 1.5, log = FALSE)
psnorm(q, mean = 0, sd = 1, xi = 1.5)
qsnorm(p, mean = 0, sd = 1, xi = 1.5)
rsnorm(n, mean = 0, sd = 1, xi = 1.5)
Here mean is the location parameter, sd is the scale parameter, and xi is the skewness parameter.
Examples
library(fGarch)
## snorm -
# Random numbers:
par(mfrow = c(2, 2))
set.seed(1953)
r = rsnorm(n = 1000)
plot(r, type = "l", main = "snorm", col = "steelblue")
# Plot empirical density and compare with true density:
hist(r, n = 25, probability = TRUE, border = "white", col = "steelblue")
box()
x = seq(min(r), max(r), length = 201)
lines(x, dsnorm(x), lwd = 2)
# Plot df and compare with true df:
plot(sort(r), (1:1000/1000), main = "Probability", col = "steelblue",
ylab = "Probability")
lines(x, psnorm(x), lwd = 2)
# Compute quantiles:
round(qsnorm(psnorm(q = seq(-1, 5, by = 1))), digits = 6)