I need to calculate the area between two curves: a country’s GDP per capita and its GDP trend. I tried the integrate function in the code below, but the calculated area is not accurate.
My code includes unnecessary area before and after the intersection points. I only need the area where the GDP curve is below the GDP trend curve (1).
GDP_GR <- ts(GR, start = c(2000, 1), frequency = 4)
gdp_gr <- log(GDP_GR)
y.pot_gr <- hpfilter(gdp_gr, freq = 1600)$trend
ts.plot(gdp_gr)
y.pot_gr
lines(y.pot_gr, col = "blue")
# area:
y.c_gr <- window(gdp_gr, start = c(2020, 1), end = c(2021, 1))
y.pot.c_gr <- window(y.pot_gr, start = c(2020, 1), end = c(2021, 1))
n <- length(y.c_gr)
x <- seq(1, n, 1)
plot(x, y.c_gr, type = "l", lwd = 1.5,
ylim = c(min(c(y.c_gr, y.pot.c_gr)), max(c(y.c_gr, y.pot.c_gr))),
xlab = "Time", ylab = "GDP", xaxt = "n")
lines(x, y.pot.c_gr, lty = 2, lwd = 1.5)
axis(1, at = 1:n, labels = ENTRY[81:85])
polygon(c(x, rev(x)),c(y.c_gr, rev(y.pot.c_gr)),
col = "lightgrey", border = NA)
function_1 <- approxfun(x, y.c_gr - y.pot.c_gr)
function_2 <- function(x) { abs(function_1(x)) }
integrate(function_2, 1, n)
How can I improve my code?
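One possible way to keep only the region where GDP lies below trend is to integrate the positive part of (trend minus GDP) rather than the absolute difference. The following is a minimal sketch under assumptions: x, gdp and trend are made-up stand-ins for x, y.c_gr and y.pot.c_gr, and shortfall is a hypothetical helper name.

```r
# Hypothetical stand-in data: five quarterly points for each curve
x     <- 1:5
gdp   <- c(2.0, 1.0, 1.0, 2.0, 3.0)
trend <- c(1.5, 1.5, 1.5, 1.5, 1.5)

# Linear interpolation of the gap; positive wherever GDP is below trend
gap <- approxfun(x, trend - gdp)

# Keep only the positive part, so stretches where GDP is above trend add nothing
shortfall <- function(t) pmax(gap(t), 0)

area <- integrate(shortfall, min(x), max(x), subdivisions = 1000L)$value
print(area)
```

Because the interpolated gap crosses zero between observations, the area is cut off at the interpolated intersection points rather than at the nearest quarter.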
I am trying to compare two measurement methods with a Bland-Altman plot, which is basically this:
method.1 <- rnorm(20)
method.2 <- rnorm(20)
plot((method.1 + method.2)/2, method.1 - method.2)
I've found a package that I like:
devtools::install_github("deepankardatta/blandr")
library(blandr)
blandr.draw(method.1, method.2, plotter = "rplot")
Which gives me the following result:
[Image: Bland-Altman plot with blandr package]
The upper band is Mean + 1.96 SD (+/- 95% CI)
The lower band is Mean - 1.96 SD (+/- 95% CI)
The middle band is Mean +/- 95% CI
I like the way it looks, although I wish I could change the band colours, line types and point shapes, or add a legend.
I would like to either override the blandr.draw() function or create my own plot (the same as blandr.draw()) in base R so that I can customize it the way I want. I failed to contact the package author...
Additionally, a ggplot version of a similar plot (blandr.draw(method.1, method.2)) would be appreciated.
So here is my self-made Bland-Altman plot - maybe it will be useful for others.
[Image: Sample Bland-Altman plot]
All calculations (lines of agreement and 95% confidence intervals) are based on Bland and Altman's 1999 paper, Measuring agreement in method comparison studies.
I still don't know how to shade bands between Confidence Intervals - probably with rect() function.
# Sample data:
method.1 <- rnorm(100)
method.2 <- rnorm(100)
df <- data.frame(
X = (method.1 + method.2)/2,
Y = (method.1 - method.2)
)
# Number of measurements to calculate degrees of freedom for t-distribution:
n = length(df$Y)
t = qt(0.975, df = n - 1) # t-distribution
mean <- mean(df$Y)
LoA <- 1.96*sd(df$Y) # Lines of Agreement
# 95% Confidence Intervals:
LoA_CI <- t * sqrt( (1/n + 3.8416/(2*(n - 1))) ) * sd(df$Y)
mean_CI <- t * sd(df$Y)/sqrt(n)
# To calculate position of partition lines:
LoA_up_plus <- mean + LoA + LoA_CI
LoA_up <- mean + LoA
LoA_up_minus <- mean + LoA - LoA_CI
mean_plus <- mean + mean_CI
mean_minus <- mean - mean_CI
LoA_down_plus <- mean - LoA + LoA_CI
LoA_down <- mean - LoA
LoA_down_minus <- mean - LoA - LoA_CI
# Save PNG file:
png(filename = "BA_norm.png",
width = 3000, height = 2100, units = "px", res = 300)
# Plot:
plot(Y ~ X, df,
# When I have a lot of data my points are overlapping each other
# that's why I make them semi-transparent with 'alpha':
col = rgb(0, 0, 0, alpha = 0.5), pch = 16, cex = 0.75,
main = "Bland-Altman plot for Method 1 and Method 2",
xlab = "Mean of results",
ylab = "Method 1 - Method 2 difference"
)
# Background colour for your plot, if you don't want it
# just skip following four lines of code:
rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4],
col = "#c2f0f0") #here you can put desired background colour hex
points(Y ~ X, df,
col = rgb(0, 0, 0, alpha = 0.5), pch = 16, cex = 0.75)
# Adding lines:
abline(h = 0, lwd = 0.7) # solid line for Y = 0
# Display rounded values of partition lines positions:
text(x = 1.5, y = LoA_up_plus, # x and y position of text
paste(round(LoA_up, 2), "\u00B1", round(LoA_CI, 2)), pos = 1)
abline(h = LoA_up_plus, col = "#68cbf8", lty = "dotted")
abline(h = LoA_up, col = "blue", lty = "dashed")
abline(h = LoA_up_minus, col = "#68cbf8", lty = "dotted")
text(x = 1.5, y = mean_plus,
paste(round(mean, 2), "\u00B1", round(mean_CI, 2)), pos = 3)
abline(h = mean_plus, col = "#ff9e99", lty = "dotted")
abline(h = mean, col = "red", lty = "longdash")
abline(h = mean_minus, col = "#ff9e99", lty = "dotted")
text(x = 1.5, y = LoA_down_plus,
paste(round(LoA_down, 2), "\u00B1", round(LoA_CI, 2)), pos = 3)
abline(h = LoA_down_plus, col = "#68cbf8", lty = "dotted")
abline(h = LoA_down, col = "blue", lty = "dashed")
abline(h = LoA_down_minus, col = "#68cbf8", lty = "dotted")
# Close saving PNG file function:
dev.off()
I guess it is possible to easily condense all those abline() functions.
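Both open points (condensing the abline() calls and shading the bands) could be handled by relying on the fact that abline() and rect() accept vectors. This is only a sketch with simulated data; the bands data frame and the names t.crit, loa.ci and mean.ci are hypothetical, not part of the code above.

```r
set.seed(42)                               # simulated data, for illustration only
method.1 <- rnorm(100)
method.2 <- rnorm(100)
X <- (method.1 + method.2) / 2
Y <- method.1 - method.2

n       <- length(Y)
t.crit  <- qt(0.975, df = n - 1)
loa     <- 1.96 * sd(Y)
loa.ci  <- t.crit * sqrt(1 / n + 3.8416 / (2 * (n - 1))) * sd(Y)
mean.ci <- t.crit * sd(Y) / sqrt(n)

# One row per band: centre line, half-width of its CI, colour and line type
bands <- data.frame(
  centre = mean(Y) + c(loa, 0, -loa),
  half   = c(loa.ci, mean.ci, loa.ci),
  col    = c("blue", "red", "blue"),
  lty    = c("dashed", "longdash", "dashed"),
  stringsAsFactors = FALSE
)

if (!interactive()) pdf(NULL)              # batch runs: draw to a null device
plot(X, Y, type = "n",
     ylim = range(Y, bands$centre - bands$half, bands$centre + bands$half))
usr <- par("usr")
# rect() is vectorised: one call shades all three confidence bands
rect(usr[1], bands$centre - bands$half, usr[2], bands$centre + bands$half,
     col = adjustcolor(bands$col, alpha.f = 0.15), border = NA)
# abline() recycles col and lty over the values passed to h
abline(h = bands$centre, col = bands$col, lty = bands$lty)
abline(h = c(bands$centre - bands$half, bands$centre + bands$half),
       col = bands$col, lty = "dotted")
points(X, Y, pch = 16, cex = 0.75, col = rgb(0, 0, 0, alpha = 0.5))
```

Keeping the band definitions in one data frame means colours and line types are changed in a single place.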
I am working with the R programming language. I am trying to plot some categorical and continuous data that I am working with, but I am getting an error that tells me that such plots are only possible with "only numeric variables".
library(survival)
library(ggplot2)
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
str(data)
#plot
mycolours <- rainbow(length(unique(data$sex)), end = 0.6)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(4, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, 5), ylim = range(data[, 1:6]) + c(-0.2, 0.2),
bty = "n", xaxt = "n", xlab = "Variable", ylab = "Standardised value")
axis(1, 1:5, labels = colnames(data)[1:6])
abline(v = 1:5, col = "#00000033", lwd = 2)
abline(h = seq(-2.5, 2.5, 0.5), col = "#00000022", lty = 2)
for (i in 1:nrow(data)) lines(as.numeric(data[i, 1:6]), col = mycolours[as.numeric(data$sex[i])])
legend("topright", c("Female", "Male"), lwd = 2, col = mycolours, bty = "n")
# dev.off()
Does anyone know if this is possible to do with both categorical and continuous data?
Thanks
Sources: R: Parallel Coordinates Plot without GGally
Yup. You just have to be careful with the values. Remember how the factors are coded internally: they are just spicy integer variables with value labels (similar to names). You can losslessly cast it to character or to numeric. For the sake of plotting, you need numbers for line coordinates, so the factor-y nature of your variables will come at the end.
Remember that the quality of your visualisation and its information content depend on the order of the variables in your data set. For factors, labels are absolutely necessary. Help the reader by doing some completely custom improvements impossible in ggplot2 in small steps!
I wrote a custom function allowing anyone to add super-legible text on top of the values that are not so obvious to interpret. Give meaningful names, choose appropriate font size, pass all those extra parameters to the custom function as an ellipsis (...)!
Here you can see that most of the dead patients are male and most of the censored ones are female (in lung, sex is coded 1 = male, 2 = female). Maybe adding some points with slight jitter will give the reader an idea about the distributions of these variables.
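The remark above about how factors are coded internally can be illustrated with a small sketch. The vectors f and g are made-up examples, not taken from the lung data.

```r
# A factor stores integer codes plus a vector of labels (levels)
f <- factor(c("Censored", "Dead", "Dead", "Censored"))

codes <- as.integer(f)       # integer codes, usable as line coordinates
back  <- levels(f)[codes]    # labels recovered from the codes

print(codes)
print(identical(back, as.character(f)))

# Caution: when the labels themselves look like numbers, as.numeric()
# still returns the codes, so go through as.character() first
g <- factor(c("10", "20", "10"))
print(as.numeric(g))
print(as.numeric(as.character(g)))
```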
library(survival)
data(lung)
# Data preparation
lung.scaled <- apply(lung, 2, scale)
drop.column.index <- which(colnames(lung) == "sex")
lung.scaled <- lung.scaled[, -drop.column.index] # Dropping the split variable
split.var <- lung[, drop.column.index]
lung <- lung[, -drop.column.index]
mycolours <- rainbow(length(unique(split.var)), end = 0.6, v = 0.9, alpha = 0.4)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(5.5, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, ncol(lung.scaled)), ylim = range(lung.scaled, na.rm = TRUE) + c(-0.2, 0.2),
bty = "n", xaxt = "n", xlab = "", ylab = "Standardised value")
axis(1, 1:ncol(lung.scaled), labels = colnames(lung), cex.axis = 0.95, las = 2)
abline(v = 1:ncol(lung), col = "#00000033", lwd = 2)
abline(h = seq(round(min(lung.scaled, na.rm = TRUE)), round(max(lung.scaled, na.rm = TRUE)), 0.5), col = "#00000022", lty = 2)
for (i in 1:nrow(lung.scaled)) lines(as.numeric(lung.scaled[i, ]), col = mycolours[as.numeric(split.var[i])])
legend("topleft", c("Male", "Female"), lwd = 3, col = mycolours, bty = "n") # in lung, sex is coded 1 = male, 2 = female
# Labels for some categorical variables with a white halo for readability
labels.with.halo <- function(varname, data.scaled, labels, nhalo = 32, col.halo = "#FFFFFF44", hscale = 0.04, vscale = 0.04, ...) {
offsets <- cbind(cos(seq(0, 2*pi, length.out = nhalo + 1)) * hscale, sin(seq(0, 2*pi, length.out = nhalo + 1)) * vscale)[-(nhalo + 1), ]
ind <- which(colnames(data.scaled) == varname)
yvals <- sort(unique(data.scaled[, ind]))
for (i in 1:nhalo) text(rep(ind, length(yvals)) + offsets[i, 1], yvals + offsets[i, 2], labels = labels, col = col.halo, ...)
text(rep(ind, length(yvals)), yvals, labels = labels, ...)
}
labels.with.halo("status", lung.scaled, c("Censored", "Dead"), pos = 3)
labels.with.halo("ph.ecog", lung.scaled, c("Asymptomatic", "Symp. but ambul.", "< 50% bed", "> 50% bed"), pos = 3, cex = 0.9)
# dev.off()
I have created the following fan chart using the fanplot package. I'm trying to add axis ticks and labels to the y-axis, but it only shows the decimals and not the full number. I'm looking for a way to display the full number (e.g. 4.59 and 4.61) on the y-axis.
I am also unsure how to specify the breaks and the number of decimal places for the y-axis labels using plot(). In ggplot2 it would look something like scale_y_continuous(breaks = seq(min(data.ts$Index),max(data.ts$Index),by=0.02)). Any ideas on how to specify the y-axis breaks and the number of decimal places using base plot() in R?
Here is a reproducible version of my dataset data.ts:
structure(c(4.6049904235401, 4.60711076016453, 4.60980084146652,
4.61025389170935, 4.60544515681515, 4.60889021700954, 4.60983993107244,
4.61091608826696, 4.61138799159174, 4.61294431148318, 4.61167545843765,
4.61208284263432, 4.61421991328081, 4.61530485425155, 4.61471465043043,
4.6155992084451, 4.61195799200607, 4.61178486640435, 4.61037927954796,
4.60744590947049, 4.59979957741728, 4.59948551500254, 4.60078678080182,
4.60556092645471, 4.60934962087565, 4.60981147563749, 4.61060477704678,
4.61158365084251, 4.60963435263623, 4.61018215733317, 4.61209710959768,
4.61231368335184, 4.61071363571141, 4.61019496497916, 4.60948652606191,
4.61068813487859, 4.6084092003352, 4.60972706132393, 4.60866915174087,
4.61192565195909, 4.60878767339377, 4.61341471281265, 4.61015272152397,
4.6093479714315, 4.60750965935653, 4.60768790690338, 4.60676463096309,
4.60746490411374, 4.60885670935448, 4.60686846708382, 4.60688947889575,
4.60867708110485, 4.60448791268212, 4.60387348166032, 4.60569806689426,
4.6069320880709, 4.6087143894128, 4.61059688801283, 4.61065399116698,
4.61071421014339), .Tsp = c(2004, 2018.75, 4), class = "ts")
and here is a reproducible version of the code I'm using:
# # Install and Load Packages
## pacman::p_load(forecast,fanplot,tidyverse,tsbox,lubridate,readxl)
# Create an ARIMA Model using the auto.arima function
model <- auto.arima(data.ts)
# Simulate forecasts for 4 quarters (1 year) ahead
forecasts <- simulate(model, n=4)
# Create a data frame with the parameters needed for the uncertainty forecast
table <- ts_df(forecasts) %>%
rename(mode=value) %>%
mutate(time0 = rep(2019,4)) %>%
mutate(uncertainty = sd(mode)) %>%
mutate(skew = rep(0,4))
y0 <- 2019
k <- nrow(table)
# Set Percentiles
p <- seq(0.05, 0.95, 0.05)
p <- c(0.01, p, 0.99)
# Simulate a qsplitnorm distribution
fsval <- matrix(NA, nrow = length(p), ncol = k)
for (i in 1:k)
fsval[, i] <- qsplitnorm(p, mode = table$mode[i],
sd = table$uncertainty[i],
skew = table$skew[i])
# Create Plot
plot(data.ts, type = "l", col = "#75002B", lwd = 4,
xlim = c(y0 - 2,y0 + 0.75), ylim = range(fsval, data.ts),
xaxt = "n", yaxt = "n", ylab = "",xlab='',
main = '')
title(ylab = 'Log AFSI',main = 'Four-Quarter Ahead Forecast Fan - AFSI',
xlab = 'Date')
rect(y0 - 0.25, par("usr")[3] - 1, y0 + 2, par("usr")[4] + 1,
border = "gray90", col = "gray90")
fan(data = fsval, data.type = "values", probs = p,
start = y0, frequency = 4,
anchor = data.ts[time(data.ts) == y0 - .25],
fan.col = colorRampPalette(c("#75002B", "pink")),
ln = NULL, rlab = NULL)
# Add axis labels and ticks
axis(1, at = (y0 - 2):(y0 + 2), tcl = 0.5)
axis(1, at = seq(y0-2, y0 + 2, 0.25), labels = FALSE, tcl = 0.25)
abline(v = y0 - 0.25, lty = 1)
abline(v = y0 + 0.75, lty = 2)
axis(2, at = range(fsval, data.ts), las = 2, tcl = 0.5)
range(blah) will only return two values (the minimum and maximum). The at parameter of axis() requires a sequence of points at which you require axis labels. Hence, these are the only two y values you have on your plot. Take a look at using pretty(blah) or seq(min(blah), max(blah), length.out = 10).
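A rough sketch of both suggestions, with fixed decimal places on the labels. The vector y is a made-up stand-in for range(fsval, data.ts), and the break spacing of 0.01 is an assumption chosen only for illustration.

```r
y <- c(4.5995, 4.6156)          # hypothetical stand-in for range(fsval, data.ts)

# Option 1: let R choose "round" tick positions that cover the range
ticks.pretty <- pretty(y)

# Option 2: explicit breaks, similar to breaks = seq(...) in ggplot2
ticks.fixed <- seq(4.59, 4.62, by = 0.01)

# Fixed number of decimal places for the labels
labs <- formatC(ticks.fixed, format = "f", digits = 2)

if (!interactive()) pdf(NULL)   # batch runs: draw to a null device
plot(y, y, type = "n", yaxt = "n", ylab = "", ylim = range(ticks.fixed))
axis(2, at = ticks.fixed, labels = labs, las = 2, tcl = 0.5)
```

Passing labels explicitly decouples where the ticks sit from how they are printed, which is what controls the number of decimals.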
The suggestions of @Feakster are worth looking at, but the problem here is that the y-axis margin isn't wide enough. You could do either of two things. You could round the labels so they fit within the margins, for example you could replace this
axis(2, at = range(fsval, data.ts), las = 2, tcl = 0.5)
with this
axis(2, at = range(fsval, data.ts),
labels = sprintf("%.3f", range(fsval, data.ts)), las = 2, tcl = 0.5)
Or, alternatively you could increase the y-axis margin before you make the plot by specifying:
par(mar=c(5,5,4,2)+.1)
plot(data.ts, type = "l", col = "#75002B", lwd = 4,
xlim = c(y0 - 2,y0 + 0.75), ylim = range(fsval, data.ts),
xaxt = "n", yaxt = "n", ylab = "",xlab='',
main = '')
Then everything below that should work. The mar element of par sets the number of margin lines on each side of the plot, in the order bottom, left, top, right. The default is c(5, 4, 4, 2) + 0.1.
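If the wider margin should apply to this plot only, one option is to save the previous settings and restore them afterwards. A small sketch, where the left margin of 6 lines and the plotted values are arbitrary choices for illustration:

```r
if (!interactive()) pdf(NULL)            # batch runs: draw to a null device

# par() returns the previous values of the parameters it changes
op <- par(mar = c(5, 6, 4, 2) + 0.1)     # bottom, left, top, right
plot(c(4.5995, 4.6156), las = 2, ylab = "")

par(op)                                  # restore the earlier margins
print(par("mar"))
```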
I have this sample regression that predicts 10 years into the future.
date<-as.Date(c("2015-12-31", "2014-12-31", "2013-12-31", "2012-12-31"))
value<-c(16348, 14136, 12733, 10737)
#fit linear regression
model<-lm(value~date)
#build predict dataframe
dfuture<-data.frame(date=seq(as.Date("2016-12-31"), by="1 year", length.out = 10))
#predict the future
predict(model, dfuture, interval = "prediction")
How can I add confidence bands to this?
The following code will generate a good-looking regression plot for you. The comments in the code should make everything clear. The code uses value and model as defined in your question.
## all dates you are interested in: 4 years with observations, 10 years for prediction
all_date <- seq(as.Date("2012-12-31"), by="1 year", length.out = 14)
## compute confidence bands (for all data)
pred.c <- predict(model, data.frame(date=all_date), interval="confidence")
## compute prediction bands (for new data only)
pred.p <- predict(model, data.frame(date=all_date[5:14]), interval="prediction")
## set up regression plot (plot nothing here; only set up range, axis)
ylim <- range(range(pred.c[,-1]), range(pred.p[,-1]))
plot(1:nrow(pred.c), numeric(nrow(pred.c)), col = "white", ylim = ylim,
xaxt = "n", xlab = "Date", ylab = "prediction",
main = "Regression Plot")
axis(1, at = 1:nrow(pred.c), labels = all_date)
## shade 95%-level confidence region
polygon(c(1:nrow(pred.c),nrow(pred.c):1), c(pred.c[, 2], rev(pred.c[, 3])),
col = "grey", border = NA)
## plot fitted values / lines
lines(1:nrow(pred.c), pred.c[, 1], lwd = 2, col = 4)
## add 95%-level confidence bands
lines(1:nrow(pred.c), pred.c[, 2], col = 2, lty = 2, lwd = 2)
lines(1:nrow(pred.c), pred.c[, 3], col = 2, lty = 2, lwd = 2)
## add 95%-level prediction bands
lines(4 + 1:nrow(pred.p), pred.p[, 2], col = 3, lty = 3, lwd = 2)
lines(4 + 1:nrow(pred.p), pred.p[, 3], col = 3, lty = 3, lwd = 2)
## add original observations on the plot
points(1:4, rev(value), pch = 20)
## finally, we add legend
legend(x = "topleft", legend = c("Obs", "Fitted", "95%-CI", "95%-PI"),
pch = c(20, NA, NA, NA), lty = c(NA, 1, 2, 3), col = c(1, 4, 2, 3),
text.col = c(1, 4, 2, 3), bty = "n")
The JPEG is generated by code:
jpeg("regression.jpeg", height = 500, width = 600, quality = 100)
## the above code
dev.off()
## check your working directory for this JPEG
## use getwd() to see this directory if you don't know it
As you can see from the plot,
Confidence band grows wider as you try to make predictions further away from your observed data;
Prediction interval is wider than confidence interval.
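Both observations can also be checked numerically rather than read off the plot. The sketch below reuses the data and model from the question; the names future, ci.width and pi.width are introduced here only for illustration.

```r
date  <- as.Date(c("2015-12-31", "2014-12-31", "2013-12-31", "2012-12-31"))
value <- c(16348, 14136, 12733, 10737)
model <- lm(value ~ date)

future <- data.frame(date = seq(as.Date("2016-12-31"), by = "1 year",
                                length.out = 10))

pred.c <- predict(model, future, interval = "confidence")
pred.p <- predict(model, future, interval = "prediction")

# Width of each interval at every future date
ci.width <- pred.c[, "upr"] - pred.c[, "lwr"]
pi.width <- pred.p[, "upr"] - pred.p[, "lwr"]

print(all(diff(ci.width) > 0))   # does the confidence band keep widening?
print(all(pi.width > ci.width))  # is the prediction band always wider?
```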
If you want to know more about how predict.lm() computes confidence / prediction intervals internally, read How does predict.lm() compute confidence interval and prediction interval?, and my answer there.
Thanks to Alex's demonstration of simple use of the visreg package, but I still prefer using base R.
You can simply use visreg::visreg
library(visreg)
visreg(model)
If you are interested in the values:
> head(visreg(model)$fit)
date value visregFit visregLwr visregUpr
1 2012-12-31 13434.5 10753.10 9909.073 11597.13
2 2013-01-10 13434.5 10807.81 9974.593 11641.02
3 2013-01-21 13434.5 10862.52 10040.033 11685.00
4 2013-02-01 13434.5 10917.22 10105.389 11729.06
5 2013-02-12 13434.5 10971.93 10170.658 11773.21
6 2013-02-23 13434.5 11026.64 10235.837 11817.44