I need to calculate the area between two curves. One curve - country’s GDP per capita, other curve - GDP trend. I tried to use the integrate function in my code below but the calculated area is not accurate.
My code includes unnecessary area before and after the intersection points. I need to calculate the area where the GDP curve is below the GDP trend curve (1).
GDP_GR <- ts(GR, start = c(2000, 1), frequency = 4)
gdp_gr <- log(GDP_GR)
y.pot_gr <- hpfilter(gdp_gr, freq = 1600)$trend
ts.plot(gdp_gr)
y.pot_gr
lines(y.pot_gr, col = "blue")
# area:
y.c_gr <- window(gdp_gr, start = c(2020, 1), end = c(2021, 1))
y.pot.c_gr <- window(y.pot_gr, start = c(2020, 1), end = c(2021, 1))
n <- length(y.c_gr)
x <- seq(1, n, 1)
plot(x, y.c_gr, type = "l", lwd = 1.5,
ylim = c(min(c(y.c_gr, y.pot.c_gr)), max(c(y.c_gr, y.pot.c_gr))),
xlab = "Time", ylab = "GDP", xaxt = "n")
lines(x, y.pot.c_gr, lty = 2, lwd = 1.5)
axis(1, at = 1:n, labels = ENTRY[81:85])
polygon(c(x, rev(x)),c(y.c_gr, rev(y.pot.c_gr)),
col = "lightgrey", border = NA)
function_1 <- approxfun(x, y.c_gr - y.pot.c_gr)
function_2 <- function(x) { abs(function_1(x)) }
integrate(function_2, 1, n)
How can I improve my code?
Related
I compare two treatments A and B. The objective is to show that A is not inferior to B. The non inferiority margin delta =-2
After comparing Treatment A - Treatment B I have these results
Mean difference and 95% CI = -0.7 [-2.1, 0.8]
I would like to plot this either with a package or manually. I have no idea how to do it.
Welch Two Sample t-test
data: mydata$outcome[mydata$traitement == "Bras S"] and mydata$outcome[mydata$traitement == "B"]
t = 0.88938, df = 258.81, p-value = 0.3746
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.133224 0.805804
sample estimates:
mean of x mean of y
8.390977 9.054688
I want to create this kind of plot:
You could abstract the relevant data from the t.test results and then plot in base R using segments and points to plot the data and abline to draw in the relevant vertical lines. Since there were no reproducible data, I made some up but the process is generally the same.
#sample data
set.seed(123)
tres <- t.test(runif(10), runif(10))
# get values to plot from t test results
ci <- tres$conf.int
ests <- tres$estimate[1] - tres$estimate[2]
# plot
plot(x = ci, ylim = c(0,2), xlim = c(-4, 4), type = "n", # blank plot
bty = "n", xlab = "Treatment A - Treatment B", ylab = "",
axes = FALSE)
points(x = ests, y = 1, pch = 20) # dot for point estimate
segments(x0 = ci[1], x1 = ci[2], y0 = 1) #CI line
abline(v = 0, lty = 2) # vertical line, dashed
abline(v = 2, lty = 1, col = "darkblue") # vertical line, solid, blue
axis(1, col = "darkblue") # add in x axis, blue
EDIT:
If you wanted to more accurately recreate your figure with the x axis in descending order and using your statement "Mean difference and 95% CI = -0.7 [-2.1, 0.8]", you can do the following manipulations to the above approach:
diff <- -0.7
ci <- c(-2.1, 0.8)
# plot
plot(1, xlim = c(-4, 4), type = "n",
bty = "n", xlab = "Treatment A - Treatment B", ylab = "",
axes = FALSE)
points(x = -diff, y = 1, pch = 20)
segments(x0 = -ci[2], x1 = -ci[1], y0 = 1)
abline(v = 0, lty = 2)
abline(v = 2, lty = 1, col = "darkblue")
axis(1, at = seq(-4,4,1), labels = seq(4, -4, -1), col = "darkblue")
I have the following code and graph:
bleeds <- read.csv("C:/Users/aaron/OneDrive/MSc Operational Research/Dissertation/Bleeds.csv", header = TRUE)
bleeds
#Change date to object(date)
bleeds$Month <- as.Date(bleeds$Month, format = "%b")
## Transform data into time series
bleedts <- ts(bleeds$Number.of.Bleeds, frequency = 12, start = c(2019, 01), end = c(2021, 08))
bleedts
plot(bleedts)
bleeds$Month <- as.Date(bleeds$Month, format="%b")
str(bleeds$Month)
##Training and Test sets
training = window(bleedts, start = c(2019, 1), end = c(2020, 12))
testing = window(bleedts, start = c(2021, 1))
##Baseline (seasonal Naive) forecast
naive_forecast = snaive(training, h = length(testing), level = 95)
MAPE(naive$mean, testing) * 100
plot(bleedts, xlab = "Year", ylab = "Number of Bleeds", main = "Seasonal Naive Forecast",
col = "blue", lwd=2)
lines(naive_forecast$mean, col = "red", lwd=2)
legend( x = "topleft",
legend = c("Observed Number of bleeds", "Seasonal Naive Forecast"),
col = c("Blue", "red"),
lty = c(1, 1),
lwd = c(2, 2),
cex = 0.65)
and get the following graph
Naive forecast vs Testing Set
I'm trying to change the time scale to months (Jan 2019 - August 2020) instead of the "2019, 2019.5" etc.
I am working with the R programming language. I am trying to plot some categorical and continuous data that I am working with, but I am getting an error that tells me that such plots are only possible with "only numeric variables".
library(survival)
library(ggplot2)
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
str(data)
#plot
mycolours <- rainbow(length(unique(data$sex)), end = 0.6)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(4, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, 5), ylim = range(data[, 1:6]) + c(-0.2, 0.2),
bty = "n", xaxt = "n", xlab = "Variable", ylab = "Standardised value")
axis(1, 1:5, labels = colnames(data)[1:6])
abline(v = 1:5, col = "#00000033", lwd = 2)
abline(h = seq(-2.5, 2.5, 0.5), col = "#00000022", lty = 2)
for (i in 1:nrow(data)) lines(as.numeric(data[i, 1:6]), col = mycolours[as.numeric(data$sex[i])])
legend("topright", c("Female", "Male"), lwd = 2, col = mycolours, bty = "n")
# dev.off()
Does anyone know if this is possible to do with both categorical and continuous data?
Thanks
Sources: R: Parallel Coordinates Plot without GGally
Yup. You just have to be careful with the values. Remember how the factors are coded internally: they are just spicy integer variables with value labels (similar to names). You can losslessly cast it to character or to numeric. For the sake of plotting, you need numbers for line coordinates, so the factor-y nature of your variables will come at the end.
Remember that the quality of your visualisation and the information content depends on the order of your variables in you data set. For factors, labels are absolutely necessary. Help the reader by doing some completely custom improvements impossible in ggplot2 in small steps!
I wrote a custom function allowing anyone to add super-legible text on top of the values that are not so obvious to interpret. Give meaningful names, choose appropriate font size, pass all those extra parameters to the custom function as an ellipsis (...)!
Here you can see that most of the dead patients are female and most of the censored ones are males. Maybe adding some points with slight jitter will give the reader idea about the distributions of these variables.
library(survival)
data(lung)
# Data preparation
lung.scaled <- apply(lung, 2, scale)
drop.column.index <- which(colnames(lung) == "sex")
lung.scaled <- lung.scaled[, -drop.column.index] # Dropping the split variable
split.var <- lung[, drop.column.index]
lung <- lung[, -drop.column.index]
mycolours <- rainbow(length(unique(split.var)), end = 0.6, v = 0.9, alpha = 0.4)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(5.5, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, ncol(lung.scaled)), ylim = range(lung.scaled, na.rm = TRUE) + c(-0.2, 0.2),
bty = "n", xaxt = "n", xlab = "", ylab = "Standardised value")
axis(1, 1:ncol(lung.scaled), labels = colnames(lung), cex.axis = 0.95, las = 2)
abline(v = 1:ncol(lung), col = "#00000033", lwd = 2)
abline(h = seq(round(min(lung.scaled, na.rm = TRUE)), round(max(lung.scaled, na.rm = TRUE), 0.5)), col = "#00000022", lty = 2)
for (i in 1:nrow(lung.scaled)) lines(as.numeric(lung.scaled[i, ]), col = mycolours[as.numeric(split.var[i])])
legend("topleft", c("Female", "Male"), lwd = 3, col = mycolours, bty = "n")
# Labels for some categorical variables with a white halo for readability
labels.with.halo <- function(varname, data.scaled, labels, nhalo = 32, col.halo = "#FFFFFF44", hscale = 0.04, vscale = 0.04, ...) {
offsets <- cbind(cos(seq(0, 2*pi, length.out = nhalo + 1)) * hscale, sin(seq(0, 2*pi, length.out = nhalo + 1)) * vscale)[-(nhalo + 1), ]
ind <- which(colnames(data.scaled) == varname)
yvals <- sort(unique(data.scaled[, ind]))
for (i in 1:nhalo) text(rep(ind, length(yvals)) + offsets[i, 1], yvals + offsets[i, 2], labels = labels, col = col.halo, ...)
text(rep(ind, length(yvals)), yvals, labels = labels, ...)
}
labels.with.halo("status", lung.scaled, c("Censored", "Dead"), pos = 3)
labels.with.halo("ph.ecog", lung.scaled, c("Asymptomatic", "Symp. but ambul.", "< 50% bed", "> 50% bed"), pos = 3, cex = 0.9)
# dev.off()
I have created the following fanchart using the fanplot package. I'm trying to add axis ticks and labels to the y axis, however it's only giving me the decimals and not the full number. Looking for a solution to display the full number (e.g 4.59 and 4.61) on the y axis
I am also unsure of how to specify the breaks and number of decimal points for the labels on the y-axis using plot(). I know doing all of this in ggplot2 it would look something like this scale_y_continuous(breaks = seq(min(data.ts$Index),max(data.ts$Index),by=0.02)) . Any ideas on how to specify the breaks in the y axis as well as the number of decimal points using the base plot() feature in R?
Here is a reproductible of my dataset data.ts
structure(c(4.6049904235401, 4.60711076016453, 4.60980084146652,
4.61025389170935, 4.60544515681515, 4.60889021700954, 4.60983993107244,
4.61091608826696, 4.61138799159174, 4.61294431148318, 4.61167545843765,
4.61208284263432, 4.61421991328081, 4.61530485425155, 4.61471465043043,
4.6155992084451, 4.61195799200607, 4.61178486640435, 4.61037927954796,
4.60744590947049, 4.59979957741728, 4.59948551500254, 4.60078678080182,
4.60556092645471, 4.60934962087565, 4.60981147563749, 4.61060477704678,
4.61158365084251, 4.60963435263623, 4.61018215733317, 4.61209710959768,
4.61231368335184, 4.61071363571141, 4.61019496497916, 4.60948652606191,
4.61068813487859, 4.6084092003352, 4.60972706132393, 4.60866915174087,
4.61192565195909, 4.60878767339377, 4.61341471281265, 4.61015272152397,
4.6093479714315, 4.60750965935653, 4.60768790690338, 4.60676463096309,
4.60746490411374, 4.60885670935448, 4.60686846708382, 4.60688947889575,
4.60867708110485, 4.60448791268212, 4.60387348166032, 4.60569806689426,
4.6069320880709, 4.6087143894128, 4.61059688801283, 4.61065399116698,
4.61071421014339), .Tsp = c(2004, 2018.75, 4), class = "ts")
and here is a reproductible of the code I'm using
# # Install and Load Packages
## pacman::p_load(forecast,fanplot,tidyverse,tsbox,lubridate,readxl)
# Create an ARIMA Model using the auto.arima function
model <- auto.arima(data.ts)
# Simulate forecasts for 4 quarters (1 year) ahead
forecasts <- simulate(model, n=4)
# Create a data frame with the parameters needed for the uncertainty forecast
table <- ts_df(forecasts) %>%
rename(mode=value) %>%
mutate(time0 = rep(2019,4)) %>%
mutate(uncertainty = sd(mode)) %>%
mutate(skew = rep(0,4))
y0 <- 2019
k <- nrow(table)
# Set Percentiles
p <- seq(0.05, 0.95, 0.05)
p <- c(0.01, p, 0.99)
# Simulate a qsplitnorm distribution
fsval <- matrix(NA, nrow = length(p), ncol = k)
for (i in 1:k)
fsval[, i] <- qsplitnorm(p, mode = table$mode[i],
sd = table$uncertainty[i],
skew = table$skew[i])
# Create Plot
plot(data.ts, type = "l", col = "#75002B", lwd = 4,
xlim = c(y0 - 2,y0 + 0.75), ylim = range(fsval, data.ts),
xaxt = "n", yaxt = "n", ylab = "",xlab='',
main = '')
title(ylab = 'Log AFSI',main = 'Four-Quarter Ahead Forecast Fan - AFSI',
xlab = 'Date')
rect(y0 - 0.25, par("usr")[3] - 1, y0 + 2, par("usr")[4] + 1,
border = "gray90", col = "gray90")
fan(data = fsval, data.type = "values", probs = p,
start = y0, frequency = 4,
anchor = data.ts[time(data.ts) == y0 - .25],
fan.col = colorRampPalette(c("#75002B", "pink")),
ln = NULL, rlab = NULL)
# Add axis labels and ticks
axis(1, at = y0-2:y0 + 2, tcl = 0.5)
axis(1, at = seq(y0-2, y0 + 2, 0.25), labels = FALSE, tcl = 0.25)
abline(v = y0 - 0.25, lty = 1)
abline(v = y0 + 0.75, lty = 2)
axis(2, at = range(fsval, data.ts), las = 2, tcl = 0.5)
range(blah) will only return two values (the minimum and maximum). The at parameter of axis() requires a sequence of points at which you require axis labels. Hence, these are the only two y values you have on your plot. Take a look at using pretty(blah) or seq(min(blah), max(blah), length.out = 10).
The suggestions of #Feakster are worth looking at, but the problem here is that the y-axis margin isn't wide enough. You could do either of two things. You could round the labels so they fit within the margins, for example you could replace this
axis(2, at = range(fsval, data.ts), las = 2, tcl = 0.5)
with this
axis(2, at = range(fsval, data.ts),
labels = sprintf("%.3f", range(fsval, data.ts)), las = 2, tcl = 0.5)
Or, alternatively you could increase the y-axis margin before you make the plot by specifying:
par(mar=c(5,5,4,2)+.1)
plot(data.ts, type = "l", col = "#75002B", lwd = 4,
xlim = c(y0 - 2,y0 + 0.75), ylim = range(fsval, data.ts),
xaxt = "n", yaxt = "n", ylab = "",xlab='',
main = '')
Then everything below that should work. The mar element of par sets the number of lines printed in the margin of each axis. The default is c(5,4,4,2).
I am trying to replicate this plot in R.
I have some difficulties :
I don't know the code to create the line 90 min (it is not a continuous line, it starts at 2000)
How can I add the text R1, R2 and R3 in my plot?
How can I add the title for each line in my plot?
This is my current code :
plot(Year,Duration, abline(v=2000, col="green"))
Thanks
Here's a stab at it, base graphics.
set.seed(42)
dat <- data.frame(
Year = sample(1990:2040, size=100, replace=TRUE),
Duration = runif(n=100, min=-1, max=1)
)
plot(Duration ~ Year, data = dat, pch = 16, yaxt = 'n', xaxt = 'n', col = "orange")
abline(v = 2000, lwd = 2, col = "green4")
lines(c(2000, par("usr")[2]), rep(0, 2), lwd = 2, col = "green4")
text(c(1995, 2025, 2025), c(0, -0.5, 0.5), c("R1", "R2", "R3"))
mtext("90min", side = 4)