My example data is as follows:
df <- data.frame(study = c("Hodaie","Kerrigan","Lee","Andrade","Lim"), SR = c(0.5460, 0.2270, 0.7540, 0.6420, 0.5000), SE = c(12.30, 15.70, 12.80, 13.80, 9.00), Patients = c(5, 5, 3, 6, 4))
I want to conduct a meta-analysis using SR (the single-group percentage), SE (the standard error, which I can compute from the sample size and percentage), and Patients (the sample size of each study). I would like to produce the following forest plot (I found this example in an article that also uses single-group percentage data, but I can't work out which R function or arguments they used):
Could anyone tell me which R function or arguments I could use to conduct the meta-analysis and generate the forest plot above? Thank you!
I am sure there are plenty of ways to do this using packages but it can be accomplished in base R (and there are likely more elegant solutions using base R). The way I do it is to first build a blank plot much larger than the needed graphing portion, then overlay the relevant elements on it. I find one has more control over it this way. A basic example that could get you started is below. If you are new to R (based on your name NewRUser), I suggest running it line-by-line to see how it all works. Again, this is only one way and there are likely better approaches. Good luck!
Sample Data
#### Sample Data (modified from OP)
df <- data.frame(Study = c("Hodaie","Kerrigan","Lee","Andrade","Lim"),
SR = c(0.5460, 0.2270, 0.7540, 0.6420, 0.5000),
SE = c(12.30, 15.70, 12.80, 13.80, 9.00),
Patients = c(5, 5, 3, 6, 4),
ci_lo = c(30, -8.0, 50, 37, 32),
ci_hi = c(78, 53, 100, 91, 67))
### Set up plotting elements
n.studies <- nrow(df)
yy <- n.studies:1
seqx <- seq(-100, 100, 50)
## blank plot, much larger than the plotting region needed
plot(range(-550, 200), range(0, n.studies), type = 'n', axes = FALSE, xlab = '', ylab = '')
# Set up axes
axis(side = 1, at = seqx, labels = seqx, cex.axis = 1, mgp = c(2, 1.5, 1)) # add axis and label (bottom)
mtext(side = 1, at = 0, 'Seizure Reduction', line = 2.5, cex = 0.85, padj = 1)
axis(side = 3, at = seqx, labels = seqx, cex.axis = 1, mgp = c(2, 1.5, 1)) # add axis and label (top)
mtext(side = 3, at = 0, 'Seizure Reduction', line = 2.5, cex = 0.85, padj = -1)
## add lines and dots
segments(df[, "ci_lo"], yy, df[,"ci_hi"], yy) # add lines
points(df[,"SR"]*100, yy, pch = 19) # add points
segments(x0 = 0, y0 = max(yy), y1 = 0, lty = 3, lwd = 0.75) #vertical line # 0
### Add text information
par(xpd = TRUE)
text(x = -550, y = yy, df[,"Study"], pos = 4)
text(x = -450, y = yy, df[,"SR"]*100, pos = 4)
text(x = -350, y = yy, df[,"SE"], pos = 4)
text(x = -250, y = yy, df[,"Patients"], pos = 4)
text(x = 150, y = yy, paste0(df[,"ci_lo"], "-", df[,"ci_hi"]), pos = 4)
text(x = c(seq(-550, -250, 100), 150), y = max(yy)+0.75,
c(colnames(df)[1:4], "CI"), pos = 4, font = 2)
# Add legend
legend(x = 50, y = 0.5, c("Point estimate", "95% Confidence interval"),
       pch = c(19, NA), lty = c(NA, 1), bty = "n", cex = 0.65) # lty = 1 is a solid line
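The ci_lo and ci_hi columns in the sample data above are consistent with a normal-approximation interval, SR * 100 ± 1.96 × SE, truncated to whole numbers. A minimal sketch of that calculation follows, together with a fixed-effect (inverse-variance) pooled estimate. The pooling step is an assumption about what a simple meta-analysis could look like here; it is not part of the answer above, and with samples this small a dedicated meta-analysis package and a more suitable interval method would likely be preferable.

```r
# Study-level estimates (percent) and standard errors from the sample data
est <- c(0.5460, 0.2270, 0.7540, 0.6420, 0.5000) * 100
se  <- c(12.30, 15.70, 12.80, 13.80, 9.00)

# 95% normal-approximation confidence limits
z     <- qnorm(0.975)
ci_lo <- est - z * se
ci_hi <- est + z * se
round(cbind(est, ci_lo, ci_hi), 1)

# Fixed-effect (inverse-variance) pooled estimate (an assumed, simple model)
w         <- 1 / se^2                 # weight each study by its precision
pooled    <- sum(w * est) / sum(w)    # weighted mean of the study estimates
pooled_se <- sqrt(1 / sum(w))         # standard error of the pooled estimate
pooled_ci <- pooled + c(-1, 1) * z * pooled_se
```

The pooled estimate and its interval could then be drawn as an extra row with points() and segments(), in the same way as the individual studies.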
I need to calculate the area between two curves. One curve is a country’s GDP per capita and the other is the GDP trend. I tried to use the integrate function in my code below, but the calculated area is not accurate.
My code includes unnecessary area before and after the intersection points. I only need the area where the GDP curve is below the GDP trend curve (1).
library(mFilter) # assuming hpfilter() comes from the mFilter package
GDP_GR <- ts(GR, start = c(2000, 1), frequency = 4)
gdp_gr <- log(GDP_GR)
y.pot_gr <- hpfilter(gdp_gr, freq = 1600)$trend
ts.plot(gdp_gr)
y.pot_gr
lines(y.pot_gr, col = "blue")
# area:
y.c_gr <- window(gdp_gr, start = c(2020, 1), end = c(2021, 1))
y.pot.c_gr <- window(y.pot_gr, start = c(2020, 1), end = c(2021, 1))
n <- length(y.c_gr)
x <- seq(1, n, 1)
plot(x, y.c_gr, type = "l", lwd = 1.5,
ylim = c(min(c(y.c_gr, y.pot.c_gr)), max(c(y.c_gr, y.pot.c_gr))),
xlab = "Time", ylab = "GDP", xaxt = "n")
lines(x, y.pot.c_gr, lty = 2, lwd = 1.5)
axis(1, at = 1:n, labels = ENTRY[81:85])
polygon(c(x, rev(x)),c(y.c_gr, rev(y.pot.c_gr)),
col = "lightgrey", border = NA)
function_1 <- approxfun(x, y.c_gr - y.pot.c_gr)
function_2 <- function(x) { abs(function_1(x)) }
integrate(function_2, 1, n)
How can I improve my code?
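One possible approach, sketched under the assumption that both series are joined by straight lines between observations (the same assumption approxfun makes by default): keep only the part of the gap where GDP is below the trend and add up the resulting trapezoids and triangles exactly, so no numerical integration is needed. The helper name area_below and the toy data are hypothetical, and the result is in units of (log GDP) × (observation steps, i.e. quarters here).

```r
# Hypothetical helper: area where a series lies below its trend,
# assuming straight lines between consecutive observations.
area_below <- function(x, y, trend) {
  d  <- as.numeric(trend) - as.numeric(y)  # positive where y is below the trend
  d0 <- head(d, -1)                        # gap at the start of each interval
  d1 <- tail(d, -1)                        # gap at the end of each interval
  h  <- diff(x)                            # interval widths
  p0 <- pmax(d0, 0)
  p1 <- pmax(d1, 0)
  # Fraction of each interval on which the gap is positive:
  # the whole interval unless the curves cross inside it.
  frac <- ifelse(d0 * d1 < 0, (p0 + p1) / (abs(d0) + abs(d1)), 1)
  sum(0.5 * (p0 + p1) * frac * h)
}

# Toy data: the series dips below a flat trend between x = 1.5 and x = 3.5
toy_area <- area_below(x = 1:4, y = c(1, -1, -1, 1), trend = rep(0, 4))
```

With the objects from the question, the call would presumably be area_below(x, y.c_gr, y.pot.c_gr).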
I am working with the R programming language. I am trying to plot some categorical and continuous data that I am working with, but I am getting an error that tells me that such plots are only possible with "only numeric variables".
library(survival)
library(ggplot2)
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
str(data)
#plot
mycolours <- rainbow(length(unique(data$sex)), end = 0.6)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(4, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, 5), ylim = range(data[, 1:6]) + c(-0.2, 0.2),
bty = "n", xaxt = "n", xlab = "Variable", ylab = "Standardised value")
axis(1, 1:5, labels = colnames(data)[1:6])
abline(v = 1:5, col = "#00000033", lwd = 2)
abline(h = seq(-2.5, 2.5, 0.5), col = "#00000022", lty = 2)
for (i in 1:nrow(data)) lines(as.numeric(data[i, 1:6]), col = mycolours[as.numeric(data$sex[i])])
legend("topright", c("Female", "Male"), lwd = 2, col = mycolours, bty = "n")
# dev.off()
Does anyone know if this is possible to do with both categorical and continuous data?
Thanks
Sources: R: Parallel Coordinates Plot without GGally
Yup. You just have to be careful with the values. Remember how factors are coded internally: they are just spicy integer variables with value labels (similar to names). You can losslessly cast them to character or to numeric. For the sake of plotting, you need numbers for line coordinates, so the factor-y nature of your variables will only come into play at the end.
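A small illustration of that internal coding, using made-up factors (not the lung data): the integer codes are positions in levels(), and the labels can be recovered from them.

```r
f <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"))
codes <- as.integer(f)        # positions in levels(f)
labs  <- levels(f)[codes]     # codes mapped back to their labels

# Pitfall: for number-like labels, as.integer() returns the codes, not the values
g      <- factor(c("10", "20", "10"))
g_code <- as.integer(g)                 # level codes
g_val  <- as.numeric(as.character(g))   # the numbers the labels spell out
```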
Remember that the quality and the information content of your visualisation depend on the order of the variables in your data set. For factors, labels are absolutely necessary. Help the reader by making some completely custom improvements, impossible in ggplot2, in small steps!
I wrote a custom function allowing anyone to add super-legible text on top of the values that are not so obvious to interpret. Give meaningful names, choose appropriate font size, pass all those extra parameters to the custom function as an ellipsis (...)!
Here you can see that most of the dead patients are male and most of the censored ones are female (in lung, sex is coded 1 = male, 2 = female). Adding some points with a slight jitter might give the reader an idea of the distributions of these variables.
library(survival)
data(lung)
# Data preparation
lung.scaled <- apply(lung, 2, scale)
drop.column.index <- which(colnames(lung) == "sex")
lung.scaled <- lung.scaled[, -drop.column.index] # Dropping the split variable
split.var <- lung[, drop.column.index]
lung <- lung[, -drop.column.index]
mycolours <- rainbow(length(unique(split.var)), end = 0.6, v = 0.9, alpha = 0.4)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(5.5, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, ncol(lung.scaled)), ylim = range(lung.scaled, na.rm = TRUE) + c(-0.2, 0.2),
bty = "n", xaxt = "n", xlab = "", ylab = "Standardised value")
axis(1, 1:ncol(lung.scaled), labels = colnames(lung), cex.axis = 0.95, las = 2)
abline(v = 1:ncol(lung), col = "#00000033", lwd = 2)
abline(h = seq(round(min(lung.scaled, na.rm = TRUE)), round(max(lung.scaled, na.rm = TRUE)), 0.5), col = "#00000022", lty = 2) # grid lines every 0.5
for (i in 1:nrow(lung.scaled)) lines(as.numeric(lung.scaled[i, ]), col = mycolours[as.numeric(split.var[i])])
legend("topleft", c("Male", "Female"), lwd = 3, col = mycolours, bty = "n") # lung codes sex as 1 = male, 2 = female
# Labels for some categorical variables with a white halo for readability
labels.with.halo <- function(varname, data.scaled, labels, nhalo = 32, col.halo = "#FFFFFF44", hscale = 0.04, vscale = 0.04, ...) {
  offsets <- cbind(cos(seq(0, 2*pi, length.out = nhalo + 1)) * hscale, sin(seq(0, 2*pi, length.out = nhalo + 1)) * vscale)[-(nhalo + 1), ]
  ind <- which(colnames(data.scaled) == varname)
  yvals <- sort(unique(data.scaled[, ind]))
  for (i in 1:nhalo) text(rep(ind, length(yvals)) + offsets[i, 1], yvals + offsets[i, 2], labels = labels, col = col.halo, ...)
  text(rep(ind, length(yvals)), yvals, labels = labels, ...)
}
labels.with.halo("status", lung.scaled, c("Censored", "Dead"), pos = 3)
labels.with.halo("ph.ecog", lung.scaled, c("Asymptomatic", "Symp. but ambul.", "< 50% bed", "> 50% bed"), pos = 3, cex = 0.9)
# dev.off()
I have created the following fan chart using the fanplot package. I'm trying to add axis ticks and labels to the y-axis, but it only shows the decimals and not the full number. I'm looking for a way to display the full number (e.g. 4.59 and 4.61) on the y-axis.
I am also unsure how to specify the breaks and the number of decimal places for the y-axis labels using plot(). In ggplot2 it would look something like scale_y_continuous(breaks = seq(min(data.ts$Index), max(data.ts$Index), by = 0.02)). Any ideas on how to specify the y-axis breaks and the number of decimal places with the base plot() function in R?
Here is a reproducible version of my dataset, data.ts:
structure(c(4.6049904235401, 4.60711076016453, 4.60980084146652,
4.61025389170935, 4.60544515681515, 4.60889021700954, 4.60983993107244,
4.61091608826696, 4.61138799159174, 4.61294431148318, 4.61167545843765,
4.61208284263432, 4.61421991328081, 4.61530485425155, 4.61471465043043,
4.6155992084451, 4.61195799200607, 4.61178486640435, 4.61037927954796,
4.60744590947049, 4.59979957741728, 4.59948551500254, 4.60078678080182,
4.60556092645471, 4.60934962087565, 4.60981147563749, 4.61060477704678,
4.61158365084251, 4.60963435263623, 4.61018215733317, 4.61209710959768,
4.61231368335184, 4.61071363571141, 4.61019496497916, 4.60948652606191,
4.61068813487859, 4.6084092003352, 4.60972706132393, 4.60866915174087,
4.61192565195909, 4.60878767339377, 4.61341471281265, 4.61015272152397,
4.6093479714315, 4.60750965935653, 4.60768790690338, 4.60676463096309,
4.60746490411374, 4.60885670935448, 4.60686846708382, 4.60688947889575,
4.60867708110485, 4.60448791268212, 4.60387348166032, 4.60569806689426,
4.6069320880709, 4.6087143894128, 4.61059688801283, 4.61065399116698,
4.61071421014339), .Tsp = c(2004, 2018.75, 4), class = "ts")
and here is a reproducible version of the code I'm using:
# # Install and Load Packages
## pacman::p_load(forecast,fanplot,tidyverse,tsbox,lubridate,readxl)
# Create an ARIMA Model using the auto.arima function
model <- auto.arima(data.ts)
# Simulate forecasts for 4 quarters (1 year) ahead
forecasts <- simulate(model, n=4)
# Create a data frame with the parameters needed for the uncertainty forecast
table <- ts_df(forecasts) %>%
rename(mode=value) %>%
mutate(time0 = rep(2019,4)) %>%
mutate(uncertainty = sd(mode)) %>%
mutate(skew = rep(0,4))
y0 <- 2019
k <- nrow(table)
# Set Percentiles
p <- seq(0.05, 0.95, 0.05)
p <- c(0.01, p, 0.99)
# Simulate a qsplitnorm distribution
fsval <- matrix(NA, nrow = length(p), ncol = k)
for (i in 1:k)
fsval[, i] <- qsplitnorm(p, mode = table$mode[i],
sd = table$uncertainty[i],
skew = table$skew[i])
# Create Plot
plot(data.ts, type = "l", col = "#75002B", lwd = 4,
xlim = c(y0 - 2,y0 + 0.75), ylim = range(fsval, data.ts),
xaxt = "n", yaxt = "n", ylab = "",xlab='',
main = '')
title(ylab = 'Log AFSI',main = 'Four-Quarter Ahead Forecast Fan - AFSI',
xlab = 'Date')
rect(y0 - 0.25, par("usr")[3] - 1, y0 + 2, par("usr")[4] + 1,
border = "gray90", col = "gray90")
fan(data = fsval, data.type = "values", probs = p,
start = y0, frequency = 4,
anchor = data.ts[time(data.ts) == y0 - .25],
fan.col = colorRampPalette(c("#75002B", "pink")),
ln = NULL, rlab = NULL)
# Add axis labels and ticks
axis(1, at = (y0 - 2):(y0 + 2), tcl = 0.5) # parentheses needed: ":" binds tighter than "-" and "+"
axis(1, at = seq(y0-2, y0 + 2, 0.25), labels = FALSE, tcl = 0.25)
abline(v = y0 - 0.25, lty = 1)
abline(v = y0 + 0.75, lty = 2)
axis(2, at = range(fsval, data.ts), las = 2, tcl = 0.5)
range(blah) will only return two values (the minimum and maximum). The at parameter of axis() requires a sequence of points at which you require axis labels. Hence, these are the only two y values you have on your plot. Take a look at using pretty(blah) or seq(min(blah), max(blah), length.out = 10).
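A minimal sketch of that suggestion, using a few stand-in values in roughly the same range as data.ts rather than the real series or fsval. The vector y and the three-decimal format are assumptions chosen for illustration.

```r
# Stand-in values roughly spanning the range of data.ts (assumed, for illustration)
y <- c(4.599, 4.605, 4.611, 4.616)

ticks <- pretty(range(y), n = 5)   # "round" tick positions covering the range
labs  <- sprintf("%.3f", ticks)    # labels with a fixed number of decimals

plot(y, type = "l", yaxt = "n", ylab = "", ylim = range(ticks))
axis(2, at = ticks, labels = labs, las = 2)
```

In the original plot, range(y) would be replaced by range(fsval, data.ts).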
The suggestions of @Feakster are worth looking at, but the problem here is that the y-axis margin isn't wide enough. You could do either of two things. You could round the labels so they fit within the margins; for example, you could replace this
axis(2, at = range(fsval, data.ts), las = 2, tcl = 0.5)
with this
axis(2, at = range(fsval, data.ts),
labels = sprintf("%.3f", range(fsval, data.ts)), las = 2, tcl = 0.5)
Or, alternatively you could increase the y-axis margin before you make the plot by specifying:
par(mar=c(5,5,4,2)+.1)
plot(data.ts, type = "l", col = "#75002B", lwd = 4,
xlim = c(y0 - 2,y0 + 0.75), ylim = range(fsval, data.ts),
xaxt = "n", yaxt = "n", ylab = "",xlab='',
main = '')
Then everything below that should work. The mar element of par sets the margin size, in lines of text, on each side of the plot (bottom, left, top, right). The default is c(5, 4, 4, 2) + 0.1.
I am quite new to R.
I am working on part of my MSc thesis and want to make some diurnal plots of, for instance, methane production over a period of time.
I want to see its variation over time and its correlation with another factor over the same period. I have two questions.
First:
How can I make the axis ticks increase in steps of 2 hours? The axis has its own default, and when I give it, for example:
xlim = c(0, 23)
it starts from 0 and goes up in steps of 5 hours. I want it to go up in steps of 2 hours.
Second:
How can I add another variable that might be correlated with my first variable over the same time period? Let's say methane production over 23 hours could be related to oxygen consumption, just as an example. How can I put oxygen and methane on the same plot (y) against time (x)?
I would really appreciate it if you could help me with this.
Kind regards,
Farhad
You can use the at and labels arguments in the axis() call to customize tick locations and labels.
You can use axis() with side = 4 to create a custom y-axis on the right of your graph.
Please see the code below illustrating these points:
set.seed(123)
x <- 0:23
df<- data.frame(
x,
ch4 = 1000 - x ^ 2,
o2 = 2000 - 2 * (x - 10) ^ 2
)
par(mar = c(5, 5, 2, 5))
with(df, plot(x, ch4,
type = "l", col = "red3",
ylab = "CH4 emission",
lwd = 3,
xlim = c(0, 23),
xlab = "",
xaxt = "n"))
axis(1, at = seq(0, 23, 2), labels = seq(0, 23, 2))
par(new = TRUE)
with(df, plot(x, o2,
pch = 16, axes = FALSE,
xlab = NA, ylab = NA, cex = 1.2))
axis(side = 4)
mtext(side = 4, line = 3, "O2 consumption")
legend("topright",
       legend = c("CH4", "O2"), # CH4 is the red line, O2 the black points
       lty = c(1, 0),
       lwd = c(3, NA),
       pch = c(NA, 16),
       col = c("red3", "black"))