I am looking for a way to display the estimates of a meta-analysis with many comparisons in a wide format instead of a forest plot. I came across a timber plot, as displayed in Figure 1 of this publication:
https://www.researchgate.net/publication/283078594_Translational_failure_of_anti-inflammatory_compounds_for_myocardial_infarction_A_meta-Analysis_of_large_animal_models
So far, I have not been able to find any R code to create timber plots. Any hints would be highly appreciated.
As an example, here is a snippet of my current data:
structure(list(Author = c("Zuloaga 2014", "Kelly-Cobbs 2013",
"Kurita 2020", "Li (a) 2010", "Li (b) 2010", "Luo 2017", "Zhang 2016",
"Chen 2011", "Iwata 2015", "Guan 2011", "Mishiro 2014", "Zhang 2016",
"Rewell 2010", "Desilles 2017", "Cai 2018", "Yang 2015", "Augestad 2020",
"Kumas 2016", "Li 2004", "Pintana 2019", "Gao 2010", "Zhu 2016",
"Li 2013", "Chen 2019", "Iwata 2014"), Effect.size = c(35.200386286818,
-83.4784185709104, 36.1567339277335, -67.2836145890038, -66.2782956058588,
50.6942625098245, 2.16606498194945, 34.0909090909091, 34.6207954981455,
-75.7847533632287, 3.79249627522687, 33.8242513500245, 20.4,
53.381981476284, 55.8256496227997, 37.7068384829404, 35.7624831309042,
34.2436848134081, 44.0740740740741, 11.3382899628253, 78.1728075845723,
43.7891335083821, 32.0754716981132, 24.8822975517891, 56.9998933755769
), Standard.error = c(12.4780629739639, 35.8172017746254, 2.51216141038517,
45.4714925944508, 14.9052728665095, 15.9630454594002, 12.7738671567103,
7.27627754260179, 6.95739967875146, 6.46735654871385, 6.32805324709443,
4.51368516355712, 11.6488966431553, 12.4958199880194, 13.0017602415415,
12.1147303263766, 33.7832025707735, 21.5383168322688, 13.0893311456905,
21.8148377078391, 17.226146227274, 2.16584647411636, 6.82104394943358,
17.2913669783741, 4.81056206059614)), row.names = c(NA, 25L), class = "data.frame")
I ran a meta-analysis using the metagen() command from the meta package with the following code:
ma_results <- metagen(
  Effect.size,
  Standard.error,
  sm = "NMD",
  data = df,
  studlab = Author,
  random = TRUE,
  method.tau = "REML",
  prediction = TRUE
)
In the following metagen() object, the effect sizes are stored in ma_results$TE and the lower and upper bounds in ma_results$lower and ma_results$upper.
Following the suggestion of Alan Cameron (see below), my current code looks like this:
ggplot(within(ma_results[order(ma_results$TE), ], id <- seq(nrow(25))), aes(id, TE)) +
  geom_point(size = 0.5) +
  geom_linerange(aes(ymin = lower, ymax = upper)) +
  geom_hline(yintercept = TE.random, linetype = 2) +
  theme_bw()
Here I get an error ("incorrect number of dimensions") from ma_results[order(ma_results$TE), ], because ma_results is a list-like meta object rather than a data frame.
It's fairly easy to create a plot like this using geom_linerange in ggplot2. Here's an example with made-up data. Whether you will be able to do this with your own data can't be known without a reproducible example:
library(ggplot2)
set.seed(1)
df <- data.frame(mean = runif(200), CI = runif(200))
ggplot(within(df[order(df$mean), ], id <- seq(nrow(df))), aes(id, mean)) +
  geom_point(size = 0.5) +
  geom_linerange(aes(ymin = mean - CI, ymax = mean + CI)) +
  geom_hline(yintercept = mean(df$mean), linetype = 2) +
  theme_bw()
EDIT
With the sample data, we can now do the following:
Make the papers a factor variable, with the factor levels ordered from lowest to highest effect size.
Add lower and upper columns representing one standard error below and above the effect size. If you want a 95% confidence interval instead, use the effect size ± 1.96 × the standard error.
First, we need to make sure every paper is uniquely identified. At the moment, your sample data contains two different papers with the same name (Zhang 2016), so we need to change one of them to mark it as unique:
df$Author[12] <- "Zhang (b) 2016"
Now let's get the papers arranged by effect size, and add our lower and upper bounds for each paper:
df$Author <- factor(df$Author, df$Author[order(df$Effect.size)])
df$lower <- df$Effect.size - df$Standard.error
df$upper <- df$Effect.size + df$Standard.error
The plot itself is then just:
ggplot(df, aes(Author, Effect.size)) +
  geom_point() +
  geom_linerange(aes(ymin = lower, ymax = upper)) +
  geom_hline(yintercept = mean(df$Effect.size), linetype = 2) +
  annotate(geom = 'text', x = 1, y = mean(df$Effect.size), vjust = -0.5,
           label = paste('Mean =', round(mean(df$Effect.size), 1)), hjust = 0) +
  theme_light() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
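If you would rather drive the ordering and the dashed line from the metagen object itself (the error in the question comes from treating the list-like ma_results as a data frame), one option is to assemble a plain data frame from its components first. A minimal sketch, using a toy stand-in list with the same fields as ma_results (with your real object, skip the stand-in):

```r
library(ggplot2)

# Toy stand-in for the metagen object: a list with the same fields
# (TE, lower, upper, TE.random) that ma_results exposes in the question.
ma_results <- list(TE        = c(35.2, -83.5, 36.2),
                   lower     = c(10.7, -153.7, 31.2),
                   upper     = c(59.7, -13.3, 41.1),
                   TE.random = 20.1)

# metagen() returns a list, so `[order(...), ]` fails on it directly;
# build a data frame from its components, then order and plot.
plot_df <- data.frame(TE    = ma_results$TE,
                      lower = ma_results$lower,
                      upper = ma_results$upper)
plot_df <- plot_df[order(plot_df$TE), ]
plot_df$id <- seq_len(nrow(plot_df))

p <- ggplot(plot_df, aes(id, TE)) +
  geom_point(size = 0.5) +
  geom_linerange(aes(ymin = lower, ymax = upper)) +
  geom_hline(yintercept = ma_results$TE.random, linetype = 2) +
  theme_bw()
```

This also puts the dashed line at the pooled random-effects estimate (TE.random) rather than at the arithmetic mean of the effect sizes.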
I have a dataset structured as follows, where I am tracking collective action mentions by subreddit by month, relative to a policy treatment introduced on Feb 17th, 2012. As a result, the period "Feb 2012" appears twice in my dataset: the "pre" row refers to the Feb 2012 days before treatment, and the "post" row to the days after.
treatment_status month_year collective_action_percentage
pre Dec 2011 5%
pre Jan 2012 8%
pre Feb 2012 10%
post Feb 2012 3%
post March 2012 10%
However, I am not sure how best to visualize this indicator by month. I made the following graph, but I was wondering whether presenting this variable by week & year, rather than by month & year, would show more clearly how collective action mentions decline after treatment.
ggplot(data = df1, aes(x = as.Date(month_year), fill = collective_action_percentage,
                       y = collective_action_percentage)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  xlab("Criticism by individuals active before and after treatment") +
  theme_classic() +
  theme(plot.title = element_text(size = 10, face = "bold"),
        axis.text.x = element_text(angle = 90, vjust = 0.5))
output:
I created the month_year variable as follows, using the zoo package:
df<- df %>%
mutate(month_year = zoo::as.yearmon(date))
Finally, I tried aggregating the data on a weekly basis as follows. However, given that I have multiple years in my dataset, I want to aggregate by week & year, not simply by week:
df2 %>% group_by(week = isoweek(time)) %>% summarise(value = mean(values))
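One way to get a week-and-year grouping is to add lubridate's isoyear() alongside isoweek(), so the same week number from different years stays separate. A sketch with a toy stand-in for df2 (assuming columns time and values as in the question):

```r
library(dplyr)
library(lubridate)

# Toy stand-in for df2; the real one has many more rows.
df2 <- data.frame(
  time   = as.Date(c("2011-12-30", "2012-01-02", "2012-02-15", "2012-02-20")),
  values = c(5, 8, 10, 3)
)

# Group by ISO year *and* ISO week so weeks are not pooled across years.
weekly <- df2 %>%
  group_by(year = isoyear(time), week = isoweek(time)) %>%
  summarise(value = mean(values), .groups = "drop")
weekly
```

isoyear() is used rather than year() so that dates near January 1st are assigned to the same year as their ISO week.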
Plot a point for each row and connect them with a line so that the order is clear. We also colour the pre and post points differently, and we make treatment status a factor so that the pre level comes before the post level.
library(ggplot2)
library(zoo)
df2 <- transform(df1,
                 month_year = as.yearmon(month_year, "%b %Y"),
                 treatment_status = factor(treatment_status, c("pre", "post")))
ggplot(df2, aes(month_year, collective_action_percentage)) +
  geom_point(aes(col = treatment_status), cex = 4) +
  geom_line()
Note
We assume df1 is as follows. The % signs have already been removed:
df1 <-
structure(list(treatment_status = c("pre", "pre", "pre", "post",
"post"), month_year = c("Dec 2011", "Jan 2012", "Feb 2012", "Feb 2012",
"March 2012"), collective_action_percentage = c(5L, 8L, 10L,
3L, 10L)), class = "data.frame", row.names = c(NA, -5L))
I have a question about a plot that is only partially displayed. The limits of the y axis should range from 0-5. Due to ceiling and floor effects, parts of the plot are currently cut off; see the attachment. How can I get the plots to be displayed completely without having to change the scaling? Thank you!
# visual inspection of data
fit <- Anxiety_full
# Plot model
plot_Anxiety <- plot_model(fit, type = "eff", terms = c("Condition", "Group")) +
  # geom_line(size = 1)
  coord_cartesian(xlim = c(0.5, NA), clip = "off") +
  theme_tq() +
  scale_colour_tq() +
  scale_fill_tq(theme = "light") +
  labs(title = "",
       y = "Anxiety Score [0-5]",
       x = "") +
  xlim(c("baseline", "60 bpm", "16 bpm", "10 bpm", "6 bpm", "random")) +
  ylim(c(-0.5, 5.5)) +
  ggplot2::labs(colour = "Group") +
  scale_color_manual(values = c("Red", "Black"))
plot_Anxiety <- plot_Anxiety + theme_apa()
plot_Anxiety
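The clipping in plots like this is usually caused by ylim(): scale limits drop all data outside the range before anything is drawn, while coord_cartesian() merely zooms the viewport and keeps the data. A minimal sketch of the difference with toy data (not the plot_model() plot from the question):

```r
library(ggplot2)

df <- data.frame(x = 1:6, y = c(0.1, 1, 2.5, 4, 4.9, 5))

# ylim(c(0, 5)) would silently remove any point outside [0, 5] before
# stats/geoms are computed; coord_cartesian() only zooms, so nothing
# is discarded and elements at the edges stay fully visible.
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  coord_cartesian(ylim = c(0, 5))
```
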
I have time data, and I want to plot the frequency per hour on a 24hr clock.
The data are transformed to circular, and the estimates for 'periodic mean' mu and 'concentration' kappa are calculated with mle.vonmises().
The graph is generated with ggplot2, using geom_histogram() and coord_polar(). The periodic mean is drawn on the plot with a simple call to geom_vline().
Question
I want to draw a confidence interval of 95% around the mean. Then, I would like to visually check whether a given timestamp (e.g. "22:00:00") lies within the CI or not.
How do I do this with a von mises distribution and ggplot2?
The code below shows how far I got.
The data
timestamps <- c("08:43:48", "09:17:52", "12:56:22", "12:27:32", "10:59:23",
"07:22:45", "11:13:59", "10:13:26", "10:07:01", "06:09:56",
"12:43:17", "07:07:35", "09:36:44", "10:45:00", "08:27:36",
"07:55:35", "11:32:56", "13:18:35", "11:09:51", "09:46:33",
"06:59:12", "10:19:36", "09:39:47", "09:39:46", "18:23:54")
The code
library(lubridate)
library(circular)
library(ggplot2)
## Convert from char to hours
timestamps_hrs <- as.numeric(hms(timestamps)) / 3600
## Convert to class circular
timestamps_hrs_circ <- circular(timestamps_hrs, units = "hours", template = "clock24")
## Estimate the periodic mean and the concentration
## from the von Mises distribution
estimates <- mle.vonmises(timestamps_hrs_circ)
periodic_mean <- estimates$mu %% 24
concentration <- estimates$kappa
## Clock plot // Circular Histogram
clock01 <- ggplot(data.frame(timestamps_hrs_circ), aes(x = timestamps_hrs_circ)) +
  geom_histogram(breaks = seq(0, 24), colour = "blue", fill = "lightblue") +
  coord_polar() +
  scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), minor_breaks = NULL) +
  theme_light()
clock01
## Add the periodic_mean
clock01 +
  geom_vline(xintercept = as.numeric(periodic_mean), color = "red", linetype = 3, size = 1.25)
This yields the following graph:
I think I found an approximation of a solution. Since we know the parameters mu and kappa (the periodic mean and the concentration), we know the distribution. That, in turn, means we can compute the density of any given timestamp, and we can calculate the density cutoff for the 95% confidence level.
Once we have that, we generate a timestamp for every minute of a day, transform them as needed, calculate their densities, and compare those against the cutoff value.
This way we know, at the one-minute level, whether we are inside the confidence interval or not.
The code
(It is assumed that the code in the question has been run; the dplyr package is also needed for the pipeline below.)
quantile <- qvonmises((1 - 0.95)/2, mu = periodic_mean, kappa = concentration)
cutoff <- dvonmises(quantile, mu = periodic_mean, kappa = concentration)
## generate a timestamp for every minute in a day
## then the transformations needed
ts_1min <- format(seq.POSIXt(as.POSIXct(Sys.Date()),
                             as.POSIXct(Sys.Date() + 1),
                             by = "1 min"),
                  "%H:%M:%S", tz = "GMT")
ts_1min_hrs <- as.numeric(hms(ts_1min)) / 3600
ts_1min_hrs_circ <- circular(ts_1min_hrs, units = "hours", template = "clock24")
## generate densities to compare with the cutoff
dens_1min <- dvonmises(ts_1min_hrs_circ, mu = periodic_mean, kappa = concentration)
## compare: vector of FALSE/TRUE
feat_1min <- dens_1min >= cutoff
df_1min_feat <- data.frame(ts = ts_1min_hrs_circ,
                           feature = feat_1min)
## get the min and max time of the CI
CI <- df_1min_feat %>%
  filter(feature == TRUE) %>%
  summarise(min = min(ts), max = max(ts))
CI
# min max
# 5.283333 14.91667
With the information above, and using geom_rect(), we can get what we want:
ggplot(data.frame(timestamps_hrs_circ), aes(x = timestamps_hrs_circ)) +
  coord_polar() +
  scale_x_continuous("", limits = c(0, 24), breaks = seq(0, 24), minor_breaks = NULL) +
  geom_vline(xintercept = as.numeric(CI), color = "darkgreen", linetype = 1, size = 1.5) +
  geom_rect(xmin = CI$min, xmax = CI$max, ymin = 0, ymax = 5, alpha = .5, fill = "lightgreen") +
  ggtitle(label = "Circular Histogram", subtitle = "periodic mean in red,\n95%-CI in green") +
  geom_histogram(breaks = seq(0, 24), colour = "blue", fill = "lightblue") +
  geom_vline(xintercept = as.numeric(periodic_mean), color = "red", linetype = 2, size = 1.5) +
  theme_light()
Resulting in the following graph:
I hope someone may benefit from this as well.
I have a simple R script to create a forecast based on a file.
Data has been recorded since 2014, but I am having trouble accomplishing the two goals below:
Plot only a subset of the forecast information (from 11/2017 onwards).
Include month and year in a specific format (e.g. Jun 17).
Here is the link to the dataset, and below you will find my code so far.
# Load required libraries
library(forecast)
library(ggplot2)
# Load dataset
emea <- read.csv(file="C:/Users/nsoria/Downloads/AMS Globales/EMEA_Depuy_Finanzas.csv", header=TRUE, sep=';', dec=",")
# Create time series object
ts_fin <- ts(emea$Value, frequency = 26, start = c(2014,11))
# Pull out the seasonal, trend, and irregular components from the time series
model <- stl(ts_fin, s.window = "periodic")
# Predict the next 3 bi weeks of tickets
pred <- forecast(model, h = 5)
# Plot the results
plot(pred, include = 5, showgap = FALSE, main = "Ticket amount", xlab = "Timeframe", ylab = "Quantity")
I appreciate any help and suggestion to my two points and a clean plot.
Thanks in advance.
Edit 01/10 - Issue 1:
I added the screenshot output for suggested code.
Plot1
Edit 01/10 - Issue 2:
Once transformed with the code below, it somehow misses the date count and messes up the results. Please see the two screenshots and compare the last values.
Screenshot 1
Screenshot 2
Plotting using ggplot2 with the ggfortify, tidyverse, lubridate and scales packages:
library(lubridate)
library(tidyverse)
library(scales)
library(ggfortify)
# Convert pred from list to data frame object
df1 <- fortify(pred) %>% as_tibble()
# Convert ts decimal time to Date class
df1$Date <- as.Date(date_decimal(df1$Index), "%Y-%m-%d")
str(df1)
# Remove Index column and rename other columns
# Select only data pts after 2017
df1 <- df1 %>%
  select(-Index) %>%
  filter(Date >= as.Date("2017-01-01")) %>%
  rename("Low95" = "Lo 95",
         "Low80" = "Lo 80",
         "High95" = "Hi 95",
         "High80" = "Hi 80",
         "Forecast" = "Point Forecast")
df1
### Updated: To connect the gap between the Data & Forecast,
# assign the last non-NA row of Data column to the corresponding row of other columns
lastNonNAinData <- max(which(complete.cases(df1$Data)))
df1[lastNonNAinData, !(colnames(df1) %in% c("Data", "Fitted", "Date"))] <- df1$Data[lastNonNAinData]
# Or: use [geom_segment](http://ggplot2.tidyverse.org/reference/geom_segment.html)
plt1 <- ggplot(df1, aes(x = Date)) +
  ggtitle("Ticket amount") +
  xlab("Time frame") + ylab("Quantity") +
  geom_ribbon(aes(ymin = Low95, ymax = High95, fill = "95%")) +
  geom_ribbon(aes(ymin = Low80, ymax = High80, fill = "80%")) +
  geom_point(aes(y = Data, colour = "Data"), size = 4) +
  geom_line(aes(y = Data, group = 1, colour = "Data"),
            linetype = "dotted", size = 0.75) +
  geom_line(aes(y = Fitted, group = 2, colour = "Fitted"), size = 0.75) +
  geom_line(aes(y = Forecast, group = 3, colour = "Forecast"), size = 0.75) +
  scale_x_date(breaks = scales::pretty_breaks(), date_labels = "%b %y") +
  scale_colour_brewer(name = "Legend", type = "qual", palette = "Dark2") +
  scale_fill_brewer(name = "Intervals") +
  guides(colour = guide_legend(order = 1), fill = guide_legend(order = 2)) +
  theme_bw(base_size = 14)
plt1
I'm trying to create a custom function that returns a chart object. The function seems to have an error when calculating min/max etc. in the ggplot object.
If I run the ggplot code outside the custom function, it works.
To reproduce the error, I need to clear the workspace with rm(list = ls()) and then reload the function and data.
How can I change my function so that it works correctly?
Sample data:
Date <- seq(as.Date("2000/1/1"), by = "week", length.out = 53 * 4)
ThousandBarrelsADay <- sample(1:1000, 53 * 4, replace = FALSE)
yAxisTitle <- "Thousand Barrels per Day"
titleChart <- "test"
Function call:
p <- LinePlotTestStatsLine(Date, ThousandBarrelsADay, titleChart, yAxisTitle)
Error:
p
Error in eval(expr, envir, enclos) : object 'MinVal' not found
The code for the function:
# Packages the function relies on (not loaded in the original snippet):
library(ggplot2)
library(stringr)   # str_sub()
library(ISOweek)   # ISOweek()
library(psych)     # describeBy()
library(plyr)      # join()
library(lubridate) # year()
library(scales)    # date_format(), comma

LinePlotTestStatsLine <- function(xDateValues, yValues, titleChart, yAxisTitle) {
  dfTemp <- 0
  # the subtitle outlining the data range
  subtitleChart <- paste("(Data set from ", min(xDateValues), " to ", max(xDateValues), ")", sep = "")
  # create a base data frame
  Week <- as.numeric(str_sub(ISOweek(xDateValues), start = -2))
  dfTemp <- data.frame(xDateValues, Week, yValues)
  dfTemp <- dfTemp[order(dfTemp$xDateValues), ]
  # summary statistics by week
  dfTemp_Out <- describeBy(dfTemp$yValues, dfTemp$Week, mat = TRUE)
  colnames(dfTemp_Out)[2] <- "Week"
  # get the last year of data; use 53 weeks because some years have 53 weeks
  tempLast53 <- tail(dfTemp, 53 - length(dfTemp$yValues))
  LableDateMinMax <- tempLast53$xDateValues[13]
  LableDateMedian <- tempLast53$xDateValues[20]
  # create a base table for charting
  ChartData1 <- join(dfTemp_Out, tempLast53, type = "inner")
  # make sure the chart table is sorted
  ChartData1 <- ChartData1[order(ChartData1$xDateValues), ]
  # find the max date
  MaxDate <- max(dfTemp$xDateValues)
  maxYR <- max(year(dfTemp$xDateValues))
  # min, median, mean & max for horizontal lines
  MinVal <- min(dfTemp$yValues)
  rMin <- max(which(dfTemp$yValues == MinVal, arr.ind = TRUE))
  MinD <- dfTemp$xDateValues[rMin]
  MaxVal <- max(dfTemp$yValues)
  rMax <- max(which(dfTemp$yValues == MaxVal, arr.ind = TRUE))
  MaxD <- dfTemp$xDateValues[rMax]
  # set the chart data
  ChartData1_Plot <- ChartData1[, c("xDateValues", "Week", "yValues")]
  ChartData1_Plot$Statistic <- paste("Past YR at ", MaxDate, sep = "")
  MedianVal <- median(dfTemp$yValues)
  MeanVal <- mean(dfTemp$yValues)
  stDev <- sd(dfTemp$yValues)
  # ribbon to show one standard deviation
  Ribbon <- data.frame(ChartData1[, c("xDateValues")])
  Ribbon$Lower <- MeanVal - stDev
  Ribbon$Higher <- MeanVal + stDev
  colnames(Ribbon) <- c("xDateValues", "Lower", "Higher")
  Ribbon$mean <- ChartData1$mean
  # set the seasons for charting
  # spring: March 20
  dSpring <- as.Date(paste("03/20/", maxYR, sep = ""), "%m/%d/%Y")
  if (MaxDate <= dSpring) {
    dSpring <- as.Date(paste("03/20/", maxYR - 1, sep = ""), "%m/%d/%Y")
  }
  # summer: June 21
  dSummer <- as.Date(paste("06/21/", maxYR, sep = ""), "%m/%d/%Y")
  if (MaxDate <= dSummer) {
    dSummer <- as.Date(paste("06/21/", maxYR - 1, sep = ""), "%m/%d/%Y")
  }
  # autumn: September 23
  dAutumn <- as.Date(paste("09/23/", maxYR, sep = ""), "%m/%d/%Y")
  if (MaxDate <= dAutumn) {
    dAutumn <- as.Date(paste("09/23/", maxYR - 1, sep = ""), "%m/%d/%Y")
  }
  # winter: December 21
  dWinter <- as.Date(paste("12/21/", maxYR, sep = ""), "%m/%d/%Y")
  if (MaxDate <= dWinter) {
    dWinter <- as.Date(paste("12/21/", maxYR - 1, sep = ""), "%m/%d/%Y")
  }
  ChartData_Plot <- ChartData1_Plot
  p1 <- ggplot(ChartData_Plot, aes(x = xDateValues, y = yValues)) +
    geom_line(aes(group = Statistic, colour = Statistic)) +
    scale_color_manual(values = c("black")) +
    geom_ribbon(data = Ribbon,
                aes(group = 1, y = mean, x = xDateValues, ymin = Lower, ymax = Higher),
                alpha = 0.1, fill = "blue") +
    geom_hline(aes(yintercept = MinVal), color = "red", linetype = "dashed") +
    geom_hline(aes(yintercept = MaxVal), color = "red", linetype = "dashed") +
    annotate(geom = "text", x = LableDateMinMax, y = MinVal - MaxVal / 90,
             label = paste("Min as at ", MinD, sep = ""), colour = "red", size = 4) +
    annotate(geom = "text", x = LableDateMinMax, y = MaxVal + MaxVal / 40,
             label = paste("Max as at ", MaxD, sep = ""), colour = "red", size = 4) +
    geom_hline(aes(yintercept = MedianVal), color = "darkgreen", linetype = "dashed") +
    geom_hline(aes(yintercept = MeanVal), color = "blue", linetype = "dashed") +
    annotate(geom = "text", x = LableDateMinMax, y = MeanVal + MaxVal / 40,
             label = "Mean", colour = "blue", size = 4) +
    annotate(geom = "text", x = LableDateMedian, y = MedianVal + MaxVal / 40,
             label = "Median", colour = "darkgreen", size = 4) +
    theme(legend.position = "bottom") +
    geom_vline(xintercept = as.numeric(dSpring), colour = "darkgrey") +
    geom_vline(xintercept = as.numeric(dSummer), colour = "darkgrey") +
    geom_vline(xintercept = as.numeric(dAutumn), colour = "darkgrey") +
    geom_vline(xintercept = as.numeric(dWinter), colour = "darkgrey") +
    annotate(geom = "text",
             x = c(dWinter + 45, dSpring + 45, dSummer + 45, dAutumn + 45),
             y = MaxVal + MaxVal / 10,
             label = c("Winter", "Spring", "Summer", "Autumn"),
             colour = "darkgrey", size = 4) +
    ggtitle(bquote(atop(.(titleChart), atop(italic(.(subtitleChart)), "")))) +
    labs(x = "") +
    scale_x_date(breaks = "4 weeks", labels = date_format("%b-%Y")) +
    scale_y_continuous(labels = comma) +
    expand_limits(y = 0) +
    theme(axis.text.x = element_text(size = 10, angle = 45, colour = "black", vjust = 1, hjust = 1)) +
    labs(y = yAxisTitle) +
    theme(legend.position = "none")
  Footnote <- "Note: Shaded area represents one standard deviation from the mean"
  # p1 <- arrangeGrob(p1, sub = textGrob(Footnote, x = 0, hjust = -0.1, vjust = 0.1,
  #                                      gp = gpar(fontface = "italic", fontsize = 10)))
  return(p1)
}
I have figured out the problem. I had not used environments before, but it was a very simple fix. To get the function to work:
.e <- environment()  # at the start of the function
...
ggplot(ChartData_Plot, aes(x = xDateValues, y = yValues), environment = .e)
See also: Local Variables Within aes.
For discussions on environments, see:
http://www.r-bloggers.com/environments-in-r/
http://adv-r.had.co.nz/Environments.html
The function geom_hline is looking for MinVal inside the ChartData_Plot data frame, where it does not exist. Add data = NULL to the geom_hline functions and it should be able to find the value.
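An alternative that avoids the environment machinery altogether is to pass the computed values to geom_hline() directly rather than through aes(), so they are evaluated inside the function at call time instead of being looked up later. A minimal sketch with toy data (not the full function from the question):

```r
library(ggplot2)

make_plot <- function(df) {
  MinVal <- min(df$y)
  ggplot(df, aes(x, y)) +
    geom_point() +
    # yintercept outside aes(): evaluated here, in the function's scope,
    # so ggplot never searches the plot data or its environment for MinVal.
    geom_hline(yintercept = MinVal, color = "red", linetype = "dashed")
}

p <- make_plot(data.frame(x = 1:5, y = c(3, 1, 4, 1, 5)))
```

With this pattern the plot prints correctly even after the function's local variables have gone out of scope.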