I want to generate multiple forest plots using the forestplot() function in R (from the forestplot package), and I want the text and graph sections of each plot to line up so that they can be usefully shown as a stacked plot, like this:
(image taken from "R forestplot package blank lines with section headings")
but allowing for a different scale in each subplot, with the zero-effect line lined up across all plots.
Which arguments need changing in the forestplot() call to ensure that this occurs?
EDIT: to provide a minimal code example
library(forestplot)
library(tidyr)
cohort <- data.frame(Age = c(43, 39, 34, 55, 70, 59, 44, 83, 76, 44,
75, 60, 62, 50, 44, 40, 41, 42, 37, 35, 55, 46),
Status = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 2L, 2L), levels = c("-", "+"), class = "factor"),
Group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("1","2"), class = "factor"))
age.lm <- lm(Age ~ Group, data = cohort)
status.lm <- glm(Status ~ Group, data = cohort, family=binomial(link=logit))
age.data <- summary(age.lm)$coefficients[2,]
status.data <- summary(status.lm)$coefficients[2,]
# 95% interval: estimate +/- 1.96 * standard error
age.data <- rbind(c(0,0,0,1,"Group 1", "n=15"),
c(age.data[1], age.data[1]-age.data[2]*1.96, age.data[1]+age.data[2]*1.96, age.data[4], "Group 2", "n=7"))
status.data <- rbind(c(0,0,0,1,"Group 1", "[+13,-2]"),
c(status.data[1], status.data[1]-status.data[2]*1.96, status.data[1]+status.data[2]*1.96, status.data[4], "Group 2", "[+2,-5]"))
colnames(age.data) <- c("mean","lower","upper","p-val","labeltext","numbers")
colnames(status.data) <- c("mean","lower","upper","p-val","labeltext","numbers")
age.data <- data.frame(age.data)
status.data <- data.frame(status.data)
age.data$mean <- as.numeric(age.data$mean)
age.data$lower <- as.numeric(age.data$lower)
age.data$upper <- as.numeric(age.data$upper)
status.data$mean <- exp(as.numeric(status.data$mean))
status.data$lower <- exp(as.numeric(status.data$lower))
status.data$upper <- exp(as.numeric(status.data$upper))
age.plot <- forestplot(age.data,
labeltext = c(labeltext,numbers),
boxsize = 0.1,
xlog = FALSE,
clip=c(-20,20),
xticks=c(-20,-10,0,10,20),
txt_gp = fpTxtGp(ticks=gpar(cex=1)),
align=c("l","c","l"))
status.plot <- forestplot(status.data,
labeltext = c(labeltext,numbers),
boxsize = 0.1,
xlog = TRUE,
clip=c(1/100,100),
xticks=c(log(1e-2),log(1e-1),0,log(1e1),log(1e2)),
txt_gp = fpTxtGp(ticks=gpar(cex=1)),
align=c("l","c","l"))
Note that the age plot is a linear model and the status plot is a logistic model:
I want to be able to arrange the relative sizes of the text to the left and the plot to the right in order that the zero-effect lines (at 0 and at 1 respectively) line up so that the forest plots stack cleanly.
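With the align argument you can left-align ("l") each column of the plot, so that the text is left-aligned. To line up the zero-effect lines you can use mar, which takes a unit vector of margins in the order bottom, left, top, right, and adjust the values for one of the graphs until the lines coincide. Here is a reproducible example (it reuses age.data and status.data from the question):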
library(forestplot)
library(tidyr)
age.plot <- forestplot(age.data,
labeltext = c(labeltext,numbers),
boxsize = 0.1,
xlog = FALSE,
clip=c(-20,20),
xticks=c(-20,-10,0,10,20),
txt_gp = fpTxtGp(ticks=gpar(cex=1)),
align=c("l","l","l")
)
status.plot <- forestplot(status.data,
labeltext = c(labeltext,numbers),
boxsize = 0.1,
xlog = TRUE,
clip=c(1/100,100),
xticks=c(log(1e-2),log(1e-1),0,log(1e1),log(1e2)),
txt_gp = fpTxtGp(ticks=gpar(cex=1)),
align=c("l","l","l"),
mar = unit(c(0,5,0,10.5), "mm")
)
library(grid)
grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 1)))
pushViewport(viewport(layout.pos.row = 1))
plot(age.plot)
popViewport()
pushViewport(viewport(layout.pos.row = 2))
plot(status.plot)
popViewport(2)
Created on 2022-12-28 with reprex v2.0.2
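As a side note on the data preparation in the question: the interval bounds there are built as the estimate plus or minus a multiple of the standard error. For a 95% interval the usual multiplier is qnorm(0.975), about 1.96. A minimal sketch of that step as a reusable helper, assuming a normal-approximation (Wald) interval is what is wanted; wald_ci and the toy data are made up for illustration and are not part of forestplot:

```r
# Hypothetical helper (not part of forestplot): Wald interval from one row
# of summary(model)$coefficients, i.e. estimate +/- z * standard error.
wald_ci <- function(coef_row, level = 0.95) {
  z   <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  est <- unname(coef_row[1])         # column 1: Estimate
  se  <- unname(coef_row[2])         # column 2: Std. Error
  c(mean = est, lower = est - z * se, upper = est + z * se)
}

# Small made-up data set, only to show the call
toy <- data.frame(y = c(1, 2, 3, 6, 7, 9),
                  g = factor(c(1, 1, 1, 2, 2, 2)))
fit <- lm(y ~ g, data = toy)
ci  <- wald_ci(summary(fit)$coefficients[2, ])
print(ci)
```

For the logistic model, the three values can be passed through exp() afterwards, as the question does.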
Related
I am trying to animate a line graph with multiple lines. It seems that there is an error with the gganimate package involving transition_reveal() that is causing the final frame to revert for all of the lines but one. This error is not present when not using gganimate. Here is the code:
library(ggplot2)
library(gganimate)
df <- read.csv("test.csv", stringsAsFactors = TRUE)
anim <- ggplot(df, aes(Day, Accidents, group = State, color = State)) +
geom_line() +
transition_reveal(Day) +
ease_aes('cubic-in-out')
jiff <- animate(anim, fps = 24, duration = 5, start_pause = 0, end_pause = 72, height = 4, width = 7, units = "in", res = 150)
jiff
Here is the dput of the dataframe:
structure(list(State = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L), levels = c("A", "B", "C", "D"), class = "factor"),
Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
Accidents = c(5L, 2L, 5L, 6L, 1L, 2L, 6L, 8L, 4L, 10L, 2L,
4L)), class = "data.frame", row.names = c(NA, -12L))
Here is the output:
Regardless of the ending pause or how many values I have along the x-axis, the final frame will always look like this with only one line appearing as updated. Does anyone know why this might be happening?
UPDATE: Reverting the gganimate package from 1.0.8 to 1.0.7 did seem to do the trick after all.
The issue is in this part of the animate() call: start_pause = 0, end_pause = 72. Remove or adapt it:
anim <- ggplot(df, aes(Day, Accidents, group= State, color = State)) +
geom_line() +
transition_reveal(Day) +
ease_aes('cubic-in-out')
animate(anim, fps = 24, duration = 5,
height = 4, width = 7, units = "in", res = 150)
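The update in the question points at the installed gganimate version as a factor. A small sketch for checking that with base R only; the version numbers are the two mentioned in the question, and version_is() is a made-up helper name:

```r
# Hypothetical helper: TRUE if version string `installed` equals `target`.
# compareVersion() returns -1, 0 or 1.
version_is <- function(installed, target) {
  utils::compareVersion(installed, target) == 0
}

# With gganimate installed, the real version string would come from
#   as.character(utils::packageVersion("gganimate"))
installed <- "1.0.8"  # example value taken from the question
if (version_is(installed, "1.0.8")) {
  message("Version 1.0.8 found; the question reports 1.0.7 behaving differently.")
}
```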
I looked at other threads where this question is answered, but I couldn't adapt the code. I plot the graph below, and then try to order the countries from lowest to highest according to the blue bar (education == 3) when time is 0. I use the following code to create the order:
country_order <- df %>%
filter(education == 3 & time==0) %>%
arrange(unemployment) %>%
ungroup() %>%
mutate(order = row_number())
However, I am not sure how to introduce the new variable order into ggplot to get the ordering I want. Could someone help?
Here is the plot
ggplot(df, aes(y=unemployment, x=time, fill= education)) +
  geom_col(color = "black") +
facet_wrap(~ country)
Here is the data:
df= structure(list(time = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
unemployment = structure(c(25, 35, 40, 10, 20, 70, 20, 25,
55, 23, 17, 60), format.stata = "%9.0g"), education = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1",
"2", "3"), class = "factor"), country = structure(c(1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2), format.stata = "%9.0g")), row.names = c(NA,
-12L), class = "data.frame")
I think you can use fct_reorder() (from the forcats package) to reorder the factor levels of the desired variable by sorting along another variable.
library(dplyr)
library(ggplot2)
library(forcats)
df %>%
  ggplot(aes(y = unemployment, x = time, fill = fct_reorder(education, unemployment, .desc = TRUE))) +
  geom_col(color = "black") +
  facet_wrap(~ country)
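If the aim is instead to order the countries (the facets) by the education == 3 value at time 0, one common approach is to turn country into a factor whose levels follow that order. A minimal sketch under that assumption, with the data retyped from the question and country_f as a made-up column name:

```r
# Data retyped from the question (plain vectors instead of Stata attributes)
df <- data.frame(
  time         = factor(rep(c("1", "0", "1", "0"), each = 3)),
  unemployment = c(25, 35, 40, 10, 20, 70, 20, 25, 55, 23, 17, 60),
  education    = factor(rep(1:3, times = 4)),
  country      = rep(c(1, 2), each = 6)
)

# Countries sorted by unemployment among education == 3 at time == 0
ref <- df[df$education == "3" & df$time == "0", ]
country_levels <- ref$country[order(ref$unemployment)]

# Facets follow factor level order, so encode the order in the levels
df$country_f <- factor(df$country, levels = country_levels)
print(levels(df$country_f))

# Build the plot only if ggplot2 is available; call print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = time, y = unemployment, fill = education)) +
    geom_col(color = "black") +
    facet_wrap(~ country_f)
}
```

The same levels could be taken from the country_order data frame built in the question, since it is already sorted by unemployment.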
I am trying to make a stacked bar plot based on counts, but with labels showing percentages. I have produced the plot below; however, the percentages are based on all of the data. What I am after is the percentage by team (so that the percentages for Australia sum to 100% and the percentages for England sum to 100%).
The plot is produced by the following function, which counts the number of different roles in each team across 5 matches. I have had to divide the result by 10 because a player's role appears twice for each match (5 matches x 2 appearances):
team_roles_Q51 <- function(){
ashes_df <- tidy_data()
graph <- ggplot(ashes_df %>%
count(team, role) %>% #Groups by team and role
mutate(pct=n/sum(n)), #Calculates % for each role
aes(team, n, fill=role)) +
geom_bar(stat="identity") +
scale_y_continuous(labels=function(x)x/10) + #Needs to be a better way than dividing by 10
ylab("Number of Participants") +
geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%")),
position=position_stack(vjust=0.5)) +
ggtitle("England & Australia Team Make Up") +
theme_bw()
print(graph)
}
An example of the dataframe that is imported is:
Structure for the first 10 rows of the dataframe as follows:
structure(list(batter = c("Ali", "Anderson", "Bairstow", "Ball",
"Bancroft", "Bird", "Broad", "Cook", "Crane", "Cummins"), team = structure(c(2L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L), .Label = c("Australia",
"England"), class = "factor"), role = structure(c(1L, 3L, 4L,
3L, 2L, 3L, 3L, 2L, 3L, 3L), .Label = c("allrounder", "batsman",
"bowler", "wicketkeeper"), class = "factor"), innings = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("test_1_innings_1",
"test_1_innings_2", "test_2_innings_1", "test_2_innings_2", "test_3_innings_1",
"test_3_innings_2", "test_4_innings_1", "test_4_innings_2", "test_5_innings_1",
"test_5_innings_2"), class = "factor"), batting_num = c(6, 11,
7, 10, 1, NA, 9, 1, NA, 9), score = c(38, 5, 9, 14, 5, NA, 20,
2, NA, 42), balls_faced = c(102, 9, 24, 11, 19, NA, 32, 10, NA,
120)), row.names = c(NA, 10L), class = "data.frame")
Any help would be appreciated. Thanks
You need to group_by() team before calculating the proportion, and then use pct in aes():
library(dplyr)
library(ggplot2)
ashes_df %>%
count(team, role) %>%
group_by(team) %>%
mutate(pct= prop.table(n) * 100) %>%
ggplot() + aes(team, pct, fill=role) +
geom_bar(stat="identity") +
ylab("Percentage of Participants") +
geom_text(aes(label=paste0(sprintf("%1.1f", pct),"%")),
position=position_stack(vjust=0.5)) +
ggtitle("England & Australia Team Make Up") +
theme_bw()
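If the bar heights should stay as counts, as the question asks, with only the labels showing the within-team percentage, a variation could look like the sketch below. It assumes the ten sample rows from the question (retyped here as plain character vectors) and uses base R for the counting; the plot is only built if ggplot2 is installed:

```r
# First ten rows of team/role, retyped from the question's dput output
ashes_df <- data.frame(
  team = c("England", "England", "England", "England", "Australia",
           "Australia", "England", "England", "England", "Australia"),
  role = c("allrounder", "bowler", "wicketkeeper", "bowler", "batsman",
           "bowler", "bowler", "batsman", "bowler", "bowler")
)

# Count per team and role, dropping combinations that never occur
counts <- as.data.frame(table(team = ashes_df$team, role = ashes_df$role),
                        responseName = "n")
counts <- counts[counts$n > 0, ]

# Percentage within each team: n divided by that team's total
counts$pct <- 100 * counts$n / ave(counts$n, counts$team, FUN = sum)
print(counts)

# Bars use the counts, labels use the per-team percentages;
# call print(p) to draw the plot
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(team, n, fill = role)) +
    geom_col() +
    geom_text(aes(label = sprintf("%.1f%%", pct)),
              position = position_stack(vjust = 0.5)) +
    ylab("Number of Participants") +
    theme_bw()
}
```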
I'm having some trouble setting readable tick marks on my axes. The problem is that my data are at different magnitudes, so I'm not really sure how to go about it.
My data include ~400 different products, with 3/4 variables each, from two machines. I've pre-processed it into a data.table and used gather to convert it to long form- that part is fine.
Overview: Data is discrete, each X_________ on the x-axis represents a separate reading, and its relative values from machine 1/2 - the idea is to compare the two. The graphical format is perfect for my needs, I would just like to set the ticks at say, every 10 products on the x-axes, and at reasonable values on the y-axis.
Y_1: from 150 to 250
Y_2: from say, 1.5* to 2.5
Y_3: from say, 0.8* to 2.3
Y_4: from say, 0.4* to 1.5
*Bottom value, rounded down
Here's the code I'm using so far
var.Parameter <- c("Var1", "Var2", "Var3", "Var4")
MProduct$Parameter <- factor(MProduct$Parameter,
labels = var.Parameter)
labels_x <- MProduct$Lot[seq(0, 1626, by= 20)]
labels_y <- MProduct$Value[seq(0, 1626, by= 15)]
plot.MProduct <- ggplot(MProduct, aes(x = Lot,
y = Value,
colour = V4)) +
facet_grid(Parameter ~.,
scales = "free_y") +
scale_x_discrete(breaks=labels_x) +
scale_y_discrete(breaks=labels_y) +
geom_point() +
labs(title = "Product: Select Trends | 2018",
x = "Time (s)",
y = "Value") +
theme(axis.text.x = element_text (angle = 90,
hjust = 1,
vjust = 0.5))
# ggsave("MProduct.png")
plot.MProduct
Does anyone know how to make this graph more readable? Setting labels/breaks manually greatly limits flexibility and readability. There should be an option to set a tick every N values, right? The same goes for y.
I need to apply this as a function to multiple datasets, so I'm not very happy about having to specify the column length of the "gathered" dataset every time either, which, in this case is 1626.
Since I'm here, I would also like to take the opportunity to ask about this code:
var.Parameter <- c("Var1", "Var2", "Var3", "Var4")
More often than not, I need to label my data in a specific order, which is not necessarily alphabetical. R, however, defaults to some kind of odd behaviour whereupon I have to plot and verify that the labels are indeed where they should be. Any clue how I could force them to be presented in order? As it is, my solution is to keep shifting their position in that line of code until it produces the graph correctly.
Many thanks.
Okay. I'm going to ignore the y-axis labels, because the defaults seem to work just fine as long as you don't overwrite them with your custom labels_y. Just let the defaults do their work. For the x-axis, here are a few options:
(A) Label every Nth product on the x-axis. Looking at ?scale_x_discrete, we can set labels to a function that takes all the levels of the factor and returns the labels we want. So we'll write a function factory: a function that returns a function that blanks out all but every Nth label:
every_n_labeler = function(n = 3) {
  function (x) {
    # TRUE for positions 1, n + 1, 2n + 1, ...
    ind = (seq_along(x) - 1) %% n == 0
    # blank out every label that is not kept
    x[!ind] = ""
    return(x)
  }
}
Now let's use that as the labeler:
ggplot(df, aes(x = Lot,
y = Value,
colour = Machine)) +
facet_grid(Parameter ~ .,
scales = "free_y") +
geom_point() +
scale_x_discrete(labels = every_n_labeler(3)) +
labs(title = "Product: Select Trends | 2018",
x = "Time (s)",
y = "Value") +
theme(axis.text.x = element_text (
angle = 90,
hjust = 1,
vjust = 0.5
))
You can change the every_n_labeler(3) to (10) to make it every 10th label.
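To see what the labeler does outside of ggplot, it can be called directly on a character vector. A small sketch, with made-up lot names and the function repeated so the snippet runs on its own:

```r
# Same idea as the labeler above: keep every nth label, blank the rest
every_n_labeler <- function(n = 3) {
  function(x) {
    keep <- (seq_along(x) - 1) %% n == 0
    x[!keep] <- ""
    x
  }
}

lots <- paste0("X", 1:7)                  # made-up lot names
labels_out <- every_n_labeler(3)(lots)
print(labels_out)
# "X1" ""   ""   "X4" ""   ""   "X7"
```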
(B) Maybe more appropriately: it seems like your x-axis is actually numeric and just happens to have an "X" in front of each value. Let's convert it to numeric and let the defaults do the labeling work:
df$time = as.numeric(gsub(pattern = "X", replacement = "", x = df$Lot))
ggplot(df, aes(x = time,
y = Value,
colour = Machine)) +
facet_grid(Parameter ~ .,
scales = "free_y") +
geom_point() +
labs(title = "Product: Select Trends | 2018",
x = "Time (s)",
y = "Value") +
theme(axis.text.x = element_text (
angle = 90,
hjust = 1,
vjust = 0.5
))
With your full x range, I imagine that would look nice.
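Before plotting, the conversion can be checked on its own. A small sketch using the four lot names from the sample data; anchoring the pattern with ^ (an addition here, not in the code above) removes only a leading "X", and the NA check catches any value that does not fit the pattern:

```r
lots <- c("X180106482", "X180126485", "X180306523", "X180526326")

# Strip a leading "X" only, then convert to numeric
time_num <- as.numeric(sub("^X", "", lots))

# as.numeric() returns NA (with a warning) for anything it cannot parse
stopifnot(!anyNA(time_num))
print(time_num)
```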
(C) But who wants to read those 9-digit numbers? You're labeling the x-axis as "Time (s)", which makes me think it's actually a time, measured in seconds from some start time. I'll make up that your start time is 2010-01-01 and convert these seconds to actual times, and then we get a nice date-time scale:
ggplot(df, aes(x = as.POSIXct(time, origin = "2010-01-01"),
y = Value,
colour = Machine)) +
facet_grid(Parameter ~ .,
scales = "free_y") +
geom_point() +
labs(title = "Product: Select Trends | 2018",
x = "Time (s)",
y = "Value") +
theme(axis.text.x = element_text (
angle = 90,
hjust = 1,
vjust = 0.5
))
If this is the real meaning behind your data, then using a date-time axis is a big step up for readability. (Again, notice that we are not specifying the breaks, the defaults work quite well.)
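The conversion itself can be tried without any plotting. A minimal sketch, keeping the made-up 2010-01-01 origin from above and fixing the time zone to UTC (an extra assumption) so the result does not depend on the local machine:

```r
origin <- as.POSIXct("2010-01-01", tz = "UTC")  # made-up start time
secs   <- c(180106482, 180126485, 180306523, 180526326)

# Seconds since the origin become date-times
stamps <- as.POSIXct(secs, origin = origin, tz = "UTC")
print(stamps)

# Round trip: the difference to the origin gives back the seconds
elapsed <- as.numeric(difftime(stamps, origin, units = "secs"))
print(elapsed)
```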
Using this data (I subset your sample data down to 2 facets and used dput to make it copy/pasteable):
df = structure(list(Lot = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L), .Label = c("X180106482", "X180126485", "X180306523",
"X180526326"), class = "factor"), Value = c(201, 156, 253, 211,
178, 202.5, 203.4, 204.3, 205.2, 2.02, 2.17, 1.23, 1.28, 1.54,
1.28, 1.45, 1.61, 2.35, 1.34, 1.36, 1.67, 2.01, 2.06, 2.07, 2.19,
1.44, 2.19), Parameter = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("Var 1", "Var 2", "Var 3", "Var 4"
), class = "factor"), Machine = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Machine 1", "Machine 2"), class = "factor"),
time = c(180106482, 180126485, 180306523, 180526326, 180106482,
180126485, 180306523, 180526326, 180106482, 180106482, 180126485,
180306523, 180526326, 180106482, 180126485, 180306523, 180526326,
180106482, 180106482, 180126485, 180306523, 180526326, 180106482,
180126485, 180306523, 180526326, 180106482)), row.names = c(NA,
-27L), class = "data.frame")
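On the side question about label order: when factor() is given only labels, it first sorts the unique values and then pairs the labels with that sorted order, which is probably why the labels seem to move around. Passing levels explicitly fixes the order. A small sketch with made-up parameter codes:

```r
raw <- c("speed", "flow", "mass", "speed", "mass")  # made-up parameter codes

# With only `labels`, factor() sorts the unique values first
# ("flow", "mass", "speed") and pairs the labels with that sorted order.
implicit <- factor(raw, labels = c("Var1", "Var2", "Var3"))

# Giving `levels` explicitly fixes the order; labels then pair with it.
explicit <- factor(raw,
                   levels = c("speed", "flow", "mass"),
                   labels = c("Var1", "Var2", "Var3"))

print(as.character(implicit))  # "speed" is labelled Var3 here
print(as.character(explicit))  # "speed" is labelled Var1 here
```

ggplot2 facets and legends follow the level order of the factor, so the explicit version also controls the order in which panels appear.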
I am creating a bar plot in ggplot2 3 that uses facet_grid and position_dodge2(preserve="single") (= same bar width in all facets), as well as geom_text for labeling. It all works fine except when I change the width of the bars with width, e.g. to 1.2 (otherwise the bars are rather slim).
Two problems occur:
the labels of geom_text don't align any longer with the bars;
the bars aren't centered on the x axis as they should.
Is there any solution to this? A workaround with hjust doesn't seem to work, since the labels are not evenly misaligned when width changes. Or am I misunderstanding the purpose of width?
This seems related to my question.
Data:
x <- structure(list(SessionLastStage = structure(1:20, .Label = c("1998-1999",
"1999-2000", "2000-2001", "2001-2002", "2002-2003", "2003-2004",
"2004-2005", "2005-2006", "2006-2007", "2007-2008", "2008-2009",
"2009-2010", "2010-2011", "2011-2012", "2012-2013", "2013-2014",
"2014-2015", "2015-2016", "2016-2017", "2017-2018"), class = "factor"),
freq = c(0, 2, 18, 8, 6, 0, 0, 0, 2, 14, 8, 16, 30, 4, 12,
10, 11, 30, 1, 0), Phase = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L
), .Label = c("Introduction", "Maintenance", "Deconsolidation"
), class = "factor")), class = "data.frame", row.names = c(NA,
-20L), .Names = c("SessionLastStage", "freq", "Phase"))
plot command:
library(dplyr)
library(ggplot2)
x %>%
ggplot()+
geom_bar(aes(x=SessionLastStage, y=freq),
stat="identity",
width=1.2,
position = position_dodge2(preserve="single"))+
geom_text(data=x %>% filter(freq>0),
aes(x=SessionLastStage, y=freq+1, label=freq))+
facet_grid(.~Phase,
scales="free_x",
space = "free_x")+
theme_minimal()+
theme(axis.text=element_text(angle=90))
Output: