Using geom_pointrange() to plot means and standard errors - r

I have three groups (categorical variable) who completed a test and a dataframe with the mean and standard error on the test per group. I would like to plot their means as a single point in the plot accompanied by a short horizontal line indicating the standard error (i.e., error bars). I'm using R with ggplot2.
My x-axis represents all the possible scores in the test (from -218 to 218) and the groups are plotted on the y-axis (I used coord_flip() for this).
I was able to create the graph but the standard error lines don't show up, so I don't know what I'm doing wrong. I think it has to do with my use of geom_pointrange(), but I have no idea what I'm supposed to change.
This is my code:
ggplot(descriptive_blp_data) +
aes(x = group, y = mean_blp, colour = group, size = 5) +
geom_pointrange(aes(ymin = mean_blp - se_blp, ymax = mean_blp + se_blp), width=.2,
position=position_dodge(.9)) +
scale_color_manual(
values = list(
Group_2 = "#9EBCDA",
Group_3 = "#8856A7",
Group_1 = "#E0ECF4"
)
) +
labs(y = "Mean BLP score (SE)") +
coord_flip() +
theme_classic() +
theme(legend.position = "none", axis.title.y = element_blank()) +
ylim(-218, 218)
And this is my graph so far:

This would be easier to check if you provided the actual data frame descriptive_blp_data. Running your code on an arbitrary dataset works as intended and produces error bars, so there is nothing really wrong with the ggplot part.
There may be a few reasons why this does not work with your actual dataset. Maybe the standard errors are too small to show up with a point size of 5?
library(ggplot2)

descriptive_blp_data <- data.frame(
  "group" = c("Group_3", "Group_2", "Group_1"),
  "mean_blp" = c(150, 50, -50),
  "se_blp" = c(40, 20, 30)
)

ggplot(descriptive_blp_data) +
  aes(x = group, y = mean_blp, colour = group, size = 5) +
  geom_pointrange(aes(ymin = mean_blp - se_blp, ymax = mean_blp + se_blp), width=.2,
                  position=position_dodge(.9)) +
  scale_color_manual(
    values = list(
      Group_2 = "#9EBCDA",
      Group_3 = "#8856A7",
      Group_1 = "#E0ECF4"
    )
  ) +
  labs(y = "Mean BLP score (SE)") +
  coord_flip() +
  theme_classic() +
  theme(legend.position = "none", axis.title.y = element_blank()) +
  ylim(-218, 218)
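The guess about small standard errors can be checked with a little arithmetic before touching the plot. The sketch below is only an illustration: the numbers are made up (they are not the asker's data) and the 2% cut-off is an arbitrary assumption. It compares the full width of each error bar with the 436-unit span set by ylim(-218, 218).

```r
# Hypothetical summary data; the values are invented for illustration.
descriptive_blp_data <- data.frame(
  group    = c("Group_1", "Group_2", "Group_3"),
  mean_blp = c(-50, 50, 150),
  se_blp   = c(1.5, 20, 0.8)
)

# Width of the score axis set by ylim(-218, 218).
axis_span <- 218 - (-218)

# Full width of each error bar (mean - SE to mean + SE) as a share of the axis.
descriptive_blp_data$bar_share <- 2 * descriptive_blp_data$se_blp / axis_span

# Flag bars narrower than 2% of the axis; the cut-off is an arbitrary assumption.
descriptive_blp_data$likely_hidden <- descriptive_blp_data$bar_share < 0.02

print(descriptive_blp_data)
```

If most rows are flagged, the bars are probably being drawn but covered by the large points, and a smaller point size or a narrower axis range would reveal them.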

Related

Geom_errorbar adding multiple error bars to bar plot (ggplot)

I'm trying to create a ggplot based on the means of various groups in my data. The means were found using aggregate;
f1 <- function(x) c(Mean = mean(x), std_error = std.error(x)) # create function for mean and standard error
coverboard_stat <- aggregate(no_count ~ EEM.or.NCOS + Species + Period..Oct.Sep. + Habitat, data = coverboard, f1)
so it now looks like this (as an example; the real data-set is much larger)
Group     no_count[,"Mean"]    no_count[,"std_error"]
Type 1    1                    .05
Type 2    2                    .75
this is my barplot code:
ggplot(aes(x = Species, y = no_count[,"Mean"]), data = post_rest) +
geom_bar(aes(fill=EEM.or.NCOS), stat = "identity", position = "dodge") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, vjust = 0.9, hjust=1)) +
labs(fill="Site") +
scale_fill_brewer(palette = "YlOrRd", labels=c("South Parcel", "North Campus Open Space")) +
labs(x="", y="Encounter rate") +
theme(legend.position="bottom")+
geom_errorbar(aes(ymin=no_count[,"Mean"]-no_count[,"std_error"],
ymax=no_count[,"Mean"]+no_count[,"std_error"],
fill=EEM.or.NCOS),
width=.2,
position=position_dodge(.9))
And this is the result I get:
Obviously this isn't correct; I just want one error bar for each bar.
I tried formatting the aes() call differently and using summary instead of aggregate (which led me down a whole different rabbit hole of errors).
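One possible cause, offered as an assumption because the question includes no data: the means were aggregated by four variables, but the plot only distinguishes species and site, so several rows can land on the same bar and each one draws its own error bar. The sketch below uses a synthetic stand-in for the coverboard data (all names and values are hypothetical) and a base R standard-error function in place of std.error(). It aggregates by the plotted variables only and flattens the matrix column that aggregate() returns.

```r
# Synthetic stand-in for the coverboard data; every name and value is hypothetical.
set.seed(1)
coverboard <- data.frame(
  site     = rep(c("EEM", "NCOS"), each = 12),
  species  = rep(rep(c("lizard", "snake"), each = 6), times = 2),
  habitat  = rep(c("grass", "scrub", "marsh"), times = 8),
  no_count = rpois(24, lambda = 3)
)

# Mean and standard error of the mean, computed in base R.
mean_se <- function(x) c(Mean = mean(x), std_error = sd(x) / sqrt(length(x)))

# Grouping by three variables gives one row per site x species x habitat ...
by_three <- aggregate(no_count ~ site + species + habitat, data = coverboard, mean_se)
# ... while grouping by the two plotted variables gives one row per bar.
by_two <- aggregate(no_count ~ site + species, data = coverboard, mean_se)

# aggregate() stores both statistics in a single matrix column; flatten it into
# ordinary columns named no_count.Mean and no_count.std_error.
by_two_flat <- do.call(data.frame, by_two)

nrow(by_three)     # 12 rows: three candidates for each of the 4 bars
nrow(by_two_flat)  # 4 rows: one per bar
names(by_two_flat)
```

With one row per bar, the flattened columns can be mapped directly, for example ymin = no_count.Mean - no_count.std_error.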

Why is fullrange=TRUE not working for geom_smooth in ggplot2?

I have a plot where I am plotting both the linear regressions for each level of a variable as well as the linear regression for the total sample.
library(ggplot2);library(curl)
df<-read.csv(curl("https://raw.githubusercontent.com/megaraptor1/mydata/main/example.csv"))
df$group<-as.factor(df$group)
ggplot(df,aes(x,y))+
geom_point(size=2.5,shape=21,aes(fill=group),col="black")+
geom_smooth(formula=y~x,aes(col=group,group=group),method="lm",size=1,se=F)+
geom_smooth(formula=y~x,method="lm",col="black",size=1,fullrange=T,se=F)+
theme_classic()+
theme(legend.position = "none")
I am trying to extend the black line (which represents all specimens) to span the full range of the axes using the command fullrange=T. However, I have found the command fullrange=T is not working on this graph regardless of what I try. This is especially strange as I have not called any limits for the graph or set any additional global factors.
This question was the closest I was able to find to my current problem, but it does not appear to be describing the same issue because that issue had to do with how the limits of the graph were called.
This seems a bit heavy-handed, but it allows you to extend your regression line to whatever limits you choose for the x axis.
The argument fullrange is not really documented very helpfully. If you have a look at http://www.mosaic-web.org/ggformula/reference/gf_smooth.html it appears that "fullrange" applies to the points in the dataframe that is used to generate the regression line. So in your case your regression line is extending to the "fullrange". It's just that your definition of "fullrange" is not quite the same as that used by geom_smooth.
library(ggplot2)
library(dplyr)
library(curl)
lm_formula <- lm(formula = y~x, data = df)
f_lm <- function(x){lm_formula$coefficients[1] + lm_formula$coefficients[2] * x}
df_lim <-
data.frame(x = c(0, 5)) %>%
mutate(y = f_lm(x))
ggplot(df,aes(x,y))+
geom_point(size=2.5,shape=21,aes(fill=group),col="black")+
geom_smooth(formula=y~x,aes(col=group,group=group),method="lm",size=1,se=F)+
geom_line(data = df_lim)+
coord_cartesian(xlim = df_lim$x, ylim = df_lim$y, expand = FALSE)+ # expand takes a logical here
theme_classic()+
theme(legend.position = "none")
data
df<-read.csv(curl("https://raw.githubusercontent.com/megaraptor1/mydata/main/example.csv"))
df$group<-as.factor(df$group)
Created on 2021-04-05 by the reprex package (v1.0.0)
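The hand-written f_lm() helper above can also be replaced by predict(), which evaluates the fitted model at any x values you choose. This is a minimal sketch on synthetic data (the values are made up and stand in for example.csv); the limits 0 and 5 are the ones used in the answer.

```r
# Synthetic data standing in for example.csv; the values are invented.
set.seed(42)
df <- data.frame(x = runif(50, min = 1, max = 4))
df$y <- 2 + 0.5 * df$x + rnorm(50, sd = 0.2)

# Fit the overall regression line.
fit <- lm(y ~ x, data = df)

# Evaluate the fitted line at the chosen axis limits (0 and 5),
# which lie outside the observed range of x.
df_lim <- data.frame(x = c(0, 5))
df_lim$y <- predict(fit, newdata = df_lim)

df_lim
```

The resulting df_lim can be passed to geom_line(data = df_lim) exactly as in the answer.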
I had the same issue. Despite setting fullrange = TRUE, the line of best fit was only being drawn in the data range.
ggplot(data = df, aes(x = diameter, y = height)) +
geom_point(size = 2) +
geom_smooth(method = lm, se = FALSE, fullrange = TRUE) +
labs(x = "Diameter", y = "Height", title = "Tree Height vs. Diameter") +
theme(plot.title = element_text(hjust = 0.5, size = 15, face = 'bold'))
Bad plot: 1
Using scale_x_continuous() and scale_y_continuous() worked for me (thank you @markus). I added two lines of code, below geom_smooth(), to fix the issue.
ggplot(data = df, aes(x = diameter, y = height)) +
geom_point(size = 2) +
geom_smooth(method = lm, se = FALSE, fullrange = TRUE) +
scale_x_continuous(expand = c(0,0), limits=c(5, 32)) + #expand = c(num1,num2) => line of best fit stops being drawn at x = 32 + (32 - 5)*num1 + num2 = 32 + (32 - 5)*0 + 0 = 32
scale_y_continuous(expand = c(0,0), limits=c(7, 25)) + #expand = c(num1,num2) => line of best fit stops being drawn at y = 25 + (25 - 7)*num1 + num2 = 25 + (25 - 7)*0 + 0 = 25
labs(x = "Diameter", y = "Height", title = "Tree Height vs. Diameter") +
theme(plot.title = element_text(hjust = 0.5, size = 15, face = 'bold'))
Good plot: 2
Source: How does ggplot scale_continuous expand argument work?
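The arithmetic in the inline comments above can be written as a small function. expanded_limits() is a hypothetical helper, not part of ggplot2; it only reproduces the calculation for a two-element expand = c(mult, add) vector, assuming the same padding is applied at both ends of the axis.

```r
# Hypothetical helper (not a ggplot2 function): reproduces the arithmetic for
# a two-element expand = c(mult, add) vector.
expanded_limits <- function(limits, expand = c(0.05, 0)) {
  span <- diff(limits)
  c(lower = limits[1] - span * expand[1] - expand[2],
    upper = limits[2] + span * expand[1] + expand[2])
}

expanded_limits(c(5, 32), expand = c(0, 0))     # axis runs exactly from 5 to 32
expanded_limits(c(5, 32), expand = c(0.05, 0))  # 5% padding: 3.65 to 33.35
```

With expand = c(0, 0) the axis ends where the limits end, which is why the fitted line reaches the edge of the panel in the second plot.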

How do I represent percent of a variable in a filled barplot?

I have a data frame (t1) and I want to illustrate the shares of companies in relation to their size.
I added a dummy variable so that I get one filled bar rather than three:
t1$row <- 1
Company sizes are split into medium, small and micro:
f_size <- factor(t1$size,
ordered = TRUE,
levels = c("medium", "small", "micro"))
The plot is built with theme_economist():
ggplot(t1, aes(x = "Size", y = prop.table(row), fill = f_size)) +
geom_col() +
geom_text(aes(label = as.numeric(f_size)),
position = position_stack(vjust = 0.5)) +
theme_economist(base_size = 14) +
scale_fill_economist() +
theme(legend.position = "right",
legend.title = element_blank()) +
theme(axis.title.y = element_text(margin = margin(r = 20))) +
ylab("Percentage") +
xlab(NULL)
How can I modify my code to get the share for medium, small and micro in the middle of the three filled parts in the barplot?
Thanks in advance!
Your question isn't quite clear to me and I suggest you re-phrase it for clarity. But I believe you're trying to get the annotations accurately aligned on the y-axis. To do this, pre-calculate the labels and then use annotate():
library(data.table)
library(ggplot2)
set.seed(3432)
df <- data.table(
cat= sample(LETTERS[1:3], 1000, replace = TRUE)
, x= rpois(1000, lambda = 5)
)
tmp <- df[, .(pct= sum(x) / sum(df[,x])), cat][, cumsum := cumsum(pct)]
ggplot(tmp, aes(x= 'size', y= pct, fill= cat)) + geom_bar(stat='identity') +
annotate('text', y= tmp[,cumsum] - 0.15, x= 1, label= as.character(tmp[,pct]))
But this is a poor decision graphically. Stacked bar charts, by definition, sum to 100%. Rather than labeling the components with text, just let the graphic do this for you via the axis labels:
ggplot(tmp, aes(x= cat, y= pct, fill= cat)) + geom_bar(stat='identity') + coord_flip() +
scale_y_continuous(breaks= seq(0,1,.05))
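The pre-calculation step can also be done in base R. This is a rough sketch on made-up company sizes (t1 and its values are placeholders for the real data): it computes the share of each size class, a percentage label, and the vertical midpoint of each segment, assuming the segments are stacked in row order.

```r
# Made-up company sizes; t1 and its values are placeholders for the real data.
t1 <- data.frame(size = c(rep("medium", 2), rep("small", 5), rep("micro", 13)))
t1$size <- factor(t1$size, levels = c("medium", "small", "micro"))

# Share of each size class.
shares <- as.data.frame(prop.table(table(size = t1$size)))
names(shares) <- c("size", "share")

# Vertical midpoint of each segment when segments are stacked in row order.
shares$label_y <- cumsum(shares$share) - shares$share / 2

# Percentage label for each segment.
shares$label <- sprintf("%.0f%%", 100 * shares$share)

shares
```

In ggplot2 itself, geom_text(aes(label = label), position = position_stack(vjust = 0.5)) places the labels at the segment midpoints without using label_y, and it follows whatever stacking order the bar uses.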

Circular histogram in ggplot2 with even spacing of bars and no extra lines

I'm working on making a circular histogram in ggplot2 that shows how the number of calls varies over 24 hours. My dataset starts at 0 and goes to 23, with the number of calls per hour:
df = data.frame(xvar = 0:23,
y = c(468,520,459,256,397,241,117,120,45,100,231,398,340,276,151,134,157,203,308,493,537,462,448,383))
I'm using the following code to create the circular histogram:
ggplot(df, aes(xvar, y)) +
coord_polar(theta = "x", start = -.13, direction = 1) +
geom_bar(stat = "identity", fill = "maroon4", width = .9) +
geom_hline(yintercept = seq(0, 500, by = 100), color = "grey80", size = 0.3) +
scale_x_continuous(breaks = seq(0, 24), labels = seq(0, 24)) +
xlab("Hour") +
ylab("Number of Calls") +
ggtitle("Number of Calls per Hour") +
theme_bw()
I really like the resulting plot:
but I can't figure out how to get the same spacing between the 23 and 0 bars as is present for the other bars. Right now, those two bars are flush against one another and nothing I've tried so far will separate them. I'm also interested in removing the lines between the different hours (ex. the line between 21 and 22) since it's somewhat distracting and doesn't convey any information. Any advice would be much appreciated, particularly on spacing the 23 and 0 bars!
You can use the expand parameter of scale_x_continuous to adjust. Simplified a little,
ggplot(df, aes(x = xvar, y = y)) +
coord_polar(theta = "x", start = -.13) +
geom_bar(stat = "identity", fill = "maroon4", width = .9) +
geom_hline(yintercept = seq(0, 500, by = 100), color = "grey80", size = 0.3) +
scale_x_continuous(breaks = 0:24, expand = c(.002,0)) +
labs(x = "Hour", y = "Number of Calls", title = "Number of Calls per Hour") +
theme_bw()

How to plot 95 percentile and 5 percentile on ggplot2 plot with already calculated values?

I have this dataset and use this R code:
library(reshape2)
library(ggplot2)
library(RGraphics)
library(gridExtra)
long <- read.csv("long.csv")
ix <- 1:14
ggp2 <- ggplot(long, aes(x = id, y = value, fill = type)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = numbers), vjust=-0.5, position = position_dodge(0.9), size = 3, angle = 0) +
scale_x_continuous("Nodes", breaks = ix) +
scale_y_continuous("Throughput (Mbps)", limits = c(0,1060)) +
scale_fill_discrete(name="Legend",
labels=c("Inside Firewall (Dest)",
"Inside Firewall (Source)",
"Outside Firewall (Dest)",
"Outside Firewall (Source)")) +
theme_bw() +
theme(legend.position="right") +
theme(legend.title = element_text(colour="black", size=14, face="bold")) +
theme(legend.text = element_text(colour="black", size=12, face="bold")) +
facet_grid(type ~ .)
plot(ggp2)
to get the following result:
Now I need to add the 95th and 5th percentiles to the plot. The numbers are already calculated in this dataset (the NFPnumbers column holds the 95th percentile and FPnumbers the 5th).
It seems boxplot() may work here but I am not sure how to use it with ggplot.
stat_quantile(quantiles = c(0.05,0.95)) could work as well, but the function calculates the numbers itself. Can I use my numbers here?
I also tried:
geom_line(aes(x = id, y = long$FPnumbers)) +
geom_line(aes(x = id, y = long$NFPnumbers))
but the result did not look good enough.
geom_boxplot() did not work either:
geom_boxplot(aes(x = id, y = long$FPnumbers)) +
geom_boxplot(aes(x = id, y = long$NFPnumbers))
When you want to set the parameters for a boxplot, you also need ymin and ymax values. As they are not in the dataset, I calculated them.
ggplot(long, aes(x = factor(id), y = value, fill = type)) +
geom_boxplot(aes(lower = FPnumbers, middle = value, upper = NFPnumbers, ymin = FPnumbers*0.5, ymax = NFPnumbers*1.2, fill = type), stat = "identity") +
xlab("Nodes") +
ylab("Throughput (Mbps)") +
scale_fill_discrete(name="Legend",
labels=c("Inside Firewall (Dest)", "Inside Firewall (Source)",
"Outside Firewall (Dest)", "Outside Firewall (Source)")) +
theme_bw() +
theme(legend.position="right",
legend.title = element_text(colour="black", size=14, face="bold"),
legend.text = element_text(colour="black", size=12, face="bold")) +
facet_grid(type ~ .)
The result:
In the dataset you provided, you gave the value, FPnumbers & NFPnumbers variables. As FPnumbers & NFPnumbers represent the 5 and 95 percentiles, I suppose that the mean is represented by value. For this solution to work, you'll need min and max values for each "Node". I guess you have them somewhere in your raw data.
However, as they are not provided in the dataset, I made them up by calculating them based on FPnumbers & NFPnumbers. The multiplication factors of 0.5 and 1.2 are arbitrary. It is just a way of creating fictitious min and max values.
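If the raw measurements are available, the five values can be computed instead of invented. The sketch below assumes a hypothetical data frame raw with one throughput value per row and a node id. The data are simulated, and the column names are chosen to match the aesthetics used with geom_boxplot(stat = "identity").

```r
# Hypothetical raw throughput measurements for two nodes; values are simulated.
set.seed(7)
raw <- data.frame(
  id    = rep(1:2, each = 100),
  value = c(rnorm(100, mean = 900, sd = 40), rnorm(100, mean = 600, sd = 60))
)

# Five values per node: minimum, 5th percentile, median, 95th percentile, maximum.
box_stats <- function(v) {
  q <- quantile(v, probs = c(0.05, 0.5, 0.95), names = FALSE)
  c(ymin = min(v), lower = q[1], middle = q[2], upper = q[3], ymax = max(v))
}

# One row of statistics per node, with the node id as a regular column.
node_stats <- do.call(rbind, tapply(raw$value, raw$id, box_stats))
node_stats <- data.frame(id = as.integer(rownames(node_stats)), node_stats)

node_stats
```

Each column can then be mapped to the aesthetic of the same name, with the median standing in for middle here rather than the mean.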
There are several suitable geoms for that, geom_errorbar is one of them:
ggp2 + geom_errorbar(aes(ymax = NFPnumbers, ymin = FPnumbers), alpha = 0.5, width = 0.5)
I don't know if there's a way to get rid of the central line though.
