This seems like the simplest thing to do, but I have not been able to figure it out in R. For descriptive purposes, I want to create one bar graph that shows the means and error bars of multiple questions/variables. My data is based on anonymous responses, so there are no grouping variables.
Is there a way to do this in R? Below is an example of what my data looks like. I would like to plot the mean and standard deviation of each variable next to each other in the same bar graph.
dat <- data.frame(satisfaction = c(1, 2, 3, 4),
engaged = c(2, 3, 4, 2),
relevant = c(4, 1, 3, 2),
recommend = c(4, 1, 3, 3))
What you could do is reshape the data into long format with reshape2 (or data.table or tidyr) without specifying an id-variable and using all columns as measure variables. After that you can create a plot with for example ggplot2. Using:
library(reshape2)
library(ggplot2)
# reshape into long format
dat2 <- melt(dat, measure.vars = 1:4) # or just: melt(dat)
# create the plot
ggplot(dat2, aes(x = variable, y = value)) +
stat_summary(geom = 'bar', fun = 'mean', width = 0.7, fill = 'grey') + # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
stat_summary(geom = 'errorbar', width = 0.2, size = 1.5) +
theme_minimal(base_size = 14) +
theme(axis.title = element_blank())
gives:
Update: As @GavinSimpson pointed out in his answer, a barplot is not the best choice for visualizing means and standard errors. As an alternative, you could use geom_pointrange:
ggplot(dat2, aes(x = variable, y = value)) +
stat_summary(geom = 'pointrange', fatten = 5, size = 1.2) +
theme_minimal(base_size = 14) +
theme(axis.title = element_blank())
which gives:
Whilst I know you asked for a barplot, a dotplot of the data is an alternative visualisation that focuses on the means and standard errors. If the drawing of a bar all the way to 0 is not that informative, the dotplot is a good alternative.
Reusing the objects and code from @Procrastinatus Maximus' answer we have:
ggplot(dat2, aes(x = variable, y = value)) +
stat_summary(geom = 'point', fun = 'mean', size = 2) + # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
stat_summary(geom = 'errorbar', width = 0.2) +
xlab(NULL) +
theme_bw()
which produces a dot plot of the means with error bars.
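One caveat that applies to the answers above: the question asks for standard deviations, whereas stat_summary() called without fun.data falls back to mean_se(), i.e. the mean plus or minus one standard error. Below is a minimal sketch for mean ± SD. It assumes ggplot2 >= 3.3.0 (for the fun argument) and uses base R's stack() in place of melt(). mean_sd_range() is a hypothetical helper written for this example, not a ggplot2 function.

```r
library(ggplot2)

dat <- data.frame(satisfaction = c(1, 2, 3, 4),
                  engaged = c(2, 3, 4, 2),
                  relevant = c(4, 1, 3, 2),
                  recommend = c(4, 1, 3, 3))

# stack() is base R; it returns the columns `values` and `ind`
dat_long <- stack(dat)

# Hypothetical helper: mean and mean +/- 1 SD in the format fun.data expects
mean_sd_range <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

p_sd <- ggplot(dat_long, aes(x = ind, y = values)) +
  stat_summary(geom = "bar", fun = mean, width = 0.7, fill = "grey") +
  stat_summary(geom = "errorbar", fun.data = mean_sd_range, width = 0.2) +
  theme_minimal(base_size = 14) +
  theme(axis.title = element_blank())

# print(p_sd) draws the plot
```

With only four responses per variable the SD bars are much wider than the SE bars, so it is worth stating in the caption which of the two is shown.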
I want to show the x-axis labels and the form of the line clearly on this plot. It is a point plot with a lot of categories along the x-axis which makes the plot very wide and very hard to read the x-axis.
Would it be possible to fold the plot in half and display it on two panels, one above the other? How would I do that? I thought about hacking around with facet_wrap but this got ugly with the ordered points (as I wish to maintain the order of the x-axis based on the value).
Or are there better ways of showing this data? The position of the categories along the x-axis is of interest, as is the shape of the line formed by the points.
I generated the example plot using this code:
library(stringi)
library(ggplot2)
example <- data.frame(
cat = do.call(paste0, Map(stri_rand_strings, n=150, length=c(25, 14, 13), pattern = c('[A-Z]', '[0-9]', '[A-Z]'))),
val = rnorm(150, mean = 20)
)
cat_ordered_by_val <- example$cat[order(example$val)]
example$cat = factor(example$cat, levels=cat_ordered_by_val)
ggplot(example, aes(y = val, x = cat)) +
geom_point() +
ylab("Value") + xlab("Category") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size=5))
ggsave("~/Desktop/what_a_plot.jpg")
You can use mutate(facet = row_number() < nrow(example) / 2) to put the first half of the points in one facet and the second half in the other. The code below instead assigns the points to the two facets in alternation:
library(tidyverse)
library(stringi) # provides stri_rand_strings(); not attached by tidyverse
example <- data.frame(
cat = do.call(paste0, Map(stri_rand_strings, n = 150, length = c(25, 14, 13), pattern = c("[A-Z]", "[0-9]", "[A-Z]"))),
val = rnorm(150, mean = 20)
)
cat_ordered_by_val <- example$cat[order(example$val)]
example$cat <- factor(example$cat, levels = cat_ordered_by_val)
example %>%
arrange(cat) %>%
mutate(facet = row_number() %% 2) %>%
ggplot(aes(y = val, x = cat)) +
geom_point() +
ylab("Value") +
xlab("Category") +
theme_bw() +
facet_wrap(~facet, ncol = 1, scales = "free") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 5))
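For comparison, here is a minimal sketch of the "fold in half" variant mentioned above, where the lower-valued half of the categories goes in the top panel and the upper half in the bottom one. The category names are generated with sprintf() rather than stringi purely to keep the sketch free of extra dependencies, and the facet labels "lower half" / "upper half" are made up for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
example <- data.frame(
  cat = sprintf("category_%03d", 1:150),  # stand-in for the random strings
  val = rnorm(150, mean = 20)
)
# order the categories by value, as in the question
example$cat <- factor(example$cat, levels = example$cat[order(example$val)])

halves <- example %>%
  arrange(cat) %>%  # sorts by factor level, i.e. by value
  mutate(facet = ifelse(row_number() <= n() / 2, "lower half", "upper half"))

p_halves <- ggplot(halves, aes(x = cat, y = val)) +
  geom_point() +
  ylab("Value") +
  xlab("Category") +
  theme_bw() +
  # free x so each panel shows only its own categories; y stays shared
  facet_wrap(~facet, ncol = 1, scales = "free_x") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 5))
```

Using scales = "free_x" instead of "free" keeps the y-axis identical in both panels, so the shape of the line can still be compared across the fold.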
I am trying to create a bar graph that plots rank, where lower values are better. I want larger bars to correspond to smaller values, so the "best" groups in the data receive more visual weight.
Reprex:
library(tidyverse) # for as_tibble(), %>%, group_by(), summarise() and ggplot()
dat = data.frame("Group" = c(rep("Best",50),
rep("Middle",50),
rep("Worst",50)
),
"Rank" = c(rnorm(n = 50, mean = 1.5, sd = 0.5),
rnorm(n = 50, mean = 2.5, sd = 0.5),
rnorm(n = 50, mean = 3.5, sd = 0.5)
)
)
tibdat = as_tibble(dat) %>%
group_by(Group) %>%
summarise(Mean_Rank = mean(Rank,na.rm=T))
# creates simple rightside up bar graph
ggplot(data = tibdat, mapping = aes(Group, Mean_Rank, fill = Group)) +
geom_col() +
scale_y_continuous(breaks = c(1:4), limits = c(1,4), oob = scales::squish)
# my attempt below, simply reversing the breaks and limits
ggplot(data = tibdat, mapping = aes(Group, Mean_Rank, fill = Group)) +
geom_col() +
scale_y_continuous(breaks = c(4:1), limits = c(4,1), oob = scales::squish)
The graphing code at the end does succeed in flipping the axis, but the data disappears (the bars are not plotted).
Note that I do not want the graphs to originate from the top, which scale_y_reverse can achieve. I want the bars to originate from the bottom, at the y = 4 line (or below).
How is this achieved?
Edit: Added image below to show the original bar graph that works but is wrong.
I just transformed the labels. I don't know if that's what you were looking for.
ggplot(data = tibdat, mapping = aes(Group, Mean_Rank, fill = Group)) +
geom_col() +
scale_y_continuous(breaks = c(1:4), limits = c(1,4), oob = scales::squish, labels = function(x) 5 - x)
With one more trick inside aes, I think you can get the desired result. Maybe someone better than me knows a cleaner way to do it.
ggplot(data = tibdat, mapping = aes(Group, 5 - Mean_Rank, fill = Group)) +
geom_col() +
scale_y_continuous(breaks = c(1:4), limits = c(1,4), oob = scales::squish, labels = function(x) 5 - x)
And here is the result:
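The reason the trick works is that x -> 5 - x mirrors the interval [1, 4] onto itself and is its own inverse: the bar heights are mirrored once in aes, and the axis labels are mirrored back so they still read as ranks. A minimal sketch that makes this explicit follows. flip(), lo and hi are names introduced here for illustration, and the data is regenerated with an arbitrary seed.

```r
library(dplyr)
library(ggplot2)

set.seed(42)
dat <- data.frame(
  Group = rep(c("Best", "Middle", "Worst"), each = 50),
  Rank = c(rnorm(50, mean = 1.5, sd = 0.5),
           rnorm(50, mean = 2.5, sd = 0.5),
           rnorm(50, mean = 3.5, sd = 0.5))
)
tibdat <- dat %>%
  group_by(Group) %>%
  summarise(Mean_Rank = mean(Rank, na.rm = TRUE))

lo <- 1
hi <- 4
# Mirror a value within [lo, hi]; applying flip() twice returns the input
flip <- function(x) lo + hi - x

p_flip <- ggplot(tibdat, aes(Group, flip(Mean_Rank), fill = Group)) +
  geom_col() +
  scale_y_continuous(name = "Mean_Rank",
                     breaks = lo:hi, limits = c(lo, hi),
                     oob = scales::squish,
                     labels = flip)  # relabel so the axis shows ranks again
```

Because the limits start at lo and out-of-bounds values are squished, the bars begin at the bottom of the panel, which is labelled 4, and the best group gets the tallest bar.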
I have created a monthly plot with facet_wrap.
The plot has 3 rows and 4 columns. Now I want a common y-axis for each row, e.g. the 1st row should share one set of y values, and the same goes for the 2nd and 3rd rows.
I tried but was not able to do it.
I used:
ggplot(data = PB,
aes(x = new_date, y = Mean, group = 1)) +
geom_line(aes(color = experiment)) +
theme(legend.title = element_blank()) +
facet_wrap( ~MonthAbb, ncol = 4, scales = "free")
The issue is the scales = "free". Remove this and it will set a common scale across rows and columns (or use "free_y" or "free_x" to adjust accordingly).
If what you're looking for is a separate scale for each row, it will require a bit more work. See the solution in "R: How do I use coord_cartesian on facet_grid with free-ranging axis", which layers invisible points on the plot to force the look you want. Otherwise, a simple solution might be to use gridExtra: plot each row separately, then merge the plots into a grid.
Edit: a gridExtra solution would look something like:
library(ggplot2)
library(gridExtra)
# PB1 and PB2 are assumed to be subsets of PB, one per row of months
g1 <- ggplot(data = PB1, aes(x=new_date, y = Mean, group = 1)) +
geom_line(aes(color = experiment)) +
theme(legend.title = element_blank())
g2 <- ggplot(data = PB2, aes(x=new_date, y = Mean, group = 1)) +
geom_line(aes(color = experiment)) +
theme(legend.title = element_blank())
grid.arrange(g1, g2, nrow=2)
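To make the gridExtra idea concrete, here is a minimal sketch that builds one facet_wrap plot per row of four months and stacks them. The PB data frame below is dummy data that only mimics the column names from the question (with a numeric new_date as a stand-in), and facet_row is a helper variable introduced for this example.

```r
library(ggplot2)
library(gridExtra)

set.seed(1)
# Dummy stand-in for PB: 10 time points x 12 months x 2 experiments
PB <- expand.grid(new_date = 1:10, MonthAbb = month.abb,
                  experiment = c("a", "b"), stringsAsFactors = FALSE)
PB$MonthAbb <- factor(PB$MonthAbb, levels = month.abb)
PB$Mean <- rnorm(nrow(PB), mean = 5 * as.integer(PB$MonthAbb))

# Months 1-4 -> row 1, months 5-8 -> row 2, months 9-12 -> row 3
facet_row <- (as.integer(PB$MonthAbb) - 1) %/% 4 + 1

# One plot per row; the default fixed scales give a common y-axis within a row
row_plots <- unname(lapply(split(PB, facet_row), function(d) {
  ggplot(d, aes(x = new_date, y = Mean, color = experiment)) +
    geom_line() +
    theme(legend.title = element_blank()) +
    facet_wrap(~MonthAbb, ncol = 4)
}))

# Stack the three rows; draw with grid::grid.draw(combined)
combined <- arrangeGrob(grobs = row_plots, nrow = 3)
```

Each row is an independent plot, so every row gets its own y-axis range and its own legend. If a single shared legend is needed, that has to be handled separately.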
Here is an option to set the scales on a per-panel basis. It is based on a function I've put in a GitHub package. I'm using some dummy data as an example.
library(ggplot2)
library(ggh4x)
df <- data.frame(
x = rep(1:20, 9),
y = c(cumsum(rnorm(60)) + 90,
cumsum(rnorm(60)) - 90,
cumsum(rnorm(60))),
row = rep(LETTERS[1:3], each = 60),
col = rep(LETTERS[1:3], each = 20)
)
ggplot(df, aes(x, y)) +
geom_line() +
facet_wrap(row ~ col, scales = "free_y") +
facetted_pos_scales(
y = rep(list(
scale_y_continuous(limits = c(90, 100)),
scale_y_continuous(limits = c(-100, -80)),
scale_y_continuous(limits = c(0, 20))
), each = 3)
)
I want to plot the data separately in a bubble plot, like the image on the right (I made this in PowerPoint just to visualize it).
At the moment I can only create a plot that looks like the one on the left, where the bubbles overlap. How can I do this in R?
b <- ggplot(df, aes(x = Year, y = Type))
b + geom_point(aes(color = Spp, size = value), alpha = 0.6) +
scale_color_manual(values = c("#0000FF", "#DAA520", "#228B22","#E7B888")) +
scale_size(range = c(0.5, 12))
You can pass position_dodge() to the position argument of geom_point. Applied directly to your code, it would dodge the points horizontally, so the idea is to swap your x and y variables and use coord_flip to get the orientation right:
library(ggplot2)
ggplot(df, aes(y = as.factor(Year), x = Type))+
geom_point(aes(color = Group, size = Value), alpha = 0.6, position = position_dodge(0.9)) +
scale_color_manual(values = c("#0000FF", "#DAA520", "#228B22","#E7B888")) +
scale_size(range = c(1, 15)) +
coord_flip()
Does it look like what you are trying to achieve?
EDIT: Adding text in the middle of each point
To label each point, you can use geom_text with the same position_dodge2 setting as for geom_point.
NB: I use position_dodge2 instead of position_dodge and slightly changed the width values because I found position_dodge2 better suited to this case.
library(ggplot2)
ggplot(df, aes(y = as.factor(Year), x = Type))+
geom_point(aes(color = Group, size = Value), alpha = 0.6,
position = position_dodge2(width = 1)) +
scale_color_manual(values = c("#0000FF", "#DAA520", "#228B22","#E7B888")) +
scale_size(range = c(3, 15)) +
coord_flip()+
geom_text(aes(label = Value, group = Group),
position = position_dodge2(width = 1))
Reproducible example
As you did not provide a reproducible example, I made one that is maybe not fully representative of your original dataset. If my answer is not working for you, you should consider providing a reproducible example (see here: How to make a great R reproducible example)
Group <- c(LETTERS[1:3],"A",LETTERS[1:2],LETTERS[1:3])
Year <- c(rep(1918,4),rep(2018,5))
Type <- c(rep("PP",3),"QQ","PP","PP","QQ","QQ","QQ")
Value <- sample(1:50,9)
df <- data.frame(Group, Year, Value, Type)
df$Type <- factor(df$Type, levels = c("PP","QQ"))
I have a dataset e.g.
outcome <- c(rnorm(500, 45, 10), rnorm(250, 40, 12), rnorm(150, 38, 7), rnorm(1000, 35, 10), rnorm(100, 30, 7))
group <- c(rep("A", 500), rep("B", 250), rep("C", 150), rep("D", 1000), rep("E", 100))
reprex <- data.frame(outcome, group)
I can plot this as a "dynamite" plot with:
graph <- ggplot(reprex, aes(x=group, y=outcome, fill=..y..)) +
stat_summary(geom = "bar", fun.y = mean) +
stat_summary(geom = "errorbar", fun.data = mean_cl_normal, width = 0.1)
giving:
I would also like to add beneath each column a label specifying how many observations were in that group. However I can't work out how to do this. I tried:
graph + geom_label (aes(label=paste(..count.., "Obs.", sep=" ")), y=-0.75, size=3.5, color="black", fontface="bold")
which returns
Error in paste(count, "Obs.", sep = " ") :
cannot coerce type 'closure' to vector of type 'character'
I've also tried
graph + stat_summary(aes(label=paste(..y.., "Obs.", sep=" ")), fun.y=count, geom="label")
but this returns:
Error: stat_summary requires the following missing aesthetics: y
I know that I can do this if I just make a dataframe of summary statistics first but that will result in me creating a new dataframe every time I need a graph and therefore I'd ideally like to be able to plot this using stat_summary() from the original dataset.
Does anyone know how to do this?
Without creating a new data frame, you can compute the count on the fly with dplyr as follows:
library(dplyr)
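library(ggplot2) # mean_cl_normal() additionally requires the Hmisc package to be installed
ggplot(reprex, aes(x=group, y=outcome, fill=..y..)) +
stat_summary(geom = "bar", fun = mean) + # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)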
stat_summary(geom = "errorbar", fun.data = mean_cl_normal, width = 0.1)+
geom_label(inherit.aes = FALSE, data = . %>% group_by(group) %>% count(),
aes(label = paste0(n, " Obs."), x = group), y = -0.5)
You cannot use stat="count" when a y variable is already declared. I would say the easiest way is to create a small data frame for the counts:
label_df = reprex %>% group_by(group) %>% summarise(outcome=mean(outcome),n=n())
Then plot using that
ggplot(reprex, aes(x=group, y=outcome, fill=..y..)) +
stat_summary(geom = "bar", fun = mean) + # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
stat_summary(geom = "errorbar", fun.data = mean_cl_normal, width = 0.1)+
geom_text(data=label_df,aes(label=paste(n, "Obs.", sep=" ")), size=3.5, color="black", fontface="bold",nudge_y =1)
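Since the question asks for a way to do this with stat_summary() alone, here is a minimal sketch of that route: a fun.data function can return a label column alongside y, and a text geom will pick it up. n_label() is a hypothetical helper name and the seed is arbitrary. The sketch also swaps mean_cl_normal for ggplot2's own mean_se() so that it runs without Hmisc, and it assumes ggplot2 >= 3.3.0 for fun and after_stat().

```r
library(ggplot2)

set.seed(1)
outcome <- c(rnorm(500, 45, 10), rnorm(250, 40, 12), rnorm(150, 38, 7),
             rnorm(1000, 35, 10), rnorm(100, 30, 7))
group <- c(rep("A", 500), rep("B", 250), rep("C", 150),
           rep("D", 1000), rep("E", 100))
reprex <- data.frame(outcome, group)

# Hypothetical helper: fixed y position just below the bars, label = group size
n_label <- function(x) {
  data.frame(y = -0.75, label = paste(length(x), "Obs."))
}

p_n <- ggplot(reprex, aes(x = group, y = outcome)) +
  # fill is mapped in the bar layer only, so it does not leak into other layers
  stat_summary(aes(fill = after_stat(y)), geom = "bar", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_se, width = 0.1) +
  stat_summary(geom = "text", fun.data = n_label,
               size = 3.5, color = "black", fontface = "bold")
```

The counts are computed per x group from the original data, so no separate summary data frame is needed.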