I want to create a side by side barplot using geom_bar() of this data frame,
> dfp1
value percent1 percent
1 (18,29] 0.20909091 0.4545455
2 (29,40] 0.23478261 0.5431034
3 (40,51] 0.15492958 0.3661972
4 (51,62] 0.10119048 0.1726190
5 (62,95] 0.05660377 0.1194969
With values on the x-axis and the percents as the side by side barplots. I have tried using this code,
p = ggplot(dfp1, aes(x = value, y= c(percent, percent1)), xlab="Age Group")
p = p + geom_bar(stat="identity", width=.5)
However, I get this error: Error: Aesthetics must either be length one, or the same length as the dataProblems:value. My percent and percent1 are the same length as value, so I am confused. Thanks for the help.
You will need to melt your data first, using value as the id variable. By default melt() creates another column called value, so you will need to rename it (I called it percent). Then plot the new data set using fill to separate the data into groups, and position = "dodge" to put the bars side by side (instead of on top of each other).
library(reshape2)
library(ggplot2)
dfp1 <- melt(dfp1)
names(dfp1)[3] <- "percent"
ggplot(dfp1, aes(x = value, y = percent, fill = variable)) +
  geom_bar(stat = "identity", width = .5, position = "dodge") +
  xlab("Age Group")  # an xlab= argument inside ggplot() is ignored, so add it as a layer
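As a possible shortcut, melt() can be told which column identifies the rows and what to call the new value column, which makes the rename step unnecessary. The sketch below rebuilds dfp1 from the output printed in the question; storing value as plain strings is an assumption (in the original it is probably a factor produced by cut()), and the names long and n_attempt are only illustrative. It also shows why the original attempt fails: c(percent, percent1) has ten elements, while the data frame has five rows.

```r
library(reshape2)

# Rebuild the question's data frame (numbers copied from the printed output;
# storing `value` as character strings is an assumption)
dfp1 <- data.frame(
  value    = c("(18,29]", "(29,40]", "(40,51]", "(51,62]", "(62,95]"),
  percent1 = c(0.20909091, 0.23478261, 0.15492958, 0.10119048, 0.05660377),
  percent  = c(0.4545455, 0.5431034, 0.3661972, 0.1726190, 0.1194969)
)

# The original aes(y = c(percent, percent1)) supplies 10 values for 5 rows
n_attempt <- length(c(dfp1$percent, dfp1$percent1))

# Name the id column and the new value column up front,
# so the melted result needs no renaming afterwards
long <- melt(dfp1, id.vars = "value", value.name = "percent")
str(long)
```

The resulting long data frame has one row per age group and series, which is the shape that fill = variable with position = "dodge" expects.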
Similar to David's answer, here is a tidyverse option using tidyr::pivot_longer to reshape the data before plotting:
library(tidyverse)
dfp1 %>%
  pivot_longer(-value, names_to = "variable", values_to = "percent") %>%
  ggplot(aes(x = value, y = percent, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.5) +
  xlab("Age Group")  # an xlab= argument inside ggplot() is ignored, so add it as a layer
I am using the fivethirtyeight bechdel dataset, located here https://github.com/rudeboybert/fivethirtyeight, and am attempting to recreate the first plot shown in the article here https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/. I am having trouble getting the years to group together similarly to how they did in the article.
This is the current code I have:
ggplot(data = bechdel, aes(year)) +
geom_histogram(aes(fill = clean_test), binwidth = 5, position = "fill") +
scale_fill_manual(breaks = c("ok", "dubious", "men", "notalk", "nowomen"),
values=c("red", "salmon", "lightpink", "dodgerblue",
"blue")) +
theme_fivethirtyeight()
I see where you were going with the histogram geom, but this really looks more like a categorical bar chart. Once you take that approach it's easier, after a bit of ugly code to get the correct labels on the year columns.
The bars are stacked in the wrong order on this one, and there needs to be some formatting applied to look like the 538 chart, but I'll leave that for you.
library(fivethirtyeight)
library(tidyverse)
library(ggthemes)
library(scales)
# Create date range column
bechdel_summary <- bechdel %>%
mutate(date.range = ((year %/% 10)* 10) + ((year %% 10) %/% 5 * 5)) %>%
mutate(date.range = paste0(date.range," - '",substr(date.range + 5,3,5)))
ggplot(data = bechdel_summary, aes(x = date.range, fill = clean_test)) +
geom_bar(position = "fill", width = 0.95) +
scale_y_continuous(labels = percent) +
theme_fivethirtyeight()
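To see what the date.range arithmetic does, here is a small worked sketch on a handful of made-up years (the year vector is illustrative only, not taken from the bechdel data). It rounds each year down to the start of its five-year bin and then builds the label; as far as I can tell, (year %/% 5) * 5 gives the same bin start in one step.

```r
# Illustrative years only; not taken from the bechdel dataset
year <- c(1970, 1974, 1975, 1989, 2013)

# Decade start plus 0 or 5, depending on the last digit of the year
start <- (year %/% 10) * 10 + ((year %% 10) %/% 5) * 5

# Shorter equivalent: round down to a multiple of 5
start_short <- (year %/% 5) * 5

# Label such as "1970 - '75": bin start, then the last two digits of start + 5
label <- paste0(start, " - '", substr(start + 5, 3, 5))

print(data.frame(year, start, label))
```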
I am trying to create a picture that summarises my data. The data is about the prevalence of drug use, obtained from different practices from different countries. Each practice has contributed a different amount of data and I want to show all of this in my picture.
Here is a subset of the data to work on:
gr<-data.frame(matrix(0,36))
gr$drug<-c("a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b")
gr$practice<-c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r")
gr$country<-c("c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3","c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3")
gr$prevalence<-c(9.14,5.53,16.74,1.93,8.51,14.96,18.90,11.18,15.00,20.10,24.56,22.29,19.41,20.25,25.01,25.87,29.33,20.76,18.94,24.60,26.51,13.37,23.84,21.82,23.69,20.56,30.53,16.66,28.71,23.83,21.16,24.66,26.42,27.38,32.46,25.34)
gr$prop<-c(0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406,0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406)
gr$low.CI<-c(8.27,4.80,12.35,1.83,7.22,14.53,18.25,10.56,14.28,18.76,24.25,21.72,18.62,19.83,24.36,25.22,28.80,20.20,17.73,23.15,21.06,13.12,21.79,21.32,22.99,19.76,29.60,15.41,28.39,23.25,20.34,24.20,25.76,26.72,31.92,24.73)
gr$high.CI<-c(10.10,6.37,22.31,2.04,10.00,15.40,19.56,11.83,15.74,21.52,24.87,22.86,20.23,20.68,25.67,26.53,29.86,21.34,20.21,26.10,32.79,13.63,26.02,22.33,24.41,21.39,31.48,17.98,29.04,24.43,22.01,25.12,27.09,28.05,33.01,25.95)
The code I wrote is this
p<-ggplot(data=gr, aes(x=factor(drug), y=as.numeric(gr$prevalence), ymax=max(high.CI),position="dodge",fill=practice,width=prop))
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
p + theme_bw()+
geom_bar(stat="identity",position = position_dodge(0.9)) +
labs(x="Drug",y="Prevalence") +
geom_errorbar(ymax=gr$high.CI,ymin=gr$low.CI,position=position_dodge(0.9),width=0.25,size=0.25,colour="black",aes(x=factor(drug), y=as.numeric(gr$prevalence), fill=practice)) +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill=F)
The figure I obtain is this one where bars are all on top of each other while I want them "dodge".
I also obtain the following warning:
ymax not defined: adjusting position using y instead
Warning message:
position_dodge requires non-overlapping x intervals
Ideally I would get each bar near one another, with their error bars in the middle of its bar, all organised by country.
Also should I be concerned about the warning (which I clearly do not fully understand)?
I hope this makes sense. I hope I am close enough, but I don't seem to be going anywhere, some help would be greatly appreciated.
Thank you
ggplot's geom_bar() accepts the width parameter, but doesn't line them up neatly against one another in dodged position by default. The following workaround references the solution here:
library(dplyr)
# calculate x-axis position for bars of varying width
gr <- gr %>%
group_by(drug) %>%
arrange(practice) %>%
mutate(pos = 0.5 * (cumsum(prop) + cumsum(c(0, prop[-length(prop)])))) %>%
ungroup()
x.labels <- gr$practice[gr$drug == "a"]
x.pos <- gr$pos[gr$drug == "a"]
ggplot(gr,
aes(x = pos, y = prevalence,
fill = country, width = prop,
ymin = low.CI, ymax = high.CI)) +
geom_col(col = "black") +
geom_errorbar(size = 0.25, colour = "black") +
facet_wrap(~drug) +
scale_fill_manual(values = c("c1" = "gray79",
"c2" = "gray60",
"c3" = "gray39"),
guide = F) +
scale_x_continuous(name = "Drug",
labels = x.labels,
breaks = x.pos) +
labs(title = "Drug usage by country and practice", y = "Prevalence") +
theme_classic()
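The pos calculation above places each bar's centre halfway between its left and right edge, so that bars of different widths sit flush against one another. A minimal worked sketch with three made-up widths (the prop values here are illustrative, not taken from gr):

```r
# Illustrative bar widths; not taken from the gr data
prop <- c(0.5, 0.3, 0.2)

right <- cumsum(prop)                      # right edge of each bar: 0.5 0.8 1.0
left  <- cumsum(c(0, prop[-length(prop)])) # left edge of each bar:  0.0 0.5 0.8

# Centre of each bar, as in the mutate() call
pos <- 0.5 * (left + right)
print(pos)
```

Each bar then spans pos - prop / 2 to pos + prop / 2, which is exactly the interval between its left and right edge.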
There is a lot of information you are trying to convey here - to contrast drug A and drug B across countries using the barplots and accounting for proportions, you might use the facet_grid function. Try this:
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
gr$drug <- paste("Drug", gr$drug)
p<-ggplot(data=gr, aes(x=factor(practice), y=as.numeric(prevalence),
ymax=high.CI,ymin = low.CI,
position="dodge",fill=practice, width=prop))
p + theme_bw()+ facet_grid(drug~country, scales="free") +
geom_bar(stat="identity") +
labs(x="Practice",y="Prevalence") +
geom_errorbar(position=position_dodge(0.9), width=0.25,size=0.25,colour="black") +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill=F)
The bars are too narrow in country c1 and, as you indicated, one clinic is quite influential.
Also, you can specify your aesthetics once in ggplot(aes(...)) so that they do not have to be repeated in each layer, and there is no need to include the data frame's name (gr$) inside aes().
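A minimal sketch of that last point, using the first three rows of the question's data for drug "a" (the names demo and p are only illustrative): the aesthetics are declared once in ggplot() with bare column names, and both layers inherit them.

```r
library(ggplot2)

# First three practices for drug "a", copied from the question's data
demo <- data.frame(
  practice   = c("a", "b", "c"),
  prevalence = c(9.14, 5.53, 16.74),
  low.CI     = c(8.27, 4.80, 12.35),
  high.CI    = c(10.10, 6.37, 22.31)
)

# Aesthetics are set once, with bare column names (no demo$ prefix);
# geom_col() and geom_errorbar() both inherit them
p <- ggplot(demo, aes(x = practice, y = prevalence,
                      ymin = low.CI, ymax = high.CI)) +
  geom_col(fill = "gray79") +
  geom_errorbar(width = 0.25)

print(p)
```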
I have got a data frame with several 1000 rows in the form of
group = c("gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3")
pos = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
color = c(2,2,2,2,3,3,2,2,3,2,1,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,2,2)
df = data.frame(group, pos, color)
and would like to make a kind of heatmap in which one axis has a continuous scale (position). The color column is categorical. However, due to the large number of data points I want to use binning, i.e. treat it as a continuous variable.
This is more or less what the plot should look like:
I can't think of a way to create such a plot using ggplot2/R. I have tried several geometries, e.g. geom_point()
ggplot(data=df, aes(x=group, y=pos, color=color)) +
geom_point() +
scale_colour_gradientn(colors=c("yellow", "black", "orange"))
Thanks for your help in advance.
Does this help you?
library(ggplot2)
group = c("gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3")
pos = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
color = c(2,2,2,2,3,3,2,2,3,2,1,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,2,2)
df = data.frame(group, pos, color)
ggplot(data = df, aes(x = group, y = pos)) + geom_tile(aes(fill = color))
Looks like this
Improved version with 3 color gradient if you like
library(scales)
ggplot(data = df, aes(x = group, y = pos)) +
  geom_tile(aes(fill = color)) +
  scale_fill_gradientn(colours = c("orange", "black", "yellow"),
                       values = rescale(c(1, 2, 3)),
                       guide = "colorbar")
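The values argument of scale_fill_gradientn() expects positions between 0 and 1, which is why the codes 1, 2 and 3 are passed through rescale(). A quick sketch of what that call returns (the name stops is only illustrative):

```r
library(scales)

# Map the colour codes 1, 2, 3 onto the 0-1 range used for gradient stops
stops <- rescale(c(1, 2, 3))
print(stops)
```

With these stops, code 1 sits at the start of the gradient, code 2 in the middle and code 3 at the end.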
I want to visualize the difference between two points with a line/bar in ggplot2.
Suppose we have some data on income and spending as a time series.
We would like to visualize not only them, but the balance (=income - spending) as well.
Furthermore, we would like to indicate whether the balance was positive (=surplus) or negative (=deficit).
I have tried several approaches, but none of them produced a satisfying result. Here we go with a reproducible example.
# Load libraries and create LONG data example data.frame
library(dplyr)
library(ggplot2)
library(tidyr)
df <- data.frame(year = rep(2000:2009, times=3),
var = rep(c("income","spending","balance"), each=10),
value = c(0:9, 9:0, rep(c("deficit","surplus"), each=5)))
df
1.Approach with LONG data
Unsurprisingly, it doesn't work with LONG data,
because the geom_linerange arguments ymin and ymax cannot be specified correctly. ymin=value, ymax=value is definitely the wrong way to go (expected behaviour). ymin=income, ymax=spending is obviously wrong, too (expected behaviour).
df %>%
ggplot() +
geom_point(aes(x=year, y=value, colour=var)) +
geom_linerange(aes(x=year, ymin=value, ymax=value, colour=net))
#>Error in function_list[[i]](value) : could not find function "spread"
2.Approach with WIDE data
I almost got it working with WIDE data.
The plot looks good, but the legend for the geom_point(s) is missing (expected behaviour).
Simply adding show.legend = TRUE to the two geom_point(s) doesn't solve the problem as it overprints the geom_linerange legend. Besides, I would rather have the geom_point lines of code combined in one (see 1.Approach).
df %>%
spread(var, value) %>%
ggplot() +
geom_linerange(aes(x=year, ymin=spending, ymax=income, colour=balance)) +
geom_point(aes(x=year, y=spending), colour="red", size=3) +
geom_point(aes(x=year, y=income), colour="green", size=3) +
ggtitle("income (green) - spending (red) = balance")
3.Approach using LONG and WIDE data
Combining the 1.Approach with the 2.Approach results in yet another unsatisfying plot. The legend does not differentiate between balance and var (=expected behaviour).
ggplot() +
geom_point(data=(df %>% filter(var=="income" | var=="spending")),
aes(x=year, y=value, colour=var)) +
geom_linerange(data=(df %>% spread(var, value)),
aes(x=year, ymin=spending, ymax=income, colour=balance))
Any (elegant) way out of this dilemma?
Should I use some other geom instead of geom_linerange?
Is my data in the right format?
Try
ggplot(df[df$var != "balance", ]) +
geom_point(
aes(x = year, y = value, fill = var),
size=3, pch = 21, colour = alpha("white", 0)) +
geom_linerange(
aes(x = year, ymin = income, ymax = spending, colour = balance),
data = spread(df, var, value)) +
scale_fill_manual(values = c("green", "red"))
Output:
The main idea is that we use two different types of aesthetics for colours (fill for the points, with the appropriate pch, and colour for the lines) so that we get separate legends for each.
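For reference, here is a self-contained sketch of the same idea. It assumes the numbers are kept in numeric columns and the balance label is derived from them, rather than stored alongside the numbers in value as in the question; the names wide, long and p are only illustrative.

```r
library(ggplot2)

# Wide table: one row per year, numeric income and spending (values from the question)
wide <- data.frame(year = 2000:2009, income = 0:9, spending = 9:0)
# Assumption: the balance label is derived from the numbers
wide$balance <- ifelse(wide$income >= wide$spending, "surplus", "deficit")

# Long table for the points: one row per year and series
long <- data.frame(
  year  = rep(wide$year, times = 2),
  var   = rep(c("income", "spending"), each = nrow(wide)),
  value = c(wide$income, wide$spending)
)

p <- ggplot() +
  # Lines use the colour aesthetic, which gets its own legend
  geom_linerange(data = wide,
                 aes(x = year,
                     ymin = pmin(income, spending),
                     ymax = pmax(income, spending),
                     colour = balance)) +
  # Points use the fill aesthetic (shape 21 is a fillable circle), giving a second legend
  geom_point(data = long,
             aes(x = year, y = value, fill = var),
             shape = 21, size = 3, colour = "transparent") +
  scale_fill_manual(values = c(income = "green", spending = "red"))

print(p)
```

Using pmin() and pmax() keeps ymin below ymax whichever series is larger in a given year.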