Gridlines between discrete values - r

When using discrete values, ggplot2 draws the x gridlines at the tick marks, i.e. at the centre of each value:
library(ggplot2)
library(reshape2)  # provides the tips dataset
ggplot(data=tips, aes(x=time, y=total_bill, fill=sex)) +
geom_bar(stat="identity", position=position_dodge())
How can I make the x-axis gridlines appear between the discrete values (i.e. between 'Dinner' and 'Lunch')?
I have tried setting panel.grid.minor.x, but (I think) because the axis is discrete this does not work: there is no minor break for the gridline to be drawn at.
ggplot(data=tips, aes(x=time, y=total_bill, fill=sex)) +
geom_bar(stat="identity", position=position_dodge()) +
theme(panel.grid.minor.x = element_line())

You can add a vertical line that will act as a grid line as follows:
geom_vline(xintercept=1.5, colour='white')
You can, of course, alter line width, colour, style, etc. as needed, and add multiple lines in the appropriate locations if you have several groups of bars that need to be separated by grid lines. For example, using some fake data:
set.seed(1)
dat = data.frame(total_bill=rnorm(100,80,10),
sex=rep(c("Male","Female"),50),
time=sample(c("Breakfast","Brunch","Lunch","Afternoon Tea","Dinner"),
100, replace=TRUE))
dat$time = factor(dat$time,
levels=c("Breakfast","Brunch","Lunch","Afternoon Tea","Dinner"),
ordered=TRUE)
ggplot(data=dat, aes(x=time, y=total_bill, fill=sex)) +
geom_bar(stat="identity", position=position_dodge()) +
geom_vline(xintercept=seq(1.5, length(unique(dat$time))-0.5, 1),
lwd=1, colour="black")
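A discrete scale has no minor breaks, which is why panel.grid.minor.x does nothing in the question. One alternative, sketched here under the assumption that the category labels can be put back by hand: map the factor to its integer codes with as.numeric(), which makes the x scale continuous, and then place minor_breaks halfway between the categories. The names lv, n and between and the grey40 colour are illustrative choices, not part of the answer above:

```r
library(ggplot2)

# Fake data as above; a plain factor keeps the chosen level order
set.seed(1)
lv <- c("Breakfast", "Brunch", "Lunch", "Afternoon Tea", "Dinner")
dat <- data.frame(total_bill = rnorm(100, 80, 10),
                  sex = rep(c("Male", "Female"), 50),
                  time = factor(sample(lv, 100, replace = TRUE), levels = lv))

n <- nlevels(dat$time)                # number of categories
between <- seq(1.5, n - 0.5, by = 1)  # midpoints between neighbouring categories

p <- ggplot(dat, aes(x = as.numeric(time), y = total_bill, fill = sex)) +
  geom_col(position = position_dodge()) +
  # Major breaks carry the category labels; minor breaks sit between them
  scale_x_continuous(breaks = seq_len(n), labels = levels(dat$time),
                     minor_breaks = between) +
  labs(x = "time") +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_line(colour = "grey40"))
p
```

Because the gridlines now come from the theme, they follow panel.grid.minor.x styling and do not need to be recomputed as a separate layer.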

I had the same problem. My solution was to make the major gridlines very wide and white, so that the grey panel background only shows through between the values:
set.seed(1)
dat = data.frame(total_bill=rnorm(100,80,10),
sex=rep(c("Male","Female"),50),
time=sample(c("Breakfast","Brunch","Lunch","Afternoon Tea","Dinner"),
100, replace=TRUE))
dat$time = factor(dat$time,
levels=c("Breakfast","Brunch","Lunch","Afternoon Tea","Dinner"),
ordered=TRUE)
ggplot(data=dat, aes(x=time, y=total_bill, color=sex)) +
geom_point(position=position_dodge(width=0.5)) +
theme(panel.grid.major.x = element_line(color = "white", size = 20))

Related

How to show the part of the errorbar lines which are within the plot margins using `ggplot2`?

I have a grid of plots, all with the same x- and y-axis scales. The plots show time on the x-axis and mean values with their standard errors on the y-axis. My problem is that some error bars are not entirely within the plot margins, and I wonder whether there is a way to show the part of each error bar that lies within the plot margins. Below is a fake example and code to play with:
library(ggplot2)
df <- data.frame(time=seq(-15,15,1),
mean=c(0.49,0.5,0.53,0.55,0.57,0.59,0.61,0.63,0.65,0.67,0.69,0.71,0.73,0.75,0.77,0.79,0.77,0.75,0.73,0.71,0.69,0.67,0.65,0.63,0.61,0.59,0.57,0.55,0.53,0.51,0.49),
sd=c(0.09,0.087,0.082,0.08,0.023,0.011,0.010,0.009,0.008,0.007,0.006,0.005,0.004,0.003,0.002,0.001,0.002,0.003,0.004,0.005,0.006,0.007,0.008,0.009,0.010,0.011,0.023,0.08,0.084,0.087,0.09))
Plot <- ggplot(df, aes(x=time, y=mean)) +
geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width=.3) +
geom_point(size=1) +
geom_line () +
theme_bw() +
scale_y_continuous(limits = c(0.49, 0.85), breaks = c(0.5, 0.65,0.8))
Plot
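You need to set coord_cartesian limits rather than scale_y_continuous limits. Limits on a scale remove every value outside them, so error bars whose ends fall outside the limits are dropped entirely, whereas coord_cartesian() only zooms in, so the error bars are still drawn and are clipped at the panel edge: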
ggplot(df, aes(x=time, y=mean)) +
geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width=.3) +
geom_point(size=1) +
geom_line () +
theme_bw() +
scale_y_continuous(breaks = c(0.5, 0.65,0.8)) +
coord_cartesian(ylim = c(0.49, 0.85))
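If you would rather keep the limits on the scale, the oob ("out of bounds") argument controls what happens to values outside them. The default for continuous scales, scales::censor, turns them into NA, which is why whole error bars disappear; scales::oob_keep (available in scales 1.1.0 and later, as far as I know) leaves them unchanged, so they are simply clipped at the panel edge. A sketch using the df from the question:

```r
library(ggplot2)

df <- data.frame(time=seq(-15,15,1),
mean=c(0.49,0.5,0.53,0.55,0.57,0.59,0.61,0.63,0.65,0.67,0.69,0.71,0.73,0.75,0.77,0.79,0.77,0.75,0.73,0.71,0.69,0.67,0.65,0.63,0.61,0.59,0.57,0.55,0.53,0.51,0.49),
sd=c(0.09,0.087,0.082,0.08,0.023,0.011,0.010,0.009,0.008,0.007,0.006,0.005,0.004,0.003,0.002,0.001,0.002,0.003,0.004,0.005,0.006,0.007,0.008,0.009,0.010,0.011,0.023,0.08,0.084,0.087,0.09))

p <- ggplot(df, aes(x = time, y = mean)) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = .3) +
  geom_point(size = 1) +
  geom_line() +
  theme_bw() +
  # oob_keep: out-of-range values are kept, so bars are clipped, not dropped
  scale_y_continuous(limits = c(0.49, 0.85), breaks = c(0.5, 0.65, 0.8),
                     oob = scales::oob_keep)
p
```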

Change the scale of x axis in ggplot

I have a ggplot bar chart and don't know how to change the scale of the x axis. At the moment it looks like the image below. However, I'd like to reorder the scale so that the 21% bar is higher than the 7% bar. How can I get the % onto the axis? Thanks in advance!
df= data.frame("number" = c(7,21), "name" = c("x","y"))
df
ggplot(df, aes(x=name, y=number)) +
geom_bar(stat="identity", fill = "blue") + xlab("Title") + ylab("Title") +
ggtitle("Title")
Use the prop.table function on the y variable inside aes(). Note that this rescales the values to shares of their total (here 25% and 75%), not to 7% and 21%.
ggplot(df, aes(x=name, y=100*prop.table(number))) +
geom_bar(stat="identity", fill = "blue") +
xlab("Sample") + ylab("Package quantity (absolute)") +
ggtitle("Total quantity")
If you want the % sign on the y axis, you can add scale_y_continuous() with percentage labels to the plot:
library(scales)
ggplot(df, aes(x=name, y=prop.table(number))) +
geom_bar(stat="identity", fill = "blue") +
xlab("Sample") + ylab("Package quantity (absolute)") +
ggtitle("Total quantity") +
scale_y_continuous(labels=percent)
The only way I can reproduce the original plot is, as @sconfluentus noted, if 7% and 21% are character strings. ggplot2 then treats them as discrete values and sorts them alphabetically, which puts "21%" below "7%". As an aside, the data frame column names need not be quoted.
df= data.frame(number = c('7%','21%'), name = c("x","y"))
df
ggplot(df, aes(x=name, y=number)) +
geom_bar(stat="identity", fill = "blue") + xlab("Title") + ylab("Title") +
ggtitle("Title")
Changing the numbers to c(0.07, 0.21) and adding, as @Mohanasundaram noted, scale_y_continuous(labels = scales::percent) corrects the situation.
To be pedantic, also setting breaks = c(0.07, 0.21) creates a nearly exact duplicate. See also here.
Hope this is helpful.
library(ggplot2)
library(scales)
df= data.frame(number = c(0.07,0.21), name = c("KG","MS"))
df
ggplot(df, aes(x=name, y=number)) +
geom_bar(stat="identity", fill = "blue") + xlab("Title") + ylab("Title") +
ggtitle("Title") + scale_y_continuous(labels = scales::percent, breaks = c(.07, .21))
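If the percentages arrive as character strings such as "7%" (the situation that produced the original plot), they can be converted to proportions before plotting so that the y axis is continuous again. A minimal base R sketch; pct_to_prop() is a hypothetical helper name, not part of any package:

```r
library(ggplot2)

# Hypothetical helper: "7%" -> 0.07, "21%" -> 0.21
pct_to_prop <- function(x) {
  as.numeric(sub("%", "", x, fixed = TRUE)) / 100
}

df <- data.frame(number = c("7%", "21%"), name = c("KG", "MS"))
df$number <- pct_to_prop(df$number)  # now numeric proportions

p <- ggplot(df, aes(x = name, y = number)) +
  geom_col(fill = "blue") +
  scale_y_continuous(labels = scales::percent, breaks = df$number)
p
```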

Is there a possibility to combine position_stack and nudge_x in a stacked bar chart in ggplot2?

I want to add labels to a stacked bar chart to achieve something like this:
The goal is simple: I need to show market shares and changes versus previous year in the same graph. In theory, I would just add "nudge_x=0.5," to geom_text in the code but I get the error: "Specify either position or nudge_x/nudge_y". Is it possible to use some workaround, maybe another package? Thanks a lot in advance!
Code:
DashboardCategoryText <- c("Total Market","Small Bites","Bars","Total Market","Small Bites","Bars","Total Market","Small Bites","Bars")
Manufacturer <- c("Ferrero","Ferrero","Ferrero","Rest","Rest","Rest","Kraft","Kraft","Kraft")
MAT <- c(-1,5,-7,6,8,10,-10,5,8)
Measure_MATCurrent <- c(500,700,200,1000,600,80,30,60,100)
data <- data.frame(DashboardCategoryText,Manufacturer,MAT,Measure_MATCurrent)
library(dplyr)
groupedresult <- group_by(data,DashboardCategoryText)
groupedresult <- summarize(groupedresult,SUM=sum(Measure_MATCurrent))
groupedresult <- as.data.frame(groupedresult)
data <- merge(data,groupedresult,by="DashboardCategoryText")
data$percent <- data$Measure_MATCurrent/data$SUM
library(ggplot2)
library(scales)  # provides percent_format()
ggplot(data, aes(x=reorder(DashboardCategoryText, SUM), y=percent, fill=Manufacturer)) +
geom_bar(stat = "identity", width = .7, colour="black", lwd=0.1) +
geom_text(aes(label=ifelse(percent >= 0.005, paste0(sprintf("%.0f", percent*100),"%"),"")),
position=position_stack(vjust=0.5), colour="white") +
geom_text(aes(label=MAT,y=percent),
nudge_x=0.5,
position=position_stack(vjust=0.8),
colour="black") +
coord_flip() +
scale_y_continuous(labels = percent_format()) +
labs(y="", x="")
I have a somewhat 'hacky' solution where you essentially just change the geom_text data in the underlying ggplot object before you plot it.
p <- ggplot(data, aes(x=reorder(DashboardCategoryText, SUM), y=percent, fill=Manufacturer)) +
geom_bar(stat = "identity", width = .7, colour="black", lwd=0.1) +
geom_text(aes(label=ifelse(percent >= 0.005, paste0(sprintf("%.0f", percent*100),"%"),"")),
position=position_stack(vjust=0.5), colour="white") +
geom_text(aes(label=MAT,y=percent),
position=position_stack(vjust=.5),
colour="black") +
coord_flip() +
scale_y_continuous(labels = percent_format()) +
labs(y="", x="")
q <- ggplot_build(p) # get the ggplot data
q$data[[3]]$x <- q$data[[3]]$x + 0.5 # change it to adjust the x position of geom_text
plot(ggplot_gtable(q)) # plot everything
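A variant that avoids editing the built object: keep position_stack() for the placement along the value axis and shift the MAT labels with the text's own vjust. This vjust is text justification, not a position adjustment, so it does not clash with position. Under coord_flip() the category axis is vertical, so a negative vjust moves each label above its bar, much like nudge_x = 0.5 would. The offset is measured in text heights rather than data units, so the value -1.5 is an assumption to tune by eye. The data are rebuilt with base R's ave() instead of dplyr so the block runs on its own:

```r
library(ggplot2)
library(scales)  # percent_format()

data <- data.frame(
  DashboardCategoryText = rep(c("Total Market", "Small Bites", "Bars"), 3),
  Manufacturer = rep(c("Ferrero", "Rest", "Kraft"), each = 3),
  MAT = c(-1, 5, -7, 6, 8, 10, -10, 5, 8),
  Measure_MATCurrent = c(500, 700, 200, 1000, 600, 80, 30, 60, 100)
)
# Category totals and shares, equivalent to the dplyr + merge() steps above
data$SUM <- ave(data$Measure_MATCurrent, data$DashboardCategoryText, FUN = sum)
data$percent <- data$Measure_MATCurrent / data$SUM

p2 <- ggplot(data, aes(x = reorder(DashboardCategoryText, SUM), y = percent,
                       fill = Manufacturer)) +
  geom_bar(stat = "identity", width = .7, colour = "black", lwd = 0.1) +
  geom_text(aes(label = ifelse(percent >= 0.005,
                               paste0(sprintf("%.0f", percent * 100), "%"), "")),
            position = position_stack(vjust = 0.5), colour = "white") +
  # vjust here is text justification: negative values push the label upwards,
  # which after coord_flip() means towards the next category
  geom_text(aes(label = MAT), position = position_stack(vjust = 0.8),
            vjust = -1.5, colour = "black") +
  coord_flip() +
  scale_y_continuous(labels = percent_format()) +
  labs(y = "", x = "")
p2
```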

ggplot in R, reordering the bars

I have the following plot:
score = c(5,4,8,5)
Group = c('A','A','B','B')
Time = c('1','2','1','2')
df = data.frame(score,Group,Time)
df$Group = factor(df$Group)
df$Time = factor(df$Time)
a = ggplot(df, aes(x=Time, y=score, fill=Group)) +
geom_bar(position=position_dodge(), stat="identity", width = 0.8, color = 'black')
How do I reorder the bars such that Group A will be grouped together, followed by Group B, and the x-axis will be labelled as Time 1,2,1,2 for each bar? As shown below:
Having repeated labels on an axis is somewhat against the principles of how ggplot2 works, but we can cheat a bit. I would recommend @RLave's suggestion of faceting. If that doesn't suit you, here is an attempt without faceting:
df2 <- rbind(df, data.frame(score=NA, Group=c('A'), Time=c('9')))
df2$x <- as.character(interaction(df2$Group, df2$Time))
ggplot(df2, aes(x=x, y=score, fill=Group)) +
geom_col(position='dodge', colour='black') +
scale_x_discrete(labels=c('1','2','','1','2')) +
theme(axis.ticks.x = element_blank(), panel.grid.major.x = element_blank())
As you can see, we have to create a dummy variable for the x-axis, and manually put on the labels.
Now consider a better solution using facet:
ggplot(df, aes(x=Time, y=score, fill=Group)) +
geom_col(width = 1, color = 'black') +
facet_grid(~Group) +
theme(strip.background = element_blank(), strip.text = element_blank(), panel.spacing.x=grid::unit(3, 'pt'))
The distance between the panels is adjusted with the theme argument panel.spacing.x.
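If the group names should still be visible, one common variation (an assumption about the desired look, not part of the answer above) is to keep the facet strips but move them below the axis with switch = "x" and strip.placement = "outside", so each group label sits under its Time labels:

```r
library(ggplot2)

df <- data.frame(score = c(5, 4, 8, 5),
                 Group = factor(c("A", "A", "B", "B")),
                 Time = factor(c("1", "2", "1", "2")))

p <- ggplot(df, aes(x = Time, y = score, fill = Group)) +
  geom_col(width = 1, colour = "black") +
  # switch = "x" draws the facet strips at the bottom instead of the top
  facet_grid(~Group, switch = "x") +
  theme(strip.placement = "outside",          # strips below the axis labels
        strip.background = element_blank(),
        panel.spacing.x = grid::unit(3, "pt"))
p
```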

Use position_jitterdodge to plot points, and add highlighted points that are also dodged

I have some data where x is categorical, y is numeric, and color.var is another categorical variable that I would like to color by. My goal is to plot all of the points using position_jitterdodge(), and then highlight a couple of the points, draw a line between them, and add labels, while making sure these highlighted points line up with the corresponding strips of points that were plotted using position_jitterdodge(). The highlighted points are aligned properly when all factors are present in the variable used to dodge, but it does not work well when some factors are missing.
Minimal (non-)working example
library(ggplot2)
## Generate some data
d = data.frame(x = c(rep('x1', 1000), rep('x2', 1000)),
y = runif(n=2000, min=0, max=1),
color.var= rep(c('color1', 'color2'), 1000),
facet.var = rep(c('facet1', 'facet1', 'facet2', 'facet2'), 500),
stringsAsFactors = TRUE) ## keep x and color.var as factors (no longer the default since R 4.0)
head(d)
dd = d[c(1,2,3,4,1997,1998, 1999,2000),]
dd
df1 = dd[dd$color.var=='color1',] ## data for first set of points, labels, and the line connecting them
df2 = dd[dd$color.var=='color2',] ## data for second set of points, labels, and the line connecting them
df1
dw = .75 ## Define the dodge.width
Plot all points
Here are all of the points, separated using position_jitterdodge() and the aesthetic fill.
ggplot() +
geom_point(data=d, aes(x=x, y=y, fill=color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray') +
facet_wrap(~facet.var) +
scale_fill_manual(values=c( 'lightblue','gray'))+
theme(axis.title = element_blank()) +
theme(legend.position="top")
That works well.
Additional highlighted points.
Here is the same plot, with additional points in dd added.
ggplot() +
geom_point(data=d, aes(x=x, y=y, fill =color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray') +
geom_point(data=dd, aes(x=x, y=y, color=color.var ), position=position_dodge(width=.75), size=4 ) +
geom_line(data=dd, aes(x=x, y=y, color=color.var, group=color.var ), position=position_dodge(width=.75), size=1 ) +
geom_label(data=dd, aes(x=x, y=y, color=color.var, group=color.var, label=round(y,1)), position=position_dodge(width=.75), vjust=-.5) +
facet_wrap(~facet.var) +
scale_fill_manual(values=c( 'lightblue','gray'))+
scale_color_manual(values=c( 'blue', 'gray40')) +
theme(axis.title = element_blank())+
theme(legend.position="top")
This is what I want it to look like. However, this only works properly if both levels of the color.var variable are present in the set of points to highlight.
If one of them is missing from the new data, the horizontal alignment fails.
Highlight points, only one factor present
Here is an example where only the 'color1' factor (blue) is present. Note that data=dd was replaced with data=df1 (data that only contains blue highlighted dots) in this code.
ggplot() +
geom_point(data=d, aes(x=x, y=y, fill =color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray') +
geom_point(data=df1, aes(x=x, y=y, color=color.var ), position=position_dodge(width=.75), size=4 ) +
geom_line(data=df1, aes(x=x, y=y, color=color.var, group=color.var ), position=position_dodge(width=.75), size=1 ) +
geom_label(data=df1, aes(x=x, y=y, color=color.var, group=color.var, label=round(y,1)), position=position_dodge(width=.75), vjust=-.5) +
facet_wrap(~facet.var) +
scale_fill_manual(values=c( 'lightblue','gray'))+
scale_color_manual(values=c( 'blue', 'gray40')) +
theme(axis.title = element_blank())+
theme(legend.position="top") +
scale_x_discrete(drop=F)
The highlighted blue dots appear between the blue and gray dots instead of being aligned with the blue dots. Note that the additional scale_x_discrete(drop=F) had no apparent effect on the alignment.
A manual solution
One possible fix is to edit the x coordinate manually, like this:
ggplot(data=d, aes(x=x, y=y)) +
geom_point(aes(fill=color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray') +
geom_point(data=df1, aes(x=as.numeric(x)-dw/4, y=y), alpha=.9, size=4 , color='blue') + ## first set of points
geom_line( data=df1, aes(x=as.numeric(x)-dw/4, y=y , group=color.var ), color='blue', size=1) + ## first line
geom_label(data=df1, aes(x=as.numeric(x)-dw/4, y=y , label=round(y,1)), color='blue', vjust=-.25)+ ## first set of labels
facet_wrap(~facet.var) +
scale_fill_manual(values=c( 'lightblue','gray'))+
theme(axis.title = element_blank()) +
theme(legend.position="top")
An adjustment of 1/4 of the dodge.width seems to work. This works fine, but it seems like there should be a better way, especially since I will eventually want to do this with 4-5 sets of highlighted points/lines, which may all have the same color.var level, like the blue 'color1' level above. Repeating this 4-5 times would be cumbersome. I will also eventually want to do this with 5-10 different figures. I suppose dodge.width/4 will always work, and copying and pasting might do the trick, but I would like to know if there is a better way.
Here is a solution based on @aosmith's comment. Basically, you just need to add this code before calling ggplot():
library(dplyr) ## needed for group_by()
library(tidyr) ## needed for complete()
df1 = df1 %>% group_by(facet.var, x) %>% complete(color.var)
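That adds extra rows (with NA for y) to the data so that all the levels of color.var are present; this relies on color.var being a factor, since for a character column complete() only knows the values present in each group. Then the code given in the question, along with a couple of small edits that fix the legend, can be used: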
ggplot() +
geom_point(data=d , aes(x=x, y=y, fill =color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray', show.legend=T) +
geom_point(data=df1, aes(x=x, y=y, color=color.var ), position=position_dodge(width=.75), size=4, show.legend=T ) +
geom_line( data=df1, aes(x=x, y=y, color=color.var, group=color.var ), position=position_dodge(width=.75), size=1, show.legend=F ) +
geom_label(data=df1, aes(x=x, y=y, color=color.var, group=color.var, label=round(y,1)), position=position_dodge(width=.75), vjust=-.5, show.legend=F) +
facet_wrap(~facet.var) +
scale_fill_manual( values=c( 'lightblue','gray'), name='Background dots', guide=guide_legend(override.aes = list(color=c('lightblue', 'gray')))) +
scale_color_manual(values=c( 'blue', 'gray40') , name='Highlighted dots') +
theme(axis.title = element_blank())+
theme(legend.position="top")+
scale_x_discrete(drop=F)
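For the manual approach in the question, the 1/4 of dodge.width is not a coincidence. As far as I can tell from ggplot2's dodging code, group i of n groups at a given x position is shifted by width * ((i - 0.5) / n - 0.5), which is -width/4 and +width/4 when n = 2. A sketch of a hypothetical helper, dodge_offset(), that generalises the manual offset to any level of color.var, assuming color.var is a factor whose level order is the dodging order and that every level is present at each x in the background data:

```r
# Hypothetical helper: horizontal offset of dodge group i out of n groups
dodge_offset <- function(i, n, width) {
  width * ((i - 0.5) / n - 0.5)
}

dw <- 0.75
dodge_offset(1, 2, dw)  # -0.1875, i.e. -dw/4
dodge_offset(2, 2, dw)  #  0.1875, i.e. +dw/4

# Usage inside the plot, e.g. for the highlighted points in df1:
# geom_point(data = df1,
#            aes(x = as.numeric(x) + dodge_offset(as.numeric(color.var),
#                                                 nlevels(color.var), dw),
#                y = y, colour = color.var), size = 4)
```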
