Plot a line graph linked to a secondary y axis [duplicate] - r

I am trying to replicate what can easily be done in Excel.
Using ggplot, I tried to plot the following:
A bar chart, where the left y axis is in counts (0-600).
A line graph, where the right y axis is in % (0-100).
qn1. Can someone explain how I can link my percentage data to the secondary axis? Currently the line graph (which should represent the %) is plotted against the primary y axis, using the counts scale.
qn2. How can I change the 2 scales independently?
qn3. How can I name the 2 scales independently?
ggplot() +
geom_bar(data=data,aes(x=sch,y=count,fill=category),stat = "identity")+
scale_fill_manual(values=c("darkcyan", "indianred1")) +
geom_line(data=data_percentage, aes(x=sch, y=count, group=1)) +
geom_point(data=data_percentage, aes(x=sch, y=count, group=1)) +
geom_text(data=data_percentage,aes(x=scht,label=paste(count,"%",sep="")),size=3) +
scale_y_continuous(sec.axis = sec_axis(~./2), name="%")+
theme(panel.background = element_blank(),
axis.line = element_line(colour = "black", size = 0.5, linetype = "solid"),
plot.title = element_text(size=11, face="bold", hjust=0.3),
legend.position = "top", legend.text = element_text(size=9)) +
labs(fill="") + guides(fill = guide_legend(reverse=TRUE))+
ylab("No. Recruited") + ggtitle("2. No. of students")

Answer1: You don't link a geom to an axis. Instead, you scale its values up or down to be consistent with the secondary axis. In the example you provided, sec.axis is defined as ~./2, so the y aesthetic in both geom_line and geom_point should be count*2. This gives the appearance that the line is linked to the secondary axis.
Answer2: You can't. In ggplot, the secondary axis must be a one-to-one transformation of the primary axis. I don't know whether another package can do that.
Answer3: Just move the name argument from scale_y_continuous into sec_axis, as in the example code below.
The code will look something like this:
ggplot() +
.
.
geom_line(data = data_percentage, aes(x=sch, y=count*2, group=1)) +
geom_point(data = data_percentage, aes(x=sch, y=count*2, group=1)) +
.
.
scale_y_continuous(sec.axis = sec_axis(~./2, name="%"))+
.
.
.
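For a version that runs end to end, here is a minimal sketch of the same idea. The data frames `counts` and `pct` and the variable `scale_factor` are invented stand-ins for the asker's `data` and `data_percentage`, and the factor is chosen so that 100 % lines up with the tallest bar rather than the ~./2 used above.

```r
library(ggplot2)

# Hypothetical data standing in for the asker's `data` and `data_percentage`
counts <- data.frame(sch = c("A", "B", "C"), count = c(150, 420, 600))
pct    <- data.frame(sch = c("A", "B", "C"), percent = c(25, 70, 100))

# Map the 0-100 % range onto the 0-600 count range (assumed ranges)
scale_factor <- max(counts$count) / 100

p <- ggplot() +
  geom_col(data = counts, aes(x = sch, y = count), fill = "darkcyan") +
  # Multiply the percentages so they are drawn on the count scale
  geom_line(data = pct, aes(x = sch, y = percent * scale_factor, group = 1)) +
  geom_point(data = pct, aes(x = sch, y = percent * scale_factor)) +
  scale_y_continuous(
    name = "No. Recruited",
    # Divide again so the right-hand axis is labelled in %
    sec.axis = sec_axis(~ . / scale_factor, name = "%")
  )

print(p)
```

The multiplication in the aesthetics and the division in sec_axis must use the same factor, otherwise the line no longer matches the right-hand labels.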

Related

Is it possible to add a 'break' to an axis that plots a categorical factor?

I am using ggplot2 to plot a mixed-design dataset in a violin plot.
The data was collected over three sessions: Baseline (collected on Day 1), Post-training (collected on Day 3) and Follow-up (collected on Day 30) and two groups: (1) Active and (2) Sham. For the sessions I have a categorical factor called 'Session' with the labels: Baseline, Post-training and Follow-up which are plotted on the x-axis. (Please ignore the rough state of the draft plot and dummy data for demonstration purposes).
level_order <- factor(tidied_data$Session, level = c('Baseline (Day 1)', 'Post-training (Day 3)', 'Follow-up (Day 30)'))
tidied_data %>%
ggplot(aes(x=level_order, y=Amplitude, fill=Group)) +
geom_violin(position=position_dodge(1), trim=FALSE) +
geom_jitter(binaxis='y', stackdir='center',
position=position_dodge(1)) +
stat_summary(fun = "mean", geom = "point",
size = 3, position=position_dodge(1), color="white") +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width=0.3, position=position_dodge(1), color="white") +
theme_bw() + # removes background colour
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + # removes grid lines
theme(panel.border = element_blank()) + # removes border lines
theme(axis.line = element_line(colour = "black")) + # adds axis lines
labs(title = "Group x Session",
x = "Session",
y = "Amplitude")
I want to demonstrate to the viewer that there is a different time-course between Baseline (Day 1), Post-training (Day 3) and follow-up (Day 30), it's a 30-day scale essentially.
From previous threads I've seen that this isn't something that ggplot2 will handle well, since broken axes are generally considered questionable.
I've come across the package 'ggbreak', where you can use the function 'scale_x_break' or 'scale_y_break' to set an axis break on a continuous variable. This doesn't work for the three time points, presumably because the x variable is a categorical factor.
Can anyone recommend a way that I can 'break' the axis to demonstrate the different length of time between the three sessions, or alternatively another way I could demonstrate this to the viewer? I've thought about adding custom spacing between bars, but I can only manage to set this to the same width for each bar, not different widths between different bars.
Any help would be greatly appreciated! Thanks in advance!
I can't recommend using a discontinuous axis for this, but you can use facets to visually separate the sessions as small multiples. Example with a standard dataset below:
library(ggplot2)
ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(am))) +
geom_violin() +
facet_grid(~ cyl, scales = "free_x") +
theme_classic() +
theme(strip.text = element_blank()) # Hide strip text
Created on 2021-08-20 by the reprex package (v1.0.0)
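Applied to the three sessions from the question, the facet approach might look like the sketch below. The column names mirror the question, but the data frame `dummy` and its values are invented for illustration.

```r
library(ggplot2)

set.seed(1)
# Dummy data: column names follow the question, values are made up
sessions <- c("Baseline (Day 1)", "Post-training (Day 3)", "Follow-up (Day 30)")
dummy <- expand.grid(Session = sessions, Group = c("Active", "Sham"), rep = 1:20)
dummy$Session <- factor(dummy$Session, levels = sessions)  # keep chronological order
dummy$Amplitude <- rnorm(nrow(dummy), mean = 5)

p <- ggplot(dummy, aes(x = Session, y = Amplitude, fill = Group)) +
  geom_violin(position = position_dodge(1), trim = FALSE) +
  # One panel per session; the gaps between panels act as the visual "break"
  facet_grid(~ Session, scales = "free_x") +
  theme_classic() +
  theme(strip.text = element_blank())  # hide strip text, the x axis already names the session

print(p)
```

The gaps between panels are all the same width, so this signals that the sessions are separate time points without encoding how far apart they are.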

How to adjust ggplot secondary y-axis to change position of bar plot layer of mixed plot?

I have created a mixed plot (2 lines and 1 bar plot) using GGPlot. The bars represent the percent difference between the two lines on the plot. It looks good except for the positioning of the bars relative to the lines. I would like to be able to adjust the secondary y-axis (i.e. the bar plot axis) to range from -50 to 50 (i.e. centre at 0) without affecting the primary y-axis, which I would like to have start at 0. Does anyone know a way to do this? The current code that I am working on and the resulting plot look like this:
year<-c(1995:2018)
value1<-c(1116.87,3030.85,2676.40,3809.81,2459.74,2666.61,2678.28,3303.45,1839.21,3567.79,4529.22,5838.21,7762.13,8079.70,9615.55,9645.06,8297.23,8974.69,12757.06,13052.86,13670.74,17598.57,17190.01,20192.92)
value2<-c(998.22,3551.52,2421.50,3647.22,2085.44,2558.46,2863.98,3332.18,1606.40,3445.12,4893.11,5486.48,7242.37,7356.78,8810.64,7787.83,7507.25,8442.26,10347.11,11002.82,8783.90,15604.60,14648.09,15368.58)
df1<-data.frame(year,value1,value2)
df1$diff<-((df1$value1-df1$value2)/df1$value1)*100
library(ggplot2)
plot1<-ggplot(df1, aes(x = year)) +
geom_bar(aes(x=year, y = (diff*200), fill="percent diff"), stat="identity") +
geom_line(aes(y=value1,color="A")) +
geom_line(aes(y=value2, color="B")) +
scale_y_continuous(sec.axis = sec_axis(~./200, name = "percent diff")) +
ylab("Value") +
theme(panel.grid.minor = element_blank(),legend.title = element_blank(), legend.spacing.y = unit(-0.1, "cm")) +
scale_color_manual(" ",values=c("A"="#619CFF", "B"="#F8766D", "percent diff"=NA)) +
scale_fill_manual(" ",values=c("percent diff"="gray"))
plot1
To be clear, the below plot (done in Excel) is what I am trying to accomplish.

ggplot2 - box plot questions on misalignment

I am having the following challenges making a plot:
I am trying to do a 'grouped' box plot, but the box plots are not showing up near the corresponding x axis group, so it's not easy to see which group each plot belongs to.
I am trying to add in an icon for the 'mean' value which right now is a triangle. However these aren't moving with the grouped boxplots.
I don't want the triangle icon for the mean value to show up in the legend - I can't figure out how to remove this.
Whenever I try to add text I just want one value either for the median or the mean - not something repeated 50x.
Boxplot
library(ggplot2)
library(ggthemes)
library(RColorBrewer)
library(reshape2)
ggplot(tips, aes(x = day, y = total_bill, fill=sex)) + #grouping factor, y variable
geom_boxplot(position = position_dodge(width = 1.2)) + # how to color
labs(title ='Barchart Plot', x=' xaxis label',y='ylabel') +
scale_fill_brewer(palette="Dark2")+ #use Dark2, Paired, Set1
theme(axis.text.x = element_text(colour="black",size=14,angle=45,
hjust=.5,vjust=.5,face="bold"),
axis.text.y = element_text(colour="grey20",size=16,angle=45,
hjust=1,vjust=0,face="plain"),
axis.title.x = element_text(colour="grey20",size=12,angle=45,
hjust=.5,vjust=0,face="plain"),
axis.title.y = element_text(colour="blue",size=16,angle=90,
hjust=0.5,vjust=.5,face='bold')) +
stat_summary(fun.y=mean, geom="point", shape=17, size=4) +
theme_base() +
geom_text(label='just the mean or median please - number only')

In ggplot2, how to add another y-axis in the figure?

Here is my data (called "data" and is a CSV format file):
attitude,order,min,max,mean,SpRate
Commanding,7,0.023005096,1.6517,0.681777825,5.66572238
Friendly,10,0.20565908,1.7535,0.843770095,6.191950464
Hostile,12,0.105828885,2.4161,1.128603777,6.493506494
Insincere,1,0.110689225,1.5551,0.730545923,5.115089514
Irony,4,0.089307133,2.2395,0.955312553,5.249343832
Joking,2,0.165717303,2.1871,0.94512688,5.141388175
Neutral,5,-0.044620705,1.5322,0.696879247,5.420054201
Polite,11,0.170151929,1.8467,0.873735105,6.191950464
Praising,8,0.192402573,2.0631,0.972857404,5.797101449
Rude,13,0.249746688,2.2885,1.100819511,6.644518272
Serious,6,0.011312206,1.7195,0.693606814,5.649717514
Sincere,9,-0.09135461,1.6409,0.659525513,5.813953488
Suggesting,3,0.072541529,1.8345,0.82999014,5.249343832
Here is my code:
library(ggplot2)
ggplot (data, aes(x=order))+
geom_rect(aes(xmin=order-0.1, xmax=order+0.1, ymin = min, ymax=max), size=1, alpha=0,color="black")+
geom_bar(aes(y=SpRate, fill="SpRate"),stat="identity", alpha=0.2, width=0.9)+
geom_point(aes(y=min, shape="min"), size=5, fill="white")+
geom_point(aes(y=mean, shape="mean"), size=5)+
geom_point(aes(y=max, shape="max"), size=5)+
scale_x_continuous(breaks=c(1:13), labels=c("Insincere","Joking","Suggesting","Irony","Neutral","Serious","Commanding","Praising","Sincere","Friendly","Polite","Hostile", "Rude"))+
xlab("")+ylab("")+theme_bw()+
theme(axis.text.x=element_text(size=25,angle=45, vjust=0.5, color="black"))+
theme(legend.text = element_text(size = 20))+
theme(legend.title = element_text(size = 20))+
labs(shape = "f0:", fill = "SpRate:")+
scale_shape_manual(values=c("min"=15,"max"=16,"mean"=18))+
scale_fill_manual(values= "black")+
theme(axis.text.y = element_text(size=20))
As you can see from the plot, there are really two plots: a set of rectangles with points, and a bar plot. The bar plot does not fit the y-axis shown, so how can I add another y-axis on the right of the plot that suits the bar plot better? (i.e. I want the y-axis for the rectangles to run from 0 to 2.5 and the one for the bar plot from 0 to 7)
You can add a second y-axis in ggplot2.
For a single-panel plot, see this example (http://rpubs.com/kohske/dual_axis_in_ggplot2)
For a multiple-panel plot, see my example (Dual y axis in ggplot2 for multiple panel figure)
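In current ggplot2 the same result can also be sketched with sec_axis, which is not what the linked examples use. The snippet below takes only the first three rows of the question's data, and the factor `k` is an assumption based on the ranges the asker requested (0 to 2.5 on the left, 0 to 7 on the right).

```r
library(ggplot2)

# First three rows of the question's data, typed in by hand
data <- data.frame(
  order  = c(7, 10, 12),
  min    = c(0.023005096, 0.20565908, 0.105828885),
  max    = c(1.6517, 1.7535, 2.4161),
  mean   = c(0.681777825, 0.843770095, 1.128603777),
  SpRate = c(5.66572238, 6.191950464, 6.493506494)
)

# Assumed scaling: 7 on the SpRate axis lines up with 2.5 on the primary axis
k <- 2.5 / 7

p <- ggplot(data, aes(x = order)) +
  # Bars are shrunk onto the primary scale
  geom_col(aes(y = SpRate * k), alpha = 0.2, width = 0.9) +
  geom_rect(aes(xmin = order - 0.1, xmax = order + 0.1, ymin = min, ymax = max),
            alpha = 0, color = "black") +
  geom_point(aes(y = mean), size = 3) +
  scale_y_continuous(
    name = "f0",
    # The right-hand axis undoes the shrinking so it reads in SpRate units
    sec.axis = sec_axis(~ . / k, name = "SpRate")
  )

print(p)
```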

R: ggplot2: Adding count labels to histogram with density overlay

I have a time series that I'm examining for data heterogeneity, and I want to explain some important facets of it to some data analysts. I have a density histogram overlaid with a KDE plot (so that both are visible at once). However, the original data are counts, and I want to place the count values as labels above the histogram bars.
Here is some code:
tix_hist <- ggplot(tix, aes(x=Tix_Cnt)) +
geom_histogram(aes(y = ..density..), colour="black", fill="orange", binwidth=50) +
xlab("Bin") + ylab("Density") + geom_density(aes(y = ..density..),fill=NA, colour="blue") +
scale_x_continuous(breaks=seq(1,1700,by=100))
# opts() and theme_text() were removed from ggplot2; ggtitle(), theme() and element_text() replace them
tix_hist + ggtitle("Ticket Density To-Date") + theme(
plot.title = element_text(face="bold", size=18),
axis.title.x = element_text(face="bold", size=16),
axis.title.y = element_text(face="bold", size=14, angle=90),
axis.text.x = element_text(face="bold", size=14),
axis.text.y = element_text(face="bold", size=14)
)
I thought about extrapolating count values using the KDE bandwidth, etc. Is it possible to extract the numeric output of a ggplot frequency histogram into a data frame and add it as a 'layer'? I'm not savvy with the layer() function yet, but any ideas would be helpful. Many thanks!
If you want the y-axis to show the bin counts and also want a density curve on the histogram, first use geom_histogram() and record the binwidth value (this is very important!), then add a geom_density() layer to show the fitted curve, multiplying its count by that binwidth.
If you don't know how to choose the binwidth value, you can calculate:
my_binwidth <- (max(tix$Tix_Cnt) - min(tix$Tix_Cnt)) / 30
(this is roughly what geom_histogram does by default, since it uses 30 bins.)
The code is given below:
(suppose the binwidth value you just calculated is 0.001)
tix_hist <- ggplot(tix, aes(x=Tix_Cnt))
tix_hist <- tix_hist + geom_histogram(aes(y=..count..),colour="blue",fill="white",binwidth=0.001)
tix_hist <- tix_hist + geom_density(aes(y=0.001*..count..),alpha=0.2,fill="#FF6666",adjust=4)
print(tix_hist)
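The answer above does not cover the count labels the question asked for. One possible way to add them, sketched here on invented data, is a second stat_bin layer drawn as text with the same binwidth. The data frame contents and the binwidth of 50 are assumptions, and after_stat() requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(42)
# Invented ticket counts; only the column name follows the question
tix <- data.frame(Tix_Cnt = rnorm(500, mean = 800, sd = 200))
my_binwidth <- 50  # assumed; must be identical in every layer below

p <- ggplot(tix, aes(x = Tix_Cnt)) +
  geom_histogram(aes(y = after_stat(count)), binwidth = my_binwidth,
                 colour = "black", fill = "orange") +
  # Same binning again, drawn as text, so each bar gets its count as a label
  stat_bin(aes(y = after_stat(count), label = after_stat(count)),
           binwidth = my_binwidth, geom = "text", vjust = -0.5, size = 3) +
  # The density layer's count is density * n; multiplying by the binwidth
  # puts the curve on the same scale as the bar heights
  geom_density(aes(y = after_stat(count) * my_binwidth), colour = "blue") +
  xlab("Bin") + ylab("Count")

print(p)
```

Empty bins are labelled 0 in this sketch; filtering them out would need an extra step.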
