Readjusting one plot from a two-plots graph - r

I have two plots in the same graph. I need to adjust the first plot so that its y axis is scaled to its own number of cases (which is lower than for the second plot). For now the first plot is difficult to read because its bars are much smaller than those of the second one. I would like each plot to be scaled in proportion to its own counts.
all.scores = rbind(UK_Together.scores, YesScotland.scores)
ggplot(data=all.scores) + # ggplot works on data.frames, always
  geom_histogram(mapping=aes(x=score, fill=Parti), binwidth=1) + # geom_bar() no longer has a binwidth parameter
  facet_grid(Parti~.) + # make a separate plot for each hashtag
  theme_bw() + scale_fill_brewer() # plain display, nicer colors
Can anyone help?
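One option that may help here, sketched under assumptions: facet_grid() takes a scales argument, and scales = "free_y" lets each panel use its own y range. The data below is a hypothetical stand-in for all.scores (one large group and one small group), since the real data is not shown.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for all.scores: 1000 cases in one group, 50 in the other
all.scores <- data.frame(
  score = c(sample(-3:3, 1000, replace = TRUE),
            sample(-3:3, 50, replace = TRUE)),
  Parti = rep(c("UK_Together", "YesScotland"), times = c(1000, 50))
)

p <- ggplot(all.scores, aes(x = score, fill = Parti)) +
  geom_histogram(binwidth = 1) +
  facet_grid(Parti ~ ., scales = "free_y") +  # each panel gets its own y range
  theme_bw() +
  scale_fill_brewer()
p
```

With free y scales the bar heights are no longer directly comparable across panels, so the axis labels have to be read separately for each one.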

Related

aes parameter to anchor start and end points for ggplot geom_smooth regression (loess)?

Is there a parameter that can anchor start and end points for a loess geom_smooth regression? If I increase the span (so that the regression isn't too wiggly), the starting and ending points seem to be drastically different (I have multiple lines on a graph, using as.factor) when in reality they are not (quite close together). I can't share my data as it is for confidential academic research, and I'm not sure how to reproduce an example for this... just wondering if this is possible with ggplot.
Here are some pictures that illustrate the problem, though...
Low span (span = 0.1), just the first 10 out of the 750 points to be graphed --> with this you can see the true starting points:
And then with the high span (span = 1.0), and all 750 points, the starting value and ending values are completely different. I'm not sure why this happens, but it is very misleading:
Basically, I want the smoothness of the second picture, but the specific and accurate starting points of the first when I graph all of the data (i.e., all 750 points). Let me know if there's any way to do this. Thanks for all your help.
Without seeing your code, I can already tell that you're setting your axis limits for the "span = 1.0" version using xlim(0,10) or scale_x_continuous(limits=c(0,10)) - is that correct? Change it to the following:
coord_cartesian(xlim = c(0, 10))
This is because xlim() (which is just a wrapper for scale_x_continuous(limits=...)) does not just zoom in on your data, but in fact discards any of the data outside of those limits before performing any calculations. Check the documentation on xlim() and the documentation on coord_cartesian() for more info.
It's easy to see how this is working using the following example:
# create dataset
set.seed(8675309)
df <- data.frame(x=1:1000, y=rnorm(1000))
# basic plot
p <- ggplot(df, aes(x,y)) + theme_bw() +
geom_point(color='gray75', size=1) + geom_smooth()
p
We get a basic plot, and as we expect, the result of geom_smooth() on this dataset is a straight line parallel to the x axis at y=0.
If we use xlim() or scale_x_continuous(limits=...) to see the first 10 points, you see that the geom_smooth() line is not the same:
p + xlim(0,10)
# or this one... results in the same plot
p + scale_x_continuous(limits=c(0,10))
The resulting line has a much wider confidence band and sits a bit above y=0, since the first 10 points happen to be a bit above the average of the remaining 990 points. If you use coord_cartesian(xlim=...), the zoom happens after the calculations are made and no points are discarded, so you see the same points plotted, but the geom_smooth() line matches that of the full dataset:
p + coord_cartesian(xlim=c(0,10))
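To check the difference without looking at the pictures, here is a minimal sketch that pulls the fitted curve out of each plot with layer_data(). It assumes the smoother is the second layer and sets method = "loess" explicitly (my choice, to keep the example deterministic across ggplot2 versions).

```r
library(ggplot2)

set.seed(8675309)
df <- data.frame(x = 1:1000, y = rnorm(1000))

p <- ggplot(df, aes(x, y)) +
  geom_point(color = "gray75", size = 1) +
  geom_smooth(method = "loess", formula = y ~ x)

# Layer 2 is geom_smooth(); layer_data() returns the points of the fitted curve.
# xlim() drops 990 rows with a warning, which is silenced here.
smooth_xlim  <- suppressWarnings(layer_data(p + xlim(0, 10), 2))
smooth_coord <- layer_data(p + coord_cartesian(xlim = c(0, 10)), 2)

range(smooth_xlim$x)   # fit computed only from the points that survived xlim()
range(smooth_coord$x)  # fit computed from the full data; only the view is zoomed
```

If the x range of the smoother covers the whole dataset, the fit used all the points and the limits only changed what is visible.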

GGPlot annotation gets pushed off page scale when combining multiple plots within grid.draw

I have 5 plots for 5 different groups. I want to indicate a statistically significant difference at specific time points. I used annotate() to place asterisks in individual plots above the time points. However, when I combine all the plots together to make one figure, the asterisks get pushed off the plots. It looks like it is a problem with the y scales not being fixed. I'm providing as much data as I feel comfortable with. The first bit of code is for one of the groups. The plots all look relatively similar for the 5 groups. The second bit is the code I am using to combine the plots. Pictures attached of one plot by itself, then all plots combined. There should be multiple asterisks on multiple plots.
ggplot(data,aes(X,Y,group=Group,color=Group))+
  theme_bw()+
  theme(panel.grid.major=element_line(color="white",size=.1))+
  theme(panel.grid.minor=element_line(color="white",size=.1))+
  geom_point(stat="summary")+
  geom_errorbar(stat="summary",fun.data=mean_se,width=0.25)+
  geom_line(stat="summary")+
  scale_color_manual(labels = c("C", "T"),values=c("black", "red"))+
  theme(axis.title.y = element_text(vjust=2.5))+
  annotate("text", x=5, y=3, label= "*",size=10)
library(grid) # provides grid.newpage() and grid.draw()
grid.newpage()
grid.draw(rbind(ggplotGrob(plotanimal1),
                ggplotGrob(plotanimal2),
                ggplotGrob(plotanimal3),
                ggplotGrob(plotanimal4),
                ggplotGrob(plotanimal5)))
You can make the asterisks by using geom_point with shape = 42. That way, ggplot will automatically fix the y axis values itself. You need to set the aesthetics at the same values you would have with annotate. So instead of
annotate("text", x=5, y=3, label= "*",size=10)
You can do
geom_point(aes(x=5, y=3), shape = 42, size = 2)
Have you tried using the package patchwork to organize the plots? It typically works better than grid.draw
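A minimal sketch of the patchwork route, assuming the package is installed. The make_plot() helper and its random data are hypothetical stand-ins for plotanimal1 to plotanimal5, which are not shown in the question.

```r
library(ggplot2)
library(patchwork)

# Hypothetical helper: builds one small plot with an asterisk annotation
make_plot <- function(seed) {
  set.seed(seed)
  d <- data.frame(X = rep(1:8, 2), Y = runif(16, 0, 2.5),
                  Group = rep(c("C", "T"), each = 8))
  ggplot(d, aes(X, Y, group = Group, color = Group)) +
    theme_bw() +
    geom_point() +
    geom_line() +
    scale_color_manual(values = c("black", "red")) +
    annotate("text", x = 5, y = 3, label = "*", size = 10)
}

plots <- lapply(1:5, make_plot)

# Stack the five plots in a single column
combined <- wrap_plots(plots, ncol = 1)
combined
```

Because patchwork combines the ggplot objects themselves rather than their grobs, each panel keeps its own scales, annotation included.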

ggplot histogram with labels

I want to label two different histograms that are in the same plot. By labels, I mean a legend that identifies each histogram by its color, for example green for x and red for y.
I tried to use labs(), but it is not working.
ggplot() +
geom_histogram(data=junk, aes(x),fill="green", alpha=.2) +
geom_histogram(data=jun, aes(y), fill="red", alpha=.2)+
labs(x = "something") +
ggtitle("title")
I expect to have both histograms, one green and the other one red, and labels in the right describing each histogram.
For this, you need the data in long format: the values for the green histogram and the values for the red one stacked below one another in a single data frame, plus another column that defines the groups.
df=data.frame(values=rnorm(20000),colorby=c("red_values","green_values"))
ggplot(data=df,aes(x=values,fill=colorby))+
geom_histogram(position="dodge")+
scale_fill_manual(values=c("red_values"="red","green_values"="green"))
For the position argument you could also try "stack" if it fits your needs better.
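If the data starts out wide, as in the question, it has to be reshaped first. A minimal sketch, assuming junk is a data frame with numeric columns x and y (the random values here are placeholders); base R's stack() does the reshaping and returns the columns values and ind.

```r
library(ggplot2)

set.seed(42)
# Placeholder for the asker's data: assumed to have numeric columns x and y
junk <- data.frame(x = rnorm(100), y = rnorm(100, mean = 2))

# stack() puts both columns below one another:
# 'values' holds the numbers, 'ind' records which column each value came from
long <- stack(junk[c("x", "y")])

p <- ggplot(long, aes(x = values, fill = ind)) +
  geom_histogram(position = "identity", alpha = 0.4, bins = 30) +
  scale_fill_manual(name = NULL, values = c(x = "green", y = "red"))
p
```

Mapping fill inside aes() is what creates the legend; setting fill="green" outside aes(), as in the original attempt, colors the bars but produces no legend entry.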

Unable to correctly add labels to a list of ggplot graphs

I am trying to assemble a multipanel boxplot with ggplot.
To have a general structure I am generating a list of plots and plotting them. I also want to add letters reporting significance groups for each boxplot.
Everything works fine, except for the fact that all the boxplots show the letters computed during the last iteration of the loop.
I post below an example in which I just try to add letters reporting the loop iteration number, and as you can see instead of reporting "Plot 1" for the first loop and "Plot 2" for the second it always plots the second.
The code I used is the following:
library(ggplot2)
library(gridExtra)
mydata<-data.frame(values=c(1,4,5,6,4,2,4,7,3,4,5,6,4,4,2,1,3,6,4,1,2,5,4,3,4,2,1,3,4,2),group=c(rep("A",15),rep("B",15)))
mydata2<-data.frame(values=c(2,6,5,6,7,2,5,7,3,4,5,6,4,4,2,1,3,6,4,1,2,5,4,3,1,2,3,3,4,7),group=c(rep("A",15),rep("B",15)))
myp<-list()
for(aaa in 1:2)
{
if(aaa==1) mydata<-mydata else mydata<-mydata2
myp[[aaa]]<-ggplot(mydata, aes(x=group, y=values)) +
geom_boxplot(outlier.shape=NA) + #avoid plotting outliers twice
geom_jitter(position=position_jitter(width=.1, height=0)) +
geom_text(aes(x=1, y=max(values)-0.05*max(values),label=paste("Plot",aaa))) +
geom_text(aes(x=2, y=max(values)-0.05*max(values),label=paste("Plot",aaa)))
}
do.call(grid.arrange,myp)
What am I doing wrong? It looks like the use of do.call with grid.arrange creates problems with the geom_text (but not with the plot, which is different in the two loops).
I would prefer NOT to manually write all the plot functions, since I have at least three multipanel plots, each one with 4 boxplots.
I'm not entirely sure what goes wrong with geom_text, but everything works if you use annotate instead (which should be used exactly for this purpose).
for(aaa in 1:2){
  print(aaa)
  if(aaa==1) df<-mydata else df<-mydata2
  myp[[aaa]]<-ggplot(df, aes(x=group, y=values)) +
    geom_boxplot(outlier.shape=NA) + #avoid plotting outliers twice
    geom_jitter(position=position_jitter(width=.1, height=0)) +
    annotate("text", x=1, y=max(df$values)-0.05*max(df$values),label=paste("Plot",aaa)) +
    annotate("text", x=2, y=max(df$values)-0.05*max(df$values),label=paste("Plot",aaa))
}
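A likely explanation for the geom_text behaviour, offered as an assumption rather than a confirmed diagnosis: aes() stores its expressions unevaluated, so paste("Plot", aaa) is only evaluated when the plot is drawn, by which time the loop has finished and aaa holds its last value. annotate() evaluates its arguments immediately, which would explain why it works. If you want to keep geom_text, one way around it is to build each plot inside a function, so every plot gets its own copy of the index. The small data frames below are hypothetical.

```r
library(ggplot2)

# Hypothetical small datasets standing in for mydata and mydata2
datasets <- list(
  data.frame(values = c(1, 4, 5, 6, 4, 2), group = rep(c("A", "B"), each = 3)),
  data.frame(values = c(2, 6, 5, 6, 7, 2), group = rep(c("A", "B"), each = 3))
)

# Each call to the anonymous function has its own environment,
# so the 'i' captured by aes() differs from plot to plot
plots <- lapply(seq_along(datasets), function(i) {
  ggplot(datasets[[i]], aes(x = group, y = values)) +
    geom_boxplot(outlier.shape = NA) +
    geom_text(aes(y = max(values), label = paste("Plot", i)))
})
```

The resulting list can be passed to do.call(grid.arrange, plots) as before. Note that geom_text draws one label per data row here, so the labels overlap exactly; annotate() remains the cleaner choice for a single label.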

Adding group mean lines to geom_bar plot and including in legend

I want to create a bar graph which also shows the mean value for the bars in each group, and shows the mean line in the legend.
I have been able to get this graph (a bar chart with means) using the code below, which is fine, but I would like to see the mean lines in the legend.
## The data to be graphed is the proportion of persons receiving a treatment
## (num = numerator) in each population (denom = denominator). The population is
## grouped into two age groups (Age) and further divided by a categorical
## variable V1
###SET UP DATAFRAME###
require(ggplot2)
df <- data.frame(V1 = c(rep(c("S1","S2","S3","S4","S5"),2)),
Age= c(rep(70,5),rep(80,5)),
num=c(5280,6570,5307,4894,4119,3377,4244,2999,2971,2322),
denom=c(9984,12600,9425,8206,7227,7290,8808,6386,6206,5227))
df$prop<-df$num/df$denom*100
PopMean<-sum(df$num)/sum(df$denom)*100
df70<-df[df$Age==70,]
group70mean<-sum(df70$num)/sum(df70$denom)*100
df80<-df[df$Age==80,]
group80mean<-sum(df80$num)/sum(df80$denom)*100
df$PopMean<-c(rep(PopMean,10))
df$groupmeans<-c(rep(group70mean,5),rep(group80mean,5))
I want the plot to look like this, but want the lines in the legend too, to be labelled as 'mean of group' or similar.
#basic plot
P<-ggplot(df, aes(x=factor(Age), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(), colour='black',stat="identity")
P
####add mean lines
P+geom_errorbar(aes(y=df$groupmeans, ymax=df$groupmeans,
ymin=df$groupmeans), col="red", lwd=2)
Adding show.legend=TRUE overlays the error bars onto the factor legend, rather than separately. If there is a way of showing geom_errorbar separately in the legend this is probably the simplest solution.
I have also tried various things with geom_line
The syntax below produces a line for the population mean value, but it runs from the centre of each x position rather than covering the full width of the bars. It does produce a legend, but one showing a bar of colour rather than a line.
P+geom_line(aes(y=df$PopMean, group=df$PopMean, color=df$PopMean),lwd=1)
If I try to draw lines for the group means, the lines are not visible (because each one consists of only a single point).
P+geom_line(aes(y=df$groupmeans, group=df$groupmeans, color=df$groupmeans))
I also tried to get round this with facet plot, although this requires me to pretend my categorical variable is numeric to get it to work.
###set up new df
df2<-df
df2$V1<-c(rep(c(1,2,3,4,5),2))
P<-ggplot(df2, aes(x=factor(V1), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(),
colour='black',stat="identity",width=1)
P+facet_grid(.~factor(df2$Age))
P+facet_grid(.~factor(df2$Age))+geom_line(aes(y=df$groupmeans,
group=df$groupmeans, color=df$groupmeans))
This allows me to show the mean lines, using geom_line, so a legend does appear (although it doesn't look right, showing a colour gradient rather than coloured lines!). However, the lines still do not go the full width of the bars. Also my x-axis now needs relabelling to show S1, S2 etc rather than numeric 1,2,3
To sum up - is there a way of showing error bar lines separately in the legend?
If not, then, if I use facetting, how do I correct the legend appearance and relabel the axes with my categorical variables, and is it possible to get the line to go the full width of the plot?
Or is there an alternate solution that I am missing!?
Thanks
To get a legend for geom_errorbar you need to map the colour aesthetic inside aes().
As you want only one category (here red), I've created a dummy variable first:
df$mean <- "Mean"
ggplot(df, aes(x=factor(Age), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(), colour='black',stat="identity") +
geom_errorbar(aes (ymax=groupmeans,
ymin=groupmeans, colour=mean), lwd=2) +
scale_colour_manual(name="",values = "#ff0000")
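For the facetted variant mentioned in the question, a possible sketch (not part of the answer above, and untested against the asker's real setup): geom_hline() spans the full panel width, and mapping colour to a constant string inside aes() gives it its own legend entry. The means data frame and the "Group mean" label are names I introduce for illustration.

```r
library(ggplot2)

df <- data.frame(
  V1    = rep(c("S1", "S2", "S3", "S4", "S5"), 2),
  Age   = rep(c(70, 80), each = 5),
  num   = c(5280, 6570, 5307, 4894, 4119, 3377, 4244, 2999, 2971, 2322),
  denom = c(9984, 12600, 9425, 8206, 7227, 7290, 8808, 6386, 6206, 5227)
)
df$prop <- df$num / df$denom * 100

# One pooled mean per age group: sum(num) / sum(denom), as in the question
sums  <- aggregate(cbind(num, denom) ~ Age, data = df, FUN = sum)
means <- data.frame(Age = sums$Age, groupmean = sums$num / sums$denom * 100)

p <- ggplot(df, aes(x = V1, y = prop, fill = V1)) +
  geom_col(colour = "black") +
  # one horizontal line per facet, with its own colour legend entry
  geom_hline(data = means, aes(yintercept = groupmean, colour = "Group mean")) +
  facet_grid(. ~ Age) +
  scale_colour_manual(name = NULL, values = c("Group mean" = "red"))
p
```

Keeping V1 as a character column means the x axis shows S1, S2 and so on without any relabelling.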
