Modiffy axes in ggplot - r

I used the following code based on a previous post How to create odds ratio and 95 % CI plot in R to produce the figure posted below. I would like to:
1) Make x and y axes as well as the legends bold
2) Increase the thickness of the lines
How can I do that in ggplot?
ggplot(alln, aes(x = apoll2, y = increase, ymin = l95, ymax = u95)) + geom_pointrange(aes(col = factor(marker)), position=position_dodge(width=0.50)) +
ylab("Percent increase & 95% CI") + geom_hline(aes(yintercept = 0)) + scale_color_discrete(name = "Marker") + xlab("")

To change axis and legend appearance you should add theme() to your plot.
+ theme(axis.text=element_text(face="bold"),
legend.text=element_text(face="bold"))
To make line wider add size=1.5 inside the geom_pointrange() call.

Related

How to set automatic label position based on box height

In a previous question, I asked about moving the label position of a barplot outside of the bar if the bar was too small. I was provided this following example:
library(ggplot2)
options(scipen=2)
dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
Anno = 1:10)
ggplot(data = dataset,
aes(x = Anno,
y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity",
width=0.8,
position="dodge") +
geom_text(aes( y = Riserva_Riv_Fine_Periodo,
label = round(Riserva_Riv_Fine_Periodo, 0),
angle=90,
hjust= ifelse(Riserva_Riv_Fine_Periodo < 3000000, -0.1, 1.2)),
col="red",
size=4,
position = position_dodge(0.9))
And I obtain this graph:
The problem with the example is that the value at which the label is moved must be hard-coded into the plot, and an ifelse statement is used to reposition the label. Is there a way to automatically extract the value to cut?
A slightly better option might be to base the test and the positioning of the labels on the height of the bar relative to the height of the highest bar. That way, the cutoff value and label-shift are scaled to the actual vertical range of the plot. For example:
ydiff = max(dataset$Riserva_Riv_Fine_Periodo)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity", width=0.8) +
geom_text(aes(label = round(Riserva_Riv_Fine_Periodo, 0), angle=90,
y = ifelse(Riserva_Riv_Fine_Periodo < 0.3*ydiff,
Riserva_Riv_Fine_Periodo + 0.1*ydiff,
Riserva_Riv_Fine_Periodo - 0.1*ydiff)),
col="red", size=4)
You would still need to tweak the fractional cutoff in the test condition (I've used 0.3 in this case), depending on the physical size at which you render the plot. But you could package the code into a function to make the any manual adjustments a bit easier.
It's probably possible to automate this by determining the actual sizes of the various grobs that make up the plot and setting the condition and the positioning based on those sizes, but I'm not sure how to do that.
Just as an editorial comment, a plot with labels inside some bars and above others risks confusing the visual mapping of magnitudes to bar heights. I think it would be better to find a way to shrink, abbreviate, recode, or otherwise tweak the labels so that they contain the information you want to convey while being able to have all the labels inside the bars. Maybe something like this:
library(scales)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo/1000)) +
geom_col(width=0.8, fill="grey30") +
geom_text(aes(label = format(Riserva_Riv_Fine_Periodo/1000, big.mark=",", digits=0),
y = 0.5*Riserva_Riv_Fine_Periodo/1000),
col="white", size=3) +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
theme_classic() +
labs(y="Riserva (thousands)")
Or maybe go with a line plot instead of bars:
ggplot(dataset, aes(Anno, Riserva_Riv_Fine_Periodo/1e3)) +
geom_line(linetype="11", size=0.3, colour="grey50") +
geom_text(aes(label=format(Riserva_Riv_Fine_Periodo/1e3, big.mark=",", digits=0)),
size=3) +
theme_classic() +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
expand_limits(y=0) +
labs(y="Riserva (thousands)")

how to plot probability histogram in ggplot2

I want to plot a probability histogram overlay with probability curve and compare them between two group.
my code is as following,
ggplot(MDmedianall, aes(x= MD_median, y=..density.., fill =IDH.type )) +
geom_histogram(alpha = 0.5,binwidth = 0.00010, position = 'identity') +
geom_density( stat="density", position="identity", alpha=0.3 ) +
scale_fill_discrete(breaks=c("0","1"), labels=c("IDH wild type","IDH mutant type")) +
scale_y_continuous(labels = scales :: percent) +
ylab("Relative cumulative frequency(%)") +
xlab("MD median value")
However, the y axis is not what I want, any reasons for that?
BTW, how to change the line style and label them within the color square on the right.

ggplot2 add offset to jitter positions

I have data that looks like this
df = data.frame(x=sample(1:5,100,replace=TRUE),y=rnorm(100),assay=sample(c('a','b'),100,replace=TRUE),project=rep(c('primary','secondary'),50))
and am producing a plot using this code
ggplot(df,aes(project,x)) + geom_violin(aes(fill=assay)) + geom_jitter(aes(shape=assay,colour=y),height=.5) + coord_flip()
which gives me this
This is 90% of the way to being what I want. But I would like it if each point was only plotted on top of the violin plot for the matching assay type. That is, the jitterred positions of the points were set such that the triangles were only ever on the upper teal violin plot and the circles in the bottom red violin plot for each project type.
Any ideas how to do this?
In order to get the desired result, it is probably best to use position_jitterdodge as this gives you the best control over the way the points are 'jittered':
ggplot(df, aes(x = project, y = x, fill = assay, shape = assay, color = y)) +
geom_violin() +
geom_jitter(position = position_jitterdodge(dodge.width = 0.9,
jitter.width = 0.5,
jitter.height = 0.2),
size = 2) +
coord_flip()
which gives:
You can use interaction between assay & project:
p <- ggplot(df,aes(x = interaction(assay, project), y=x)) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
p + coord_flip()
The labeling can be adjusted by numeric scaled x axis:
# cbind the interaction as a numeric
df$group <- as.numeric(interaction(df$assay, df$project))
# plot
p <- ggplot(df,aes(x=group, y=x, group=cut_interval(group, n = 4))) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
p + coord_flip() + scale_x_continuous(breaks = c(1.5, 3.5), labels = levels(df$project))

How can I force ggplot's geom_tile to fill every facet?

I am using ggplot's geom_tile to do 2-D density plots faceted by a factor. Every facet's scale goes from the minimum of all the data to the maximum of all the data, but the geom_tile in each facet only extends to the range of the data plotted in that facet.
Example code that demonstrates the problem:
library(ggplot2)
data.unlimited <- data.frame(x=rnorm(500), y=rnorm(500))
data.limited <- subset(data.frame(x=rnorm(500), y=rnorm(500)), x<1 & y<1 & x>-1 & y>-1)
mydata <- rbind(data.frame(groupvar="unlimited", data.unlimited),
data.frame(groupvar="limited", data.limited))
ggplot(mydata) +
aes(x=x,y=y) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
facet_wrap(~ groupvar)
Run the code, and you will see two facets. One facet shows a density plot of an "unlimited" random normal distribution. The second facet shows a random normal truncated to lie within a 2x2 square about the origin. The geom_tile in the "limited" facet will be confined inside this small box instead of filling the facet.
last_plot() +
scale_x_continuous(limits=c(-5,5)) +
scale_y_continuous(limits=c(-5,5))
These last three lines plot the same data with specified x and y limits, and we see that neither facet extends the tile sections to the edge in this case.
Is there any way to force the geom_tile in each facet to extend to the full range of the facet?
I think you're looking for a combination of scales = "free" and expand = c(0,0):
ggplot(mydata) +
aes(x=x,y=y) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
facet_wrap(~ groupvar,scales = "free") +
scale_x_continuous(expand = c(0,0)) +
scale_y_continuous(expand = c(0,0))
EDIT
Given the OP's clarification, here's one option via simply setting the panel background manually:
ggplot(mydata) +
aes(x=x,y=y) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
facet_wrap(~ groupvar) +
scale_fill_gradient(low = "blue", high = "red") +
opts(panel.background = theme_rect(fill = "blue"),panel.grid.major = theme_blank(),
panel.grid.minor = theme_blank())

Add vertical lines to ggplot2 bar plot

I am doing some research on non-defaulters and defaulters with regards to banking. In that context I am plotting their distributions relative to some score in a bar plot. The higher the score, the better the credit rating.
Since the number of defaults is very limited compared to the number of non-defaults plotting the defaults and non-defaults on the same bar plot is not very giving as you hardly can see the defaults. I then make a second bar plot based on the defaulters' scores only, but on the same interval scale as the full bar plot of both the scores of the defaulters and non-defaulters. I would then like to add vertical lines to the first bar plot indicating where the highest defaulter score is located and the lowest defaulter score is located. That is to get a view of where the distribution of the defaulters fit into that of the overall distribution of both defaulters and non-defaulters.
Below is the code I am using replaced with (seeded) random data instead.
library(ggplot2)
#NDS represents non-defaults and DS defaults on the same scale
#although here being just some random normals for the sake of simplicity.
set.seed(10)
NDS<-rnorm(10000,sd=1)-2
DS<-rnorm(100,sd=2)-5
#Cutoffs are constructed such that intervals of size 0.3
#contain all values of NDS & DS
minCutoff<--9.3
maxCutoff<-2.1
#Generate the actual interval "bins"
NDS_CUT<-cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3))
DS_CUT<-cut(DS,breaks=seq(minCutoff, maxCutoff, by = 0.3))
#Manually generate where to put the vertical lines for min(DS) and max(DS)
minDS_bar<-levels(cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3)))[1]
maxDS_bar<-levels(cut(NDS,breaks=seq(minCutoff, maxCutoff, by = 0.3)))[32]
#Generate data frame - seems stupid, but makes sense
#when the "real" data is used :-)
NDSdataframe<-cbind(as.data.frame(NDS_CUT),rep(factor("State-1"),length(NDS_CUT)))
colnames(NDSdataframe)<-c("Score","Action")
DSdataframe<-cbind(as.data.frame(DS_CUT),rep(factor("State-2"),length(DS_CUT)))
colnames(DSdataframe)<-c("Score","Action")
fulldataframe<-rbind(NDSdataframe,DSdataframe)
attach(fulldataframe)
#Plot the full distribution of NDS & DS
# with geom_vline(xintercept = minDS_bar) + geom_vline(xintercept = maxDS_bar)
# that unfortunately does not show :-(
fullplot<-ggplot(fulldataframe, aes(Score, fill=factor(Action,levels=c("State-2","State-1")))) + geom_bar(position="stack") + opts(axis.text.x = theme_text(angle = 45)) + opts (legend.position = "none") + xlab("Scoreinterval") + ylab("Antal pr. interval") + geom_vline(xintercept = minDS_bar) + geom_vline(xintercept = maxDS_bar)
#Generate dataframe for DS only
#It might seem stupid, but again makes sense
#when using the original data :-)
DSdataframe2<-cbind(as.data.frame(DS_CUT),rep(factor("State-2"),length(DS_CUT)))
colnames(DSdataframe2)<-c("theScore","theAction")
#Calucate max number of observations to adjust bar plot of DS only
myMax<-max(table(DSdataframe2))+1
attach(DSdataframe2)
#Generate bar plot of DS only
subplot<-ggplot(fulldataframe, aes(theScore, fill=factor(theAction))) + geom_bar (position="stack") + opts(axis.text.x = theme_text(angle = 45)) + opts(legend.position = "none") + ylim(0, myMax) + xlab("Scoreinterval") + ylab("Antal pr. interval")
#plot on a grid
grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 1)))
vplayout <- function(x, y)
viewport(layout.pos.row = x, layout.pos.col = y)
print(fullplot, vp = vplayout(1, 1))
print(subplot, vp = vplayout(2, 1))
#detach dataframes
detach(DSdataframe2)
detach(fulldataframe)
Furthermore, if anybody has an idea of how I can align the to plot so that correct intervals are just below/above each other on the grid plot
Hope somebody is able to help!
Thanks in advance,
Christian
Wrap aes around the xintercept in the geom_vline layer:
... + geom_vline(aes(xintercept = minDS_bar)) + geom_vline(aes(xintercept = maxDS_bar))
Question 1:
Since you provide the vertical lines as data, you have to map the aesthetics first, using aes()
fullplot <-ggplot(
fulldataframe,
aes(Score, fill=factor(Action,levels=c("State-2","State-1")))) +
geom_bar(position="stack") +
opts(axis.text.x = theme_text(angle = 45)) +
opts (legend.position = "none") +
xlab("Scoreinterval") +
ylab("Antal pr. interval") +
geom_vline(aes(xintercept = minDS_bar)) +
geom_vline(aes(xintercept = maxDS_bar))
Second question:
To align the plots, you can use the align.plots() function in package ggExtra
install.packages("dichromat")
install.packages("ggExtra", repos="http://R-Forge.R-project.org")
library(ggExtra)
ggExtra::align.plots(fullplot, subplot)

Resources