Making ggplot look as nice as the native plot in R

I have started using ggplot because I heard it's a lot more flexible and looks a lot better than the native plot function. However, my ggplot graph looks worse than the one from the plot function, so I must be doing something wrong. For example, the labels are too small to be legible, the line does not have any points on it, and the aspect ratio just looks better with the default plot function. I am new to data visualization, so any guide or suggestions for making the graph look better would be much appreciated.
With plot:
plot(table(month(data$DATE)), type="b",
main="Time vs. Freq",
xaxt='n',
xlab="Month",
ylab="Frequency")
axis(1, at=1:9, labels = month.name[1:9])
With ggplot:
x <- month(data$DATE)
df = data.frame(x)
df$y <- 1
ggplot(df, aes(x, y)) + stat_summary(fun.y = sum, geom = "line") + xlab("Month") + ylab("Freq") + ggtitle("Time vs. Freq")

It's not completely clear what you don't like about the default ggplot2 plots, but have you tried one of the other themes?
p <- ggplot(df, aes(x, y)) + stat_summary(fun = sum, geom = "line") + # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
xlab("Month") + ylab("Freq") + ggtitle("Time vs. Freq")
p + theme_bw() # For black/white publication plots
Or grab more themes and experiment:
install.packages("ggthemes")
library(ggthemes)
p + theme_tufte() # Based on Tufte's ideas
p + theme_stata() # Resembles plots from stata
p + theme_economist() # A la plots in the economist
These are just a few examples, and each theme can be tweaked as you please.
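Themes alone do not address the specific complaints in the question (no points on the line, small labels, numeric month axis). A minimal sketch of one way to handle those follows. The `counts` data frame is made-up example data standing in for table(month(data$DATE)), and the base font size of 16 is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical monthly counts standing in for table(month(data$DATE))
counts <- data.frame(month = 1:9,
                     freq  = c(12, 18, 25, 31, 28, 22, 19, 15, 11))

p <- ggplot(counts, aes(x = month, y = freq)) +
  geom_line() +
  geom_point(size = 3) +                        # points on the line, like type = "b"
  scale_x_continuous(breaks = 1:9,
                     labels = month.abb[1:9]) + # month labels instead of numbers
  labs(title = "Time vs. Freq", x = "Month", y = "Frequency") +
  theme_bw(base_size = 16)                      # larger text throughout

print(p)
```

Abbreviated month names are used here only to reduce the chance of overlapping labels; month.name works the same way if the plot is wide enough.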

Related

Add significance lines outside/between facets

I want to add significance stars over 3 facets to compare them.
I searched online, but adding things outside the plot area seems very complicated. There is a ggsignif package, but it does not work across facets (https://github.com/const-ae/ggsignif/issues/22). It seems possible using gridExtra, but I cannot make it work.
The stars can be drawn easily in a single plot, but not across facets. However, I have to use facets to get separate rugs on the left. If you know how to get separate rugs inside a single plot, that would also solve the problem.
Here is the code and plot I want to add things on:
library(ggplot2)
ToothGrowth$dose = factor(ToothGrowth$dose)
ggplot(ToothGrowth, aes(x='', y=len, color=dose)) +
geom_boxplot() +
geom_rug(sides="l") +
facet_grid(. ~ dose)
What I want is:
Sorry for the drawing. The line width should be the same. The final result should be really similar to this but for facets:
This is a workaround - plot two plots (one for significance annotation, another for boxplots).
library(ggplot2)
library(ggsignif)
ToothGrowth$dose <- factor(ToothGrowth$dose)
Plot the significance annotation on its own. Don't use a boxplot here, and set the tip length close to 0. (Only one comparison is used here because the others return an error from the statistical test; I'm assuming this is only an example dataset.)
p1 <- ggplot(ToothGrowth, aes(as.factor(dose), len)) +
geom_signif(comparisons = list(c("1", "2")), tip_length = 0.005) +
coord_cartesian(ylim = c(35, 35.5)) +
theme_void()
Plot boxplots with different x axis (need this to specify comparisons groups in ggsignif)
p2 <- ggplot(ToothGrowth, aes(factor(dose), len)) +
geom_boxplot() +
geom_rug(sides = "l") +
facet_grid(. ~ dose, scales = "free_x") +
labs(x = NULL) +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank())
Draw the plots together, with the geom_signif annotation on top of the faceted geom_boxplot panels:
egg::ggarrange(p1, p2, heights = c(2, 10))

How do you force Rmarkdown plots to be Square instead of Rectangle?

I have a Generalized Linear Model (GLM) that I'm plotting diagnostics for using the glm.diag.plots function in the MASS package. But it tends to plot rectangular instead of square, which is very ugly for publication.
Below is some sample code that shows the problem in an .Rmd file. In RStudio you can just drag the window around until the plot is square, but that is not possible in R Markdown documents, and I'd like to enforce a square shape manually.
I checked the ggplot documentation for ways to enforce square plotting, but could not find anything. glm.diag.plots() appears to use split.screen(), whose documentation doesn't mention enforcing aspect ratios either.
@rawr's comment is spot-on: this is a knitr/markdown issue, not a glm.diag.plots or ggplot issue. All you need to do is specify the desired width and height of the output (in inches, by default) using the fig.width and fig.height chunk options.
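As a minimal sketch, the two options can be set once for the whole document from a setup chunk. The value of 6 inches is an arbitrary choice; the same options can also be given in an individual chunk header, e.g. {r, fig.width=6, fig.height=6}.

```r
# Put this in a setup chunk near the top of the .Rmd file.
# Assumes the knitr package is installed (it renders R Markdown documents).
knitr::opts_chunk$set(
  fig.width  = 6,  # inches
  fig.height = 6   # same as fig.width, so each figure is square
)
```

Note that this makes the figure region square; the plotting area inside it can still differ slightly because of margins and axis labels.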
It looks like you are using glm.diag.plots from package boot to acquire plots.
You could recreate them using ggplot if you wish. Here is an example:
some model:
data(anorexia, package = "MASS")
anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
family = gaussian, data = anorexia)
the glm.diag.plots output
library(boot)
glm.diag.plots(anorex.1)
To create each plot in ggplot, first get the diagnostics object from glm.diag.plots:
z <- glm.diag.plots(anorex.1, ret = TRUE)
Then draw each plot:
library(ggplot2)
plot1 <- ggplot(data.frame(x = predict(anorex.1),
y = z$res))+
geom_point(aes(x, y)) +
xlab("Linear predictor") +
ylab("Residuals") +
theme_bw()+
theme(aspect.ratio=1)
plot2 <- ggplot(data.frame(x = qnorm(ppoints(length(z$rd)))[rank(z$rd)],
y = z$rd)) +
geom_point(aes(x, y)) +
xlab("Quantiles of standard normal") + # x holds the normal quantiles
ylab("Ordered deviance residuals") + # y holds the deviance residuals z$rd
geom_abline(intercept = 0, slope = 1, lty =2) +
theme_bw()+
theme(aspect.ratio=1)
plot3 <- ggplot(data.frame(x = z$h/(1-z$h),
y = z$cook)) +
geom_point(aes(x, y)) +
xlab("h/(1-h)") +
ylab("Cook statistic") +
theme_bw()+
theme(aspect.ratio=1)
plot4 <- ggplot(data.frame(x = 1:length(z$cook),
y = z$cook)) +
geom_point(aes(x, y)) +
xlab("Case") +
ylab("Cook statistic") +
theme_bw()+
theme(aspect.ratio=1)
then combine them
library(cowplot)
plot_grid(plot1, plot2, plot3, plot4, ncol = 2)
Now you can customize each plot the way you wish.
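Since all four plots repeat the same theme_bw() and aspect.ratio settings, one option is to store them once and reuse them. The sketch below is only an illustration: `square_bw` and the `toy` data frame are hypothetical names, not part of the answer above.

```r
library(ggplot2)

# Reusable list of components; adding a list to a plot adds each element in turn
square_bw <- list(theme_bw(), theme(aspect.ratio = 1))

# Toy data, for illustration only
toy <- data.frame(x = 1:10, y = (1:10)^2)

p <- ggplot(toy, aes(x, y)) +
  geom_point() +
  square_bw   # black-and-white theme with a square panel

print(p)
```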

how to make the value on Y axis start from zero in R, ggplot2

Currently, I'm using ggplot2 to make a density plot.
ggplot(data=resultFile,aes(x=V19, colour=V1) ) +
geom_line(stat="density") +
xlab("score") +
ylab("density") +
ggtitle(paste(data_name,protocol,level,sep=" ")) +
theme(legend.title=element_blank(), legend.position=c(0.92,0.9)) +
scale_color_manual(values=c("blue","red"),
labels=c("A", "B"))
Using this code, I get the plot below.
However, I get a different plot if I use the plot(density()...) function in R:
the y value starts from 0.
How can I make the ggplot plot look like the one from plot(density()...) in R?
ggplot(data=resultFile,aes(x=V19, colour=V1) ) +
ylim(0, NA) + # lower limit fixed at 0; NA lets ggplot2 choose the upper limit
geom_line(stat="density") +
xlab("score") +
ylab("density") +
ggtitle(paste(data_name,protocol,level,sep=" ")) +
theme(legend.title=element_blank(), legend.position=c(0.92,0.9)) +
scale_color_manual(values=c("blue","red"),
labels=c("A", "B"))
ggplot cuts off the x-axis at the min and max of the empirical distribution. You can extend the x-axis by adding xlim to the plot, but please make sure that the plot does not exceed the theoretical limits of the distribution (in the example below, the theoretical limits are [0, 1], so there is not much reason to show anything outside that range).
set.seed(1)
temp <- data.frame(x =runif(100)^3)
library(ggplot2)
ggplot(temp, aes(x = x)) + geom_line(stat = "density") + xlim(-.2, 1.2)
plot(density(temp$x))
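The difference between the two plots comes from the evaluation grid: with its default cut = 3, density() extends the grid three bandwidths beyond the extremes of the data, so the base curve runs down to nearly zero at both ends. A minimal sketch that checks this and reproduces the base curve in ggplot2 by plotting the precomputed density (`dens_df` is a name introduced here for illustration):

```r
library(ggplot2)

set.seed(1)
temp <- data.frame(x = runif(100)^3)

d <- density(temp$x)  # default cut = 3: grid extends 3 bandwidths past the data

range(temp$x)         # range of the observations
range(d$x)            # wider range covered by the density estimate

# Plot the precomputed curve; this follows plot(density(temp$x))
dens_df <- data.frame(x = d$x, y = d$y)
p <- ggplot(dens_df, aes(x, y)) +
  geom_line() +
  labs(y = "density")

print(p)
```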

ggplot2: add conditional density curves describing both dimensions of scatterplot

I have scatterplots of 2D data from two categories. I want to add density lines for each dimension -- not outside the plot (cf. Scatterplot with marginal histograms in ggplot2) but right on the plotting surface. I can get this for the x-axis dimension, like this:
set.seed(123)
dim1 <- c(rnorm(100, mean=1), rnorm(100, mean=4))
dim2 <- rnorm(200, mean=1)
cat <- factor(c(rep("a", 100), rep("b", 100)))
mydf <- data.frame(cbind(dim2, dim1, cat))
ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
geom_point() +
stat_density(aes(x=dim1, y=(-2+(..scaled..))),
position="identity", geom="line")
It looks like this:
But I want an analogous pair of density curves running vertically, showing the distribution of points in the y-dimension. I tried
stat_density(aes(y=dim2, x=0+(..scaled..)), position="identity", geom="line")
but receive the error "stat_density requires the following missing aesthetics: x".
Any ideas? thanks
You can get the densities of the dim2 variables. Then, flip the axes and store them in a new data.frame. After that it is simply plotting them on top of the other graph.
p <- ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
geom_point() +
stat_density(aes(x=dim1, y=(-2+(..scaled..))),
position="identity", geom="line")
stuff <- ggplot_build(p)
# extract the x range, to make the new densities align with the y-axis
# (recent ggplot2 versions store it in stuff$layout$panel_params[[1]]$x.range instead)
xrange <- stuff[[2]]$ranges[[1]]$x.range
## Get densities of dim2
ds <- do.call(rbind, lapply(unique(mydf$cat), function(lev) {
dens <- with(mydf, density(dim2[cat==lev]))
data.frame(x=dens$y+xrange[1], y=dens$x, cat=lev)
}))
p + geom_path(data=ds, aes(x=x, y=y, color=factor(cat)))
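The same idea can be sketched without reaching into the ggplot_build() internals, which have changed between ggplot2 versions. The version below is an illustration under stated assumptions: the baseline `x0` (one unit left of the smallest dim1 value) is an arbitrary choice, and each density is rescaled to a maximum of 1 to mimic ..scaled..; the horizontal curves from the question are left out for brevity.

```r
library(ggplot2)

set.seed(123)
mydf <- data.frame(
  dim1 = c(rnorm(100, mean = 1), rnorm(100, mean = 4)),
  dim2 = rnorm(200, mean = 1),
  cat  = factor(rep(c("a", "b"), each = 100))
)

x0 <- min(mydf$dim1) - 1  # arbitrary baseline just left of the points

# One density of dim2 per category, rescaled to a maximum of 1 and
# stored with the axes swapped so that the curve runs vertically
ds <- do.call(rbind, lapply(levels(mydf$cat), function(lev) {
  dens <- density(mydf$dim2[mydf$cat == lev])
  data.frame(x = x0 + dens$y / max(dens$y), y = dens$x, cat = lev)
}))

p <- ggplot(mydf, aes(x = dim1, y = dim2, colour = cat)) +
  geom_point() +
  geom_path(data = ds, aes(x = x, y = y, colour = cat))  # path keeps grid order

print(p)
```

geom_path is used rather than geom_line because geom_line would reorder the points by x and scramble a vertical curve.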
So far I can produce:
distrib_horiz <- stat_density(aes(x=dim1, y=(-2+(..scaled..))),
position="identity", geom="line")
ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
geom_point() + distrib_horiz
And:
distrib_vert <- stat_density(data=mydf, aes(x=dim2, y=(-2+(..scaled..))),
position="identity", geom="line")
ggplot(data=mydf, aes(x=dim2, y=dim1, colour=as.factor(cat))) +
geom_point() + distrib_vert + coord_flip()
But combining them is proving tricky.
So far I have only a partial solution since I didn't manage to obtain a vertical stat_density line for each individual category, only for the total set. Maybe this can nevertheless help as a starting point for finding a better solution. My suggestion is to try with the ggMarginal() function from the ggExtra package.
p <- ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
geom_point() + stat_density(aes(x=dim1, y=(-2+(..scaled..))),
position="identity", geom="line")
library(ggExtra)
ggMarginal(p,type = "density", margins = "y", size = 4)
This is what I obtain:
I know it's not perfect, but maybe it's a step in a helpful direction. At least I hope so. Looking forward to seeing other answers.

How to improve the aspect of ggplot histograms with log scales and discrete values

I am trying to improve the clarity and aspect of a histogram of discrete values which I need to represent with a log scale.
Please consider the following MWE
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram()
which produces
and then
ggplot(data, aes(x=dist)) + geom_histogram() + scale_x_log10(breaks=c(1,2,3,4,5,10,100))
which probably is even worse
since it now gives the impression that something is missing between "1" and "2", and it is also not totally clear which bar has value "1" (the bar is to the right of the tick) and which bar has value "2" (the bar is to the left of the tick).
I understand that technically ggplot provides the "right" visual answer for a log scale. Yet as an observer I have some trouble understanding it.
Is it possible to improve something?
EDIT:
This is what happened when I applied Jaap's solution to my real data:
Where do the dips between x=0 and x=1 and between x=1 and x=2 come from? My values are discrete, so why is the plot also mapping x=1.5 and x=2.5?
The first thing that comes to mind is playing with the binwidth. But that doesn't give a great solution either:
ggplot(data, aes(x=dist)) +
geom_histogram(binwidth=10) +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0.015,0)) +
theme_bw()
gives:
In this case it is probably better to use a density plot. However, when you use scale_x_log10 you will get a warning message (Removed 524 rows containing non-finite values (stat_density)). This can be resolved by using a log plus one transformation.
The following code:
library(ggplot2)
library(scales)
ggplot(data, aes(x=dist)) +
stat_density(aes(y=..count..), color="black", fill="blue", alpha=0.3) +
scale_x_continuous(breaks=c(0,1,2,3,4,5,10,30,100,300,1000), trans="log1p", expand=c(0,0)) +
scale_y_continuous(breaks=c(0,125,250,375,500,625,750), expand=c(0,0)) +
theme_bw()
will give this result:
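The warning mentioned above arises because the data contain zeros and log10(0) is -Inf, whereas log1p(x) = log(1 + x) maps zero to zero. A small sketch of the difference, using the same example data (`n_zero` is a name introduced here for illustration):

```r
x <- c(0, 1, 9, 99)

log10(x)  # first element is -Inf, so zeros are dropped on a log10 axis
log1p(x)  # log(1 + x): zero maps to 0 and stays on the plot

# Count how many values in the example data are zero and would be dropped
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
n_zero <- sum(data$dist == 0)
n_zero
```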
I wonder whether scaling the y-axis instead of the x-axis would work. It will produce a few warnings wherever the counts are 0, but it may serve your purpose.
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram() + scale_y_log10()
Also you may want to display frequencies as data labels, since people might ignore the y-scale and it takes some time to realize that y scale is logarithmic.
ggplot(data, aes(x=dist)) + geom_histogram(fill = 'skyblue', color = 'grey30') + scale_y_log10() +
stat_bin(geom="text", size=3.5, aes(label=..count.., y=0.8*(..count..)))
A solution could be to convert your data to a factor:
library(ggplot2)
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
ggplot(data, aes(x=factor(dist))) +
geom_bar() + # counts per factor level; same result as geom_histogram(stat = "count")
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Resulting in:
I had the same issue and, inspired by @Jaap's answer, I fiddled with the histogram binwidth using the x-axis in log scale.
If you use binwidth = 0.201, the bars will sit side by side as expected. However, this means you can only have up to five bars between two x coordinates.
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) +
geom_histogram(binwidth = 0.201, color = 'red') +
scale_x_log10()
Result:

Resources