setting breaks in ggplot2 histogram - r

I need to make several histograms regarding the same vector of values and a density estimation. So the next plot is good.
values = rnorm(100)
plot = ggplot(data.frame(val=values), aes(x=val)) + geom_histogram(aes(y = ..density..)) + geom_density()
However, I need to print several plots (not one plot with different panels) with different break points, say:
breaks = list(c(-1,0,1),c(-2,-1.5,0,1.5,2),c(-0.5,0,0.5))
How can I redefine the breaks for the variable plot?

Using your own code, you can do that with:
ggplot(data.frame(val=values), aes(x=val)) +
geom_histogram(aes(y = ..density..)) +
geom_density() +
scale_y_continuous(breaks=c(-2,-1.5,0,1.5,2))

Related

Add significance lines outside/between facets

I wanted to add significant stars over 3 facets to compare them.
I google online but it is so complicated to add things outside plot. There is a ggsignif package but it does nothing to facets (https://github.com/const-ae/ggsignif/issues/22). It seems possible using gridExtra but I cannot make it.
The stars can be draw easily in a single plot, not facets. But I have to use facets to have separate rugs on the left. If you know how to have separate rugs inside a single plot, it should also solve the problem.
Here is the code and plot I want to add things on:
library(ggplot2)
ToothGrowth$dose = factor(ToothGrowth$dose)
ggplot(ToothGrowth, aes(x='', y=len, color=dose)) +
geom_boxplot() +
geom_rug(sides="l") +
facet_grid(. ~ dose)
What I want is:
Sorry for the drawing. The line width should be the same. The final result should be really similar to this but for facets:
This is a workaround - plot two plots (one for significance annotation, another for boxplots).
library(ggplot2)
library(ggsignif)
ToothGrowth$dose <- factor(ToothGrowth$dose)
Plot significance annotation. Don't use boxplot here and set tips to 0 (using only one comparison here as others return error from statistical test, but I'm assuming that this is only an example dataset).
p1 <- ggplot(ToothGrowth, aes(as.factor(dose), len)) +
geom_signif(comparisons = list(c("1", "2")), tip_length = 0.005) +
coord_cartesian(ylim = c(35, 35.5)) +
theme_void()
Plot boxplots with different x axis (need this to specify comparisons groups in ggsignif)
p2 <- ggplot(ToothGrowth, aes(factor(dose), len)) +
geom_boxplot() +
geom_rug(sides = "l") +
facet_grid(. ~ dose, scales = "free_x") +
labs(x = NULL) +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank())
Draw plots together geom_signif on-top of geom_boxplot with facet_wrap
egg::ggarrange(p1, p2, heights = c(2, 10))

how to make the value on Y axis start from zero in R, ggplot2

currently, I'm using ggplot2 to make density plot.
ggplot(data=resultFile,aes(x=V19, colour=V1) ) +
geom_line(stat="density") +
xlab("score") +
ylab("density") +
ggtitle(paste(data_name,protocol,level,sep=" ")) +
theme(legend.title=element_blank(), legend.position=c(0.92,0.9)) +
scale_color_manual(values=c("blue","red"),
labels=c("A", "B"))
using this code, I can get the plot below.
However, I can get different plot if I used plot(density()...) function in R.
Y value starts from 0.
How can I make the ggplot's plot as like plot(density()...) in R?
ggplot(data=resultFile,aes(x=V19, colour=V1) ) +
ylim(0,range) #you can use this .
geom_line(stat="density") +
xlab("score") +
ylab("density") +
ggtitle(paste(data_name,protocol,level,sep=" ")) +
theme(legend.title=element_blank(), legend.position=c(0.92,0.9)) +
scale_color_manual(values=c("blue","red"),
labels=c("A", "B"))
ggplot obviously cut off the x-axis at the min and max of the empirical distribution. You can extend the x-axis by adding xlim to the plot but please make sure that the plot does not exceed the theoretical limit of the distribution (in the example below, the theoretical limit is [0, 1], so there is not much reason to show outside the range).
set.seed(1)
temp <- data.frame(x =runif(100)^3)
library(ggplot2)
ggplot(temp, aes(x = x)) + geom_line(stat = "density" + xlim(-.2, 1.2)
plot(density(temp$x))

R - ggplot2 change x-axis values to non-log values

I am plotting some payment distribution information and I aggregated the data after scaling it to log-normal (base-e). The histograms turn out great but I want to modify the x-axis to display the non-log equivalents.
My current axis displays [0:2.5:10] values
Alternatively, I would like to see values for exp(2.5), exp(5), etc.
Any suggestions on how to accomplish this? Anything I can add to my plotting statement to scale the x-axis values? Maybe there's a better approach - thoughts?
Current code:
ggplot(plotData, aes_string(pay, fill = pt)) + geom_histogram(bins = 50) + facet_wrap(~M_P)
Answered...Final plot:
Not sure if this is exactly what you are after but you can change the text of the x axis labels to whatever you want using scale_x_continuous.
Here's without:
ggplot(data = cars) + geom_histogram(aes(x = speed), binwidth = 1)
Here's with:
ggplot(data = cars) + geom_histogram(aes(x = speed), binwidth = 1) +
scale_x_continuous(breaks=c(5,10,15,20,25), labels=c(exp(5), exp(10), exp(15), exp(20), exp(25)))

geom_historgram() versus hist() : controlling ranges in geom

Consider the following two plots
library(ggplot2)
set.seed(666)
bigx <- data.frame(x=sample(1:12,50,replace=TRUE))
ggplot(bigx, aes(x=x)) +
geom_histogram(fill = "red", colour =
"black",stat="bin",binwidth=2) +
ylab("Frequency") +
xlab("things") +
ylim(c(0,30))
hist(bigx$x)
Why do I get the overhang above 12 on ggplot? When i play with right = TRUE this just shifts the overhang to below zero. I want the simple and simply bounded result from hist() but using ggplot2.
How can I do this?
If your goal is to reproduce the output of hist(...) using ggplot, this will work:
ggplot(bigx, aes(x=x)) +
geom_histogram(fill = "red", colour = "black",stat="bin",
binwidth=2, right=TRUE) +
scale_x_continuous(limits=c(0,12),breaks=seq(0,12,2))
Or, more generally, this:
brks <- hist(bigx$x, plot=F)$breaks
ggplot(bigx, aes(x=x)) +
geom_histogram(fill = "red", colour = "black",stat="bin",
breaks=brks, right=TRUE) +
scale_x_continuous(limits=range(brks),breaks=brks)
Evidently, the ggplot default for histograms is to use right-closed intervals, whereas the default for hist(...) is left closed intervals. Also, ggplot uses a different algorithm for calculating the x-axis breaks and limits.

Splitting distribution visualisations on the y-axis in ggplot2 in r

The most commonly cited example of how to visualize a logistic fit using ggplot2 seems to be something very much like this:
data("kyphosis", package="rpart")
ggplot(data=kyphosis, aes(x=Age, y = as.numeric(Kyphosis) - 1)) +
geom_point() +
stat_smooth(method="glm", family="binomial")
This visualisation works great if you don't have too much overlapping data, and the first suggestion for crowded data seems to be to use injected jitter in the x and y coordinates of the points then adjust the alpha value of the points. When you get to the point where individual points aren't useful but distributions of points are, is it possible to use geom_density(), geom_histogram(), or something else to visualise the data but continue to split the categorical variable along the y-axis as it is done with geom_point()?
From what I have found, geom_density() and geom_histogram() can easily be split/grouped by the categorical variable and both levels can easily be reversed using scale_y_reverse() but I can't figure out if it is even possible to move only one of the categorical variable distributions to the top of the plot. Any help/suggestions would be appreciated.
The annotate() function in ggplot allows you to add geoms to a plot with properties that "are not mapped from the variables of a data frame, but are instead in as vectors," meaning that you can add layers that are unrelated to your data frame. In this case your two density curves are related to the data frame (since the variables are in it), but because you're trying to position them differently, using annotate() is useful.
Here's one way to go about it:
data("kyphosis", package="rpart")
model.only <- ggplot(data=kyphosis, aes(x=Age, y = as.numeric(Kyphosis) - 1)) +
stat_smooth(method="glm", family="binomial")
absents <- subset(kyphosis, Kyphosis=="absent")
presents <- subset(kyphosis, Kyphosis=="present")
dens.absents <- density(absents$Age)
dens.presents <- density(presents$Age)
scaling.factor <- 10 # Make the density plots taller
model.only + annotate("line", x=dens.absents$x, y=dens.absents$y*scaling.factor) +
annotate("line", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1)
This adds two annotated layers with scaled density plots for each of the kyphosis groups. For the presents variable, y is scaled and increased by 1 to shift it up.
You can also fill the density plots instead of just using a line. Instead of annotate("line"...) you need to use annotate("polygon"...), like so:
model.only + annotate("polygon", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red", colour="black", alpha=0.4) +
annotate("polygon", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1, fill="green", colour="black", alpha=0.4)
Technically you could use annotate("density"...), but that won't work when you shift the present plot up by one. Instead of shifting, it fills the whole plot:
model.only + annotate("density", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red") +
annotate("density", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1, fill="green")
The only way around that problem is to use a polygon instead of a density geom.
One final variant: flipping the top density plot along y-axis = 1:
model.only + annotate("polygon", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red", colour="black", alpha=0.4) +
annotate("polygon", x=dens.presents$x, y=(1 - dens.presents$y*scaling.factor), fill="green", colour="black", alpha=0.4)
I am not sure I get your point, but here an attempt:
dat <- rbind(kyphosis,kyphosis)
dat$grp <- factor(rep(c('smooth','dens'),each = nrow(kyphosis)),
levels = c('smooth','dens'))
ggplot(dat,aes(x=Age)) +
facet_grid(grp~.,scales = "free_y") +
#geom_point(data=subset(dat,grp=='smooth'),aes(y = as.numeric(Kyphosis) - 1)) +
stat_smooth(data=subset(dat,grp=='smooth'),aes(y = as.numeric(Kyphosis) - 1),
method="glm", family="binomial") +
geom_density(data=subset(dat,grp=='dens'))

Resources