How to prevent a line from extending across the whole graph - r

Currently, the code below (part of a larger script) generates a line that spans from the far left to the far right of the graph.
geom_abline(intercept=-8.3, slope=1/1.415, col = "black", size = 1,
lty="longdash", lwd=1) +
However, I would like the line to only range from x=1 to x=9; the limits of the x-axis are 1-9.
In ggplot2, is there a command to reduce a line that is derived from a manually defined intercept and slope to only cover the range of the x-axis value limits?

You could use geom_segment instead of geom_abline if you want to manually define the line. If your slope is derived from the dataset you are plotting from, the easiest thing to do is use stat_smooth with method = "lm".
Here is an example with some toy data:
library(ggplot2)
set.seed(16)
x = runif(100, 1, 9)
y = -8.3 + (1/1.415)*x + rnorm(100)
dat = data.frame(x, y)
Estimate intercept and slope:
coef(lm(y~x))
(Intercept)           x
 -8.3218990   0.7036189
First make the plot with geom_abline for comparison:
ggplot(dat, aes(x, y)) +
geom_point() +
geom_abline(intercept = -8.32, slope = 0.704) +
xlim(1, 9)
With geom_segment you have to define the start and end of the line for both x and y yourself. Here the line is truncated to run from 1 to 9 on the x axis.
ggplot(dat, aes(x, y)) +
geom_point() +
geom_segment(aes(x = 1, xend = 9, y = -8.32 + .704, yend = -8.32 + .704*9)) +
xlim(1, 9)
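Because the segment above maps constants inside aes(), it is drawn once per row of dat (100 overlapping copies). A minimal sketch of one way around that, assuming you are happy to compute the endpoints in a one-row data frame first; line_endpoints is a hypothetical helper, not a ggplot2 function:

```r
library(ggplot2)

# Toy data as in the example above
set.seed(16)
x <- runif(100, 1, 9)
y <- -8.3 + (1/1.415) * x + rnorm(100)
dat <- data.frame(x, y)

# Hypothetical helper: endpoints of y = intercept + slope * x on [x_min, x_max]
line_endpoints <- function(intercept, slope, x_min, x_max) {
  data.frame(
    x    = x_min,
    xend = x_max,
    y    = intercept + slope * x_min,
    yend = intercept + slope * x_max
  )
}

seg <- line_endpoints(intercept = -8.32, slope = 0.704, x_min = 1, x_max = 9)

# The segment layer gets its own one-row data, so it is drawn exactly once
p <- ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_segment(data = seg,
               aes(x = x, xend = xend, y = y, yend = yend),
               linetype = "longdash") +
  xlim(1, 9)
```

Printing p should show the same line as before, and changing the intercept, slope or x range only requires a new call to the helper.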
Using stat_smooth, which by default draws the line only within the range of the explanatory variable:
ggplot(dat, aes(x, y)) +
geom_point() +
stat_smooth(method = "lm", se = FALSE, color = "black") +
xlim(1, 9)

Related

Force geom_smooth() to plot regression line from origin to one set of points (R, ggplot, geom_smooth)

I have a very specific situation where I am trying to get a regression line to start at the origin and fit to one set of points on the x axis. Ideally I wouldn't have to do this but the line from the origin to these points is actually meaningful in my case.
I have come up with a simple example
library(dplyr)
library(ggplot2)
y<-c(1,2,3,4,5,6,7,8)
x<-c(3,3,3,3,3,3,3,3)
z<-as.data.frame(cbind(x,y))
z %>% ggplot(aes(x,y)) + geom_point() +
geom_smooth(formula = y ~ x + 0, method = "lm") +
theme_bw() + expand_limits(x = 0, y = 0) +
theme(aspect.ratio = 1)
Here, geom_smooth will not fit a line from the origin to the points at x = 3. I'm assuming that something internal tells geom_smooth not to plot a line where there is no variation in x. I've tested this somewhat, and by changing one of the x values to 0 I can indeed get a line from the origin (though the y value I choose influences the confidence interval, which is not ideal).
y<-c(1,2,3,4,5,6,7,8)
x<-c(3,3,3,3,3,3,3,0)
z<-as.data.frame(cbind(x,y))
z %>% ggplot(aes(x,y)) + geom_point() +
geom_smooth(formula = y ~ x + 0, method = "lm") +
theme_bw() + expand_limits(x = 0, y = 0) +
theme(aspect.ratio = 1)
I don't want to fiddle with the dataset and add a point at x = 0, y = 0, as I'm worried about that influencing some error estimate (however small). I'm assuming that there is some option I can set within geom_smooth, or some other command, to force the line to fit. Any help is appreciated, thanks.
Remember that linear regression just tells you the conditional mean of y for a given x. The "regression" at x = 3 is simply the best estimate of the mean of y at x = 3. Since all of your points are at x = 3, the conditional mean of y when x = 3 is just mean(y).
So all you need is a line going from (0, 0) to (3, mean(y)). It really doesn't make much sense to have a standard error around this line, though it might be justified depending on the context.
library(dplyr)
library(ggplot2)
y<-c(1,2,3,4,5,6,7,8)
x<-c(3,3,3,3,3,3,3,3)
z<-as.data.frame(cbind(x,y))
z %>% ggplot(aes(x,y)) + geom_point() +
geom_smooth(formula = y ~ x + 0, method = "lm") +
theme_bw() + expand_limits(x = 0, y = 0) +
theme(aspect.ratio = 1) +
geom_line(data = data.frame(x = c(0, 3), y = c(0, mean(y))))
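As a rough sketch of the same idea without hard-coding mean(y): fit the no-intercept model directly and predict at x = 0 and x = 3. For these data the slope is sum(x*y)/sum(x^2) = 1.5, so the prediction at x = 3 is 4.5, which is mean(y). The names fit and pred are only illustrative.

```r
library(ggplot2)

z <- data.frame(x = rep(3, 8), y = 1:8)

# Regression through the origin; this works even though x has no variation
fit <- lm(y ~ x + 0, data = z)
slope <- unname(coef(fit))

# Predict at the two ends of the line we want to draw
pred <- data.frame(x = c(0, 3))
pred$y <- unname(predict(fit, newdata = pred))

p <- ggplot(z, aes(x, y)) +
  geom_point() +
  geom_line(data = pred) +          # line from (0, 0) to (3, mean(y))
  theme_bw() +
  expand_limits(x = 0, y = 0) +
  theme(aspect.ratio = 1)
```

If an interval is really wanted, predict(fit, newdata = pred, interval = "confidence") returns lower and upper bounds that could be drawn with geom_ribbon, with the caveat above about whether such an interval is meaningful here.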
Kind of silly but workable solution that I have figured out.
If I add an incredibly small amount of random variation to values in the x axis and specify fullrange = TRUE within geom_smooth then I can get the line to fit with an error estimate.
y<-c(1,2,3,4,5,6,7,8)
x<-c(3,3,3,3,3,3,3,3)
z<-as.data.frame(cbind(x,y))
z %>% mutate(rand = rnorm(8, mean=0.0000000001, sd=0.000000000001), x = x + rand) %>%
ggplot(aes(x,y)) + geom_point() +
geom_smooth(formula = y ~ x + 0, fullrange = TRUE ,method = "lm") +
theme_bw() + expand_limits(x = 0, y = 0) +
theme(aspect.ratio = 1)

facet_zoom can't change breaks of zoomed plot

I currently have a plot and have used facet_zoom to focus on records between 0 and 10 in the x axis. The following code reproduces an example:
require(ggplot2)
require(ggforce)
require(dplyr)
x <- rnorm(10000, 50, 25)
y <- rexp(10000)
data <- data.frame(x, y)
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = dplyr::between(x, 0, 10))
I want to change the breaks on the zoomed portion of the graph to be the equivalent of:
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = dplyr::between(x, 0, 10)) +
scale_x_continuous(breaks = seq(0,10,2))
But this changes the breaks of the original plot as well. Is it possible to just change the breaks of the zoomed portion whilst leaving the original plot as default?
This works for your use case:
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = between(x, 0, 10)) +
scale_x_continuous(breaks = pretty)
From ?scale_x_continuous, breaks would accept the following (emphasis added):
One of:
NULL for no breaks
waiver() for the default breaks computed by the transformation object
A numeric vector of positions
A function that takes the limits as input and returns breaks as output
pretty() is one such function. It doesn't offer very fine control, but does allow you to have some leeway to specify breaks across different facets with very different scales.
For illustration, here are two examples with different desired number of breaks. See ?pretty for more details on the other arguments this function accepts.
p <- ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = between(x, 0, 10))
cowplot::plot_grid(
p + scale_x_continuous(breaks = function(x) pretty(x, n = 3)),
p + scale_x_continuous(breaks = function(x) pretty(x, n = 10)),
labels = c("n = 3", "n = 10"),
nrow = 1
)
Of course, you can also define your own function to convert plot limits into desired breaks, (e.g. something like p + scale_x_continuous(breaks = function(x) seq(min(x), max(x), length.out = 5))), but I generally find these functions require more tweaking to get right, & pretty() is often good enough.
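A minimal sketch of such a custom function, assuming the breaks function is called once per panel with that panel's limits (which is what the pretty() behaviour above suggests). zoom_breaks is a hypothetical name, and the threshold of 20 is a guess at what separates the zoomed range from the full one in this example:

```r
# Hypothetical breaks function: fixed breaks for a narrow (zoomed) range,
# default-style breaks otherwise
zoom_breaks <- function(limits) {
  if (diff(range(limits)) <= 20) {
    seq(0, 10, 2)      # breaks wanted on the zoomed panel
  } else {
    pretty(limits)     # fall back to pretty() on the full panel
  }
}

# Quick check of the function on its own, outside any plot
narrow <- zoom_breaks(c(0, 10))
wide   <- zoom_breaks(c(-50, 150))
```

It would then be used as p + scale_x_continuous(breaks = zoom_breaks), with p defined as above. The threshold may need adjusting for other data, since the axis expansion changes the limits the function receives.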

Cannot alter geom_errorbar width inside multiplot

I intend to put four graphs in a single page. Each plot shows the point estimate of a single statistic and its confidence interval. I am struggling with altering the width of geom_errorbar whisker in each plot. It does not seem to change, even though I alter the width argument in geom_errorbar().
It is important for me to graph those four statistics separately because both point estimates and confidence intervals are defined in different ranges for each statistic, as you can notice on the graph below. The multiplot function I use to plot multiple graphs is defined in http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/.
#creates data.frame where point estimates and confidence intervals will be
#stored
#the numbers entered in df are similar to the ones I get from previously
#performed regressions
w<-c(1:4)
x<-c(0.68,0.87,2.93,4.66)
y<-c(0.47,0.57,0.97,3.38)
z<-c(0.83,1.34,4.17,7.46)
df<-data.frame(w,x,y,z)
#plot each statistic
#(each row from df is a statistic: w for index, x for point estimate,
#y for ci lower bound and z for ci upper bound)
p1 <- ggplot(df[1,], aes(x = w, y = x)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = y, ymax = z),width=.1) +
labs(x="",y="")
p2 <- ggplot(df[2,], aes(x = w, y = x)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = y, ymax = z),width=.1) +
labs(x="",y="")
p3 <- ggplot(df[3,], aes(x = w, y = x)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = y, ymax = z),width=.1) +
labs(x="",y="")
p4 <- ggplot(df[4,], aes(x = w, y = x)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = y, ymax = z),width=.1) +
labs(x="",y="")
multiplot(p1, p2, p3, p4, cols=2)
I greatly appreciate any help and advice.
Thanks,
Gabriel
[Example plot omitted.] How can I change the errorbar whisker width for each graph separately?
The width is changing, but the x-axis is scaling to match the width of the error bar. You need to set the x axis manually using, for example, xlim.
For p1, you could try + xlim(0.8, 1.2)
Alternatively you could use the expand argument to scale_x_continuous, e.g. scale_x_continuous(expand = c(0, 0.1)).
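A minimal sketch of the xlim approach applied to all four plots, assuming a fixed window of 0.2 on either side of each index is acceptable. plot_stat and its arguments are hypothetical names introduced here for illustration:

```r
library(ggplot2)

df <- data.frame(
  w = 1:4,
  x = c(0.68, 0.87, 2.93, 4.66),  # point estimates
  y = c(0.47, 0.57, 0.97, 3.38),  # CI lower bounds
  z = c(0.83, 1.34, 4.17, 7.46)   # CI upper bounds
)

# Hypothetical helper: one plot per row, with the x axis fixed around the
# index so that the whisker width is visible relative to the panel width
plot_stat <- function(row, bar_width = 0.1, half_window = 0.2) {
  ggplot(row, aes(x = w, y = x)) +
    geom_point(size = 4) +
    geom_errorbar(aes(ymin = y, ymax = z), width = bar_width) +
    xlim(row$w - half_window, row$w + half_window) +
    labs(x = "", y = "")
}

plots <- lapply(seq_len(nrow(df)), function(i) plot_stat(df[i, ]))
```

With the x range pinned, a smaller bar_width gives visibly narrower whiskers. The list can be passed to the multiplot function from the question via its plotlist argument, e.g. multiplot(plotlist = plots, cols = 2).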

Shade density plot to the left of vline?

Is it possible to shade a density plot using a vline as cutoff? For example:
df.plot <- data.frame(density=rnorm(100))
library(ggplot2)
ggplot(df.plot, aes(density)) + geom_density() +
geom_vline(xintercept = -0.25)
I tried creating a new variable, but it does not work as I expected
df.plot <- df.plot %>% mutate(color=ifelse(density<(-0.25),"red","NULL"))
ggplot(df.plot, aes(density, fill = color, colour = color)) + geom_density() +
geom_vline(xintercept = -0.25)
I don't know of a way to do that directly with ggplot. But you could calculate the density outside of ggplot over the desired range:
set.seed(4132)
df.plot <- data.frame(density=rnorm(100))
ds <- density(df.plot$density, from = min(df.plot$density), to = -0.25)
ds_data <- data.frame(x = ds$x, y = ds$y)
density() estimates the density for the points given in its first argument. The result will contain x and y values. You can specify the x-range you are interested in with from and to. In order for the density to agree with the one plotted by ggplot(), set the ranges to the minimal and maximal value in df.plot$density. Here, to is set to -0.25, because you only want the part of the density curve to the left of your vline. You can then extract the x and y values with ds$x and ds$y.
The plot is then created by using the same code as you did, but adding an additional geom_area() with the density data that was calculated above:
library(ggplot2)
ggplot(df.plot, aes(density)) + geom_density() +
geom_vline(xintercept = -0.25) +
geom_area(data = ds_data, aes(x = x, y = y))
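A variation on the same idea, sketched under the assumption that you are willing to draw the curve yourself instead of using geom_density(): compute the density once over the full range, then take the shaded part as a subset, so the curve and the shaded area come from exactly the same estimate. The cutoff variable and the fill colour are illustrative choices.

```r
library(ggplot2)

set.seed(4132)
df.plot <- data.frame(density = rnorm(100))
cutoff <- -0.25

# One density estimate over the full data range
ds <- density(df.plot$density,
              from = min(df.plot$density),
              to = max(df.plot$density))
ds_data <- data.frame(x = ds$x, y = ds$y)

# Part of the curve to the left of the cutoff
ds_left <- subset(ds_data, x <= cutoff)

p <- ggplot(ds_data, aes(x, y)) +
  geom_line() +
  geom_area(data = ds_left, fill = "red", alpha = 0.4) +
  geom_vline(xintercept = cutoff)
```

Since the grid of x values will generally not contain the cutoff exactly, the shaded area stops at the last grid point before it; with the default of 512 points the gap should be barely visible.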

Adding a general abline in log-log ggplot2

I am trying to add a line to separate part of data in ggplot2. Following this thread:
Adding linear model abline to log-log plot in ggplot
I tried
d = data.frame(x = 100*rlnorm(100), y = 100*rlnorm(100))
ggplot(d, aes(x, y)) + geom_point() +
geom_abline(intercept = 100, slope = -1, col='red') +
scale_y_log10() + scale_x_log10()
but it did not plot the line. Note that the old plot approach got the line alright:
plot(d$x, d$y, log='xy')
abline(a = 100, b=-1, col='red', untf=TRUE)
This may not be the most elegant solution, but I usually define a separate data frame for predictions when I'm adding them to plots. I know that it's quicker in a lot of ways to add the model specification as part of the plot, but I really like the flexibility of having this as a separate object. Here's what I've got in mind in this case:
d = data.frame(x = 100*rlnorm(100), y = 100*rlnorm(100))
p = ggplot(d, aes(x,y)) + geom_point() + scale_x_log10() + scale_y_log10()
pred.func = function(x){
100 - x
}
new.dat = data.frame(x = seq(from = 5, to = 90))
new.dat$pred = pred.func(new.dat$x)
p + geom_line(aes(x = x, y = pred), data = new.dat, col = "red")
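As far as I understand, geom_abline applies its intercept and slope in the transformed (log10) coordinates, which is why a line specified on the original scale ends up far outside the plot. A rough generalisation of the approach above, assuming you want the line y = intercept + slope * x on the original scale: evaluate it on a log-spaced grid and drop non-positive values, which cannot be shown on a log axis. line_on_log_axes is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical helper: points on y = intercept + slope * x, evenly spaced
# on the log10 x scale, keeping only values a log axis can display
line_on_log_axes <- function(intercept, slope, x_from, x_to, n = 200) {
  x <- 10^seq(log10(x_from), log10(x_to), length.out = n)
  out <- data.frame(x = x, pred = intercept + slope * x)
  out[out$pred > 0, ]
}

set.seed(1)
d <- data.frame(x = 100 * rlnorm(100), y = 100 * rlnorm(100))
new.dat <- line_on_log_axes(intercept = 100, slope = -1, x_from = 1, x_to = 99)

p <- ggplot(d, aes(x, y)) +
  geom_point() +
  geom_line(aes(x = x, y = pred), data = new.dat, col = "red") +
  scale_x_log10() +
  scale_y_log10()
```

The log-spaced grid keeps the curve smooth at the left end of the axis, where an evenly spaced grid on the original scale would place very few points.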
