why geom_smooth not showing the smooth line? - r

I am having a problem where geom_smooth() is not working on my ggplot2.
But instead of a smooth curve, there is a fold.
My X-axis variable is the factor variable(I've tried to convert it to a numerical variable, but it didn't work), and Y-axis is numeric variable.
My data.frame is that
ggplot(tmp, aes(x = x, y = y))+
geom_point()+
geom_smooth(formula = y ~ x, method = "loess", stat = "identity", se = T, group = "")
I hope to get a pic like this.

A quick fix will be to wrap the group inside aes. Generated a data similar to the structure you have (a factor x variable and a numeric y var).
set.seed(777)
x <- rep(c(LETTERS[1:7]), 3)
y <- rnorm(21, mean = 0, sd = 1)
tmp <- data.frame(x,y)
# -------------------------------------------------------------------------
base <- ggplot(tmp, aes(x = x, y = y))+geom_point()
base + geom_smooth(formula = y ~ x, method = "loess",se = TRUE, aes(group = "" ), level = 0.95) + theme_bw()
If you want to use a different level of confidence interval, you can change the value of level (which is a 95% by default).
Output

Related

ggplot2: how to add sample numbers to density plot?

I am trying to generate a (grouped) density plot labelled with sample sizes.
Sample data:
set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))
The unlabelled density plot is generated and looks as follows:
ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4)
What I want to do is add text labels somewhere near the peak of each density, showing the number of samples in each group. However, I cannot find the right combination of options to summarise the data in this way.
I tried to adapt the code suggested in this answer to a similar question on boxplots: https://stackoverflow.com/a/15720769/1836013
n_fun <- function(x){
return(data.frame(y = max(x), label = paste0("n = ",length(x))))
}
ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4) +
stat_summary(geom = "text", fun.data = n_fun)
However, this fails with Error: stat_summary requires the following missing aesthetics: y.
I also tried adding y = ..density.. within aes() for each of the geom_density() and stat_summary() layers, and in the ggplot() object itself... none of which solved the problem.
I know this could be achieved by manually adding labels for each group, but I was hoping for a solution that generalises, and e.g. allows the label colour to be set via aes() to match the densities.
Where am I going wrong?
The y in the return of fun.data is not the aes. stat_summary complains that he cannot find y, which should be specificed in global settings at ggplot(df, aes(x = val, group = ab.class, y = or stat_summary(aes(y = if global setting of y is not available. The fun.data compute where to display point/text/... at each x based on y given in the data through aes. (I am not sure whether I have made this clear. Not a native English speaker).
Even if you have specified y through aes, you won't get desired results because stat_summary compute a y at each x.
However, you can add text to desired positions by geom_text or annotate:
# save the plot as p
p <- ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4)
# build the data displayed on the plot.
p.data <- ggplot_build(p)$data[[1]]
# Note that column 'scaled' is used for plotting
# so we extract the max density row for each group
p.text <- lapply(split(p.data, f = p.data$group), function(df){
df[which.max(df$scaled), ]
})
p.text <- do.call(rbind, p.text) # we can also get p.text with dplyr.
# now add the text layer to the plot
p + annotate('text', x = p.text$x, y = p.text$y,
label = sprintf('n = %d', p.text$n), vjust = 0)

predict x values from simple fitting and annoting it in the plot

I have a very simple question but so far couldn't find easy solution for that. Let's say I have a some data that I want to fit and show its x axis value where y is in particular value. In this case let's say when y=0 what is the x value. Model is very simple y~x for fitting but I don't know how to estimate x value from there. Anyway,
sample data
library(ggplot2)
library(scales)
df = data.frame(x= sort(10^runif(8,-6,1),decreasing=TRUE), y = seq(-4,4,length.out = 8))
ggplot(df, aes(x = x, y = y)) +
geom_point() +
#geom_smooth(method = "lm", formula = y ~ x, size = 1,linetype="dashed", col="black",se=FALSE, fullrange = TRUE)+
geom_smooth(se=FALSE)+
labs(title = "Made-up data") +
scale_x_log10(breaks = c(1e-6,1e-4,1e-2,1),
labels = trans_format("log10", math_format(10^.x)),limits = c(1e-6,1))+
geom_hline(yintercept=0,linetype="dashed",colour="red",size=0.6)
I would like to convert 1e-10 input to 10^-10 format and annotate it on the plot. As I indicated in the plot.
thanks in advance!
Because geom_smooth() uses R functions to calculate the smooth line, you can attain the predicted values outside the ggplot() environment. One option is then to use approx() to get a linear approximations of the x-value, given the predicted y-value 0.
# Define formula
formula <- loess(y~x, df)
# Approximate when y would be 0
xval <- approx(x = formula$fitted, y = formula$x, xout = 0)$y
# Add to plot
ggplot(...) + annotate("text", x = xval, y = 0 , label = yval)

ggplot: Plotting the bins on x-axis and the average on y-axis

Suppose that I have a dataframe that looks like this:
data <- data.frame(y = rnorm(10,0,1), x = runif(10,0,1))
What I would like to do is to cut the x values into bins, such as:
data$bins <- cut(data$x,breaks = 4)
Then, I would like to plot (using ggplot) the result in a way that the x-axis is the bins, and the y axis is the mean of data$y data points that fall into the corresponding bin.
Thank you in advance
You can use the stat_summary() function.
library(ggplot2)
data <- data.frame(y = rnorm(10,0,1), x = runif(10,0,1))
data$bins <- cut(data$x,breaks = 4)
# Points:
ggplot(data, aes(x = bins, y = y)) +
stat_summary(fun.y = "mean", geom = "point")
# Histogram bars:
ggplot(data, aes(x = bins, y = y)) +
stat_summary(fun.y = "mean", geom = "histogram")
Here is the picture of the points:
This thread is a bit old but here you go, use stat_summary_bin (it might be in the newer versions).
ggplot(data, mapping=aes(x, y)) +
stat_summary_bin(fun.y = "mean", geom="bar", bins=4 - 1) +
ylab("mean")
Since the mean of your y values can be smaller than 0, I recommend a dot plot instead of a bar chart. The dots represent the means. You can use either qplot or the regular ggplot function. The latter is more customizable. In this example, both produce the same output.
library(ggplot2)
set.seed(7)
data <- data.frame(y = rnorm(10,0,1), x = runif(10,0,1))
data$bins <- cut(data$x,breaks = 4, dig.lab = 2)
qplot(bins, y, data = data, stat="summary", fun.y = "mean")
ggplot(data, aes(x = factor(bins), y = y)) +
stat_summary(fun.y = mean, geom = "point")
You can also add error bars. In this case, they show the mean +/- 1.96 times the group standard deviation. The group mean and SD can be obtained using tapply.
m <- tapply(data$y, data$bins, mean)
sd <- tapply(data$y, data$bins, sd)
df <- data.frame(mean.y = m, sd = sd, bin = names(m))
ggplot(df, aes(x = bin, y = mean.y,
ymin = mean.y - 1.96*sd,
ymax = mean.y + 1.96*sd)) +
geom_errorbar() + geom_point(size = 3)

Conditional colouring of a geom_smooth

I'm analyzing a series that varies around zero. And to see where there are parts of the series with a tendency to be mostly positive or mostly negative I'm plotting a geom_smooth. I was wondering if it is possible to have the color of the smooth line be dependent on whether or not it is above or below 0. Below is some code that produces a graph much like what I am trying to create.
set.seed(5)
r <- runif(22, max = 5, min = -5)
t <- rep(-5:5,2)
df <- data.frame(r+t,1:22)
colnames(df) <- c("x1","x2")
ggplot(df, aes(x = x2, y = x1)) + geom_hline() + geom_line() + geom_smooth()
I considered calculating the smoothed values, adding them to the df and then using a scale_color_gradient, but I was wondering if there is a way to achieve this in ggplot directly.
You may use the n argument in geom_smooth to increase "number of points to evaluate smoother at" in order to create some more y values close to zero. Then use ggplot_build to grab the smoothed values from the ggplot object. These values are used in a geom_line, which is added on top of the original plot. Last we overplot the y = 0 values with the geom_hline.
# basic plot with a larger number of smoothed values
p <- ggplot(df, aes(x = x2, y = x1)) +
geom_line() +
geom_smooth(linetype = "blank", n = 10000)
# grab smoothed values
df2 <- ggplot_build(p)[[1]][[2]][ , c("x", "y")]
# add smoothed values with conditional color
p +
geom_line(data = df2, aes(x = x, y = y, color = y > 0)) +
geom_hline(yintercept = 0)
Something like this:
# loess data
res <- loess.smooth(df$x2, df$x1)
res <- data.frame(do.call(cbind, res))
res$posY <- ifelse(res$y >= 0, res$y, NA)
res$negY <- ifelse(res$y < 0, res$y, NA)
# plot
ggplot(df, aes(x = x2, y = x1)) +
geom_hline() +
geom_line() +
geom_line(data=res, aes(x = x, y = posY, col = "green")) +
geom_line(data=res, aes(x = x, y = negY, col = "red")) +
scale_color_identity()

Most succinct way to label/annotate extreme values with ggplot?

I'd like to annotate all y-values greater than a y-threshold using ggplot2.
When you plot(lm(y~x)), using the base package, the second graph that pops up automatically is Residuals vs Fitted, the third is qqplot, and the fourth is Scale-location. Each of these automatically label your extreme Y values by listing their corresponding X value as an adjacent annotation. I'm looking for something like this.
What's the best way to achieve this base-default behavior using ggplot2?
Updated scale_size_area() in place of scale_area()
You might be able to take something from this to suit your needs.
library(ggplot2)
#Some data
df <- data.frame(x = round(runif(100), 2), y = round(runif(100), 2))
m1 <- lm(y ~ x, data = df)
df.fortified = fortify(m1)
names(df.fortified) # Names for the variables containing residuals and derived qquantities
# Select extreme values
df.fortified$extreme = ifelse(abs(df.fortified$`.stdresid`) > 1.5, 1, 0)
# Based on examples on page 173 in Wickham's ggplot2 book
plot = ggplot(data = df.fortified, aes(x = x, y = .stdresid)) +
geom_point() +
geom_text(data = df.fortified[df.fortified$extreme == 1, ],
aes(label = x, x = x, y = .stdresid), size = 3, hjust = -.3)
plot
plot1 = ggplot(data = df.fortified, aes(x = .fitted, y = .resid)) +
geom_point() + geom_smooth(se = F)
plot2 = ggplot(data = df.fortified, aes(x = .fitted, y = .resid, size = .cooksd)) +
geom_point() + scale_size_area("Cook's distance") + geom_smooth(se = FALSE, show_guide = FALSE)
library(gridExtra)
grid.arrange(plot1, plot2)

Resources