Adding a function curve to ggplot, not with geom_smooth() - r

So I am trying to add my Bayesian logistic curve to the output I already have. For context, the coefficients of my output lead me to an equation of: y = -2.53 + .0003*x
[Figure: plot without the function]
Here is what I have tried, and I am getting the following error:
[Figure: failed function; nothing ran for the function/line]
I do not want to use geom_smooth(), because it produces the following, which does not pass through the EXACT points that would be on the line. Can anybody help?
[Figure: geom_smooth not getting the job done]

It's difficult to provide a solution to your problem without your code/data or a minimal reproducible example, but perhaps this approach will help you:
library(tidyverse)
#install.packages("gapminder")
library(gapminder)
gapminder %>%
mutate(is_long = ifelse(lifeExp > 80, 1, 0)) %>%
mutate(fit = (0.00003 * gdpPercap)^2 - .253) %>%
ggplot(aes(x = gdpPercap)) +
geom_jitter(aes(y = is_long), width = 0,
height = 0.01, alpha = 0.5) +
geom_line(aes(y = fit), colour = "blue", lty = 2) +
annotate("text", x = 50000, y = 0.5,
label = paste("y = -0.253 + (.00003*x)", "\U00B2", sep = "")) +
coord_cartesian(ylim = c(0,1))
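If the goal is to draw the fitted curve exactly rather than a smoothed estimate, one option is stat_function(), which evaluates a function across the x range of the plot. The sketch below assumes the coefficients from the question (-2.53 and 0.0003) are on the logit scale, so the curve is plogis(-2.53 + 0.0003 * x). The data frame sim and the helper logit_curve are made up for illustration.

```r
library(ggplot2)

# Assumption: the coefficients from the question are on the logit scale.
logit_curve <- function(x) plogis(-2.53 + 0.0003 * x)

# Hypothetical data standing in for the asker's observations.
set.seed(1)
sim <- data.frame(x = runif(200, 0, 20000))
sim$y <- rbinom(200, size = 1, prob = logit_curve(sim$x))

p <- ggplot(sim, aes(x = x, y = y)) +
  geom_jitter(width = 0, height = 0.01, alpha = 0.5) +
  # stat_function() draws the function itself, so the line passes
  # through the exact fitted values instead of a smoothed estimate.
  stat_function(fun = logit_curve, colour = "blue", linetype = 2)
p
```

If the line is wanted on the linear-predictor scale instead, the body of logit_curve can be replaced with the raw expression -2.53 + 0.0003 * x.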


Modify ggdist interval thickness?

I’m visualizing some distributions with the ggdist package and would like to modify the width of the interval lines. For example, a basic plot created with stat_histinterval() creates a histogram with an interval at the bottom.
library(tidyverse)
library(ggdist)
set.seed(123)
dist <-
tibble(p_grid = seq(from = 0, to = 1, length.out = 1000),
prior = rep(1, times = 1000)) %>%
mutate(likelihood = dbinom(4, size = 15, prob = p_grid),
posterior = likelihood * prior,
posterior = posterior / sum(posterior)) %>%
slice_sample(n = 10000, weight_by = posterior, replace = TRUE)
ggplot(dist, aes(x = p_grid)) +
stat_histinterval(.width = c(0.67, 0.89))
What I would like to do is make the black interval lines thicker. From the documentation, it seems like the interval_size argument is what I need. However, specifying an interval size overwrites the entire interval (i.e., it looks like one interval instead of a 67% and 89% interval).
ggplot(dist, aes(x = p_grid)) +
stat_histinterval(interval_size = 5)
And specifying multiple sizes to the interval_size argument errors out.
ggplot(dist, aes(x = p_grid)) +
stat_histinterval(interval_size = c(5, 10))
#> Error: Aesthetics must be either length 1 or the same as the data (34): interval_size
Is there a way to modify the interval's thickness while preserving the presence of multiple intervals?
Created on 2022-01-14 by the reprex package (v2.0.1)
The argument for this is interval_size_range which for some reason is only documented on geom_slabinterval despite working in other functions:
ggplot(dist, aes(x = p_grid)) +
stat_histinterval(.width = c(0.67, 0.89),
interval_size_range = c(1, 3))
To eliminate the giant point, you want to change the default value of fatten_point which expands that point. For some reason, fatten_point also affects the size of the interval, so you'll need to increase the interval_size_range to compensate with a matching line size:
ggplot(dist, aes(x = p_grid)) +
stat_histinterval(.width = c(0.67, 0.89),
interval_size_range = c(2, 5),
fatten_point = 1)
Could you just use two interval statements?
ggplot(dist, aes(x = p_grid)) +
stat_histinterval(.width = .89, interval_size=10) +
stat_interval(.width = .67, interval_size=10, col="black", show.legend=FALSE)
The problem is that there is a bit of overplotting. You could just add all of the individual elements:
ggplot(dist, aes(x = p_grid)) +
geom_histogram(fill="gray55") +
stat_interval(.width = .67, interval_size=10, col="black", show.legend = FALSE) +
stat_interval(.width = .89, interval_size=5,col="black", show.legend = FALSE) +
geom_point(data=dist, aes(x=mean(p_grid), y=0), col="white", inherit.aes = FALSE, size=5)
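For reference, the two intervals requested with .width = c(0.67, 0.89) can also be computed by hand, which makes it easy to check that both are still being drawn. This is a sketch that assumes stat_histinterval() is using equal-tailed quantile intervals (its default, as far as the ggdist documentation describes). The draws below are a stand-in for dist$p_grid: Beta(5, 12) is the analytic posterior for 4 successes in 15 trials under a flat prior. The helper qi is a name introduced here.

```r
set.seed(123)
# Hypothetical stand-in for the posterior draws in `dist$p_grid`.
draws <- rbeta(10000, 5, 12)

# Equal-tailed interval of a given width: cut (1 - width) / 2 from each tail.
qi <- function(x, width) {
  tail <- (1 - width) / 2
  unname(quantile(x, probs = c(tail, 1 - tail)))
}

intervals <- data.frame(
  .width = c(0.67, 0.89),
  lower  = c(qi(draws, 0.67)[1], qi(draws, 0.89)[1]),
  upper  = c(qi(draws, 0.67)[2], qi(draws, 0.89)[2])
)
intervals
```

The 89% interval should always contain the 67% one, so two nested segments are expected in the plot.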

Different objects are not showing up on my ggplot2

I'm studying the returns to college admission for marginal students, and I'm trying to make a ggplot2 plot of the following data: the average salaries of students who did or did not finish their master's in medicine, and the average 'GPA' (foreign equivalent) distance to the 'acceptance score':
SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
372.682,388.939,386.994)
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
"0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")
I have to do a Regression Discontinuity Design (RDD). As far as I understand it, to run the regression I have to convert DistanceGrades to numeric, so I just created a variable z:
z <- -5:4
where 0 is the cutoff (ie. 0 is equal to "0.0" in DistanceGrades).
I then make a dataframe
df <- data.frame(z,SalaryAfter)
Now my attempt to create the plot gets a bit messy (I use the package 'fpp3', but I suppose only ggplot2 and maybe dplyr are actually needed):
df %>%
select(z, SalaryAfter) %>%
mutate(D = as.factor(ifelse(z >= -0.1, 1, 0))) %>%
ggplot(aes(x = z, y = SalaryAfter, color = D)) +
geom_point(stat = "identity") +
geom_smooth(method = "lm") +
geom_vline(xintercept = 0) +
theme(panel.grid = element_line(color = "white",
size = 0.75,
linetype = 1)) +
xlim(-6,5) +
xlab("Distance to acceptance score") +
labs(title = "Figur 1.1", subtitle = "Salary for every distance to the acceptance score")
Which plots:
What I'm trying to do is, first, split the data with a dummy variable D=1 if z>0 and D=0 if z<0. Then I plot it with a linear regression and a vertical line at z=0. Lastly, I write the title and subtitle. Now I have two problems:
The x axis is displaying -5, -2.5, ... but I would like it to show all the integers; the non-integer values have no relation to the z variable, which is discrete. I have tried to fix this with several different methods, and I can't remember all of them (theme(panel.grid...), scale_x_discrete and many more), but the outcome has always been similar: the x-axis labels are removed completely, so there are no numbers, and sometimes even the axis title disappears.
I would like the regression band for the first part of the data to extend to z=0.
When I try to solve both of these problems I again get similar results. Most of the things I try do not produce an error message when I run the code, but they either do nothing to my plot or remove some of the existing elements, which leaves me full of questions. I suppose the problem is caused by some of the elements not working together, but I have no idea which.
Try this:
library(tidyverse)
SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
372.682,388.939,386.994)
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
"0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")
z <- -5:4
df <- data.frame(z,SalaryAfter) %>%
select(z, SalaryAfter) %>%
mutate(D = as.factor(ifelse(z >= -0.1, 1, 0)))
# Fit a lm model for the left part of the panel
fit_data <- lm(SalaryAfter~z, data = filter(df, z <= -0.1)) %>%
predict(., newdata = data.frame(z = seq(-5, 0, 0.1)), interval = "confidence") %>%
as.data.frame() %>%
mutate(z = seq(-5, 0, 0.1), D = factor(0, levels = c(0, 1)))
# Plot
ggplot(mapping = aes(color = D)) +
geom_ribbon(data = filter(fit_data, z <= 0 & -1 <= z),
aes(x = z, ymin = lwr, ymax = upr),
fill = "grey70", color = "transparent", alpha = 0.5) +
geom_line(data = fit_data, aes(x = z, y = fit), size = 1) +
geom_point(data = df, aes(x = z, y = SalaryAfter), stat = "identity") +
geom_smooth(data = df, aes(x = z, y = SalaryAfter), method = "lm") +
geom_vline(xintercept = 0) +
theme(panel.grid = element_line(color = "white",
size = 0.75,
linetype = 1)) +
scale_x_continuous(limits = c(-6, 5), breaks = -6:5) +
xlab("Distance to acceptance score") +
labs(title = "Figure 1.1", subtitle = "Salary for every distance to the acceptance score")
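On the first problem in the question: z is numeric, so the x axis is a continuous scale, and adding scale_x_discrete() to it is a likely reason the axis labels disappeared. A minimal sketch of just the tick fix, using the question's data, is below. The name int_breaks is introduced here and is not part of the original code.

```r
library(ggplot2)

z <- -5:4
SalaryAfter <- c(287.780, 305.181, 323.468, 339.082, 344.738, 370.475, 373.257,
                 372.682, 388.939, 386.994)
df <- data.frame(z, SalaryAfter)

# One break per integer value of z.
int_breaks <- seq(min(df$z), max(df$z), by = 1)

p <- ggplot(df, aes(x = z, y = SalaryAfter)) +
  geom_point() +
  geom_vline(xintercept = 0) +
  # Keep the scale continuous and only change where the ticks go.
  scale_x_continuous(breaks = int_breaks)
p
```

The same breaks argument can be combined with limits, as in the answer above, when extra space is wanted on either side.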

Advice on how to plot side-by-side histograms with a line graph going through them in ggplot2

I'm currently finishing off my Masters project and need to include some graphics for the write-up. Without boring you too much, I have some data which is associated with AR(1) parameters ranging from 0.1 to 0.9 by 0.1 increments. As such I thought of doing a faceted histogram like the one below (worry not about the hideous fruit salad of colours, it will not be used).
I used this code.
ggplot(opt_lens_geom,aes(x=l_1024,fill=factor(rho))) + geom_histogram()+coord_flip()+facet_grid(.~rho,scales = "free_x")
I would also like to draw a trend line for the median values, since the AR(1) parameter is continuous. In a later iteration I deleted the padding and made it "look" like it was one graph, but I have had issues with the endpoints matching up, since each facet is a separate panel. Can anyone give me some advice on how to do this? I am not particularly attached to the faceting, so if it is not needed I can do away with it.
I will try to upload sample data, but simulating 100 values for each of the 9 rhos would work just to get started, like:
opt_lens_geom <- data.frame(rho= rep(seq(0.1,0.9,by=0.1),each=100),l_1024=rnorm(900))
You might consider ggridges. I've assumed here that you want a median value for each value of rho.
library(ggplot2)
library(ggridges)
library(dplyr)
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
l_1024 = rnorm(900))
opt_lens_geom %>%
mutate(rho_f = factor(rho)) %>%
ggplot(aes(l_1024, rho_f)) +
stat_density_ridges(quantiles = 2, quantile_lines = TRUE)
Result. You can add scale = 1 as a parameter to stat_density_ridges if you don't like the amount of overlap.
Try the following. It uses a pre-computed data frame of the medians.
library(ggplot2)
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")
med <- plyr::ddply(df, "rho", summarise, m = median(val))
ggplot(data = df, aes(x = val, fill = factor(rho))) +
geom_histogram() +
coord_flip() +
geom_vline(data = med, aes(xintercept = m), colour = 'black') +
facet_wrap(~ factor(rho))
You could do a variant on this using geom_violin instead of using histograms, although you wouldn't get labelled counts, just an idea of the relative density. Example with made up data:
df = data.frame(
rho = rep(c(0.1, 0.2, 0.3), each = 50),
val = sample(1:10, 150, replace = TRUE)
)
df$val = df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))
ggplot(df, aes(x = rho, y = val, fill = factor(rho))) +
geom_violin() +
stat_summary(aes(group = 1), colour = "black",
geom = "line", fun = "median")
This produces a violin for each value of rho, and joins the medians for each violin.
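Since rho is continuous, another option is to keep it numeric on the x axis and draw the medians as an ordinary line from a pre-computed summary. This is a sketch using the simulated data from the question. The summary data frame med and the violin width of 0.08 are choices made here, not something from the original posts.

```r
library(ggplot2)

set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
                            l_1024 = rnorm(900))

# Median of l_1024 for each value of rho.
med <- aggregate(l_1024 ~ rho, data = opt_lens_geom, FUN = median)

p <- ggplot(opt_lens_geom, aes(x = rho, y = l_1024)) +
  # group = rho gives one violin per AR(1) parameter on a numeric axis.
  geom_violin(aes(group = rho), width = 0.08, fill = "grey80") +
  # The layers below reuse the x and y mappings, now on the summary data.
  geom_line(data = med, colour = "black") +
  geom_point(data = med, colour = "black") +
  scale_x_continuous(breaks = seq(0.1, 0.9, by = 0.1))
p
```

Because everything sits in one panel, the trend line has no facet boundaries to line up with.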

How can I add annotation in ggplotly animation?

I am creating an animated plotly graph for my assignment in R, where I am comparing several models fitted with various numbers of observations. I would like to add an annotation showing the RMSE of the current model, which means I would like text that changes together with the slider. Is there an easy way to do that?
Here is my dataset stored on GitHub. It already contains a variable with the RMSE: data
The base ggplot graphic is as follows:
library(tidyverse)
library(plotly)
p <- ggplot(values_predictions, aes(x = x)) +
geom_line(aes(y = preds_BLR, frame = n, colour = "BLR")) +
geom_line(aes(y = preds_RLS, frame = n, colour = "RLS")) +
geom_point(aes(x = x, y = target, frame = n, colour = "target"), alpha = 0.3) +
geom_line(aes(x = x, y = sin(2 * pi * x), colour = "sin(2*pi*x)"), alpha = 0.3) +
ggtitle("Comparison of performance") +
labs(y = "predictions and targets", colour = "colours")
This is converted to plotly, and I have added an animation to the Plotly graph:
plot <- ggplotly(p) %>%
animation_opts(easing = "linear",redraw = FALSE)
plot
Thanks!
You can add annotations to a ggplot graph using the annotate function: http://ggplot2.tidyverse.org/reference/annotate.html
df <- data.frame(x = rnorm(100, mean = 10), y = rnorm(100, mean = 10))
# Build model
fit <- lm(x ~ y, data = df)
# function finds RMSE
RMSE <- function(error) { sqrt(mean(error^2)) }
library(ggplot2)
ggplot(df, aes(x, y)) +
geom_point() +
annotate("text", x = Inf, y = Inf, hjust = 1.1, vjust = 2,
label = paste("RMSE", RMSE(fit$residuals)) )
There seems to be a bit of a problem converting annotations between ggplot and plotly. However, the following workaround can be used:
ggplotly(plot) %>%
layout(annotations = list(x = 12, y = 13, text = paste("RMSE",
RMSE(fit$residuals)), showarrow = F))
Here's an example of adding data dependent text using the built in iris dataset with correlation as text to ggplotly.
library(plotly)
library(ggplot2)
library(dplyr)
mydata = iris %>% rename(variable1=Sepal.Length, variable2= Sepal.Width)
shift_right = 0.1 # number from 0-1 where higher = more right
shift_down = 0.02 # number from 0-1 where higher = more down
p = ggplot(mydata, aes(variable1,variable2))+
annotate(geom = "text",
label = paste0("Cor = ",as.character(round(cor.test(mydata$variable1,mydata$variable2)$estimate,2))),
x = min(mydata$variable1)+abs(shift_right*(min(mydata$variable1)-max(mydata$variable1))),
y = max(mydata$variable2)-abs(shift_down*(min(mydata$variable2)-max(mydata$variable2))), size=4)+
geom_point()
ggplotly(p) %>% style(hoverinfo = "none", traces = 1) # remove hover on text
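None of the examples above change the text with the slider. One approach that may work is to put the label in its own data frame with one row per frame and add it as a text layer with the same frame aesthetic. This is only a sketch: it assumes ggplotly() handles frame on a geom_text() layer the same way it does on the point and line layers. The data (sim), the label table (labels) and the label position are all made up for illustration.

```r
library(ggplot2)
library(plotly)

# Hypothetical data: one set of points per frame value `n`.
set.seed(42)
sim <- do.call(rbind, lapply(c(10, 20, 30), function(n) {
  x <- seq(0, 1, length.out = n)
  data.frame(n = n, x = x,
             target = sin(2 * pi * x) + rnorm(n, sd = 0.2),
             pred = sin(2 * pi * x))
}))

rmse <- function(error) sqrt(mean(error^2))

# One label row per frame, with a fixed (arbitrary) position.
labels <- do.call(rbind, lapply(split(sim, sim$n), function(d) {
  data.frame(n = d$n[1], x = 0.8, y = 1.4,
             label = paste("RMSE:", round(rmse(d$target - d$pred), 3)))
}))

p <- ggplot(sim, aes(x = x)) +
  geom_point(aes(y = target, frame = n), alpha = 0.3) +
  geom_line(aes(y = pred, frame = n)) +
  # Assumption: ggplotly() treats `frame` on the text layer like on the others.
  geom_text(data = labels, aes(x = x, y = y, label = label, frame = n))

anim <- ggplotly(p)
anim
```

ggplot2 itself warns that frame is an unknown aesthetic; that warning is expected, since only ggplotly() uses it.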

How to plot two geoms side-by-side for same categorical in R? (errorbarjitter)

I came across what I found to be some helpful figures by David L. Stern at howtogiveatalk.com that have a jitterplot next to a mean/sd summary for each categorical variable.
Here is the first example from the page linked above
I spent some time trying to find similar figures online and couldn't.
I'm not sure which software and packages he used to create these figures (UPDATE: David Stern responded and explained that he uses a custom built Matlab function). I am most familiar with R and ggplot2 and figure it must be possible to create something similar using these tools. I tried to jump right in and make it but can't figure out where to go from here.
How I got started:
library(dplyr)
library(ggplot2)
library(tidyr)
df <- data_frame(a = rnorm(100, mean = 0.75, sd = 0.5), b = rgamma(100, shape = 0.75, scale = 0.5), c = rbinom(100, size = 1, prob = 0.6))
df <- gather(df)
df.sum <- df %>% group_by(key) %>% summarise(mean = mean(value), sd = sd(value))
ggplot(data = df.sum, aes(x = key)) +
geom_jitter(data = df, aes(y = value)) +
geom_point(aes(y = mean)) +
geom_linerange(aes(x = key, y = mean, ymin = (mean - sd), ymax = (mean + sd))) +
theme_bw()
Which produces the following graph:
The code is pretty rough, but gets most of the way there. I can't figure out how to move the geom_point and geom_linerange beside the jitter, though.
So how can this figure be made in R (preferably using ggplot2)?
I've figured it out! I'll post the answer here for future reference and for anyone else wanting to make a similar plot.
The key for me came down to converting the x-axis from a factor to a numeric in order to apply the shift.
library(dplyr)
library(ggplot2)
library(tidyr)
set.seed(125)
df <- tibble(Normal = rnorm(100, mean = 0.5, sd = 0.5),
Gamma = rgamma(100, shape = 0.5, scale = 0.5),
Bimodal = c(rnorm(50, mean = 0.1, sd = 0.15), rnorm(50, mean = 0.9, sd = 0.15))
)
df <- gather(df)
df.sum <- df %>%
group_by(key) %>%
summarise(mean = mean(value), sd = sd(value))
ggplot(data = df, aes(x = key, y = value)) +
geom_jitter(position = position_jitter(width = 0.2), shape = 1, size = 3.5) +
geom_pointrange(data = df.sum, aes(x = as.numeric(factor(key))+0.3, y = mean, ymin = (mean - sd), ymax = (mean + sd))) +
geom_point(data = df.sum, aes(x = as.numeric(factor(key))+0.3, y = mean), size = 3.5) +
theme_bw() + xlab("") + ylab("Arbitrary Units")
It would be great if this code could be adapted into a ggplot extension to make this into a simple geom. I might take on the challenge myself if I can find the time.
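A possible alternative to converting the x values by hand is position_nudge(), which shifts a layer sideways while the axis stays discrete. The sketch below rebuilds similar data with base R. The names df_sum and the nudge of 0.3 are choices made here, and the random draws will not match the plot above exactly.

```r
library(ggplot2)

set.seed(125)
# Long-format data built with base R; group names follow the answer above.
df <- data.frame(
  key = rep(c("Normal", "Gamma", "Bimodal"), each = 100),
  value = c(rnorm(100, mean = 0.5, sd = 0.5),
            rgamma(100, shape = 0.5, scale = 0.5),
            rnorm(50, mean = 0.1, sd = 0.15), rnorm(50, mean = 0.9, sd = 0.15))
)

# Mean and standard deviation per group.
group_mean <- tapply(df$value, df$key, mean)
group_sd <- tapply(df$value, df$key, sd)
df_sum <- data.frame(key = names(group_mean),
                     mean = as.numeric(group_mean),
                     sd = as.numeric(group_sd))

p <- ggplot(df, aes(x = key, y = value)) +
  # Jitter horizontally only, so the y values stay exact.
  geom_jitter(width = 0.2, height = 0, shape = 1, size = 3.5) +
  # Shift the summary 0.3 units to the right of each category.
  geom_pointrange(data = df_sum,
                  aes(y = mean, ymin = mean - sd, ymax = mean + sd),
                  position = position_nudge(x = 0.3)) +
  theme_bw() + xlab("") + ylab("Arbitrary Units")
p
```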
Fairly straightforward without ggplot2
x<-0.5+runif(100,-0.2,0.2)
y<-rbind(rnorm(100,1,1),rgamma(100,1,1),rbinom(100,1,0.5)*2+rnorm(100,0,0.2))
for (j in 0:2){
if (j==0){plot(x,y[1,],xlim=c(0,4),ylim=c(-1,5),xlab="",ylab="Arbitrary Units",xaxt="n",bty="n",col="gray50")}
else{points(x+j, y[j+1,],col="gray50")}
points(j+0.9, mean(y[j+1,]),pch=19)
arrows(j+0.9,mean(y[j+1,])-sd(y[j+1,]),j+0.9,mean(y[j+1,])+sd(y[j+1,]), angle=90,length=0)
} # for j categories
axis(1,seq(0.5,2.5, by=1),tick=F,labels=c("Normal","Gamma","Bimodal"))
