I’m visualizing some distributions with the ggdist package and would like to modify the width of the interval lines. For example, a basic plot created with stat_histinterval() creates a histogram with an interval at the bottom.
library(tidyverse)
library(ggdist)
set.seed(123)
dist <-
tibble(p_grid = seq(from = 0, to = 1, length.out = 1000),
prior = rep(1, times = 1000)) %>%
mutate(likelihood = dbinom(4, size = 15, prob = p_grid),
posterior = likelihood * prior,
posterior = posterior / sum(posterior)) %>%
slice_sample(n = 10000, weight_by = posterior, replace = TRUE)
ggplot(dist, aes(x = p_grid)) +
stat_histinterval(.width = c(0.67, 0.89))
What I would like to do is make the black interval lines thicker. From the documentation, it seems like the interval_size argument is what I need. However, specifying an interval size overwrites the entire interval (i.e., it looks like one interval instead of a 67% and 89% interval).
ggplot(dist, aes(x = p_grid)) +
stat_histinterval(interval_size = 5)
And specifying multiple sizes to the interval_size argument errors out.
ggplot(dist, aes(x = p_grid)) +
stat_histinterval(interval_size = c(5, 10))
#> Error: Aesthetics must be either length 1 or the same as the data (34): interval_size
Is there a way to modify the interval's thickness while preserving the presence of multiple intervals?
Created on 2022-01-14 by the reprex package (v2.0.1)
The argument for this is interval_size_range, which for some reason is only documented on geom_slabinterval despite working in other functions:
ggplot(dist, aes(x = p_grid)) +
stat_histinterval(.width = c(0.67, 0.89),
interval_size_range = c(1, 3))
To eliminate the giant point, you want to change the default value of fatten_point which expands that point. For some reason, fatten_point also affects the size of the interval, so you'll need to increase the interval_size_range to compensate with a matching line size:
ggplot(dist, aes(x = p_grid)) +
stat_histinterval(.width = c(0.67, 0.89),
interval_size_range = c(2, 5),
fatten_point = 1)
Could you just use two interval statements?
ggplot(dist, aes(x = p_grid)) +
stat_histinterval(.width = .89, interval_size=10) +
stat_interval(.width = .67, interval_size=10, col="black", show.legend=FALSE)
The problem is that there is a bit of overplotting. You could just add all of the individual elements:
ggplot(dist, aes(x = p_grid)) +
geom_histogram(fill="gray55") +
stat_interval(.width = .67, interval_size=10, col="black", show.legend = FALSE) +
stat_interval(.width = .89, interval_size=5,col="black", show.legend = FALSE) +
geom_point(data=dist, aes(x=mean(p_grid), y=0), col="white", inherit.aes = FALSE, size=5)
I'm using quickpsy package in R (https://cran.r-project.org/web/packages/quickpsy/quickpsy.pdf /
http://dlinares.org/quickpsy.html) to fit psychometric functions to the data. I use quickpsy and then plotcurves.
fit <- quickpsy(data, delta, response, grouping = c("condition"),lapses = FALSE, bootstrap = "none", fun = logistic_fun)
plotcurves(fit, ci = TRUE) + labs(y = "Proportion yes responses", x="Delta") + theme_classic(base_size = 20) + scale_x_continuous(n.breaks = 6, limits=c(-3, 3)) +
scale_color_manual(values=c("#C0C0C0", "#000000")) + theme(legend.title = element_blank())
I'd like to make the plotted curves thicker. Is there any way to do it? I couldn't increase the thickness with any ggplot width manipulation.
What you could do is use your quickpsy fit with ggplot instead of plotcurves. Then you can change the size of your geom_line. Here is a reproducible example:
library(ggplot2)
library(quickpsy)
x <- seq(0, 420, 60)
k <- c(0, 0, 4, 18, 20, 20, 19, 20)
dat <- tibble(x, k, n = 20)
fitWithoutLapses <- quickpsy(dat, x, k, n, prob = .75)
#> Warning: `group_by_()` was deprecated in dplyr 0.7.0.
#> Please use `group_by()` instead.
#> See vignette('programming') for more help
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
curvesWithoutLapses <- fitWithoutLapses$curves %>%
mutate(cond = 'Without Lapses')
pWithout <- ggplot()+
geom_point(data = fitWithoutLapses$averages, aes(x = x, y = prob)) +
geom_line(data = curvesWithoutLapses,
aes(x = x, y = y, color = cond), size = 2) +
geom_linerange(data = fitWithoutLapses$thresholds,
aes(x = thre, ymin = 0, ymax = prob), lty =2)
pWithout
Created on 2022-07-30 by the reprex package (v2.0.1)
For more info check this link for using quickpsy in ggplot.
I would like to group a series of lines by 2 factors using group = interaction in ggplot. Here is some sample code:
set.seed(123)
N <- 18
means <- rnorm(N,0,1)
ses <- rexp(N,2)
upper<- means+qnorm(0.975)*ses
lower<- means+qnorm(0.025)*ses
fruit <- rep(c("Apples","Bananas","Pears"), each=6)
size <- rep(rep(c("Small","Medium","Big"), each=2),3)
GMO <- rep(c("Yes","No"), 9)
d<- data.frame(means,upper,lower,fruit,size,GMO)
ggplot(data=d,
aes(x = fruit,y = means, ymin = lower, ymax = upper, col=size,linetype=GMO,group=interaction(GMO, size)))+
geom_hline(aes(fill=size),yintercept =1, linetype=2)+
xlab('labels')+ ylab("Parameter estimates (95% Confidence Interval)")+
geom_pointrange(position=position_dodge(width = 0.6)) +
scale_x_discrete(name="Fruits")+
coord_flip()-> fplot
dev.new()
fplot
Here's a link to the resulting graph: https://i.stack.imgur.com/5YF4F.png
I would like to bring the same coloured lines for each of the three groups closer together. In other words, I would like the lines to cluster not only by the 'Fruit' variable but also by the 'Size' variable for each of the fruits. position_dodge seems to only work for one of the interacting groups.
Thanks for your advice.
As far as I know that is not possible with position_dodge, i.e. it dodges according to the categories of the group aes. And it does not matter whether you map one variable on the group aes or an interaction of two or more. The groups are simply placed equidistant from one another.
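The even spacing can be seen from the offsets themselves. The following is a minimal sketch in base R, assuming that for zero-width elements such as point ranges the dodge offset of group i out of n is width * ((i - 0.5) / n - 0.5). The helper name dodge_offsets is hypothetical and not part of ggplot2.

```r
# Hypothetical helper: approximate x offsets that dodging assigns to n groups
# of zero-width elements (an assumption about the positioning rule, not an
# exported ggplot2 function).
dodge_offsets <- function(n, width = 0.6) {
  i <- seq_len(n)
  width * ((i - 0.5) / n - 0.5)
}

# Six groups per fruit: interaction(GMO, size) has 2 x 3 levels
offsets <- dodge_offsets(6, width = 0.6)
offsets

# Gaps between neighbouring groups are all the same, whichever size they belong to
diff(offsets)
```

Under this assumption the six point ranges within each fruit sit at equal steps, so no choice of width can pull the two GMO levels of one size closer together than neighbouring sizes.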
One option to achieve your desired result would be to use the "facets that don't look like facets" trick, which means faceting by fruit, mapping size on x, and afterwards using theme options to get rid of the facet look, plus some tweaking of the x scale:
set.seed(123)
N <- 18
means <- rnorm(N, 0, 1)
ses <- rexp(N, 2)
upper <- means + qnorm(0.975) * ses
lower <- means + qnorm(0.025) * ses
fruit <- rep(c("Apples", "Bananas", "Pears"), each = 6)
size <- rep(rep(c("Small", "Medium", "Big"), each = 2), 3)
GMO <- rep(c("Yes", "No"), 9)
d <- data.frame(means, upper, lower, fruit, size, GMO)
library(ggplot2)
ggplot(data = d, aes(x = size, y = means, ymin = lower, ymax = upper, col = size, linetype = GMO, group = GMO)) +
geom_hline(yintercept = 1, linetype = 2) +
xlab("labels") +
ylab("Parameter estimates (95% Confidence Interval)") +
geom_pointrange(position = position_dodge(width = 0.6)) +
scale_x_discrete(name = "Fruits", breaks = "Medium", labels = NULL, expand = c(0, 1)) +
coord_flip() +
facet_grid(fruit ~ ., switch = "y") +
theme(strip.placement = "outside",
strip.background.y = element_blank(),
strip.text.y.left = element_text(angle = 0),
panel.spacing.y = unit(0, "pt"))
Maybe you want to facet_wrap your size variable:
set.seed(123)
N <- 18
means <- rnorm(N,0,1)
ses <- rexp(N,2)
upper<- means+qnorm(0.975)*ses
lower<- means+qnorm(0.025)*ses
fruit <- rep(c("Apples","Bananas","Pears"), each=6)
size <- rep(rep(c("Small","Medium","Big"), each=2),3)
GMO <- rep(c("Yes","No"), 9)
d<- data.frame(means,upper,lower,fruit,size,GMO)
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.1.2
ggplot(data=d,
aes(x = fruit,y = means, ymin = lower, ymax = upper, col=size,linetype=GMO,group=interaction(GMO, size)))+
geom_hline(aes(fill=size),yintercept =1, linetype=2)+
xlab('labels')+ ylab("Parameter estimates (95% Confidence Interval)")+
geom_pointrange(position=position_dodge(width = 0.6)) +
scale_x_discrete(name="Fruits")+
coord_flip() +
facet_wrap(~size)-> fplot
#> Warning: geom_hline(): Ignoring `mapping` because `yintercept` was provided.
fplot
Created on 2022-07-13 by the reprex package (v2.0.1)
I'm studying the returns to college admission for marginal students and I'm trying to make a ggplot2 plot of the following data: the average salaries of students who finished or didn't finish their masters in medicine, and the average 'GPA' (foreign equivalent) distance to the 'acceptance score':
SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
372.682,388.939,386.994)
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
"0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")
I have to do a Regression Discontinuity Design (RDD), so to do the regression, as far as I understand it, I have to rewrite DistanceGrades as numeric, so I just created a variable z
z <- -5:4
where 0 is the cutoff (ie. 0 is equal to "0.0" in DistanceGrades).
I then make a dataframe
df <- data.frame(z,SalaryAfter)
Now my attempt to create the plot gets a bit messy (I use the package 'fpp3', but I suppose that it is just the ggplot2 and maybe dplyr packages)
df %>%
select(z, SalaryAfter) %>%
mutate(D = as.factor(ifelse(z >= -0.1, 1, 0))) %>%
ggplot(aes(x = z, y = SalaryAfter, color = D)) +
geom_point(stat = "identity") +
geom_smooth(method = "lm") +
geom_vline(xintercept = 0) +
theme(panel.grid = element_line(color = "white",
size = 0.75,
linetype = 1)) +
xlim(-6,5) +
xlab("Distance to acceptance score") +
labs(title = "Figur 1.1", subtitle = "Salary for every distance to the acceptance score")
Which plots:
What I'm trying to do is, first, split the data with a dummy variable D=1 if z>=0 and D=0 if z<0. Then I plot it with a linear regression and a vertical line at z=0. Lastly, I write the title and subtitle. Now I have two problems:
The x axis displays -5, -2.5, ... but I would like it to show all the integers; the non-integer values have no relation to the z variable, which is discrete. I have tried to fix this with several different methods, but none of them have worked. I can't remember everything I tried (theme(panel.grid...), scale_x_discrete and many more), but the outcomes were all pretty similar: the x axis is removed completely, so that there are no numbers, and sometimes even the axis title disappears.
I would like the regression band for the first part of the data to extend to z=0.
When I try to solve both of these problems I again get similar results: most of the things I try do not produce an error message when I run the code, but they either do nothing to my plot or remove some of the existing elements, which leaves me with many questions. I suppose the problem is caused by some of the elements not working together, but I have no idea.
Try this:
library(tidyverse)
SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
372.682,388.939,386.994)
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
"0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")
z <- -5:4
df <- data.frame(z,SalaryAfter) %>%
select(z, SalaryAfter) %>%
mutate(D = as.factor(ifelse(z >= -0.1, 1, 0)))
# Fit a lm model for the left part of the panel
fit_data <- lm(SalaryAfter~z, data = filter(df, z <= -0.1)) %>%
predict(., newdata = data.frame(z = seq(-5, 0, 0.1)), interval = "confidence") %>%
as.data.frame() %>%
mutate(z = seq(-5, 0, 0.1), D = factor(0, levels = c(0, 1)))
# Plot
ggplot(mapping = aes(color = D)) +
geom_ribbon(data = filter(fit_data, z <= 0 & -1 <= z),
aes(x = z, ymin = lwr, ymax = upr),
fill = "grey70", color = "transparent", alpha = 0.5) +
geom_line(data = fit_data, aes(x = z, y = fit), size = 1) +
geom_point(data = df, aes(x = z, y = SalaryAfter), stat = "identity") +
geom_smooth(data = df, aes(x = z, y = SalaryAfter), method = "lm") +
geom_vline(xintercept = 0) +
theme(panel.grid = element_line(color = "white",
size = 0.75,
linetype = 1)) +
scale_x_continuous(limits = c(-6, 5), breaks = -6:5) +
xlab("Distance to acceptance score") +
labs(title = "Figure 1.1", subtitle = "Salary for every distance to the acceptance score")
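To see what the extrapolated left-hand fit gives at the cutoff, the same kind of predict() call can be evaluated at z = 0 alone. This is a minimal base R sketch using the data from the question; the object names left_fit and at_cutoff are introduced here for illustration only.

```r
SalaryAfter <- c(287.780, 305.181, 323.468, 339.082, 344.738,
                 370.475, 373.257, 372.682, 388.939, 386.994)
z <- -5:4
df <- data.frame(z, SalaryAfter)

# Fit only the observations left of the cutoff (z < 0)
left_fit <- lm(SalaryAfter ~ z, data = df[df$z < 0, ])

# Extrapolate the fitted line to z = 0, with a confidence interval
at_cutoff <- predict(left_fit, newdata = data.frame(z = 0),
                     interval = "confidence")
at_cutoff
```

The result is a one-row matrix with columns fit, lwr and upr, which are the same columns the ribbon and line layers above read from fit_data.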
I want to split the y axis of the attached figure so that the part with scores < 25 occupies most of the figure, while the remaining values take up a small upper part.
I searched for this and understand that I should use scale_y_discrete(limits . I tried p <- p + scale_y_continuous(breaks = 1:20, labels = c(1:20,"//",40:100)), but it doesn't work yet.
I used the attached data and this is my code
Code
p<-ggscatter(data, x = "Year" , y = "Score" ,
color = "grey", shape = 21, size = 3, # Points color, shape and size
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
add = "loess", #reg.line
conf.int = T,
cor.coef = F, cor.method = "pearson",
xlab = "Year" , ylab= "Score")
p<-p+ coord_cartesian(xlim = c(1980, 2020));p
Here is as close as I could get to a fake axis break with a resized upper area of the plot. I still think it's a bad idea, and if this were my plot I'd much prefer a more straightforward axis transform.
First, we'd need a function that generates a transform that squeezes all values above some threshold:
library(ggplot2)
library(scales)
# Define new transform
my_transform <- function(threshold = 25, squeeze_factor = 10) {
force(threshold)
force(squeeze_factor)
my_transform <- trans_new(
name = "trans_squeeze",
transform = function(x) {
ifelse(x > threshold,
((x - threshold) * (1 / squeeze_factor)) + threshold,
x)
},
inverse = function(x) {
ifelse(x > threshold,
((x - threshold) * squeeze_factor) + threshold,
x)
}
)
return(my_transform)
}
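A quick way to check that the two functions passed to trans_new() undo each other is to run them as plain functions. The sketch below copies the formulas above into two stand-alone helpers (squeeze and unsqueeze are names made up here) and uses only base R.

```r
threshold <- 25
squeeze_factor <- 10

# Same formula as the `transform` argument above
squeeze <- function(x) {
  ifelse(x > threshold, (x - threshold) / squeeze_factor + threshold, x)
}

# Same formula as the `inverse` argument above
unsqueeze <- function(x) {
  ifelse(x > threshold, (x - threshold) * squeeze_factor + threshold, x)
}

scores <- c(10, 25, 35, 85)
squeezed <- squeeze(scores)
squeezed

# Values at or below the threshold are untouched; values above it are
# compressed, and the inverse maps them back to the original scores
all.equal(unsqueeze(squeezed), scores)
```

The inverse can test against the same threshold because the threshold maps to itself, so anything above 25 on the original scale is still above 25 after squeezing.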
Next we apply that transformation to the y-axis and add a fake axis break. I've used vanilla ggplot2 code as I find the ggscatter() approach confusing.
ggplot(data, aes(Year, Score)) +
geom_point(color = "grey", shape = 21, size = 3) +
geom_smooth(method = "loess", fill = "lightgray") +
# Add fake axis lines
annotate("segment", x = -Inf, xend = -Inf,
y = c(-Inf, Inf), yend = c(24.5, 25.5)) +
# Apply transform to y-axis
scale_y_continuous(trans = my_transform(25, 10),
breaks = seq(0, 80, by = 10)) +
scale_x_continuous(limits = c(1980, 2020), oob = oob_keep) +
theme_classic() +
# Turn real y-axis line off
theme(axis.line.y = element_blank())
You might find it informative to read Hadley Wickham's view on discontinuous axes. People sometimes mock weird y-axes.
I'm faced with the following problem: a few extreme values are dominating the colorscale of my geom_raster plot. An example is probably more clear (note that this example only works with a recent ggplot2 version, I use 0.9.2.1):
library(ggplot2)
library(reshape)
theme_set(theme_bw())
m_small_sd = melt(matrix(rnorm(10000), 100, 100))
m_big_sd = melt(matrix(rnorm(100, sd = 10), 10, 10))
new_xy = m_small_sd[sample(nrow(m_small_sd), nrow(m_big_sd)), c("X1","X2")]
m_big_sd[c("X1","X2")] = new_xy
m = data.frame(rbind(m_small_sd, m_big_sd))
names(m) = c("x", "y", "fill")
ggplot(m, aes_auto(m)) + geom_raster() + scale_fill_gradient2()
Right now I solve this by setting the values over a certain quantile equal to that quantile:
qn = quantile(m$fill, c(0.01, 0.99), na.rm = TRUE)
m = within(m, { fill = ifelse(fill < qn[1], qn[1], fill)
fill = ifelse(fill > qn[2], qn[2], fill)})
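The same clamping can be written more compactly with pmin() and pmax() from base R. A minimal sketch on simulated values, where the vector fill is only a stand-in for m$fill:

```r
set.seed(1)
# Stand-in for m$fill: mostly small values plus a few extreme ones
fill <- c(rnorm(10000), rnorm(100, sd = 10))

qn <- quantile(fill, c(0.01, 0.99), na.rm = TRUE)

# Clamp everything outside the 1%-99% quantile range to the nearest bound
clamped <- pmin(pmax(fill, qn[1]), qn[2])
range(clamped)
```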
This does not really feel like an optimal solution. What I would like to do is have a non-linear mapping of colors to the range of values, i.e. more colors present in the area with more observations. In spplot I could use classIntervals from the classInt package to calculate the appropriate class boundaries:
library(sp)
library(classInt)
gridded(m) = ~x+y
col = c("#EDF8B1", "#C7E9B4", "#7FCDBB", "#41B6C4",
"#1D91C0", "#225EA8", "#0C2C84", "#5A005A")
at = classIntervals(m$fill, n = length(col) + 1)$brks
spplot(m, at = at, col.regions = col)
To my knowledge it is not possible to hardcode this mapping of colors to class intervals like I can in spplot. I could transform the fill axis, but as there are negative values in the fill variable that will not work.
So my question is: are there any solutions to this problem using ggplot2?
Seems that ggplot (0.9.2.1) and scales (0.2.2) bring all you need (for your original m):
library(scales)
qn = quantile(m$fill, c(0.01, 0.99), na.rm = TRUE)
qn01 <- rescale(c(qn, range(m$fill)))
ggplot(m, aes(x = x, y = y, fill = fill)) +
geom_raster() +
scale_fill_gradientn (
colours = colorRampPalette(c("darkblue", "white", "darkred"))(20),
values = c(0, seq(qn01[1], qn01[2], length.out = 18), 1)) +
theme(legend.key.height = unit (4.5, "lines"))
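To see what the values argument contains, the break positions can be computed by hand. The sketch below uses base R only, replaces rescale() with the equivalent min-max formula, and uses a simulated vector as a stand-in for m$fill.

```r
set.seed(1)
fill <- c(rnorm(10000), rnorm(100, sd = 10))  # stand-in for m$fill

qn <- quantile(fill, c(0.01, 0.99), na.rm = TRUE)

# Min-max rescaling of the two quantiles to [0, 1] over the full data range
rng <- range(fill)
qn01 <- unname((qn - rng[1]) / (rng[2] - rng[1]))

# 18 of the 20 colours are spread evenly between the two quantiles;
# the first and last colour are anchored at the extremes of the data
values <- c(0, seq(qn01[1], qn01[2], length.out = 18), 1)
length(values)
```

The vector has one position per colour, so most of the colour ramp is spent on the central 98% of the data and only the outermost colours are stretched over the extreme values.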