I'm using the quickpsy package in R (https://cran.r-project.org/web/packages/quickpsy/quickpsy.pdf /
http://dlinares.org/quickpsy.html) to fit psychometric functions to my data. I call quickpsy() and then plotcurves():
fit <- quickpsy(data, delta, response, grouping = c("condition"),lapses = FALSE, bootstrap = "none", fun = logistic_fun)
plotcurves(fit, ci = TRUE) + labs(y = "Proportion yes responses", x="Delta") + theme_classic(base_size = 20) + scale_x_continuous(n.breaks = 6, limits=c(-3, 3)) +
scale_color_manual(values=c("#C0C0C0", "#000000")) + theme(legend.title = element_blank())
I'd like to make the plotted curves thicker. Is there any way to do it? I couldn't increase the thickness with any ggplot width manipulation.
Instead of plotcurves, you can plot the components of your quickpsy fit directly with ggplot. That lets you set the size of geom_line yourself. Here is a reproducible example:
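library(ggplot2)
library(quickpsy)
# tibble(), mutate() and %>% come from dplyr; loaded explicitly in case quickpsy does not attach it
library(dplyr)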
x <- seq(0, 420, 60)
k <- c(0, 0, 4, 18, 20, 20, 19, 20)
dat <- tibble(x, k, n = 20)
fitWithoutLapses <- quickpsy(dat, x, k, n, prob = .75)
#> Warning: `group_by_()` was deprecated in dplyr 0.7.0.
#> Please use `group_by()` instead.
#> See vignette('programming') for more help
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
curvesWithoutLapses <- fitWithoutLapses$curves %>%
mutate(cond = 'Without Lapses')
pWithout <- ggplot()+
geom_point(data = fitWithoutLapses$averages, aes(x = x, y = prob)) +
geom_line(data = curvesWithoutLapses,
aes(x = x, y = y, color = cond), size = 2) +
geom_linerange(data = fitWithoutLapses$thresholds,
aes(x = thre, ymin = 0, ymax = prob), lty =2)
pWithout
Created on 2022-07-30 by the reprex package (v2.0.1)
For more info check this link for using quickpsy in ggplot.
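The same idea (draw the fitted curve yourself with geom_line and set its thickness there) can be sketched without quickpsy. This is only an illustration: glm() stands in for the quickpsy fit and is not the model quickpsy uses, and the object names obs, grid and p are made up. Note that linewidth needs ggplot2 3.4.0 or later; on older versions use size, as in the answer above.

```r
library(ggplot2)

# Same toy data as in the answer: k "yes" responses out of n = 20 trials
x <- seq(0, 420, 60)
k <- c(0, 0, 4, 18, 20, 20, 19, 20)
n <- 20
obs <- data.frame(x = x, prob = k / n)

# Stand-in for the quickpsy fit: an ordinary logistic regression (assumption)
fit <- glm(cbind(k, n - k) ~ x, family = binomial)

# Evaluate the fitted curve on a fine grid so the line looks smooth
grid <- data.frame(x = seq(0, 420, length.out = 200))
grid$y <- predict(fit, newdata = grid, type = "response")

# The thickness is set on the layer that draws the curve
p <- ggplot() +
  geom_point(data = obs, aes(x = x, y = prob)) +
  geom_line(data = grid, aes(x = x, y = y), linewidth = 2)
p
```

Because the curve is an ordinary geom_line layer here, any other line property (colour, linetype) can be changed in the same place.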
I'm studying the returns to college admission for marginal students, and I'm trying to make a ggplot2 plot of the following data: the average salaries of students who did or did not finish their master's in medicine, and the average 'GPA' (foreign equivalent) distance to the 'acceptance score':
SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
372.682,388.939,386.994)
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
"0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")
I have to do a Regression Discontinuity Design (RDD). To run the regression, as far as I understand it, I have to convert DistanceGrades to a numeric variable, so I created a variable z
z <- -5:4
where 0 is the cutoff (ie. 0 is equal to "0.0" in DistanceGrades).
I then make a dataframe
df <- data.frame(z,SalaryAfter)
Now my attempt to create the plot gets a bit messy (I use the package 'fpp3', but I suppose only ggplot2 and maybe dplyr are actually needed)
df %>%
select(z, SalaryAfter) %>%
mutate(D = as.factor(ifelse(z >= -0.1, 1, 0))) %>%
ggplot(aes(x = z, y = SalaryAfter, color = D)) +
geom_point(stat = "identity") +
geom_smooth(method = "lm") +
geom_vline(xintercept = 0) +
theme(panel.grid = element_line(color = "white",
size = 0.75,
linetype = 1)) +
xlim(-6,5) +
xlab("Distance to acceptance score") +
labs(title = "Figur 1.1", subtitle = "Salary for every distance to the acceptance score")
Which plots:
What I'm trying to do is, first, split the data with a dummy variable D = 1 if z >= 0 and D = 0 if z < 0. Then I plot it with a linear regression on each side and a vertical line at z = 0. Lastly I add the title and subtitle. Now I have two problems:
The x-axis displays -5, -2.5, ... but I would like it to show all the integers; the non-integer values have no relation to the z variable, which is discrete. I have tried several methods (theme(panel.grid...), scale_x_discrete and many more), but the outcomes were all similar: the x-axis labels disappear completely, and sometimes even the axis title is removed.
I would like the regression line (with its confidence band) for the first part of the data to extend to z = 0.
When I try to solve both of these problems I again get similar results: most of the things I try produce no error message, but they either do nothing to my plot or remove some of the existing elements, which leaves me with more questions. I suppose the problem is caused by some of the elements not working together, but I have no idea which.
Try this:
library(tidyverse)
SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
372.682,388.939,386.994)
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
"0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")
z <- -5:4
df <- data.frame(z,SalaryAfter) %>%
select(z, SalaryAfter) %>%
mutate(D = as.factor(ifelse(z >= -0.1, 1, 0)))
# Fit a lm model for the left part of the panel
fit_data <- lm(SalaryAfter~z, data = filter(df, z <= -0.1)) %>%
predict(., newdata = data.frame(z = seq(-5, 0, 0.1)), interval = "confidence") %>%
as.data.frame() %>%
mutate(z = seq(-5, 0, 0.1), D = factor(0, levels = c(0, 1)))
# Plot
ggplot(mapping = aes(color = D)) +
geom_ribbon(data = filter(fit_data, z <= 0 & -1 <= z),
aes(x = z, ymin = lwr, ymax = upr),
fill = "grey70", color = "transparent", alpha = 0.5) +
geom_line(data = fit_data, aes(x = z, y = fit), size = 1) +
geom_point(data = df, aes(x = z, y = SalaryAfter), stat = "identity") +
geom_smooth(data = df, aes(x = z, y = SalaryAfter), method = "lm") +
geom_vline(xintercept = 0) +
theme(panel.grid = element_line(color = "white",
size = 0.75,
linetype = 1)) +
scale_x_continuous(limits = c(-6, 5), breaks = -6:5) +
xlab("Distance to acceptance score") +
labs(title = "Figure 1.1", subtitle = "Salary for every distance to the acceptance score")
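To see what extending the left-hand regression to the cutoff amounts to numerically, the sketch below fits a separate lm on each side of z = 0 and evaluates both at the cutoff. It reuses the question's data; the names left, right, at_cutoff and gap are made up for the illustration, and reading the difference as an RDD estimate is a simplification given that there are only five points per side.

```r
SalaryAfter <- c(287.780, 305.181, 323.468, 339.082, 344.738,
                 370.475, 373.257, 372.682, 388.939, 386.994)
z <- -5:4
df <- data.frame(z, SalaryAfter)

# One linear model per side of the cutoff (z = 0 belongs to the right side,
# matching D = ifelse(z >= -0.1, 1, 0) in the answer)
left  <- lm(SalaryAfter ~ z, data = subset(df, z < 0))
right <- lm(SalaryAfter ~ z, data = subset(df, z >= 0))

# Evaluate both fitted lines at the cutoff
at_cutoff <- data.frame(z = 0)
left_at_0  <- predict(left,  newdata = at_cutoff, interval = "confidence")
right_at_0 <- predict(right, newdata = at_cutoff, interval = "confidence")

# Vertical distance between the two lines at z = 0
gap <- right_at_0[, "fit"] - left_at_0[, "fit"]
gap
```

The left_at_0 row is exactly the last point of the extended line and ribbon drawn in the plot: predict() is simply asked for a z value outside the range the left model was fitted on.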
Suppose a dataset containing count data per multiple time periods and per multiple groups in the following format:
set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
week = rep(1:50, 3),
rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))
   group week rate
1      1    1  604
2      1    2  598
3      1    3  578
4      1    4  591
5      1    5  589
6      1    6  571
7      1    7  581
8      1    8  597
9      1    9  589
10     1   10  584
I'm interested in fitting a model-based trend line per group; however, I want this trend line to be displayed only from a certain x value onwards. To visualize the trend line using all data points (requires ggplot2):
df %>%
ggplot(aes(x = week,
y = rate,
group = group,
lty = group)) +
geom_line() +
geom_point() +
geom_smooth(method = "glm",
method.args = list(family = "quasipoisson"),
se = FALSE)
Or to fit a model based on a specific range of values (requires ggplot2 and dplyr):
df %>%
group_by(group) %>%
mutate(rate2 = ifelse(week < 35, NA, rate)) %>%
ggplot(aes(x = week,
y = rate,
group = group,
lty = group)) +
geom_line() +
geom_point() +
geom_smooth(aes(y = rate2),
method = "glm",
method.args = list(family = "quasipoisson"),
se = FALSE)
However, I cannot find a way to fit the models using all data but display the trend line only from a specific x value (let's say 35+). Thus, I essentially want the trend line as computed for plot one, but displayed as in the second plot, using ggplot2 and ideally only one pipeline.
I went to look at the after_stat function mentioned by @tjebo. Does the following work for you?
df %>%
ggplot(aes(x = week,
y = rate,
lty = group)) +
geom_line() +
geom_point() +
geom_smooth(method = "glm",
aes(group = after_stat(interaction(group, x > 35)),
colour = after_scale(alpha(colour, as.numeric(x > 35)))),
method.args = list(family = "quasipoisson"),
se = FALSE)
This works by splitting the points of each fitted line into two groups, x <= 35 and x > 35 (a single line cannot change colour along its length), and giving each new group its own colour transparency: fully transparent for x <= 35 and fully opaque for x > 35. As a result, only the line segments in the x > 35 region are visible.
When used, the code triggers a warning that the after_scale modification isn't applied to the legend. I don't think that's a problem though, since we don't need it to appear in the legend anyway.
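The transparency trick can be checked in isolation. The sketch below uses a made-up vector x and the colour "blue" (both are assumptions for the illustration) to show what alpha(colour, as.numeric(x > 35)) produces: an alpha channel of 0 where the condition is FALSE and full opacity where it is TRUE.

```r
library(ggplot2)  # re-exports alpha() from the scales package

# Hypothetical x positions on either side of the threshold
x <- c(30, 35, 36, 40)

# as.numeric(x > 35) is 0 or 1, which alpha() uses as the opacity
cols <- alpha("blue", as.numeric(x > 35))

# Inspect the alpha channel on the 0-255 scale: 0 = invisible, 255 = opaque
alpha_channel <- unname(col2rgb(cols, alpha = TRUE)["alpha", ])
alpha_channel
```

Positions with x <= 35 come back fully transparent, which is why those line segments are still computed and drawn but cannot be seen.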
If you can tolerate a warning, you can solve this with a one-line change to the example code, using stage().
library(tidyverse)
set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
week = rep(1:50, 3),
rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))
df %>%
ggplot(aes(x = week,
y = rate,
group = group,
lty = group)) +
geom_line() +
geom_point() +
geom_smooth(method = "glm",
method.args = list(family = "quasipoisson"),
aes(x = stage(week, after_stat = ifelse(x > 35, x, NA))),
se = FALSE)
#> `geom_smooth()` using formula 'y ~ x'
#> Warning: Removed 165 rows containing missing values (geom_smooth).
One way to do this is to construct the fitted values outside of ggplot so you have control over them:
df$fit <- glm(rate ~ week + group, data = df, family = "quasipoisson")$fitted.values
library(dplyr)
library(ggplot2)
ggplot(df, aes(x = week, group = group, lty = group)) +
geom_line(aes(y = rate)) +
geom_point(aes(y = rate)) +
geom_line(data = df %>% filter(week >= 35), aes(y = fit), color = "blue", size = 1.25)
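One caveat: the model rate ~ week + group shares a single slope across groups, whereas geom_smooth fits a separate model for each group. If the goal is to reproduce the geom_smooth lines exactly, a per-group fit is closer. The following is a sketch under that assumption, using a plain loop; the names idx, m and trend are made up, and linewidth needs ggplot2 3.4.0 or later (use size on older versions).

```r
library(ggplot2)

set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))

# Fit one quasi-Poisson glm per group on ALL weeks and store the fitted values
df$fit <- NA_real_
for (g in levels(df$group)) {
  idx <- df$group == g
  m <- glm(rate ~ week, data = df[idx, ], family = quasipoisson)
  df$fit[idx] <- fitted(m)
}

# Keep only the part of the fitted line that should be displayed
trend <- subset(df, week >= 35)

p <- ggplot(df, aes(x = week, group = group, lty = group)) +
  geom_line(aes(y = rate)) +
  geom_point(aes(y = rate)) +
  geom_line(data = trend, aes(y = fit), color = "blue", linewidth = 1.25)
p
```

The models see all 50 weeks; only the drawing step is restricted to week >= 35.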
I am not sure it is generally correct to use a linear model for time series. The whole point of time series methods is that the data need specific statistics because of their expected autocorrelation. You might want something like a rolling average instead.
I am also not sure whether your visualisation might be quite confusing and, more dangerously, misleading.
That said, it's an interesting problem. I thought the new after_stat might somehow help, but I couldn't get it working.
So, here is a quick hack: change the order of your geoms and draw a rectangle in between. I am cheekily using a different theme, but if you really want to use theme_grey(), you can fake the axis lines as well.
library(tidyverse)
set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
week = rep(1:50, 3),
rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))
df %>%
ggplot(aes(x = week, y = rate, group = group, lty = group)) +
stat_smooth(se = FALSE) +
geom_rect(xmin = -Inf, xmax = 35, ymin = -Inf, ymax = Inf,
fill = "white") +
geom_line() +
geom_point() +
theme_classic()
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2021-02-09 by the reprex package (v1.0.0)
P.S. I've removed a few of the unnecessary bits in the code to reproduce this, like the model specs.
You could use ggplot_build to get the structure of the plot:
p <- ggplot(df, aes(x = week,
y = rate,
group = group,
lty = group)) +
geom_line() +
geom_point() +
geom_smooth(method = "glm",
method.args = list(family = "quasipoisson"),
se = FALSE)
p_build <- ggplot_build(p)
You could then modify the internal data, here the third element of the data list (geom_smooth):
p_build$data[[3]]$x <- sapply(p_build$data[[3]]$x,function(x) {ifelse(x<35,NA,x)})
and use ggplot_gtable to regenerate the plot (the glm fit is still computed on the whole dataset):
plot(ggplot_gtable(p_build))
Consider the MWE below. I would like to generate boxplots with these ideas in mind:
Food on the y-axis, ordered according to Amt for Home, with Amt (1:40) on the x-axis
mean points overlaid on the boxes
boxplots of Food ordered by the median for the Home Site, taken from the dfsummary data
text annotations with the number of observations N (also taken from the dfsummary data)
MWE
df <- data.frame(
Site = sample(rep(c("Home", "Office"), size = 884)),
Food = sample(rep(c("Banana","Apple","Egg","Berry","Tomato","Potato","Bean","Pea","Nuts","Onion","Carrot","Cabbage","Eggplant"), size=884)),
Amt = sample(seq(1, 40, by = 0.25), size = 884, replace = TRUE)
)
random <- sample(seq(1, 884, by = 1), size = 100, replace = TRUE) # to randomly introduce 100 NAs to Amt vector
df$Amt[random] <- NA
Summary code
dfsummary <- df %>%
dplyr::group_by(Food, Site) %>%
dplyr::summarise(Median = round(median(Amt, na.rm=TRUE), digits=2), N = sum(!is.na(Amt))) %>%
ungroup()
ggplot code
p1 <- ggplot(df, aes(Amt, Food)) +
geom_boxplot() +
facet_grid(facets = . ~ Site)
Graph
I was expecting to see boxplots here.
Adding annotation
p2 <- p1 + geom_text(aes(y = 42, Food, label = paste("n=", N)), data = dfsummary, size = 3, nudge_x = 0.1) +
facet_grid(facets = . ~ Site)
Unfortunately, this doesn't work either.
Note
tidyverse version is 1.3.0
R version 3.6.2 (2019-12-12) -- "Dark and Stormy Night"
To work out the problem, you may want to generate a scatter plot, first:
library(ggplot2)
p1 <- ggplot(df, aes(Amt, Food)) +
geom_point() +
facet_grid(facets = . ~ Site)
p1
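As you can see, Amt is continuous on the x-axis and Food is discrete on the y-axis. With this orientation geom_boxplot() has no discrete x to group by, so it cannot draw the boxes (in ggplot2 3.2.1 and earlier).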
However, if you switch x and y
ggplot(df, aes(Food, Amt)) +
geom_boxplot() +
facet_grid(facets = . ~ Site)
You get:
This works just fine in the current development version of ggplot2, to be released in January 2020.
# If your ggplot2 version is <= 3.2.1, do:
# remotes::install_github("tidyverse/ggplot2")
library(tidyverse)
df <- data.frame(
Site = sample(rep(c("Home", "Office"), size = 884)),
Food = sample(rep(c("Banana","Apple","Egg","Berry","Tomato","Potato","Bean","Pea","Nuts","Onion","Carrot","Cabbage","Eggplant"), size=884)),
Amt = sample(seq(1, 40, by = 0.25), size = 884, replace = TRUE)
)
random <- sample(seq(1, 884, by = 1), size = 100, replace = TRUE) # to randomly introduce 100 NAs to Amt vector
df$Amt[random] <- NA
ggplot(df, aes(Amt, Food)) +
geom_boxplot() +
facet_grid(facets = . ~ Site)
#> Warning: Removed 98 rows containing non-finite values (stat_boxplot).
Created on 2020-01-01 by the reprex package (v0.3.0)
With annotations:
library(tidyverse)
df <- data.frame(
Site = sample(rep(c("Home", "Office"), size = 884)),
Food = sample(rep(c("Banana","Apple","Egg","Berry","Tomato","Potato","Bean","Pea","Nuts","Onion","Carrot","Cabbage","Eggplant"), size=884)),
Amt = sample(seq(1, 40, by = 0.25), size = 884, replace = TRUE)
)
random <- sample(seq(1, 884, by = 1), size = 100, replace = TRUE) # to randomly introduce 100 NAs to Amt vector
df$Amt[random] <- NA
dfsummary <- df %>%
dplyr::group_by(Food, Site) %>%
dplyr::summarise(Median = round(median(Amt, na.rm=TRUE), digits=2), N = sum(!is.na(Amt))) %>%
ungroup()
ggplot(df, aes(Amt, Food)) +
geom_boxplot() +
geom_text(
aes(x = 42, Food, label = paste("n=", N)),
data = dfsummary,
size = 3, nudge_x = 0.1
) +
facet_grid(facets = . ~ Site)
#> Warning: Removed 95 rows containing non-finite values (stat_boxplot).
Created on 2020-01-01 by the reprex package (v0.3.0)
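The ordering and the mean points asked for in the question are not covered above. One possible approach, sketched here on made-up toy data, is to reorder the Food factor levels by the Home medians before plotting and to add the means with stat_summary. The toy data, the names home and home_median, and the red diamond styling are assumptions; the horizontal orientation and the fun and orientation arguments assume ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Toy data with the same columns as the question (values are made up)
set.seed(42)
foods <- c("Banana", "Apple", "Egg", "Berry")
df <- data.frame(
  Site = sample(c("Home", "Office"), 200, replace = TRUE),
  Food = sample(foods, 200, replace = TRUE),
  Amt  = sample(seq(1, 40, by = 0.25), 200, replace = TRUE)
)

# Median Amt per Food, computed on the Home rows only
home <- df[df$Site == "Home", ]
home_median <- tapply(home$Amt, home$Food, median, na.rm = TRUE)

# Reorder the factor levels so the axis follows the Home medians
df$Food <- factor(df$Food, levels = names(sort(home_median)))

p <- ggplot(df, aes(Amt, Food)) +
  geom_boxplot() +
  # Overlay the group means as points on the horizontal boxes
  stat_summary(fun = mean, geom = "point", orientation = "y",
               shape = 18, size = 3, colour = "red") +
  facet_grid(. ~ Site)
p
```

Because the ordering lives in the factor levels, both facets share the same Food order, which is the order of the Home medians.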
I have the following data. Each observation is a genomic coordinate with copy number changes (copy.number.type) which is found in some percentage of samples (per.found).
chr<-c('1','12','2','12','12','4','2','X','12','12','16','16','16','5'
,'4','16','X','16','16','4','1','5','2','4','5','X','X','X','4',
'1','16','16','1','4','4','12','2','X','1','16','16','2','1','12',
'2','2','4','4','2','1','5','X','4','2','12','16','2','X','4','5',
'4','X','5','5')
start <- c(247123880,91884413,88886155,9403011,40503634,10667741,88914884,
100632615,25804205,25803542,18925987,21501823,21501855,115902990,
26120955,22008406,432498,22008406,22008406,69306802,4144380,73083197,
47743372,34836043,16525257,315832,1558229,51048657,49635818,239952709,
69727769,27941625,80328938,49136485,49136654,96076105,133702693,315823,
16725215,69728318,88520557,89832606,202205081,124379013,16045662,89836880,
49657307,97117994,76547133,35051701,344973,1770075,49139874,77426085,
9406416,69727781,108238962,151006944,49121333,6669602,89419843,74214551,
91203955,115395615)
type <- c('Inversions','Deletions','Deletions','Deletions','Deletions','Duplications','Deletions','Deletions',
'Duplications','Deletions','Duplications','Inversions','Inversions','Deletions','Duplications',
'Deletions','Deletions','Deletions','Deletions','Inversions','Duplications','Inversions','Inversions',
'Inversions','Deletions','Deletions','Deletions','Insertions','Deletions','Inversions','Inversions',
'Inversions','Inversions','Deletions','Deletions','Inversions','Deletions','Deletions','Inversions',
'Inversions','Deletions','Deletions','Deletions','Insertions','Inversions','Deletions','Deletions',
'Deletions','Inversions','Deletions','Duplications','Inversions','Deletions','Deletions','Deletions',
'Inversions','Deletions','Inversions','Deletions','Inversions','Inversions','Inversions','Deletions','Deletions')
per.found <- c(-0.040,0.080,0.080,0.040,0.080,0.040,0.080,0.040,0.040,0.120,0.040,-0.080,-0.080,0.040,0.040,0.120,
0.040,0.120,0.120,-0.040,0.011,-0.011,-0.023,-0.023,0.011,0.023,0.011,0.011,0.011,-0.011,-0.034,
-0.011,-0.023,0.011,0.011,-0.011,0.023,0.023,-0.023,-0.034,0.011,0.023,0.011,0.011,-0.023,0.023,
0.011,0.011,-0.011,0.011,0.011,-0.023,0.011,0.057,0.011,-0.034,0.023,-0.011,0.011,-0.011,-0.023,
-0.023,0.011,0.011)
df <- data.frame(chromosome = chr, start.coordinate = start, copy.number.type = type, per.found = per.found )
I would like to create a line plot. I created a plot using ggplot with facets, but the problem is that I cannot connect the points between two facets. Is there any way to do that? I do not necessarily need to use facets if there is a way to annotate the x-axis scale by chromosome. In the following image the dotted line shows what I would like to have for all copy.number.type lines.
EDIT: Looking for simplified approach.
library(ggplot2)
ggplot(df, aes(x=start.coordinate,y=per.found, group=copy.number.type, color=copy.number.type))+
geom_line()+
geom_point()+
facet_grid(.~chromosome,scales = "free_x", space = "free_x")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Desired output: As shown by the red dashed lines. I want to connect all the border points with a dashed line across facets.
Note: it may not make sense to connect the lines between the chromosomes.
But here is one way, by avoiding facets:
library(dplyr)
df2 <- df %>%
mutate(chromosome = factor(chromosome, c(1, 2, 4, 5, 12, 16, 'X'))) %>%
arrange(chromosome, start.coordinate)
chromosome_positions <- df2 %>%
group_by(chromosome) %>%
summarise(start = first(start.coordinate), end = last(start.coordinate)) %>%
mutate(
size = end - start,
new_start = cumsum(lag(size, default = 0)),
new_end = new_start + size
)
df3 <- df2 %>%
left_join(chromosome_positions, 'chromosome') %>%
mutate(new_x = start.coordinate + (new_start - start))
ggplot(df3, aes(x=new_x,y=per.found, group=copy.number.type, color=copy.number.type))+
geom_rect(
aes(xmin = new_start, xmax = new_end, ymin = -Inf, ymax = Inf, fill = chromosome),
chromosome_positions, inherit.aes = FALSE, alpha = 0.3
) +
geom_line() +
geom_point() +
geom_text(
aes(x = new_start + 0.5 * size, y = Inf, label = chromosome),
chromosome_positions, inherit.aes = FALSE, vjust = 1
) +
scale_fill_manual(values = rep(c('grey60', 'grey90'), 10), guide = 'none') +
scale_x_continuous(expand = c(0, 0))
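The key step is the coordinate shift: each chromosome's observed range is moved so that it starts where the previous one ends. A small base R sketch with made-up numbers (three hypothetical chromosomes; the names pos and new_x are only for illustration) shows the arithmetic behind new_start, new_end and new_x:

```r
# Hypothetical first and last observed coordinate per chromosome
pos <- data.frame(
  chromosome = c("1", "2", "4"),
  start = c(100, 50, 200),
  end   = c(110, 70, 205)
)
pos$size <- pos$end - pos$start

# Each chromosome starts at the summed sizes of all previous chromosomes;
# this is what cumsum(lag(size, default = 0)) does in the dplyr version
pos$new_start <- cumsum(c(0, head(pos$size, -1)))
pos$new_end   <- pos$new_start + pos$size
pos$new_start  # 0 10 30

# A point at coordinate 60 on chromosome "2" is shifted by (new_start - start)
chr2 <- pos[pos$chromosome == "2", ]
new_x <- 60 + (chr2$new_start - chr2$start)
new_x  # 20
```

After the shift all chromosomes share one continuous x-axis, so a single geom_line can run across the chromosome boundaries.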