I am working on a boxplot with overlaid points and lines connecting each examiner's points across two time points. Example data are provided below.
I have two questions:
I would like the points to be jittered a little vertically and more horizontally. However, I want the points at each y value to be placed symmetrically around the center of the boxplot (to make the plot more visually pleasing). For example, I would like the 6 data points at y = 4 and x = "after" to be placed with 3 to the right and 3 to the left of the boxplot center, at symmetrical distances from it.
Also, I want the lines to connect the correct points, but currently the lines start and end in the wrong places. I know I can use position = position_dodge() in geom_point() and geom_line() to get the correct positions, but I also want to be able to shift the points vertically. (Why do the points and lines align with position_dodge() but not with position_jitter()?)
Is it possible to achieve these two things?
Thank you!
library(ggplot2)
examiner <- rep(1:15, 2)
time <- rep(c("before", "after"), each = 15)
result <- c(1,3,2,3,2,1,2,4,3,2,3,2,1,3,3,3,4,4,5,3,4,3,2,2,3,4,3,4,4,3)
data <- data.frame(examiner, time, result)
ggplot(data, aes(time, result, fill = time)) +
  geom_boxplot() +
  geom_point(aes(group = examiner),
             position = position_jitter(width = 0.2, height = 0.03)) +
  geom_line(aes(group = examiner),
            position = position_jitter(width = 0.2, height = 0.03), alpha = 0.3)
I'm not sure you can satisfy both requirements at the same time.
You can get a more "symmetric" jitter by using geom_dotplot() instead:
ggplot(data, aes(time, result, fill = time)) +
  geom_boxplot() +
  geom_dotplot(binaxis = "y", aes(x = time, y = result, group = time),
               stackdir = "center", binwidth = 0.075)
The problem is that when you add the lines, they will join at the original, un-jittered points.
To join jittered points with lines that pass through those same jittered positions, add the jitter to the data before plotting. As you saw, jittering each layer separately gives points and lines that don't match, because each layer draws its own random offsets. See Connecting grouped points with lines in ggplot for a better explanation.
library(dplyr)
library(ggplot2)
data <- data %>%
  mutate(result_jit = jitter(result, amount = 0.1),
         # numeric x position of each box: "after" is the first factor level, "before" the second
         time_jit = jitter(case_when(
           time == "before" ~ 2,
           time == "after" ~ 1
         ), amount = 0.1)
  )
ggplot(data, aes(time, result, fill = time)) +
  geom_boxplot() +
  geom_point(aes(x = time_jit, y = result_jit, group = examiner)) +
  geom_line(aes(x = time_jit, y = result_jit, group = examiner), alpha = 0.3)
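As for why position_dodge() lines up while position_jitter() does not: dodging places each group at a fixed, deterministic offset, so two layers get identical positions, whereas every jitter draws new random offsets. The same effect can be seen with base R's jitter() on a small vector (x below is made up for illustration, not the question's data):

```r
set.seed(1)
x <- c(1, 1, 2, 2)  # illustrative positions only

# Two independent jitters, as when geom_point() and geom_line() each get
# their own position_jitter(): the random offsets differ
points_x <- jitter(x, amount = 0.2)
lines_x  <- jitter(x, amount = 0.2)
layers_match <- isTRUE(all.equal(points_x, lines_x))  # FALSE

# Jittering once and storing the result (as in the mutate() above) means both
# layers read the same column, so they match by construction
x_jit <- jitter(x, amount = 0.2)
max_shift <- max(abs(x_jit - x))  # never more than `amount`
```

Calling set.seed() before the mutate() above also makes the jittered layout reproducible from one run to the next.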
It is possible to extract the transformed points from the geom_dotplot using ggplot_build() - see Is it possible to get the transformed plot data? (e.g. coordinates of points in dot plot, density curve)
These points can be merged into the original data and used as anchor points for geom_line().
Putting it all together:
library(dplyr)
library(ggplot2)
examiner <- rep(1:15, 2)
time <- rep(c("before", "after"), each = 15)
result <- c(1,3,2,3,2,1,2,4,3,2,3,2,1,3,3,3,4,4,5,3,4,3,2,2,3,4,3,4,4,3)
# Create a numeric version of time ("after" = 1, "before" = 2, matching the factor order)
data <- data.frame(examiner, time, result) %>%
  mutate(group = case_when(
    time == "before" ~ 2,
    time == "after" ~ 1)
  )
# Build a ggplot of the dotplot to extract data
dotpoints <- ggplot(data, aes(time, result, fill = time)) +
  geom_dotplot(binaxis = "y", aes(x = time, y = result, group = time),
               stackdir = "center", binwidth = 0.075)
# Extract values of the dotplot; stackpos is each dot's position within its stack.
# The factor 1.2 is an empirical spacing: the dots are stacked in device units,
# so it may need tuning if the plot size or aspect ratio changes.
dotpoints_dat <- ggplot_build(dotpoints)[["data"]][[1]] %>%
  mutate(key = row_number(),
         x = as.numeric(x),
         newx = x + 1.2 * stackpos * binwidth / 2) %>%
  select(key, x, y, newx)
# Join the extracted values to the original data; rows must be in the same
# order as the dotplot data (by box, then by value)
data <- arrange(data, group, result) %>%
  mutate(key = row_number())
newdata <- inner_join(data, dotpoints_dat, by = "key") %>%
  select(-key)
# Create final plot
ggplot(newdata, aes(time, result, fill = time)) +
  geom_boxplot() +
  geom_dotplot(binaxis = "y", aes(x = time, y = result, group = time),
               stackdir = "center", binwidth = 0.075) +
  geom_line(aes(x = newx, y = result, group = examiner), alpha = 0.3)
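A simpler alternative, which avoids ggplot_build() and the hand-tuned 1.2 factor, is to compute symmetric offsets directly: within each time/result cell, number the points 1 to n and center that index on 0, so the 6 points at x = "after", y = 4 get 3 offsets on each side of the box center. This is a sketch of a different technique, not the dotplot method above; the spacing of 0.05 and the vertical jitter of 0.03 are assumed values to tune. Because every offset is stored in the data once, points and lines share the same positions.

```r
library(ggplot2)

examiner <- rep(1:15, 2)
time <- rep(c("before", "after"), each = 15)
result <- c(1,3,2,3,2,1,2,4,3,2,3,2,1,3,3,3,4,4,5,3,4,3,2,2,3,4,3,4,4,3)
data <- data.frame(examiner, time, result)

# Numeric center of each box: factor levels sort alphabetically ("after" = 1, "before" = 2)
data$x_center <- as.numeric(factor(data$time))

# Within each (time, result) cell, index the points 1..n and center the index on 0:
# a cell with 6 points gets offsets -2.5, -1.5, ..., 2.5 (times `spacing`)
spacing <- 0.05  # assumed horizontal gap between neighboring points
idx <- ave(data$result, data$time, data$result, FUN = seq_along)
n   <- ave(data$result, data$time, data$result, FUN = length)
data$x_sym <- data$x_center + (idx - (n + 1) / 2) * spacing

# Optional small vertical jitter, drawn once so points and lines share it
set.seed(1)
data$y_jit <- jitter(data$result, amount = 0.03)

p <- ggplot(data, aes(time, result, fill = time)) +
  geom_boxplot(outlier.shape = NA) +  # the points below already show every value
  geom_point(aes(x = x_sym, y = y_jit)) +
  geom_line(aes(x = x_sym, y = y_jit, group = examiner), alpha = 0.3)
p
```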
I have a dataset at the municipality level. I would like to draw a histogram of a given variable and, at the same time, fill the bars with another continuous variable (using a color gradient). This is because I believe the municipalities with low values of the variable I am plotting have very different population sizes (on average) compared with the municipalities at the upper end of the distribution.
Using the mtcars data, say I would like to plot the distribution of mpg and fill the bars with a continuous color representing the mean of wt for each histogram bar. I typed the code below, but I don't know how to make the fill option take the average of wt. I would like a legend with a color gradient showing whether the mean value of wt for each histogram bar is low, medium, or high in relative terms.
mtcars %>%
  ggplot(aes(x = mpg, fill = wt)) +
  geom_histogram()
If you want a genuine histogram, you need to summarize your data first and plot it with geom_col() rather than geom_histogram(). The base R function hist() will help you generate the breaks and midpoints:
library(ggplot2)
library(dplyr)
mtcars %>%
  mutate(mpg = cut(x = mpg,
                   breaks = hist(mpg, breaks = 0:4 * 10, plot = FALSE)$breaks,
                   labels = hist(mpg, breaks = 0:4 * 10, plot = FALSE)$mids)) %>%
  group_by(mpg) %>%
  summarize(n = n(), wt = mean(wt)) %>%
  ggplot(aes(x = as.numeric(as.character(mpg)), y = n, fill = wt)) +
  scale_x_continuous(limits = c(0, 40), name = "mpg") +
  geom_col(width = 10) +
  theme_bw()
It is not exactly a histogram, but it was the closest I could think of for your problem:
library(tidyverse)
mtcars %>%
  # Create breaks for mpg, where this sequence is just an example
  mutate(mpg_cut = cut(mpg, seq(10, 35, 5))) %>%
  # Count and mean of wt by mpg_cut
  group_by(mpg_cut) %>%
  summarise(
    n = n(),
    wt = mean(wt)
  ) %>%
  ggplot(aes(x = mpg_cut, fill = wt)) +
  # Bar plot
  geom_col(aes(y = n), width = 1)
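Both answers above place the bars on a category axis or assume equal-width bins. If the bins should sit on a true continuous mpg axis (which also allows unequal bin widths), one option is to compute the bins once with hist() and draw them with geom_rect(). A sketch, where the breaks seq(10, 35, 5) and the color choices are only examples:

```r
library(ggplot2)

breaks <- seq(10, 35, by = 5)  # example bin edges; any increasing vector covering mpg works
h <- hist(mtcars$mpg, breaks = breaks, plot = FALSE)

# hist() uses right-closed bins with the lowest edge included; cut() with the same
# settings puts each car in the same bin, so the per-bin mean of wt lines up with h$counts
bin <- cut(mtcars$mpg, breaks = breaks, right = TRUE, include.lowest = TRUE)

bins <- data.frame(
  xmin    = head(breaks, -1),
  xmax    = tail(breaks, -1),
  n       = h$counts,
  mean_wt = as.vector(tapply(mtcars$wt, bin, mean))  # NA for any empty bin
)

p <- ggplot(bins, aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = n, fill = mean_wt)) +
  geom_rect(colour = "white") +
  scale_fill_gradient(low = "lightblue", high = "darkblue", name = "mean wt") +
  labs(x = "mpg", y = "count") +
  theme_bw()
p
```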
I have created a stacked bar plot with the counts of a variable. I want to keep these as counts, so that the different bar heights represent different group sizes. However, inside the bars I would like to add labels that show the proportion of each stacked segment as a percentage.
I managed to create the stacked count plot for every group, and I have created the labels, which are placed correctly. What I am struggling with is how to calculate the percentages.
I have tried this, but I get an error:
dataex <- iris %>%
  dplyr::group_by(group, Species) %>%
  dplyr::summarise(N = n())
names(dataex)
dataex <- as.data.frame(dataex)
str(dataex)
ggplot(dataex, aes(x = group, y = N, fill = factor(Species))) +
  geom_bar(position = "stack", stat = "identity") +
  geom_text(aes(label = ifelse((..count..) == 0, "", scales::percent((..count..)/sum(..count..)))), position = position_stack(vjust = 0.5), size = 3) +
  theme_pubclean()
Error in (count) == 0 : comparison (1) is possible only for atomic and list types
Well, I just found an answer, or at least a workaround. Maybe this will help someone in the future: calculate the percentages before calling ggplot() and then use that column as the labels.
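library(dplyr)
library(ggplot2)
library(ggpubr)  # for theme_pubclean()
dataex <- iris %>%
  dplyr::group_by(group, Species) %>%
  dplyr::summarise(N = n()) %>%
  # summarise() drops the last grouping level, so the data are still grouped by
  # `group` here and N / sum(N) is each species' share of its own bar
  dplyr::mutate(pct = paste0(round(N / sum(N) * 100, 2), " %"))
names(dataex)
dataex <- as.data.frame(dataex)
str(dataex)
ggplot(dataex, aes(x = group, y = N, fill = factor(Species))) +
  geom_bar(position = "stack", stat = "identity") +
  geom_text(aes(label = pct), position = position_stack(vjust = 0.5), size = 3) +
  theme_pubclean()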
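Since iris has no group column, neither snippet above runs as-is. A base R sketch of the same idea, using a made-up group column (hypothetical: long vs short sepals, only so the example runs) and leaving zero-count segments unlabelled as the original attempt intended:

```r
library(ggplot2)

# Hypothetical grouping, only for illustration: iris itself has no `group` column
iris2 <- transform(iris, group = ifelse(Sepal.Length > 5.8, "long", "short"))

# One row per group x Species combination with its count N (zero counts included)
counts <- as.data.frame(table(group = iris2$group, Species = iris2$Species),
                        responseName = "N")

# Share of each segment within its own bar: divide by that bar's total count
counts$pct <- counts$N / ave(counts$N, counts$group, FUN = sum)
counts$label <- ifelse(counts$N == 0, "", sprintf("%.1f %%", 100 * counts$pct))

p <- ggplot(counts, aes(x = group, y = N, fill = Species)) +
  geom_col(position = "stack") +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 3)
p
```

The bar heights still show raw counts; only the text labels are percentages, computed per bar rather than over the whole data set.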
I'm making a graph of the expression of multiple genes across multiple subjects, showing the data points and smoothed conditional means with their confidence intervals, but the points and lines are obscured by the fill of the confidence intervals. Is there a way to bring the points and lines to the front, or to make the confidence-interval fill lighter, so that the points and lines are more visible?
library(forcats)
library(ggplot2)
library(tidyr)
tbl_long <- data1 %>%
  gather(gene, expression, -X)
tbl_long %>%
  ggplot(aes(x = fct_inorder(X), y = expression, color = gene, group = gene)) +
  geom_point() +
  geom_smooth(aes(fill = gene)) +
  theme_classic()
I'm a beginner R user, so any help would be much appreciated.
library(dplyr)
library(forcats)
library(ggplot2)
library(readr)
library(tidyr)
"X,ALDOA,ALDOC,GPI,GAPDHS,LDHA,PGK1,PKLR
C1,-0.643185598,-0.645053078,-0.087097464,-0.343085671,-0.770712771,0.004189881,0.088937264
C2,-0.167424935,-0.414607255,0.049551335,-0.405339423,-0.182211808,-0.127414498,-0.313125427
C3,-0.81858642,-0.938110755,-1.141371324,-0.212165875,-0.582733509,-0.299505078,-0.417053296
C4,-0.83403929,-0.36359332,-0.731276681,-1.173581357,-0.42953985,-0.14434282,-0.861271021
C5,-0.689384044,-0.833311409,-0.622961915,-1.13983245,0.479864518,-0.353765462,-0.787467172
C6,-0.465153207,-0.740128773,-0.05430084,0.499455778,-0.692945684,-0.215067456,-0.460695935
S2,0.099525323,0.327565645,-0.315537278,0.065457821,0.78394394,0.189251447,0.11684847
S3,0.33216583,0.190001824,0.749459725,0.224739679,-0.138610536,-0.420150288,0.919318891
S4,0.522281547,0.278411886,1.715325626,0.534957031,1.130054777,-0.129296273,1.803756399
S5,0.691225088,0.665540011,1.661124529,0.662320212,0.267803229,0.853683613,1.105808889
S6,1.269616976,1.86390714,2.069219749,1.312324149,1.498836807,1.794147633,0.842335285
S7,1.254166133,1.819075004,0.44893804,0.438435159,0.482694339,0.446939822,0.802671992
S8,0.751743085,0.702057721,0.657752337,1.668582798,-0.186354601,1.214976683,0.287904556
S9,0.091028475,-0.214746307,0.037471169,-0.90747123,-0.172209571,0.062382102,0.136354703
S10,1.5792826,1.736452158,0.194961866,0.706323594,1.396245579,0.208168636,0.883114282
R2,-0.36289097,-0.252649755,0.026497148,-0.026676693,-0.720750516,-0.087657548,0.390400605
R3,0.106992251,0.290831853,-0.815393104,-0.020562949,-0.579128953,-0.222087138,0.603723294
R4,0.208230649,0.533552023,-0.116632671,1.126588341,-0.09646495,0.157577458,-0.402493353
R5,-0.10781116,0.436174594,-0.969979695,-1.298192703,0.541570124,-0.07591813,-0.704663307
R6,-0.282867322,-0.960902616,0.184185506,-1.215118472,0.856165556,-0.256458847,-1.528611038
R7,-0.300331377,-0.918484952,0.191947526,-0.895049036,1.200294702,0.7120941,-0.047383224
R8,0.278804568,-0.07335879,0.300083636,0.37631121,-0.288228181,0.427576413,0.631281194
R9,0.393632652,0.228379711,-0.201269856,1.731887958,0.141541807,0.242716283,0.154875397
R10,0.731821818,0.058779515,-0.310899832,0.578285435,-0.474621274,0.126920851,0.017104493" %>%
  read_csv() -> tbl_wide
tbl_long <- tbl_wide %>%
  gather(gene, expression, -X)
tbl_long %>%
  ggplot(aes(x = fct_inorder(X), y = expression, color = gene, fill = gene, group = gene)) +
  geom_smooth(method = "loess", alpha = 0.1) +
  geom_point() +
  labs(x = "Location",
       y = "Expression",
       color = "Gene",
       fill = "Gene") +
  theme_classic()
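The two changes in this answer are what fix the problem: ggplot2 draws layers in the order they are added, so geom_point() placed after geom_smooth() ends up on top of the ribbons, and alpha in geom_smooth() sets the transparency of the confidence band only, while the smoothed line stays opaque. A minimal sketch of the same idea on ggplot2's built-in mpg data (a stand-in for the gene data, grouped by drv instead of gene):

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv, fill = drv)) +
  geom_smooth(method = "loess", formula = y ~ x, alpha = 0.1) +  # faint ribbons, drawn first
  geom_point()                                                   # drawn last, so on top
p
```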
I need to overlay normal density curves on 3 histograms sharing the same y-axis. The curves need to be separate for each histogram.
My dataframe (example):
height <- seq(140,189, length.out = 50)
weight <- seq(67,86, length.out = 50)
fev <- seq(71,91, length.out = 50)
df <- as.data.frame(cbind(height, weight, fev))
I created the histograms for the data as:
library(ggplot2)
library(tidyr)
df %>%
  gather(key = Type, value = Value) %>%
  ggplot(aes(x = Value, fill = Type)) +
  geom_histogram(binwidth = 8, position = "dodge")
I am now stuck on how to overlay normal density curves for the 3 variables (a separate curve for each histogram) on the histograms I have generated. I don't mind whether the final figure shows count or density on the y-axis.
Any thoughts on how to proceed from here?
Thanks in advance.
I believe the code in the question is almost right; the code below just uses the answer in the link provided by @akrun.
Note that I have commented out the call to facet_wrap; uncomment it (together with the last plus sign) to get one panel per variable.
library(ggplot2)
library(tidyr)
df %>%
  gather(key = Type, value = Value) %>%
  ggplot(aes(x = Value, color = Type, fill = Type)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 8, position = "dodge") +
  geom_density(alpha = 0.25) #+
  #facet_wrap(~ Type)
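One caveat: geom_density() draws a kernel density estimate of each variable, not a normal curve. If the curves should be normal densities, one option is to compute them with dnorm() from each variable's mean and standard deviation and add them with geom_line() on the density-scaled histogram. A sketch, rebuilding the question's df in long format with base R instead of gather(); the 100-point grid is an arbitrary choice, and after_stat() needs ggplot2 3.3.0 or later (older versions use ..density..):

```r
library(ggplot2)

height <- seq(140, 189, length.out = 50)
weight <- seq(67, 86, length.out = 50)
fev <- seq(71, 91, length.out = 50)
df <- data.frame(height, weight, fev)

# Long format: one row per observation, with the variable name in Type
long <- data.frame(Type = rep(names(df), each = nrow(df)),
                   Value = unlist(df, use.names = FALSE))

# For each Type, evaluate a normal density with that variable's mean and sd on a grid
curves <- do.call(rbind, lapply(split(long, long$Type), function(d) {
  x <- seq(min(d$Value), max(d$Value), length.out = 100)
  data.frame(Type = d$Type[1], Value = x,
             density = dnorm(x, mean = mean(d$Value), sd = sd(d$Value)))
}))

p <- ggplot(long, aes(x = Value, fill = Type)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 8, position = "dodge") +
  geom_line(data = curves, aes(x = Value, y = density, colour = Type))
p
```

Because stat_bin() computes density separately for each Type, each histogram and its normal curve are on the same scale.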