ggplot2: fit geom_smooth() as if a categorical variable were continuous (R)

I have a continuous variable on the y axis and a categorical one on the x axis. The categories have a meaningful order, so it would make sense to fit a regression against their index: instead of c('a', 'b', 'c'), use the indices (order(c('a', 'b', 'c')), which is c(1, 2, 3)) and fit the model against those. However, ggplot refuses to fit geom_smooth(method = lm) if one of the variables is not numeric. OK, then I can tell it to use the order:
geom_smooth(aes(x = order(hgcc), y = rtmean), method = lm)
But then it takes the indices of the whole column of the data frame, which does not work with faceting with scales = 'free', when only a subset of the levels of the x variable appears in each panel. The indices in the whole data frame are much higher on average, so the regression line is plotted far to the right:
Here is a minimal working example:
require(ggplot2)
load(url('http://www.ebi.ac.uk/~denes/54b510889336eb2591d8beff/sample_data.RData'))
ggplot(adata12cc, aes(x = hgcc, y = rtmean, color = cls, size = log10(intensity))) +
  geom_point(stat = 'sum', alpha = 0.33) +
  geom_smooth(
    aes(x = order(hgcc), y = rtmean),
    method = 'glm') +
  facet_wrap( ~ uhgroup, scales = 'free') +
  scale_radius(guide = guide_legend(title = 'Intensity (log)')) +
  scale_color_discrete(guide = guide_legend(title = 'Class')) +
  xlab('Carbon count unsaturation') +
  ylab('Mean RT [min]') +
  ggtitle('RT vs. carbon count & unsaturation by headgroup') +
  theme(axis.title = element_text(size = 24),
        axis.text.x = element_text(angle = 90, vjust = 0.5, size = 9, hjust = 1),
        axis.text.y = element_text(size = 11),
        plot.title = element_text(size = 21),
        strip.text = element_text(size = 18),
        panel.grid.minor.x = element_blank())
I know this is not the nice way of doing things, but ggplot could make life so much easier if I could refer to the variables as they are subset by the faceting and compute something on them per panel.

I think I've got a solution, but I'm not sure what you want...
The main problem is that your x labels already encode uhgroup.
If you look at the factor levels, they are PC-O(38.7), PC(38.7), etc.
So the first thing is to create a new hgcc value for the x axis:
hgcc_chr <- as.character(adata12cc$hgcc)
adata12cc$hgcc_value <- as.factor(substr(hgcc_chr, nchar(hgcc_chr) - 5, nchar(hgcc_chr)))
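A small sketch of what the hgcc_value step does, on hypothetical labels shaped like the ones in the question (the real levels live in adata12cc$hgcc). It also shows why as.numeric() on the new factor is a better x position than order(): order() returns the row permutation that sorts the column, not the index of each value's level.

```r
# Hypothetical labels in the same "<headgroup>(<carbons>.<unsaturation>)" shape
hgcc <- factor(c("PC-O(38.7)", "PC(36.2)", "PC(38.7)", "PC-O(36.2)"))

# Keep only the last six characters, e.g. "(38.7)", as the answer does
hgcc_chr <- as.character(hgcc)
hgcc_value <- as.factor(substr(hgcc_chr, nchar(hgcc_chr) - 5, nchar(hgcc_chr)))

# Level codes: the same label always maps to the same x position
as.numeric(hgcc_value)

# order() instead returns the row positions that sort the vector,
# so its values grow with the number of rows, not with the number of levels
order(hgcc_value)
```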
Another problem is that geom_point and geom_smooth use different x values: one is hgcc, the other is order(hgcc).
The solution is to use the same value for both, here as.numeric(hgcc_value) (instead of order()), and to set the break labels in scale_x_continuous.
ggplot(adata12cc, aes(x = as.numeric(hgcc_value), y = rtmean, color = cls, size = log10(intensity))) +
  geom_point(stat = 'sum', alpha = 0.33) +
  geom_smooth(
    aes(x = as.numeric(hgcc_value), y = rtmean),
    method = 'glm') +
  facet_wrap( ~ uhgroup, scales = 'free') +
  scale_radius(guide = guide_legend(title = 'Intensity (log)')) +
  scale_color_discrete(guide = guide_legend(title = 'Class')) +
  scale_x_continuous(name = "Carbon count unsaturation",
                     breaks = as.numeric(adata12cc$hgcc_value),
                     labels = adata12cc$hgcc_value,
                     minor_breaks = NULL) +
  ylab('Mean RT [min]') +
  ggtitle('RT vs. carbon count & unsaturation by headgroup') +
  theme(axis.title = element_text(size = 24),
        axis.text.x = element_text(angle = 90, vjust = 0.5, size = 9, hjust = 1),
        axis.text.y = element_text(size = 11),
        plot.title = element_text(size = 21),
        strip.text = element_text(size = 18),
        panel.grid.minor.x = element_blank())
Is it what you were looking for?
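If you do want indices that restart within each facet, as the question asks, one option is to compute them per uhgroup before plotting. A minimal base R sketch on hypothetical toy data (the column names mirror the question, the values are made up): ave() applies a function within each group, and match() against the sorted unique level codes gives a dense 1, 2, 3, ... index per panel.

```r
# Hypothetical data: two headgroups, each using only some of the x levels
d <- data.frame(
  uhgroup = c("PC", "PC", "PC", "PE", "PE"),
  hgcc_value = factor(c("(36.2)", "(38.7)", "(36.2)", "(40.6)", "(42.8)"))
)

# Global level codes, as with as.numeric(hgcc_value)
d$x_global <- as.integer(d$hgcc_value)

# Dense index restarting at 1 within each uhgroup, i.e. within each facet
d$x_facet <- ave(d$x_global, d$uhgroup,
                 FUN = function(i) match(i, sort(unique(i))))
d
```

Whichever index you use, map the same column in both geom_point() and geom_smooth(), as the answer does with as.numeric(hgcc_value); otherwise the points and the regression line end up at different x positions.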

Related

Dodge failing in violin plot

I would like to plot the congruence effects (incongruent minus congruent) as a violin plot per combination of stimulus age and response type. This is what my code looks like so far. I am not yet satisfied with the representation. How can I change it so that for each of the four conditions (adult frown, adult smile, child frown, child smile) I get the corresponding violin plot horizontally next to each other? Thanks in advance for the help. Attached is the code and an excerpt from the data frame.
dataset$congruency_effect <- ifelse(dataset$congruency == "congruent", dataset$avgAmplitude, -dataset$avgAmplitude)
p <- ggplot(dataset, aes(x = stimulusResponse, y = congruency_effect, fill = congruency_effect, group = stimulusAge)) +
  geom_violin() +
  geom_point(position = position_dodge(width = 0.75), size = 3, stat = "summary", fun.y = "mean") +
  scale_fill_manual(values = c("#F8766D", "#00BFC4")) +
  ggtitle("Conventional EEG 350-450 ms") +
  scale_y_continuous(limits = c(-5, 5)) +
  facet_wrap(~stimulusAge, scales = "free_x")
EEG_Conventional450_age_response <- p + theme(
  # Set the plot title and axis labels to APA style
  plot.title = element_text(face = "bold", size = 16),
  axis.title = element_text(face = "bold", size = 14),
  # Set the axis tick labels to APA style
  axis.text = element_text(size = 12),
  # Set the legend title and labels to APA style
  legend.title = element_text(face = "bold", size = 14),
  legend.text = element_text(size = 12),
  # Set the plot and panel backgrounds to white
  panel.background = element_rect(fill = "white"),
  plot.background = element_rect(fill = "white")
)
EEG_Conventional450_age_response
I have tried several permutations of arguments in ggplot.
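This has to do with the grouping aesthetic: each panel contains only one stimulusAge, so group = stimulusAge makes ggplot treat all observations in a panel as a single group instead of one group per x category. Remove it, and your plot works.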
library(ggplot2)
set.seed(42)
dataset <- data.frame(stimulusResponse = rep(c("frown", "smile"), each = 20),
                      congruency_effect = rnorm(40),
                      stimulusAge = rep(c("baby", "adult"), 20))
## removed group = stimulusAge
ggplot(dataset, aes(x = stimulusResponse, y = congruency_effect)) +
  geom_violin() +
  # fun = mean plots the mean; without it, stat "summary" falls back to mean_se() with a message
  geom_point(position = position_dodge(width = 0.75), size = 3, stat = "summary", fun = mean) +
  facet_wrap(~stimulusAge, scales = "free_x")
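The question describes the congruency effect as incongruent minus congruent, but the ifelse() line only flips the sign of the incongruent rows; it does not take a difference. A base R sketch of computing the difference per condition, assuming (hypothetically) long-format data with one averaged amplitude per subject, condition and congruency. The subject column and the simulated amplitudes are made up; only the other column names come from the question. The sketch also builds one four-level condition factor for the x axis.

```r
# Hypothetical long-format data: one average amplitude per subject, condition and congruency
set.seed(1)
eeg <- expand.grid(subject = 1:5,
                   stimulusAge = c("adult", "child"),
                   stimulusResponse = c("frown", "smile"),
                   congruency = c("congruent", "incongruent"),
                   stringsAsFactors = FALSE)
eeg$avgAmplitude <- rnorm(nrow(eeg))

# Pair each incongruent row with its congruent counterpart
keys <- c("subject", "stimulusAge", "stimulusResponse")
inc <- eeg[eeg$congruency == "incongruent", c(keys, "avgAmplitude")]
con <- eeg[eeg$congruency == "congruent", c(keys, "avgAmplitude")]
effects <- merge(inc, con, by = keys, suffixes = c(".inc", ".con"))

# Congruency effect: incongruent minus congruent
effects$congruency_effect <- effects$avgAmplitude.inc - effects$avgAmplitude.con

# A single factor holding the four conditions in a fixed display order
effects$condition <- factor(
  paste(effects$stimulusAge, effects$stimulusResponse),
  levels = c("adult frown", "adult smile", "child frown", "child smile")
)
table(effects$condition)
```

Mapping that factor to x, for example ggplot(effects, aes(condition, congruency_effect)) + geom_violin(), would place the four violins next to each other in a single panel.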

ggplot2: plot points when the y value is NA

I have a data frame in which some rows have an x value (discrete) that I want to include on the x axis of the plot, but whose y value is NA.
I still want to show those x values even though y is NA. Is there a way to do this in ggplot2?
Currently, it simply skips the first two rows, which have the NA values.
ggplot(tChartDF()[['df']], aes(
  x = factor(tChartDF()[['df']][['Rare event date']], levels = unique(tChartDF()[['df']][['Rare event date']])),
  y = unlist(tChartDF()[['df']][['days_between']]),
)) +
  geom_hline(yintercept = unlist(tChartDF()[['timeScaleCL']]), color = input$tChartCLColour, lwd = input$tChartCLWidth) +
  geom_hline(yintercept = unlist(tChartDF()[['timeScaleUL']]), linetype = 'dashed', lwd = 1, color = 'red') +
  geom_hline(yintercept = unlist(tChartDF()[['timeScaleLL']]), linetype = 'dashed', lwd = 1, color = 'red') +
  scale_x_discrete(expand = c(0,0)) +
  theme_classic() +
  geom_line(aes(group = 1), lwd = input$tChartLineWidth, color = input$tChartLineColour) +
  geom_point(size = input$tChartMarkerSize, color = input$tChartMarkerColour) +
  labs(title = input$tChartPlotTitle, x = input$tChartPlotXLabel, y = input$tChartPlotYLabel) +
  theme(
    plot.title = element_text(size = 24, face = 'bold', family = 'Arial', hjust = 0.5),
    plot.margin = margin(0, 1, 0, 0, "cm"),
    axis.title = element_text(size = 20, face = 'bold', family = 'Arial'),
    axis.text = element_text(size = 16, face = 'bold', family = 'Arial'),
    axis.text.x = element_text(angle = as.numeric(input$tChartXOrientation), vjust = 0.5),
    axis.ticks.length = unit(.25, 'cm'),
  ) +
  coord_cartesian(clip = 'off')
As the plot shows, it only starts plotting at 2022/12/15 and ignores the earlier rows in the table, where columns y and mr have NA values.
For the plot, I only care about the first two columns (Rare event date and days_between). I tried selecting only those two columns and plotting, but it still ignores the first two rows.
Desired result:
If we start with something like
library(ggplot2)
mt <- mtcars
mt$mpg[c(3,6,9)] <- NA
and plot the line as in
ggplot(mt, aes(disp, mpg)) +
geom_line()
we don't see the missing points (not a surprise). We can add them this way:
# |> is the native pipe (R >= 4.1); mpg2 places the missing values just above the maximum
transform(mt, mpg2 = ifelse(is.na(mpg), max(mpg, na.rm = TRUE) + 1, mpg)) |>
  ggplot(aes(disp, mpg)) +
  geom_line() +
  geom_point(aes(y = mpg2), data = ~ subset(., is.na(mpg)), shape = 1)
This can easily be adapted to place the markers at the bottom instead, with different shapes or colors, perhaps even on an explicitly gray background (a ribbon along the top or bottom).
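A sketch of the "at the bottom" variant, using the same mt data; the offset of 1 below the observed minimum is an arbitrary choice. Only the marker column is computed here, in base R:

```r
# Same setup as above: mtcars with three mpg values set to NA
mt <- mtcars
mt$mpg[c(3, 6, 9)] <- NA

# Marker height just below the observed minimum instead of above the maximum
bottom <- min(mt$mpg, na.rm = TRUE) - 1
mt$mpg_marker <- ifelse(is.na(mt$mpg), bottom, NA)

# Rows that will get a marker: exactly the rows with a missing y
mt[!is.na(mt$mpg_marker), c("disp", "mpg", "mpg_marker")]
```

The marker layer would then be geom_point(aes(y = mpg_marker), data = ~ subset(., is.na(mpg)), shape = 1), analogous to the mpg2 layer above.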

Issue with ggplot barchart with multiple subgroups

I am trying to adapt the approach from "ggplot2 multiple sub groups of a bar chart", but something is not working as it should.
The code is:
library(ggplot2)
library(grid)
MethodA= rep(c("ARIMA"), 6)
MethodB=rep(c("LSTM"), 6)
MethodC = rep(c("ARIMA-LSTM"),6)
MethodD=rep(c("SSA"),6)
Method=c(MethodA, MethodB, MethodC, MethodD)
Measure = rep(c("RMSE", "RMSE", "MAE", "MAE", "MAPE", "MAPE"), 4)
trtest=rep(c("train", "test"), 12)
Value=sample(x = 4000:7000, size = 24, replace = TRUE)
df2 <- data.frame(Method, Measure, trtest, Value)
dodge <- position_dodge(width = 0.9)
g1 <- ggplot(data = df, aes(x = interaction(Variety, Trt), y = yield, fill = factor(geno))) +
  geom_bar(stat = "identity", position = position_dodge()) +
  #geom_errorbar(aes(ymax = yield + SE, ymin = yield - SE), position = dodge, width = 0.2) +
  coord_cartesian(ylim = c(0, 7500)) +
  annotate("text", x = 1:6, y = - 10,
           label = rep(c("Variety 1", "Variety 2", "Variety 3"), 2)) +
  annotate("text", c(1.5, 3.5), y = - 20, label = c("Irrigated", "Dry")) +
  theme_classic() +
  theme(plot.margin = unit(c(1, 1, 4, 1), "lines"),
        axis.title.x = element_blank(),
        axis.text.x = element_blank())
# remove clipping of x axis labels
g2 <- ggplot_gtable(ggplot_build(g1))
g2$layout$clip[g2$layout$name == "panel"] <- "off"
grid.draw(g2)
Another problem is the order of the groups that interaction() generates: they are not in the order ARIMA (RMSE, MAE, MAPE), then LSTM (RMSE, MAE, MAPE), and so on.
I would appreciate any help.
Best,
Nikola
Instead of using interaction, it might be a lot clearer if you use facets.
Note that your example is not reproducible (your sample data has different variable names from the ones you use in your plotting code, so I had to guess which you meant to substitute):
library(ggplot2)
ggplot(data = df2, aes(x = Measure, y = Value, fill = trtest)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  coord_cartesian(ylim = c(0, 7500)) +
  facet_grid(. ~ Method, switch = 'x') +
  theme_classic() +
  theme(strip.placement = 'outside',
        strip.background = element_blank(),
        strip.text = element_text(face = 'bold', size = 16),
        panel.spacing.x = unit(0, 'mm'),
        panel.border = element_rect(fill = NA, color = 'gray'))
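If you prefer to keep interaction() rather than facets, the ordering problem from the question can be addressed by fixing the factor levels first and passing lex.order = TRUE, so that the first factor varies slowest. A base R sketch using the grouping columns from the question's df2:

```r
# Rebuild the grouping columns from the question (Value is not needed here)
Method <- rep(c("ARIMA", "LSTM", "ARIMA-LSTM", "SSA"), each = 6)
Measure <- rep(c("RMSE", "RMSE", "MAE", "MAE", "MAPE", "MAPE"), 4)

# Fix the level order explicitly; factor() would otherwise sort alphabetically
Method <- factor(Method, levels = unique(Method))
Measure <- factor(Measure, levels = c("RMSE", "MAE", "MAPE"))

# Default: the first factor varies fastest ("ARIMA.RMSE", "LSTM.RMSE", ...)
levels(interaction(Method, Measure))

# lex.order = TRUE: the first factor varies slowest ("ARIMA.RMSE", "ARIMA.MAE", ...)
grp <- interaction(Method, Measure, lex.order = TRUE)
levels(grp)
```

Using such a factor as x (with fill = trtest) should then show the bars grouped by method, with the measures in the chosen order inside each method.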

Log scale on y axis but data have negative values

I am trying to create a boxplot with a log y axis as I have some very small values and then some much higher values which do not work well in a boxplot with a continuous y axis. However, I have negative values which obviously do not work with a log scale. I was wondering if there was a way around this so that I can display my data on a boxplot which is still easy to interpret but has a more appropriate scale on the y axis.
p <- ggplot(data = Elstow.monthly.fluxes, aes(x = Month1, y = CH4.Flux)) +
  stat_boxplot(geom = "errorbar", linetype = 1, width = 0.5) +
  geom_boxplot() +
  xlab(expression("Month")) +
  ylab(expression(~CH[4]~Flux~(µg~CH[4]~m^{-2}~d^{-1}))) +
  scale_y_continuous(breaks = seq(-5000, 40000, 5000), limits = c(-5000, 40000)) +
  theme(axis.text.x = element_text(colour = "black")) +
  theme(axis.text.y = element_text(colour = "black")) +
  theme(panel.background = element_rect("white", "black")) +
  theme(panel.border = element_rect(colour = "black", fill = NA, size = 0.5)) +
  theme(axis.text = element_text(size = 12)) +
  theme(axis.title = element_text(size = 14)) +
  theme(axis.title.y = element_text(margin = margin(t = 0, r = 15, b = 0, l = 0))) +
  theme(axis.title.x = element_text(margin = margin(t = 15, r = 0, b = 0, l = 0))) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "black")
While you could indeed use the secondary axis to get the labels you want as Zhiqiang suggests, you could also use a transformation that fits your needs.
Consider the following skewed boxplots:
library(ggplot2)
df <- data.frame(
  x = rep(letters[1:2], each = 500),
  y = rlnorm(1000) - 2
)
ggplot(df, aes(x, y)) +
  geom_boxplot()
Instead, you could use the pseudo-log transformation to visualise your data:
ggplot(df, aes(x, y)) +
  geom_boxplot() +
  scale_y_continuous(trans = scales::pseudo_log_trans())
Alternatively, you could make any transformation you want. I personally like the inverse hyperbolic sine transformation, which is very much like the pseudo-log:
asinh_trans <- scales::trans_new(
  "inverse_hyperbolic_sine",
  transform = function(x) {asinh(x)},
  inverse = function(x) {sinh(x)}
)
ggplot(df, aes(x, y)) +
  geom_boxplot() +
  scale_y_continuous(trans = asinh_trans)
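Why the inverse hyperbolic sine copes with negative values: asinh(x) = log(x + sqrt(x^2 + 1)) is defined for every real x, is odd (asinh(-x) = -asinh(x)), stays close to x near zero, and approaches sign(x) * log(2|x|) for large |x|, so it behaves like a log scale in both tails. A quick numeric check in base R, using values in the range of the question's data:

```r
x <- c(-40000, -5000, -1, 0, 1, 5000, 40000)

# Defined for all values, including negatives and zero
y <- asinh(x)

# Symmetric around zero
all.equal(asinh(-x), -asinh(x))

# Log-like for large magnitudes: asinh(x) is close to sign(x) * log(2 * |x|)
big <- abs(x) >= 5000
all.equal(y[big], sign(x[big]) * log(2 * abs(x[big])))

# sinh() undoes the transformation, which is what the inverse in trans_new() relies on
all.equal(sinh(y), x)
```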
I have a silly solution: trick the secondary axis into showing a rescaled y axis. I do not have your data, so I made up some numbers for demonstration.
First, convert the y values with logy = log(y + 5000). When generating the graph, transform the values back to the original scale, borrowing the secondary axis to display them. Others may well have more elegant ways to do this.
I did not look for the proper way to remove the primary y-axis tick labels; I just used breaks = c(0).
df <- data.frame(y = runif(33, min = -5000, max = 40000),
                 x = rep(c("Aug", "Sep", "Oct"), 11))
library(tidyverse)
df$logy = log(df$y+5000)
p <- ggplot(data = df, aes(x = x, y = logy)) +
  stat_boxplot(geom = "errorbar", linetype = 1, width = 0.5) +
  geom_boxplot() +
  xlab(expression("Month")) +
  ylab(expression(~CH[4]~Flux~(µg~CH[4]~m^{-2}~d^{-1}))) +
  scale_y_continuous(sec.axis = sec_axis(~ (exp(.) - 5000),
                                         breaks = c(-4000, 0, 5000, 10000, 20000, 40000)),
                     breaks = c(0)) +
  theme(axis.text.x = element_text(colour = "black")) +
  theme(axis.text.y = element_text(colour = "black")) +
  theme(panel.background = element_rect("white", "black")) +
  theme(panel.border = element_rect(colour = "black", fill = NA, size = 0.5)) +
  theme(axis.text = element_text(size = 12)) +
  theme(axis.title = element_text(size = 14)) +
  theme(axis.title.y = element_text(margin = margin(t = 0, r = 15, b = 0, l = 0))) +
  theme(axis.title.x = element_text(margin = margin(t = 15, r = 0, b = 0, l = 0))) +
  geom_hline(yintercept = log(5000), linetype = "dashed", colour = "black")
p
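The secondary-axis trick relies on the offset log transform being invertible: sec_axis(~ exp(.) - 5000) maps logy back to the original values, and the dashed line at log(5000) corresponds to y = 0. A quick base R check, which also shows why the offset must be larger than the magnitude of the most negative value:

```r
y <- c(-4999, -4000, 0, 5000, 40000)

# Forward transform used for plotting, with the offset from the answer
offset <- 5000
logy <- log(y + offset)

# Inverse used in sec_axis(): recovers the original values
all.equal(exp(logy) - offset, y)

# A value at or below -offset cannot be transformed
log(-5000 + offset)                     # -Inf
suppressWarnings(log(-6000 + offset))   # NaN
```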
coord_trans() is applied after the statistics are calculated (unlike a scale transformation). It can be combined with pseudo_log_trans() to cope with negative values.
library(ggplot2)
set.seed(1234)
dat <- data.frame(cond = factor(rep(c("A", "B"), each = 200)),
                  rating = c(rnorm(200), rnorm(200, mean = 500)))
pseudoLog <- scales::pseudo_log_trans(base = 10)
p <- ggplot(dat, aes(x = cond, y = rating)) +
  geom_boxplot() +
  coord_trans(y = pseudoLog)
p
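Why the two approaches can draw different boxes: a scale transformation transforms the data before the statistic is computed, while coord_trans() transforms the computed statistics afterwards. For a monotone transform the two agree on individual order statistics, but can differ whenever quantile() interpolates between observations, as it does for the median of an even number of points. A base R illustration, with asinh() standing in for the transform (an assumption chosen only because it is available in base R):

```r
x <- c(-10, 0, 1, 100)

# Scale-style: transform first, then take the median (the middle line of the box)
before <- quantile(asinh(x), 0.5, names = FALSE)

# Coord-style: take the median on the original scale, then transform it
after <- asinh(quantile(x, 0.5, names = FALSE))

c(before = before, after = after)
```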

Add horizontal quantile lines to scatter plot ggplot2 R

I have example data below:
eg_data <- data.frame(
  period = c(sample(c("1 + 2"), 1000, replace = TRUE)),
  max_sales = c(sample(c(1:10), 1000, replace = TRUE, prob =
    c(.05, .10, .15, .25, .25, .10, .05, .02, .02, .01)))
)
I want to make a scatter (jitter, actually) plot and add horizontal lines at different points along the y-axis. I want to be able to customize the percentiles at which I add the lines, but for now, something like R's summary function would work just fine.
summary(eg_data$max_sales)
I have the code for a jitter plot below. It runs and produces the graph, but I keep getting this message:
Each group consists of only one observation. Do you need to adjust the group aesthetic?
jitter <- (
  (ggplot(data = eg_data, aes(x = period, y = max_sales, group = 1)) +
    geom_jitter(stat = "identity", width = .15, color = "blue", alpha = .4)) +
  scale_y_continuous(breaks = seq(0, 12, by = 1)) +
  geom_line(stat = 'summary', fun.y = "quantile", fun.args = list(probs = 0.1)) +
  ggtitle("Distribution of Sales by Period") + xlab("Period") + ylab("Sales") +
  theme(plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        axis.title.x = element_text(color = "black", size = 12, face = "bold"),
        axis.title.y = element_text(color = "black", size = 12, face = "bold")) +
  labs(fill = "Period")
)
jitter
I tried looking at this question -
ggplot2 line chart gives "geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?"
It suggests making all variables numeric. My period variable is a character, I'd like to keep it that way, but even when I convert it to numeric, it still gives me the error.
Any help would be appreciated. Thank you!
Instead of geom_line, what you want is geom_hline. In particular, replacing geom_line with
# fun and after_stat(y) are the current spellings of fun.y and ..y.. (ggplot2 >= 3.3.0)
stat_summary(fun = "quantile", fun.args = list(probs = c(0.1, 0.2)),
             geom = "hline", aes(yintercept = after_stat(y)))
gives horizontal lines at the 10% and 20% quantiles of max_sales, where indeed
quantile(eg_data$max_sales, c(0.1, 0.2))
# 10% 20%
# 2 3
It also eliminates the warning you were getting.
I don't know if this is the most elegant solution, but you can always calculate the summary statistics separately and add them to the plot. For my taste, this also gives a bit more control over what is happening.
hline_coordinates <- data.frame(Quantile_Name = names(summary(eg_data$max_sales)),
                                quantile_values = as.numeric(summary(eg_data$max_sales)))
jitter <- (
  (ggplot(data = eg_data, aes(x = period, y = max_sales)) + # removed group = 1
    geom_jitter(stat = "identity", width = .15, color = "blue", alpha = .4)) +
  scale_y_continuous(breaks = seq(0, 12, by = 1)) +
  geom_hline(data = hline_coordinates, aes(yintercept = quantile_values)) +
  ggtitle("Distribution of Sales by Period") + xlab("Period") + ylab("Sales") +
  theme(plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        axis.title.x = element_text(color = "black", size = 12, face = "bold"),
        axis.title.y = element_text(color = "black", size = 12, face = "bold")) +
  labs(fill = "Period")
)
jitter
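Since the question asks for lines at customizable percentiles, the second approach generalizes by passing any vector of probabilities to quantile() instead of using summary(). A base R sketch that builds the data frame for geom_hline(); the probs values are a hypothetical choice, and the seed is added only for reproducibility:

```r
set.seed(1)
eg_data <- data.frame(
  period = rep("1 + 2", 1000),
  max_sales = sample(1:10, 1000, replace = TRUE,
                     prob = c(.05, .10, .15, .25, .25, .10, .05, .02, .02, .01))
)

# Percentiles at which to draw horizontal lines (change freely)
probs <- c(0.10, 0.25, 0.50, 0.90)

# Same column names as hline_coordinates in the answer
hline_coordinates <- data.frame(
  Quantile_Name = paste0(probs * 100, "%"),
  quantile_values = quantile(eg_data$max_sales, probs, names = FALSE)
)
hline_coordinates
```

Because the column names match, this data frame can be used with the answer's geom_hline(data = hline_coordinates, aes(yintercept = quantile_values)) unchanged.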
