Plot timeseries and regression line for two groups of data - r

I have data from two sites across years (note the differences in sampling years). A sample is below:
df<- data.frame( year= c(seq(1997,2016,1), seq(2001,2017,1)),
site= c(rep("cr", 20),rep("ec", 17)),
mean= sample(1:50,37))
I would like to make a time series-like graph of mean for each year. Each data point would be connected (in the typical zig-zag fashion of time-series graphs) and then a regression line is superimposed to indicate the trend. I have created a time series-like plot using ggplot (I do not mind a solution from base package), but I am having trouble superimposing a dashed-regression line for each site without error.
Here is the code I have tried:
f1 <- ggplot(data = df, aes(x = year, y = mean, group= site, color=
site))+
geom_line(aes(color=site)) +
geom_point( aes(color=site),size=0.5)+
geom_smooth(method = "lm", se = FALSE, size= 0.5, aes(fill=site,
linetype= 2 ))+
scale_linetype_manual(values=c("solid", "solid"))+
scale_color_manual(values=c("#CC0000", "#000000"))+
theme_minimal()+
scale_x_continuous("Year",limits = c(1997, 2020), breaks =
seq(1995,2020,5)) +
scale_y_continuous("Mean Monthly Abundance", limits = c(0, 1500),
breaks=seq(0, 1500, by = 100)) +
theme_bw()+
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank())
f1
A few details I would like this graph to illustrate:
- Each group (site) will have a different color (black, red) for the points and the line connecting each point.
- The regression line for each group (site) will be dashed and match the color specified above.
- The regression lines should NOT extend to the y-axis; they should be limited to the extent of the data.
- Points do not need to be visible. Only the line connecting each point should be visible.
- Preferably, the dashed regression line will NOT display the shaded 95% CI.

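As @kath stated, setting linetype = "dashed" outside aes() fixes it: inside aes(), the constant 2 is treated as data to be mapped to a linetype scale rather than as a fixed setting, which is what triggers the error. I've made some minor modifications to the code as well: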
ggplot(data = df, aes(x = year, y = mean, group= site, color = site))+
geom_line() +
geom_point(size=0.5)+
geom_smooth(method = "lm", se = FALSE, size= 0.5, linetype = "dashed")+
scale_color_manual(values=c("#CC0000", "#000000"))+
theme_minimal()+
scale_x_continuous("Year",limits = c(1997, 2020), breaks =
seq(1995,2020,5)) +
scale_y_continuous("Mean Monthly Abundance", limits = c(0, 1500),
breaks=seq(0, 1500, by = 100)) +
theme_bw()+
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank())
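
Since the question also allows a base-graphics solution, here is a minimal sketch using the same sample data. It fits a separate lm() per site and draws the fitted line only between that site's first and last sampling year, so the trend line never runs past the data. The seed, the `cols` vector and the `fits` list are illustrative choices, not part of the original post.

```r
set.seed(1)  # assumption: fixed seed so the sample data are reproducible
df <- data.frame(year = c(1997:2016, 2001:2017),
                 site = c(rep("cr", 20), rep("ec", 17)),
                 mean = sample(1:50, 37))

cols <- c(cr = "#CC0000", ec = "#000000")  # one colour per site
fits <- list()                             # keep the fitted models for inspection

# Empty canvas spanning all years and values
plot(df$year, df$mean, type = "n",
     xlab = "Year", ylab = "Mean Monthly Abundance")

for (s in names(cols)) {
  d <- df[df$site == s, ]
  lines(d$year, d$mean, col = cols[[s]])           # zig-zag series, no points
  fit <- lm(mean ~ year, data = d)                 # per-site linear trend
  xr <- range(d$year)                              # first and last sampled year
  yr <- predict(fit, newdata = data.frame(year = xr))
  segments(xr[1], yr[1], xr[2], yr[2],
           col = cols[[s]], lty = "dashed")        # dashed line, data range only
  fits[[s]] <- fit
}
legend("topright", legend = names(cols), col = cols, lty = "solid", bty = "n")
```

The ggplot2 version above achieves the same thing because geom_smooth() fits each group over its own x range by default.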

Related

plot TOTAL errorbar for multiple lines in ggplot r

I would like to plot the data by subject but adding the errorbar of the total mean and se. I mean, not an error bar for each subject. I've tried geom_errorbar and stat_summary but still failed to get my ideal plot (see the figure I drew).
and here is the code I used to draw this figure (the errorbars are added by hand).
ggplot(ASD, aes(x=period, y=meanF0, group=subject, color=group)) +
geom_line(aes(color=group, size=group)) +
scale_size_manual(values=c(.6, .6, .6, .6)) +
theme_light()+
xlab("Period")+
ylab("F0 (Hz)")+
ggtitle("Mean F0 Adjustment (ASD Group)") +
geom_point()+
scale_color_manual(values=c("red")) +
theme(plot.title = element_text(size=14.5, face="bold", hjust = 0.5, family = "serif"),
axis.title.y= element_text(size=12, face = "bold", family = "serif"),
axis.title.x= element_text(size=12, face = "bold", family = "serif"),
axis.text.x = element_text(size=11, face="bold", family = "serif"),
axis.text.y = element_text(size=11, face="bold", family = "serif"))+
theme(legend.position = "none")+
geom_hline(yintercept=112.8, linetype="dashed",
color = "dark grey", size=.7)
Anyone could help? Thank you very much!!!
Use annotate to add the error bars. I don't have your data, so I created my own. You're going to need the confidence interval and the average for each group. My average-by-group values and confidence interval-by-group are stored in df4$meanV and df4$ci. You can replace these with your variable names. In annotate, you'll include the data frame in the call like you would in base R plots. Like base R, you can just use raw values, as well. Multiple values can be joined with c(). As in y = c(12, 10). If you have any questions, just let me know.
ggplot(df2, aes(x = condition, y = value,
color = subject, group = subject)) +
geom_line() + geom_point() +
annotate("errorbar",
x = df4$condition,
ymin = df4$meanV - df4$ci,
ymax = df4$meanV + df4$ci,
width = .2) +
annotate("point",
x = df4$condition,
y = df4$meanV) +
ylim(min(df2$value), max(df2$value))
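
The answer does not show how df4 was built, so the following is only a sketch of one way to get a per-condition mean and confidence interval in base R. The data in `df2` are made up, and the 95% t-based interval is an assumption; substitute whatever interval or standard error you actually want to display.

```r
set.seed(42)  # assumption: made-up data standing in for the real measurements
df2 <- data.frame(subject   = rep(paste0("s", 1:6), each = 3),
                  condition = rep(c("pre", "mid", "post"), times = 6),
                  value     = rnorm(18, mean = 110, sd = 5))

# Mean, standard error and sample size per condition (pooled over subjects)
meanV <- tapply(df2$value, df2$condition, mean)
se    <- tapply(df2$value, df2$condition, function(v) sd(v) / sqrt(length(v)))
n     <- tapply(df2$value, df2$condition, length)

# Half-width of a 95% t-based confidence interval
ci <- qt(0.975, df = n - 1) * se

df4 <- data.frame(condition = names(meanV),
                  meanV     = as.numeric(meanV),
                  ci        = as.numeric(ci))
print(df4)
```

The resulting df4 has the `condition`, `meanV` and `ci` columns that the annotate() calls above refer to.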

facet_wrap text labelling issues with stat_fit_glance

I am wondering why the text is trending higher in the plots... it won't stay put with the facet_wrap or facet_grid. In a more complex dataset plot, the text is illegible because of the overlap.
Below is data and code to reproduce the plot and issue. Adding geom="text" to stat_fit_glance results in Error: Discrete value supplied to continuous scale.
library(ggpmisc)
library(ggplot2)
DF <- data.frame(Site = rep(LETTERS[20:24], each = 4),
Region = rep(LETTERS[14:18], each = 4),
time = rep(LETTERS[1:10], each = 10),
group = rep(LETTERS[1:4], each = 10),
value1 = runif(n = 1000, min = 10, max = 15),
value2 = runif(n = 1000, min = 100, max = 150))
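DF$time <- as.numeric(factor(DF$time)) # letters A-J -> 1-10; also works when strings are not auto-converted to factors (R >= 4.0)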
formula1 <- y~x
plot1 <- ggplot(data=DF,
aes(x=time, y= value2,group=Site)) +
geom_point(col="gray", alpha=0.5) +
geom_line(aes(group=Site),col="gray", alpha=0.5) +
geom_smooth(se=F, col="darkorange", alpha=0.8, fill="orange",
method="lm",formula=formula1) +
theme_bw() +
theme(strip.text.x = element_text(size=10),
strip.text.y = element_text(size=10, face="bold", angle=0),
strip.background = element_rect(colour="black", fill="gray90"),
axis.text.x = element_text(size=10), # x-axis text size
axis.text.y = element_text(size=10), # y-axis text size
axis.ticks = element_blank(), # remove axis ticks
axis.title.x = element_text(size=18), # x-axis title size
axis.title.y = element_text(size=25), # y-axis title size
panel.background = element_blank(),
panel.grid.major = element_blank(), # remove major grid lines
panel.grid.minor = element_blank(), # remove minor grid lines
plot.background = element_blank()) +
labs(y="", x="Year", title = "")+ facet_wrap(~group)
plot1 + stat_fit_glance(method = "lm", label.x="right", label.y="bottom",
method.args = list(formula = formula1),
aes(label = sprintf('R^2~"="~%.3f~~italic(p)~"="~%.2f',
stat(..r.squared..),stat(..p.value..))),
parse = TRUE)
When the position of the labels is set automatically, the npcy position is increased for each level of the grouping variable. You map Site to the group aesthetic, and because Site has 5 levels that appear unevenly across the facets, the rather crude algorithm in 'ggpmisc' positions the labels unevenly: each of the five rows corresponds to one of the five Sites. I have changed the mapping to use colour so that this becomes more obvious. I have also deleted all code that is irrelevant to this question.
plot1 <- ggplot(data=DF,
aes(x=time, y= value2, color=Site)) +
geom_smooth(se=F, alpha=0.8,
method="lm",formula=formula1) +
facet_wrap(~group)
plot1 +
stat_fit_glance(method = "lm", label.x="right", label.y="bottom",
method.args = list(formula = formula1),
aes(label = sprintf('R^2~"="~%.3f~~italic(p)~"="~%.2f',
stat(..r.squared..),stat(..p.value..))),
parse = TRUE) +
expand_limits(y = 110)
To use fixed positions, one can pass npcy coordinates when using the default "geom_text_npcy()", or pass data coordinates and use "geom_text()". One position corresponds to each level of the grouping factor Site. If the vector is shorter, it is recycled. Of course, to fit more labels you can reduce the size of the text and add space by expanding the plotting area. In any case, in practice you will need to indicate in one way or another which estimates correspond to which line.
plot1 +
stat_fit_glance(method = "lm", label.x="right", label.y= c(0.01, 0.06, 0.11, 0.01, 0.06),
method.args = list(formula = formula1),
aes(label = sprintf('R^2~"="~%.3f~~italic(p)~"="~%.2f',
stat(..r.squared..),stat(..p.value..))),
parse = TRUE, size = 2.5) +
expand_limits(y = 110)
Note: The error "Discrete value supplied to continuous scale" when attempting to use geom_text() is a bug in 'ggpmisc' that I fixed some days ago, but the fix has not yet made it to CRAN (future version 0.3.3).
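
If you would rather control every label position yourself, one option is to compute the fit statistics outside the plot and build a label table in data coordinates. The sketch below is not the 'ggpmisc' approach; it uses only base R, and the column names `x`, `y` and `label`, the spacing of 3 units, and the simplified data frame are all assumptions made for illustration.

```r
set.seed(1)  # assumption: simplified version of the question's data
DF <- data.frame(Site   = rep(LETTERS[20:24], each = 4),
                 group  = rep(LETTERS[1:4], each = 10),
                 time   = rep(1:10, each = 10),
                 value2 = runif(n = 1000, min = 100, max = 150))

# One linear fit per Site within each facet (group); empty combinations dropped
pieces <- split(DF, list(DF$Site, DF$group), drop = TRUE)
labels <- do.call(rbind, lapply(pieces, function(d) {
  s <- summary(lm(value2 ~ time, data = d))
  data.frame(Site  = d$Site[1],
             group = d$group[1],
             r2    = s$r.squared,
             p     = s$coefficients["time", "Pr(>|t|)"])
}))
rownames(labels) <- NULL

# Stack labels within each facet: first Site at y = 101, next at 104, and so on
rank_in_facet <- ave(seq_len(nrow(labels)), labels$group, FUN = seq_along)
labels$x <- 10
labels$y <- 101 + 3 * (rank_in_facet - 1)
labels$label <- sprintf("%s: R2 = %.3f, p = %.2f",
                        labels$Site, labels$r2, labels$p)
print(labels)
```

Because `labels` carries the `group` column, it can be passed to geom_text(data = labels, aes(x = x, y = y, label = label), hjust = 1) and each row will land in the matching facet. Prefixing each label with the Site name also addresses the point above about indicating which estimate belongs to which line.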

Dealing with factors in geom_pointrange in ggplot

I am trying to visualize some data that consist of odds ratios and confidence intervals for regions nested in countries. I am using the geom_pointrange option for that, and in general it works very well.
My problem is that since the odds ratios (and upper confidence intervals) can get quite high values, the axes of the plot are stretched to accommodate for that. That has as a result that confidence intervals that lie between 0 and 1 do not appear clearly enough. One option I found through this community is to change the values into factors and the distance between them will be considered the same for every measurement. This works for the odds ratios (still need to tweak the axis tick marks) but when the values of lower and upper confidence intervals are involved, the position is totally wrong and the confidence intervals do not include the point estimate. I tried to solve this by including all values as levels of the factor, but this did not seem to solve the issue.
What I am trying to do is either to "magnify" the area between 0 and 1 in the graph while leaving the rest of the plot area unchanged, or to make ggplot place the confidence intervals correctly around the odds ratios.
Below I include a simplified version of my data and the code I have been using, for reproducibility.
dat <- data.frame(region = rep(LETTERS[1:5], 2),
country = rep(c("A1", "A2"), each = 5),
or = c(6.459578, 1.696221, 0.895115, 3.393235, 2.325510,
4.457805, 0.407111, 22.760861, 3.354883, 2.214915),
lower = c(5.768999699, 0.237062909, 0.347443105, 0.369881529,
0.010233696, 1.020315696, 0.004419494, 3.87391259,
0.808667764, 0.874415935),
upper = c(7.2328221, 12.1367207, 2.3060778, 31.1290104,
28.4497981, 19.4763489, 0.750188, 337.2960785,
13.9182469, 5.610429))
library(ggplot2)
ggplot(dat, aes(x = region, y = or, ymin = lower, ymax = upper))+
geom_pointrange() +
geom_hline(yintercept = 1, linetype = 2) +
theme_bw() +
theme(plot.margin = unit(c(1, 1, 1, 4), "lines"),
axis.title = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none") +
facet_wrap(~ country) +
coord_flip(ylim = c(0, 100))
# Change numeric variable into factors
f.levels <- c(dat$or, dat$lower, dat$upper)
f.levels <- unique(f.levels)
f.levels <- as.character(f.levels[order(f.levels)])
dat$or <- factor(dat$or, levels = f.levels)
dat$lower <- factor(dat$lower, levels = f.levels)
dat$upper <- factor(dat$upper, levels = f.levels)
ggplot(dat, aes(x = region, y = or, ymin = lower, ymax = upper))+
geom_pointrange() +
geom_hline(yintercept = 1, linetype = 2) +
theme_bw() +
theme(plot.margin = unit(c(1, 1, 1, 4), "lines"),
axis.title = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none") +
facet_wrap(~ country) +
coord_flip(ylim = c(0, 30))
I am relatively new to ggplot so please excuse any newbie mistakes.
Any suggestions on this problem are highly appreciated.
Thank you!
I think the standard solution for this problem is plotting the ORs on a log10 scale. For a neat explanation see https://blogs.sas.com/content/iml/2015/07/29/or-plots-log-scale.html
ggplot(dat, aes(x = region, y = or, ymin = lower, ymax = upper)) +
geom_pointrange() +
geom_hline(yintercept = 1, linetype = 2) +
scale_y_log10() + ### This is the line that makes the transformation
theme_bw() +
theme(plot.margin = unit(c(1, 1, 1, 4), "lines"),
axis.title = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none") +
facet_wrap(~ country) +
coord_flip()
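
To see why the log scale helps, here is a small base R illustration with made-up odds ratios (the values in `or` are not from the question). On the linear scale, an OR of 0.25 sits much closer to the null value 1 than an OR of 4 does, even though both describe a fourfold effect; on the log10 scale they are equally far from it.

```r
or <- c(0.25, 0.5, 1, 2, 4)   # assumption: illustrative odds ratios

# Distance from the null value (OR = 1) on the linear scale: asymmetric
linear_dist <- abs(or - 1)

# Distance from the null value on the log10 scale (log10(1) = 0): symmetric
log_dist <- abs(log10(or))

print(data.frame(or, linear_dist, log_dist))
```

This is also why the region between 0 and 1 no longer looks compressed after scale_y_log10(). Note that a log axis cannot include 0, so axis limits starting at 0, as in the question's coord_flip(ylim = c(0, 100)), have to be dropped or replaced with a positive lower bound.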

move legend title in ggplot2

I have been trying to shift my legend title across to be centered over the legend contents using the guide function. I've been trying to use the following code:
guides(colour=guide_legend(title.hjust = 20))
I thought of trying to make a reproducible example, but I think the reason it's not working has something to do with the above line not matching the rest of my code specifically. So here is the rest of the code I'm using in my plot:
NH4.cum <- ggplot(data=NH4_by_Date, aes(x=date, y=avg.NH4, group = CO2, colour=CO2)) +
geom_line(aes(linetype=CO2), size=1) + #line options
geom_point(size=3) + #point symbol sizes
#scale_shape_manual(values = c(1, 16)) + #manually choose symbols
theme_bw()+
theme(axis.text.x=element_text(colour="white"), #change x axis labels to white.
axis.title=element_text(size=12),
axis.title.x = element_text(color="white"), #Change x axis label colour to white
panel.border = element_blank(), #remove box border
axis.line.x = element_line(color="black", size = 0.5), #add x axis line
axis.line.y = element_line(color="black", size = 0.5), #add y axis line
legend.key = element_blank(), #remove grey box from around legend
legend.position = c(0.9, 0.6))+ #change legend position
geom_vline(xintercept=c(1.4,7.5), linetype="dotted", color="black")+ #put in dotted lines for season boundaries
scale_color_manual(values = c("#FF6600", "green4", "#0099FF"),
name=expression(CO[2]~concentration~(ppm))) + #manually define line colour
scale_linetype_manual(guide="none", values=c("solid", "solid", "solid")) + #manually define line types
scale_shape_manual(values = c(16, 16, 16)) + #manually choose symbols
guides(colour=guide_legend(title.hjust = 20))+
scale_y_continuous(expand = c(0, 0), limits = c(0,2200), breaks=seq(0,2200,200))+ #change x axis to intercept y axis at 0
xlab("Date")+
ylab(expression(Membrane~available~NH[4]^{" +"}~-N~(~mu~g~resin^{-1}~14~day^{-1})))+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
geom_errorbar(aes(ymin = avg.NH4 - se.NH4, #set y error bars
ymax = avg.NH4 + se.NH4),
width=0.1)
I have tried doing the following instead with no luck:
guides(fill=guide_legend(title.hjust=20))
I have also adjusted the hjust value from values between -2 to 20 just to see if that made a difference but it didn't.
I'll try to attach a picture of the graph so far so you can see what I'm talking about.
I've looked through all the questions I can on stack overflow and to the best of my knowledge this is not a duplicate as it's specific to a coding error of my own somewhere.
Thank-you in advance!!
The obvious approach e.g.
theme(legend.title = element_text(hjust = .5))
didn't work for me. I wonder if it is related to this open issue in ggplot2. In any case, one manual approach would be to remove the legend title, and position a new one manually:
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
geom_point() +
stat_smooth(se = FALSE) +
theme_bw() +
theme(legend.position = c(.85, .6),
legend.title = element_blank(),
legend.background = element_rect(fill = alpha("white", 0)),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
annotate("text", x = 5, y = 27, size = 3,
label = "CO[2]~concentration~(ppm)", parse = TRUE)

how to make a cdf plot smoother and label y axis

I read the vectors "data1" and "data2" from files and use the code below to plot their CDFs, but I have two problems:
- How to make the figure smoother
- How to label the y axis "CDF"
Please note that this code runs correctly; I only need these modifications.
df <- data.frame(x = c(data1, data2), ggg=factor(rep(1:2, c(19365,19365))))
ggplot(df, aes(x, colour = ggg)) +
stat_ecdf() +
labs(x='Time (ms)', ggg='CDF', fill='') +
theme_bw()+
theme(panel.grid.major = element_line(colour = 'grey'),
panel.border = element_rect(colour = 'black'),
axis.line = element_blank(),
panel.background = element_blank(),
legend.direction='vertical',
legend.position = c(1, 0.5),
legend.justification = c(1, 0.5),
legend.background = element_rect(colour = NA)) +
scale_colour_hue(name='', labels=c('IEEE 802.11p','Our protocol'))
The empirical distribution function is always a step function and you should not smooth it in any way. Having said that, you can get the values for the empirical distribution function using ecdf. If you want to do any smoothing on the result (and this is not suggested), you can.
library(dplyr)
library(ggplot2)
res <- df %>%
group_by(ggg) %>%
do(data.frame(x = sort(.$x),
ecdf = ecdf(.$x)(sort(.$x))))
ggplot(res, aes(x, ecdf, colour = ggg)) + geom_step()
To relabel the y axis, you can use
labs(x='Time (ms)', y='CDF')
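
To make the step-function point concrete, here is a tiny base R example with four made-up observations (the vector `x` is purely illustrative). ecdf() returns a function that gives the proportion of observations less than or equal to its argument, and that proportion only changes at the observed values.

```r
x  <- c(3, 1, 2, 2)   # assumption: toy data, not from the question
Fn <- ecdf(x)         # Fn is itself a function

# Proportion of observations <= each query value
vals <- Fn(c(0, 1, 1.5, 2, 3))
print(vals)

# The jump points are exactly the distinct observed values
jumps <- knots(Fn)
print(jumps)
```

Between 1 and 2 the function stays flat because no observation falls there, which is why any smoothing would misrepresent the data. With roughly 19,000 points per group, as in the question, the steps are usually too small to see anyway.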
