Try to plot very basic data in R.
Year X1 X2 X3 X4 X5 X6 X7
2004 0.91 0.23 0.28 1.02 0.90 0.95 0.94
2005 0.57 -0.03 0.88 0.52 0.47 0.55 0.56
2006 1.30 -0.43 1.95 1.27 1.00 1.19 1.26
2007 0.44 0.63 0.60 0.34 0.60 0.50 0.46
2008 1.69 0.34 -2.81 -2.41 -1.80 -1.87 -1.83
What I am looking for is a basic line chart over time with x = year and y = value and the chart itself should include all X1-X7.
I was looking at the ggplot2 functionality, but I don't know where to start.
# Libraries
library(tidyverse)
library(streamgraph)
library(viridis)
library(plotly)
# Plot
p <- data %>%
ggplot(aes(x = year, y = n) +
geom_area() +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position = "none") +
ggtitle("multiple X over time") +
theme_ipsum() +
theme(legend.position = "none")
ggplotly(p, tooltip = "text")
Would anyone give me a hand on it, please? Is there an easy way to do it in basic R?
Thanks.
It’s unclear what you intend to do with geom_area but in the following you’ll see a basic line chart of the data you’ve shown.
The key point is that ‘ggplot2’ works on tidy data, so you need to first transform your data into long form:
data %>%
pivot_longer(-Year, names_to = 'Vars', values_to = 'Values') %>%
ggplot() +
aes(x = Year, y = Values, color = Vars) +
geom_line()
Related
I have a lm model results containing R2 and pvalue, and I plotted them in a bar plot. I have then facetted them using two discrete variables.
I want to put * on the top of bars to flag statistical significance (pvlue <= 0.05), as shown on the bottom-left-most panel of the below image.
I have not found an insightful tutorial on how to do this.
Any way to do this, please?
Here is some code I used
> head(res_all_s2)
WI aggre_per Season yield_level slope Intercept r.squared
1 R IDW2 Dec Season2 Region II -7.06 6091 0.41
2 R IDW2 Dec Season2 Region I -7.29 6280 0.40
3 GDD AS OND Season2 Region II 14.23 -18270 0.34
4 GDD AS Nov Season2 Region II 36.84 -14760 0.33
5 SPI1 IDW2 Dec Season2 Region II -405.10 5358 0.31
6 SPI1 IDW2 Dec Season2 Region I -421.70 5523 0.32
adj.r.squared fstatistic.value pval pearson
1 0.36 9.58 0.01 -0.64
2 0.36 9.49 0.01 -0.64
3 0.29 7.09 0.02 0.58
4 0.28 6.97 0.02 0.58
5 0.26 6.40 0.02 -0.56
6 0.27 6.51 0.02 -0.56
> # significance (pval <= 0.05)
> signif_reg <- res_all_s2 %>% filter(pval <= 0.05)
> head(signif_reg)
WI aggre_per Season yield_level slope Intercept r.squared
1 R IDW2 Dec Season2 Region II -7.06 6091 0.41
2 R IDW2 Dec Season2 Region I -7.29 6280 0.40
3 GDD AS OND Season2 Region II 14.23 -18270 0.34
4 GDD AS Nov Season2 Region II 36.84 -14760 0.33
5 SPI1 IDW2 Dec Season2 Region II -405.10 5358 0.31
6 SPI1 IDW2 Dec Season2 Region I -421.70 5523 0.32
adj.r.squared fstatistic.value pval pearson
1 0.36 9.58 0.01 -0.64
2 0.36 9.49 0.01 -0.64
3 0.29 7.09 0.02 0.58
4 0.28 6.97 0.02 0.58
5 0.26 6.40 0.02 -0.56
6 0.27 6.51 0.02 -0.56
>
> # Plot R2
>
> r <- res_all_s2 %>% ggplot(aes(x=aggre_per,
+ y=r.squared )) +
+ geom_bar(stat="identity", width=0.8) +
+ facet_grid(yield_level ~ WI,
+ scales = "free_y",
+ switch = "y") +
+ scale_y_continuous(limits = c(0, 1)) +
+ xlab("Aggregation period") +
+ ylab(expression(paste("R-squared"))) +
+ theme_bw() +
+ theme(axis.title = element_text(size = 12), # all titles
+ axis.text = element_text(colour = "black"),
+ axis.text.x = element_text(angle = 90, vjust = 0.5,
+ hjust = 1, color = "black"),
+ strip.text.y.left = element_text(angle = 0),
+ panel.border = element_rect(color = "black",
+ size = .5))
> r
And, here is the link to my res_all_s2 dataset https://1drv.ms/u/s!Ajl_vaNPXhANgckJeqDKA0fzfFEbhg?e=VfoFaB
Technically, you can always add an appropriate geom with its independent dataset (that would be your data filtered to exclude pval > .05):
df_filtered <- res_all_s2 %>% filter(...)
## ggplot(...) +
geom_point(data = df_filtered, pch = 8)
## pch = point character, no. 8 = asterisk
or
## ... +
geom_text(data = df_filtered, aes(label = '*'), nudge_y = .05)
## nudge_y = vertical offset
or color only significant columns:
## ... +
geom_col(aes(fill = c('grey','red')[1 + pval <= .05]))
So, yes, technically that's feasible. But before throwing the results of 13 x 7 x 5 = 455 linear models at your audience, please consider the issues of p-hacking, the benefits of multivariate analysis and the viewers' ressources ;-)
This is an example of the variables that I would like to visualize
id post.test.score pre.test.score messages forum.posts av.assignment.score
1 0.37 0.48 68 7 0.19
2 0.52 0.37 83 22 0.28
3 0.42 0.37 81 7 0.25
4 0.56 0.34 94 14 0.27
5 0.25 0.39 42 11 0.07
I've copied the data from your post above so you can skip the variable assignment
library("tidyverse")
df <- read.table(file = "clipboard", header = T) %>%
as_tibble()
You need to modify your data structure slightly before you pass it to ggplot. Get each of your test names into a single variable with tidyr::gather. Then pipe to ggplot:
df %>%
gather(test, value, -id) %>%
ggplot(aes(x = value)) +
geom_histogram() +
facet_grid(~test)
I have a problem trying to use different colors in my plot for two groups. I created a plot with odds ratios (including 95%CI) over a period of serveral years for 2 groups (mfin and ffin). When using the syntax below, all points and lines are black and my attempts to adjust them e.g. geom_linerange(colour=c("red","blue")) have failed (Error: Incompatible lengths for set aesthetics: colour).
Can anyone help me with this?
ggplot(rbind(data.frame(mfin, group=mfin), data.frame(ffin, group=ffin)),
aes(x = JAAR, y = ror, ymin = llror, ymax = ulror)) +
geom_linerange() +
geom_point() +
geom_hline(yintercept = 1) +
ylab("Odds ratio & 95% CI") +
xlab("") +
geom_errorbar(width=0.2)
Below are some sample data (1st group = mfin, #ND GROUP + ffin)
JAAR ror llror ulror
2008 2.00 1.49 2.51
2009 2.01 1.57 2.59
2010 2.06 1.55 2.56
2011 2.07 1.56 2.58
2012 2.19 1.70 2.69
2013 2.23 1.73 2.72
2014 2.20 1.71 2.69
2015 2.31 1.84 2.78
2016 .230 1.83 2.76
JAAR ror llror ulror
2008 1.36 0.88 1.84
2009 1.20 0.73 1.68
2010 1.16 0.68 1.64
2011 1.23 0.77 1.69
2012 1.43 1.00 1.86
2013 1.46 1.04 1.88
2014 1.49 1.07 1.90
2015 1.30 0.89 1.70
2016 1.29 0.89 1.70
You need to map the group membership variable to the color aesthetic (in the long version of the data):
library(readr)
library(dplyr)
library(ggplot2)
# simulate some data
year_min = 1985
year_max = 2016
num_years = year_max - year_min + 1
num_groups = 2
num_estimates = num_years*num_groups
df_foo = data_frame(
upper_limit = runif(n = num_estimates, min = -20, max = 20),
lower_limit = upper_limit - runif(n = num_estimates, min = 0, max = 5),
point_estimate = runif(num_estimates, min = lower_limit, max = upper_limit),
year = rep(seq(year_min, year_max), num_groups),
group = rep(c("mfin", "ffin"), each = num_years)
)
# plot the confidence intervals
df_foo %>%
ggplot(aes(x = year, y = point_estimate,
ymin = lower_limit, ymax = upper_limit,
color = group)) +
geom_point() +
geom_errorbar() +
theme_bw() +
ylab("Odds Ratio & 95% CI") +
xlab("Year") +
scale_color_discrete(name = "Group")
This produces what I think you are looking for, except the simulated data makes it look somewhat messy:
I have a small table of summary data with the odds ratio, upper and lower confidence limits for four categories, with six levels within each category. I'd like to produce a chart using ggplot2 that looks similar to the usual one created when you specify a lm and it's se, but I'd like R just to use the pre-specified values I have in my table. I've managed to create the line graph with error bars, but these overlap and make it unclear. The data look like this:
interval OR Drug lower upper
14 0.004 a 0.002 0.205
30 0.022 a 0.001 0.101
60 0.13 a 0.061 0.23
90 0.22 a 0.14 0.34
180 0.25 a 0.17 0.35
365 0.31 a 0.23 0.41
14 0.84 b 0.59 1.19
30 0.85 b 0.66 1.084
60 0.94 b 0.75 1.17
90 0.83 b 0.68 1.01
180 1.28 b 1.09 1.51
365 1.58 b 1.38 1.82
14 1.9 c 0.9 4.27
30 2.91 c 1.47 6.29
60 2.57 c 1.52 4.55
90 2.05 c 1.31 3.27
180 2.422 c 1.596 3.769
365 2.83 c 1.93 4.26
14 0.29 d 0.04 1.18
30 0.09 d 0.01 0.29
60 0.39 d 0.17 0.82
90 0.39 d 0.2 0.7
180 0.37 d 0.22 0.59
365 0.34 d 0.21 0.53
I have tried this:
limits <- aes(ymax=upper, ymin=lower)
dodge <- position_dodge(width=0.9)
ggplot(data, aes(y=OR, x=days, colour=Drug)) +
geom_line(stat="identity") +
geom_errorbar(limits, position=dodge)
and searched for a suitable answer to create a pretty plot, but I'm flummoxed!
Any help greatly appreciated!
You need the following lines:
p<-ggplot(data=data, aes(x=interval, y=OR, colour=Drug)) + geom_point() + geom_line()
p<-p+geom_ribbon(aes(ymin=data$lower, ymax=data$upper), linetype=2, alpha=0.1)
Here is a base R approach using polygon() since #jmb requested a solution in the comments. Note that I have to define two sets of x-values and associated y values for the polygon to plot. It works by plotting the outer perimeter of the polygon. I define plot type = 'n' and use points() separately to get the points on top of the polygon. My personal preference is the ggplot solutions above when possible since polygon() is pretty clunky.
library(tidyverse)
data('mtcars') #built in dataset
mean.mpg = mtcars %>%
group_by(cyl) %>%
summarise(N = n(),
avg.mpg = mean(mpg),
SE.low = avg.mpg - (sd(mpg)/sqrt(N)),
SE.high =avg.mpg + (sd(mpg)/sqrt(N)))
plot(avg.mpg ~ cyl, data = mean.mpg, ylim = c(10,30), type = 'n')
#note I have defined c(x1, x2) and c(y1, y2)
polygon(c(mean.mpg$cyl, rev(mean.mpg$cyl)),
c(mean.mpg$SE.low,rev(mean.mpg$SE.high)), density = 200, col ='grey90')
points(avg.mpg ~ cyl, data = mean.mpg, pch = 19, col = 'firebrick')
I have estimates of odds ratio with corresponding 95% CI of six pollutants overs 4 lag periods. How can I create a vertical plot similar to the attached figure in R? The figure below was created in SPSS.
Sample data that produced the figure is the following:
lag pollut or lcl ucl
0 CO 0.97 0.90 1.06
0 PM10 1.00 0.91 1.09
0 NO 0.97 0.92 1.02
0 NO2 1.01 0.89 1.15
0 SO2 0.97 0.85 1.11
0 Ozone 1.00 0.87 1.15
1 CO 1.03 0.95 1.10
1 PM10 0.93 0.86 1.01
1 NO 1.01 0.97 1.06
1 NO2 1.08 0.97 1.20
1 SO2 0.94 0.84 1.04
1 Ozone 0.94 0.84 1.04
2 CO 1.09 1.02 1.16
2 PM10 1.04 0.96 1.13
2 NO 1.04 1.00 1.08
2 NO2 1.07 0.96 1.18
2 SO2 1.05 0.95 1.17
2 Ozone 0.93 0.84 1.03
3 CO 0.98 0.91 1.06
3 PM10 1.14 1.05 1.24
3 NO 0.99 0.95 1.04
3 NO2 1.01 0.91 1.12
3 SO2 1.11 1.00 1.23
3 Ozone 1.00 0.90 1.11
You can also do this with ggplot2. The code is somewhat shorter:
dat <- read.table("clipboard", header = T)
dat$lag <- paste0("L", dat$lag)
library(ggplot2)
ggplot(dat, aes(x = pollut, y = or, ymin = lcl, ymax = ucl)) + geom_pointrange(aes(col = factor(lag)), position=position_dodge(width=0.30)) +
ylab("Odds ratio & 95% CI") + geom_hline(aes(yintercept = 1)) + scale_color_discrete(name = "Lag") + xlab("")
EDIT: Here is a version is closer to the SPSS figure:
ggplot(dat, aes(x = pollut, y = or, ymin = lcl, ymax = ucl)) + geom_linerange(aes(col = factor(lag)), position=position_dodge(width=0.30)) +
geom_point(aes(shape = factor(lag)), position=position_dodge(width=0.30)) + ylab("Odds ratio & 95% CI") + geom_hline(aes(yintercept = 1)) + xlab("")
Assuming your data are in datf...
I'd sort it first into just what you want order wise.
datf <- datf[order(datf$pollut, datf$lag), ]
You want a space before and after every lab grouping so I'd add some extra rows in that are NA. That makes it easier because then you'll automatically have blanks in your plot calls.
datfPlusNA <- lapply(split(datf, datf$pollut), function(x) rbind(NA, x, NA))
datf <- do.call(rbind, datfPlusNA)
Now that you have your data.frame sorted and with the extra NAs the plotting is easy.
nr <- nrow(datf) # find out how many rows all together
with(datf, {# this allows entering your commands more succinctly
# first you could set up the plot so you can select the order of drawing
plot(1:nr, or, ylim = c(0.8, 1.3), type = 'n', xaxt = 'n', xlab = '', ylab = 'Odds Ratio and 95% CI', frame.plot = TRUE, panel.first = grid(nx = NA, ny = NULL))
# arrows(1:nr, lcl, 1:nr, ucl, length = 0.02, angle = 90, code = 3, col = factor(lag))
# you could use arrows above but you don't want ends so segments is easier
segments(1:nr, lcl, 1:nr, ucl, col = factor(lag))
# add your points
points(1:nr, or, pch = 19, cex = 0.6)
xLabels <- na.omit(unique(pollut))
axis(1, seq(4, 34, by = 6) - 0.5, xLabels)
})
abline(h = 1.0)
There are packages that make this kind of thing easier but if you can do it like this you can start doing any graphs that you can imagine.