Plotting by ggplot in R

I have time series data for several groups. Each group ID has a series of time stamps (given as Date), each with a hyper and a hypo response value. I would like to plot the time series with ggplot, faceted by both (1) group ID and (2) response (Hyper and Hypo), so that the panels for the two responses are stacked one on top of the other. Any help is appreciated.
A demo data set and what I have done so far are given below.
set.seed(1)
tdat <- data.frame(Group = rep(paste0("GroupID-", c("A","B")),
each = 100),
Date = rep(seq(Sys.Date(), by = "1 day", length = 100), 2),
Fitted = c(cumsum(rnorm(100)), cumsum(rnorm(100))),
Signif = rep(NA, 200))
tdat <- transform(tdat, Hyper = Fitted + 1.5, Hypo = Fitted - 1.5)
## select 1 region per Site as signif
take <- sample(10:70, 2)
take[2] <- take[2] + 100
tdat$Signif[take[1]:(take[1]+25)] <- tdat$Fitted[take[1]:(take[1]+25)]
tdat$Signif[take[2]:(take[2]+25)] <- tdat$Fitted[take[2]:(take[2]+25)]
The data frame looks like this:
> head(tdat)
Group Date Fitted Signif Hyper Hypo
1 GroupID-A 2017-04-18 -0.6264538 NA 0.8735462 -2.1264538
2 GroupID-A 2017-04-19 -0.4428105 NA 1.0571895 -1.9428105
3 GroupID-A 2017-04-20 -1.2784391 NA 0.2215609 -2.7784391
4 GroupID-A 2017-04-21 0.3168417 NA 1.8168417 -1.1831583
5 GroupID-A 2017-04-22 0.6463495 NA 2.1463495 -0.8536505
6 GroupID-A 2017-04-23 -0.1741189 NA 1.3258811 -1.6741189
The time series is indexed by Date.
My current plot is given below. However, my real data has more group IDs, and I want one picture per group ID, split into separate panels for the Hyper and Hypo responses.
library(ggplot2)
ggplot(tdat, aes(x = Date, y = Fitted, group = Group)) +
geom_line() +
geom_line(mapping = aes(y = Hyper), lty = "dashed") +
geom_line(mapping = aes(y = Hypo), lty = "dashed") +
geom_line(mapping = aes(y = Signif), lwd = 1.3, colour = "red") +
facet_wrap( ~ Group)
Again any help is appreciated.
Thanks

You can reshape your data from wide to long with reshape2, tidyr, or data.table, and then facet on the new variable column:
library(reshape2)
tdat2<-melt(tdat,id.vars = c("Group","Date","Signif","Fitted"))
ggplot(tdat2, aes(x = Date, y = value, group = Group)) +
geom_line() +
geom_line(mapping = aes(y = Signif), lwd = 1.3, colour = "red") +
facet_wrap( variable~ Group)
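For the tidyr route mentioned above, a minimal sketch might look like the following. It rebuilds a reduced version of the question's demo data (a fixed start date is assumed in place of Sys.Date(), and the Signif column is left out), and uses facet_grid(variable ~ Group) so that the Hyper panel sits above the Hypo panel for each group. The names tdat_long and p are illustrative only.

```r
library(tidyr)
library(ggplot2)

# Reduced copy of the demo data; the fixed start date is an assumption
set.seed(1)
tdat <- data.frame(
  Group  = rep(paste0("GroupID-", c("A", "B")), each = 100),
  Date   = rep(seq(as.Date("2017-04-18"), by = "1 day", length.out = 100), 2),
  Fitted = c(cumsum(rnorm(100)), cumsum(rnorm(100)))
)
tdat <- transform(tdat, Hyper = Fitted + 1.5, Hypo = Fitted - 1.5)

# Wide to long: one row per Group/Date/response combination
tdat_long <- pivot_longer(tdat, cols = c(Hyper, Hypo),
                          names_to = "variable", values_to = "value")

# Rows = response (Hyper above Hypo), columns = group
p <- ggplot(tdat_long, aes(x = Date, y = value)) +
  geom_line() +
  facet_grid(variable ~ Group)
```

Printing p draws the grid; with many groups, facet_grid(Group ~ variable) or a separate plot per group may be easier to read.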

How about something like this, using geom_ribbon to show the Hyper and Hypo values:
library(ggplot2)
library(magrittr) # provides %>%
tdat %>%
ggplot(aes(Date, Fitted)) +
geom_line(lty = "dashed") +
geom_line(aes(y = Signif), lwd = 1.3, color = "red") +
geom_ribbon(aes(ymin = Hypo, ymax = Hyper, group = Group), alpha = 0.2) +
facet_grid(Group ~ .) +
theme_light()

Related

geom_line() with x as factor and a grouping variable for color

I have the following reproducible data :
d <- data.frame(ATB = rep(c("ATB1", "ATB2"), each = 4),
status = rep(rep(c("S", "R"), each = 2), 2),
season = rep(c("Winter", "summer"), 4),
n = c(239,284,113,120,229,269,127,140)
)
I am trying to draw a point for each count n by season, colored by ATB, and to connect the points that share the same ATB and status from one season to the other (for example, ATB1 S in winter linked to ATB1 S in summer). Here is the plot I am trying to get:
So far I have managed to draw the points but not the lines.
ggplot(d, aes(x=season, y = n)) +
geom_point(aes(color = ATB)) +
geom_line(aes(color = ATB, linetype = status))
I tried group = 1 in each aes, but it didn't work.
Is there a way to obtain the plot?
You need to group by the interaction of ATB and status, otherwise you are not correctly telling ggplot which points to connect:
ggplot(d, aes(x=season, y = n)) +
geom_point(aes(color = ATB)) +
geom_line(aes(color = ATB, linetype = status,
group = interaction(ATB, status))) +
scale_linetype_manual(values = c(2, 1))
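To see why this grouping works, it can help to inspect what interaction() returns for this data. The sketch below uses only base R and the data frame d from the question; the variable name g is illustrative.

```r
d <- data.frame(ATB = rep(c("ATB1", "ATB2"), each = 4),
                status = rep(rep(c("S", "R"), each = 2), 2),
                season = rep(c("Winter", "summer"), 4),
                n = c(239, 284, 113, 120, 229, 269, 127, 140))

# One factor level per ATB/status combination;
# geom_line() connects the points that share a level
g <- interaction(d$ATB, d$status)
nlevels(g)  # 4 combinations
table(g)    # each combination holds 2 rows, one per season
```

Each level therefore defines one two-point line segment between the seasons, which is what the desired plot needs.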

ggplot and lapply /mapply for nested list and data frames

Edit:
I did find a way to do what I need, but now I'm having trouble getting a title to appear for each of the plots that are created so I know which site I am looking at:
lapply(seq(gl), function(i){
lapply(seq(gl[[i]]), function(j){
ggplot() +
geom_point(data = gl[[i]][[j]], aes(x = `UTC_date.1`, y = `actSWE_mm`, color = `swe_Res_mm`))+
geom_segment(data = gl[[i]][[j]], aes(x = `UTC_date.1`, y = `actSWE_mm`, xend = `UTC_date.1`, yend = `swe_mm`), alpha=.2)+
scale_color_steps2(low = "blue", mid = "white", high = "red") +
guides(color = FALSE) + geom_point(data = gl[[i]][[j]], aes(x = `UTC_date.1`, y = `swe_mm`), shape = 1) +
facet_wrap(vars(year), scales="free_x") + theme_bw()
})})
I tried adding:
theme(plot.title = paste(names(gl)[i], names(gl[[i]])[j], sep = "_"))
but that does not seem to work.
Original:
I have a list of 12 data frames, one per month. Each data frame contains time series measurements from several different sites. Below is an example table (not actual data) for January (monthSplit is the list, so this is monthSplit$January):
site_id UTC_date.1 swe_mm actSWE_mm swe_Res_mm Month Year
<int> <date> <dbl> <dbl> <dbl> <chr> <num>
1003 2005-01-01 2 54.2 0.241 53.059 "January" 2005
1003 2005-01-02 2 54.2 0.241 53.059 "January" 2005
958 2005-01-01 2 154.2 0.241 153.059 "January" 2005
946 2005-01-01 2 154.2 152.25 1.95 "January" 2005
946 2005-01-02 2 500.2 550.241 50.059 "January" 2005
I'm having two problems when trying to perform ggplot over a list of dataframes that need to be further subset by the unique sites.
I tried to create a ggplot function and use mapply:
plot_fun = function(d) {
ggplot(d, aes(x = `UTC_date.1`, y = `actSWE_mm`)) +
geom_segment(aes(xend = `UTC_date.1`, yend = `swe_mm`), alpha=.2) + geom_point(aes(color = `swe_Res_mm`)) +
scale_color_steps2(low = "blue", mid = "white", high = "red") +
guides(color = FALSE) + geom_point(aes(y = `swe_mm`), shape = 1) +
facet_wrap(vars(year), scales="free_x") + theme_bw()
}
pltlist = mapply(plot_fun, d = monthSplit, SIMPLIFY=FALSE)
This yielded plots in the right format, but they were not split by site_id. Each month's plot contained one panel per year. For example, the September plot had 13 panels in one window, one for each year from 2003 to 2015. The problem is that all the sites were lumped together in each panel.
With the actual data (plotted with the function above), nothing meaningful can be read from the plots because the range of the data on the y-axis varies so widely.
How can I split the list of plots further by site_id, so that only one site appears in each plot for comparison?
Add group = site_id if you want to have one color point and line per site_id, e.g.
plot_fun = function(d) {
ggplot(d, aes(x = UTC_date.1, y = actSWE_mm, group = site_id)) +
geom_segment(aes(xend = UTC_date.1, yend = swe_mm), alpha=.2) +
geom_point(aes(color = swe_Res_mm)) +
scale_color_steps2(low = "blue", mid = "white", high = "red") +
guides(color = FALSE) +
geom_point(aes(y = swe_mm), shape = 1) +
facet_wrap(vars(year), scales="free_x") +
theme_bw()
}
(Note I had to delete all your '`' characters as that is the code character).
Note that this proposal does not give more plots, but more lines per plot.
If you want one plot per site_id, you might split your datasets by that variable, or include it in the facet_wrap:
facet_wrap(facets = ~ year + site_id, scales="free_x")
If the scales are very different per site, I use log scales. The drawback is that zeros and negative values cannot be plotted on them.
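Splitting by site_id could be sketched as below. This is an illustration under stated assumptions, not the asker's actual pipeline: the data frame jan and its values are made up to stand in for monthSplit$January, and by_site and site_plots are hypothetical names. It also touches on the title problem from the edit: ggtitle() takes the title text directly, whereas theme(plot.title = ...) expects an element_text() that only styles the title.

```r
library(ggplot2)

# Hypothetical stand-in for one month of data; values are invented
jan <- data.frame(
  site_id    = rep(c(946, 1003), each = 4),
  UTC_date.1 = rep(as.Date("2005-01-01") + 0:3, times = 2),
  swe_mm     = c(150, 152, 155, 158, 50, 52, 53, 55),
  actSWE_mm  = c(154, 150, 160, 157, 54, 51, 50, 58)
)

# One data frame per site, named by site_id
by_site <- split(jan, jan$site_id)

# One plot per site, each titled with the month and site
site_plots <- lapply(names(by_site), function(site) {
  ggplot(by_site[[site]], aes(x = UTC_date.1, y = actSWE_mm)) +
    geom_segment(aes(xend = UTC_date.1, yend = swe_mm), alpha = 0.2) +
    geom_point() +
    geom_point(aes(y = swe_mm), shape = 1) +
    ggtitle(paste("January", site, sep = "_")) +
    theme_bw()
})
names(site_plots) <- names(by_site)
```

Running the same split inside a loop over the list of months would give one plot per month and site, each with its own y-axis range.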

Is there an equivalent to points() on ggplot2

I'm working with stock prices and trying to plot the price difference.
I created one using autoplot.zoo(). My question is: how can I change the point shapes to triangles when they are above the upper threshold and to circles when they are below the lower threshold? I understand that with the base plot() function you can do this by calling points(); I am wondering how to do the same with ggplot2.
Here is the code for the plot:
p<-autoplot.zoo(data, geom = "line")+
geom_hline(yintercept = threshold, color="red")+
geom_hline(yintercept = -threshold, color="red")+
ggtitle("AAPL vs. SPY out of sample")
p+geom_point()
We can't fully replicate without your data, but here's an attempt with some sample generated data that should be similar enough that you can adapt for your purposes.
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
You can create an additional variable that determines the shape, based on the relationship in the data itself, and pass that as an argument into ggplot.
# Create conditional data
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
data$outlier[is.na(data$outlier)] <- "In Range"
library(ggplot2)
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16,15))
# If you want points just above and below
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
thresh <- 4
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16))
Alternatively, you can add the points above and below the threshold as individual layers with manually specified shapes, like this. The pch argument sets the point shape.
# Another way of doing this
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
ggplot(data, aes(x = date, y = spread, group = 1)) +
geom_line() +
geom_point(data = data[data$spread>thresh,], pch = 17) +
geom_point(data = data[data$spread< (-thresh),], pch = 16) +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16))
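As a variation on the first approach, the conditional column could also be built in one step with cut(). This is a sketch under assumptions: the seed is arbitrary, and because cut() closes intervals on the right by default, a value exactly equal to -thresh would be labelled "Below" rather than "In Range". Naming the entries of values ties each shape to its label explicitly.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed so the sample data is reproducible
data <- data.frame(date = 2001:2020, spread = runif(20, -10, 10))
thresh <- 4

# Three bins: (-Inf, -thresh], (-thresh, thresh], (thresh, Inf)
data$outlier <- cut(
  data$spread,
  breaks = c(-Inf, -thresh, thresh, Inf),
  labels = c("Below", "In Range", "Above")
)

# Shape is mapped only in geom_point(), so the line stays a single path
p <- ggplot(data, aes(x = date, y = spread)) +
  geom_line() +
  geom_point(aes(shape = outlier)) +
  geom_hline(yintercept = c(thresh, -thresh), color = "red") +
  scale_shape_manual(values = c(Below = 16, "In Range" = 15, Above = 17))
```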

How to create two barplots with different x and y axis in the same plot in R?

I need to plot two grouped barplots from two data frames that have different numbers of rows (6 and 5).
I tried many approaches in R but I don't know how to make it work.
Here are my data frames. The Freq column must be on the y axis and the inter and intra columns must be on the x axis.
> freqinter
inter Freq
1 0.293040975264367 17
2 0.296736775990729 2
3 0.297619926364764 4
4 0.587377012109561 1
5 0.595245125315916 4
6 0.597022018595893 2
> freqintra
intra Freq
1 0 3
2 0.293040975264367 15
3 0.597022018595893 4
4 0.598809552335782 2
5 0.898227748764939 6
I expect to plot the barplots in the same plot, with the inter and intra values distinguished by colour.
I want a picture like this one:
You probably want a histogram. Use the raw data if possible. For example:
library(tidyverse)
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id") %>%
uncount(Freq)
ggplot(df, aes(x, fill = id)) +
geom_histogram(binwidth = 0.1, position = 'dodge', col = 1) +
scale_fill_grey() +
theme_minimal()
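The uncount(Freq) step repeats each row Freq times, which turns the frequency table back into one row per observation so that geom_histogram() can bin it. A base R sketch of the same expansion, using the first three rows of freqinter from the question (the names freqinter_head and expanded are illustrative):

```r
# First three rows of the question's freqinter table
freqinter_head <- data.frame(
  x    = c(0.293040975264367, 0.296736775990729, 0.297619926364764),
  Freq = c(17, 2, 4)
)

# Repeat each row index Freq times, then keep only the x column
row_idx  <- rep(seq_len(nrow(freqinter_head)), times = freqinter_head$Freq)
expanded <- freqinter_head[row_idx, "x", drop = FALSE]

nrow(expanded)  # 23, the sum of Freq
```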
With the data you posted, I don't think this graph can be made to look good. The bars cannot be thin enough to distinguish 0.293 from 0.296 when the data ranges from 0 to 0.9.
You could treat x as a factor, just to illustrate what you want to do:
library(dplyr) # bind_rows()
library(ggplot2)
freqinter <- data.frame(x = c(
0.293040975264367,
0.296736775990729,
0.297619926364764,
0.587377012109561,
0.595245125315916,
0.597022018595893), Freq = c(17,2,4,1,4,2))
freqintra <- data.frame(x = c(
0 ,
0.293040975264367,
0.597022018595893,
0.598809552335782,
0.898227748764939), Freq = c(3,15,4,2,6))
df <- bind_rows(freqinter, freqintra, .id = "id")
ggplot(df, aes(x = as.factor(x), y = Freq, fill = id)) +
geom_bar(stat = "identity", position = position_dodge2(preserve = "single")) +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
You can also check the problem by not treating your x variable as a factor:
ggplot(df, aes(x = x, y = Freq, fill = id)) +
geom_bar(stat = "identity", width = 0.05, position = "dodge") +
theme(axis.text.x = element_text(angle = 90)) +
scale_fill_discrete(labels = c("inter", "intra"))
Either the bars must be very thin (small width), or you'll get overlapping x intervals breaking the plot.

plot multiple lines in ggplot

I need to plot hourly data for different days using ggplot, and here is my dataset:
The data consists of hourly observations, and I want to plot each day's observations as a separate line.
Here is my code:
xbj1 = bj[c(1:24),c(1,6)]
xbj2 = bj[c(24:47),c(1,6)]
xbj3 = bj[c(48:71),c(1,6)]
ggplot()+
geom_line(data = xbj1,aes(x = Date, y= Value), colour="blue") +
geom_line(data = xbj2,aes(x = Date, y= Value), colour = "grey") +
geom_line(data = xbj3,aes(x = Date, y= Value), colour = "green") +
xlab('Hour') +
ylab('PM2.5')
Please advise.
I'll make some fake data (I won't try to transcribe yours) first:
set.seed(2)
x <- data.frame(
Date = rep(Sys.Date() + 0:1, each = 24),
# Year, Month, Day ... are not used here
Hour = rep(0:23, times = 2),
Value = sample(1e2, size = 48, replace = TRUE)
)
This is a straightforward ggplot2 plot, first with one colored line per day and then with one facet per day:
library(ggplot2)
ggplot(x) +
geom_line(aes(Hour, Value, color = as.factor(Date))) +
scale_color_discrete(name = "Date")
ggplot(x) +
geom_line(aes(Hour, Value)) +
facet_grid(Date ~ .)
I highly recommend you find good tutorials for ggplot2, such as http://www.cookbook-r.com/Graphs/. Others exist, many quite good.
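If the real data holds a single timestamp column rather than separate Date and Hour columns, the two can be derived before plotting, which avoids subsetting rows by hand. The sketch below is built on that assumption: bj_demo, Timestamp, and the values are hypothetical and not taken from the question's dataset.

```r
library(ggplot2)

# Hypothetical input: 48 hourly readings with one timestamp column
set.seed(2)
bj_demo <- data.frame(
  Timestamp = as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + 3600 * (0:47),
  Value     = sample(100, size = 48, replace = TRUE)
)

# Derive the day (for grouping) and the hour of day (for the x axis)
bj_demo$Date <- as.Date(bj_demo$Timestamp, tz = "UTC")
bj_demo$Hour <- as.integer(format(bj_demo$Timestamp, "%H"))

# One colored line per day
p <- ggplot(bj_demo, aes(Hour, Value, color = factor(Date))) +
  geom_line() +
  labs(x = "Hour", y = "PM2.5", color = "Date")
```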
