Scale the x-axes with quarterly date format - r

I created a plot in R using the ggplot library:
library(ggplot2)
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = variable), size = 1) +
scale_color_manual(values = c("#00AFBB", "#E7B800"))
I got the plot I want; the only problem is that the values of the variable yQ have the format:
1990Q1
1990Q2
1990Q3
1990Q4
......
......
2017Q1
2017Q2
2017Q3
2017Q4
Because there are many years, the x-axis cannot show all the date labels clearly (they overlap).
Therefore, I want the x-axis to show only the Q1 and Q3 labels of every fifth year.
So I want the x-axis to be something like this:
1990Q1 1990Q3 1995Q1 1995Q3 ...... 2015Q1 2015Q3
I tried to use scale_x_date but my dates are not in date format (e.g. 1990Q1) and therefore this does not work. How can I fix it?

The question does not provide reproducible input, but using df from the Note below and the autoplot.zoo method of ggplot2's autoplot generic, we can write:
library(ggplot2)
library(zoo)
z <- read.zoo(df, index = "yQ", FUN = as.yearqtr)
autoplot(z) + scale_x_yearqtr()
Note
Test input--
df <- data.frame(yQ = c("1990Q1", "1990Q2", "1990Q3", "1990Q4"), value = 1:4)

The zoo::format.yearqtr() function is quite easy to use with ggplot2.
Try
scale_x_date(labels = function(x) zoo::format.yearqtr(x, "%YQ%q"))
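If the quarter strings have already been converted to Date values, the same labelling idea can be sketched in base R without zoo. This is only an illustration: quarter_label is a hypothetical helper name, not a function from ggplot2 or zoo.

```r
# Hypothetical helper: format Date values as "1990Q1"-style labels (base R only)
quarter_label <- function(x) {
  paste0(format(x, "%Y"), quarters(x))
}

dates <- as.Date(c("1990-01-01", "1990-04-01", "1990-07-01", "1990-10-01"))
quarter_label(dates)
```

A function like this could be passed as labels = quarter_label to scale_x_date().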

Use zoo::as.yearqtr() from the zoo package to work with quarterly dates.
Generate example data:
year <- 1990:2000
quar <- paste0("Q", 1:4)
foo <- as.vector(outer(year, quar, paste0))
data <- data.frame(dateQ = foo, Y = rnorm(length(foo)))
head(data)
dateQ Y
1 1990Q1 -0.09944705
2 1991Q1 0.14493910
3 1992Q1 0.54856787
4 1993Q1 1.12966224
5 1994Q1 -0.93539302
6 1995Q1 0.24772265
Transform quarterly date to "normal" date:
data$dateNorm <- as.Date(zoo::as.yearqtr(data$dateQ))
head(data)
dateQ Y dateNorm
1 1990Q1 -0.09944705 1990-01-01
2 1991Q1 0.14493910 1991-01-01
3 1992Q1 0.54856787 1992-01-01
4 1993Q1 1.12966224 1993-01-01
5 1994Q1 -0.93539302 1994-01-01
6 1995Q1 0.24772265 1995-01-01
It sets Q1/2/3/4 as the first day of January/April/July/October.
data[grep("1991", data$dateQ), ]
dateQ Y dateNorm
2 1991Q1 0.1449391 1991-01-01
13 1991Q2 1.5878678 1991-04-01
24 1991Q3 -0.1071823 1991-07-01
35 1991Q4 2.2905729 1991-10-01
Now you can plot it or perform other calculations as it's in Date format.
library(ggplot2)
ggplot(data, aes(dateNorm, Y)) +
geom_line()
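Once the axis is a real Date axis, the labels the question asks for (Q1 and Q3 of every fifth year) can be requested with explicit breaks. The following is a minimal sketch under assumptions: the names years, breaks and quarter_label are illustrative, the data are made up, and the plotting step is skipped when ggplot2 is not installed.

```r
# Q1 (January 1) and Q3 (July 1) of every fifth year, interleaved in time order
years  <- seq(1990, 2015, by = 5)
breaks <- as.Date(c(rbind(sprintf("%d-01-01", years),
                          sprintf("%d-07-01", years))))

# Label a Date as e.g. "1990Q1" (base R only; hypothetical helper)
quarter_label <- function(x) paste0(format(x, "%Y"), quarters(x))

# Illustrative quarterly data, 1990Q1 to 2017Q4
data <- data.frame(
  dateNorm = seq(as.Date("1990-01-01"), by = "quarter", length.out = 112)
)
data$Y <- seq_len(nrow(data))

# The plotting step needs ggplot2; it is skipped if the package is missing
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data, aes(dateNorm, Y)) +
    geom_line() +
    scale_x_date(breaks = breaks, labels = quarter_label) +
    theme(axis.text.x = element_text(angle = 90))
  print(p)
}
```

Building the breaks by hand keeps the choice of labelled quarters explicit, at the cost of hard-coding the year range.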

You can:
- manipulate x-axis breaks and labels with scale_x_discrete(breaks = ..., labels = ...)
- change the angle of the text with theme(axis.text.x = element_text(angle = ...))
I generated some data
Combs <- expand.grid(1990:2017, c("Q1", "Q2", "Q3", "Q4"))
df <- data.frame(
yQ = sort(apply(Combs, 1, paste, collapse="")),
value = runif(112)
)
In the first example, I select the yQ values you want with a logical vector and rotate the axis text:
library(ggplot2)
# Q1 and Q3 of the first year in every block of 5 years (20 quarters); recycled over all rows
pattern <- c(TRUE, FALSE, TRUE, FALSE, rep(FALSE, 16))
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(color = "red", size = 1) +
scale_x_discrete(breaks = df$yQ[pattern], labels = df$yQ[pattern]) +
theme(axis.text.x = element_text(angle=90))
Notice, however, that tick marks not listed in breaks are not shown. The alternative is to copy the yQ values into a vector and set the labels you do not want to "":
xVec <- as.character(df$yQ)
xVec[!pattern] <- ""
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(color = "red", size = 1) +
scale_x_discrete(breaks = df$yQ, labels = xVec) +
theme(axis.text.x = element_text(angle=90))
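The positional pattern above only works when the data are sorted and every quarter is present. As a hedged alternative, the same selection can be derived from the labels themselves; the variable names year, quarter and keep are illustrative.

```r
# Same layout as the generated data above: 1990Q1 ... 2017Q4
yQ <- sort(as.vector(outer(1990:2017, paste0("Q", 1:4), paste0)))

year    <- as.integer(substr(yQ, 1, 4))
quarter <- substr(yQ, 5, 6)

# Keep Q1 and Q3 of every fifth year, counting from the first year in the data
keep <- (year - min(year)) %% 5 == 0 & quarter %in% c("Q1", "Q3")

# Blank out all other labels while keeping every tick mark
xVec <- ifelse(keep, yQ, "")
yQ[keep]
```

Either yQ[keep] (as breaks) or xVec (as labels) can then be passed to scale_x_discrete() in the same way as in the two plots above.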

Related

Change ggplot2 point color based on date occurring less than 4 weeks after previous date

I have an example dataframe composed of:
[Image: example dataframe]
I have used ggplot2 to plot dates on the x-axis with a count on the y-axis:
library(ggplot2)
library(lubridate) # provides mdy()
df_ggplot <- read.csv("ggplot_ex.csv", header = TRUE, na.strings = "", fileEncoding = "UTF-8-BOM")
df_ggplot$Date <- mdy(df_ggplot$Date)
df_ggplot$Ccount <- as.numeric(as.character(df_ggplot$Ccount))
ggplot(df_ggplot, aes(x=Date, y = Ccount)) +
geom_line() +
geom_point()
[Image: ggplot ex output]
I want points that occur less than 4 weeks after the previous point to turn red. Can anyone help? In this example, the second point would be red, as it occurs about 2 weeks after the previous point.
You probably have to do the calculation in the dataframe before the plot (make sure your Date column is in the correct date format).
One option you can try:
library(dplyr) # provides %>%, mutate() and lag()
df_ggplot <- df_ggplot %>%
mutate(time_diff = difftime(time1 = Date, time2 = lag(x = Date, n = 1), units = "weeks"),
is_red = as.factor(time_diff < 4))
This gives you the points that must be flagged:
Date Ccount time_diff is_red
1 2019-08-17 20000 NA weeks <NA>
2 2019-08-30 15000 1.857143 weeks TRUE
3 2019-09-30 25000 4.285714 weeks FALSE
Then you can plot, using the colors you want.
ggplot(df_ggplot, aes(x = Date, y = Ccount)) +
geom_line() +
geom_point(aes(color = is_red)) +
scale_color_manual(values = c("black", "red"), na.value = "black")
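The flagging step can also be sketched in base R, which makes the calculation easy to check by hand. The three dates are the ones shown in the output above; gap_weeks and is_red are illustrative names.

```r
dates <- as.Date(c("2019-08-17", "2019-08-30", "2019-09-30"))

# Gap to the previous observation in weeks; the first point has no predecessor
gap_weeks <- c(NA, as.numeric(diff(dates), units = "weeks"))

# Treat the missing first gap as "not red"
is_red <- !is.na(gap_weeks) & gap_weeks < 4
is_red
```

Because is_red is a plain logical vector here, it contains no NA, so the na.value argument would not be needed when mapping it to color.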

R - extracting time only from xts, zoo and POSIXct

I am analyzing day to day data to see when the value would be lower. I set each day as categorical variable so I can differentiate each day. But I want to get each day plotted on top of another day instead of one continuous graph as shown below.
Data set:
Value Day
2013-01-03 01:55:00 0.35435715 1
2013-01-03 02:00:00 0.33018654 1
2013-01-03 02:05:00 0.38976118 1
2013-01-04 02:10:00 0.45583868 2
2013-01-04 02:15:00 0.29290860 2
My current ggplot code is as follows:
g <- ggplot(data = Data, aes(x = Index, color = Dates)) +
geom_line(y = Data$Value) +
scale_x_datetime(date_breaks = TimeIntervalForGraph, date_labels = "%H") +
xlab("Time") +
ylab("Random value")
I would really appreciate it if anyone could guide me on how to turn my x-axis into a 24-hour time axis, so that I can plot each day on the same graph and see when the value is lower during the 24 hours. Thanks in advance.
Method tried:
I tried creating a third column with the time only, but for some reason the following code didn't work:
time <- format(index(x), format = "%H:%M"))
data <- cbind(data, time)
You need a way of summarising the data for each hour of the day. Here are some approaches you're probably looking for:
library(xts)
library(data.table)
library(ggplot2)
tm <- seq(as.POSIXct("2017-08-08 17:30:00"), by = "5 mins", length.out = 10000)
z <- xts(runif(10000), tm, dimnames = list(NULL, "vals"))
DT <- data.table(time = index(z), coredata(z))
# note the data.table syntax is different:
DT[, hr := hour(time)]
# Plot the average value by hour:
datByHour <- DT[, list(avgval = mean(vals)), by = c("hr")]
# Use line plot if you have one point per hour:
g <- ggplot(data = datByHour, aes(x = hr, y = avgval, colour = avgval)) +
geom_line()
# visualise the distribution by hour:
g2 <- ggplot(data = DT, aes(x = hr, y = vals, group = hr)) +
geom_boxplot()
Please try the following and let me know if it works (here I am taking tm time column as given):
Data$tm = strftime(Data$tm, format="%H:%M:%S")
library(ggplot2)
ggplot(Data, aes(x = tm, y = Value, group = Day, colour = Day)) +
geom_line() +
theme_classic()
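Note that strftime() returns character strings, so the axis in the plot above is discrete. A possible alternative, sketched here under assumptions, is to convert each timestamp to hours since midnight so that all days share one continuous 0 to 24 axis. The timestamps and values are the sample rows from the question; the UTC time zone and the column name hour_of_day are assumptions.

```r
# Sample rows from the question; the time zone is an assumption
ts <- as.POSIXct(c("2013-01-03 01:55:00", "2013-01-03 02:00:00",
                   "2013-01-03 02:05:00", "2013-01-04 02:10:00",
                   "2013-01-04 02:15:00"), tz = "UTC")
Data <- data.frame(
  Value = c(0.35435715, 0.33018654, 0.38976118, 0.45583868, 0.29290860),
  Day   = factor(format(ts, "%Y-%m-%d"))
)

# Hours since midnight as a number, so every day shares one 0-24 axis
Data$hour_of_day <- as.numeric(format(ts, "%H")) +
  as.numeric(format(ts, "%M")) / 60

# The plotting step needs ggplot2; it is skipped if the package is missing
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Data, aes(x = hour_of_day, y = Value, colour = Day, group = Day)) +
    geom_line() +
    scale_x_continuous(limits = c(0, 24), breaks = seq(0, 24, by = 4)) +
    xlab("Hour of day")
  print(p)
}
```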

Ordering panel data in ggplot2 by the value of the observation in one certain year

I am plotting a simple panel of data with ggplot2. Observations from the same individual (region) are from two different waves, and I want to plot my graph ordering individuals by the value of only one of the waves. However, ggplot by default orders by the mean value of both waves. Here's a basic sample of the data.
data <- read.table(text = "
ID Country time Theil0
1 AT1 2004 0.10358155
2 AT2 2004 0.08181044
3 AT3 2004 0.08238252
4 BE1 2004 0.14754138
5 BE2 2004 0.07205898
6 BE3 2004 0.09522730
7 AT1 2010 0.10901556
8 AT2 2010 0.09593889
9 AT3 2010 0.07579683
10 BE1 2010 0.16500438
11 BE2 2010 0.08313131
12 BE3 2010 0.10281853
", sep = "", header = TRUE)
And here's the code for the plot:
library(ggplot2)
pd <- position_dodge(0.4)
ggplot(data, aes(x=reorder(Country, Theil0), y=Theil0, colour = as.factor(time))) +
geom_point(size=3, position = pd)+
xlab("Region") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
ylab("Index") +
ggtitle("2004 and 2010")
And the resulting plot:
As you can see, ordering by the values of 2010 only (and not the average of both years) would make the BE2 and AT3 observations switch order, which is what I would prefer in the graph. Thank you for any help on this.
I created a reproducible example that uses generic xs and ys. Basically, you need to use the ordered function on your factor:
x <- letters[1:4]
y1 <- 1:4
y2 <- c(1, 4, 2, 5) + 1
library(ggplot2)
library(reshape2) # used to melt the dummy dataset
df <- data.frame(x = x, y1 = y1, y2 = y2)
df2 <- melt(df, id.vars = "x", variable.name = "Group", value.name = "y")
df2$Group <- factor(df2$Group)
gg1 <- ggplot(data = df2, aes( x = x, y = y, color = Group)) +
geom_point()
ggsave("example1.jpg", gg1, width = 3, height = 3)
Gives a plot similar to what you had:
However, x may be reordered:
df2$x2 <- ordered(df2$x, x[order(y2)])
gg2 <- ggplot(data = df2, aes( x = x2, y = y, color = Group)) +
geom_point()
ggsave("example2.jpg", gg2, width = 3, height = 3)
which gives this figure:
Also, I get tripped up on this a lot. I find adjusting levels in ggplot2 to be tricky at times.
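Applied to the data from the question, the same idea might look like the sketch below: the factor levels of Country are set from the 2010 values only. The object names d, w2010 and lev are illustrative.

```r
d <- read.table(text = "
Country time Theil0
AT1 2004 0.10358155
AT2 2004 0.08181044
AT3 2004 0.08238252
BE1 2004 0.14754138
BE2 2004 0.07205898
BE3 2004 0.09522730
AT1 2010 0.10901556
AT2 2010 0.09593889
AT3 2010 0.07579683
BE1 2010 0.16500438
BE2 2010 0.08313131
BE3 2010 0.10281853
", header = TRUE, stringsAsFactors = FALSE)

# Order the regions by their 2010 value only
w2010 <- d[d$time == 2010, ]
lev   <- as.character(w2010$Country)[order(w2010$Theil0)]

# Fix that order as the factor levels, which ggplot2 uses for the x-axis
d$Country <- factor(d$Country, levels = lev)
levels(d$Country)
```

With the levels fixed this way, the plot can use x = Country directly instead of reorder(Country, Theil0).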

How to Create a Graph of Statistical Time Series

I have data in the following format:
Date Year Month Day Flow
1 1953-10-01 1953 10 1 530
2 1953-10-02 1953 10 2 530
3 1953-10-03 1953 10 3 530
I would like to create a graph like this:
Here is my current image and code:
library(ggplot2)
library(plyr)
library(reshape2)
library(scales)
library(lubridate) # provides month() and day()
## Read Data
df <- read.csv("Salt River Flow.csv")
## Convert Date column to R-recognized dates
df$Date <- as.Date(df$Date, "%m/%d/%Y")
## Finds Water Years (Oct - Sept)
df$WY <- as.POSIXlt(as.POSIXlt(df$Date)+7948800)$year+1900
## Normalizes Water Years so stats can be applied to just months and days
df$w <- ifelse(month(df$Date) %in% c(10,11,12), 1903, 1904)
##Creates New Date (dat) Column
df$dat <- as.Date(paste(df$w,month(df$Date),day(df$Date), sep = "-"))
## Creates new data frame with summarised data by MonthDay
PlotData <- ddply(df, .(dat), summarise,
Min = min(Flow),
Tenth = quantile(Flow, probs = 0.10), # 10th percentile
TwentyFifth = quantile(Flow, probs = 0.25),
Median = quantile(Flow, probs = 0.50),
Mean = mean(Flow),
SeventyFifth = quantile(Flow, probs = 0.75),
Ninetieth = quantile(Flow, probs = 0.90),
Max = max(Flow))
## Melts data so it can be plotted with ggplot
m <- melt(PlotData, id="dat")
## Plots
p <- ggplot(m, aes(x = dat)) +
geom_ribbon(aes(min = TwentyFifth, max = Median), data = PlotData, fill = alpha("black", 0.1), color = NA) +
geom_ribbon(aes(min = Median, max = SeventyFifth), data = PlotData, fill = alpha("black", 0.5), color = NA) +
scale_x_date(labels = date_format("%b"), breaks = date_breaks("month"), expand = c(0,0)) +
geom_line(data = subset(m, variable == "Mean"), aes(y = value), size = 1.2) +
theme_bw() +
geom_line(data = subset(m, variable %in% c("Min","Max")), aes(y = value, group = variable)) +
geom_line(data = subset(m, variable %in% c("Ninetieth","Tenth")), aes(y = value, group = variable), linetype = 2) +
labs(x = "Water Year", y = "Flow (cfs)")
p
I am very close but there are some issues I'm having. First, if you can see a way to improve my code, please let me know. The main problem I ran into was that I needed two dataframes to make this graph: one melted, and one not. The unmelted dataframe was necessary (I think) to create the ribbons. I tried many ways to use the melted dataframe for the ribbons, but there was always a problem with the aesthetic length.
Second, I know that to get a legend (and I want one) I need to map something in the aesthetics of each line/ribbon, but I am having trouble getting that to work. I think it would involve scale_fill_manual.
Third, and I don't know if this is possible, I would like to have each month label in between the tick marks, not on them (like in the above image).
Any help is greatly appreciated (especially with creating more efficient code).
Thank you.
Something along these lines might get you close with base:
library(lubridate)
library(reshape2)
# simulating data...
Date <- seq(as.Date("1953-10-01"),as.Date("2010-10-01"),by="day")
Year <- year(Date)
Month <- month(Date)
Day <- day(Date)
set.seed(1)
Flow <- rpois(length(Date), 2000)
Data <- data.frame(Date=Date,Year=Year,Month=Month,Day=Day,Flow=Flow)
# use acast to get it in a convenient shape:
PlotData <- acast(Data,Year~Month+Day,value.var="Flow")
# apply for quantiles
Quantiles <- apply(PlotData,2,function(x){
quantile(x,probs=c(1,.9,.75,.5,.25,.1,0),na.rm=TRUE)
})
Mean <- colMeans(PlotData, na.rm=TRUE)
# ugly way to get month tick separators
MonthTicks <- cumsum(table(unlist(lapply(strsplit(names(Mean),split="_"),"[[",1))))
# and finally your question:
plot(1:366,seq(0,max(Flow),length=366),type="n",xlab = "Water Year",ylab="Discharge",axes=FALSE)
polygon(c(1:366,366:1),c(Quantiles["50%",],rev(Quantiles["75%",])),border=NA,col=gray(.6))
polygon(c(1:366,366:1),c(Quantiles["50%",],rev(Quantiles["25%",])),border=NA,col=gray(.4))
lines(1:366,Quantiles["90%",], col = gray(.5), lty=4)
lines(1:366,Quantiles["10%",], col = gray(.5))
lines(1:366,Quantiles["100%",], col = gray(.7))
lines(1:366,Quantiles["0%",], col = gray(.7), lty=4)
lines(1:366,Mean,lwd=3)
axis(1,at=MonthTicks, labels=NA)
text(MonthTicks-15,-100,1:12,pos=1,xpd=TRUE)
axis(2)
The plotting code really isn't that tricky. You'll need to clean up the aesthetics, but polygon() is usually my strategy for shaded regions in plots (confidence bands, whatever).
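The per-day quantiles used for the bands can also be computed without reshape2. The sketch below uses simulated flows (illustrative only, over a shorter period than above) and groups them by month and day with split(); the names md and bands are assumptions.

```r
# Simulated daily flows over ten water years (illustrative data only)
set.seed(1)
Date <- seq(as.Date("1953-10-01"), as.Date("1963-09-30"), by = "day")
Flow <- rpois(length(Date), 2000)

# Group by calendar day ("10-01", "10-02", ...) across all years
md <- format(Date, "%m-%d")

# One row per calendar day, one column per quantile
bands <- t(sapply(split(Flow, md), quantile,
                  probs = c(0.10, 0.25, 0.50, 0.75, 0.90)))
dim(bands)
head(bands, 3)
```

The rows come out in calendar order (January first), so they would still need to be rotated to start in October for a water-year axis.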
Perhaps this will get you closer to what you're looking for, using ggplot2 and plyr:
library(ggplot2)
library(plyr)
library(lubridate)
library(scales)
df$MonthDay <- df$Date - years( year(df$Date) + 100 ) #Normalize points to same year
df <- ddply(df, .(Month, Day), mutate, MaxDayFlow = max(Flow) ) #Max flow on day
df <- ddply(df, .(Month, Day), mutate, MinDayFlow = min(Flow) ) #Min flow on day
p <- ggplot(df, aes(x=MonthDay) ) +
geom_smooth(size=2,level=.8,color="black",aes(y=Flow)) + #80% conf. interval
geom_smooth(size=2,level=.5,color="black",aes(y=Flow)) + #50% conf. interval
geom_line( linetype="longdash", aes(y=MaxDayFlow) ) +
geom_line( linetype="longdash", aes(y=MinDayFlow) ) +
labs(x="Month",y="Flow") +
scale_x_date( labels = date_format("%b") ) +
theme_bw()
Edit: Fixed X scale and X scale label
(Partial answer with base plotting functions, not including the min, max, or mean.) I suspect you will need to construct a dataset before passing it to ggplot, since that is typical for that function. I already do something similar and then pass the resulting matrix to matplot. (It doesn't do that kewl highlighting, but maybe ggplot can do it.)
HDL.mon.mat <- aggregate(dfrm$Flow,
list( dfrm$Year + dfrm$Month/12),
quantile, prob=c(0.1,0.25,0.5,0.75, 0.9), na.rm=TRUE)
matplot(HDL.mon.mat[,1], HDL.mon.mat$x, type="pl")
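The question derives the water year (October to September) by adding a fixed number of seconds to each date. A more direct formulation is sketched below; water_year is a hypothetical helper name, and it assumes the convention that a water year is named after the calendar year in which it ends.

```r
# Hypothetical helper: October-December count towards the next year's water year
water_year <- function(date) {
  lt <- as.POSIXlt(date)
  (lt$year + 1900) + (lt$mon >= 9)  # lt$mon is 0-based, so 9 means October
}

water_year(as.Date(c("1953-09-30", "1953-10-01", "1954-09-30")))
```

Working from the month rather than from a fixed offset avoids any dependence on the exact number of days between October 1 and January 1.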

Plot a 24 hour cycle monthly for multiple variables?

I have data that can be mimicked in the following manner:
library(reshape2) # provides melt()
set.seed(1234)
n <- 12 * 24 # one row per month-hour combination
foo <- data.frame(month = rep(month.name, each = 24),
hour = rep(1:24, 12),
value1 = rnorm(n, 60, 1),
value2 = rnorm(n, 60, 1))
foo <- melt(foo, id = c('month', 'hour'))
I would like to create a plot for the entire year using ggplot that displays the 24 hour cycle of each variable per month.
Here's what I've tried so far:
t.plot <- ggplot(foo,
aes(interaction(month,hour), value, group = interaction(variable,hour)))
t.plot <- t.plot + geom_line(aes(colour = variable))
print(t.plot)
I get this, which throws the data into misalignment. For such a small SD you see that the first 24 values should be nearer to 60, but they are all over the place. I don't understand what's causing this discrepancy.
https://www.dropbox.com/s/rv6uxhe7wk7q35w/foo.png
When I plot:
plot(interaction(foo$month,foo$hour)[1:24], foo$value[1:24])
I get the shape that I would expect; however, the x-axis is very strange and not what I was expecting.
Any help?
The solution is to store your dates as dates (not as an interaction of factors), e.g.:
library(lubridate)
library(reshape2)
Date <- as.Date(dmy('01-01-2000') + seq_len(24*365)*hours(1))
foo <- data.frame(Date = Date,
value1 = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 24*365-1),
value2 = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 24*365-1))
foo_melt <- melt(foo, id = 'Date')
# then you can use `scale_x_date` and `r` and ggplot2 will know they are dates
# load scales library to access date_format and date_breaks
library(scales)
ggplot(foo_melt, aes(x=Date, y=value, colour = variable)) +
geom_line() +
scale_x_date(breaks = date_breaks('month'),
labels = date_format('%b'), expand =c(0,0))
Edit 1: average day per month
You can use facet_wrap to facet by month:
# using your created foo data set
# convert the full month names to abbreviated labels in calendar order
foo$month <- factor(foo$month, levels = month.name, labels = month.abb)
ggplot(foo, aes(x = hour, y=value, colour = variable)) +
facet_wrap(~month) + geom_line() +
scale_x_continuous(expand = c(0,0))
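For completeness, here is a self-contained sketch of the faceted version that builds the long-format data with base R instead of melt(). The object names wide and long are illustrative, and the plotting step is skipped when ggplot2 is not installed.

```r
set.seed(1234)
n <- 12 * 24  # one row per month-hour combination
wide <- data.frame(month  = rep(month.name, each = 24),
                   hour   = rep(1:24, times = 12),
                   value1 = rnorm(n, 60, 1),
                   value2 = rnorm(n, 60, 1))

# Long format with base R: stack the two value columns
long <- data.frame(month    = rep(wide$month, 2),
                   hour     = rep(wide$hour, 2),
                   variable = rep(c("value1", "value2"), each = n),
                   value    = c(wide$value1, wide$value2))

# Calendar-ordered month factor with abbreviated labels
long$month <- factor(long$month, levels = month.name, labels = month.abb)

# The plotting step needs ggplot2; it is skipped if the package is missing
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = hour, y = value, colour = variable)) +
    geom_line() +
    facet_wrap(~month) +
    scale_x_continuous(expand = c(0, 0))
  print(p)
}
```

Setting the factor levels explicitly is what keeps the panels in calendar order rather than alphabetical order.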

Resources