Convert X-axis from date to hrs in ggplot2 - r

I recorded the pressure inside a container with a probe for 48 h. After importing the data from Excel I get two variables: Date (class "POSIXct") and probe (numeric).
'data.frame': 3647 obs. of 2 variables:
$ Date : POSIXct, format: "2020-01-15 17:34:02" "2020-01-15 17:34:42"...
$ probe: num 31.6 35.8 29.9 29.1 30.1...
I plot both vars using ggplot2 and geom_line and I get this graph
The X axis shows time (total 48 h)
However, when I try to format time by using:
library(hms)
data$time<-as_hms(data$Date)
I get a messy plot. I have tried different methods to convert data$time to a different scale, but I cannot get it to work.
Any help would be appreciated

This is solved with scale_x_datetime by creating a breaks vector and labeling with a date_labels format string.
I have created a test data set, since there is none in the question.
set.seed(2022) # make the example reproducible
start <- as.POSIXct("2020-01-15 17:34:02")
end <- start + 48*60*60
Date <- seq(start, end, by = "15 mins")
probe <- cumsum(rnorm(length(Date)))
data <- data.frame(Date, probe)
library(ggplot2)
breaks_start <- as.POSIXct(format(start, "%Y-%m-%d"))
breaks_end <- as.POSIXct(format(end + 24*60*60, "%Y-%m-%d"))
ggplot(data, aes(Date, probe)) +
geom_line(color = "darkred", size = 1.2) +
scale_x_datetime(
breaks = seq(breaks_start, breaks_end, by = "12 hours"),
date_labels = "%H:%M"
)
Created on 2022-05-28 by the reprex package (v2.0.1)
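If the goal is an axis in elapsed hours (0 to 48) rather than clock time, one possible approach is to convert Date to hours since the first reading and plot against a continuous scale. This is only a sketch and not part of the answer above; the column name hours and the UTC time zone are assumptions made for illustration.

```r
library(ggplot2)

set.seed(2022)  # same kind of simulated data as in the answer above
start <- as.POSIXct("2020-01-15 17:34:02", tz = "UTC")
Date  <- seq(start, start + 48 * 60 * 60, by = "15 mins")
data  <- data.frame(Date, probe = cumsum(rnorm(length(Date))))

# Hours elapsed since the first observation (hypothetical helper column)
data$hours <- as.numeric(difftime(data$Date, min(data$Date), units = "hours"))

p <- ggplot(data, aes(hours, probe)) +
  geom_line(color = "darkred") +
  scale_x_continuous(breaks = seq(0, 48, by = 12)) +  # a tick every 12 h
  xlab("Elapsed time (h)")
```

Printing p draws the plot. Because hours is a plain numeric column, the usual continuous-scale arguments (breaks, limits) apply directly.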

Related

How to convert datetime format into 'ddmmyyyy' using R?

My dates dataframe looks like this:
Date Values
1JAN2018 80
23DEC2019 21.3
... ...
How can I format this into a ddmmyyyy date so that I can use ggplot to create a time series plot?
What did I do?
Date <- as.Date(Date, '%d%m%Y')
But unfortunately, that didn't seem to do the trick.
Thank you so much! :D
EDIT:
Thanks for the answers. This is my current plot. Is it possible to smoothen this out more? It seems very static:
Both values are measured several times (HH, MM) at the same time each day (around 40 times). When using your code:
ggplot(aug, aes(aug$DATE)) +
#geom_smooth(stat = "identity") +
geom_line(aes(y = aug$VALUE_ONE, colour = "aug$VALUE_ONE")) +
geom_line(aes(y = aug$VALUE_TWO, colour = "aug$VALUE_TWO")) +
ggtitle("Time Series Data)")+
xlab("Time")+
ylab("Value")+
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.title = element_text(hjust = 0.5))
EDIT2:
Thanks again for the answers. To get a better view of the data, the data is as follows:
Date ValueOne ValueTwo Time
1JAN2018 20 11 05:22
1JAN2018 25 12 05:33
1JAN2018 34 44 05:59
1JAN2018 32 55 06:30
1JAN2018 4 88 06:48
1JAN2018 11 78 10:33
1JAN2018 12 100 15:33
Every day has around 40 measurements of both ValueOne and ValueTwo at different times of that day. Because there are so many measurements, the line looks static to me unless I plot a single day, for example. In that case it works well. Do you have any idea?
A simple solution is to use lubridate package
# Install lubridate package
install.packages("lubridate")
# Use lubridate package
library(lubridate)
dmy('23DEC2019')
[1] "2019-12-23"
dmy('1JAN2018')
[1] "2018-01-01"
# Plotting the data in ggplot
library(ggplot2)
ggplot(data, aes(x=date, y=values)) +
geom_smooth(stat = "identity") +
ggtitle("Time Series Data)")+
xlab("Time")+
ylab("Value")+
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.title = element_text(hjust = 0.5))
The anytime package offers functions anytime() and anydate() which do this---from any input format, and without a required format string.
R> library(anytime)
R> anydate(c("23DEC2019", "1JAN2018"))
[1] "2019-12-23" "2018-01-01"
R>
It should be sufficient to do
as.Date(x, format = "%d%b%Y")
However, for some locales this produces NAs
x <- c("1JAN2018", "23DEC2019")
as.Date(x, format = "%d%b%Y")
# [1] "2018-01-01" NA
You see this gives NA for the entry 23DEC2019 (for me).
From ?strptime
## read in date info in format 'ddmmmyyyy'
## This will give NA(s) in some non-English locales; setting the C locale
## as in the commented lines will overcome this on most systems.
## lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
So you might also need
lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
Now run above code again
as.Date(x, "%d%b%Y")
#[1] "2018-01-01" "2019-12-23"
And finally change locale back again
Sys.setlocale("LC_TIME", lct)
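For the follow-up in EDIT2, where each day has many readings with a separate Time column, it may help to combine Date and Time into a single date-time so that readings from the same day no longer share one x position. The sketch below also wraps the locale switch described above in a function so the original locale is restored automatically. The function name parse_dmy_time, the column name timestamp, and the UTC time zone are assumptions for illustration only.

```r
# Parse "1JAN2018"-style dates plus "HH:MM" times using English month
# abbreviations, restoring the session's LC_TIME locale on exit.
parse_dmy_time <- function(date, time) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.POSIXct(paste(date, time), format = "%d%b%Y %H:%M", tz = "UTC")
}

# First rows of the data shown in EDIT2
aug <- data.frame(
  Date     = c("1JAN2018", "1JAN2018", "1JAN2018"),
  Time     = c("05:22", "05:33", "05:59"),
  ValueOne = c(20, 25, 34)
)
aug$timestamp <- parse_dmy_time(aug$Date, aug$Time)
```

With timestamp on the x axis (for example aes(timestamp, ValueOne) with geom_line()), each measurement gets its own position along the axis.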

Time Series plot.ts and x label

I have .csv file with quarters in first col (like 200901, 200902 etc) or I can have them as a row names. In other cols I have some common statistical data (like inflation rate 102.5; 101.5 etc).
The problem is that the function plot.ts doesn't show the quarters on the x axis, although I do get 7 nice plots on one page.
My code is simple:
require(ggplot2)
plot.ts(abc, xlab = abc$quarters)
abc - my file with data, abc$quarters - col with number of quarters.
Maybe other function will be better here, but I get annoyed just for thinking it's very close to quite an easy solution.
As comments say, plot.ts isn't a ggplot2 function. Here's an example of what you may be looking for in ggplot2:
library(quantmod) # provides getSymbols()
library(tidyverse)
getSymbols("AMZN", src="yahoo", from="2016-07-01")
data.frame(AMZN) %>%
rownames_to_column() %>%
mutate(
rowname = as.Date(rowname, format="%Y-%m-%d")
) %>%
ggplot() +
geom_line(aes(rowname, AMZN.Close)) +
scale_x_date(expand = expansion(0), minor_breaks = NULL, # expansion() replaces the deprecated expand_scale()
date_breaks = "3 months", date_labels = "%m-%Y") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
I'd suggest working with ggplot2 functions for plotting time series over something like plot.ts since it'd likely be more flexible.
Thanks for answers, they lead me to the following solution:
dane_2$abc2 dane_2$abc3 dane_2$abc4
[1,] 103.5 19.9 37.3
[2,] 103.4 19.5 35.2
[3,] 103.0 25.1 34.7
View(dane_2)
# In dane_2$abc1 I have own numbers of quarters like 20094, so I skip it.
tseries <- ts(dane_2[ ,-1], start = c(2009, 4), frequency = 4)
par(mfrow=c(1,3))
plot(tseries)
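If a ggplot2 version with the quarters on the x axis is preferred, one option is to turn the quarter codes into dates and label the breaks by hand. This is a minimal sketch: it assumes codes of the form year followed by quarter number (such as 20094), and the data frame, the column names quarter_start and label, and the last inflation value are made up for illustration.

```r
library(ggplot2)

# Hypothetical data: quarter codes and illustrative inflation values
abc <- data.frame(
  quarters  = c(20094, 20101, 20102, 20103),
  inflation = c(103.5, 103.4, 103.0, 102.5)
)

yr  <- as.integer(abc$quarters %/% 10)  # 20094 -> 2009
qtr <- as.integer(abc$quarters %% 10)   # 20094 -> 4

# First day of each quarter, plus a readable label such as "2009 Q4"
abc$quarter_start <- as.Date(sprintf("%d-%02d-01", yr, (qtr - 1L) * 3L + 1L))
abc$label <- paste0(yr, " Q", qtr)

p <- ggplot(abc, aes(quarter_start, inflation)) +
  geom_line() +
  geom_point() +
  scale_x_date(breaks = abc$quarter_start, labels = abc$label) +
  xlab("Quarter")
```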

Recursive ggplotting of POSIXct data with CORRECT corresponding plot name

need help:
My original df has 50 vars: 49 POSIXct class and 1 categorical outcome var. I want to plot (ggplot) the 49 vars to explore its time distribution as a scatterplot. I want to put the plots into a grid of plots. The trouble is getting the corresponding plot to var name correct. My solution below seems longwinded.
```{r}
library(grid)
library(ggplot2)
library(gridExtra)
str(a[c(1:4,50)])
```
'data.frame': 48 obs. of 4 variables:
$ admd : POSIXct, format: "2017-09-23 12:00:00" "2017-09-23 10:31:00" ...
$ feverd: POSIXct, format: "2017-09-23 12:00:00" "2017-09-23 12:00:00" ...
$ defd : POSIXct, format: "2017-09-23 16:00:00" "2017-09-23 12:13:00" ...
$ ns1d : POSIXct, format: NA "2017-09-23 10:13:00" ...
$ outcome: Factor w/ 2 levels "Death","Discharge": 2 1 2 2 2 2 2 2 2 2 ...
I want to plot these:
intended results/plots: each plot has its corresponding variable name
What I did isn't polished at all:
#plot_fun2
```{r}
plot_fun<- function(df){
ggplot(df, aes_string(x=df[[1]], y=rnorm(1:48))) +
geom_point(aes(col=a$outcome), alpha=0.5, cex=3)+
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Time") + ylab("RandomVisualValue")+
scale_colour_manual(name="Outcome",values=c("#FF420E","#89DA59")) +
ggtitle(names(df)[[1]])
}
```
```{r}
sample_list<-list(
data.frame(a[1]), data.frame(a[2]), data.frame(a[3]), data.frame(a[4])) #CAN WE SIMPLIFY THIS? my original data frame has 49 vars!
timedistb<-lapply(sample_list, plot_fun)
do.call(grid.arrange, c(timedistb, ncol=2, nrow=2))
```
Snippet of data: https://github.com/dcicantab5/recover-study/blob/master/c.csv
Don't really understand what the random number y-axis is about, but you may want to try some combination of gather and facet_wrap:
gather: collects the date-time variables into two columns: datevar, which stores the original column names, and dateval, which stores the corresponding dates from each row.
facet_wrap: creates one facet per column name in datevar, so each facet contains only the data from the corresponding date-time variable.
Example code:
library(dplyr)
library(tidyr)
library(ggplot2)
df <- read.csv("https://raw.githubusercontent.com/dcicantab5/recover-study/master/c.csv")
df <- df %>%
gather(datevar, dateval, -X, -outcome) %>%
mutate(rnum = rnorm(n())) # one random y value per row; rnorm(1) would repeat a single value
ggplot(df, aes(x=dateval, y=rnum)) +
geom_point(aes(col=outcome), alpha=0.5, cex=3)+
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Time") + ylab("RandomVisualValue")+
scale_colour_manual(name="Outcome",values=c("#FF420E","#89DA59")) +
facet_wrap(~datevar)
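The same reshaping can also be written with tidyr::pivot_longer, which supersedes gather in current tidyr. The sketch below uses a small made-up data frame in place of the linked CSV; the column names mirror those in the question, while the values and the helper column jitter_y are invented for illustration.

```r
library(tidyr)
library(ggplot2)

set.seed(1)
base_time <- as.POSIXct("2017-09-23 10:00:00", tz = "UTC")

# Made-up stand-in for the real data: three POSIXct columns and the outcome
a <- data.frame(
  admd    = base_time + runif(6, 0, 6 * 3600),
  feverd  = base_time + runif(6, 0, 6 * 3600),
  defd    = base_time + runif(6, 0, 6 * 3600),
  outcome = factor(c("Discharge", "Death", "Discharge",
                     "Discharge", "Death", "Discharge"))
)

# One row per (original column, value); works unchanged with 49 date columns
long <- pivot_longer(a, cols = -outcome,
                     names_to = "datevar", values_to = "dateval")
long$jitter_y <- rnorm(nrow(long))  # random y purely for visual spread

p <- ggplot(long, aes(dateval, jitter_y, colour = outcome)) +
  geom_point(alpha = 0.5, size = 3) +
  facet_wrap(~datevar, ncol = 2)    # facet titles carry the variable names
```

Because the facet strip is taken from datevar, each panel is labelled with the variable it shows, which avoids matching titles to plots by hand.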

plotting daily rainfall data using geom_step

I have some rainfall data collected continuously from which I have calculated daily totals. Here is some toy data:
Date <- c(seq(as.Date("2016-07-01"), by = "1 day", length.out = 10))
rain_mm <- c(3,6,8,12,0,0,34,23,5,1)
rain_data <- data.frame(Date, rain_mm)
I can plot this data as follows:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_bar(stat = "identity") +
scale_x_date(date_labels = "%d")
Which gives the following:
This seems fine. It is clear how much rainfall there was on a certain day. However, it could also be interpreted that between midday of one day and midday of the next, a certain amount of rain fell, which is wrong. This is especially a problem if the graph is combined with other plots of related continuous variables over the same period.
To get round this issue I could use geom_step as follows:
library(ggplot2)
ggplot(rain_data, aes(Date, rain_mm)) +
geom_step() +
scale_x_date(date_labels = "%d")
Which gives:
This is a better way to display the data, and now scale_x_date appears to be a continuous axis. However, it would be nice to fill the area below the steps, and I can't find a straightforward way of doing this.
Q1: How can I fill beneath the geom_step? Is it possible?
It may also be useful to convert Date into POSIXct to facilitate identical x-axis in multi-plot figures as discussed in this SO question here.
I can do this as follows:
library(dplyr)
rain_data_POSIX <- rain_data %>% mutate(Date = as.POSIXct(Date))
Date rain_mm
1 2016-07-01 01:00:00 3
2 2016-07-02 01:00:00 6
3 2016-07-03 01:00:00 8
4 2016-07-04 01:00:00 12
5 2016-07-05 01:00:00 0
6 2016-07-06 01:00:00 0
7 2016-07-07 01:00:00 34
8 2016-07-08 01:00:00 23
9 2016-07-09 01:00:00 5
10 2016-07-10 01:00:00 1
However, this gives a time of 01:00 for each date. I would rather have 00:00. Can I change this in the as.POSIXct function call, or do I have to do it afterwards using a separate function? I think it has something to do with tz = "" but I can't figure it out.
How can I convert from class Date to POSIXct so that the time generated is 00:00?
Thanks
For your first question, you can work off this example. First, create a time-lagged version of your data:
rain_tl <- mutate( rain_data, rain_mm = lag( rain_mm ) )
Then combine this time-lagged version with the original data, and re-sort by date:
rain_all <- bind_rows( old = rain_data, new = rain_tl, .id="source" ) %>%
arrange( Date, source )
(Note the newly created source column is used to break ties, correctly interlacing the original data with the time-lagged version):
> head( rain_all )
source Date rain_mm
1 new 2016-07-01 NA
2 old 2016-07-01 3
3 new 2016-07-02 3
4 old 2016-07-02 6
5 new 2016-07-03 6
6 old 2016-07-03 8
You can now use the combined data frame to "fill" your steps:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_step() +
geom_ribbon( data = rain_all, aes( ymin = 0, ymax = rain_mm ),
fill="tomato", alpha=0.5 )
This produces the following plot:
For your second question, the problem is that as.POSIXct does not pass additional arguments to the converter, so specifying the tz argument does nothing.
You basically have two options:
1) Reformat the output to what you want: format( as.POSIXct( Date ), "%F 00:00" ), which returns a vector of type character. If you want to preserve the object type as POSIXct, you can instead...
2) Cast your Date vector to character prior to passing it to as.POSIXct: as.POSIXct( as.character(Date) ). The result is at midnight, so the time is left off when printed, which may be what you want anyway.
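Option 2 can be checked with a short sketch. Passing tz = "UTC" explicitly is an assumption added here so that the result does not depend on the session's time zone; leaving it out gives midnight in local time instead.

```r
Date <- seq(as.Date("2016-07-01"), by = "1 day", length.out = 3)

# Going through character makes as.POSIXct interpret the date as
# midnight in the requested time zone
midnight <- as.POSIXct(as.character(Date), tz = "UTC")

format(midnight, "%Y-%m-%d %H:%M")
```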
If you would like to avoid the hack, you can customize the position in the geom_bar expression.
I found good results with:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_bar(stat = "identity", position = position_nudge(x = 0.51), width = 0.99) +
scale_x_date(date_labels = "%d")
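Another way to make each bar cover exactly its own day, without nudging, is to draw one rectangle per day from Date to Date + 1 with geom_rect. This is a different technique from the answers above, shown only as a sketch; the helper column day_end is an assumption introduced for illustration.

```r
library(ggplot2)

Date    <- seq(as.Date("2016-07-01"), by = "1 day", length.out = 10)
rain_mm <- c(3, 6, 8, 12, 0, 0, 34, 23, 5, 1)
rain_data <- data.frame(Date, rain_mm)

# Each daily total spans from midnight of its date to the next midnight
rain_data$day_end <- rain_data$Date + 1

p <- ggplot(rain_data) +
  geom_rect(aes(xmin = Date, xmax = day_end, ymin = 0, ymax = rain_mm),
            fill = "tomato", alpha = 0.5) +
  scale_x_date(date_labels = "%d") +
  labs(x = "Date", y = "rain_mm")
```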

Plot hourly data using ggplot2

I am using ggplot2 to plot my hourly time series data. Data organization is as
> head(df)
timestamp power
1 2015-08-01 00:00:00 584.4069
2 2015-08-01 01:00:00 577.2829
3 2015-08-01 02:00:00 569.0937
4 2015-08-01 03:00:00 561.6945
5 2015-08-01 04:00:00 557.9449
6 2015-08-01 05:00:00 562.4152
I use following ggplot2 command to plot the data:
ggplot(df,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=pretty_breaks(n=30)) +
theme(axis.text.x = element_text(angle=90,hjust=1))
With this the plotted graph is:
My questions are:
In the plotted graph, why does it show only the labels corresponding to hour 18? And what if I want to display the labels corresponding to hour 12 of each day?
I am plotting hourly data, hoping to see the fine-grained details, but I am not able to see all the hours of an entire month. Can I somehow see a zoomed view of any selected day in the same plot?
Here is a rather long example of scaling dates in ggplot and also a possible interactive way to zoom in on ranges. First, some sample data,
## Make some sample data
library(zoo) # rollmean
set.seed(0)
n <- 745
x <- rgamma(n,.15)*abs(sin(1:n*pi*24/n))*sin(1:n*pi/n/5)
x <- rollmean(x, 3, 0)
start.date <- as.POSIXct('2015-08-01 00:00:00') # the min from your df
dat <- data.frame(
timestamp=as.POSIXct(seq.POSIXt(start.date, start.date + 60*60*24*31, by="hour")),
power=x * 3000)
For interactive zooming, you could try plotly. Older versions needed an API key and username; recent versions of the R package render locally, so you can just do
library(plotly)
plot_ly(dat, x = ~timestamp, y = ~power, text = ~power, type = 'scatter', mode = 'lines')
and you can select regions of the graph and zoom in on them. You can see it here.
For changing the breaks in the ggplot graphs, here is a function to make date breaks by various intervals at certain hours.
## Make breaks from a starting date at a given hour, occurring by interval,
## length.out is days
make_breaks <- function(strt, hour, interval="day", length.out=31) {
strt <- as.POSIXlt(strt - 60*60*24) # start back one day
strt <- ISOdatetime(strt$year+1900L, strt$mon+1L, strt$mday, hour=hour, min=0, sec=0, tz="UTC")
seq.POSIXt(strt, strt+(1+length.out)*60*60*24, by=interval)
}
One way to zoom in, non-interactively, is to simply subset the data,
library(scales)
library(ggplot2)
library(gridExtra)
## The whole interval, breaks on hour 18 each day
breaks <- make_breaks(min(dat$timestamp), hour=18, interval="day", length.out=31)
p1 <- ggplot(dat,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=breaks) +
theme(axis.text.x = element_text(angle=90,hjust=1)) +
ggtitle("Full Range")
## Look at a specific day, breaks by hour
days <- 20
samp <- dat[format(dat$timestamp, "%d") %in% as.character(days),]
breaks <- make_breaks(min(samp$timestamp), hour=0, interval='hour', length.out=length(days))
p2 <- ggplot(samp,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=breaks) +
theme(axis.text.x = element_text(angle=90,hjust=1)) +
ggtitle(paste("Day:", paste(days, collapse = ", ")))
grid.arrange(p1, p2)
I haven't worked with date-time data a lot, so my code might look a bit messy. But the solution to question 1 is to not use pretty_breaks() and instead supply concrete breaks, and also set the limits within the scale_x_datetime() function.
A badly written example might be the following:
ggplot(df,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"),
breaks=as.POSIXct(sapply(seq(18000, 3600000, 86400), function(x) 0 + x),
origin="2015-10-19 7:00:00"),
limits=c(as.POSIXct(3000, origin="2015-10-19 7:00:00"),
as.POSIXct(30000, origin="2015-10-19 7:00:00"))) +
theme(axis.text.x = element_text(angle=90,hjust=1))
I am not sure how to write the as.POSIXct() calls in a more readable way. Basically, create the 12-hour point manually and keep adding a complete day to it across the range of your data frame.
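A more readable way to build the noon breaks is to start from the first noon and step by one day with seq. The sketch below uses made-up hourly data for one week; the power values, the UTC time zone, and the day chosen for zooming are assumptions for illustration.

```r
library(ggplot2)

start <- as.POSIXct("2015-08-01 00:00:00", tz = "UTC")
df <- data.frame(timestamp = seq(start, by = "hour", length.out = 24 * 7))
df$power <- 550 + 30 * sin(2 * pi * seq_len(nrow(df)) / 24)  # made-up signal

# One break per day at 12:00, starting from the first noon in the data
noon_breaks <- seq(start + 12 * 3600, max(df$timestamp), by = "day")

p <- ggplot(df, aes(timestamp, power)) +
  theme_bw() +
  geom_line() +
  scale_x_datetime(breaks = noon_breaks, date_labels = "%d.%m %H:%M",
                   timezone = "UTC") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Zoom in on a single day without dropping data from the plot object
zoom <- as.POSIXct(c("2015-08-03 00:00:00", "2015-08-04 00:00:00"), tz = "UTC")
p_zoom <- p + coord_cartesian(xlim = zoom)
```

coord_cartesian only changes the visible window, so the same plot object can be reused for the full range and for the zoomed view.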

Resources