My dates dataframe looks like this:
Date Values
1JAN2018 80
23DEC2019 21.3
... ...
How can I format this into a ddmmyyyy date so that I can use ggplot to create a time series plot?
What I tried:
Date <- as.Date(Date, '%d%m%Y')
But unfortunately, that didn't seem to do the trick.
Thank you so much! :D
EDIT:
Thanks for the answers. This is my current plot. Is it possible to smooth this out more? It looks very static:
Both values are measured at the same moments (HH:MM), around 40 times each day. Using your code:
ggplot(aug, aes(x = DATE)) +
#geom_smooth(stat = "identity") +
geom_line(aes(y = VALUE_ONE, colour = "VALUE_ONE")) +
geom_line(aes(y = VALUE_TWO, colour = "VALUE_TWO")) +
ggtitle("Time Series Data") +
xlab("Time") +
ylab("Value") +
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.title = element_text(hjust = 0.5))
EDIT2:
Thanks again for the answers. To get a better view of the data, the data is as follows:
Date ValueOne ValueTwo Time
1JAN2018 20 11 05:22
1JAN2018 25 12 05:33
1JAN2018 34 44 05:59
1JAN2018 32 55 06:30
1JAN2018 4 88 06:48
1JAN2018 11 78 10:33
1JAN2018 12 100 15:33
Every day has around 40 measurements of both ValueOne and ValueTwo at different times of day. Because there are so many measurements, the lines look static to me unless I plot a single day, in which case it works well. Do you have any idea?
A simple solution is to use the lubridate package:
# Install lubridate package
install.packages("lubridate")
# Use lubridate package
library(lubridate)
dmy('23DEC2019')
[1] "2019-12-23"
dmy('1JAN2018')
[1] "2018-01-01"
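The same lubridate parsers can also combine the separate Date and Time columns from EDIT2 into one timestamp, so each of the ~40 daily readings gets its own x position. A minimal sketch, assuming the column names from EDIT2, UTC as the time zone, and an English or C locale for the month abbreviations; the daily-mean step is just one possible way to get a calmer line, not something from the answers:

```r
library(lubridate)

# A few rows copied from EDIT2 (column names as in the question)
aug <- data.frame(
  Date     = c("1JAN2018", "1JAN2018", "1JAN2018"),
  ValueOne = c(20, 25, 34),
  ValueTwo = c(11, 12, 44),
  Time     = c("05:22", "05:33", "05:59")
)

# Combine day and time of day into a single POSIXct timestamp (UTC assumed)
aug$DateTime <- dmy_hm(paste(aug$Date, aug$Time), tz = "UTC")

# One possible way to calm the line: average the readings of each day
aug$Day <- as.Date(aug$DateTime)
daily <- aggregate(cbind(ValueOne, ValueTwo) ~ Day, data = aug, FUN = mean)
daily
```

The daily data frame can then be plotted with x = Day; plotting aug with x = DateTime instead keeps every reading at its own time point.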
# Plotting the data in ggplot (column names as in the question)
library(ggplot2)
data$Date <- dmy(data$Date) # convert "1JAN2018"-style strings to Date
ggplot(data, aes(x = Date, y = Values)) +
geom_smooth(stat = "identity") +
ggtitle("Time Series Data") +
xlab("Time") +
ylab("Value") +
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.title = element_text(hjust = 0.5))
The anytime package offers the functions anytime() and anydate(), which do this for a wide range of input formats without requiring a format string.
R> library(anytime)
R> anydate(c("23DEC2019", "1JAN2018"))
[1] "2019-12-23" "2018-01-01"
R>
It should be sufficient to do
as.Date(x, format = "%d%b%Y")
However, in some locales this produces NAs:
x <- c("1JAN2018", "23DEC2019")
as.Date(x, format = "%d%b%Y")
# [1] "2018-01-01" NA
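As you can see, this gives NA for 23DEC2019 (in my locale). %b matches month abbreviations in the current locale, so "DEC" fails wherever December is abbreviated differently (German, for example, uses "Dez").
From ?strptime: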
## read in date info in format 'ddmmmyyyy'
## This will give NA(s) in some non-English locales; setting the C locale
## as in the commented lines will overcome this on most systems.
## lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
So you may also need to switch LC_TIME to the C locale:
lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
Now run the code above again:
as.Date(x, "%d%b%Y")
#[1] "2018-01-01" "2019-12-23"
Finally, restore the original locale:
Sys.setlocale("LC_TIME", lct)
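Because Sys.setlocale() changes a global setting, a small wrapper can make sure the original locale always comes back. A sketch; parse_dmy_c() is a hypothetical name, and on.exit() does the restoring:

```r
# Hypothetical helper: parse "ddMONyyyy" strings under the C locale and
# restore the caller's LC_TIME afterwards, even if as.Date() throws an error
parse_dmy_c <- function(x) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.Date(x, format = "%d%b%Y")
}

locale_before <- Sys.getlocale("LC_TIME")
d <- parse_dmy_c(c("1JAN2018", "23DEC2019"))
d
```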
I have a 48-hour recording from a pressure probe in a container. After importing the data from Excel, I get two variables: the time variable, Date (class POSIXct), and probe (numeric).
'data.frame': 3647 obs. of 2 variables:
$ Date : POSIXct, format: "2020-01-15 17:34:02" "2020-01-15 17:34:42"...
$ probe: num 31.6 35.8 29.9 29.1 30.1...
I plot both variables using ggplot2 and geom_line and get this graph:
The x axis shows time (48 h in total).
However, when I try to format time by using:
library(hms)
data$time<-as_hms(data$Date)
I got a messy plot. I have tried different ways of converting data$time to a different scale, but I cannot get it to work.
Any help would be appreciated.
This is solved with scale_x_datetime by creating a breaks vector and labeling with a date_labels format string.
I have created a test data set, since there is none in the question.
set.seed(2022) # make the example reproducible
start <- as.POSIXct("2020-01-15 17:34:02")
end <- start + 48*60*60
Date <- seq(start, end, by = "15 mins")
probe <- cumsum(rnorm(length(Date)))
data <- data.frame(Date, probe)
library(ggplot2)
breaks_start <- as.POSIXct(format(start, "%Y-%m-%d"))
breaks_end <- as.POSIXct(format(end + 24*60*60, "%Y-%m-%d"))
ggplot(data, aes(Date, probe)) +
geom_line(color = "darkred", size = 1.2) +
scale_x_datetime(
breaks = seq(breaks_start, breaks_end, by = "12 hours"),
date_labels = "%H:%M"
)
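The messy as_hms() plot from the question likely comes from dropping the date: as_hms() keeps only the time of day, so two readings 24 h apart land on the same x value and geom_line() zig-zags between the days. A base-R sketch of the effect, using two illustrative timestamps 24 h apart rather than the question's data:

```r
# Two readings exactly 24 h apart (time zone fixed to UTC for reproducibility)
readings <- as.POSIXct(c("2020-01-15 17:34:02", "2020-01-16 17:34:02"), tz = "UTC")

# Keeping only the time of day maps both readings to the same x position
tod <- format(readings, "%H:%M:%S")

# Hours elapsed since the first reading keep them 24 units apart instead
elapsed_h <- as.numeric(difftime(readings, readings[1], units = "hours"))
tod
elapsed_h
```

If hours since the start of the recording is the axis you want, something like elapsed_h can be used as x with scale_x_continuous().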
Created on 2022-05-28 by the reprex package (v2.0.1)
I have a .csv file with quarters in the first column (like 200901, 200902, etc.), or I can have them as row names. The other columns hold common statistical series (like an inflation rate: 102.5, 101.5, etc.).
The problem is that plot.ts doesn't show the quarters on the x axis, although I do get a nice panel of 7 plots on one page.
My code is simple:
require(ggplot2)
plot.ts(abc, xlab = abc$quarters)
abc is my data, and abc$quarters is the column with the quarter numbers.
Maybe another function would be better here, but it annoys me because I feel I'm very close to an easy solution.
As comments say, plot.ts isn't a ggplot2 function. Here's an example of what you may be looking for in ggplot2:
library(quantmod) # provides getSymbols()
library(tidyverse)
getSymbols("AMZN", src="yahoo", from="2016-07-01")
data.frame(AMZN) %>%
rownames_to_column() %>%
mutate(
rowname = as.Date(rowname, format="%Y-%m-%d")
) %>%
ggplot() +
geom_line(aes(rowname, AMZN.Close)) +
scale_x_date(expand = expansion(0), minor_breaks = NULL,
date_breaks = "3 months", date_labels = "%m-%Y") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
I'd suggest working with ggplot2 functions for plotting time series over something like plot.ts since it'd likely be more flexible.
Thanks for the answers; they led me to the following solution:
dane_2$abc2 dane_2$abc3 dane_2$abc4
[1,] 103.5 19.9 37.3
[2,] 103.4 19.5 35.2
[3,] 103.0 25.1 34.7
View(dane_2)
# dane_2$abc1 holds my own quarter codes (like 20094), so I drop it.
tseries <- ts(dane_2[ ,-1], start = c(2009, 4), frequency = 4)
par(mfrow=c(1,3))
plot(tseries)
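ts() takes its start as c(year, quarter), so yyyyqq-style codes can be split with integer arithmetic instead of being typed by hand. A sketch; the codes vector is hypothetical (the question shows both 200901-style and 20094-style codes), and the values are the first three rows of dane_2$abc2 above:

```r
# Hypothetical yyyyqq quarter codes (200904 = 2009 Q4)
codes  <- c(200904, 201001, 201002)
values <- c(103.5, 103.4, 103.0)

# Split the first code into the year and quarter that ts() expects as start
start_year    <- codes[1] %/% 100
start_quarter <- codes[1] %% 100

x <- ts(values, start = c(start_year, start_quarter), frequency = 4)
as.numeric(time(x))   # 2009.75 2010.00 2010.25
as.numeric(cycle(x))  # 4 1 2
```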
Goal
Use ggplot2 (latest version) to produce a graph that duplicates the x- or y-axis on both sides of the plot, where the scale is not continuous.
Minimal Reprex
# Example data
dat1 <- tibble::tibble(x = c(rep("a", 50), rep("b", 50)),
y = runif(100))
# Standard boxplot
p1 <- ggplot2::ggplot(dat1) +
ggplot2::geom_boxplot(ggplot2::aes(x = x, y = y))
When the scale is continuous, this is easy to do with an identity transformation (clearly one-to-one).
# This works
p1 + ggplot2::scale_y_continuous(sec.axis = ggplot2::sec_axis(~ .))
However, when the scale is not continuous, this doesn't work, as other scale_* functions don't have a sec.axis argument (which makes sense).
# This doesn't work
p1 + ggplot2::scale_x_discrete(sec.axis = ggplot2::sec_axis(~ .))
Error in discrete_scale(c("x", "xmin", "xmax", "xend"), "position_d", :
unused argument (sec.axis = <environment>)
I also tried using the position argument in the scale_* functions, but this doesn't work either.
# This doesn't work either
p1 + ggplot2::scale_x_discrete(position = c("top", "bottom"))
Error in match.arg(position, c("left", "right", "top", "bottom")) :
'arg' must be of length 1
Edit
For clarity, I was hoping to duplicate the x- or y-axis where the scale is anything, not just discrete (a factor variable). I just used a discrete variable in the minimal reprex for simplicity.
For example, this issue arises in a context where the non-continuous scale is datetime or time format.
Duplicating (and modifying) discrete axis in ggplot2
You can adapt this answer by putting the same labels on both sides. As for "you can convert anything non-continuous to a factor, but that's even more inelegant!" from your comment above: a factor is exactly what a non-continuous axis is, so I'm not sure why that would be a problem for you.
TL;DR: Use as.numeric(...) for your categorical aesthetic, supply the labels manually from the original data, and use scale_*_continuous(..., sec.axis = sec_axis(~., ...)).
Edited to update:
I happened to look back through this thread and saw that the question was really about dates and times. That makes its wording inaccurate: dates and times are continuous, not discrete. Discrete scales are factors; dates and times are ordered continuous scales. Under the hood, they are just the number of days (Date) or seconds (POSIXct) since "1970-01-01".
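A quick base-R check of that claim, using values that appear in the tables below:

```r
# Date: days since 1970-01-01
days_since_epoch <- as.numeric(as.Date("2017-08-01"))

# POSIXct: seconds since 1970-01-01 00:00:00 UTC
secs_since_epoch <- as.numeric(as.POSIXct("2017-08-02 00:00:00", tz = "UTC"))

days_since_epoch   # 17379
secs_since_epoch   # 1501632000
```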
At the time of writing, scale_x_date throws an error if you try to pass a sec.axis argument, even if it's dup_axis(). To work around this, convert your dates/times to numbers, and then fool your scales using labels. This requires a bit of fiddling, but it's not too complicated.
library(lubridate)
library(dplyr)
library(ggplot2)
df <- tibble(tm = ymd("2017-08-01") + 0:10,
y = cumsum(rnorm(length(tm)))) %>%
mutate(tm_num = as.numeric(tm))
df
# A tibble: 11 x 3
tm y tm_num
<date> <dbl> <dbl>
1 2017-08-01 -2.0948146 17379
2 2017-08-02 -2.6020691 17380
3 2017-08-03 -3.8940781 17381
4 2017-08-04 -2.7807154 17382
5 2017-08-05 -2.9451685 17383
6 2017-08-06 -3.3355426 17384
7 2017-08-07 -1.9664428 17385
8 2017-08-08 -0.8501699 17386
9 2017-08-09 -1.7481911 17387
10 2017-08-10 -1.3203246 17388
11 2017-08-11 -2.5487692 17389
I made a simple vector of 11 days by adding 0 to 10 to "2017-08-01". Running as.numeric on it gives the number of days since the start of the Unix epoch (see ?lubridate::as_date).
df %>%
ggplot(aes(tm_num, y)) + geom_line() +
scale_x_continuous(sec.axis = dup_axis(),
breaks = function(limits) {
seq(floor(limits[1]), ceiling(limits[2]),
by = as.numeric(as_date(days(2))))
},
labels = function(breaks) {as_date(breaks)})
When you plot tm_num against y, it's treated just like normal numbers, and you can use scale_x_continuous(sec.axis = dup_axis(), ...). Then you have to figure out how many breaks you want and how to label them.
The breaks = argument is a function that takes the limits of the data and calculates nice-looking breaks. First you round the limits to make sure you get integers (dates don't work well with non-integers). Then you generate a sequence of your desired width (the days(2)). You could use weeks(1), months(3), or whatever suits; see ?lubridate::days. Under the hood, days(x) generates a number of seconds (86400 per day, 604800 per week, etc.), as_date converts that into a number of days since the Unix epoch, and as.numeric converts it back to a plain number.
The labels = argument is a function that takes the sequence of integers we just generated and converts it back to displayable dates.
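The breaks and labels functions can also be called directly to see what they produce. A sketch with hypothetical names, a literal step of 2 (Date values count days) in place of as.numeric(as_date(days(2))), and base as.Date(..., origin = ) in place of as_date(); the limits are the tm_num range from the table above:

```r
# Hypothetical stand-alone versions of the breaks = and labels = functions
breaks_every_2_days <- function(limits) {
  seq(floor(limits[1]), ceiling(limits[2]), by = 2)
}
labels_as_dates <- function(breaks) {
  format(as.Date(breaks, origin = "1970-01-01"))
}

b <- breaks_every_2_days(c(17379, 17389))
labels_as_dates(b)
```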
This also works with times instead of dates. While dates are integer days, times are integer seconds (either since the Unix epoch, for datetimes, or since midnight, for times).
Let's say you had some observations that were on the scale of minutes, not days.
The code would be similar, with a few tweaks:
df <- tibble(tm = ymd_hms("2017-08-01 23:58:00") + 60*0:10,
y = cumsum(rnorm(length(tm)))) %>%
mutate(tm_num = as.numeric(tm))
df
# A tibble: 11 x 3
tm y tm_num
<dttm> <dbl> <dbl>
1 2017-08-01 23:58:00 1.375275 1501631880
2 2017-08-01 23:59:00 2.373565 1501631940
3 2017-08-02 00:00:00 3.650167 1501632000
4 2017-08-02 00:01:00 2.578420 1501632060
5 2017-08-02 00:02:00 5.155688 1501632120
6 2017-08-02 00:03:00 4.022228 1501632180
7 2017-08-02 00:04:00 4.776145 1501632240
8 2017-08-02 00:05:00 4.917420 1501632300
9 2017-08-02 00:06:00 4.513710 1501632360
10 2017-08-02 00:07:00 4.134294 1501632420
11 2017-08-02 00:08:00 3.142898 1501632480
df %>%
ggplot(aes(tm_num, y)) + geom_line() +
scale_x_continuous(sec.axis = dup_axis(),
breaks = function(limits) {
seq(floor(limits[1] / 60) * 60, ceiling(limits[2] / 60) * 60,
by = as.numeric(as_datetime(minutes(2))))
},
labels = function(breaks) {
stamp("Jan 1,\n0:00:00", orders = "md hms")(as_datetime(breaks))
})
Here I updated the dummy data to span 11 minutes, from just before midnight to just after. In breaks = I made sure the breaks fall on whole minutes, changed as_date to as_datetime, and used minutes(2) to place a break every two minutes. In labels = I used stamp(...)(...): stamp() builds a formatting function from the example string, which is then applied to the breaks to give a nice display format.
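The floor/ceiling step in breaks = is what keeps the breaks on whole minutes even when the limits are not. A sketch with a hypothetical function name and the two-minute step written as 120 seconds; the limits are the tm_num range from the table above:

```r
# Hypothetical stand-alone version of the minute-rounded breaks function
breaks_every_2_min <- function(limits) {
  seq(floor(limits[1] / 60) * 60, ceiling(limits[2] / 60) * 60, by = 120)
}

b <- breaks_every_2_min(c(1501631880, 1501632480))
format(as.POSIXct(b, origin = "1970-01-01", tz = "UTC"), "%H:%M")

# Limits a few seconds inside those minutes round out to the same breaks
b2 <- breaks_every_2_min(c(1501631890, 1501632470))
```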
Finally, times alone (without dates).
df <- tibble(tm = milliseconds(1234567 + 0:10),
y = cumsum(rnorm(length(tm)))) %>%
mutate(tm_num = as.numeric(tm))
df
# A tibble: 11 x 3
tm y tm_num
<S4: Period> <dbl> <dbl>
1 1234.567S 0.2136745 1234.567
2 1234.568S -0.6376908 1234.568
3 1234.569S -1.1080997 1234.569
4 1234.57S -0.4219645 1234.570
5 1234.571S -2.7579118 1234.571
6 1234.572S -1.6626674 1234.572
7 1234.573S -3.2298175 1234.573
8 1234.574S -3.2078864 1234.574
9 1234.575S -3.3982454 1234.575
10 1234.576S -2.1051759 1234.576
11 1234.577S -1.9163266 1234.577
df %>%
ggplot(aes(tm_num, y)) + geom_line() +
scale_x_continuous(sec.axis = dup_axis(),
breaks = function(limits) {
seq(limits[1], limits[2],
by = as.numeric(milliseconds(3)))
},
labels = function(breaks) {format((as_datetime(breaks)),
format = "%H:%M:%OS3")})
Here we have an observation every millisecond for 11 milliseconds, starting at t = 20 min 34.567 s. So in breaks = we dispense with any rounding, since we no longer want integers, and place a break every milliseconds(3). Then labels = needs to handle decimal seconds: "%OS3" prints the seconds truncated to 3 decimal places (up to 6 are allowed; see ?strptime).
Is all of this worth it? Probably not, unless you really really want a duplicated time axis. I'll probably post this as an issue on the ggplot2 GitHub, because dup_axis should "just work" with datetimes.
Option 1: This is not very elegant, but it works, using cowplot::align_plots():
library(cowplot)
library(ggplot2)
dat1 <- tibble::tibble(x = c(rep("a", 50), rep("b", 50)),
y = runif(100))
p <- ggplot2::ggplot(dat1) +
ggplot2::geom_boxplot(ggplot2::aes(x = x, y = y))
p <- p + ggplot2::scale_y_continuous(sec.axis = ggplot2::sec_axis(~ .))
p1 <- p + scale_x_discrete(position = c( "bottom"))
p2 <- p + scale_x_discrete(position = c( "top"))
plots <- align_plots(p1, p2, align="hv")
ggdraw() + draw_grob(plots[[1]]) + draw_grob(plots[[2]])
Option 2:
library(forcats)
library(ggplot2)
dat1$num <- as.numeric(fct_recode(dat1$x, "1" = "a", "2" = "b"))
ggplot2::ggplot(dat1, aes(x = num, y = y, group = num)) +
geom_boxplot()+
ggplot2::scale_y_continuous(sec.axis = ggplot2::sec_axis(~ .)) +
scale_x_continuous(position = c("top"), breaks = c(1,2), labels = c("a", "b"),
sec.axis = ggplot2::sec_axis(~ .,breaks = c(1,2), labels = c("a", "b")))
Note: an answer to a similar problem (Duplicating Discrete Axis in ggplot2) uses the cowplot package, but it didn't work for me: the cowplot::switch_axis_position() function has been deprecated.
I have some rainfall data collected continuously from which I have calculated daily totals. Here is some toy data:
Date <- c(seq(as.Date("2016-07-01"), by = "1 day", length.out = 10))
rain_mm <- c(3,6,8,12,0,0,34,23,5,1)
rain_data <- data.frame(Date, rain_mm)
I can plot this data as follows:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_bar(stat = "identity") +
scale_x_date(date_labels = "%d")
Which gives the following:
This seems fine: it is clear how much rain fell on a given day. However, the plot could also be read as saying that a certain amount of rain fell between midday of one day and midday of the next, which is wrong. This is especially a problem if the graph is combined with plots of related continuous variables over the same period.
To get round this issue I could use geom_step as follows:
library(ggplot2)
ggplot(rain_data, aes(Date, rain_mm)) +
geom_step() +
scale_x_date(date_labels = "%d")
Which gives:
This is a better way to display the data, and scale_x_date now behaves like a continuous axis. However, it would be nice to fill the area below the steps, and I can't find a straightforward way of doing this.
Q1: How can I fill beneath the geom_step? Is it possible?
It may also be useful to convert Date to POSIXct to get identical x axes in multi-plot figures, as discussed in this SO question.
I can do this as follows:
library(dplyr)
rain_data_POSIX <- rain_data %>% mutate(Date = as.POSIXct(Date))
Date rain_mm
1 2016-07-01 01:00:00 3
2 2016-07-02 01:00:00 6
3 2016-07-03 01:00:00 8
4 2016-07-04 01:00:00 12
5 2016-07-05 01:00:00 0
6 2016-07-06 01:00:00 0
7 2016-07-07 01:00:00 34
8 2016-07-08 01:00:00 23
9 2016-07-09 01:00:00 5
10 2016-07-10 01:00:00 1
However, this gives a time of 01:00 for each date. I would rather have 00:00. Can I change this in the as.POSIXct call, or do I have to do it afterwards with a separate function? I think it has something to do with tz = "", but I can't figure it out.
How can I convert from class Date to POSIXct so that the time generated is 00:00?
Thanks
For your first question, you can work off this example. First, create a time-lagged version of your data:
rain_tl <- mutate( rain_data, rain_mm = lag( rain_mm ) )
Then combine this time-lagged version with the original data, and re-sort by date:
rain_all <- bind_rows( old = rain_data, new = rain_tl, .id="source" ) %>%
arrange( Date, source )
(Note the newly created source column is used to break ties, correctly interlacing the original data with the time-lagged version):
> head( rain_all )
source Date rain_mm
1 new 2016-07-01 NA
2 old 2016-07-01 3
3 new 2016-07-02 3
4 old 2016-07-02 6
5 new 2016-07-03 6
6 old 2016-07-03 8
You can now use the joint matrix to "fill" your steps:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_step() +
geom_ribbon( data = rain_all, aes( ymin = 0, ymax = rain_mm ),
fill="tomato", alpha=0.5 )
This produces the following plot:
For your second question, the problem is that as.POSIXct does not pass additional arguments to the converter for Date objects, so specifying the tz argument does nothing.
You basically have two options:
1) Reformat the output to what you want: format( as.POSIXct( Date ), "%F 00:00" ), which returns a character vector. If you want to keep the POSIXct class, you can instead...
2) Cast your Date vector to character before passing it to as.POSIXct: as.POSIXct( as.character(Date) ). This parses each date as midnight in the local time zone; when every time is midnight, printing omits the 00:00:00, which may be what you want anyway.
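A quick check of option 2. The tz = "UTC" argument is an assumption added to make the result reproducible; without it, the strings are parsed as local midnight:

```r
d <- as.Date(c("2016-07-01", "2016-07-02"))

# Going through character makes as.POSIXct() parse each string as midnight
p <- as.POSIXct(as.character(d), tz = "UTC")
format(p, "%Y-%m-%d %H:%M:%S")
```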
If you would like to avoid the hack, you can customize the position in the geom_bar expression.
I found good results with:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_bar(stat = "identity", position = position_nudge(x = 0.51), width = 0.99) +
scale_x_date(date_labels = "%d")
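Another way to fill the area, not taken from the answers above, is geom_rect(), drawing each daily total as a rectangle from 00:00 of its day to 00:00 of the next. A sketch, assuming the toy rain_data from the question; the end column is a hypothetical helper:

```r
library(ggplot2)

# Toy data from the question
Date    <- seq(as.Date("2016-07-01"), by = "1 day", length.out = 10)
rain_mm <- c(3, 6, 8, 12, 0, 0, 34, 23, 5, 1)
rain_data <- data.frame(Date, rain_mm)

# Each daily total spans from the start of its day to the start of the next
rain_data$end <- rain_data$Date + 1

p <- ggplot(rain_data) +
  geom_rect(aes(xmin = Date, xmax = end, ymin = 0, ymax = rain_mm),
            fill = "tomato", alpha = 0.5) +
  scale_x_date(date_labels = "%d")
p
```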
I am using ggplot2 to plot my hourly time series data. The data are organized as follows:
> head(df)
timestamp power
1 2015-08-01 00:00:00 584.4069
2 2015-08-01 01:00:00 577.2829
3 2015-08-01 02:00:00 569.0937
4 2015-08-01 03:00:00 561.6945
5 2015-08-01 04:00:00 557.9449
6 2015-08-01 05:00:00 562.4152
I use the following ggplot2 command to plot the data:
library(scales) # date_format(), pretty_breaks()
ggplot(df,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=pretty_breaks(n=30)) +
theme(axis.text.x = element_text(angle=90,hjust=1))
With this the plotted graph is:
My questions are:
In the plotted graph, why are only the labels for hour 18 shown? And what if I want to display the labels for hour 12 of each day?
I am plotting hourly data in the hope of seeing fine-grained details, but I cannot see every hour across the whole month. Can I somehow get a zoomed view of a selected day in the same plot?
Here is a rather long example of scaling dates in ggplot and also a possible interactive way to zoom in on ranges. First, some sample data,
## Make some sample data
library(zoo) # rollmean
set.seed(0)
n <- 745
x <- rgamma(n,.15)*abs(sin(1:n*pi*24/n))*sin(1:n*pi/n/5)
x <- rollmean(x, 3, 0)
start.date <- as.POSIXct('2015-08-01 00:00:00') # the min from your df
dat <- data.frame(
timestamp=as.POSIXct(seq.POSIXt(start.date, start.date + 60*60*24*31, by="hour")),
power=x * 3000)
For interactive zooming, you could try plotly. Older versions of the R package required an account (API key and username); current versions render the plot locally, so you can just do
library(plotly)
plot_ly(dat, x = ~timestamp, y = ~power, text = ~power, type = "scatter", mode = "lines")
and then select regions of the graph to zoom in on them.
For changing the breaks in the ggplot graphs, here is a function to make date breaks by various intervals at certain hours.
## Make breaks from a starting date at a given hour, occurring by interval,
## length.out is days
make_breaks <- function(strt, hour, interval="day", length.out=31) {
strt <- as.POSIXlt(strt - 60*60*24) # start back one day
strt <- ISOdatetime(strt$year+1900L, strt$mon+1L, strt$mday, hour=hour, min=0, sec=0, tz="UTC")
seq.POSIXt(strt, strt+(1+length.out)*60*60*24, by=interval)
}
One way to zoom in, non-interactively, is to simply subset the data,
library(scales)
library(ggplot2)
library(gridExtra)
## The whole interval, breaks on hour 18 each day
breaks <- make_breaks(min(dat$timestamp), hour=18, interval="day", length.out=31)
p1 <- ggplot(dat,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=breaks) +
theme(axis.text.x = element_text(angle=90,hjust=1)) +
ggtitle("Full Range")
## Look at a specific day, breaks by hour
days <- 20
samp <- dat[format(dat$timestamp, "%d") %in% as.character(days),]
breaks <- make_breaks(min(samp$timestamp), hour=0, interval='hour', length.out=length(days))
p2 <- ggplot(samp,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=breaks) +
theme(axis.text.x = element_text(angle=90,hjust=1)) +
ggtitle(paste("Day:", paste(days, collapse = ", ")))
grid.arrange(p1, p2)
I haven't worked much with datetime data, so my code might look a bit messy. But the solution to question 1 is not to use pretty_breaks(), and instead to supply concrete breaks and also set limits within scale_x_datetime().
A rough example might be the following:
ggplot(df,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"),
breaks=as.POSIXct(seq(18000, 3600000, 86400),
origin="2015-10-19 7:00:00"),
limits=c(as.POSIXct(3000, origin="2015-10-19 7:00:00"),
as.POSIXct(30000, origin="2015-10-19 7:00:00"))) +
theme(axis.text.x = element_text(angle=90,hjust=1))
I'm not sure how to write the as.POSIXct() calls more readably, but basically: create the 12:00 point manually and keep adding a full day (86400 s) within the range of your data frame.
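seq() with by = "day" can build the same noon breaks more readably than adding 86400 by hand. A sketch, assuming the data start on 2015-08-01 (as in head(df)) and are in UTC; use the data's own time zone if they are not:

```r
# First break at 12:00 on the first day, then one break per day for 31 days
start_noon  <- as.POSIXct("2015-08-01 12:00:00", tz = "UTC")
breaks_noon <- seq(start_noon, by = "day", length.out = 31)
head(breaks_noon, 3)
```

The result can then be passed as scale_x_datetime(breaks = breaks_noon, labels = date_format("%d:%m; %H")).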