I have some rainfall data collected continuously from which I have calculated daily totals. Here is some toy data:
Date <- c(seq(as.Date("2016-07-01"), by = "1 day", length.out = 10))
rain_mm <- c(3,6,8,12,0,0,34,23,5,1)
rain_data <- data.frame(Date, rain_mm)
I can plot this data as follows:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_bar(stat = "identity") +
scale_x_date(date_labels = "%d")
Which gives the following:
This seems fine. It is clear how much rainfall there was on a certain day. However, it could also be interpreted that between midday of one day and midday of the next, a certain amount of rain fell, which is wrong. This is especially a problem if the graph is combined with other plots of related continuous variables over the same period.
To get round this issue I could use geom_step as follows:
library(ggplot)
ggplot(rain_data, aes(Date, rain_mm)) +
geom_step() +
scale_x_date(date_labels = "%d")
Which gives:
This is a better way to display the data, and now scale_x_date appears to be a continuous axis. However, it would be nice to get the area below the steps filled but cant seem to find a straight forward way of doing this.
Q1: How can I fill beneath the geom_step? Is it possible?
It may also be useful to convert Date into POSIXct to facilitate identical x-axis in multi-plot figures as discussed in this SO question here.
I can do this as follows:
library(dplyr)
rain_data_POSIX <- rain_data %>% mutate(Date = as.POSIXct(Date))
Date rain_mm
1 2016-07-01 01:00:00 3
2 2016-07-02 01:00:00 6
3 2016-07-03 01:00:00 8
4 2016-07-04 01:00:00 12
5 2016-07-05 01:00:00 0
6 2016-07-06 01:00:00 0
7 2016-07-07 01:00:00 34
8 2016-07-08 01:00:00 23
9 2016-07-09 01:00:00 5
10 2016-07-10 01:00:00 1
However, this gives a time of 01:00 for each date. I would rather have 00:00. Can I change this in the as.POSIXct function call, or do I have to do it afterwards using a separate function? I think it is something to do with tz = "" but cant figure it out.
How can I convert from class Date to POSIXct so that the time generated is 00:00?
Thanks
For your first question, you can work off this example. First, create a time-lagged version of your data:
rain_tl <- mutate( rain_data, rain_mm = lag( rain_mm ) )
Then combine this time-lagged version with the original data, and re-sort by date:
rain_all <- bind_rows( old = rain_data, new = rain_tl, .id="source" ) %>%
arrange( Date, source )
(Note the newly created source column is used to break ties, correctly interlacing the original data with the time-lagged version):
> head( rain_all )
source Date rain_mm
1 new 2016-07-01 NA
2 old 2016-07-01 3
3 new 2016-07-02 3
4 old 2016-07-02 6
5 new 2016-07-03 6
6 old 2016-07-03 8
You can now use the joint matrix to "fill" your steps:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_step() +
geom_ribbon( data = rain_all, aes( ymin = 0, ymax = rain_mm ),
fill="tomato", alpha=0.5 ):
This produces the following plot:
For your second question, the problem is that as.POSIX.ct does not pass additional arguments to the converter, so specifying the tz argument does nothing.
You basically have two options:
1) Reformat the output to what you want: format( as.POSIXct( Date ), "%F 00:00" ), which returns a vector of type character. If you want to preserve the object type as POSIXct, you can instead...
2) Cast your Date vector to character prior to passing it to as.POSIX.ct: as.POSIXct( as.character(Date) ), but this will leave off the time entirely, which may be what you want anyway.
If you would like to avoid the hack, you can customize the position in the geom_bar expression.
I found good results with:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_bar(stat = "identity", position = position_nudge(x = 0.51), width = 0.99) +
scale_x_date(date_labels = "%d")
Related
I am trying to plot a time series from an excel file in R Studio. It has a single column named 'Dates'. This column contains datetime data of customer visits in the form 2/15/2014 6:17:22 AM. The datetime was originally in char format and I converted it into a Large POSIXct value using lubridate:
tsData <- mdy_hms(fullUsage$Dates)
Which gives me a value:
POSIXct[1:25,354], format: "2018-04-13 10:18:14" "2018-04-14 13:27:11" .....
I then tried converting it into a time series object using the code below:
require(xts)
visitTimes.ts <- xts(tsData, start = 1, order.by=as.POSIXct(tsData))
plot(visitTimes.ts)
ts_plot(visitTimes.ts)
ts_info(visitTimes.ts)
Im not 100% sure but it looks like the plot is coming out using the sum count of visits. I believe my problem may be in correctly indexing my data using the dates. I apologize in advance if this is a simple issue to deal with I am still learning R. I have included the screenshot of my plot.
yes you are right, you need to provide both the date column (x axis) and the value (y axis)
here's a simple example:
v1 <- data.frame(Date = mdy_hms(c("1-1-2020-00-00-00", "1-2-2020-00-00-00", "1-3-2020-00-00-00")), Value = c(1, 3, 6))
v2 <- xts(v1["Value"], order.by = v1[, "Date"])
plot(v2)
first argument of xts takes the x values, on the order.by i leave the actual ts object
You need to count the number of events in each time period and plot these values on the y axis. You didn't provide enough data for a reproducible example, so I have created a small example. We'll use the tidyverse packages dplyr and lubridate to help us out here:
library(lubridate)
library(dplyr)
library(ggplot2)
set.seed(69)
fullUsage <- data.frame(Dates = as.POSIXct("2020-01-01") +
minutes(round(cumsum(rexp(10000, 1/25))))
)
head(fullUsage)
#> Dates
#> 1 2020-01-01 00:02:00
#> 2 2020-01-01 00:15:00
#> 3 2020-01-01 00:22:00
#> 4 2020-01-01 00:29:00
#> 5 2020-01-01 01:13:00
#> 6 2020-01-01 01:27:00
First of all, we will create columns that show the hour of day and the month that events occurred:
fullUsage$hours <- hour(fullUsage$Dates)
fullUsage$month <- floor_date(fullUsage$Dates, "month")
Now we can effectively just count the number of events per month and plot this number for each month:
fullUsage %>%
group_by(month) %>%
summarise(n = length(hours)) %>%
ggplot(aes(month, n)) +
geom_col()
And we can do the same for the hour of day:
fullUsage %>%
group_by(hours) %>%
summarise(n = length(hours)) %>%
ggplot(aes(hours, n)) +
geom_col() +
scale_x_continuous(breaks = 0:23) +
labs(y = "Hour of day")
Created on 2020-08-05 by the reprex package (v0.3.0)
I have a 24 hour data starting from 7:30 today (for example), until 7:30 the next day, because I didn't link the date to the line plot, R sorts the hour starting from 00:00 despite the data starting at 7:30, I am a beginner in R, and I don't know where to begin to even solve this problem, should I try linking the date also to the X axis, or is there a better solution?
My time function somehow didn't work either, it used to work when I was plotting data for 15 minute increments.
library(chron)
d <- read.csv(file="data.csv", header = T)
t <- times(d$Time)
plot(t,d$MCO2, type="l")
Graph created from the 24 hour data I have :
Graph created from a 15 minute data using the same code :
I wanted the outcome to be from 7:30 to 7:30 the next day, but it showed now a decimal number from 0.0 to 1
Here is the link to the data, just in case:
https://www.dropbox.com/s/wsg437gu00e5t08/Data%20210519.csv?dl=0
The question is actually about combining a date column and a time column to create a timestamp containing date AND time. Note that I suggest to process everything as if we are in GMT timezone. You can pick whatever timezone you want, then stick to it.
# use ggplot
library(ggplot2)
# assume everything happens in GMT timezone
Sys.setenv( TZ = "GMT" )
# replicating the data: a measurement result sampled at 1 sec interval
t <- seq(start, end, by = "1 sec")
Time24 <- trimws(strftime(t, format = "%k:%M:%OS", tz="GMT"))
Date <- strftime(t, format = "%d/%m/%Y", tz="GMT")
head(Time24)
head(Date)
d <- data.frame(Date, Time24)
# this is just a random data of temperature
d$temp <- rnorm(length(d$Date),mean=25,sd=5)
head(d)
# the resulting data is as follows
# Date Time24 temp
#1 22/05/2019 0:00:00 22.67185
#2 22/05/2019 0:00:01 19.91123
#3 22/05/2019 0:00:02 19.57393
#4 22/05/2019 0:00:03 15.37280
#5 22/05/2019 0:00:04 31.76683
#6 22/05/2019 0:00:05 26.75153
# this is the answer to the question
# which is combining the the date and the time column of the data
# note we still assume that this happens in GMT
t <- as.POSIXct(paste(d$Date,d$Time24,sep=" "), format = "%d/%m/%Y %H:%M:%OS", tz="GMT")
# print the data into a plot
png(filename = "test.png", width = 800, height = 600, units = "px", pointsize = 22 )
ggplot(d,aes(x=t,y=temp)) + geom_line() +
scale_x_datetime(date_breaks = "3 hour",
date_labels = "%H:%M\n%d-%b")
The problem is that the function times does not include information about the day. This is a problem since your data spans two days.
The data type you use should be able to include information about the day. Posix is this data type. Also, since Posix is the go-to date-time object in R it is much easier to plot.
Before plotting the data, the time column should have the correct difference in days. When just transforming the column with as.POSIXct, the times of day 2 are read as if it is from day 1. This is why we have to add 24 hours to the correct entries.
After that, it is just a matter of plotting. I added an example of the package of ggplot2 since I prefer these plots.
You might notice that using as.POSIXct will add an incorrect date to your time information. Don't bother about this, you use this date just as a dummy date. You don't use this date itself, you just use it to be able to work with the difference in days.
library(ggplot2)
# Read in your data set
d <- read.csv(file="Data 210519.csv", header = T)
# Read column into R date-time object
t <- as.POSIXct(d$Time24, format = "%H:%M:%OS")
# Add 24 hours to time the time on day 2.
startOfDayTwo <- as.POSIXct("00:00:00", format = "%H:%M:%OS")
endOfDayTwo <- as.POSIXct("07:35:00", format = "%H:%M:%OS")
t[t >= startOfDayTwo & t <= endOfDayTwo] <- t[t >= startOfDayTwo & t <= endOfDayTwo] + 24*60*60
plot(t,d$MCO2, type="l")
# arguably a nicer plot
ggplot(d,aes(x=t,y=MCO2)) + geom_line() +
scale_x_datetime(date_breaks = "2 hour",
date_labels = "%I:%M %p")
I currently have a plot of my data that looks like this:
However because of the negative spike in around 2017, the graph shows values above and below the x axis. How do I make it so the graph only shows values above the x axis?
This is the code I am currently using to produce my graph
plot(dandpw)
addLegend(lty = 1)
mydata
> head(dandpw)
QLD1.Price NSW1.Price VIC1.Price SA1.Price TAS1.Price
2008-01-07 10:30:00 33.81019 36.52777 49.66935 216.45379 30.88968
2008-01-14 10:30:00 45.09321 37.55887 49.04155 248.33518 51.16057
2008-01-21 10:30:00 27.22551 29.57798 31.28935 31.56158 45.99226
2008-01-28 10:30:00 26.14283 27.32113 30.20470 31.90042 53.48170
2008-02-04 10:30:00 91.86961 36.77000 37.09027 37.57167 56.28464
2008-02-11 10:30:00 62.60607 28.83509 34.95866 35.18217 55.78961
dput(head(dandpw
You can do this in two ways. Since there is no usable dput (only the picture), I assume your data is in a data frame.
You can remove negative numbers from your dataset
You can put limits on the y-axis shown in the chart (using ggplot2)
Method 1 (not recommended as it alters your data):
#remove negatives and replace with NA. Can also replace with 0 if desired
dandpw[dandpw < 0] <- NA
Method 2:
#assume dandpw is data frame
library(tidyverse)
names(dandpw)[1] <- "date" #looks like your date column might not be named
#ggplot prefers long format
dandpw <- dandpw %>% gather(variables, values, -date)
ggplot(data = dandpw, aes(x = date, y = values, color = variables)) +
geom_line() +
coord_cartesian(ylim = c(0, max(dandpw$values, na.rm = T) * 1.1 ))
I have a time series which shows the electricity load for every 15min during one year. I already filtered to show only one specific weekday.
My dataframe:
Date Timestamp Weekday Load
2017-01-02 00:00:00 Monday 272
2017-01-02 00:15:00 Monday 400
2017-01-02 00:30:00 Monday 699
2017-01-02 00:45:00 Monday 764
2017-01-02 01:00:00 Monday 983
..
..
2017-01-09 00:45:00 Monday 764
2017-01-09 01:00:00 Monday 983
..
2017-12-25 23:45:00 Monday 983
Now I want to plot several line diagrams for every monday in one diagram:
x axis = Timestamp
y axis = Load
I tried with ggplot:
ggplot(Loadprofile, aes(x= Timestamp, y = Load, color = Date)) + geom_line()
But this brings me following error
Error: Aesthetics must be either length 1 or the same as the data (4992): x, y, colour
That is the output, the x-axis does not look continious though?
enter image description here
Any suggestions?
Your problem is that you need Date to be a factor, but when it is on a Date form, ggplot takes it as a continuous variable.
I simulated some data, just to be able to do the graph, the following code is the one I used to generate the data:
library(tidyverse)
library(lubridate)
DateTimes <- seq(
from=as.POSIXct("2017-1-02 0:00", tz="UTC"),
to=as.POSIXct("2017-1-09 23:59", tz="UTC"),
by="15 min"
)
DF <- data.frame(Date = as.Date(DateTimes), timestamp = strftime(DateTimes, format="%H:%M:%S"), Weekday = weekdays(DateTimes)) %>% filter(Weekday == "Monday") %>% mutate(load = as.numeric(timestamp)*20 + -as.numeric(timestamp)^2 + rnorm(nrow(DF), sd = 1000) + (as.numeric(Date))) %>% mutate(load = ifelse(Date < ymd("2017_01_4"), load -5000, load))
Once I have done that, if I do the following:
ggplot(DF, aes(x = timestamp, y = load)) + geom_line(aes(group = as.factor(Date), color = as.factor(Date
I get the following graph
I think that is what you need, if you need more help formating the x axis and legend let me know
Cheers
I am using ggplot2 to plot my hourly time series data. Data organization is as
> head(df)
timestamp power
1 2015-08-01 00:00:00 584.4069
2 2015-08-01 01:00:00 577.2829
3 2015-08-01 02:00:00 569.0937
4 2015-08-01 03:00:00 561.6945
5 2015-08-01 04:00:00 557.9449
6 2015-08-01 05:00:00 562.4152
I use following ggplot2 command to plot the data:
ggplot(df,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=pretty_breaks(n=30)) +
theme(axis.text.x = element_text(angle=90,hjust=1))
With this the plotted graph is:
My questions are:
In the plotted graph, why it is showing only the labels corresponding to hour 18. Now, what if I want to display the labels corresponding to hour 12 of each day.
I am plotting hourly data, hoping to see the fine granular details. But, I am not able to see all the hours of entire one month. Can I somehow see the zoomed view for any selected day in the same plot?
Here is a rather long example of scaling dates in ggplot and also a possible interactive way to zoom in on ranges. First, some sample data,
## Make some sample data
library(zoo) # rollmean
set.seed(0)
n <- 745
x <- rgamma(n,.15)*abs(sin(1:n*pi*24/n))*sin(1:n*pi/n/5)
x <- rollmean(x, 3, 0)
start.date <- as.POSIXct('2015-08-01 00:00:00') # the min from your df
dat <- data.frame(
timestamp=as.POSIXct(seq.POSIXt(start.date, start.date + 60*60*24*31, by="hour")),
power=x * 3000)
For interactive zooming, you could try plotly. You need to set it up (get an api-key and username) then just do
library(plotly)
plot_ly(dat, x=timestamp, y=power, text=power, type='line')
and you can select regions of the graph and zoom in on them. You can see it here.
For changing the breaks in the ggplot graphs, here is a function to make date breaks by various intervals at certain hours.
## Make breaks from a starting date at a given hour, occuring by interval,
## length.out is days
make_breaks <- function(strt, hour, interval="day", length.out=31) {
strt <- as.POSIXlt(strt - 60*60*24) # start back one day
strt <- ISOdatetime(strt$year+1900L, strt$mon+1L, strt$mday, hour=hour, min=0, sec=0, tz="UTC")
seq.POSIXt(strt, strt+(1+length.out)*60*60*24, by=interval)
}
One way to zoom in, non-interactively, is to simply subset the data,
library(scales)
library(ggplot2)
library(gridExtra)
## The whole interval, breaks on hour 18 each day
breaks <- make_breaks(min(dat$timestamp), hour=18, interval="day", length.out=31)
p1 <- ggplot(dat,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=breaks) +
theme(axis.text.x = element_text(angle=90,hjust=1)) +
ggtitle("Full Range")
## Look at a specific day, breaks by hour
days <- 20
samp <- dat[format(dat$timestamp, "%d") %in% as.character(days),]
breaks <- make_breaks(min(samp$timestamp), hour=0, interval='hour', length.out=length(days))
p2 <- ggplot(samp,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=breaks) +
theme(axis.text.x = element_text(angle=90,hjust=1)) +
ggtitle(paste("Day:", paste(days, collapse = ", ")))
grid.arrange(p1, p2)
I didn't worked with data time data a lot so my code might look a bit messy... But the solution to 1 is to not use pretty_breaks() but better use concrete breaks and also limit the within the scale_x_datetime() function.
A bad written example might be the following:
ggplot(df,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"),
breaks=as.POSIXct(sapply(seq(18000, 3600000, 86400), function(x) 0 + x),
origin="2015-10-19 7:00:00"),
limits=c(as.POSIXct(3000, origin="2015-10-19 7:00:00"),
as.POSIXct(30000, origin="2015-10-19 7:00:00"))) +
theme(axis.text.x = element_text(angle=90,hjust=1))
I am not sure how to write the as.POSIXct() more readable... But Basically create the 12 hour point manually and add always a complete day within the range of your data frame...