Plotting time-series data - r

I am trying to plot time-series data showing the count of observations over a 24 hr period. I have turned my POSIXct variable into a table that looks like this:
ABTable1 <- table(cut(AB_Final1$datetime, breaks="30 mins"))
2016-12-17 00:36:00 2016-12-17 01:06:00 2016-12-17 01:36:00 2016-12-17 02:06:00
2 3 1 1
I want to know how to plot this on an axis running from 00:00 to 23:59. At the moment, if I try to plot it, the axis starts at 00:36. Is there a way to make a table that includes all 30-minute intervals for this period while retaining my counts? I have to do this many times for many plots.
Thanks!

You will need to define your break points like this:
breakpoints<-seq(from= as.POSIXct("2016-12-17"), to= as.POSIXct("2016-12-18"), by="30 min")
Then you can substitute these breaks into your cut function:
ABTable1 <- table(cut(AB_Final1$datetime, breaks=breakpoints))
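Since this has to be repeated for many plots, the two steps above could be wrapped in a small function. The following is only a sketch: `count_half_hours` and the `obs` vector are made-up names standing in for your own code and for `AB_Final1$datetime`, and the time zone is assumed to be UTC so that every day has exactly 48 half-hour bins.

```r
# Hypothetical observation times standing in for AB_Final1$datetime
obs <- as.POSIXct(c("2016-12-17 00:36:10", "2016-12-17 00:50:00",
                    "2016-12-17 01:10:00", "2016-12-17 13:05:00",
                    "2016-12-17 23:59:00"), tz = "UTC")

# Hypothetical helper: count observations in every 30-minute bin of one day
count_half_hours <- function(x, day, tz = "UTC") {
  start <- as.POSIXct(day, tz = tz)
  # 49 break points from midnight to the next midnight give 48 bins
  breakpoints <- seq(from = start, by = "30 min", length.out = 49)
  counts <- table(cut(x, breaks = breakpoints))
  # Label each bin by its start time only
  names(counts) <- format(breakpoints[-49], "%H:%M")
  counts
}

counts <- count_half_hours(obs, "2016-12-17")
barplot(counts, las = 2, cex.names = 0.6, ylab = "Observations")
```

Bins without observations are kept with a count of zero, so the axis always runs from 00:00 to 23:30 regardless of where the first observation falls.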

Related

Plot data over time in R

I'm working with a dataframe including the columns 'timestamp' and 'amount'. The data can be produced like this
sample_size <- 40
start_date = as.POSIXct("2020-01-01 00:00")
end_date = as.POSIXct("2020-01-03 00:00")
timestamps <- as.POSIXct(sample(seq(start_date, end_date, by=60), sample_size))
amount <- rpois(sample_size, 5)
df <- data.frame(timestamps=timestamps, amount=amount)
Now I'd like to plot the sum of the amount entries for some timeframe (like every hour, 30 min, 20 min). The final plot would look like a histogram of the timestamps but should not just count how many timestamps fell into the timeframe, but what amount fell into the timeframe.
How can I approach this? I could create an extra vector with the amount of each timeframe, but don't know how to proceed.
Also, I'd like to add a feature to reduce by hour, such that just one day is plotted (notice that the range between start_date and end_date is two days) and in each timeframe (let's say every hour) the amount of data located in this hour is plotted. In this case the data
2020-01-01 13:03:00 5
2020-01-02 13:21:00 10
2020-01-02 13:38:00 1
2020-01-01 13:14:00 3
would produce a bar of height sum(5, 10, 1, 3) = 19 in the timeframe 13:00-14:00. How can I implement the plotting to easily switch between these two modes (plot days/plot just one day and reduce)?
EDIT: Following the advice of @Gregor Thomas I added a grouping column like this:
df$time_group <- lubridate::floor_date(df$timestamps, unit="20 minutes")
Now I'm wondering how to ignore the dates and thus reduce by 20 minute frame (independent of date).
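One possible way to switch between the two modes is to compute the bin start in seconds and, for the "one day" mode, take the seconds modulo 24 hours before binning. The sketch below uses base R only. `sum_by_bin` and its `ignore_date` argument are hypothetical names, and the timestamps are assumed to be in UTC (for other time zones the modulo step would need an offset).

```r
# The four example rows from the question
df <- data.frame(
  timestamps = as.POSIXct(c("2020-01-01 13:03:00", "2020-01-02 13:21:00",
                            "2020-01-02 13:38:00", "2020-01-01 13:14:00"),
                          tz = "UTC"),
  amount = c(5, 10, 1, 3)
)

# Hypothetical helper: sum `amount` per bin of `minutes` minutes.
# ignore_date = TRUE folds all days onto a single 24-hour axis.
sum_by_bin <- function(df, minutes = 60, ignore_date = FALSE) {
  secs <- as.numeric(df$timestamps)          # seconds since 1970-01-01 UTC
  if (ignore_date) secs <- secs %% 86400     # keep only the time of day
  width <- minutes * 60
  bin_start <- floor(secs / width) * width   # round down to the bin start
  bin <- as.POSIXct(bin_start, origin = "1970-01-01", tz = "UTC")
  label <- format(bin, if (ignore_date) "%H:%M" else "%Y-%m-%d %H:%M")
  tapply(df$amount, label, sum)
}

by_day <- sum_by_bin(df, minutes = 60)                      # one bar per date and hour
folded <- sum_by_bin(df, minutes = 60, ignore_date = TRUE)  # one bar per hour of day
barplot(folded, ylab = "Amount")
```

Note that `tapply()` only returns bins that contain data. If empty bins should appear as zero-height bars, the labels would have to be turned into a factor with all bin starts as levels first.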

Binning and making histogram for dates in R

I am new to using dates in R so sorry if this is a basic question. I have a data set that has the name of fracking wells and their job end date as listed below:
df = as.data.frame(df)
head(df)
WellName JobEndDate
1 WILLIAM VALENTINE 1 5/19/1982 12:00:00 AM
2 LIZARD HEAD 1-8H RE 2/7/1995 12:00:00 AM
3 North Westbrook Unit/Well No. 3032 6/11/1996 12:00:00 AM
4 Olene Reagan 3-1 12/13/2001 12:00:00 AM
5 CNX3 9/22/2008 12:00:00 AM
7 CNX2 1/22/2009 12:00:00 AM
It is a large file with about 100,000 entries that go until 2017. I want to create a histogram based on the job end date. To do that, I figured I would place the dates into bins, breaking by months. However, I am struggling with placing them into bins so that each month has a number corresponding to how many wells were finished in each month. Therefore, I am also struggling with the histogram. I would appreciate any help!! Thank you!
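First, parse the dates and extract the month from each one:
library(data.table)
# JobEndDate is text such as "5/19/1982 12:00:00 AM", so parse it before calling month()
df$JobEndDate <- as.Date(df$JobEndDate, format = "%m/%d/%Y")
df$months <- month(df$JobEndDate)
Then, make your plot:
library(ggplot2)
# months is already binned, so count the rows per month with geom_bar();
# note that the column name must not be quoted inside aes()
ggplot(df, aes(x = factor(months))) + geom_bar()
# alternative in base R: a bar per month, with the count as its height
barplot(table(df$months))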
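Extracting only the month number folds all years into twelve bins. If the aim is one bin per calendar month across the whole 1982 to 2017 range, `cut()` on the parsed dates is one option. A minimal sketch, assuming JobEndDate is stored as text in the format shown in the question; the `job_end` vector below is made-up sample data:

```r
# Hypothetical sample in the same text format as JobEndDate
job_end <- c("5/19/1982 12:00:00 AM", "2/7/1995 12:00:00 AM",
             "6/11/1996 12:00:00 AM", "9/22/2008 12:00:00 AM",
             "9/3/2008 12:00:00 AM",  "1/22/2009 12:00:00 AM")

# Only the date part is parsed; the trailing time text is ignored
dates <- as.Date(job_end, format = "%m/%d/%Y")

# One bin per calendar month, labelled by the first day of that month;
# months without any wells are kept with a count of zero
monthly_counts <- table(cut(dates, breaks = "month"))

barplot(monthly_counts, las = 2, cex.names = 0.5, ylab = "Wells finished")
```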

Plotting a variable measured monthly with a variable measured yearly in the same plot (R)

Here are two samples of datasets I would like to plot together on the same plot:
>head(df1)
Date y
1 2015-10-01 6217.734
2 2015-09-01 6242.592
3 2015-08-01 6772.145
4 2015-07-01 6865.719
and
>head(df2)
Year x
1 1980 5760
2 1981 4765
3 1982 2620
4 1983 7484
Given that df2$Year and df1$Date overlap date ranges and df1$y and df2$x are of the same scale, how can I best plot y and x against time on the same plot given that x is measured only yearly and y monthly?
I imagine it will require converting Year to an arbitrary date (1980-01-01, 1981-01-01). But beyond that, other than altering my df2 data.frame to having twelve observations per year with the same x value per observation, then combining the two data.frames, I cannot think of what to do.
I would prefer to use ggplot2 if there is a solution there.
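Can you try this out for me?
library(dygraphs)
library(xts)
Give both data frames a date column with the same name. Here that means turning Year into a date (January 1 of each year):
df1$Date <- as.Date(df1$Date)
df2$Date <- as.Date(paste0(df2$Year, "-01-01"))
Then do:
# merge() lines the rows up by date; cbind() would fail when the row counts differ
prep <- merge(df1, df2[, c("Date", "x")], by = "Date", all = TRUE)
ts_object <- xts(as.matrix(prep[, c("y", "x")]), order.by = prep$Date)
dygraph(ts_object)
Note that you are providing no data for me to work with here. If you can do so, that would be great: run dput(df1) and dput(df2) and post the output of those commands.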
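Since the question asks for ggplot2, here is a rough sketch of one way it could be done there: convert Year to a date, stack both series into one long data frame, and let ggplot2 draw one line per series. The df2 rows below are invented so that the years overlap with df1, and the column names `value` and `series` are arbitrary choices for this example.

```r
df1 <- data.frame(
  Date = as.Date(c("2015-10-01", "2015-09-01", "2015-08-01", "2015-07-01")),
  y = c(6217.734, 6242.592, 6772.145, 6865.719)
)
# Hypothetical yearly values chosen to overlap with df1
df2 <- data.frame(Year = c(2014, 2015, 2016), x = c(5760, 4765, 7484))

# Place each yearly value on January 1 of its year
df2$Date <- as.Date(paste0(df2$Year, "-01-01"))

# Stack both series into one long data frame
long <- rbind(
  data.frame(Date = df1$Date, value = df1$y, series = "y (monthly)"),
  data.frame(Date = df2$Date, value = df2$x, series = "x (yearly)")
)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = Date, y = value, colour = series)) +
    geom_line() +
    geom_point()
  print(p)
}
```

With this layout there is no need to repeat each yearly value twelve times; the yearly series simply has fewer points on the shared date axis.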

How to create a proper time series `ts()` object with weekly figures in R

I often used the ts() objects for yearly, quarterly or monthly time series, but now I would like to use it for weekly. Now the challenge is that not every year has the same number of weeks (either 52 or 53 weeks). How to deal with this?
I usually take the first day of the week as an identifier for the week (e.g. 2013-05-20 or 2013-05-27).
Can anybody advise how I would create a proper weekly time series for the following dataset (x)?
Date Qty
2013-05-20 25
2013-05-27 60
....
Something along the lines of:
ts <- ts(x$Qty, start=as.Date(x$Date), frequency=????)
Thank you for your assistance.
DF <- read.table(text="Date Qty
2013-05-20 25
2013-05-27 60",header=TRUE)
DF$Date <- as.Date(DF$Date)
library(xts)
my.xts <- as.xts(DF[,-1,drop=FALSE],order.by=DF$Date)
as.ts(my.xts)
# Time Series:
# Start = 1
# End = 8
# Frequency = 0.142857142857143
# [1] 25 60
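If a plain ts() object is needed, one common workaround (not part of the answer above) is to use a non-integer frequency of 365.25/7, the average number of weeks per year, so that the 52 versus 53 week question does not arise. A sketch under that assumption; the Qty values after the first two are invented:

```r
DF <- data.frame(
  Date = as.Date("2013-05-20") + 7 * (0:9),
  # First two values are from the question, the rest are hypothetical
  Qty  = c(25, 60, 41, 37, 52, 48, 30, 66, 58, 45)
)

# Express the first date as a decimal year, e.g. 2013.38
first_date  <- DF$Date[1]
year        <- as.numeric(format(first_date, "%Y"))
day_of_year <- as.numeric(format(first_date, "%j"))
start_time  <- year + (day_of_year - 1) / 365.25

# Average number of weeks per year as the (non-integer) frequency
weekly <- ts(DF$Qty, start = start_time, frequency = 365.25 / 7)

head(time(weekly))
```

The trade-off is that the time index is only approximate: observations are spaced 7/365.25 years apart rather than being tied to exact calendar dates, which is why the xts route above keeps the real dates instead.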

R plot density smoothed time series

I wish to make a probability distribution of some time series data. My data is in the following format
00:00, 3
01:00, 50
05:00, 13
10:00, 34
17:00, 80
21:00, 100
The time column has some missing values that R will have to interpolate. I want to get a nice smooth curve to highlight the busy periods. I have tried with ts, density and plot but these don't produce what I'm after. For example,
data1 <- read.csv(file="c:\\abc\\ts.csv", head=FALSE, sep=",")
data1$V1 <- strptime(data1$V1, format="%H:%M")
plot(data1$V2, density(data1$V1), type="l")
But this gives me lines drawn in crazy order and as a probability distribution.
I think you are definitely after package zoo, which has several functions to deal with NAs. See na.aggregate, na.approx and na.locf also.
You made it a little harder than you might realize. I'll make it easier for now by adding a date in front of your times.
Also, I added a variable "texinp" and a textConnection() statement so you can cut/paste the following code and run it directly. The data is loaded into variable texinp and is read by the read.zoo statement in a similar way to reading a .csv file. For now, this will allow you to plot things and gives you an idea of how to read .csv files using read.zoo.
library(zoo)
library(chron)
texinp <- "
Time, Mydata
2011-02-06 00:00, 3
2011-02-06 01:00, 50
2011-02-06 05:00, 13
2011-02-06 10:00, 34
2011-02-06 17:00, 80
2011-02-06 21:00, 100"
myd.zoo <- read.zoo(textConnection(texinp), header=TRUE, FUN = as.chron, sep=",")
myd.zoo
plot(myd.zoo)
From your question, you talked about "busy periods". I may be wrong, but I'm assuming that the value of 100 at time 21:00 is the "busiest period". If that's true, then you don't need a density plot, and the above plot is what you're after.
Let me know if I'm wrong.
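If the gaps between the recorded hours do need to be filled in, the interpolation itself can be sketched in base R with approx() (linear) and spline() (smooth), which is a swapped-in alternative to the zoo functions named above rather than the same code. The hours are written as plain numbers here to keep the example short:

```r
# Data from the question: hour of day and value
hours  <- c(0, 1, 5, 10, 17, 21)
values <- c(3, 50, 13, 34, 80, 100)

# Hourly grid covering the observed range
grid <- 0:21

# Linear interpolation between the recorded points
linear <- approx(hours, values, xout = grid)$y

# Cubic spline through the same points; it can overshoot between points
smooth <- spline(hours, values, xout = grid)$y

plot(hours, values, xlab = "Hour of day", ylab = "Value")
lines(grid, linear, lty = 2)
lines(grid, smooth)
```

Both curves pass through the recorded points; which one is appropriate depends on how the values are expected to behave between measurements.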
