Binning and making histogram for dates in R

I am new to using dates in R so sorry if this is a basic question. I have a data set that has the name of fracking wells and their job end date as listed below:
df = as.data.frame(df)
head(df)
WellName JobEndDate
1 WILLIAM VALENTINE 1 5/19/1982 12:00:00 AM
2 LIZARD HEAD 1-8H RE 2/7/1995 12:00:00 AM
3 North Westbrook Unit/Well No. 3032 6/11/1996 12:00:00 AM
4 Olene Reagan 3-1 12/13/2001 12:00:00 AM
5 CNX3 9/22/2008 12:00:00 AM
7 CNX2 1/22/2009 12:00:00 AM
It is a large file with about 100,000 entries that go until 2017. I want to create a histogram based on the job end date. To do that, I figured I would place the dates into bins, breaking by months. However, I am struggling with placing them into bins so that each month has a number corresponding to how many wells were finished in each month. Therefore, I am also struggling with the histogram. I would appreciate any help!! Thank you!

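First, parse the dates and extract the month from each one:
library(data.table)
# JobEndDate looks like text ("5/19/1982 12:00:00 AM"), so parse the date part first
df$JobEndDate <- as.Date(df$JobEndDate, format = "%m/%d/%Y")
df$months <- month(df$JobEndDate)
Then, make your plot:
library(ggplot2)
# map the column itself (unquoted), not the string 'months'; one bar per month
ggplot(df, aes(x = months)) + geom_bar()
# alternative in base R: bar heights are the counts per month
barplot(table(df$months))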
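Note that month() returns only the month of the year (1 to 12), so May 1982 and May 1995 would land in the same bin. If each calendar month should get its own bin, one possible sketch in base R uses cut() with breaks = "month". The sample dates below are made up in the format shown in the question; this is an illustration, not the only way to do it.

```r
# Made-up sample in the same text format as JobEndDate in the question
job_end <- c("5/19/1982 12:00:00 AM", "2/7/1995 12:00:00 AM",
             "1/5/2009 12:00:00 AM", "1/22/2009 12:00:00 AM")

# Parse the date part; the trailing time text is ignored by the format
dates <- as.Date(job_end, format = "%m/%d/%Y")

# One bin per calendar month between the first and last date,
# including months in which no well was finished
month_bins <- cut(dates, breaks = "month")
counts <- table(month_bins)

# Bar heights are the number of wells finished in each month
barplot(counts)
```

Each bin is labelled with the first day of its month, so counts[["2009-01-01"]] is the number of jobs that ended in January 2009.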

Related

Getting the same day across different years in R

I have a dataset for a time series spanning a couple of years with daily observations. I'm trying to smooth some clearly wrong data inserted there (for example, negative values when the variable cannot take values below zero) and what I came up with was trying to smooth it or "interpolate" it by using both the mean of the days around that observation and the mean of the same day or couple of days from previous years, as I have yearly seasonality (I'm still unsure about this part, any comment would be greatly appreciated).
So my question is whether I can easily access the same day across different years.
Here's a dummy example of my data:
library(tidyverse)
library(lubridate)
date value
2016-10-01 00:00:00 28
2016-10-02 00:00:00 25
2016-10-03 00:00:00 24
2016-10-04 00:00:00 22
2016-10-05 00:00:00 -6
2016-10-06 00:00:00 26
I have that for years 2016 through 2020. So in this example I would use the dates around 2016-10-05 AND I would like to use the dates around the 5th of October from years 2017 to 2020 to kind of maintain the seasonality, but maybe this is incorrect.
I tried to use +years() from lubridate but I still have to do things manually and I would like to automate things.
If your question is solely "whether [you] can easily access the same day [across] different years", you could do that as follows:
# say your data frame is called df
library(lubridate)
day(df$date)
This will return the day part of the date for every entry in that column of your data frame.
Edit to reply to comment from asker:
This is a very basic way to specify the day and month for which you would like to obtain the corresponding rows in your data frame:
df[day(df$date) == 5 & month(df$date) == 10, ]
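Building on that filter, here is a minimal sketch of how the same calendar day from every year could be used to compute a replacement for an invalid value. It uses base R's format() instead of lubridate, the data are made up, and averaging the valid same-day values is only one assumed way of "smoothing"; whether it is statistically appropriate is a separate question.

```r
# Made-up daily series: 5 October in three years, one value clearly invalid
df <- data.frame(
  date  = as.Date(c("2016-10-04", "2016-10-05", "2017-10-05", "2018-10-05")),
  value = c(22, -6, 24, 26)
)

# Month-day key that is identical for the same calendar day in every year
df$md <- format(df$date, "%m-%d")

# All valid (non-negative) observations for 5 October, across years
same_day <- df[df$md == "10-05" & df$value >= 0, ]

# Assumed replacement rule: mean of the valid same-day values
replacement <- mean(same_day$value)
```

The md column can also be used with split() or aggregate() to process every calendar day at once rather than one day at a time.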

Turn date column into days from beginning integer Rstudio

Hi everyone. I am currently plotting time series graphs in RStudio. I have created a nice time series graph, but I would like the x axis to show not the date but an integer counting the days from the starting date of the graph.
Time Series Graph
For example, instead of seeing 01/01/2021 I want to see day 100, as in it's the 100th day of recording data.
Do I need to create another column converting all the days into a numerical value and then plot this?
If so, how do I do this? At the moment all I have is a Date column and a column with the value I am plotting.
Column Data
Thanks
Assuming you want 01/01/2021 to be the first day, you can use it as a reference, calculate the number of days elapsed since that first day of recording, and plot that. This gives you an integer counting up from the starting date.
Not sure what your data frame looks like so hopefully this helps.
Using lubridate
library(lubridate)
df
Date
1 01/01/2021
2 02/01/2021
3 03/01/2021
4 04/01/2021
df$days <- yday(dmy(df$Date)) -1
Output:
Date days
1 01/01/2021 0
2 02/01/2021 1
3 03/01/2021 2
4 04/01/2021 3
This is indeed numeric:
str(df$days)
num [1:4] 0 1 2 3
This is a simulation of dates:
date.simulation = as.Date(1:100, "2001-01-01")
factor(date.simulation-min(date.simulation))
You just subtract the minimum date from each date. A factor gives a discrete axis when plotting; if you want a continuous numeric axis, wrap the difference in as.numeric() instead.
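As a small illustration of the subtraction approach, the sketch below computes a day counter that keeps increasing across a year boundary (yday() restarts at 1 every January). The dates are made up and assumed to be in day/month/year text format, as in the example output above.

```r
# Made-up recording dates spanning a year boundary (day/month/year text)
date_text <- c("30/12/2020", "31/12/2020", "01/01/2021", "02/01/2021")
dates <- as.Date(date_text, format = "%d/%m/%Y")

# Days elapsed since the first recording, counted from 1
day_index <- as.numeric(dates - min(dates)) + 1
```

day_index can then be used directly as the x variable of the plot in place of the Date column.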

Plotting time-series data

I am trying to plot time-series data showing the count of observations over a 24 hr period. I have turned my POSIXct variable into a table that looks like this:
ABTable1 <- table(cut(AB_Final1$datetime, breaks="30 mins"))
2016-12-17 00:36:00 2016-12-17 01:06:00 2016-12-17 01:36:00 2016-12-17 02:06:00
2 3 1 1
I want to know how to plot this on an axis running from 00:00 to 23:59. At the moment, if I try to plot it, it starts at 00:36. Is there a way I can make a table that includes all 30-minute intervals for this period while retaining my counts? I have to do this multiple times for many plots.
Thanks!
You will need to define your break points like this:
breakpoints<-seq(from= as.POSIXct("2016-12-17"), to= as.POSIXct("2016-12-18"), by="30 min")
Then you can substitute these breaks into your cut function:
ABTable1 <- table(cut(AB_Final1$datetime, breaks=breakpoints))
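A self-contained sketch of the same idea, with made-up timestamps standing in for AB_Final1$datetime. The time zone is fixed to UTC here as an assumption so that the breakpoints and the data agree; with real data the breakpoints should use the same time zone as the datetime column.

```r
# Made-up observations for one day (stand-in for AB_Final1$datetime)
datetime <- as.POSIXct(c("2016-12-17 00:40:00", "2016-12-17 00:55:00",
                         "2016-12-17 13:10:00"), tz = "UTC")

# 49 breakpoints from midnight to midnight give 48 half-hour intervals
breakpoints <- seq(from = as.POSIXct("2016-12-17", tz = "UTC"),
                   to   = as.POSIXct("2016-12-18", tz = "UTC"),
                   by   = "30 min")

# Empty intervals are kept as levels, so they appear with a count of 0
counts <- table(cut(datetime, breaks = breakpoints))

# One bar per half-hour interval across the whole day
barplot(counts, las = 2, cex.names = 0.5)
```

Because only the date in the breakpoints changes from plot to plot, the same steps can be wrapped in a function that takes the day as an argument.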

Plotting a variable measured monthly with a variable measured yearly in the same plot (R)

Here are two samples of datasets I would like to plot together on the same plot:
>head(df1)
Date y
1 2015-10-01 6217.734
2 2015-09-01 6242.592
3 2015-08-01 6772.145
4 2015-07-01 6865.719
and
>head(df2)
Year x
1 1980 5760
2 1981 4765
3 1982 2620
4 1983 7484
Given that df2$Year and df1$Date overlap date ranges and df1$y and df2$x are of the same scale, how can I best plot y and x against time on the same plot given that x is measured only yearly and y monthly?
I imagine it will require converting Year to an arbitrary date (1980-01-01, 1981-01-01). But beyond that, other than altering my df2 data.frame to having twelve observations per year with the same x value per observation, then combining the two data.frames, I cannot think of what to do.
I would prefer to use ggplot2 if there is a solution there.
Can you try this out for me?
library(dygraphs)
library(xts)
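# Give both data frames a shared Date column (yearly values placed on 1 January)
df1$Date <- as.Date(df1$Date)
df2$Date <- as.Date(paste0(df2$Year, "-01-01"))
# cbind() fails when the row counts differ, so merge on Date instead
prep <- merge(df1, df2[, c("Date", "x")], by = "Date", all = TRUE)
ts_object <- xts(prep[, c("y", "x")], order.by = prep$Date)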
dygraph(ts_object)
Note that you are providing literally NO data for me to work with here. If you can do so that'd be great. Try using dput(df1), and dput(df2), and post the output of these commands
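Since the question asks for ggplot2, here is one possible sketch using the sample rows shown in the question. It places each yearly value on 1 January of its year (an assumption; mid-year would be equally defensible) and stacks both series into one long data frame, so there is no need to repeat each yearly value twelve times. The names combined, value and series are made up for the example.

```r
# Sample rows copied from the question
df1 <- data.frame(
  Date = as.Date(c("2015-10-01", "2015-09-01", "2015-08-01", "2015-07-01")),
  y    = c(6217.734, 6242.592, 6772.145, 6865.719)
)
df2 <- data.frame(Year = 1980:1983, x = c(5760, 4765, 2620, 7484))

# Assumption: put each yearly observation on 1 January of that year
df2$Date <- as.Date(paste0(df2$Year, "-01-01"))

# Long format: one row per observation, with a label for its series
combined <- rbind(
  data.frame(Date = df1$Date, value = df1$y, series = "y (monthly)"),
  data.frame(Date = df2$Date, value = df2$x, series = "x (yearly)")
)

# Draw both series against the same date axis (only if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(combined, aes(x = Date, y = value, colour = series)) +
    geom_line() +
    geom_point()
  print(p)
}
```

Each series is drawn only at its own observation dates, so the yearly line simply has fewer points than the monthly one.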

R: ggplot with datetime on x axis

I'm trying to plot 5 days of historical stock data in R using ggplot. Datetime on the x axis, and the stock values ('close') on the y axis. I only want to show the minutes of the day when the stock market is open, and my data set is limited to 7 hours a day of values x 5 days.
But when I plot it with ggplot, the scale is changed so I get all the hours per day.
ggplot(data = df_stock, aes(x = datetime, y = close)) +
geom_line()
I've tried googling this and using the R help function. I'm quite new to R so my apologies if this is very easy to solve. I hope someone can guide me in the right direction.
While a minimal example would be helpful to address this question, a more general answer which could be useful can be given.
ggplot2 does not have a facility for axis-breaking, as it's not considered good practice. However what you're doing here is actually a transformation of the variable --- hours open rather than hours of the day. So you'll have to transform the variable itself. You can do this using the lubridate package. Let's take two days and imagine the market is open between 10am and 5pm.
require(lubridate)
dates <- c("2014-01-01 10:00:00 UTC", "2014-01-01 13:00:00 UTC", "2014-01-01 17:00:00 UTC", "2014-01-02 13:00:00 UTC", "2014-01-02 17:00:00 UTC")
dates <- ymd_hms(dates)
Now you will need to rescale the data so that it only covers the hours you want. You can extract the parts of each date with lubridate and divide by the number of hours the market is open per day, which here is 7 rather than 24.
hour(dates) <- hour(dates)-10
scaleddates <- day(dates)-1 + hour(dates)/7 + minute(dates)/60/7 + second(dates)/60/60/7
scaleddates
[1] 0.0000000 0.4285714 1.0000000 1.4285714 2.0000000
Now you can plot the graph, with the x-axis reading 'Days' rather than 'Dates'. The x-axis is now how far you are through the working day.
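The transformation above can be sketched as a reusable function in base R. The function name to_trading_days and its parameters open_hour and hours_open are made up for this example, with defaults taken from the 10am to 5pm assumption above. Unlike day(dates) - 1, it numbers the days by their position among the dates present, so weekends and month changes do not leave gaps; the close values are invented.

```r
# Hypothetical helper: convert timestamps to "trading days since start"
to_trading_days <- function(datetime, open_hour = 10, hours_open = 7) {
  lt <- as.POSIXlt(datetime)
  day <- as.Date(lt)
  # Position of each date among the dates present: 0, 1, 2, ...
  day_index <- match(day, sort(unique(day))) - 1
  # Hours elapsed since the market opened on that day
  hours_since_open <- (lt$hour - open_hour) + lt$min / 60 + lt$sec / 3600
  day_index + hours_since_open / hours_open
}

dates <- as.POSIXct(c("2014-01-01 10:00:00", "2014-01-01 13:00:00",
                      "2014-01-01 17:00:00", "2014-01-02 13:00:00",
                      "2014-01-02 17:00:00"), tz = "UTC")
close <- c(100, 101, 103, 102, 104)  # invented prices

scaled <- to_trading_days(dates)

# x axis now measures progress through trading days, not clock time
plot(scaled, close, type = "l", xlab = "Trading days since start", ylab = "close")
```

With ggplot2 the same scaled column can be mapped to x in place of datetime.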
