Count number of observed months in R

I have data with a Date column as follows:
2010-01-01
2010-02-07
2010-02-09
2010-03-09
2010-04-06
....
2021-03-31
2021-04-10
I want an output that numbers the observed months consecutively, based on the Date column above, such as: 1,2,3...100
I tried this code: as.numeric(as.factor(format(flights.input$Date,"%m")))
But it stops counting at 12 and starts again from 1, while I want the count to continue across years.

You can try:
data.table::setDT(df)[, NumberOfMonth := data.table::rleid(format(as.Date(as.character(Date)), "%m"))]

We can use rle from base R to create the sequence after extracting the month from the 'Date' column.
fm1 <- format(flights.input$Date, "%m")
with(rle(fm1), rep(seq_along(values), lengths))
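Both answers above key on the month number alone, so they rely on the rows being sorted by date and on no two neighbouring rows sharing a month number across different years. A minimal base R sketch that keys on year and month together, assuming the column is already of class Date (the vector `dates` is made-up sample data, not the asker's):

```r
# Hypothetical sample data: note the jump from March 2010 to March 2011
dates <- as.Date(c("2010-01-01", "2010-02-07", "2010-02-09",
                   "2010-03-09", "2011-03-15"))

# Label each row by year and month, e.g. "2010-02"
ym <- format(dates, "%Y-%m")

# Number each distinct year-month in order of first appearance
month_index <- match(ym, unique(ym))
month_index
```

This numbers the months that actually occur in the data; months with no observations are not counted.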

Related

R how to replace/gsub a vector of values by another vector of values in a datatable

I have data with dates in a format that is not directly usable. The data are either annual, quarterly or monthly. Annual values are stored correctly, quarterly ones are in the form 1Q2010, and monthly ones in the form JAN2010.
So something like
library(tidyverse)
library(data.table)
MWE <- data.table(date=c("JAN2020","FEB2020","1Q2020","2020"),
value=rnorm(4,2,1))
> MWE
date value
1: JAN2020 2.5886057
2: FEB2020 0.5913031
3: 1Q2020 1.6237973
4: 2020 1.4093762
I want to have them in a standard format. I think a decently readable way to do that is to replace the non-standard elements, so that these elements:
Date_Brute <- c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC","1Q","2Q","3Q","4Q")
are replaced by these ones:
Date_Standardisee <- c("01-01","01-02","01-03","01-04","01-05","01-06","01-07", "01-08","01-09","01-10","01-11","01-12","01-01","01-04","01-07","01-10")
Now I think gsub does not work with vectors. I have found this answer that suggests using stringr::str_replace_all, but I have not been able to make it work in a data.table.
I am open to other functions that replace one vector of values by another, but I would like to avoid, for instance, slicing the data and using dedicated date-parsing functions.
Desired output :
> MWE
date value
1: 01-01-2020 2.5886057
2: 01-02-2020 0.5913031
3: 01-01-2020 1.6237973
4: 2020 1.4093762
You can try lubridate::parse_date_time(), which takes a vector of candidate formats to attempt in the conversion:
library(lubridate)
library(data.table)
MWE[, date := parse_date_time(date, orders = c("bY","qY", "Y"))]
date value
1: 2020-01-01 -0.4948354
2: 2020-02-01 1.0227036
3: 2020-01-01 2.6285688
4: 2020-01-01 1.9158595
We can use grep with as.yearqtr and as.yearmon to convert the matching 'date' elements to Date class and then format them as specified.
library(zoo)
library(data.table)
MWE[grep('Q', date), date := format(as.Date(as.yearqtr(date,
'%qQ %Y')), '%d-%m-%Y')]
MWE[grep("[A-Z]", date), date := format(as.Date(as.yearmon(date)), '%d-%m-%Y')]
-output
MWE
# date value
#1: 01-01-2020 0.8931051
#2: 01-02-2020 2.9813625
#3: 01-01-2020 1.1918638
#4: 2020 2.8001267
Another option is fcoalesce with myd from lubridate:
library(lubridate)
MWE[, date := fcoalesce(format(myd(date, truncated = 2), '%d-%m-%Y'), date)]
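For the literal "replace one vector by another" approach the question asks about, a minimal base R sketch is to loop over the patterns and apply sub once per pattern. It reuses the two vectors from the question. Appending a trailing hyphen to each replacement is an assumption made here so that the result matches the desired 01-01-2020 form, and the names `date_raw`, `lookup` and `out` are illustrative:

```r
Date_Brute <- c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP",
                "OCT","NOV","DEC","1Q","2Q","3Q","4Q")
Date_Standardisee <- c("01-01","01-02","01-03","01-04","01-05","01-06",
                       "01-07","01-08","01-09","01-10","01-11","01-12",
                       "01-01","01-04","01-07","01-10")

# Named lookup: pattern -> replacement (trailing "-" added as an assumption)
lookup <- setNames(paste0(Date_Standardisee, "-"), Date_Brute)

date_raw <- c("JAN2020", "FEB2020", "1Q2020", "2020")

# Apply each replacement in turn; "^" anchors the match at the start
out <- date_raw
for (pat in names(lookup)) {
  out <- sub(paste0("^", pat), lookup[[pat]], out)
}
out
```

Inside a data.table the same loop could be run on the column first and the result assigned with `:=`. Annual values such as "2020" match no pattern and pass through unchanged.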

R convert yy-mm string to date format [duplicate]

I have a data frame (df) like the following:
Date Arrivals
2014-07 100
2014-08 150
2014-09 200
I know that I can convert the yearmon dates to the first date of each month as follows:
df$Date <- as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
However, given that my data is not available until the end of the month I want to index it to the end rather than the beginning, and I cannot figure it out. Any help appreciated.
If the Date variable is an actual yearmon class vector, from the zoo package, the as.Date.yearmon method can do what you want via its argument frac.
Using your data, and assuming that the Date was originally a character vector
library("zoo")
df <- data.frame(Date = c("2014-07", "2014-08", "2014-09"),
Arrivals = c(100, 150, 200))
I convert this to a yearmon vector:
df <- transform(df, Date2 = as.yearmon(Date))
Assuming this is what you have, then you can achieve what you want using as.Date() with frac = 1:
df <- transform(df, Date3 = as.Date(Date2, frac = 1))
which gives:
> df
Date Arrivals Date2 Date3
1 2014-07 100 Jul 2014 2014-07-31
2 2014-08 150 Aug 2014 2014-08-31
3 2014-09 200 Sep 2014 2014-09-30
That shows the individual steps. If you only want the final Date this is a one-liner
## assuming `Date` is a `yearmon` object
df <- transform(df, Date = as.Date(Date, frac = 1))
## or if not a `yearmon`
df <- transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
The argument frac is the fraction of the month to assign to the resulting dates when converting from yearmon objects to Date objects. Hence, to get the first day of the month, rather than converting to a character and pasting on "-01" as your question showed, it's better to coerce to a Date object with frac = 0.
If the Date in your df is not a yearmon class object, then you can solve your problem by converting it to one and then using the as.Date() method as described above.
Here is a way to do it using the zoo package.
R code:
library(zoo)
df
# Date Arrivals
# 1 2014-07 100
# 2 2014-08 150
# 3 2014-09 200
df$Date <- as.Date(as.yearmon(df$Date), frac = 1)
# output
# Date Arrivals
# 1 2014-07-31 100
# 2 2014-08-31 150
# 3 2014-09-30 200
Using lubridate, you can add a month and subtract a day to get the last day of the month:
library(lubridate)
ymd(paste0(df$Date, '-01')) + months(1) - days(1)
# [1] "2014-07-31" "2014-08-31" "2014-09-30"
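If neither zoo nor lubridate is available, the same "first of next month minus one day" idea can be sketched in base R. This assumes the input is a character vector in "YYYY-MM" form; `month_end` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: last day of the month for "YYYY-MM" strings
month_end <- function(ym) {
  first <- as.Date(paste0(ym, "-01"))
  # seq() by "month" steps from the 1st to the 1st of the following month
  next_first <- do.call(c, lapply(first, function(d) {
    seq(d, by = "month", length.out = 2)[2]
  }))
  next_first - 1  # one day earlier is the last day of the original month
}

month_end(c("2014-07", "2014-08", "2014-09", "2016-02"))
```

Because the calculation always starts from the first of the month, it also handles February in leap years.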

Calculate difference between timestamps

I try to find the difference between two timestamps.
The code:
survey <- data.frame(date=c("07/2012","07/2012"),tx_start=c("01/2012","01/2012"))
survey$date_diff <- as.Date(as.character(survey$date), format="%m/%Y")-
as.Date(as.character(survey$tx_start), format="%m/%Y")
survey
I expect the new column to contain the difference, but I get NA.
The results:
> survey
date tx_start date_diff
1 07/2012 01/2012 NA days
2 07/2012 01/2012 NA days
What should I use instead of as.Date for values that contain only months and years?
Update based on comment of Gregor:
> survey <- data.frame(date=c("07/2012","07/2012"),tx_start=c("01/2012","01/2012"))
> survey$date <- as.Date(paste0("01/", as.character(survey$date)), "%d/%m/%Y")
> survey$tx_start <- as.Date(paste0("01/", as.character(survey$tx_start)), "%d/%m/%Y")
> survey$date_diff <- as.Date(survey$date, format="%d/%m/%Y")-
+ as.Date(survey$tx_start, format="%d/%m/%Y")
> survey
date tx_start date_diff
1 2012-07-01 2012-01-01 182 days
2 2012-07-01 2012-01-01 182 days
I usually convert my dates to POSIXct format. Then, when direct differences are taken with normal syntax, you get an answer in units of seconds. There is a difftime() function in base R that you can use as well:
survey <- data.frame(date=c("07/2012","07/2012"),tx_start=c("01/2012","01/2012"))
# Dates are finicky, add a day so that conversion will work
survey$date2 <- paste0("01/",survey$date)
survey$tx_start2 <- paste0("01/",survey$tx_start)
# conversion
survey$date2 <- as.POSIXct(x=survey$date2,format="%d/%m/%Y")
survey$tx_start2 <- as.POSIXct(x=survey$tx_start2,format="%d/%m/%Y")
# take the difference
survey$date_diff <- with(survey,difftime(time1=date2,time2=tx_start2,units="hours"))
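The NA in the question arises because as.Date needs a day of the month, and "%m/%Y" does not supply one. If the difference is wanted in whole months rather than days or hours, a minimal base R sketch is to compare the year and month fields directly (the variable names below are illustrative):

```r
survey <- data.frame(date = c("07/2012", "07/2012"),
                     tx_start = c("01/2012", "01/2012"))

# Prepend a day so that the strings can be parsed as full dates
end   <- as.POSIXlt(as.Date(paste0("01/", survey$date), format = "%d/%m/%Y"))
start <- as.POSIXlt(as.Date(paste0("01/", survey$tx_start), format = "%d/%m/%Y"))

# POSIXlt stores years since 1900 and months as 0-11
survey$months_diff <- 12 * (end$year - start$year) + (end$mon - start$mon)
survey
```

This counts calendar-month boundaries crossed, which is usually what is meant when the inputs carry no day information.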

Change column Year in R

I have an Excel file whose date column runs from 1/1/15 to 12/31/15. I want to change every 15 (the year) to 14, so that the dates run from 1/1/14 to 12/31/14. How can I do that in R? Right now I am changing the dates manually with the replace function, but there are more than 150000 records.
If you don't want to convert to 'Date' class and want to keep the same format, one option is sub. Here we match a trailing 15 (the last two characters) and replace it with 14.
sub('15$', '14', v1)
#[1] "1/1/14" "12/31/14" "1/1/14"
data
v1 <- c('1/1/15', '12/31/15', '1/1/14')
You could use lubridate where you can just subtract 'x' number of years.
library(lubridate)
# some random 2015 dates
df <- data.frame(dates = mdy("01/13/2015", "02/25/2015"))
# subtract 1 year
df$dates <- with(df, dates - years(1))
df
dates
1 2014-01-13
2 2014-02-25
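A base R variant of the same idea, for when lubridate is not available, is sketched below. It assumes the dates arrive as month/day/two-digit-year strings as in the question; `shifted` is an illustrative name:

```r
v1 <- c("1/1/15", "12/31/15")

# Parse the strings, then edit the year field of the POSIXlt representation
d <- as.POSIXlt(as.Date(v1, format = "%m/%d/%y"))
d$year <- d$year - 1L

# Convert back to Date and format; note that %m and %d are zero-padded
shifted <- format(as.Date(d), "%m/%d/%y")
shifted
```

Unlike sub, this writes the month and day with leading zeros, and a 29 February would roll over to 1 March when the target year is not a leap year.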
