I would like to extract ONLY the quarter from a date, e.g., to get an integer 1 from the date "2003-02-08". I have been trying something along this line
library(zoo)
as.yearqtr(dat$DATE)
"2003 Q1"
as.character(as.yearqtr(dat$DATE))[1]
"2003 Q1"
which hasn't been giving my desired result. Of course I can write conditions as follows
library(data.table)
data$DATE = as.Date(data$DATE, format='%d%b%Y')
data$month=month(data$DATE)
setDT(data)[month==1, quarter:=1]
...
This will work, but is not elegant at all. Is there a more beautiful way of doing this?
Thank you lmo and user2100721! I really wish I could accept all of the answers!
There is a base R function, quarters, that more or less accomplishes what you want, though it prepends "Q". So
quarters(as.Date("2001-05-01"))
[1] "Q2"
If it is important to get rid of the "Q", you could use substr
substr(quarters(as.Date("2001-05-01")), 2, 2)
[1] "2"
Other date-related base R functions, such as weekdays and months, are documented on the same help page, ?quarters.
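Since the question asks for an integer, the substr() result still needs converting. A minimal base R sketch, using a hypothetical Date vector x, shows two options: wrapping the substr() call in as.integer(), or computing the quarter directly from the zero-based month field of POSIXlt.

```r
# Hypothetical example dates
x <- as.Date(c("2003-02-08", "2001-05-01", "2010-12-31"))

# Strip the leading "Q" from quarters() and convert to integer
q_from_quarters <- as.integer(substr(quarters(x), 2, 2))

# Alternative: POSIXlt stores months as 0-11, so integer division by 3 gives 0-3
q_from_month <- as.POSIXlt(x)$mon %/% 3L + 1L

q_from_quarters  # [1] 1 2 4
q_from_month     # [1] 1 2 4
```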
I would do:
library(data.table)
# example data
DT = data.table(id = 1:10, d = as.IDate("2003-02-08") + seq(100, by=50, length.out=10))
DT[, qtr := quarter(d)][]  # trailing [] prints the updated table
id d qtr
1: 1 2003-05-19 2
2: 2 2003-07-08 3
3: 3 2003-08-27 3
4: 4 2003-10-16 4
5: 5 2003-12-05 4
6: 6 2004-01-24 1
7: 7 2004-03-14 1
8: 8 2004-05-03 2
9: 9 2004-06-22 2
10: 10 2004-08-11 3
The quarter function is provided by data.table and works on both Date and IDate vectors. (IDate uses integer storage.)
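To illustrate that quarter() accepts both classes, a small sketch (the example dates are made up) calls it on a plain Date vector and on the same dates converted to IDate:

```r
library(data.table)

x <- as.Date(c("2003-02-08", "2003-08-27", "2003-11-30"))  # plain Date vector

quarter(x)            # Date input:  [1] 1 3 4
quarter(as.IDate(x))  # IDate input (integer storage), same quarters
```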
The lubridate package has a function of the same name, which can be used the same way. Using @Frank's DT:
DT[, qtr := lubridate::quarter(d)]
The dint package is also suitable for this:
library("dint")
d=as.Date("2015-01-01")
get_quarter(d)
You can find more about this package in its documentation.
My question concerns lagging data in R, where R should be aware of the time index. I hope this has not already been asked in another thread. Let's consider a simple setup:
df <- data.frame(date=as.Date(c("1990-01-01","1990-02-01","1990-01-15","1990-03-01","1990-05-01","1990-07-01","1993-01-02")), value=1:7)
This code generates the following data frame:
        date value
1 1990-01-01     1
2 1990-02-01     2
3 1990-01-15     3
4 1990-03-01     4
5 1990-05-01     5
6 1990-07-01     6
7 1993-01-02     7
My aim is to lag "value" by, e.g., one month while respecting the dates. For example, the lagged value for 1990-05-01 would come from 1990-04-01, which is not present in the data, so that row should get NA. When I use the standard lag function, R is not aware of the time index and simply uses the value 4 from 1990-03-01, which is not what I want. Does anyone have an idea what I could do here?
Thanks in advance! :)
All the best,
Leon
You can use lubridate's %m-% operator to get the date one month earlier and then look that date up with match(), like below:
library(lubridate)
transform(
df,
value_lag = value[match(date %m-% months(1), date)]
)
which gives
date value value_lag
1 1990-01-01 1 NA
2 1990-02-01 2 1
3 1990-01-15 3 NA
4 1990-03-01 4 2
5 1990-05-01 5 NA
6 1990-07-01 6 NA
7 1993-01-02 7 NA
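The reason to prefer %m-% over plain subtraction is month-end handling. A short sketch (1990-03-31 is a hypothetical date, not in df): subtracting months(1) from a date whose day does not exist in the previous month yields NA, whereas %m-% rolls back to the last valid day of that month.

```r
library(lubridate)

x <- as.Date("1990-03-31")

x - months(1)     # NA, because 1990-02-31 does not exist
x %m-% months(1)  # "1990-02-28", rolled back to the end of February
```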
For an example with multiple columns, let's consider:
df <- data.frame(date=as.Date(c("1990-01-01","1990-02-01","1990-01-15","1990-03-01","1990-05-01","1990-07-01","1993-01-02")), value=1:7, value2=7:13)
I recently found the following solution:
library(dplyr); library(lubridate)
lags <- 1  # lag length in months
df %>%
  as_tibble() %>%
  mutate(across(2:ncol(df), .fns = function(x) x[match(date %m-% months(lags), date)], .names = "{.col}_lag"))
Thanks to your code, @ThomasisCoding. :)
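A base R variant of the same idea, sketched under the assumption that the columns to lag are listed explicitly in a vector cols (a name introduced here for illustration): the row index of the previous month is computed once with match() and reused for every column.

```r
library(lubridate)

df <- data.frame(
  date = as.Date(c("1990-01-01", "1990-02-01", "1990-01-15", "1990-03-01",
                   "1990-05-01", "1990-07-01", "1993-01-02")),
  value = 1:7, value2 = 7:13
)

lags <- 1                    # lag length in months
cols <- c("value", "value2") # columns to lag (hypothetical helper vector)

# Row holding the date exactly `lags` months earlier (NA if absent)
prev <- match(df$date %m-% months(lags), df$date)

# Index every column with the same row positions
df[paste0(cols, "_lag")] <- lapply(df[cols], function(x) x[prev])
df
```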
I am working on a data frame that contains 2 columns as follows:
time frequency
2014-01-06 13
2014-01-07 30
2014-01-09 56
My issue is that I am interested in counting the days on which frequency is 0. The data is pulled using RPostgreSQL/RSQLite, so no date is returned unless there is a value (i.e. unless frequency is at least 1). If I want to count these dates that don't actually exist in the data frame, is there an easy way to do it? For example, for the date range 2014-01-01 to 2014-01-10, I would want a count of 7.
My only thought was to brute force create a separate dataframe with every date (note that this is 4+ years of dates which would be an immense undertaking) and then merging the two dataframes and counting the number of NA values. I'm sure there is a more elegant solution than what I've thought of.
Thanks!
Sort by date and then look for gaps.
start <- as.Date("2014-01-01")
time <- as.Date(c("2014-01-06", "2014-01-07","2014-01-09"))
end <- as.Date("2014-01-10")
time <- sort(unique(time))
# Include start and end dates, so the missing dates are 1/1-1/5, 1/8, 1/10
d <- c(time[1]- start,
diff(time) - 1,
end - time[length(time)] )
d # [1] 5 0 1 1
sum(d) # 7 missing days
And now for which days are missing...
(gaps <- data.frame(gap_starts = c(start,time+1)[d>0],
gap_length = d[d>0]))
# gap_starts gap_length
# 1 2014-01-01 5
# 2 2014-01-08 1
# 3 2014-01-10 1
for (g in 1:nrow(gaps)) {
  gap_start <- gaps$gap_starts[g]            # avoid overwriting `start` above
  gap_len <- as.numeric(gaps$gap_length[g])  # avoid masking base::length()
  for (i in 0:(gap_len - 1)) {
    print(gap_start + i)
  }
}
# [1] "2014-01-01"
# [1] "2014-01-02"
# [1] "2014-01-03"
# [1] "2014-01-04"
# [1] "2014-01-05"
# [1] "2014-01-08"
# [1] "2014-01-10"
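For comparison, the "brute force" idea from the question is cheap in practice: a daily sequence covering several years has only a few thousand elements. A base R sketch, using the same example start, end and time values, builds every day with seq() and keeps those not found in the data:

```r
start <- as.Date("2014-01-01")
end <- as.Date("2014-01-10")
time <- as.Date(c("2014-01-06", "2014-01-07", "2014-01-09"))

all_days <- seq(start, end, by = "day")        # every day in the range
missing_days <- all_days[!all_days %in% time]  # days with no row (frequency 0)

length(missing_days)  # 7
missing_days
```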
I want to count the observations "after 19:00", regardless of the date. What would be the quickest and most logical way?
Since I told R that the Date column is a date (with as.Date), I would like to tell R that Time is a time column and then just ask Time > "19:00:00", but this does not seem to be possible.
I tried as.POSIXct(Time, format = "%H:%M:%S"), but this adds today's date to my column, which creates annoying clutter and looks unprofessional.
I could use substr(as.character(Time),1,2) > 19 but that doesn't feel very elegant either.
Date Time
1 2014-01-01 17:16:48
2 2014-01-01 18:57:36
3 2014-01-01 19:40:48
4 2014-01-01 19:40:48
5 2014-01-01 20:09:36
6 2014-01-01 20:24:00
library(data.table)
## Convert (by reference) your data to a data.table
setDT(dat)
## hour() returns 19 for 19:40, so ">= 19" counts everything from 19:00 onwards
dat[, .N,
    by = list(after_1900 = hour(as.POSIXlt(Time, format = "%H:%M:%S")) >= 19)]
   after_1900 N
1:      FALSE 2
2:       TRUE 4
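On the wish to declare Time as a time column: data.table provides an ITime class for time of day without a date, which prints as HH:MM:SS and supports comparisons. A sketch, with the question's sample rows typed in by hand and Time assumed to be character:

```r
library(data.table)

# Sample rows from the question; Time is assumed to be character
dat <- data.table(
  Date = as.IDate("2014-01-01"),
  Time = c("17:16:48", "18:57:36", "19:40:48", "19:40:48", "20:09:36", "20:24:00")
)

dat[, Time := as.ITime(Time)]         # time-of-day class, no date attached
dat[Time > as.ITime("19:00:00"), .N]  # 4 observations after 19:00
```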
I have following data set:
>d
x date
1 1 1-3-2013
2 2 2-4-2010
3 3 2-5-2011
4 4 1-6-2012
I want:
> d
x date
1 1 31-12-2013
2 2 31-12-2010
3 3 31-12-2011
4 4 31-12-2012
That is, the last day of the last month of each date's year.
Please Help!
You can also use the ceiling_date function from the lubridate package.
Assuming date is a vector of class Date, you can do something like this:
library(lubridate)
last_date <- ceiling_date(date,"year") - days(1)
ceiling_date(date, "year") gives the first day of the next year; subtracting days(1) (or simply 1, for a Date) then gives the last day of the current year.
Hope this helps.
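A self-contained version of this approach on the question's data; the dates are first parsed with lubridate's dmy(), since they are day-month-year strings:

```r
library(lubridate)

d <- data.frame(x = 1:4, date = c("1-3-2013", "2-4-2010", "2-5-2011", "1-6-2012"))

# First day of the following year, minus one day = 31 December of the same year
d$date <- ceiling_date(dmy(d$date), "year") - days(1)
d
```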
Another option using the lubridate package:
## using d from Roland answer
transform(d,last =dmy(paste0('3112',year(dmy(date)))))
x date last
1 1 1-3-2013 2013-12-31
2 2 2-4-2010 2010-12-31
3 3 2-5-2011 2011-12-31
4 4 1-6-2012 2012-12-31
d <- read.table(text="x date
1 1 1-3-2013
2 2 2-4-2010
3 3 2-5-2011
4 4 1-6-2012", header=TRUE)
d$date <- as.Date(d$date, "%d-%m-%Y")
d$date <- as.POSIXlt(d$date)
d$date$mon <- 11
d$date$mday <- 31
d$date <- as.Date(d$date)
# x date
#1 1 2013-12-31
#2 2 2010-12-31
#3 3 2011-12-31
#4 4 2012-12-31
1) cut.Date Define cut_year to give the first day of the year. Adding 366 gets us to the next year and then applying cut_year again gets us to the first day of the next year. Finally subtract 1 to get the last day of the year. The code uses base functionality only.
cut_year <- function(x) as.Date(cut(as.Date(x), "year"))
transform(d, date = cut_year(cut_year(date) + 366) - 1)
2) format
transform(d, date = as.Date(format(as.Date(date), "%Y-12-31")))
3) zoo A "yearmon" class variable stores the date as a year plus 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec. Thus taking its floor and adding 11/12 gets one to Dec and as.Date.yearmon(..., frac = 1) uses the last of the month instead of the first.
library(zoo)
transform(d, date = as.Date(floor(as.yearmon(as.Date(date))) + 11 / 12, frac = 1))
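Note: These solutions assume date is either of "Date" class or a character string in the default "%Y-%m-%d" format; the day-month-year strings in the question must be converted first, e.g. with d$date <- as.Date(d$date, "%d-%m-%Y"). The inner as.Date in cut_year and in the other two solutions can be omitted if it is known that date is already of "Date" class.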