I have an R script that I run monthly. I'd like to subset my data frame to only show data within a 6 month time period, but each month I'd like the time period to move forward one month.
Original data frame from Sept.:
ID Name Date
1 John 1/1/2020
2 Adam 5/2/2020
3 Kate 9/30/2020
4 Jill 10/15/2020
After subsetting for only dates from May 1, 2020 - Sept. 30, 2020:
ID Name Date
2 Adam 5/2/2020
3 Kate 9/30/2020
The next month when I run my script, I'd like the dates it's subsetting to move forward by one month, so June 1, 2020 - Oct. 31, 2020:
ID Name Date
3 Kate 9/30/2020
4 Jill 10/15/2020
Right now, I'm changing this part of my script manually each month, i.e.:
df <- subset(df, Date >= '2020-05-01' & Date <= '2020-09-30')
Is there a way to make this automatic, so that I don't have to manually move forward the date one month every time?
We can use between() after converting the 'Date' column to the Date class:
library(dplyr)
library(lubridate)
start <- as.Date("2020-05-01")
end <- as.Date("2020-09-30")
df1 %>%
  mutate(Date = mdy(Date)) %>%
  filter(between(Date, start, end))
# ID Name Date
#1 2 Adam 2020-05-02
#2 3 Kate 2020-09-30
In the next month, we can move 'start' and 'end' forward by adding one month:
start <- start %m+% months(1)
end <- ceiling_date(end %m+% months(1), 'months') - days(1)
start
#[1] "2020-06-01"
end
#[1] "2020-10-31"
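To avoid editing 'start' and 'end' by hand at all, both can be derived from the run date. Here is a minimal base R sketch, assuming the window should end on the last day of the month in which the script runs and span five calendar months (as in the May 1 - Sept. 30 example). month_window() and its arguments are hypothetical names, not part of any package.

```r
# Hypothetical helper (not from any package): returns the first day of the
# month (n_months - 1) months before `ref` and the last day of ref's month.
month_window <- function(ref = Sys.Date(), n_months = 5) {
  first_of_ref <- as.Date(format(ref, "%Y-%m-01"))
  # Step back month by month from the first of the reference month
  start <- seq(first_of_ref, by = "-1 month", length.out = n_months)[n_months]
  # First day of the following month, minus one day
  end <- seq(first_of_ref, by = "1 month", length.out = 2)[2] - 1
  list(start = start, end = end)
}

# Sample data from the question, with Date converted to the Date class
df <- data.frame(ID = 1:4,
                 Name = c("John", "Adam", "Kate", "Jill"),
                 Date = as.Date(c("1/1/2020", "5/2/2020", "9/30/2020", "10/15/2020"),
                                format = "%m/%d/%Y"),
                 stringsAsFactors = FALSE)

# Fixed run dates are used here for reproducibility;
# call month_window() with no arguments to use Sys.Date()
w <- month_window(as.Date("2020-09-15"))
sept <- df[df$Date >= w$start & df$Date <= w$end, ]

w2 <- month_window(as.Date("2020-10-15"))
oct <- df[df$Date >= w2$start & df$Date <= w2$end, ]
```

If the window should be a full six months, n_months = 6 would be the value to pass.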
Using base R, with no package dependencies.
Data:
dt <- read.table(text = 'ID Name Date
1 John 1/1/2020
2 Adam 3/2/2021
3 Kate 12/30/2020
4 Jill 5/15/2021', header = TRUE, stringsAsFactors = FALSE)
Code:
date_format <- "%m/%d/%Y"
dt$Date <- as.Date(dt$Date, format = date_format)
today <- Sys.Date()
# First day of the current month
start <- as.Date(format(today, "%Y-%m-01"))
# Last day of the month six months ahead: first day of the month after it, minus one day
# (pasting "31" as the day gives NA for months with fewer than 31 days)
end <- seq(start, by = "month", length.out = 8)[8] - 1
dt[with(dt, Date >= start & Date <= end), ]
# The rows returned depend on Sys.Date(); for a run in November or December 2020:
# ID Name Date
# 2 2 Adam 2021-03-02
# 3 3 Kate 2020-12-30
# 4 4 Jill 2021-05-15
This is a very simple solution:
library(lubridate)
t <- today() # automatic: use the current date
# t <- as.Date('2020-11-26') # manual override (you can change it as you like)
start <- floor_date(t %m-% months(6), unit="months")
end <- floor_date(t %m-% months(1), unit="months")-1
subset(df, Date >= start & Date <= end) # assumes df$Date is of class Date
I have the following df with the Date column having hourly marks for an entire year:
Date TD RN D.RN Press Temp G.Temp. Rad
1 2018-01-01 00:00:00 154.0535 9.035156 1.416667 950.7833 7.000000 60.16667 11.27000
2 2018-01-01 01:00:00 154.5793 9.663900 1.896667 951.2000 6.766667 59.16667 11.23000
3 2018-01-01 01:59:59 154.5793 7.523438 2.591667 951.0000 6.066667 65.16667 11.23500
4 2018-01-01 02:59:59 154.0535 7.994792 2.993333 951.1833 5.733333 64.00000 11.16833
5 2018-01-01 03:59:59 154.4041 6.797526 3.150000 951.4833 5.766667 57.83333 11.13500
6 2018-01-01 04:59:59 155.1051 12.009766 3.823333 951.0833 5.216667 61.33333 11.22167
I want to add a factor column 'Quarters' that indicates each quarter according to the 'Date'.
As far as I understand I can do that by:
Radiation$Quarter<-cut(Radiation$Date, breaks = "quarters", labels = c("Q1", "Q2", "Q3", "Q4"))
But I also want to add a factor column 'Day/Night' which indicates whether it's day or night, having:
Day → 8am - 8pm
Night → 8pm - 8am
It seems like with the cut() function there's no way to indicate time ranges.
You can use an ifelse/case_when statement after extracting the hour from the timestamp.
library(dplyr)
library(lubridate)
df %>%
mutate(hour = hour(Date),
label = case_when(hour >= 8 & hour <= 19 ~ 'Day',
TRUE ~ 'Night'))
In base R:
df$hour <- as.integer(format(df$Date, '%H'))
df <- transform(df, label = ifelse(hour >= 8 & hour <= 19, 'Day', 'Night'))
We can also do
library(dplyr)
library(lubridate)
df %>%
mutate(hour = hour(Date),
label = case_when(between(hour, 8, 19) ~ "Day", TRUE ~ "Night"))
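Both columns from the question can also be built together without any packages. This is a small sketch on made-up timestamps (the four rows below are illustrative, not the asker's data), assuming 8:00 counts as day and 20:00 as night, which matches the hour 8 to 19 rule used in the answers above.

```r
# Illustrative timestamps around the day/night boundaries (not the original data)
Radiation <- data.frame(
  Date = as.POSIXct(c("2018-01-01 07:59:59", "2018-01-01 08:00:00",
                      "2018-07-15 19:59:59", "2018-07-15 20:00:00"),
                    tz = "UTC")
)

# Hour of day as an integer, 0-23
hour <- as.integer(format(Radiation$Date, "%H"))

# Day covers 08:00:00 to 19:59:59; everything else is Night
Radiation$DayNight <- factor(ifelse(hour >= 8 & hour < 20, "Day", "Night"),
                             levels = c("Day", "Night"))

# quarters() returns "Q1".."Q4" for Date and POSIXt objects
Radiation$Quarter <- factor(quarters(Radiation$Date), levels = paste0("Q", 1:4))
```

Using quarters() instead of cut() keeps the labels correct even when the data does not start in the first quarter of a year.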
I would like to know whether there is a way to transform dates like this:
"2016-01-8" into "20160101q", which means the first half of January 2016, or
"20160127" into "20160102q", which means the second half of January 2016, for example. Thank you in advance.
Here is a solution making use of the data.table and lubridate packages.
It uses the lubridate::days_in_month() function to determine the number of days in the month of the date. This is necessary because February (normally) has 28 days, so day 15 of February --> 02q, but January has 31 days, so day 15 of January --> 01q.
The logic for calculating the q-period is:
If day_number / number_of_days_in_month > 0.5 --> q_period = 02q,
else q_period --> 01q.
Then a paste0 command is used to create the text for the q_date column. sprintf() is used to add a leading zero for single-digit month numbers.
library(data.table)
library(lubridate)
#sample data
data <- data.table( date = as.Date( c("2019-12-30", "2020-01-15", "2020-02-15", "2020-02-14") ) )
# date
# 1: 2019-12-30
# 2: 2020-01-15
# 3: 2020-02-15
# 4: 2020-02-14
#if the day / #days of month > 0.5, date is in q2, else q1
data[ lubridate::mday(date) / lubridate::days_in_month(date) > 0.5,
q_date := paste0( lubridate::year(date), sprintf( "%02d", lubridate::month(date) ), "02q" ) ]
data[ is.na( q_date ),
q_date := paste0( lubridate::year(date), sprintf( "%02d", lubridate::month(date) ), "01q" ) ]
# date q_date
# 1: 2019-12-30 20191202q
# 2: 2020-01-15 20200101q
# 3: 2020-02-15 20200202q
# 4: 2020-02-14 20200201q
You can try with mutate and paste0. First decompose the date into day, month and year, then create a variable that says whether we are in the first or second half of the month, and finally paste together the year, the zero-padded month and the variable containing "01q" or "02q", depending on the period.
date <- c("2016-01-8",
          "2016-01-27")
id <- c(1, 2)
x <- data.frame(id, date)
library(tidyverse)
library(lubridate)
x <- x %>%
  mutate(date = ymd(date),
         year = year(date), month = month(date), day = day(date))
x$half <- "01q"
x$half[x$day > 15] <- "02q" # index with the column x$day, not the function day()
paste0(x$year, sprintf("%02d", x$month), x$half) # sprintf() zero-pads the month
# [1] "20160101q" "20160102q"
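The same fixed-cutoff idea (day 15 or earlier is the first half) can be written in base R alone. The sketch below is only one way to do it; half_month_code() and its cutoff argument are hypothetical names introduced for illustration.

```r
# Hypothetical helper: "YYYYMM01q" for days up to `cutoff`, "YYYYMM02q" after it
half_month_code <- function(d, cutoff = 15) {
  d <- as.Date(d, format = "%Y-%m-%d")      # also accepts "2016-01-8"
  day <- as.integer(format(d, "%d"))        # day of month, 1-31
  paste0(format(d, "%Y%m"),                 # year and zero-padded month
         ifelse(day > cutoff, "02q", "01q"))
}

codes <- half_month_code(c("2016-01-8", "2016-01-27"))
```

Unlike the days_in_month() approach, this treats the 15th as first half in every month, so the two methods disagree for the 15th of February.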
I have a data frame (df) like the following:
Date Arrivals
2014-07 100
2014-08 150
2014-09 200
I know that I can convert the yearmon dates to the first date of each month as follows:
df$Date <- as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
However, given that my data is not available until the end of the month I want to index it to the end rather than the beginning, and I cannot figure it out. Any help appreciated.
If the Date variable is an actual yearmon class vector, from the zoo package, the as.Date.yearmon method can do what you want via its argument frac.
Using your data, and assuming that the Date was originally a character vector
library("zoo")
df <- data.frame(Date = c("2014-07", "2014-08", "2014-09"),
Arrivals = c(100, 150, 200))
I convert this to a yearmon vector:
df <- transform(df, Date2 = as.yearmon(Date))
Assuming this is what you have, then you can achieve what you want using as.Date() with frac = 1:
df <- transform(df, Date3 = as.Date(Date2, frac = 1))
which gives:
> df
Date Arrivals Date2 Date3
1 2014-07 100 Jul 2014 2014-07-31
2 2014-08 150 Aug 2014 2014-08-31
3 2014-09 200 Sep 2014 2014-09-30
That shows the individual steps. If you only want the final Date this is a one-liner
## assuming `Date` is a `yearmon` object
df <- transform(df, Date = as.Date(Date, frac = 1))
## or if not a `yearmon`
df <- transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
The argument frac is the fraction of the month to assign to the resulting dates when converting from yearmon objects to Date objects. Hence, to get the first day of the month, rather than converting to a character and pasting on "-01" as your question showed, it's better to coerce to a Date object with frac = 0.
If the Date in your df is not a yearmon class object, then you can solve your problem by converting it to one and then using the as.Date() method as described above.
Here is a way to do it using the zoo package.
R code:
library(zoo)
df
# Date Arrivals
# 1 2014-07 100
# 2 2014-08 150
# 3 2014-09 200
df$Date <- as.Date(as.yearmon(df$Date), frac = 1)
# output
# Date Arrivals
# 1 2014-07-31 100
# 2 2014-08-31 150
# 3 2014-09-30 200
Using lubridate, you can add a month and subtract a day to get the last day of the month:
library(lubridate)
ymd(paste0(df$Date, '-01')) + months(1) - days(1)
# [1] "2014-07-31" "2014-08-31" "2014-09-30"
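If neither zoo nor lubridate is available, the same result can be reached in base R. This is a sketch under the assumption that the input is a character vector in "YYYY-MM" form; month_end() is a hypothetical helper name.

```r
# Hypothetical helper: last day of the month for "YYYY-MM" strings
month_end <- function(ym) {
  first <- as.Date(paste0(ym, "-01"))
  # first + 31 days always falls in the following month (on day 1 to 4),
  # so truncating it to day 01 gives the first of the next month
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))
  next_first - 1
}

ends <- month_end(c("2014-07", "2014-08", "2014-09", "2016-02"))
```

The "+ 31" trick is vectorised, which avoids calling seq() once per element.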
I have the following data frame representing user subscriptions:
User StartDate EndDate
1 2015-09-03 2015-10-17
2 2015-10-27 2015-12-25
...
How can I transform it into a time series that gives me the count of active monthly subscriptions over time (assuming it is active in the month if at least for one day in that month). Something like this (based on the example above, assuming only 2 records):
Month Count
2015-08 0
2015-09 1
2015-10 2
2015-11 1
2015-12 1
2016-01 0
Note: I took some arbitrary start and end dates for the time series, to make the example clear.
Prepare the data and make sure that the date columns are actually stored as dates:
data <- read.table(text = "User StartDate EndDate
1 2015-09-03 2015-10-17
2 2015-10-27 2015-12-25", header = TRUE)
data$StartDate <- as.Date(data$StartDate)
data$EndDate <- as.Date(data$EndDate)
This function returns a vector with all months that fall within a subscription:
library(lubridate)
subscr_month <- function(start, end) {
start <- floor_date(start, "month")
seq <- seq(start, end, by = "1 month")
months <- format(seq, format = "%Y-%m")
return(months)
}
It uses the function floor_date() from the lubridate package. It is necessary to round the start date down to the first of the month, because otherwise the last month might be missing. For example, for user 2, if you add two months to the start date, you end up on 2015-12-27, which is after the end date, so no date from December would be included in seq. The format() call converts the Dates to character strings that only include the year and month.
Now, you can apply this function to each start and end date from your data using mapply(). Afterwards, table() creates a table of counts of all dates in the resulting list:
all_month <- mapply(subscr_month, data$StartDate, data$EndDate, SIMPLIFY = FALSE)
table(unlist(all_month))
## 2015-09 2015-10 2015-11 2015-12
## 1 2 1 1
You can also convert the table to a data frame:
as.data.frame(table(unlist(all_month)))
## Var1 Freq
## 1 2015-09 1
## 2 2015-10 2
## 3 2015-11 1
## 4 2015-12 1
Your example output also includes the counts for months that do not appear in the data set. If you want to have this, you can convert the vector of months to a factor and set the levels to all the months you want to include:
month_list <- format(seq(as.Date("2015-08-01"), as.Date("2016-01-01"), by = "1 month"), format = "%Y-%m")
all_month_factor <- factor(unlist(all_month), levels = month_list)
table(all_month_factor)
## all_month_factor
## 2015-08 2015-09 2015-10 2015-11 2015-12 2016-01
## 0 1 2 1 1 0
Read in the data frame mentioned:
df = structure(list(StartDate = structure(c(16681, 16735), class = "Date"),
EndDate = structure(c(16735, 16794), class = "Date")), class = "data.frame", .Names = c("StartDate",
"EndDate"), row.names = c(NA, -2L))
We could make good use of do() from the dplyr package together with seq():
library(dplyr)
df %>%
  rowwise() %>% do({
    w <- seq(.$StartDate, .$EndDate, by = "15 days") # 15-day steps, for subscriptions shorter than one complete month
    m <- format(w, "%Y-%m") %>% unique
    data.frame(Month = m)
  }) %>%
  group_by(Month) %>%
  summarise(Count = n())
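For comparison, here is a package-free sketch of the same count that returns the Month/Count layout from the question, including the months with zero subscriptions. The reporting range (2015-08 to 2016-01) is the arbitrary one from the question, and active_months() is a hypothetical helper name.

```r
subs <- data.frame(
  User = 1:2,
  StartDate = as.Date(c("2015-09-03", "2015-10-27")),
  EndDate = as.Date(c("2015-10-17", "2015-12-25"))
)

# Hypothetical helper: "YYYY-MM" labels of every month touched by one subscription
active_months <- function(start, end) {
  first <- as.Date(format(start, "%Y-%m-01"))  # floor to the first of the month
  format(seq(first, end, by = "month"), "%Y-%m")
}

# One character vector of months per subscription
per_user <- lapply(seq_len(nrow(subs)), function(i) {
  active_months(subs$StartDate[i], subs$EndDate[i])
})

# Fixing the factor levels keeps months without any subscription as zero counts
all_months <- format(seq(as.Date("2015-08-01"), as.Date("2016-01-01"), by = "month"),
                     "%Y-%m")
counts <- table(factor(unlist(per_user), levels = all_months))
result <- data.frame(Month = names(counts), Count = as.integer(counts))
```

Stepping from the first of the start month in whole months means a subscription that only touches the first days of its last month is still counted there.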
Here is an example of a subset data in .csv files. There are three columns with no header. The first column represents the date/time and the second column is load [kw] and the third column is 1= weekday, 0 = weekends/ holiday.
9/9/2010 3:00 153.94 1
9/9/2010 3:15 148.46 1
I would like to write R code that selects the first and second columns for times between 10:00 and 20:00 on all weekdays (when the third column is 1) within the month of September, and I do not know the best and most efficient way to code it.
dt <- read.csv("file", header = F, sep=",")
#Select a column with weekday designation = 1, weekend or holiday = 0
y <- data.frame(dt[,3])
#Select a column with timestamps and loads
x <- data.frame(dt[,1:2])
t <- data.frame(dt[,1])
#convert timestamps into readable format
s <- strptime("9/1/2010 0:00", format="%m/%d/%Y %H:%M")
e <- strptime("9/30/2010 23:45", format="%m/%d/%Y %H:%M")
range <- seq(s,e, by = "min")
df <- data.frame(range)
The OP asks for the "best and efficient way to code" this without showing any "inefficient code", so @Justin is right.
It seems that the OP is new to R (and it's officially the summer of love), so I'll give it a try. Here is a solution (not sure about efficiency...):
index <- c("9/9/2010 19:00", "9/9/2010 21:15", "10/9/2010 11:00", "3/10/2010 10:30")
index <- as.POSIXct(index, format = "%d/%m/%Y %H:%M")
set.seed(1)
Data <- data.frame(Date = index, load = rnorm(4, mean = 120, sd = 10), weeks = c(0, 1, 1, 1))
## Data
## Date load weeks
## 1 2010-09-09 19:00:00 113.74 0
## 2 2010-09-09 21:15:00 121.84 1
## 3 2010-09-10 11:00:00 111.64 1
## 4 2010-10-03 10:30:00 135.95 1
cond <- expression(format(Date, "%H:%M") < "20:00" &
format(Date, "%H:%M") > "10:00" &
weeks == 1 &
format(Date, "%m") == "09")
subset(Data, eval(cond))
## Date load weeks
## 3 2010-09-10 11:00:00 111.64 1
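Applied to the file layout from the question (no header, month/day/year timestamps, weekday flag in the third column), the same idea might look like the sketch below. The four rows are made up for illustration, the column names are my own, and the 10:00 and 20:00 boundaries are assumed to be inclusive.

```r
# Made-up rows in the question's layout: timestamp, load [kW], weekday flag
dt <- read.csv(text = "9/9/2010 3:00,153.94,1
9/9/2010 10:15,148.46,1
9/11/2010 12:00,150.10,0
10/1/2010 12:00,151.00,1",
               header = FALSE,
               col.names = c("timestamp", "load_kw", "weekday"),
               stringsAsFactors = FALSE)

# Parse month/day/year timestamps; a fixed time zone keeps format() predictable
dt$timestamp <- as.POSIXct(dt$timestamp, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Zero-padded "HH:MM" strings compare correctly as text
hhmm <- format(dt$timestamp, "%H:%M")
keep <- dt$weekday == 1 &
  format(dt$timestamp, "%Y-%m") == "2010-09" &
  hhmm >= "10:00" & hhmm <= "20:00"

# First and second columns only, as requested
sel <- dt[keep, c("timestamp", "load_kw")]
```

Building the logical vector once and indexing with it avoids the expression()/eval() step and works the same way on a full month of 15-minute readings.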