R - Create a new field by applying a condition on a date field

I am new to R. I am working on Windows 10 with RStudio and R version 3.5.0.
I have a table with one field in date-time format:
2012-02-02 10:04:00
2012-08-13 11:38:00
2012-07-13 14:00:00
2012-09-26 08:45:00
2012-10-24 05:39:00
2012-02-03 03:33:00
2012-05-02 06:30:00
2012-06-27 09:00:00
2012-07-09 10:16:00
2012-11-22 13:13:00
I need to create a new field that splits the data between summer and winter:
From May to September would be Summer and from October to April would be Winter. Based on the result of this new field, create another one that separates the data between times of the day: morning, noon, afternoon and night for summer and the same for winter. The conditions would be:
For Summer
* Morning Summer: 5 am – 10 am
* Noon Summer: 10 am -12 pm
* Afternoon Summer: 12 pm -8 pm
* Night summer 8 pm – 5 am
For Winter
* Morning Winter: 7 am – 11 am
* Noon Winter: 11 am -12 pm
* Afternoon Winter: 12 pm -4 pm
* Night Winter 4 pm – 7 am
The result would be something like this:
date | season | time Of Day
'2012-02-02 10:04:00' | winter | morning
'2012-08-13 11:38:00' | summer | noon
'2012-07-13 14:00:00' | summer | afternoon
'2012-09-26 08:45:00' | summer | morning
'2012-10-24 05:39:00' | winter | night
'2012-02-03 03:33:00' | winter | night
'2012-05-02 06:30:00' | summer | morning
'2012-12-27 09:00:00' | winter | morning
'2012-07-09 10:16:00' | summer | morning
'2012-11-22 13:13:00' | winter | afternoon
For the first case, (split between summer and winter) I tried to use case_when, but it did not work:
df %>%
mutate(season = case_when(
month(.$date) > 4 & month(.$date)< 10 ~ "summer",
month(.$date) < 5 & month(.$date) > 10 ~ "winter"
))
Error in mutate_impl(.data, dots) :
Evaluation error: do not know how to convert 'x' to class “POSIXlt”.
I tried to find something about the error, but to be honest I did not understand how to solve the problem. I also tried the "lubridate" library, but it still doesn't work.
Any idea of how to do it?

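library(dplyr)
library(lubridate)
# Convert character columns to POSIXct, then label the season by month
df %>% mutate_if(is.character, as.POSIXct) %>%
  mutate(season = case_when(
    month(date) > 4 & month(date) < 10 ~ "summer",  # May to September
    month(date) < 5 | month(date) > 9 ~ "winter"    # October to April
  ))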
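The second part of the question (time of day) could be handled along the same lines. This is only a minimal sketch: it assumes the listed ranges are half-open (start inclusive, end exclusive), which the question does not specify, and the helper column `h` and the small sample data frame are there purely for illustration.

```r
library(dplyr)
library(lubridate)

# Small illustrative sample; tz = "UTC" avoids daylight-saving surprises
df <- data.frame(
  date = as.POSIXct(c("2012-02-02 10:04:00", "2012-08-13 11:38:00",
                      "2012-07-13 14:00:00", "2012-10-24 05:39:00"), tz = "UTC")
)

out <- df %>%
  mutate(
    season = if_else(month(date) %in% 5:9, "summer", "winter"),
    h = hour(date) + minute(date) / 60,  # hour of day as a decimal number
    time_of_day = case_when(
      season == "summer" & h >= 5  & h < 10 ~ "morning",
      season == "summer" & h >= 10 & h < 12 ~ "noon",
      season == "summer" & h >= 12 & h < 20 ~ "afternoon",
      season == "summer"                    ~ "night",
      h >= 7  & h < 11                      ~ "morning",   # winter from here on
      h >= 11 & h < 12                      ~ "noon",
      h >= 12 & h < 16                      ~ "afternoon",
      TRUE                                  ~ "night"
    )
  ) %>%
  select(-h)

out
```

Because `case_when` takes the first matching condition, the summer rules have to come before the winter ones, and the two "night" rules act as the catch-all for each season.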
Data
df <- read.table(text="
date
'2012-02-02 10:04:00'
'2012-08-13 11:38:00'
'2012-07-13 14:00:00'
'2012-09-26 08:45:00'
", header = TRUE, stringsAsFactors = FALSE)

Related

get start date of week from week number and year only

I have a dataset where the data is reported by week and year in the form YYWW. I have split it into two columns: Year and Week.
I need to get a date from the week: Week_start_date. My weeks start on Mondays, so I would like to get the Monday and Sunday dates for each week.
ID | YYWW | year | week | Week_start_date | Week_end_date
1 | 1504 | 2015 | 04 | ? | ?
2 | 1651 | 2016 | 51 | ? | ?
3 | 1251 | 2012 | 51 | ? | ?
4 | 1447 | 2014 | 47 | ? | ?
How do I extract the week start date from just a week number and year?
I've looked at several threads on SO, but ran into problems using their solutions. Most searches for "convert week number and year to date" on Google and SO return the opposite: getting a week number from a date. This thread, answered by Vince, may cover a similar issue, but I can't get the code to do the job: https://communities.sas.com/t5/SAS-Programming/Converting-week-number-to-start-date/td-p/106456
Use INTNX() with the WEEK interval and increment from the first of the year.
WEEK intervals start on Sunday by default, so add 1 to shift the beginning and end of the interval to Monday and Sunday.
You may need to tweak this to match the dates you need.
data have;
infile cards dlm='09'x;
input ID $ YYWW year week ;
format year 8. week z2.;
cards;
1 1504 2015 04
2 1651 2016 51
3 1251 2012 51
4 1447 2014 47
;;;;
data want;
set have;
week_start = intnx('week', mdy(1, 1, year), week, 'b')+1;
week_end = intnx('week', mdy(1, 1, year), week, 'e')+1;
format week_: date9.;
run;
Use one of the WEEK... informats. But you will need to insert the letter W between the YEAR and WEEK number.
data have;
input ID $ YYWW year week ;
cards;
1 1504 2015 04
2 1651 2016 51
3 1251 2012 51
4 1447 2014 47
;;;;
data want;
set have;
week_start=input(cats(year,'W',put(week,Z2.)),weekv.);
week_end=week_start+6;
format week_: yymmdd10.;
run;
Results
Obs ID YYWW year week week_start week_end
1 1 1504 2015 4 2015-01-19 2015-01-25
2 2 1651 2016 51 2016-12-19 2016-12-25
3 3 1251 2012 51 2012-12-17 2012-12-23
4 4 1447 2014 47 2014-11-17 2014-11-23
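For anyone wanting to cross-check the WEEKV result outside SAS, here is a rough base R sketch of the same ISO 8601 arithmetic (week 1 is the week containing 4 January, and weeks start on Monday). The function name `iso_week_start` is made up for this example.

```r
# Hypothetical helper: Monday of a given ISO week
iso_week_start <- function(year, week) {
  jan4 <- as.Date(sprintf("%d-01-04", as.integer(year)))
  wd <- as.integer(format(jan4, "%u"))  # ISO weekday: 1 = Monday ... 7 = Sunday
  # Step back to the Monday of week 1, then forward by (week - 1) whole weeks
  jan4 - (wd - 1) + 7 * (as.integer(week) - 1)
}

year <- c(2015, 2016, 2012, 2014)
week <- c(4, 51, 51, 47)

week_start <- iso_week_start(year, week)
week_end   <- week_start + 6  # Sunday of the same week

data.frame(year, week, week_start, week_end)
```

For the four rows in the question this gives the same start and end dates as the WEEKV results above.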

R function to select specific week days [duplicate]

I am working on a project and would be happy about your help.
I am working with stocks and the effect of weekdays on performance. Is there a way to take all the values (for instance the S&P 500) in a data frame (df) that fall on a specific weekday (e.g. Tuesday) and put them into a new column of a different data frame (df2)?
Thank you very much,
Ferdinand
df <- read.csv("AAPL.csv") # from Yahoo! Finance
> head(df)
Date Open High Low Close Adj.Close Volume
1 2019-07-10 201.85 203.73 201.56 203.23 200.8332 17897100
2 2019-07-11 203.31 204.39 201.71 201.75 199.3706 20191800
3 2019-07-12 202.45 204.00 202.20 203.30 200.9023 17595200
4 2019-07-15 204.09 205.87 204.00 205.21 202.7898 16947400
5 2019-07-16 204.59 206.11 203.50 204.50 202.0882 16866800
6 2019-07-17 204.05 205.09 203.27 203.35 200.9517 14107500
df$Day <- format(as.Date(df$Date), "%A") # Get the day
idx <- df$Day == "Tuesday" # Where are the Tuesdays ?
df2 <- df[idx, ] # Logical indexing
> head(df2)
Date Open High Low Close Adj.Close Volume Day
5 2019-07-16 204.59 206.11 203.50 204.50 202.0882 16866800 Tuesday
10 2019-07-23 208.46 208.91 207.29 208.84 206.3770 18355200 Tuesday
15 2019-07-30 208.76 210.16 207.31 208.78 206.3177 33935700 Tuesday
20 2019-08-06 196.31 198.07 194.04 197.00 194.6766 35824800 Tuesday
25 2019-08-13 201.02 212.14 200.48 208.97 207.2901 47218500 Tuesday
30 2019-08-20 210.88 213.35 210.32 210.36 208.6689 26884300 Tuesday
Your function :
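myfunction <- function(mydf) {
  # Work on the argument, not on the global df
  mydf$Day <- format(as.Date(mydf$Date), "%A")  # weekday name (locale-dependent)
  idx <- mydf$Day == "Tuesday"
  mydf[idx, ]  # return only the Tuesday rows
}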
Testing myfunction :
> out = myfunction(df)
> head(out)
Date Open High Low Close Adj.Close Volume Day
5 2019-07-16 204.59 206.11 203.50 204.50 202.0882 16866800 Tuesday
10 2019-07-23 208.46 208.91 207.29 208.84 206.3770 18355200 Tuesday
15 2019-07-30 208.76 210.16 207.31 208.78 206.3177 33935700 Tuesday
20 2019-08-06 196.31 198.07 194.04 197.00 194.6766 35824800 Tuesday
25 2019-08-13 201.02 212.14 200.48 208.97 207.2901 47218500 Tuesday
30 2019-08-20 210.88 213.35 210.32 210.36 208.6689 26884300 Tuesday
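One caveat: `format(..., "%A")` returns weekday names in the current locale, so comparing against "Tuesday" only works in an English locale. A locale-independent variant could look like the sketch below, which also puts the selected values into a new column of a second data frame as asked in the question. The column name `TuesdayClose` and the four sample rows are assumptions made for the example.

```r
# Small illustrative sample taken from the rows shown above
df <- data.frame(
  Date  = c("2019-07-15", "2019-07-16", "2019-07-17", "2019-07-23"),
  Close = c(205.21, 204.50, 203.35, 208.84),
  stringsAsFactors = FALSE
)

# POSIXlt stores the weekday as a number: 0 = Sunday, 1 = Monday, 2 = Tuesday, ...
is_tuesday <- as.POSIXlt(as.Date(df$Date))$wday == 2

# New data frame with the Tuesday values in their own column
df2 <- data.frame(
  Date         = df$Date[is_tuesday],
  TuesdayClose = df$Close[is_tuesday],
  stringsAsFactors = FALSE
)

df2
```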

Dividing data based on custom date range

I have a time series which spans multiple years and want to divide it into four categories based on date (ie, 15 April - 10 May, 11 May - 10 July, and so on). My first thought was to use lubridate to define each time period with interval() and then use %within% to determine whether an event occurs within it or not.
df
id datetime
1 HAR10 2019-06-26 04:35:06
2 HAR05 2019-08-05 19:15:00
3 HAR07 2018-07-26 01:01:00
4 HAR07 2018-07-24 23:36:00
5 HAR05 2019-08-27 18:59:43
6 HAR05 2019-07-12 03:33:00
7 HAR07 2018-08-09 16:21:00
8 HAR07 2019-05-01 00:04:28
9 HAR04 2019-07-01 05:25:00
10 HAR07 2018-07-18 15:17:00
perA <- interval(ymd(20190511), ymd(20190710))
df %within% perA
I immediately ran into a problem with year, since I want to get all events from, say, April - May, regardless of what year they occurred, but interval is year-specific so the interval defined above works for my 2019 data but not my 2018 data. I could define a new set of intervals for each year, but that seems very messy.
Another problem is that a vector of TRUE and FALSE, which %within% returns, is not what I need. I need to assign each event to a category based on which time range it falls within.
My second thought was to use filter(), but I don't think that solves either of my problems. Any help is appreciated!
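You can extract the month and day (or even the hour) and rebuild every date in the same reference year, so that the intervals only need to be defined once. I use 2000 here because it is a leap year, so 29 February still parses. I made up some groups. This is a dplyr solution, but you should be able to convert it to base R easily if you prefer.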
library(dplyr)
library(lubridate)
df %>%
mutate(noyeardate = as.Date(paste(2000, month(datetime), day(datetime), sep = "-")),
interval = case_when(noyeardate %within% interval(ymd(20000101), ymd(20000331)) ~ "Group 1",
noyeardate %within% interval(ymd(20000401), ymd(20000630)) ~ "Group 2",
noyeardate %within% interval(ymd(20000701), ymd(20000930)) ~ "Group 3",
noyeardate %within% interval(ymd(20001001), ymd(20001231)) ~ "Group 4"))
id datetime noyeardate interval
1 HAR10 2018-07-18 15:17:00 2000-07-18 Group 3
2 HAR05 2018-07-24 23:36:00 2000-07-24 Group 3
3 HAR07 2018-07-26 01:01:00 2000-07-26 Group 3
4 HAR07 2018-08-09 16:21:00 2000-08-09 Group 3
5 HAR05 2019-05-01 00:04:28 2000-05-01 Group 2
6 HAR05 2019-06-26 04:35:06 2000-06-26 Group 2
7 HAR07 2019-07-01 05:25:00 2000-07-01 Group 3
8 HAR07 2019-07-12 03:33:00 2000-07-12 Group 3
9 HAR04 2019-08-05 19:15:00 2000-08-05 Group 3
10 HAR07 2019-08-27 18:59:43 2000-08-27 Group 3
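If pulling in intervals feels heavy, a base R alternative is to turn each date into a month-day number and bin it with `cut()`. This is just a sketch: it encodes only the two ranges spelled out in the question (15 April to 10 May, 11 May to 10 July), so anything outside them comes out as NA, and the labels are made up.

```r
datetime <- as.POSIXct(c("2018-07-18 15:17:00", "2019-05-01 00:04:28",
                         "2019-06-26 04:35:06"), tz = "UTC")

# Month and day as one integer, ignoring the year: 18 July -> 718, 1 May -> 501
md <- as.integer(format(datetime, "%m%d"))

# right = FALSE makes the bins [415, 511) and [511, 711),
# i.e. 15 Apr to 10 May and 11 May to 10 Jul inclusive
period <- cut(md,
              breaks = c(415, 511, 711),
              right  = FALSE,
              labels = c("15 Apr - 10 May", "11 May - 10 Jul"))

data.frame(datetime, md, period)
```

Further ranges can be added by extending `breaks` and `labels`; a range that wraps around New Year would need separate handling.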
Data:
df <- data.frame(id = c("HAR10", "HAR05", "HAR07", "HAR07", "HAR05", "HAR05", "HAR07", "HAR07", "HAR04", "HAR07"),
datetime = as.POSIXct(c("2018-07-18 15:17:00", "2018-07-24 23:36:00",
"2018-07-26 01:01:00", "2018-08-09 16:21:00", "2019-05-01 00:04:28",
"2019-06-26 04:35:06", "2019-07-01 05:25:00", "2019-07-12 03:33:00",
"2019-08-05 19:15:00", "2019-08-27 18:59:43")))

Calculating difference between two dates grouped by a variable

I'm looking for some help writing more efficient code.
I have the following data set.
Report| ReportPeriod|ObsDate
1 | 15 |2017-12-31 00:00:00
1 | 15 |2017-12-31 06:00:00
1 | 15 |2017-12-31 12:30:00
2 | 11 |2018-01-01 07:00:00
2 | 11 |2018-01-01 13:00:00
2 | 11 |2018-01-01 16:30:00
First column is "Report" which is a unique identifier for a particular report.
In the data set, there are only two reports (1 & 2).
The second column is "ReportPeriod", which is the same for all rows of a particular report. Report 1 is 15 hrs long and Report 2 is 11 hrs long.
The third column, "ObsDate", holds the individual observations in a particular report.
Problem: I need to find out the time difference between observations grouped by "Report". I did that with the following code.
example<- data.frame(Report=c(1,1,1,2,2,2), ReportPeriod=c(15,15,15,11,11,11),
ObsDate=c(as.POSIXct("2017-12-31 00:00:00"), as.POSIXct("2017-12-31 06:00:00"),
as.POSIXct("2017-12-31 12:30:00"), as.POSIXct("2018-01-01 07:00:00"),
as.POSIXct("2018-01-01 13:00:00"), as.POSIXct("2018-01-01 16:30:00")))
example<- example %>% group_by(Report) %>%
mutate(DiffPeriod= (ObsDate-lag(ObsDate)))
The output is:
Report| ReportPeriod|ObsDate |DiffPeriod
1 | 15 |2017-12-31 00:00:00|NA
1 | 15 |2017-12-31 06:00:00|6.0
1 | 15 |2017-12-31 12:30:00|6.5
2 | 11 |2018-01-01 07:00:00|NA
2 | 11 |2018-01-01 13:00:00|6.0
2 | 11 |2018-01-01 16:30:00|3.5
Now the first entry of each "Report" is NA. Each of these values should instead be the total report period "ReportPeriod" minus the sum of the DiffPeriod values for that report.
I did that using the following code.
xyz<- data.frame()
for (i in unique(example$Report)) {
df<- example %>% filter(Report==i)
hrs<- sum(df$DiffPeriod, na.rm = TRUE)
tot<- df$ReportPeriod[1]
bal<- tot-hrs
df$DiffPeriod[1]<- bal
xyz<- xyz %>% bind_rows(df)
}
The final output is :
Report| ReportPeriod|ObsDate |DiffPeriod
1 | 15 |2017-12-31 00:00:00|2.5
1 | 15 |2017-12-31 06:00:00|6.0
1 | 15 |2017-12-31 12:30:00|6.5
2 | 11 |2018-01-01 07:00:00|1.5
2 | 11 |2018-01-01 13:00:00|6.0
2 | 11 |2018-01-01 16:30:00|3.5
Is there a better/more efficient way to do what I did in the for-loop above in the tidyverse?
Thanks.
Assuming ReportPeriod is always in hours, we can first take the difference between ObsDate and lag(ObsDate). The only NA in each group (Report) is then the first row, which we replace with the first value of ReportPeriod minus the sum of DiffPeriod for that group.
library(dplyr)
example %>%
group_by(Report) %>%
mutate(DiffPeriod= difftime(ObsDate, lag(ObsDate), units = "hours"),
DiffPeriod = replace(DiffPeriod, is.na(DiffPeriod),
ReportPeriod[1] - sum(DiffPeriod, na.rm = TRUE)))
# Report ReportPeriod ObsDate DiffPeriod
# <dbl> <dbl> <dttm> <time>
#1 1 15 2017-12-31 00:00:00 2.5 hours
#2 1 15 2017-12-31 06:00:00 6.0 hours
#3 1 15 2017-12-31 12:30:00 6.5 hours
#4 2 11 2018-01-01 07:00:00 1.5 hours
#5 2 11 2018-01-01 13:00:00 6.0 hours
#6 2 11 2018-01-01 16:30:00 3.5 hours
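If plain numbers are more convenient than a difftime column, a variant along these lines should also work. It is only a sketch on the question's sample data; `coalesce()` fills the NA left by `lag()` in the first row of each group, and fixing the time zone to UTC is an assumption made to keep the differences free of daylight-saving effects.

```r
library(dplyr)

example <- data.frame(
  Report = c(1, 1, 1, 2, 2, 2),
  ReportPeriod = c(15, 15, 15, 11, 11, 11),
  ObsDate = as.POSIXct(c("2017-12-31 00:00:00", "2017-12-31 06:00:00",
                         "2017-12-31 12:30:00", "2018-01-01 07:00:00",
                         "2018-01-01 13:00:00", "2018-01-01 16:30:00"),
                       tz = "UTC")
)

out <- example %>%
  group_by(Report) %>%
  mutate(
    # Hours since the previous observation, as a plain number (NA in the first row)
    DiffPeriod = as.numeric(difftime(ObsDate, lag(ObsDate), units = "hours")),
    # Fill the first row with the remainder of the report period
    DiffPeriod = coalesce(DiffPeriod,
                          ReportPeriod[1] - sum(DiffPeriod, na.rm = TRUE))
  ) %>%
  ungroup()

out
```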

Subsetting a dataframe based on the values of two or more columns

I would like to subset a timeseries dataframe based on my requirement.
I have a dataframe something similar to the one mentioned below.
> df
Date Year Month Day Time Parameter
2012-04-19 2012 04 19 7:00:00 26
2012-04-19 2012 04 19 7:00:00 20
.................................................
2012-05-01 2012 05 01 00:00:00 23
2012-05-01 2012 05 01 00:30:00 22
.................................................
2015-04-30 2015 04 30 23:30:00 20
.................................................
2015-05-01 2015 05 01 00:00:00 26
From a dataframe like this, I would like to select all the data from the first of May 2012 (2012-05-01) to the end of April 2015 (2015-04-30), regardless of the start and end dates of the dataframe.
So far I am only familiar with using the grep function to select data based on one particular column. I have been using the following code with grep and with.
# To select one particular year
> df.2012 <- df[grep("2012", df$Year),]
# To select two or more years at the same time
> df.sel.yr <- df[grep("201[2-5]", df$Year),]
# To select one particular month of a particular year.
> df.Dec.2012 <- df[with(df, Year=="2012" & Month=="12"), ]
With several lines of commands I am able to do it, but it would save a lot of time if I could do it with only one or a few lines.
Any help will be appreciated. Thank you in advance.
If your date column is not of class Date, first convert it with
df$Date <- as.Date(df$Date)
and then you can subset the date by,
df[df$Date >= as.Date("2012-05-01") & df$Date <= as.Date("2015-04-30"), ]
# Date Year Month Day Time Parameter
#3 2012-05-01 2012 5 1 00:00:00 23
#4 2012-05-01 2012 5 1 00:30:00 22
#5 2015-04-30 2015 4 30 23:30:00 20

Resources