Create "Week_Start" variable in R - r

I have a dataframe that looks like the one below.
bus_date <- as.Date(c('2017-04-03', '2017-04-04', '2017-04-06', '2017-04-11', '2017-04-13', '2017-04-17'))
sales <- c(100, 110, 120, 200, 300, 100)
daily_sales <- data.frame(bus_date, sales)
It is a sales table at the daily level.
I want to create a new variable called "Week_Start" which is the start date of the business week. I have implemented various solutions which allow me to record a week number (1-52), but I need the actual week start date.
if (bus_date is a Monday)
return(bus_date)
else
return(Monday before bus_date)
So my resulting dataframe would look like:
Week_Start <- as.Date(c('2017-04-03', '2017-04-03', '2017-04-03', '2017-04-10', '2017-04-10', '2017-04-17'))
daily_sales2 <- data.frame(bus_date, sales, Week_Start)
I know there is probably an easy way to do this, but I am unsure where to begin. Thanks.

From ?strptime
%w Weekday as decimal number (0–6, Sunday is 0).
%W Week of the year as decimal number (00–53) using Monday as the
first day of week (and typically with the first Monday of the year as
day 1 of week 1). The UK convention.
as.Date(format(daily_sales$bus_date, "%Y-%W-1"), format = "%Y-%W-%w")
#[1] "2017-04-03" "2017-04-03" "2017-04-03" "2017-04-10" "2017-04-10" "2017-04-17"

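Here's how you can do that with floor_date from lubridate. By default, floor_date with unit = "week" gives you the preceding Sunday; in recent versions of lubridate, setting week_start = 1 floors to Monday instead. (Adding 1 to the Sunday-based result also works for this data, but it would map a Sunday date to the following Monday.)
library(lubridate)
daily_sales$Week_Start <- floor_date(daily_sales$bus_date, unit = "week", week_start = 1)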
daily_sales
bus_date sales Week_Start
1 2017-04-03 100 2017-04-03
2 2017-04-04 110 2017-04-03
3 2017-04-06 120 2017-04-03
4 2017-04-11 200 2017-04-10
5 2017-04-13 300 2017-04-10
6 2017-04-17 100 2017-04-17
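A base R alternative, as a minimal sketch: since the %u format code gives the ISO weekday (1 = Monday, 7 = Sunday), subtracting %u - 1 days from each date lands on that week's Monday. The helper name week_start_monday is made up for illustration and does not come from either answer. Unlike the %W round trip, this arithmetic does not produce NA for dates that fall before the first Monday of the year.

```r
# Hypothetical helper: Monday of the week containing each date (base R only)
week_start_monday <- function(x) {
  x <- as.Date(x)
  # %u is the ISO weekday: 1 = Monday, ..., 7 = Sunday
  x - (as.integer(format(x, "%u")) - 1L)
}

bus_date <- as.Date(c("2017-04-03", "2017-04-04", "2017-04-06",
                      "2017-04-11", "2017-04-13", "2017-04-17"))
sales <- c(100, 110, 120, 200, 300, 100)
daily_sales <- data.frame(bus_date, sales)
daily_sales$Week_Start <- week_start_monday(daily_sales$bus_date)
daily_sales
```

A Sunday such as 2017-04-09 maps back to 2017-04-03 with this helper, which matches the "Monday before bus_date" rule in the question.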

Related

How do I transform week (and year) to date of the first date (i.e. Monday) of the week in R?

In R, I have a dataset with the week number and the year (see below) and I want to transform it into the date of the corresponding Monday of the week. I used the as.Date() function. This works well, except for the first week, where the function returns NA because the corresponding Monday of the first week belongs to the previous year. I want the function to return the date of the Monday even if it is not in the same year. Any ideas?
library(dplyr)  # provides %>% and mutate()
data.frame(week = paste(2022, 0:5, sep = "-")) %>%
  mutate(week2 = paste(week, "1", sep = "-"),
         date = as.Date(week2, "%Y-%W-%w"))
week week2 date
1 2022-0 2022-0-1 <NA>
2 2022-1 2022-1-1 2022-01-03
3 2022-2 2022-2-1 2022-01-10
4 2022-3 2022-3-1 2022-01-17
5 2022-4 2022-4-1 2022-01-24
6 2022-5 2022-5-1 2022-01-31
Your code is fine; the only problem is the NA when the week number is 0. Week 0 exists only when the first day of the year is not a Monday, so its Monday falls in the previous year. You can handle this by checking whether the week number is 0 and, if so, finding the Monday in the previous year; for all other weeks your code works as is. For example, as.Date("2018-1-1", "%Y-%W-%w") returns 2018-01-01 (week 1, not week 0), because 2018 started on a Monday.
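foo <- function(year, week) {
  if (week == 0) {
    # Week 0 has no Monday in this year: it is the last %W week of the previous year
    year <- year - 1
    # Ask for the %W week of Dec 31 directly (it can be 52 or 53)
    week <- as.integer(format(as.Date(paste0(year, "-12-31")), "%W"))
  }
  as.Date(paste(year, week, "1", sep = "-"), "%Y-%W-%w")
}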
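A vectorised alternative that avoids parsing week numbers altogether, as a sketch under the %W convention quoted from ?strptime (week 1 starts on the first Monday of the year): find that first Monday and step in multiples of 7 days, so week 0 simply lands 7 days earlier, in the previous year. The function name monday_of_week is hypothetical.

```r
# Hypothetical helper: Monday of %W-style week `week` in `year` (base R only)
monday_of_week <- function(year, week) {
  jan1 <- as.Date(paste0(year, "-01-01"))
  # %u is the ISO weekday: 1 = Monday, ..., 7 = Sunday
  wd <- as.integer(format(jan1, "%u"))
  # Days from Jan 1 to the first Monday of the year (0 if Jan 1 is a Monday)
  first_monday <- jan1 + (8L - wd) %% 7L
  first_monday + 7 * (week - 1)
}

monday_of_week(2022, 0:5)
```

For the weeks in the question this gives 2021-12-27 for week 0, followed by the same Mondays that as.Date() already returned for weeks 1 to 5.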

Filtering out time data from R data frame

I have a dataset in R:
IncidentID Time Vehicle
19002 4:48 Car
19003 12:30 Motorcycle
19004 14:00 Car
19005 9:30 Bicycle
I'm trying to filter the data, since it's quite a large dataset. The rows above are just a few examples.
I want to filter by time. Say I want the rows where Time is between 12pm and 6pm (18:00 in 24-hour format); then I would have:
IncidentID Time Vehicle
19003 12:30 Motorcycle
19004 14:00 Car
I did:
incident <- read.csv("incident.csv")
afternoon_incident <- incident[which(incident$Time >= 12 && incident$Time <= 18),]
But I'm getting the error saying:
1: In Ops.factor(web$Time, 6:0) : ‘>=’ not meaningful for factors
2: In Ops.factor(web$Time, 12:0) : ‘<=’ not meaningful for factors
You can use lubridate to convert the Time field into a time object and then extract the hour for filtering:
library(lubridate)
incident$Time <- hm(as.character(incident$Time))
incident[which(hour(incident$Time) >= 12 & hour(incident$Time) <= 18), ]
You first need to convert Time into an actual date-time object using as.POSIXct and then compare.
Since you want to subset by hour, we can extract only the hour part using format and keep the rows whose hour is between 12 and 18. Using base R, we can do
df$hour <- as.numeric(format(as.POSIXct(df$Time, format = "%H:%M"), "%H"))
subset(df, hour >= 12 & hour <= 18)
# IncidentID Time Vehicle hour
#2 19003 12:30 Motorcycle 12
#3 19004 14:00 Car 14
You can remove the hour column later if not needed.
For a general solution, we can create a date-time column and then compare
df$datetime <- as.POSIXct(df$Time, format = "%H:%M")
subset(df, datetime >= as.POSIXct("12:30:00", format = "%T") &
datetime <= as.POSIXct("18:30:00", format = "%T"))
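The warnings in the question come from comparing a factor column with numbers, and && is not an element-wise operator. As a minimal base R sketch of another way around both problems, the times can be turned into minutes since midnight and compared with the vectorised &. The toy data frame below is typed in by hand from the question's sample rows, with Time deliberately stored as a factor; the names parts and minutes are illustrative.

```r
# Toy copy of the question's table; Time is deliberately a factor
incident <- data.frame(
  IncidentID = 19002:19005,
  Time = factor(c("4:48", "12:30", "14:00", "9:30")),
  Vehicle = c("Car", "Motorcycle", "Car", "Bicycle")
)

# Split "H:MM" into hour and minute, then convert to minutes since midnight
parts <- strsplit(as.character(incident$Time), ":", fixed = TRUE)
minutes <- vapply(parts,
                  function(p) as.integer(p[1]) * 60L + as.integer(p[2]),
                  integer(1))

# Keep rows from 12:00 up to and including 18:00
afternoon_incident <- incident[minutes >= 12 * 60 & minutes <= 18 * 60, ]
afternoon_incident
```

Working in minutes makes the upper bound exact: a row at 18:30 is excluded here, whereas a filter on hour <= 18 would keep it.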

subset data by time interval if I have all data between time interval

I have a data frame that looks like this:
X id mat.1 mat.2 mat.3 times
1 1 1 Anne 1495206060 18.5639404 2017-05-19 11:01:00
2 2 1 Anne 1495209660 9.0160321 2017-05-19 12:01:00
3 3 1 Anne 1495211460 37.6559161 2017-05-19 12:31:00
4 4 1 Anne 1495213260 31.1218856 2017-05-19 13:01:00
....
164 164 1 Anne 1497825060 4.8098351 2017-06-18 18:31:00
165 165 1 Anne 1497826860 15.0678781 2017-06-18 19:01:00
166 166 1 Anne 1497828660 4.7636241 2017-06-18 19:31:00
What I would like is to subset the data set by time interval (all data between 11 AM and 4 PM) if there are data points for each hour at least (11 AM, 12, 1, 2, 3, 4 PM) within each day. I want to ultimately sum the values from mat.3 per time interval (11 AM to 4 PM) per day.
I tried:
sub.1 <- subset(t,format(times,'%H')>='11' & format(times,'%H')<='16')
but this returns all the data from any of the times between 11 AM and 4 PM, but often I would only have data for e.g. 12 and 1 PM for a given day.
I only want the subset from days where I have data for each hour from 11 AM to 4 PM. Any ideas what I can try?
A complement to @Henry Navarro's answer, to solve an additional problem mentioned in the question.
If I understand correctly, the other concern of the question is to find the dates that have at least one data point for each hour of the given interval within the day. A possible way, following the style of @Henry Navarro's solution, is as follows:
library(lubridate)
your_data$hour_only <- as.numeric(format(your_data$times, format = "%H"))
your_data$days <- ymd(format(your_data$times, "%Y-%m-%d"))
your_data_by_days_list <- split(x = your_data, f = your_data$days)
# the interval is narrowed for demonstration purposes
hours_intervals <- 11:13
# take the days from the list names so they line up with the sapply() results
all_hours_flags <- data.frame(days = as.Date(names(your_data_by_days_list)),
  all_hours_present = sapply(your_data_by_days_list,
    function(Z) all(hours_intervals %in% Z$hour_only)),
  row.names = NULL)
your_data <- merge(your_data, all_hours_flags, by = "days")
There is now the column "all_hours_present" indicating that the data for a corresponding day contains at least one value for each hour in the given hours_intervals. And you may use this column to subset your data
subset(your_data, all_hours_present)
Try to create a new variable in your data frame with only the hour.
your_data$hour<-format(your_data$times, format="%H:%M:%S")
Then, using this new variable, do the following:
# auxiliary variable flagging the rows inside your time interval
your_data$aux_var <- ifelse(your_data$hour >= "11:00:00" & your_data$hour <= "16:00:00", 1, 0)
The next step is to filter your data where aux_var == 1:
your_data[which(your_data$aux_var == 1), ]
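The question's end goal, summing mat.3 per day but only for days that cover every hour in the window, can be sketched in base R as follows. The data frame toy and its values are invented for illustration, the window is narrowed to 11:13 as in the answer above, and names such as wanted_hours and complete_days are hypothetical.

```r
# Toy data (made-up values): 2017-05-19 covers hours 11, 12 and 13; 2017-05-20 misses 12
times <- as.POSIXct(c("2017-05-19 11:01:00", "2017-05-19 12:01:00",
                      "2017-05-19 13:01:00", "2017-05-20 11:01:00",
                      "2017-05-20 13:01:00"), tz = "UTC")
toy <- data.frame(times = times, mat.3 = c(1, 2, 3, 10, 20))
wanted_hours <- 11:13

toy$hour <- as.integer(format(toy$times, "%H"))
toy$day <- as.Date(format(toy$times, "%Y-%m-%d"))

# Keep only rows inside the window, then find days where every wanted hour occurs
in_window <- toy[toy$hour %in% wanted_hours, ]
hours_by_day <- split(in_window$hour, in_window$day)
is_complete <- sapply(hours_by_day, function(h) all(wanted_hours %in% h))
complete_days <- names(hours_by_day)[is_complete]

# Sum mat.3 per day, restricted to the complete days
complete <- in_window[as.character(in_window$day) %in% complete_days, ]
daily_sum <- aggregate(mat.3 ~ day, data = complete, FUN = sum)
daily_sum
```

Only 2017-05-19 survives in this toy example, because 2017-05-20 has no observation during hour 12.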

Dates in R calculating number of days

I have data like this:
customer_id Last_city First city recent_date
1020 Jaipur Gujarat 20130216
1021 Delhi Lucknow 20130129
1022 Mumbai Punjab 20130221
and I want to find the number of days between recent_date and today (for every record).
The difftime function calculates time differences in days, hours, minutes, etc.
First, parse the date string into a date representation (e.g. Date or POSIXct), then compare that to the current date/time.
# create dummy data.frame for testing
df <- data.frame("customer_id"=1020, "Last_city"="Jaipur",
"First_city"="Gujarat", "recent_date"="20130216",
stringsAsFactors = FALSE)
now <- Sys.Date()
# parse date into date type (Note: %Y=4-digit year, %y=2-digit year)
df$date = as.Date(df$recent_date, format = "%Y%m%d")
# next calculate the difference between recent date and current time
df$diff = as.double(difftime(now, df$date, units = c("days")))
> df
customer_id Last_city First_city recent_date date diff
1 1020 Jaipur Gujarat 20130216 2013-02-16 1604
If you want the difference in weeks instead:
> as.double(difftime(now, df$date, units = c("weeks")))
[1] 229.1429
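The same idea applied to all three records at once, as a small sketch. It assumes recent_date was imported as a number (a common outcome for values like 20130216), so it is converted to character before parsing. A fixed reference date stands in for Sys.Date() so the result is reproducible; the names customers, today and days_since are illustrative.

```r
customers <- data.frame(
  customer_id = c(1020, 1021, 1022),
  recent_date = c(20130216L, 20130129L, 20130221L)  # assumed to be stored as integers
)

# Fixed reference date for reproducibility; replace with Sys.Date() for "today"
today <- as.Date("2013-03-01")

# Numbers must become character strings before as.Date() can apply the format
parsed <- as.Date(as.character(customers$recent_date), format = "%Y%m%d")

# Subtracting two Date vectors gives a difftime in days
customers$days_since <- as.integer(today - parsed)
customers
```

Subtracting Date objects is vectorised, so no loop over records is needed.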

Split date data (m/d/y) into 3 separate columns

I need to convert dates (m/d/y format) into 3 separate columns on which I hope to run an algorithm (I'm trying to convert my dates into Julian Day Numbers). I saw this suggestion to another user for separating data into multiple columns using Oracle. I'm using R and am thoroughly stuck on how to code this appropriately. Would A1, A2, ... represent my new column headings, and how would the format differ in the "update set" section?
update <tablename> set A1 = substr(ORIG, 1, 4),
A2 = substr(ORIG, 5, 6),
A3 = substr(ORIG, 11, 6),
A4 = substr(ORIG, 17, 5);
I'm trying hard to improve my skills in R but cannot figure this one...any help is much appreciated. Thanks in advance... :)
I use the format() method for Date objects to pull apart dates in R. Using Dirk's datetxt, here is how I would break up a date into its constituent parts:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
datetxt <- as.Date(datetxt)
df <- data.frame(date = datetxt,
year = as.numeric(format(datetxt, format = "%Y")),
month = as.numeric(format(datetxt, format = "%m")),
day = as.numeric(format(datetxt, format = "%d")))
Which gives:
> df
date year month day
1 2010-01-02 2010 1 2
2 2010-02-03 2010 2 3
3 2010-09-10 2010 9 10
Note what several others have said; you can get the Julian dates without splitting out the various date components. I added this answer to show how you could do the breaking apart if you needed it for something else.
Given a text variable x, like this:
> x
[1] "10/3/2001"
then:
> as.Date(x,"%m/%d/%Y")
[1] "2001-10-03"
converts it to a date object. Then, if you need it:
> julian(as.Date(x,"%m/%d/%Y"))
[1] 11598
attr(,"origin")
[1] "1970-01-01"
gives you a Julian date (relative to 1970-01-01).
Don't try the substring thing...
See help(as.Date) for more.
Quick ones:
Julian date converters already exist in base R, see eg help(julian).
One approach may be to parse the date as a POSIXlt and to then read off the components. Other date / time classes and packages will work too but there is something to be said for base R.
Parsing dates as strings is almost always a bad approach.
Here is an example:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
dates <- as.Date(datetxt) ## you could examine these as well
plt <- as.POSIXlt(dates) ## now as POSIXlt types
plt[["year"]] + 1900 ## years are with offset 1900
#[1] 2010 2010 2010
plt[["mon"]] + 1 ## and months are on the 0 .. 11 interval
#[1] 1 2 9
plt[["mday"]]
#[1] 2 3 10
df <- data.frame(year=plt[["year"]] + 1900,
month=plt[["mon"]] + 1, day=plt[["mday"]])
df
# year month day
#1 2010 1 2
#2 2010 2 3
#3 2010 9 10
And of course
julian(dates)
#[1] 14611 14643 14862
#attr(,"origin")
#[1] "1970-01-01"
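Note that julian() counts days since 1970-01-01, which is not the astronomical Julian Day Number the question mentions. As a hedged sketch, the two differ by a constant: the Julian Day Number of 1970-01-01 is 2440588, so adding that offset to the day count converts one to the other. The variable names jdn_epoch and jdn are illustrative.

```r
dates <- as.Date(c("2010-01-02", "2010-02-03", "2010-09-10"))

# 2440588 is the Julian Day Number of 1970-01-01, the origin of R's Date class
jdn_epoch <- 2440588

# as.numeric() on a Date gives whole days since 1970-01-01
jdn <- as.numeric(dates) + jdn_epoch
jdn
```

As a sanity check, 2000-01-01 comes out as 2451545 with this offset, the commonly quoted Julian Day Number for that date.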
To convert a date (m/d/y format) into 3 separate columns, consider the df:
df <- data.frame(date = c("01-02-18", "02-20-18", "03-23-18"))
df
date
1 01-02-18
2 02-20-18
3 03-23-18
Convert to date format
df$date <- as.Date(df$date, format="%m-%d-%y")
df
date
1 2018-01-02
2 2018-02-20
3 2018-03-23
To get three separate columns with year, month and day:
library(lubridate)
df$year <- year(ymd(df$date))
df$month <- month(ymd(df$date))
df$day <- day(ymd(df$date))
df
date year month day
1 2018-01-02 2018 1 2
2 2018-02-20 2018 2 20
3 2018-03-23 2018 3 23
Hope this helps.
Hi Gavin: another way [using your idea] is:
The data-frame we will use is oilstocks which contains a variety of variables related to the changes over time of the oil and gas stocks.
The variables are:
colnames(stocks)
"bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC"
"emMN" "emMN.1" "chdate" "chV" "cbO" "chC" "chMN" "chMX"
One of the first things to do is change the emdate field, which is an integer vector, into a date vector.
realdate<-as.Date(emdate,format="%m/%d/%Y")
Next we want to split the emdate column into three separate columns representing month, day and year, using the idea you supplied.
dfdate <- data.frame(date = realdate)
year <- as.numeric(format(realdate, "%Y"))
month <- as.numeric(format(realdate, "%m"))
day <- as.numeric(format(realdate, "%d"))
ls() will include the individual vectors, day, month, year and dfdate.
Now merge the dfdate, day, month, year into the original data-frame [stocks].
ostocks<-cbind(dfdate,day,month,year,stocks)
colnames(ostocks)
"date" "day" "month" "year" "bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC" "emMN" "emMX" "chdate" "chV"
"cbO" "chC" "chMN" "chMX"
Similar results and I also have date, day, month, year as separate vectors outside of the df.

Resources