Let's say that I have a date in R and it's formatted as follows.
date
2012-02-01
2012-02-01
2012-02-02
Is there any way in R to add another column with the day of the week associated with the date? The dataset is really large, so it would not make sense to go through manually and make the changes.
df = data.frame(date=c("2012-02-01", "2012-02-01", "2012-02-02"))
So after adding the days, it would end up looking like:
date day
2012-02-01 Wednesday
2012-02-01 Wednesday
2012-02-02 Thursday
Is this possible? Can anyone point me to a package that will allow me to do this?
Just trying to automatically generate the day by the date.
df = data.frame(date=c("2012-02-01", "2012-02-01", "2012-02-02"))
df$day <- weekdays(as.Date(df$date))
df
## date day
## 1 2012-02-01 Wednesday
## 2 2012-02-01 Wednesday
## 3 2012-02-02 Thursday
Edit: Just to show another way...
The wday component of a POSIXlt object is the numeric weekday (0-6 starting on Sunday).
as.POSIXlt(df$date)$wday
## [1] 3 3 4
which you could use to subset a character vector of weekday names
c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday")[as.POSIXlt(df$date)$wday + 1]
## [1] "Wednesday" "Wednesday" "Thursday"
Use the lubridate package and function wday:
library(lubridate)
df$date <- as.Date(df$date)
wday(df$date, label=TRUE)
[1] Wed Wed Thurs
Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
Look up ?strftime:
%A Full weekday name in the current locale
df$day = strftime(df$date,'%A')
If you also want the week to begin on Monday (instead of lubridate's default, Sunday), the following helps:
library(lubridate)
df$day = ifelse(wday(df$date)==1,6,wday(df$date)-2)
The result is the day number in the range 0 to 6 (Monday = 0).
If you want the range to be 1 to 7 (Monday = 1), use the following:
df$day = ifelse(wday(df$date)==1,7,wday(df$date)-1)
... or, alternatively, add 1 to the 0-to-6 result:
df$day = df$day + 1
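If lubridate is not available, the same Monday-first numbering can probably be had from base R alone, because the %u format code already counts Monday as 1. The following is only a sketch: iso_wday and the example vector are names made up for illustration, not part of the answer above.

```r
# Hypothetical helper: ISO weekday number, 1 = Monday ... 7 = Sunday
iso_wday <- function(x) as.integer(format(as.Date(x), "%u"))

# Example input (assumed ISO-formatted date strings)
dates_chr <- c("2012-02-01", "2012-02-01", "2012-02-05")

day_1_to_7 <- iso_wday(dates_chr)   # Monday = 1
day_0_to_6 <- day_1_to_7 - 1L       # Monday = 0
```

Since %u returns digits rather than names, the result does not depend on the session locale.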
This should do the trick
df = data.frame(date=c("2012-02-01", "2012-02-01", "2012-02-02"))
dow <- function(x) format(as.Date(x), "%A")
df$day <- dow(df$date)
df
#Returns:
date day
1 2012-02-01 Wednesday
2 2012-02-01 Wednesday
3 2012-02-02 Thursday
start = as.POSIXct("2017-09-01")
end = as.POSIXct("2017-09-06")
dat = data.frame(Date = seq.POSIXt(from = start,
to = end,
by = "DSTday"))
# see ?strptime for details of formats you can extract
# day of the week as numeric (Monday is 1)
dat$weekday1 = as.numeric(format(dat$Date, format = "%u"))
# abbreviated weekday name
dat$weekday2 = format(dat$Date, format = "%a")
# full weekday name
dat$weekday3 = format(dat$Date, format = "%A")
dat
# returns
Date weekday1 weekday2 weekday3
1 2017-09-01 5 Fri Friday
2 2017-09-02 6 Sat Saturday
3 2017-09-03 7 Sun Sunday
4 2017-09-04 1 Mon Monday
5 2017-09-05 2 Tue Tuesday
6 2017-09-06 3 Wed Wednesday
From JStrahl's comment: format(as.Date(df$date),"%w") gives the weekday as a number (0-6, with Sunday as 0):
as.numeric(format(as.Date("2016-05-09"),"%w"))
I currently have a dataset with several different time formats (AM/PM, numeric, 24-hour) and I'm trying to turn them all into 24-hour format. Is there a way to standardize mixed-format columns?
Current sample data
time
12:30 PM
03:00 PM
0.961469907
0.913622685
0.911423611
09:10 AM
18:00
Desired output
new_time
12:30:00
15:00:00
23:04:31
21:55:37
21:52:27
09:10:00
18:00:00
I know how to convert each format individually (an example is below), but is there a way to do it all in one go? I have a large amount of data and can't go line by line.
#for numeric time
> library(chron)
> x <- c(0.961469907, 0.913622685, 0.911423611)
> times(x)
[1] 23:04:31 21:55:37 21:52:27
The decimal times are a pain, but we can parse them first, feed them back as character, and then use lubridate's parse_date_time to handle all the formats at once.
library(tidyverse)
library(chron)
# Create reproducible dataframe
df <-
  tibble::tibble(
    time = c(
      "12:30 PM",
      "03:00 PM",
      0.961469907,
      0.913622685,
      0.911423611,
      "09:10 AM",
      "18:00")
  )
# Parse times
df <-
  df %>%
  dplyr::mutate(
    # decimal fractions of a day become chron times; everything else becomes NA
    time_chron = chron::times(as.numeric(time)),
    # fall back to the original string where the decimal parse failed
    time_chron = if_else(
      is.na(time_chron),
      time,
      as.character(time_chron)),
    time_clean = lubridate::parse_date_time(
      x = time_chron,
      orders = c(
        "%I:%M %p", # HH:MM AM/PM 12 hour format
        "%H:%M:%S", # HH:MM:SS 24 hour format
        "%H:%M")), # HH:MM 24 hour format
    time_clean = hms::as_hms(time_clean)) %>%
  select(-time_chron)
Which gives us
> df
# A tibble: 7 × 2
time time_clean
<chr> <time>
1 12:30 PM 12:30:00
2 03:00 PM 15:00:00
3 0.961469907 23:04:31
4 0.913622685 21:55:37
5 0.911423611 21:52:27
6 09:10 AM 09:10:00
7 18:00 18:00:00
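For comparison, the same three cases can be handled in base R without chron or lubridate. This is a rough sketch rather than a drop-in replacement: standardize_time is a hypothetical helper, and it assumes the column holds only the three formats from the question (fraction of a day, "HH:MM AM/PM", and 24-hour "HH:MM").

```r
# Hypothetical helper: convert mixed time strings to "HH:MM:SS"
standardize_time <- function(x) {
  x <- trimws(as.character(x))
  out <- rep(NA_character_, length(x))

  # 1) fractions of a day, e.g. "0.961469907"
  frac <- suppressWarnings(as.numeric(x))
  is_frac <- !is.na(frac)
  secs <- as.integer(round(frac[is_frac] * 86400))
  out[is_frac] <- sprintf("%02d:%02d:%02d",
                          secs %/% 3600L, (secs %% 3600L) %/% 60L, secs %% 60L)

  # 2) 12-hour clock, e.g. "03:00 PM" (parsed by hand to stay locale-independent)
  is_ampm <- grepl("[AaPp][Mm]$", x)
  h <- as.integer(sub(":.*", "", x[is_ampm])) %% 12L +
    ifelse(grepl("[Pp][Mm]$", x[is_ampm]), 12L, 0L)
  m <- as.integer(sub("^[0-9]+:([0-9]+).*", "\\1", x[is_ampm]))
  out[is_ampm] <- sprintf("%02d:%02d:00", h, m)

  # 3) 24-hour clock without seconds, e.g. "18:00"
  is_24 <- !is_frac & !is_ampm & grepl("^[0-9]{1,2}:[0-9]{2}$", x)
  h24 <- as.integer(sub(":.*", "", x[is_24]))
  m24 <- as.integer(sub(".*:", "", x[is_24]))
  out[is_24] <- sprintf("%02d:%02d:00", h24, m24)

  out
}

new_time <- standardize_time(c("12:30 PM", "03:00 PM", "0.961469907",
                               "0.913622685", "0.911423611", "09:10 AM", "18:00"))
```

Anything that matches none of the three patterns is left as NA, which makes unexpected formats easy to spot.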
I have a list of every day from 2018-01-01 to 2018-06-01. It is a vector and it looks like this:
dates <- c("2018-01-01", "2018-01-02", "2018-01-03", ... , "2018-05-30", "2018-06-01")
I want to make a data frame where the first column has each of those dates and the second column has their day of the week. I am assuming that 2018-01-01 is a Monday.
date day
2018-01-01 Monday
2018-01-02 Tuesday
2018-01-03 Wednesday
... ...
2018-06-01 Monday
I'm working on a data frame towards that end, but I was curious for a better way to recycle through the days of the week than the solution I put together.
# days is assumed to hold the seven weekday names, Monday first
day <- NULL
for (i in 1:length(dates)) {
  x <- i
  while (x > 7) {
    x <- x - 7
  }
  day <- c(day, days[x])
}
cbind(dates,day)
We can use weekdays to get day of the week and put it in a dataframe.
data.frame(dates, day = weekdays(dates))
# dates day
#1 2018-01-01 Monday
#2 2018-01-02 Tuesday
#3 2018-01-03 Wednesday
#4 2018-05-30 Wednesday
#5 2018-06-01 Friday
EDIT
If we don't want to use any built-in function, we can create a vector of day names and look them up. Given that the first date is a Monday, the modulo operator gives the relevant day for each of the remaining dates:
days <- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
day <- days[(as.numeric(dates - dates[1]) %% 7) + 1]
day
#[1] "Monday" "Tuesday" "Wednesday" "Wednesday" "Friday"
and then put them in dataframe
data.frame(dates, day)
# dates day
#1 2018-01-01 Monday
#2 2018-01-02 Tuesday
#3 2018-01-03 Wednesday
#4 2018-05-30 Wednesday
#5 2018-06-01 Friday
data
dates<-as.Date(c("2018-01-01","2018-01-02","2018-01-03","2018-05-30","2018-06-01"))
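If the vector really does contain every consecutive day, as the question describes, simple recycling is enough and no date arithmetic is needed. A small sketch, assuming an unbroken daily sequence whose first date is a Monday (full_dates and week_names are names invented here):

```r
week_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# Every day from 2018-01-01 (a Monday) to 2018-06-01, with no gaps
full_dates <- seq(as.Date("2018-01-01"), as.Date("2018-06-01"), by = "day")

# rep_len() recycles the seven names until the vector is long enough
out <- data.frame(dates = full_dates,
                  day = rep_len(week_names, length(full_dates)))
```

Unlike the modulo lookup above, this breaks as soon as a day is missing from the sequence, so it only suits the gap-free case.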
I have a dataframe object, and among its fields I have a column of dates:
df$dates
I need to add a column which is 'Week Starting', i.e.
df[,'WeekStart']= manipulation
Where the week start is the date of the Monday of that week, e.g. today, Thursday 24/09/15, would have an entry of '21-Sept'. Next Thursday, 01/10/15, would be '28-Sept'.
I see that there is a weekdays() function which will convert a date into a weekday name, but how can I get the most recent Monday?
A base R approach with the function strftime.
df$Week.Start <- df$dates-abs(1-as.numeric(strftime(df$dates, "%u")))
This can be a one-liner but we'll create a few variables to see what's happening. The %u format pattern returns the day of the week as a single decimal number (1-7, with Monday as 1). We convert that string to numeric and take its distance from 1, which is the number of days since Monday. Subtracting that vector from the date column gives the Monday of each week.
day_of_week <- as.numeric(strftime(df$dates, "%u"))
day_diff <- abs(1-day_of_week)
df$Week.Start <- df$dates-day_diff
# dates Week.Start
# 1 2042-10-22 2042-10-20
# 2 2026-08-14 2026-08-10
# 3 2018-11-23 2018-11-19
# 4 2017-08-21 2017-08-21
# 5 2022-05-26 2022-05-23
# 6 2037-05-27 2037-05-25
Data
set.seed(7)
all_dates <- seq(Sys.Date(), Sys.Date()+10000, by="days")
dates <- sample(all_dates, 20)
df <- data.frame(dates)
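Because the sample data above depends on Sys.Date(), the output changes from run to run. A fixed-date check of the same %u idea might look like the sketch below; week_start is a hypothetical wrapper and the dates are the ones mentioned in the question.

```r
# Hypothetical helper: Monday of the week containing each date
week_start <- function(x) {
  x <- as.Date(x)
  x - (as.integer(format(x, "%u")) - 1L)   # %u: 1 = Monday ... 7 = Sunday
}

# Thursday, the following Thursday, and a Sunday
ws <- week_start(c("2015-09-24", "2015-10-01", "2015-09-20"))
```

For the '21-Sept' style label from the question, format(ws, "%d-%b") is one option, though the month abbreviation it prints depends on the locale.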
Simples:
dates <- Sys.Date() + 1:30
week.starts <- as.Date(sapply(dates, function(d) { return(d + (-6 - as.POSIXlt(d)$wday %% -7)) }), origin = "1970-01-01")
and running as
d <- data.frame(dates=dates, monday=week.starts)
gives
dates monday
1 2015-09-25 2015-09-21
2 2015-09-26 2015-09-21
3 2015-09-27 2015-09-21
4 2015-09-28 2015-09-28
5 2015-09-29 2015-09-28
6 2015-09-30 2015-09-28
7 2015-10-01 2015-09-28
8 2015-10-02 2015-09-28
9 2015-10-03 2015-09-28
10 2015-10-04 2015-09-28
11 2015-10-05 2015-10-05
12 2015-10-06 2015-10-05
13 2015-10-07 2015-10-05
14 2015-10-08 2015-10-05
15 2015-10-09 2015-10-05
16 2015-10-10 2015-10-05
17 2015-10-11 2015-10-05
18 2015-10-12 2015-10-12
19 2015-10-13 2015-10-12
20 2015-10-14 2015-10-12
21 2015-10-15 2015-10-12
22 2015-10-16 2015-10-12
23 2015-10-17 2015-10-12
24 2015-10-18 2015-10-12
25 2015-10-19 2015-10-19
26 2015-10-20 2015-10-19
27 2015-10-21 2015-10-19
28 2015-10-22 2015-10-19
29 2015-10-23 2015-10-19
30 2015-10-24 2015-10-19
Similar approach, example:
# data
d <- data.frame(date = as.Date( c("20/09/2015","24/09/2015","28/09/2015","01/10/2015"), "%d/%m/%Y"))
# get monday
d$WeekStart <- d$date - 6 - (as.POSIXlt(d$date)$wday %% -7)
d
# result
# date WeekStart
# 1 2015-09-20 2015-09-14
# 2 2015-09-24 2015-09-21
# 3 2015-09-28 2015-09-28
# 4 2015-10-01 2015-09-28
How about just subtracting from the dates the number of days required to get to the previous Monday? e.g if your data is
dates <- as.Date(c("2000-07-12", "2005-02-19", "2010-09-01"))
weekdays(dates)
# [1] "Wednesday" "Saturday" "Wednesday"
then you can compare this to a vector
wdays <- setNames(0:6, c("Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday", "Sunday"))
and subtract the required number of days from each date, ie
dates - wdays[weekdays(dates)]
# Wednesday Saturday Wednesday
#"2000-07-10" "2005-02-14" "2010-08-30"
will give the dates of the Monday preceding each date in dates. To test:
weekdays(dates - wdays[weekdays(dates)])
#Wednesday Saturday Wednesday
# "Monday" "Monday" "Monday"
Everything can also be written in one line as
dates - match(weekdays(dates), c("Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday", "Sunday")) + 1
#"2000-07-10" "2005-02-14" "2010-08-30"
a <- as.Date("2016-08-20")
Finding the next given day (here "Monday"):
a[1] + match("Monday",weekdays(seq(a[1]+1, a[1]+6,"days")))
# [1] "2016-08-22"
Finding the last given day (here "Friday"):
a[1] + (match("Friday",weekdays(seq(a[1]+1, a[1]+6,"days")))-7)
# [1] "2016-08-19"
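A simple base-R way if your dates are properly coded as Date class in R: as.Date(unclass(dates) - (unclass(dates) + 3) %% 7, origin = "1970-01-01"). You unclass the dates to get the number of days since 1970-01-01. Because 1970-01-01 was a Thursday, adding 3 before taking the remainder of division by 7 gives the number of days since Monday (0 for Monday, 6 for Sunday). Subtracting that remainder gives the Monday of the week.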
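The day-count arithmetic is easy to get off by a few days, so a small check on known dates may help. This sketch counts days since Monday as (n + 3) %% 7, relying only on the fact that 1970-01-01 was a Thursday; x and the other variable names are illustrative.

```r
# A Thursday, a Monday, a Wednesday and a Sunday
x <- as.Date(c("1970-01-01", "2015-09-21", "2015-09-23", "2015-09-27"))

n <- as.numeric(x)               # days since 1970-01-01
since_monday <- (n + 3) %% 7     # 0 for Monday ... 6 for Sunday
mondays <- as.Date(n - since_monday, origin = "1970-01-01")
```

Each result should land on the Monday of the same week, and a date that is already a Monday should map to itself.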
You can also group your data by week and then create a column holding the minimal date of that week. Here is how to do it with the data.table package:
library(data.table)
df <- data.table(df)
df[, lastMonday := min(dates), by = .(week(dates))]
This should work if there are no gaps in your dates.
Also, in some locales the week starts on Sunday, so you should be careful.
And you will need an additional grouping variable if your dates span more than a year.
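A different base R technique that sidesteps the gap and year-boundary caveats is cut() on a Date vector: with breaks = "week" it labels each date with the first day of its week, and weeks start on Monday by default (start.on.monday = TRUE). A minimal sketch with made-up dates:

```r
# Example dates with gaps, including one across a year boundary
sample_dates <- as.Date(c("2015-09-24", "2015-09-27", "2015-09-28", "2016-01-01"))

# Each factor level is named after the Monday that starts the week
wk <- cut(sample_dates, breaks = "week")
week_monday <- as.Date(as.character(wk))
```

This is not the grouping approach described above, just another way to reach the same column without needing the Monday itself to be present in the data.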
If you want the nearest occurrence of a given weekday and hour relative to the current date, use this function:
dayhour <- function(day, hour) {
  # %u is today's weekday number (1 = Monday ... 7 = Sunday)
  today_num <- as.numeric(format(strptime(Sys.time(), format = "%Y-%m-%d %H:%M:%S"), format = "%u"))
  k <- as.Date(Sys.time()) + day - today_num
  dh <- format(strptime(paste(k, hour), format = "%Y-%m-%d %H"), format = "%A %H")
  return(dh)
}
For the weekdays, use 0 to 6 as the day argument for Sunday to Saturday respectively:
> dayhour(0,17)
[1] "Sunday 17"
I have dataset consisting of two columns (timestamp and power) as:
str(df2)
'data.frame': 720 obs. of 2 variables:
$ timestamp: POSIXct, format: "2015-08-01 00:00:00" "2015-08-01 01:00:00" " ...
$ power : num 124 149 118 167 130 ..
This dataset covers exactly one month. I want to create two subsets of it: one containing the weekend data (Saturday and Sunday) and the other containing the weekday data (Monday to Friday). Both subsets should retain both columns. How can I do this in R?
I tried to use aggregate and split, but I am not clear on how to specify the division of the dataset in the function parameter (FUN) of aggregate.
You can use base R functions to do this. First use strptime to extract the date from the first column, then apply the weekdays function.
Example:
df1<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00"),power=1:2)
df1$day<-strptime(df1[,1], "%Y-%m-%d")
df1$weekday<-weekdays(df1$day)
df1
timestamp power day weekday
2015-08-01 00:00:00 1 2015-08-01 Saturday
2015-10-13 00:00:00 2 2015-10-13 Tuesday
Building on top of @ShruS's example:
df<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00", "2015-10-11 00:00:00", "2015-10-14 00:00:00"))
df$day<-strptime(df[,1], "%Y-%m-%d")
df$weekday<-weekdays(df$day)
df1 = subset(df,df$weekday == "Saturday" | df$weekday == "Sunday")
df2 = subset(df,df$weekday != "Saturday" & df$weekday != "Sunday")
> df
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
3 2015-10-11 00:00:00 2015-10-11 Sunday
4 2015-10-14 00:00:00 2015-10-14 Wednesday
> df1
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
3 2015-10-11 00:00:00 2015-10-11 Sunday
> df2
timestamp day weekday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
4 2015-10-14 00:00:00 2015-10-14 Wednesday
Initially, I tried complex approaches using extra libraries, but in the end I came up with a basic approach using base R.
#adding day column to existing set
df2$day <- weekdays(as.POSIXct(df2$timestamp))
# creating two data_subsets, i.e., week_data and weekend_data
week_data<- data.frame(timestamp=factor(), power= numeric(),day= character())
weekend_data<- data.frame(timestamp=factor(),power=numeric(),day= character())
#Specifying weekend days in vector, weekend
weekend <- c("Saturday","Sunday")
for(i in 1:nrow(df2)){
if(is.element(df2[i,3], weekend)){
weekend_data <- rbind(weekend_data, df2[i,])
} else{
week_data <- rbind(week_data, df2[i,])
}
}
The datasets created, i.e., weekend_data and week_data are my required sub datasets.
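The row-by-row loop can likely be replaced by a single logical index, which avoids growing data frames with rbind. The sketch below rebuilds a stand-in for df2 (hourly timestamps for 30 days with placeholder power values, both assumptions made for illustration) and tests the weekday with %u so that it does not depend on English day names.

```r
# Stand-in for df2: 720 hourly rows starting 2015-08-01, placeholder power values
df2 <- data.frame(
  timestamp = seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                  by = "hour", length.out = 720),
  power = seq_len(720)
)

# %u gives "6" for Saturday and "7" for Sunday
is_weekend <- format(df2$timestamp, "%u") %in% c("6", "7")

weekend_data <- df2[is_weekend, ]
week_data <- df2[!is_weekend, ]
```

Both subsets keep the timestamp and power columns, and split(df2, is_weekend) would return the same two pieces as a list.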