I know that using the lubridate package I can generate the corresponding weekday for a single date entry. I am now dealing with a large dataset with many date entries and I want to extract the weekday for each of them. Looking up each date by hand is impractical, so I would like a function that takes the date column from my data frame and produces the day corresponding to each date.
My frame looks like this:
uinq_id Product_ID Date_of_order count
1 Aarkios04_2014-09-09 Aarkios04 2014-09-09 10
2 ABEE01_2014-08-18 ABEE01 2014-08-18 1
3 ABEE01_2014-08-19 ABEE01 2014-08-19 0
4 ABEE01_2014-08-20 ABEE01 2014-08-20 0
5 ABEE01_2014-08-21 ABEE01 2014-08-21 0
6 ABEE01_2014-08-22 ABEE01 2014-08-22 0
I am trying to generate:
uinq_id Product_ID Date_of_order count weekday
1 Aarkios04_2014-09-09 Aarkios04 2014-09-09 10 Tues
2 ABEE01_2014-08-18 ABEE01 2014-08-18 1 Mon
3 ABEE01_2014-08-19 ABEE01 2014-08-19 0 Tues
4 ABEE01_2014-08-20 ABEE01 2014-08-20 0 Wed
5 ABEE01_2014-08-21 ABEE01 2014-08-21 0 Thurs
6 ABEE01_2014-08-22 ABEE01 2014-08-22 0 Fri
Any help would be highly appreciated.
Thank you.
Using weekdays from base R you can do this for a vector all at once:
temp = data.frame(timestamp = Sys.Date() + 1:20)
> head(temp)
timestamp
1 2016-06-01
2 2016-06-02
3 2016-06-03
4 2016-06-04
5 2016-06-05
6 2016-06-06
temp$weekday = weekdays(temp$timestamp)
> head(temp)
timestamp weekday
1 2016-06-01 Wednesday
2 2016-06-02 Thursday
3 2016-06-03 Friday
4 2016-06-04 Saturday
5 2016-06-05 Sunday
6 2016-06-06 Monday
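Applied to a frame like the one in the question, a minimal sketch (assuming Date_of_order is stored as character in "YYYY-MM-DD" form) would be:
df1$weekday <- weekdays(as.Date(df1$Date_of_order))                          # "Monday", "Tuesday", ...
df1$weekday_abbr <- weekdays(as.Date(df1$Date_of_order), abbreviate = TRUE)  # "Mon", "Tue", ...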
We can use format to get the output
df1$weekday <- format(as.Date(df1$Date_of_order), "%a")
df1$weekday
#[1] "Tue" "Mon" "Tue" "Wed" "Thu" "Fri"
According to ?strptime
%a - Abbreviated weekday name in the current locale on this platform.
(Also matches full name on input: in some locales there are no
abbreviations of names.)
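As a quick illustrative sketch, %a and %A differ only in abbreviation (the exact names depend on your locale):
d <- as.Date("2014-08-18")
format(d, "%a")  # abbreviated weekday name, e.g. "Mon"
format(d, "%A")  # full weekday name, e.g. "Monday"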
library(lubridate)
date <- as.Date(yourdata$Date_of_order, format = "%Y-%m-%d")
yourdata$WeekDay <- weekdays(date)
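Since lubridate is loaded here anyway, a minimal lubridate-only sketch would use wday(); with label = TRUE it returns an ordered factor of day names (the exact labels depend on the lubridate version and locale):
library(lubridate)
yourdata$WeekDay <- wday(as.Date(yourdata$Date_of_order), label = TRUE)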
I have a dataset with a date column like this:
id date
1  2018-02-13
2  2019-02-28
3  2014-01-23
4  2021-10-28
5  2022-01-23
6  2019-09-28
I want to select the rows whose date falls before the 15th of February, regardless of the year.
I tried an ifelse statement to define 15th February for each year in my database, but I want to know if there is an easier way to do so!
Here's the result I'm expecting:
id date
1  2018-02-13
3  2014-01-23
5  2022-01-23
Thanks in advance
You could use lubridate
library(lubridate)
library(dplyr)
d %>% filter(month(date)<2 | (month(date)==2 & day(date)<=15))
Output:
id date
1 1 2018-02-13
2 3 2014-01-23
3 5 2022-01-23
This option uses format() to check whether the month/day falls between January 1st and February 15th; note that between() comes from dplyr or data.table rather than base R.
df[between(format(df$date, "%m%d"), "0101", "0215"),]
Output
id date
1 1 2018-02-13
3 3 2014-01-23
5 5 2022-01-23
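A strictly base R sketch of the same idea (assuming df$date is already of class Date): every "mmdd" string between "0101" and "0215" sorts at or below "0215", so a single comparison is enough.
df[format(df$date, "%m%d") <= "0215", ]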
Let's say I have a dataframe of timestamps with the corresponding number of tickets sold at that time.
Timestamp ticket_count
(time) (int)
1 2016-01-01 05:30:00 1
2 2016-01-01 05:32:00 1
3 2016-01-01 05:38:00 1
4 2016-01-01 05:46:00 1
5 2016-01-01 05:47:00 1
6 2016-01-01 06:07:00 1
7 2016-01-01 06:13:00 2
8 2016-01-01 06:21:00 1
9 2016-01-01 06:22:00 1
10 2016-01-01 06:25:00 1
I want to know how to calculate the number of tickets sold within a certain time frame relative to each ticket. For example, for each row I want to count all tickets sold up to 15 minutes after that row's timestamp. In this case, the first row would have three tickets, the second row four tickets, and so on.
Ideally, I'm looking for a dplyr solution, as I want to do this for multiple stores with a group_by() function. However, I'm having a little trouble figuring out how to hold each Timestamp fixed for a given row while simultaneously searching through all Timestamps via dplyr syntax.
In the current development version of data.table, v1.9.7, non-equi joins are implemented. Assuming your data.frame is called df and the Timestamp column is POSIXct type:
require(data.table) # v1.9.7+
window = 15L # minutes
(counts = setDT(df)[.(t=Timestamp+window*60L), on=.(Timestamp<t),
.(counts=sum(ticket_count)), by=.EACHI]$counts)
# [1] 3 4 5 5 5 9 11 11 11 11
# add that as a column to original data.table by reference
df[, counts := counts]
For each row of t, all rows where df$Timestamp < that row's value are fetched, and by=.EACHI instructs the expression sum(ticket_count) to be evaluated for each row of t. That gives the desired result.
Hope this helps.
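If you would rather avoid the development version of data.table, a slower but dependency-free sketch of the same cumulative count (assuming df$Timestamp is POSIXct) is:
window <- 15L  # minutes
df$counts <- sapply(df$Timestamp, function(t) {
  # count every ticket sold strictly before t + 15 minutes
  sum(df$ticket_count[df$Timestamp < t + window * 60])
})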
This is a simpler version of the ugly one I wrote earlier:
# install.packages('dplyr')
library(dplyr)
your_data %>%
  mutate(timestamp = as.POSIXct(timestamp, format = '%m/%d/%Y %H:%M'),
         ticket_count = as.numeric(ticket_count)) %>%
  mutate(window = cut(timestamp, '15 min')) %>%
  group_by(window) %>%
  dplyr::summarise(tickets = sum(ticket_count))
window tickets
(fctr) (dbl)
1 2016-01-01 05:30:00 3
2 2016-01-01 05:45:00 2
3 2016-01-01 06:00:00 3
4 2016-01-01 06:15:00 3
Here is a solution using data.table that also incorporates different stores.
Example data:
library(data.table)
dt <- data.table(Timestamp = as.POSIXct("2016-01-01 05:30:00") + seq(60, 120000, by = 60),
                 ticket_count = sample(1:9, 2000, TRUE),
                 store = rep(c("A", "B", "C", "D"), 500))
Now apply the following:
ts <- dt$Timestamp
for (x in ts) {
  end <- x + 900
  dt[Timestamp <= end & Timestamp >= x, CS := sum(ticket_count), by = store]
}
This gives you
Timestamp ticket_count store CS
1: 2016-01-01 05:31:00 3 A 13
2: 2016-01-01 05:32:00 5 B 20
3: 2016-01-01 05:33:00 3 C 19
4: 2016-01-01 05:34:00 7 D 12
5: 2016-01-01 05:35:00 1 A 15
---
1996: 2016-01-02 14:46:00 4 D 10
1997: 2016-01-02 14:47:00 9 A 9
1998: 2016-01-02 14:48:00 2 B 2
1999: 2016-01-02 14:49:00 2 C 2
2000: 2016-01-02 14:50:00 6 D 6
I have a time variable: "00:00:29", "00:06:39", "20:43:15", ...
and I want to recode it into a new vector of time-based work shifts:
07:00:00 - 13:00:00 - 1
13:00:00 - 20:00:00 - 2
23:00:00 - 07:00:00 - 3
Thanks for any idea :)
Assuming the time variables are strings as shown, this seems to work:
secNr <- function(x){ sum(as.numeric(unlist(strsplit(x, ":", fixed = TRUE))) * c(3600, 60, 1)) }
workShift <- function(x)
{
  n <- which.max(secNr(x) >= c(secNr("23:00:00"), secNr("20:00:00"), secNr("13:00:00"),
                               secNr("07:00:00"), secNr("00:00:00")))
  c(3, NA, 2, 1, 3)[n]
}
"workShift" computes the work shift of one such time string. If you have a vector of time strings, use "sapply". Example:
> Time <- sprintf("%i:%02i:00", 0:23, sample(0:59,24))
> Shift <- sapply(Time,"workShift")
> Shift
0:37:00 1:17:00 2:35:00 3:09:00 4:08:00 5:28:00 6:03:00 7:43:00 8:27:00 9:38:00 10:48:00 11:50:00 12:58:00 13:32:00 14:05:00 15:39:00 16:56:00
3 3 3 3 3 3 3 1 1 1 1 1 1 2 2 2 2
17:00:00 18:22:00 19:02:00 20:42:00 21:11:00 22:15:00 23:01:00
2 2 2 NA NA NA 3
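A vectorized alternative sketch, assuming the times are "HH:MM:SS" strings and using the same boundaries as above (hours 20:00-22:59 stay NA because the question does not assign them a shift):
hrs <- as.integer(sub(":.*", "", Time))          # hour component of each time string
shift <- ifelse(hrs >= 7  & hrs < 13, 1L,
         ifelse(hrs >= 13 & hrs < 20, 2L,
         ifelse(hrs >= 23 | hrs < 7,  3L, NA_integer_)))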
I am quite new to R and have been struggling to convert my data, so I could use some help.
I have a data frame of approximately 70,000 rows by 2 columns. The data cover a whole year (52 weeks / 365 days). A portion of it looks like this:
Create.Date.Time Ticket.ID
1 2013-06-01 12:59:00 INCIDENT684790
2 2013-06-02 07:56:00 SERVICE684793
3 2013-06-02 09:39:00 SERVICE684794
4 2013-06-02 14:14:00 SERVICE684796
5 2013-06-02 17:20:00 SERVICE684797
6 2013-06-03 07:20:00 SERVICE684799
7 2013-06-03 08:02:00 SERVICE684839
8 2013-06-03 08:04:00 SERVICE684841
9 2013-06-03 08:04:00 SERVICE684842
10 2013-06-03 08:08:00 SERVICE684843
I am trying to get the number of tickets in every hour of the week (that is, hour 1 to hour 168) for each week. Hour 1 would start on Monday at 00.00, and hour 168 would be Sunday 23.00-23.59. This would be repeated for each week. I want to use the Create.Date.Time data to calculate the hour of the week the ticket is in, say for:
2013-06-01 12:59:00 INCIDENT684790 - hour 133,
2013-06-03 08:08:00 SERVICE684843 - hour 9
I am then going to compute averages for each hour and plot those. I am completely at a loss as to where to start. Could someone please point me in the right direction?
Before addressing the plotting aspect of your question, is this the format of data you are trying to get? This uses the package lubridate which you might have to install (install.packages("lubridate",dependencies=TRUE)).
library(lubridate)
##
Events <- paste(
  sample(c("INCIDENT", "SERVICE"), 20000, replace = TRUE),
  sample(600000:900000, 20000)
)
t0 <- as.POSIXct(
  "2013-01-01 00:00:00",
  format = "%Y-%m-%d %H:%M:%S",
  tz = "America/New_York")
Dates <- sort(t0 + sample(0:(3600*24*365 - 1), 20000))
Weeks <- week(Dates)
wDay <- wday(Dates, label = TRUE)
Hour <- hour(Dates)
##
hourShift <- function(time, wday){
  hShift <- sapply(wday, function(X){
    if(X == "Mon"){
      0
    } else if(X == "Tues"){
      24*1
    } else if(X == "Wed"){
      24*2
    } else if(X == "Thurs"){
      24*3
    } else if(X == "Fri"){
      24*4
    } else if(X == "Sat"){
      24*5
    } else {
      24*6
    }
  })
  ##
  tOut <- hour(time) + hShift + 1
  return(tOut)
}
##
weekHour <- hourShift(time=Dates,wday=wDay)
##
Data <- data.frame(
  Event = Events,
  Timestamp = Dates,
  Week = Weeks,
  wDay = wDay,
  dayHour = Hour,
  weekHour = weekHour,
  stringsAsFactors = FALSE)
##
This gives you:
> head(Data)
Event Timestamp Week wDay dayHour weekHour
1 SERVICE 783405 2013-01-01 00:13:55 1 Tues 0 25
2 INCIDENT 860015 2013-01-01 01:06:41 1 Tues 1 26
3 INCIDENT 808309 2013-01-01 01:10:05 1 Tues 1 26
4 INCIDENT 835509 2013-01-01 01:21:44 1 Tues 1 26
5 SERVICE 769239 2013-01-01 02:04:59 1 Tues 2 27
6 SERVICE 762269 2013-01-01 02:07:41 1 Tues 2 27
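With a lubridate version that supports the week_start argument of wday(), the same week-hour can be computed more compactly (a sketch, not part of the original answer):
Data$weekHour2 <- (wday(Data$Timestamp, week_start = 1) - 1) * 24 +
  hour(Data$Timestamp) + 1   # Monday 00:xx -> 1, Sunday 23:xx -> 168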
Dear all, I have a data frame which comes directly from a sensor. The data provide the date and time in a single column. I want R to recognise these data and create an adjacent column in the data frame that gives a number corresponding to each new day in the date-time column. For example, 25/02/2011 13:34 in data$time.date would give 1 in the new column data$day, 26/02/2011 13:34 in data$time.date would give 2, and so on.
Does anyone know how to go about solving this? Thanks in advance for any help.
You can use cut() and convert to numeric the factor resulting from that call. Here is an example with dummy data:
> sdate <- as.POSIXlt("25/02/2011 13:34", format = "%d/%m/%Y %H:%M")
> edate <- as.POSIXlt("02/03/2011 13:34", format = "%d/%m/%Y %H:%M")
> df <- data.frame(dt = seq(from = sdate, to = edate, by = "hours"))
> head(df)
dt
1 2011-02-25 13:34:00
2 2011-02-25 14:34:00
3 2011-02-25 15:34:00
4 2011-02-25 16:34:00
5 2011-02-25 17:34:00
6 2011-02-25 18:34:00
We cut the date-time column into days using cut(). With labels = FALSE the intervals are returned directly as integer codes, giving 1, 2, ...:
> df <- within(df, day <- cut(dt, "day", labels = FALSE))
> head(df, 13)
dt day
1 2011-02-25 13:34:00 1
2 2011-02-25 14:34:00 1
3 2011-02-25 15:34:00 1
4 2011-02-25 16:34:00 1
5 2011-02-25 17:34:00 1
6 2011-02-25 18:34:00 1
7 2011-02-25 19:34:00 1
8 2011-02-25 20:34:00 1
9 2011-02-25 21:34:00 1
10 2011-02-25 22:34:00 1
11 2011-02-25 23:34:00 1
12 2011-02-26 00:34:00 2
13 2011-02-26 01:34:00 2
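Applied to the asker's column, a minimal sketch (assuming data$time.date holds strings like "25/02/2011 13:34") would be:
data$time.date <- as.POSIXct(data$time.date, format = "%d/%m/%Y %H:%M")
data$day <- cut(data$time.date, "day", labels = FALSE)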
You can achieve this using cut.POSIXt. For example:
dat <- data.frame(datetimes = Sys.time() - seq(360000, 0, by=-3600))
dat$day <- cut(dat$datetimes, breaks="day", labels=FALSE)
Note that this assumes your date-time column is correctly formatted as a date-time class.
See ?DateTimeClasses for details.
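An equivalent sketch without cut(), numbering each calendar day relative to the earliest day in the data (assumes dat$datetimes is POSIXct; tz = "" keeps the local time zone so the day boundaries match):
d <- as.Date(dat$datetimes, tz = "")
dat$day <- as.integer(d - min(d)) + 1L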