Splitting created sequence of dates into separate columns - r

I created a dataframe (Dates) of dates/times every 3 hours from 1981-2010 as follows:
# Create dates and times
start <- as.POSIXct("1981-01-01")
interval <- 60
end <- start + as.difftime(10957, units="days")
Dates = data.frame(seq(from=start, by=interval*180, to=end))
colnames(Dates) = "Date"
I now want to split the data into four separate columns with year, month, day and hour. I tried to split the dates using the following code:
Date.split = strsplit(Dates, "-| ")
But I get the following error:
Error in strsplit(Dates, "-| ") : non-character argument
If I try to convert the Dates data to characters then it completely changes the dates, e.g.
Dates.char = as.character(Dates)
gives the following output:
Dates.char Large Character (993.5 kB)
chr "c(347155200, 347166000 ...
I'm getting lost with the conversion between character and numeric and don't know where to go from here. Any insights much appreciated.

One way is to use format.
head(
setNames(
cbind(Dates,
format(Dates, "%Y"), format(Dates, "%m"), format(Dates, "%d"),
format(Dates, "%H")),
c("dates", "year", "month", "day", "hour"))
)
dates year month day hour
1 1981-01-01 00:00:00 1981 01 01 00
2 1981-01-01 03:00:00 1981 01 01 03
3 1981-01-01 06:00:00 1981 01 01 06
4 1981-01-01 09:00:00 1981 01 01 09
5 1981-01-01 12:00:00 1981 01 01 12
6 1981-01-01 15:00:00 1981 01 01 15
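Note that format returns character columns. If numeric columns are wanted, one possible sketch is to wrap each call in as.integer. The UTC time zone and the shortened 10-row version of the Dates frame are assumptions made here for illustration only:

```r
# Shortened version of the question's data: 10 timestamps, 3 hours apart (UTC assumed)
start <- as.POSIXct("1981-01-01", tz = "UTC")
Dates <- data.frame(Date = seq(from = start, by = "3 hours", length.out = 10))

# format() gives character strings; as.integer() turns them into numbers
Dates$year  <- as.integer(format(Dates$Date, "%Y", tz = "UTC"))
Dates$month <- as.integer(format(Dates$Date, "%m", tz = "UTC"))
Dates$day   <- as.integer(format(Dates$Date, "%d", tz = "UTC"))
Dates$hour  <- as.integer(format(Dates$Date, "%H", tz = "UTC"))

print(head(Dates, 3))
```

Passing tz explicitly keeps the extracted hour independent of the machine's local time zone.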

A very concise way is to decompose the POSIXlt record:
Dates = cbind(Dates, do.call(rbind, lapply(Dates$Date, as.POSIXlt)))
or
Dates <- data.frame(Dates, unclass(as.POSIXlt(Dates$Date)))
It will return some additional data, however, which you can filter further:
# Date sec min hour mday mon year wday yday isdst zone # gmtoff
# 1 1981-01-01 00:00:00 0 0 0 1 0 81 4 0 0 -03 # -10800
# 2 1981-01-01 03:00:00 0 0 3 1 0 81 4 0 0 -03 # -10800
# 3 1981-01-01 06:00:00 0 0 6 1 0 81 4 0 0 -03 # -10800
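As the output above shows, the POSIXlt fields are offsets: year counts from 1900 and mon counts from 0. A minimal sketch of keeping only the four wanted fields and correcting those offsets follows. The two sample timestamps and the UTC time zone are assumptions for illustration:

```r
# Two sample timestamps (UTC assumed), standing in for the full Dates frame
Dates <- data.frame(
  Date = as.POSIXct(c("1981-01-01 00:00:00", "1981-12-31 21:00:00"), tz = "UTC")
)

lt <- as.POSIXlt(Dates$Date, tz = "UTC")

# Keep only the fields of interest and undo the POSIXlt offsets
parts <- data.frame(
  year  = lt$year + 1900L,  # stored as years since 1900
  month = lt$mon + 1L,      # stored as 0-11
  day   = lt$mday,
  hour  = lt$hour
)
Dates <- cbind(Dates, parts)
print(Dates)
```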

Related

convert day-number within the year to month/day format

I am trying to convert a day-number within the year to month/day format.
With this df:
set.seed(123)
df1 <- data.frame(Year = rep(15,100), DayNum = seq(78,177,1), Hour = sample(0:23,100,replace = T))
df2 <- data.frame(Year = rep(16,100), DayNum = seq(78,177,1), Hour = sample(0:23,100,replace = T))
df <- rbind(df1, df2)
> head(df)
Year DayNum Hour
1 15 78 6
2 15 79 18
3 15 80 9
4 15 81 21
5 15 82 22
6 15 83 1
> tail(df)
Year DayNum Hour
195 16 172 22
196 16 173 11
197 16 174 9
198 16 175 15
199 16 176 3
200 16 177 13
which has 100 records for 2015 and 2016, how can I make a POSIXct date/time column?
While there are a number of related posts with a Julian date from a beginning origin (usually 1970-01-01), I could not find any posts with a day-number within a year and with a variable year (i.e. 2015 and 2016).
The as.POSIXct function has an option to specify the origin date when converting from a numeric offset to a date/time object. The offset is interpreted as seconds, so convert the day number (and hour) to seconds first:
#calculate the origin date based on the year column
df$origin<-as.Date(paste0("20", df$Year,"-01-01"))
#seconds past 1 January of that year (day 1 is the origin itself, hence the -1)
df$secs<-(df$DayNum-1)*86400 + df$Hour*3600
#convert the offset to a date/time object
as.POSIXct(df$secs, origin=df$origin)
One may need to consider adding the timezone option for completeness:
as.POSIXct(df$secs, origin=df$origin, tz="GMT")
You might need something like this; use %j to specify the day of the year:
strptime(with(df, paste(Year, DayNum, Hour)), "%y %j %H")
# [1] "2015-03-19 06:00:00 EDT"
# [2] "2015-03-20 18:00:00 EDT"
# [3] "2015-03-21 09:00:00 EDT"
# [4] "2015-03-22 21:00:00 EDT"
# [5] "2015-03-23 22:00:00 EDT"
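For comparison, here is a sketch of the same conversion done with plain Date arithmetic rather than a format string. It assumes two-digit years in the 2000s and a UTC time zone, and uses a two-row stand-in for df:

```r
# Two-row stand-in for the question's df (2015 and the leap year 2016)
df <- data.frame(Year = c(15, 16), DayNum = c(78, 78), Hour = c(6, 18))

# 1 January of each row's year (assumes years 2000-2099)
jan1 <- as.Date(paste0("20", df$Year, "-01-01"))

# Day number 1 is 1 January itself, so subtract 1 before adding
day <- jan1 + (df$DayNum - 1)

# Combine the date and the hour into a POSIXct timestamp (UTC assumed)
df$stamp <- as.POSIXct(paste(format(day), df$Hour),
                       format = "%Y-%m-%d %H", tz = "UTC")
print(df)
```

Because 2016 is a leap year, day 78 falls on 18 March there rather than 19 March.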

extract weekdays from a set of dates in R

I know that with the lubridate package I can generate the weekday for a given date. I am now dealing with a large dataset with many date entries, and I want to extract the weekday for each of them. Looking up each date by hand is not feasible. I would love a function that takes the date column of my data frame and returns the weekday corresponding to each date.
My data frame looks like this:
uinq_id Product_ID Date_of_order count
1 Aarkios04_2014-09-09 Aarkios04 2014-09-09 10
2 ABEE01_2014-08-18 ABEE01 2014-08-18 1
3 ABEE01_2014-08-19 ABEE01 2014-08-19 0
4 ABEE01_2014-08-20 ABEE01 2014-08-20 0
5 ABEE01_2014-08-21 ABEE01 2014-08-21 0
6 ABEE01_2014-08-22 ABEE01 2014-08-22 0
I am trying to generate:
uinq_id Product_ID Date_of_order count weekday
1 Aarkios04_2014-09-09 Aarkios04 2014-09-09 10 Tues
2 ABEE01_2014-08-18 ABEE01 2014-08-18 1 Mon
3 ABEE01_2014-08-19 ABEE01 2014-08-19 0 Tues
4 ABEE01_2014-08-20 ABEE01 2014-08-20 0 Wed
5 ABEE01_2014-08-21 ABEE01 2014-08-21 0 Thurs
6 ABEE01_2014-08-22 ABEE01 2014-08-22 0 Fri
Any help would be much appreciated.
Thank you.
Using weekdays from base R you can do this for a vector all at once:
temp = data.frame(timestamp = Sys.Date() + 1:20)
> head(temp)
timestamp
1 2016-06-01
2 2016-06-02
3 2016-06-03
4 2016-06-04
5 2016-06-05
6 2016-06-06
temp$weekday = weekdays(temp$timestamp)
> head(temp)
timestamp weekday
1 2016-06-01 Wednesday
2 2016-06-02 Thursday
3 2016-06-03 Friday
4 2016-06-04 Saturday
5 2016-06-05 Sunday
6 2016-06-06 Monday
We can use format to get the output
df1$weekday <- format(as.Date(df1$Date_of_order), "%a")
df1$weekday
#[1] "Tue" "Mon" "Tue" "Wed" "Thu" "Fri"
According to ?strptime
%a - Abbreviated weekday name in the current locale on this platform.
(Also matches full name on input: in some locales there are no
abbreviations of names.)
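# weekdays() is base R, so no package needs to be loaded
# the format must match the dashes in Date_of_order (e.g. 2014-09-09)
date <- as.Date(yourdata$Date_of_order, format = "%Y-%m-%d")
yourdata$WeekDay <- weekdays(date)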
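Both weekdays and format(..., "%a") return names in the current locale, so the labels can differ between machines. If the exact labels from the question ("Tues", "Thurs") are wanted regardless of locale, one possible sketch is to index a fixed vector by the numeric day of the week. The vector day_labels and the small orders frame are hypothetical names introduced here:

```r
# Small stand-in for the question's data frame
orders <- data.frame(
  Date_of_order = c("2014-09-09", "2014-08-18", "2014-08-19",
                    "2014-08-20", "2014-08-21", "2014-08-22"),
  stringsAsFactors = FALSE
)

# Fixed labels, ordered Sunday to Saturday to match POSIXlt's wday (0 = Sunday)
day_labels <- c("Sun", "Mon", "Tues", "Wed", "Thurs", "Fri", "Sat")

d <- as.Date(orders$Date_of_order)
orders$weekday <- day_labels[as.POSIXlt(d)$wday + 1]
print(orders)
```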

Produce weekly average plots from large dataset in R

I am quite new to R and have been struggling with trying to convert my data and could use some much needed help.
I have a dataframe which is approx. 70,000*2. This data covers a whole year (52 weeks/365 days). A portion of it looks like this:
Create.Date.Time Ticket.ID
1 2013-06-01 12:59:00 INCIDENT684790
2 2013-06-02 07:56:00 SERVICE684793
3 2013-06-02 09:39:00 SERVICE684794
4 2013-06-02 14:14:00 SERVICE684796
5 2013-06-02 17:20:00 SERVICE684797
6 2013-06-03 07:20:00 SERVICE684799
7 2013-06-03 08:02:00 SERVICE684839
8 2013-06-03 08:04:00 SERVICE684841
9 2013-06-03 08:04:00 SERVICE684842
10 2013-06-03 08:08:00 SERVICE684843
I am trying to get the number of tickets in every hour of the week (that is, hour 1 to hour 168) for each week. Hour 1 would start on Monday at 00.00, and hour 168 would be Sunday 23.00-23.59. This would be repeated for each week. I want to use the Create.Date.Time data to calculate the hour of the week the ticket is in, say for:
2013-06-01 12:59:00 INCIDENT684790 - hour 133,
2013-06-03 08:08:00 SERVICE684843 - hour 9
I am then going to do averages for each hour and plot those. I am completely at a loss as to where to start. Could someone please point me to the right direction?
Before addressing the plotting aspect of your question, is this the format of data you are trying to get? This uses the package lubridate which you might have to install (install.packages("lubridate",dependencies=TRUE)).
library(lubridate)
##
Events <- paste(
sample(c("INCIDENT","SERVICE"),20000,replace=TRUE),
sample(600000:900000,20000)
)
t0 <- as.POSIXct(
"2013-01-01 00:00:00",
format="%Y-%m-%d %H:%M:%S",
tz="America/New_York")
Dates <- sort(t0 + sample(0:(3600*24*365-1),20000))
Weeks <- week(Dates)
wDay <- wday(Dates,label=TRUE)
Hour <- hour(Dates)
##
hourShift <- function(time,wday){
hShift <- sapply(wday, function(X){
if(X=="Mon"){
0
} else if(X=="Tues"){
24*1
} else if(X=="Wed"){
24*2
} else if(X=="Thurs"){
24*3
} else if(X=="Fri"){
24*4
} else if(X=="Sat"){
24*5
} else {
24*6
}
})
##
tOut <- hour(time) + hShift + 1
return(tOut)
}
##
weekHour <- hourShift(time=Dates,wday=wDay)
##
Data <- data.frame(
Event=Events,
Timestamp=Dates,
Week=Weeks,
wDay=wDay,
dayHour=Hour,
weekHour=weekHour,
stringsAsFactors=FALSE)
##
This gives you:
> head(Data)
Event Timestamp Week wDay dayHour weekHour
1 SERVICE 783405 2013-01-01 00:13:55 1 Tues 0 25
2 INCIDENT 860015 2013-01-01 01:06:41 1 Tues 1 26
3 INCIDENT 808309 2013-01-01 01:10:05 1 Tues 1 26
4 INCIDENT 835509 2013-01-01 01:21:44 1 Tues 1 26
5 SERVICE 769239 2013-01-01 02:04:59 1 Tues 2 27
6 SERVICE 762269 2013-01-01 02:07:41 1 Tues 2 27
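The hourShift function above compares against weekday labels, and those can vary with the lubridate version or locale. A sketch of the same hour-of-week calculation using only the numeric weekday from base R follows. The function name week_hour and the UTC time zone are assumptions for illustration:

```r
# Hour of the week, 1-168, with hour 1 starting Monday 00:00
week_hour <- function(x) {
  lt <- as.POSIXlt(x)
  # POSIXlt wday is 0 = Sunday ... 6 = Saturday; shift so that Monday = 0
  days_since_monday <- (lt$wday + 6L) %% 7L
  days_since_monday * 24L + lt$hour + 1L
}

# The two examples from the question (UTC assumed)
stamps <- as.POSIXct(c("2013-06-01 12:59:00", "2013-06-03 08:08:00"), tz = "UTC")
print(week_hour(stamps))
```

For the two timestamps from the question this gives 133 and 9, matching the values the asker expected.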

create timestamp from dataframe columns and use as xaxis in plot

I've got a dataframe with the date, hour, and minute in columns. I would like to plot the value column by some sort of timestamp. Is there any way to do this?
> head(fl)
date hour min value
1 2014-02-23 0 0 81
2 2014-02-23 0 1 65
3 2014-02-23 0 2 73
4 2014-02-23 0 3 81
5 2014-02-23 0 4 89
6 2014-02-23 0 5 69
...
Right now I'm using ggplot2, but it combines the minutes of every hour and day together :(
ggplot( fl, aes( min, value) ) + geom_line()
Any thoughts?
An as.POSIXct alternative giving the same result as @RobertKrzyzanowski
fl <- data.frame(date=c('2014-02-23', '2014-02-22'), hour = c(0,0), min = c(1,2))
fl$stamp <- with(fl, as.POSIXct( paste(date,hour,min), format="%Y-%m-%d %H %M"))
#> fl
# date hour min stamp
#1 2014-02-23 0 1 2014-02-23 00:01:00
#2 2014-02-22 0 2 2014-02-22 00:02:00
Try the function ISOdatetime(Year, Month, Day, Hour, Min, Sec):
fl <- data.frame(date=c('2014-02-23', '2014-02-22'), hour = c(0,0), min = c(1,2))
zip <- function(x) do.call(Map, append(list(c), x))
args <- unname(append(zip(strsplit(as.character(fl$date), '-')), list(fl$hour, fl$min, 0)))
fl$timestamp <- do.call(ISOdatetime, args)
print(fl)
# date hour min timestamp
# 1 2014-02-23 0 1 2014-02-23 00:01:00
# 2 2014-02-22 0 2 2014-02-22 00:02:00
library(lubridate)
fl$datetime <- ymd_hm(paste(fl$date,fl$hour,fl$min,sep='-'))
ggplot(fl, aes( datetime, value) ) + geom_line()
df<- within(df[5:6], { DT=format(as.POSIXct(paste(df$date, df$time, sep = ' ')),
"%Y-%m-%d %H:%M:%S")
})
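Another option that avoids string parsing of the hour and minute is to add them to the date as seconds. This is only a sketch: the UTC time zone and the sample rows are assumptions, and the resulting stamp column can then be used as the x aesthetic in the plot:

```r
# Small stand-in for fl
fl <- data.frame(
  date  = c("2014-02-23", "2014-02-23"),
  hour  = c(0, 1),
  min   = c(1, 30),
  value = c(65, 70),
  stringsAsFactors = FALSE
)

# Midnight of each date (UTC assumed), plus the hour and minute in seconds
fl$stamp <- as.POSIXct(fl$date, tz = "UTC") + fl$hour * 3600 + fl$min * 60
print(fl)
```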

How to subset data.frame by weeks and then sum?

Let's say I have several years worth of data which look like the following
# load date package and set random seed
library(lubridate)
set.seed(42)
# create data.frame of dates and income
date <- seq(dmy("26-12-2010"), dmy("15-01-2011"), by = "days")
df <- data.frame(date = date,
wday = wday(date),
wday.name = wday(date, label = TRUE, abbr = TRUE),
income = round(runif(21, 0, 100)),
week = format(date, format="%Y-%U"),
stringsAsFactors = FALSE)
# date wday wday.name income week
# 1 2010-12-26 1 Sun 91 2010-52
# 2 2010-12-27 2 Mon 94 2010-52
# 3 2010-12-28 3 Tues 29 2010-52
# 4 2010-12-29 4 Wed 83 2010-52
# 5 2010-12-30 5 Thurs 64 2010-52
# 6 2010-12-31 6 Fri 52 2010-52
# 7 2011-01-01 7 Sat 74 2011-00
# 8 2011-01-02 1 Sun 13 2011-01
# 9 2011-01-03 2 Mon 66 2011-01
# 10 2011-01-04 3 Tues 71 2011-01
# 11 2011-01-05 4 Wed 46 2011-01
# 12 2011-01-06 5 Thurs 72 2011-01
# 13 2011-01-07 6 Fri 93 2011-01
# 14 2011-01-08 7 Sat 26 2011-01
# 15 2011-01-09 1 Sun 46 2011-02
# 16 2011-01-10 2 Mon 94 2011-02
# 17 2011-01-11 3 Tues 98 2011-02
# 18 2011-01-12 4 Wed 12 2011-02
# 19 2011-01-13 5 Thurs 47 2011-02
# 20 2011-01-14 6 Fri 56 2011-02
# 21 2011-01-15 7 Sat 90 2011-02
I would like to sum 'income' for each week (Sunday thru Saturday). Currently I do the following:
Weekending 2011-01-01 = sum(df$income[1:7]) = 487
Weekending 2011-01-08 = sum(df$income[8:14]) = 387
Weekending 2011-01-15 = sum(df$income[15:21]) = 443
However I would like a more robust approach which will automatically sum by week. I can't work out how to automatically subset the data into weeks. Any help would be much appreciated.
First use format to convert your dates to week numbers, then plyr::ddply() to calculate the summaries:
library(plyr)
df$week <- format(df$date, format="%Y-%U")
ddply(df, .(week), summarize, income=sum(income))
week income
1 2011-52 413
2 2012-01 435
3 2012-02 379
For more information on format.Date, see ?strptime, particularly the bit that defines %U as the week number.
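To see how %U behaves around a year boundary, here is a short sketch using three dates taken from the question's table. %U counts weeks starting on Sunday, and the days before the first Sunday of a year fall in week 00:

```r
# Friday, Saturday and Sunday around the 2010/2011 boundary
d <- as.Date(c("2010-12-31", "2011-01-01", "2011-01-02"))

# %U: week of the year (00-53), with Sunday as the first day of the week
wk <- format(d, "%Y-%U")
print(wk)
```

This is why Saturday 2011-01-01 ends up in its own group ("2011-00") instead of with the rest of its Sunday-to-Saturday week.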
EDIT:
Given the modified data and requirement, one way is to divide the date by 7 to get a number indicating the week. (Or more precisely, divide by the number of seconds in a week to get the number of weeks since the epoch, which is 1970-01-01 by default.)
In code:
df$week <- as.Date("1970-01-01")+7*trunc(as.numeric(df$date)/(3600*24*7))
library(plyr)
ddply(df, .(week), summarize, income=sum(income))
week income
1 2010-12-23 298
2 2010-12-30 392
3 2011-01-06 294
4 2011-01-13 152
I have not checked that the week boundaries are on Sunday. You will have to check this, and insert an appropriate offset into the formula.
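One way to work out such an offset, as a sketch only: 1970-01-01 was a Thursday, that is, 4 days after a Sunday. Assuming the dates are of class Date (so that as.numeric gives days since the epoch), the Sunday that starts each week can be computed as follows:

```r
# A few dates from the question (class Date assumed)
d <- as.Date(c("2010-12-26", "2011-01-01", "2011-01-02", "2011-01-15"))

# Days since the most recent Sunday: the epoch (day 0) was a Thursday,
# so adding 4 before taking the remainder aligns the weeks on Sunday
days_past_sunday <- (as.numeric(d) + 4) %% 7

# Sunday on or before each date
week_start <- d - days_past_sunday
print(week_start)
```

Grouping by week_start then gives Sunday-to-Saturday weeks.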
This is now simple using dplyr. Also I would suggest using cut(breaks = "week") rather than format() to cut the dates into weeks.
library(dplyr)
df %>% group_by(week = cut(date, "week")) %>% mutate(weekly_income = sum(income))
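One caveat: cut on dates starts weeks on Monday by default, while the question asks for Sunday to Saturday. A base R sketch using the start.on.monday argument of cut follows, with the income values copied from the question's table. The column name week_start is an assumption:

```r
# Dates and income values as printed in the question
df <- data.frame(
  date   = seq(as.Date("2010-12-26"), as.Date("2011-01-15"), by = "day"),
  income = c(91, 94, 29, 83, 64, 52, 74,
             13, 66, 71, 46, 72, 93, 26,
             46, 94, 98, 12, 47, 56, 90)
)

# Label each date with the Sunday that starts its week
df$week_start <- cut(df$date, breaks = "week", start.on.monday = FALSE)

# One row per week with the summed income
weekly <- aggregate(income ~ week_start, data = df, FUN = sum)
print(weekly)
```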
I Googled "group week days into weeks R" and came across this SO question. You mention you have multiple years, so I think we need to keep track of both the week number and the year, so I modified the answers there to use format(date, format = "%y%U").
In use it looks like this:
library(plyr) #for aggregating
df <- transform(df, weeknum = format(date, format = "%y%U"))
ddply(df, "weeknum", summarize, suminc = sum(income))
#----
weeknum suminc
1 1152 413
2 1201 435
3 1202 379
See ?strptime for all the format abbreviations.
Try rollapply from the zoo package:
rollapply(df$income, width=7, FUN = sum, by = 7)
# [1] 487 387 443
Or, use period.sum from the xts package:
period.sum(xts(df$income, order.by=df$date), which(df$wday %in% 7))
# [,1]
# 2011-01-01 487
# 2011-01-08 387
# 2011-01-15 443
Or, to get the output in the format you want:
data.frame(income = period.sum(xts(df$income, order.by=df$date),
which(df$wday %in% 7)),
week = df$week[which(df$wday %in% 7)])
# income week
# 2011-01-01 487 2011-00
# 2011-01-08 387 2011-01
# 2011-01-15 443 2011-02
Note that the first week shows as 2011-00 because that's how it is entered in your data. You could also use week = df$week[which(df$wday %in% 1)] which would match your output.
This solution is influenced by @Andrie and @Chase.
# load plyr
library(plyr)
# format weeks as per requirement (replace "00" with "52" and adjust corresponding year)
tmp <- list()
tmp$y <- format(df$date, format="%Y")
tmp$w <- format(df$date, format="%U")
tmp$y[tmp$w=="00"] <- as.character(as.numeric(tmp$y[tmp$w=="00"]) - 1)
tmp$w[tmp$w=="00"] <- "52"
df$week <- paste(tmp$y, tmp$w, sep = "-")
# get summary
df2 <- ddply(df, .(week), summarize, income=sum(income))
# include week ending date
tmp$week.ending <- lapply(df2$week, function(x) rev(df[df$week==x, "date"])[[1]])
df2$week.ending <- sapply(tmp$week.ending, as.character)
# week income week.ending
# 1 2010-52 487 2011-01-01
# 2 2011-01 387 2011-01-08
# 3 2011-02 443 2011-01-15
df.index = df['week'] # pandas (Python): set the datetime variable as the index
df.resample('W').sum() #sum using resample
With dplyr:
df %>%
arrange(date) %>%
mutate(week = as.numeric(date - date[1])%/%7) %>%
group_by(week) %>%
summarise(weekincome= sum(income))
Instead of date[1], you can use any date from which you want the weekly grouping to start.
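The same 7-day block idea can be sketched in base R without dplyr. This assumes the dates are sorted, have no gaps, and start on the first day of a week. The names block and week.ending are chosen here for illustration:

```r
# Dates and income values as printed in the question
date   <- seq(as.Date("2010-12-26"), as.Date("2011-01-15"), by = "day")
income <- c(91, 94, 29, 83, 64, 52, 74,
            13, 66, 71, 46, 72, 93, 26,
            46, 94, 98, 12, 47, 56, 90)

# 0 for the first 7 days, 1 for the next 7, and so on
block <- as.numeric(date - date[1]) %/% 7

weekly <- data.frame(
  week.ending = date[1] + 7 * sort(unique(block)) + 6,  # last day of each block
  income      = as.vector(tapply(income, block, sum))
)
print(weekly)
```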
