I have a folder with several files where the name of each file is the respective userID. Something like this:
Time Sms
1 2012-01-01 00:00:00 10
2 2012-01-01 00:30:00 11
3 2012-01-01 01:00:00 13
4 2012-01-01 01:30:00 10
How can I aggregate by month, day of week, hour and minute? Something like this:
Month DayofWeek hour min SMS
1 Mon 0 0 14 <-mean
1 Mon 0 30 12
1 Mon 1 0 17
1 Mon 1 30 21
.............................
12 Sunday 23 30 12
I had a similar issue aggregating hourly data into daily data. This is the code that worked for me.
fun <- function(s,i,j) { sum(s[i:(i+j-1)]) }
radday<-sapply(X=seq(1,24*nb_of_days,24),FUN=fun,s=your_time_series,j=24)
This sums the data over blocks of length j. In my case j was 24, because I was summing hourly values into days. By changing j you can adapt it to other periods (hour, day, week, month), assuming the period has a constant length.
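As a rough illustration of the block-sum idea above, here is a small sketch on made-up data. The vector `hourly` and the name `block_sum` are invented for this example. It also shows a matrix-based alternative that should give the same totals, assuming the series length is an exact multiple of the period.

```r
# 48 made-up hourly values, i.e. two full days
hourly <- 1:48

# Sum j consecutive values of s, starting at position i
block_sum <- function(s, i, j) sum(s[i:(i + j - 1)])

# One start index per day: 1, 25, ...
starts <- seq(1, length(hourly), by = 24)
daily_sapply <- sapply(starts, block_sum, s = hourly, j = 24)

# Alternative: put each day in its own column and sum the columns
daily_matrix <- colSums(matrix(hourly, nrow = 24))

daily_sapply  # 300 876
```

The matrix version only works when every block has the same length, which is the same constant-period assumption the answer already makes.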
Thanks for the help. I solved my problem by applying this code:
df<-aggregate(Sms~month(Time)+weekdays(Time)+hour(Time)+minute(Time),df,FUN='mean')
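The helpers `month()`, `hour()` and `minute()` in the line above are not base R; they presumably come from a package such as lubridate. For readers without it, here is a minimal base-R sketch of the same grouping on made-up sample values. The column names `month`, `wday`, `hour` and `minute` are chosen for this example only.

```r
# Made-up sample: two Sundays, two half-hour slots each
df <- data.frame(
  Time = as.POSIXct(c("2012-01-01 00:00:00", "2012-01-01 00:30:00",
                      "2012-01-08 00:00:00", "2012-01-08 00:30:00"),
                    tz = "UTC"),
  Sms = c(10, 11, 14, 15)
)

# Derive the grouping columns with base R only
df$month  <- as.integer(format(df$Time, "%m"))
df$wday   <- as.POSIXlt(df$Time)$wday          # 0 = Sunday, locale-independent
df$hour   <- as.integer(format(df$Time, "%H"))
df$minute <- as.integer(format(df$Time, "%M"))

# Mean Sms for every month / weekday / hour / minute combination
agg <- aggregate(Sms ~ month + wday + hour + minute, data = df, FUN = mean)
agg
```

Using the numeric `wday` field rather than `weekdays()` avoids day names that change with the system locale.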
This probably seems straightforward, but I am pretty stumped.
I have a set of dates ~ August 1 of each year and need to sum sales by week number. The earliest date is 2008-12-08 (YYYY-MM-DD). I need to create a "week_id" field where week #1 begins on 2008-12-08, so that the date 2011-09-03 is week 142. Note that this differs from an ordinary week-of-year number because the count does not reset every year.
I am putting up a small example dataset here:
data <- data.frame(
dates = c("2008-12-08", "2009-08-10", "2010-03-31", "2011-10-16", "2008-06-03", "2009-11-14" , "2010-05-05", "2011-09-03"))
data$date = as.Date(data$dates)
Any help is appreciated
data$week_id = as.numeric(data$date - as.Date("2008-12-08")) %/% 7 + 1
This would take the day difference between the two dates and find the integer number of 7 days elapsed. I add one since we want the dates where zero weeks have elapsed since the start to be week 1 instead of week 0.
dates date week_id
1 2008-12-07 2008-12-07 0 # added for testing
2 2008-12-08 2008-12-08 1
3 2008-12-09 2008-12-09 1 # added for testing
4 2008-12-14 2008-12-14 1 # added for testing
5 2008-12-15 2008-12-15 2 # added for testing
6 2009-08-10 2009-08-10 36
7 2010-03-31 2010-03-31 69
8 2011-10-16 2011-10-16 149
9 2008-06-03 2008-06-03 -26
10 2009-11-14 2009-11-14 49
11 2010-05-05 2010-05-05 74
12 2011-09-03 2011-09-03 143
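Since the original goal was to sum sales by week, here is a minimal sketch of that last step. The `sales` column and its values are made up for illustration; only the `week_id` formula comes from the answer above.

```r
# Made-up sales figures attached to a few of the dates
data <- data.frame(
  date  = as.Date(c("2008-12-08", "2008-12-09", "2008-12-15", "2011-09-03")),
  sales = c(5, 7, 3, 4)
)

# Week 1 starts on the origin date; the count never resets at New Year
origin <- as.Date("2008-12-08")
data$week_id <- as.numeric(data$date - origin) %/% 7 + 1

# Total sales per running week
weekly <- aggregate(sales ~ week_id, data = data, FUN = sum)
weekly
```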
Hello, I am trying to find the week number for a series of dates spanning three years. However, R is not giving the week number I expect. I am generating a sequence of dates from 2016-04-01 to 2019-03-30 and then trying to calculate a week number that keeps counting across the three years, so that I get week numbers 54, 55, 56 and so on.
However, when I check 2016-04-03, R shows week number 14, whereas Excel gives week number 15. R also seems to simply count blocks of 7 days rather than following the actual calendar weeks. In addition, the week number starts again from 1 at the start of every year.
The code looks like this
library(lubridate)  # assumed source of month() and week()
days <- seq(as.Date("2016-04-01"),as.Date("2019-03-30"),'days')
weekdays <- data.frame('days'=days, Month = month(days), week = week(days),nweek = rep(1,length(days)))
This is how the results looks like
days week
2016-04-01 14
2016-04-02 14
2016-04-03 14
2016-04-04 14
2016-04-05 14
2016-04-06 14
2016-04-07 14
2016-04-08 15
2016-04-09 15
2016-04-10 15
2016-04-11 15
2016-04-12 15
However when checked from excel this is what I get
days week
2016-04-01 14
2016-04-02 14
2016-04-03 15
2016-04-04 15
2016-04-05 15
2016-04-06 15
2016-04-07 15
2016-04-08 15
2016-04-09 15
2016-04-10 16
2016-04-11 16
2016-04-12 16
Can someone please help me identify wherever I am going wrong.
Thanks a lot in advance!!
Not anything that you're doing wrong per se, there is just a difference in how R (I presume you're using the lubridate package) and Excel calculate week numbers.
R will calculate week numbers based on the seven day block from 1 January that year; but
Excel calculates week numbers based on a week starting from Sunday.
Take the first few days of January 2016 as an example. On Friday, 1 January 2016, both R and Excel will say this is week 1.
On Sunday, 3 January 2016:
this is within the first seven days of the start of the year so R will return week number 1; but
it is a Sunday, so Excel ticks over to week number 2.
Try this:
ifelse(test = weekdays.Date(days[1]) == "Sunday", yes = epiweek(days[1]), no = epiweek(days[1]) + 1) + cumsum(weekdays.Date(days) == "Sunday")
This tests whether the first day is a Sunday or not and returns an appropriate week number starting point, then adds on one more week number each Sunday. Gives the same week number if there's overlap between years.
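For comparison, here is a base-R sketch of the same idea that avoids both lubridate and locale-dependent day names. It assumes the convention described above for Excel: the week containing 1 January is week 1 and a new week starts every Sunday. The variable names are invented for this example.

```r
days <- seq(as.Date("2016-04-01"), as.Date("2016-04-12"), by = "day")

# Locale-independent Sunday test (wday: 0 = Sunday)
is_sunday <- as.POSIXlt(days)$wday == 0

# Week number of the first date. "%U" counts days before the first Sunday
# of the year as week 0, so add 1 unless 1 January is itself a Sunday.
jan1 <- as.Date(format(days[1], "%Y-01-01"))
first_week <- as.integer(format(days[1], "%U")) +
  as.integer(as.POSIXlt(jan1)$wday != 0)

# Move on one week at every Sunday after the first date; never reset
running_week <- first_week + cumsum(is_sunday) - as.integer(is_sunday[1])

data.frame(days, running_week)
```

For these twelve dates the result matches the Excel column in the question (14, 14, then 15 from 2016-04-03 and 16 from 2016-04-10).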
I have start and end times of some commercial event for a couple of locations. The event may or may not take place on each day and the event duration does not overlap. For example, run this:
inputdata = data.frame(
location = c('x','x','y','z','z'),
start = c(as.POSIXct("2010/1/1 8:28:00"),as.POSIXct("2010/1/2 7:20:00"),
as.POSIXct("2010/1/1 10:22:00"),
as.POSIXct("2010/1/5 13:28:00"),as.POSIXct("2010/1/7 15:39:00")),
end = c(as.POSIXct("2010/1/1 13:25:00"),as.POSIXct("2010/1/2 10:09:00"),
as.POSIXct("2010/1/1 15:24:00"),
as.POSIXct("2010/1/6 00:28:00"),as.POSIXct("2010/1/7 19:34:00"))
)
The input data looks like:
location start end
1 x 2010-01-01 08:28:00 2010-01-01 13:25:00
2 x 2010-01-02 07:20:00 2010-01-02 10:09:00
3 y 2010-01-01 10:22:00 2010-01-01 15:24:00
4 z 2010-01-05 13:28:00 2010-01-06 00:28:00
5 z 2010-01-07 15:39:00 2010-01-07 19:34:00
I want to construct an hourly dataset with three columns: 1. location, 2. hour, and 3. indicator. Each row is for one pair of location and exact hour (for instance, as.POSIXct("2010/1/1 13:00:00")), and indicator is a dummy that equals 1 if this hour falls between the start and end times of some event at that location.
For instance, let's say the output hourly data are for 2010-01-01 to 2010-01-07. Run this:
output = data.frame(
location = rep(c('x','y','z'),
each=length(seq(as.POSIXct("2010/1/1"), as.POSIXct("2010/1/7 23:00:00"), "hours"))),
hour = rep(seq(as.POSIXct("2010/1/1"), as.POSIXct("2010/1/7 23:00:00"), "hours"),3),
indicator = rep(0,3*length(seq(as.POSIXct("2010/1/1"), as.POSIXct("2010/1/7 23:00:00"), "hours"))))
So we get the first six rows look like this:
location hour indicator
1 x 2010-01-01 00:00:00 0
2 x 2010-01-01 01:00:00 0
3 x 2010-01-01 02:00:00 0
4 x 2010-01-01 03:00:00 0
5 x 2010-01-01 04:00:00 0
6 x 2010-01-01 05:00:00 0
Now, we need to change the value of indicator to 1 if the hour in the same row has an event in effect for the location in the same row.
For instance, location x has an event between 8:28 and 13:25 on 2010/1/1, so the rows for 7:00 to 14:00 should look like this:
location hour indicator
8 x 2010-01-01 07:00:00 0
9 x 2010-01-01 08:00:00 1
10 x 2010-01-01 09:00:00 1
11 x 2010-01-01 10:00:00 1
12 x 2010-01-01 11:00:00 1
13 x 2010-01-01 12:00:00 1
14 x 2010-01-01 13:00:00 1
15 x 2010-01-01 14:00:00 0
It seems that I could search exhaustively over each pair of location and hour, and update the value of indicator if the hour is between the start and end hour of some event at that location. But I doubt this is the best way.
Or I am thinking that I can first, convert the input data to hourly data where the hour would be there only if they are between the start and end hour. In other words, the converted data should look like:
location hour indicator
1 x 2010-01-01 08:00:00 1
2 x 2010-01-01 09:00:00 1
3 x 2010-01-01 10:00:00 1
4 x 2010-01-01 11:00:00 1
5 x 2010-01-01 12:00:00 1
6 x 2010-01-01 13:00:00 1
7 x 2010-01-02 07:00:00 1
8 x 2010-01-02 08:00:00 1
9 x 2010-01-02 09:00:00 1
10 x 2010-01-02 10:00:00 1
11 y 2010-01-01 10:00:00 1
12 y 2010-01-01 11:00:00 1
and then I go from there to get the correct indicators for each hour for each location. Though, I don't know how to convert the start/end hours to hourly observations.
This is all I get for this problem so far.
With this said, I do not have a solution and would like to ask for help.
Also, all I want is that output with three columns. When contributing, please do not feel constrained by my thoughts, which may not be efficient.
It is worth mentioning that the actual problem covers 5 years and there are 30 locations. So the algorithm needs to be efficient.
Here is a way to do this with a cross join.
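library(dplyr)
# Every (hour, location) pair: merge() with no shared columns is a cross join
hours <-
data.frame(hour = seq(as.POSIXct("2010/1/1"),
as.POSIXct("2010/1/7 23:00:00"),
"hours") ) %>%
merge(inputdata %>% select(location) %>% distinct())
hours %>%
left_join(inputdata, by = "location") %>%
# start is truncated to the hour so that an event starting at 8:28
# also marks the 8:00 row, as in the expected output of the question
filter(as.POSIXct(trunc(start, "hours")) <= hour & hour <= end) %>%
right_join(hours, by = c("hour", "location")) %>%
mutate(indicator = +!is.na(start)) %>%
select(location, hour, indicator)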
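The question also asked how to turn each start/end pair into hourly rows. Below is a base-R sketch of that route, shown on the two events of location x only and with the time zone fixed to UTC so the result is reproducible. The names `active` and `key` are invented here, and this is one possible approach rather than a tuned implementation.

```r
inputdata <- data.frame(
  location = c("x", "x"),
  start = as.POSIXct(c("2010-01-01 08:28:00", "2010-01-02 07:20:00"), tz = "UTC"),
  end   = as.POSIXct(c("2010-01-01 13:25:00", "2010-01-02 10:09:00"), tz = "UTC")
)

# Expand every event into the clock hours it touches
active <- do.call(rbind, lapply(seq_len(nrow(inputdata)), function(i) {
  first_hour <- as.POSIXct(trunc(inputdata$start[i], units = "hours"))
  data.frame(location = inputdata$location[i],
             hour = seq(first_hour, inputdata$end[i], by = "hour"))
}))

# Full grid of location x hour
hours <- seq(as.POSIXct("2010-01-01 00:00:00", tz = "UTC"),
             as.POSIXct("2010-01-02 23:00:00", tz = "UTC"), by = "hour")
locations <- unique(inputdata$location)
output <- data.frame(
  location = rep(locations, each = length(hours)),
  hour = rep(hours, times = length(locations))
)

# A grid row gets indicator 1 if the same location/hour pair is active
key <- function(d) paste(d$location, as.numeric(d$hour))
output$indicator <- as.integer(key(output) %in% key(active))

output[8:15, ]
```

The expansion step creates one row per event hour, so its size grows with total event duration, not with the size of the full grid.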
Is there a good way to convert a year + week number to a date in R? I have tried the following:
> as.POSIXct("2008 41", format="%Y %U")
[1] "2008-02-21 EST"
> as.POSIXct("2008 42", format="%Y %U")
[1] "2008-02-21 EST"
According to ?strftime:
%Y Year with century. Note that whereas there was no zero in the
original Gregorian calendar, ISO 8601:2004 defines it to be valid
(interpreted as 1BC): see http://en.wikipedia.org/wiki/0_(year). Note
that the standard also says that years before 1582 in its calendar
should only be used with agreement of the parties involved.
%U Week of the year as decimal number (00–53) using Sunday as the
first day 1 of the week (and typically with the first Sunday of the
year as day 1 of week 1). The US convention.
This is kinda like another question you may have seen before. :)
The key issue is: what day should a week number specify? Is it the first day of the week? The last? That's ambiguous. I don't know if week one is the first day of the year or the 7th day of the year, or possibly the first Sunday or Monday of the year (which is a frequent interpretation). (And it's worse than that: these generally appear to be 0-indexed, rather than 1-indexed.) So, an enumerated day of the week needs to be specified.
For instance, try this:
as.POSIXlt("2008 42 1", format = "%Y %U %u")
The %u indicator specifies the day of the week.
Additional note: See ?strptime for the various options for format conversion. It's important to be careful about the enumeration of weeks, as these can be split across the end of the year, and day 1 is ambiguous: is it specified based on a Sunday or Monday, or from the first day of the year? This should all be specified and tested on the different systems where the R code will run. I'm not certain that Windows and POSIX systems sing the same tune on some of these conversions, hence I'd test and test again.
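To make the point about specifying a weekday concrete, here is a short sketch using `as.Date()` with the same format string. It relies on the `%U` (Sunday-based week) and `%u` (weekday 1 to 7, Monday = 1) codes quoted from the documentation above; as the note says, it is worth re-checking on the platforms where the code will run.

```r
# Year 2008, week 42 (Sunday-based), weekday 1 (Monday)
monday <- as.Date("2008 42 1", format = "%Y %U %u")
monday                       # "2008-10-20"

# Round trip: formatting the date gives back the original pieces
format(monday, "%Y %U %u")   # "2008 42 1"
```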
Day-of-week == zero in the POSIXlt DateTimeClasses system is Sunday. Not exactly Biblical, and not in agreement with R's convention of indexing from 1 either, but that's what it is. Week zero is the first (partial) week in the year. Week one (but day of week zero) starts with the first Sunday. Most of the other fields in POSIXlt (for example $mon and $yday) also have 0 as their starting point; $mday is the exception and runs from 1. It is kind of interesting to see what coercing the list elements of POSIXlt objects does. The only way you can actually change a POSIXlt date is to alter the $year, the $mon or the $mday elements. The others seem to be epiphenomena.
today <- as.POSIXlt(Sys.Date())
today # Tuesday
#[1] "2012-02-21 UTC"
today$wday <- 0 # attempt to make it Sunday
today
# [1] "2012-02-21 UTC" The attempt fails
today$mday <- 19
today
#[1] "2012-02-19 UTC" Success
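A reproducible variant of the experiment above, using a fixed date instead of Sys.Date(). It also shows one way to get the derived fields recomputed after an edit: a round trip through POSIXct. This is a sketch; how stale fields behave between the edit and the round trip may vary with the R version.

```r
lt <- as.POSIXlt(as.Date("2012-02-21"))   # a Tuesday, so lt$wday is 2

# Changing $mday moves the date; derived fields such as $wday
# may still hold their old values at this point
lt$mday <- 19

# Converting to POSIXct and back rebuilds the derived fields
fixed <- as.POSIXlt(as.POSIXct(lt))
format(fixed, "%Y-%m-%d")   # "2012-02-19"
fixed$wday                  # 0, i.e. Sunday
```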
I did not come up with this myself (it's taken from a blog post by Forester), but nevertheless I thought I'd add this to the answer list because it's the first implementation of the ISO 8601 week number convention that I've seen in R.
No doubt, week numbers are a very ambiguous topic, but I prefer an ISO standard over the current implementation of week numbers via format(..., "%U") because it seems that this is what most people agreed on, at least in Germany (calendars etc.).
I've put the actual function def at the bottom to facilitate focusing on the output first. Also, I just stumbled across package ISOweek, maybe worth a try.
Approach Comparison
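# Assumed inputs (not defined in the original post): the days of January 2012
# and their week numbers as given by format(..., "%U")
posix <- seq(as.Date("2012-01-01"), as.Date("2012-01-31"), by = "day")
weeknum <- as.numeric(format(posix, "%U"))
x.days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")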
x.names <- sapply(1:length(posix), function(x) {
x.day <- as.POSIXlt(posix[x], tz="Europe/Berlin")$wday
if (x.day == 0) {
x.day <- 7
}
out <- x.days[x.day]
})
data.frame(
posix,
name=x.names,
week.r=weeknum,
week.iso=ISOweek(as.character(posix), tzone="Europe/Berlin")$weeknum
)
# Result
posix name week.r week.iso
1 2012-01-01 Sun 1 4480458
2 2012-01-02 Mon 1 1
3 2012-01-03 Tue 1 1
4 2012-01-04 Wed 1 1
5 2012-01-05 Thu 1 1
6 2012-01-06 Fri 1 1
7 2012-01-07 Sat 1 1
8 2012-01-08 Sun 2 1
9 2012-01-09 Mon 2 2
10 2012-01-10 Tue 2 2
11 2012-01-11 Wed 2 2
12 2012-01-12 Thu 2 2
13 2012-01-13 Fri 2 2
14 2012-01-14 Sat 2 2
15 2012-01-15 Sun 3 2
16 2012-01-16 Mon 3 3
17 2012-01-17 Tue 3 3
18 2012-01-18 Wed 3 3
19 2012-01-19 Thu 3 3
20 2012-01-20 Fri 3 3
21 2012-01-21 Sat 3 3
22 2012-01-22 Sun 4 3
23 2012-01-23 Mon 4 4
24 2012-01-24 Tue 4 4
25 2012-01-25 Wed 4 4
26 2012-01-26 Thu 4 4
27 2012-01-27 Fri 4 4
28 2012-01-28 Sat 4 4
29 2012-01-29 Sun 5 4
30 2012-01-30 Mon 5 5
31 2012-01-31 Tue 5 5
Function Def
It's taken directly from the blog post, I've just changed a couple of minor things. The function is still kind of sketchy (e.g. the week number of the first date is far off), but I find it to be a nice start!
ISOweek <- function(
date,
format="%Y-%m-%d",
tzone="UTC",
return.val="weekofyear"
){
##converts dates into "dayofyear" or "weekofyear", the latter providing the ISO-8601 week
##date should be a vector of class Date or a vector of formatted character strings
##format refers to the date form used if a vector of
## character strings is supplied
##convert date to POSIXt format
if(class(date)[1]%in%c("Date","character")){
date=as.POSIXlt(date,format=format, tz=tzone)
}
if (!inherits(date, "POSIXt")) {
stop("Date is of wrong format.")
}else if(inherits(date, "POSIXct")){
date=as.POSIXlt(date, tz=tzone)
}
if(return.val=="dayofyear"){
##add 1 because POSIXt is base zero
return(date$yday+1)
}else if(return.val=="weekofyear"){
##Based on the ISO8601 weekdate system,
## Monday is the first day of the week
## W01 is the week with 4 Jan in it.
year=1900+date$year
jan4=strptime(paste(year,1,4,sep="-"),format="%Y-%m-%d")
wday=jan4$wday
wday[wday==0]=7 ##convert to base 1, where Monday == 1, Sunday==7
##calculate the date of the first week of the year
weekstart=jan4-(wday-1)*86400
weeknum=ceiling(as.numeric((difftime(date,weekstart,units="days")+0.1)/7))
#########################################################################
##calculate week for days of the year occurring in the next year's week 1.
#########################################################################
mday=date$mday
wday=date$wday
wday[wday==0]=7
year=ifelse(weeknum==53 & mday-wday>=28,year+1,year)
weeknum=ifelse(weeknum==53 & mday-wday>=28,1,weeknum)
################################################################
##calculate week for days of the year occurring prior to week 1.
################################################################
##first calculate the number of weeks in the previous year
year.shift=year-1
jan4.shift=strptime(paste(year.shift,1,4,sep="-"),format="%Y-%m-%d")
wday=jan4.shift$wday
wday[wday==0]=7 ##convert to base 1, where Monday == 1, Sunday==7
weekstart=jan4.shift-(wday-1)*86400
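##units="days" added to match the earlier difftime() call; without it the
##difference can come back in seconds, which would explain a far-off week
##number like the one for 2012-01-01 in the result table
weeknum.shift=ceiling(as.numeric((difftime(date,weekstart,units="days")+0.1)/7))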
##update year and week
year=ifelse(weeknum==0,year.shift,year)
weeknum=ifelse(weeknum==0,weeknum.shift,weeknum)
return(list("year"=year,"weeknum"=weeknum))
}else{
stop("Unknown return.val")
}
}
Any idea why the day is coming out wrong when the date is accurate?
I'm debugging and I can see the date variables which are correct but the day is wrong.
date Date (#9f14161)
date 26 [0x1a]
dateUTC 26 [0x1a]
day 5
dayUTC 5
fullYear 2010 [0x7da]
fullYearUTC 2010 [0x7da]
hours 17 [0x11]
hoursUTC 17 [0x11]
milliseconds 0
millisecondsUTC 0
minutes 0
minutesUTC 0
month 10 [0xa]
monthUTC 10 [0xa]
seconds 0
secondsUTC 0
time 1290790800000 [0x12c89208a80]
timezoneOffset 0
Those are my date variables. As you can see, the date is 26 (today), the month is 10 (this month) and the year is 2010 (this year), yet the day is coming out as 5, which is a Friday.
The month numbering begins at 0, so a month with the value 10 is not October but November.
So Friday (day = 5) is correct in your example.
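The debugger output above comes from a JavaScript-style Date object, but R's POSIXlt class follows the same zero-based month convention, so the explanation can be checked in R with the `time` value from the dump (converted from milliseconds to seconds). This is only an illustration in R, not the asker's environment.

```r
# 1290790800000 ms from the debugger dump, expressed in seconds
stamp <- as.POSIXlt(as.POSIXct(1290790800, origin = "1970-01-01", tz = "UTC"))

stamp$mon                   # 10: months are counted from 0
month.name[stamp$mon + 1]   # "November"
stamp$mday                  # 26
stamp$wday                  # 5: Friday (0 = Sunday)
```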