I have a data frame with start dates and end dates, along with the number of people registered for an event. I would like to calculate the number of hours each party is present within a specific timeframe (e.g., 07:00 - 17:00).
If I use the following example data.frame...
d <- data.frame(startDate = c(as.POSIXct("2011-06-04 08:00:00"), as.POSIXct("2011-06-03 08:00:00"),
as.POSIXct("2011-09-12 10:00:00")),
endDate = c(as.POSIXct("2011-06-06 11:00:00"), as.POSIXct("2011-06-04 11:00:00"),
as.POSIXct("2011-09-12 18:00:00")),
partysize = c(124,442,323))
open <- "07:00"
close <- "17:00"
I would like my result set to look something like this:
day numhours partysize
2011-06-04 9 124
2011-06-05 10 124
2011-06-06 4 124
2011-06-03 9 442
2011-06-04 4 442
2011-09-12 7 323
Note: numhours is the number of hours on that day that fall between the open and close times.
Thanks in advance,
--JT
Sorry, it's very messy, and I used 7 and 17 instead of your open and close variables.
app.days<-mapply(function(x,y){x+y*60*60*24},as.POSIXct(format(d$startDate,"%Y-%m-%d")),lapply(floor(-(d$startDate-d$endDate)/24),seq,from=0))
start.date<-mapply(function(x,y){pmax(x+7*60*60,y)},app.days,d$startDate)
end.date<-mapply(function(x,y){pmin(x+17*60*60,y)},app.days,d$endDate)
app.hours<-mapply(function(x,y){as.numeric(x-y)},end.date,start.date)
res<-mapply(function(x,y,z){data.frame(day=as.Date(x),numhours=y,partysize=z)},app.days,app.hours,as.list(d$partysize))
res1<-data.frame(day=as.Date(unlist(res[1,]),origin="1970-01-01"),numhours=unlist(res[2,]),partysize=unlist(res[3,]))
> res1
day numhours partysize
1 2011-06-04 9 124
2 2011-06-05 10 124
3 2011-06-06 4 124
4 2011-06-03 9 442
5 2011-06-04 4 442
6 2011-09-12 7 323
Basically, we identify how many days each party stays. For each of those days we find the applicable start (the later of 07:00 and the party's arrival) and the applicable end (the earlier of 17:00 and the party's departure), then subtract the start from the end. The data frame is only assembled at the end, but it could probably have been created in the res<- step.
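The same idea can be written a bit more compactly. This is only a sketch, not the answer's original code: the helper name hours_per_day and its open_h/close_h/tz arguments are made up for illustration, and it assumes all times are in a single time zone (UTC here) so that "midnight" is unambiguous.

```r
# Example data from the question, with an explicit time zone (an assumption).
d <- data.frame(
  startDate = as.POSIXct(c("2011-06-04 08:00:00", "2011-06-03 08:00:00",
                           "2011-09-12 10:00:00"), tz = "UTC"),
  endDate   = as.POSIXct(c("2011-06-06 11:00:00", "2011-06-04 11:00:00",
                           "2011-09-12 18:00:00"), tz = "UTC"),
  partysize = c(124, 442, 323)
)

# Hypothetical helper: hours one booking overlaps the daily open/close window.
hours_per_day <- function(start, end, size, open_h = 7, close_h = 17, tz = "UTC") {
  # Every calendar day touched by the booking
  days <- seq(as.Date(start, tz = tz), as.Date(end, tz = tz), by = "day")
  midnight <- as.POSIXct(format(days), tz = tz)
  # Clip the opening window to the booking's own start and end
  win_start <- pmax(midnight + open_h * 3600, start)
  win_end   <- pmin(midnight + close_h * 3600, end)
  # Negative overlaps (booking entirely outside the window) count as 0 hours
  hours <- pmax(as.numeric(difftime(win_end, win_start, units = "hours")), 0)
  data.frame(day = days, numhours = hours, partysize = size)
}

# Index by row so the POSIXct class is kept for each element
res <- do.call(rbind, lapply(seq_len(nrow(d)), function(i) {
  hours_per_day(d$startDate[i], d$endDate[i], d$partysize[i])
}))
res
```

Enumerating calendar days with seq() on dates avoids relying on the automatically chosen units of a difftime, which the mapply version above depends on.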
This probably seems straightforward, but I am pretty stumped.
I have a set of dates ~ August 1 of each year and need to sum sales by week number. The earliest date is 2008-12-08 (YYYY-MM-DD). I need to create a "week_id" field where week #1 begins on 2008-12-08. And the date 2011-09-03 is week 142. Note that this is different since the calculation of week number does not reset every year.
I am putting up a small example dataset here:
data <- data.frame(
dates = c("2008-12-08", "2009-08-10", "2010-03-31", "2011-10-16", "2008-06-03", "2009-11-14" , "2010-05-05", "2011-09-03"))
data$date = as.Date(data$dates)  # the column created above is named "dates"; data$date only worked through partial matching
Any help is appreciated
data$week_id = as.numeric(data$date - as.Date("2008-12-08")) %/% 7 + 1
This takes the difference in days between each date and the start date and uses integer division to count the whole 7-day periods elapsed. I add one so that dates in the first seven days are week 1 instead of week 0. Note that with this definition 2011-09-03 (999 days after the start) falls in week 143, not 142.
dates date week_id
1 2008-12-07 2008-12-07 0 # added for testing
2 2008-12-08 2008-12-08 1
3 2008-12-09 2008-12-09 1 # added for testing
4 2008-12-14 2008-12-14 1 # added for testing
5 2008-12-15 2008-12-15 2 # added for testing
6 2009-08-10 2009-08-10 36
7 2010-03-31 2010-03-31 69
8 2011-10-16 2011-10-16 149
9 2008-06-03 2008-06-03 -26
10 2009-11-14 2009-11-14 49
11 2010-05-05 2010-05-05 74
12 2011-09-03 2011-09-03 143
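If this is needed in more than one place, the one-liner can be wrapped in a small function. This is just a sketch; the name week_id_from and the origin argument are made up for illustration.

```r
# Hypothetical wrapper around the integer-division approach above.
week_id_from <- function(dates, origin = as.Date("2008-12-08")) {
  # Whole 7-day blocks since origin, shifted so the first block is week 1
  as.numeric(as.Date(dates) - origin) %/% 7 + 1
}

ids <- week_id_from(c("2008-12-08", "2008-12-14", "2008-12-15", "2011-09-03"))
ids
# 1 1 2 143
```

Because %/% rounds towards minus infinity, dates before the origin get week ids of 0 or below rather than being folded into week 1.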
I have this dataframe I am working with.
data <- data.frame(id = c(123,124,125,126,127,128,129,130),
date = c("10/7/2021","10/6/2021","9/13/2021","10/18/2021","8/12/2021","9/6/2021","10/29/2021","9/6/2021"))
My goal is to create a new column that tells me how many days have passed since the recorded date in each row. I'm trying to use this code, but I keep getting NA days in my new column.
data %>%
select(id,date) %>%
mutate("days_since" = as.Date(Sys.Date()) - as.Date(date,format="%Y-%m-%d"))
id date days_since
1 123 10/7/2021 NA days
2 124 10/6/2021 NA days
3 125 9/13/2021 NA days
4 126 10/18/2021 NA days
5 127 8/12/2021 NA days
6 128 9/6/2021 NA days
7 129 10/29/2021 NA days
8 130 9/6/2021 NA days
What am I doing wrong? Thank you for any feedback.
We can use the lubridate package, which makes date conversion and date arithmetic much easier.
In your code, the as.Date(date, format="%Y-%m-%d") step was the problem: the strings are month/day/year (e.g. "10/7/2021"), so they do not match that format and every value becomes NA.
library(dplyr)
library(lubridate)
data %>% mutate("days_since" = Sys.Date() - mdy(date))
id date days_since
1 123 10/7/2021 28
2 124 10/6/2021 29
3 125 9/13/2021 52
4 126 10/18/2021 17
5 127 8/12/2021 84
6 128 9/6/2021 59
7 129 10/29/2021 6
8 130 9/6/2021 59
Thanks, @Karthik S, for the simplification.
It is also easily done in base R with a simple "-", which gives back the difference in days. (The dates below are in ISO yyyy-mm-dd format, which as.Date() parses by default.)
data <- data.frame(id = c(123,124,125,126,127,128,129,130),
date = c("2021-10-10","2021-10-06","2021-09-13","2021-10-18","2021-08-12","2021-09-06","2021-10-29","2021-09-06"))
data$date <- as.Date(data$date)
data$sys_date <- Sys.Date()
data$sysDate_to_date <- data$sys_date -data$date
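For the month/day/year strings from the question, base R works too, as long as the format string matches the data. A minimal sketch, using only the first three rows and a fixed, assumed "today" (2021-11-04) so that the result is reproducible; in real use that would be Sys.Date().

```r
data <- data.frame(
  id   = c(123, 124, 125),
  date = c("10/7/2021", "10/6/2021", "9/13/2021")
)

today <- as.Date("2021-11-04")  # assumption: stand-in for Sys.Date()

# %m/%d/%Y matches month/day/four-digit-year strings such as "10/7/2021"
parsed <- as.Date(data$date, format = "%m/%d/%Y")
data$days_since <- as.numeric(today - parsed)
data$days_since
# 28 29 52
```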
I want to figure out when exactly the number of calls sharply increased.
Here is my original code:
plot(breaks, cumfreq0, main="Cumulative percentage of calls happened in NOV.7th", xlab="time", ylab = "cumulative percentage of calls", sub = "(each dot represents a single period of time on Nov.7th)")
but I don't think the time scale on the x-axis is specific enough.
How can I change it?
I tried a few things, as shown here, but it seems that the code does not work with a time object.
Many many thanks for any help
Please see below; I just replicated your example with dummy data.
> df
# A tibble: 55 x 2
datetime Freq
<dttm> <int>
1 2018-11-01 12:41:57 215
2 2018-11-01 12:41:58 163
3 2018-11-01 12:47:06 225
4 2018-11-01 12:51:00 69
5 2018-11-01 12:57:37 203
6 2018-11-01 12:57:38 248
7 2018-11-01 12:57:38 58
8 2018-11-01 13:29:15 179
9 2018-11-01 13:37:45 233
10 2018-11-01 14:24:43 150
# ... with 45 more rows
And here is the code for the kind of plot you are expecting, with timestamps on the x-axis; you can use whichever label format you want.
plot(df$datetime,df$Freq,xaxt="n")
axis.POSIXct(1, at=df$datetime, labels=format(df$datetime, "%m/%d/%Y %H:%M:%S"))
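Putting a label at every data point can get crowded. A possible variation, sketched here with made-up dummy data for a single day, is to place ticks at regular intervals instead; the two-hour spacing and the "%H:%M" format are arbitrary choices.

```r
set.seed(1)
day_start <- as.POSIXct("2018-11-07 00:00:00", tz = "UTC")

# Dummy data: 55 random timestamps within one day (not the asker's data)
df <- data.frame(
  datetime = day_start + sort(runif(55, 0, 86400)),
  Freq     = sample(50:250, 55, replace = TRUE)
)

# One tick every two hours across the day
ticks <- seq(day_start, day_start + 86400, by = "2 hours")

plot(df$datetime, df$Freq, xaxt = "n", xlab = "time", ylab = "Freq")
axis.POSIXct(1, at = ticks, format = "%H:%M")
```

A finer spacing such as by = "30 min" gives a more detailed axis if the plot is wide enough to fit the labels.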
Here I have a data set with a start date, an end date, and the usage for each record. I have calculated the number of days between the two dates and derived a daily usage (I am okay with one flat usage per day for now).
Now, what I want to achieve is the sum of the usage for each day in June that falls within those time frames. For example, the first record contributes just its DAILY_USAGE to June 1st:
START_DATE END_DATE x DAYS DAILY_USAGE
1 2015-05-01 2015-06-01 261605.00 32 8175.156250
For the second example below, I want to add the usage 3905 to June 1st and also to June 2nd, because it spans both June 1st and June 2nd.
2015-05-04 2015-06-02 117159.00 30 3905.3000000
I want to continue doing this for all 387 rows and, at the end, get the sum of usages for each day. I do not know how to do this for hundreds of records.
This is what my data set looks like right now:
str(YYY)
'data.frame': 387 obs. of 5 variables:
$ START_DATE : Date, format: "2015-05-01" "2015-05-04" "2015-05-11" "2015-05-13" ...
$ END_DATE : Date, format: "2015-06-01" "2015-06-01" "2015-06-01" "2015-06-01" ...
$ x : num 261605 1380796 183 103 489 ...
$ DAYS : num 32 29 22 20 19 12 1 34 30 29 ...
$ DAILY_USAGE: num 8175.16 47613.66 8.32 5.13 25.74 ...
Also, the head of the data:
START_DATE END_DATE x DAYS DAILY_USAGE
1 2015-05-01 2015-06-01 261605.00 32 8175.1562500
2 2015-05-04 2015-06-01 1380796.00 29 47613.6551724
6 2015-05-21 2015-06-01 1392.00 12 116.0000000
7 2015-06-01 2015-06-01 2503.00 1 2503.0000000
8 2015-04-30 2015-06-02 0.00 34 0.0000000
9 2015-05-04 2015-06-02 117159.00 30 3905.3000000
10 2015-05-05 2015-06-02 193334.00 29 6666.6896552
13 2015-05-04 2015-06-03 630.00 31 20.3225806
and so on........
Example data set and results
I will call this data set EXAMPLE1 (mocked-up data covering 3 days).
START_DATE END_DATE x DAYS DAILY_USAGE
5/1/2015 6/1/2015 261605 32 8175.15625
5/4/2015 6/1/2015 1380796 29 47613.65517
5/11/2015 6/1/2015 183 22 8.318181818
4/30/2015 6/2/2015 0 34 0
5/20/2015 6/2/2015 70 14 5
6/1/2015 6/2/2015 569 2 284.5
6/1/2015 6/3/2015 582 3 194
6/2/2015 6/3/2015 6 2 3
For the above example, the answer should look like this:
DAY USAGE
6/1/2015 56280.6296
6/2/2015 486.5
6/3/2015 197
HOW?
In EXAMPLE1, for June 1st, I added the usage of every row except the last one, because the last row's time frame does not include 06/01. It starts on 06/02 and ends on 06/03.
To get June 2nd, I added all the usages from rows 4 to 8, because June 2nd lies between all of those start and end dates.
For June 3rd, I added only the last two rows to get 197.
So which rows to sum depends on the time frame between START_DATE and END_DATE.
Hope this helps!
There might be an easier trick for this than writing 400 lines of if/else statements.
Thank you again for your time!!
-Gyve
library(lubridate)
indx <- lapply(unique(mdy(df[,2])), '%within%', interval(mdy(df[,1]), mdy(df[,2])))
cbind.data.frame(DAY=unique(df$END_DATE),
USAGE=unlist(lapply(indx, function(x) sum(df$DAILY_USAGE[x]))))
# DAY USAGE
# 1 6/1/2015 56280.63
# 2 6/2/2015 486.50
# 3 6/3/2015 197.00
Explanation
We can expand it to explain what is happening:
indx <- lapply(unique(mdy(df[,2])), '%within%', interval(mdy(df[,1]), mdy(df[,2])))
Each unique end date is tested for whether it falls within the interval spanned by the first and second columns. mdy is a quick way to convert month/day/year strings to Date objects with lubridate. The operator %within% tests a date against an interval, and the intervals are created with interval(start, end). The result is a list of logical indexes, one per day, that we can subset the data by.
In our final data frame,
cbind.data.frame(DAY=unique(df$END_DATE),
creates the first column of dates.
And,
USAGE=unlist(lapply(indx, function(x) sum(df$DAILY_USAGE[x])))
takes the sum of df$DAILY_USAGE by the index that we created.
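For comparison, here is a rough base R sketch of the same logic without lubridate, using the EXAMPLE1 rows from the question. The explicit list of June days and the variable names are choices made for this illustration only.

```r
df <- data.frame(
  START_DATE  = c("5/1/2015", "5/4/2015", "5/11/2015", "4/30/2015",
                  "5/20/2015", "6/1/2015", "6/1/2015", "6/2/2015"),
  END_DATE    = c("6/1/2015", "6/1/2015", "6/1/2015", "6/2/2015",
                  "6/2/2015", "6/2/2015", "6/3/2015", "6/3/2015"),
  DAILY_USAGE = c(8175.15625, 47613.65517, 8.318181818, 0, 5, 284.5, 194, 3)
)

start <- as.Date(df$START_DATE, format = "%m/%d/%Y")
end   <- as.Date(df$END_DATE,   format = "%m/%d/%Y")

# Days to report on; here the three June days covered by EXAMPLE1
days <- seq(as.Date("2015-06-01"), as.Date("2015-06-03"), by = "day")

# For each day, sum DAILY_USAGE over the rows whose range contains that day
usage <- vapply(seq_along(days), function(i) {
  sum(df$DAILY_USAGE[start <= days[i] & days[i] <= end])
}, numeric(1))

data.frame(DAY = days, USAGE = usage)
```

Unlike taking the unique end dates, listing the days explicitly also covers days on which no record happens to end.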
This is probably a very simple question that has been asked already but..
I have a data frame that I constructed from a CSV file generated in Excel. The observations are not homogeneously sampled, i.e., they are for "On Peak" times of electricity usage, which means they exclude different days each year. I have 20 years of data (1993-2012) and am running both non-robust and robust LOESS to extract seasonal and linear trends.
After the decomposition has been done, I want to focus only on the observations from June through September.
How can I create a new data frame of just those results?
Sorry about the formatting, too.
Date MaxLoad TMAX
1 1993-01-02 2321 118.6667
2 1993-01-04 2692 148.0000
3 1993-01-05 2539 176.0000
4 1993-01-06 2545 172.3333
5 1993-01-07 2517 177.6667
6 1993-01-08 2438 157.3333
7 1993-01-09 2302 152.0000
8 1993-01-11 2553 144.3333
9 1993-01-12 2666 146.3333
10 1993-01-13 2472 177.6667
As Joran notes, you don't need anything other than base R:
## Reproducible data
df <-
data.frame(Date = seq(as.Date("2009-03-15"), as.Date("2011-03-15"), by="month"),
MaxLoad = floor(runif(25,2000,3000)), TMAX=runif(25,100,200))
## One option (note: months() returns month names in the current locale, while month.name is always English)
df[months(df$Date) %in% month.name[6:9],]
# Date MaxLoad TMAX
# 4 2009-06-15 2160 188.4607
# 5 2009-07-15 2151 164.3946
# 6 2009-08-15 2694 110.4399
# 7 2009-09-15 2460 150.4076
# 16 2010-06-15 2638 178.8341
# 17 2010-07-15 2246 131.3283
# 18 2010-08-15 2483 112.2635
# 19 2010-09-15 2174 160.9724
## Another option: strftime() will be more _generally_ useful than months()
df[as.numeric(strftime(df$Date, "%m")) %in% 6:9,]
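The numeric-month variant can also be wrapped for reuse on a data frame shaped like the one in the question. This is a sketch only: the function name subset_months and its arguments are made up, and apart from the first row the sample values below are invented for illustration.

```r
# Hypothetical helper: keep only rows whose date falls in the given months.
subset_months <- function(df, months = 6:9, date_col = "Date") {
  # Month as a number (1-12), independent of the locale
  month_num <- as.integer(format(df[[date_col]], "%m"))
  df[month_num %in% months, , drop = FALSE]
}

loads <- data.frame(
  Date    = as.Date(c("1993-01-02", "1993-06-15", "1993-09-30",
                      "1993-10-01", "1994-07-04")),
  MaxLoad = c(2321, 2600, 2550, 2400, 2700)  # made-up values after the first
)

summer <- subset_months(loads)
summer
```

The result is a new data frame with the June through September rows from every year, keeping the original row names.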