I have a data set with a start date, an end date, and the usage for each record. I have calculated the number of days between the two dates and derived a daily usage from that. (I am okay with one flat usage for each day for now.)
Now, what I want is the sum of the daily usage for each day in the month of June, across all the time frames that cover that day. For example, the first row below contributes just its DAILY_USAGE to June 1st:
START_DATE END_DATE x DAYS DAILY_USAGE
1 2015-05-01 2015-06-01 261605.00 32 8175.156250
For the second one, I want to add the usage 3905 to June 1st and also to June 2nd, because its time frame spans both June 1st and June 2nd.
2015-05-04 2015-06-02 117159.00 30 3905.3000000
I want to continue doing this for all 387 rows and, at the end, get the sum of usage for each day. I do not know how to do this for hundreds of records.
This is what my data set looks like right now:
str(YYY)
'data.frame': 387 obs. of 5 variables:
$ START_DATE : Date, format: "2015-05-01" "2015-05-04" "2015-05-11" "2015-05-13" ...
$ END_DATE : Date, format: "2015-06-01" "2015-06-01" "2015-06-01" "2015-06-01" ...
$ x : num 261605 1380796 183 103 489 ...
$ DAYS : num 32 29 22 20 19 12 1 34 30 29 ...
$ DAILY_USAGE: num 8175.16 47613.66 8.32 5.13 25.74 ...
Also, the first few rows:
START_DATE END_DATE x DAYS DAILY_USAGE
1 2015-05-01 2015-06-01 261605.00 32 8175.1562500
2 2015-05-04 2015-06-01 1380796.00 29 47613.6551724
6 2015-05-21 2015-06-01 1392.00 12 116.0000000
7 2015-06-01 2015-06-01 2503.00 1 2503.0000000
8 2015-04-30 2015-06-02 0.00 34 0.0000000
9 2015-05-04 2015-06-02 117159.00 30 3905.3000000
10 2015-05-05 2015-06-02 193334.00 29 6666.6896552
13 2015-05-04 2015-06-03 630.00 31 20.3225806
and so on........
Example data set and results
I will call this data set EXAMPLE1 (3 days, mocked-up data):
START_DATE END_DATE x DAYS DAILY_USAGE
5/1/2015 6/1/2015 261605 32 8175.15625
5/4/2015 6/1/2015 1380796 29 47613.65517
5/11/2015 6/1/2015 183 22 8.318181818
4/30/2015 6/2/2015 0 34 0
5/20/2015 6/2/2015 70 14 5
6/1/2015 6/2/2015 569 2 284.5
6/1/2015 6/3/2015 582 3 194
6/2/2015 6/3/2015 6 2 3
For the above example, the answer should look like this:
DAY USAGE
6/1/2015 56280.6296
6/2/2015 486.5
6/3/2015 197
HOW?
In EXAMPLE1, for June 1st, I added the usage of all rows except the last one, because the last row's time frame does not include 06/01. It starts on 06/02 and ends on 06/03.
To get June 2nd, I added the usage from rows 4 to 8, because June 2nd lies between the start and end dates of all of those rows.
For June 3rd, I added only the last two rows to get 197.
So which rows to sum depends on the time frame given by START_DATE and END_DATE.
Hope this helps!
There might be an easier trick for this than writing 400 lines of if/else statements.
Thank you again for your time!!
-Gyve
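library(lubridate)
# For each unique end date, flag the rows whose start-to-end interval contains it
indx <- lapply(unique(mdy(df[,2])), '%within%', interval(mdy(df[,1]), mdy(df[,2])))
# Sum DAILY_USAGE over the flagged rows, giving one total per unique end date
cbind.data.frame(DAY=unique(df$END_DATE),
                 USAGE=unlist(lapply(indx, function(x) sum(df$DAILY_USAGE[x]))))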
# DAY USAGE
# 1 6/1/2015 56280.63
# 2 6/2/2015 486.50
# 3 6/3/2015 197.00
Explanation
We can expand it to explain what is happening:
indx <- lapply(unique(mdy(df[,2])), '%within%', interval(mdy(df[,1]), mdy(df[,2])))
Each unique end date is tested for whether it falls within the interval spanned by the first and second columns. mdy() is a quick way to convert month/day/year strings to Date objects with lubridate. The operator %within% tests a date against an interval, and the intervals are created with interval(start, end) from the two date columns. The result is a list of logical indexes, one per unique end date, that we can subset the data by.
In our final data frame,
cbind.data.frame(DAY=unique(df$END_DATE),
creates the first column of dates.
And,
USAGE=unlist(lapply(indx, function(x) sum(df$DAILY_USAGE[x])))
takes the sum of df$DAILY_USAGE by the index that we created.
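The answer above only evaluates days that occur as an END_DATE. As a minimal sketch in base R (not the answerer's code), assuming the dates are already of class Date as in the question's str() output, the same sum can be taken for every day in a chosen range. The data frame re-types EXAMPLE1, and the names days, usage, and result are illustrative only.

```r
# EXAMPLE1 from the question, with dates stored as class Date
df <- data.frame(
  START_DATE = as.Date(c("2015-05-01", "2015-05-04", "2015-05-11", "2015-04-30",
                         "2015-05-20", "2015-06-01", "2015-06-01", "2015-06-02")),
  END_DATE   = as.Date(c("2015-06-01", "2015-06-01", "2015-06-01", "2015-06-02",
                         "2015-06-02", "2015-06-02", "2015-06-03", "2015-06-03")),
  DAILY_USAGE = c(8175.15625, 47613.65517, 8.318181818, 0, 5, 284.5, 194, 3)
)

# Every day we want a total for (extend the end to 2015-06-30 for all of June)
days <- seq(as.Date("2015-06-01"), as.Date("2015-06-03"), by = "day")

# For each day, sum the daily usage of the rows whose time frame covers it
usage <- vapply(seq_along(days), function(i) {
  covers <- df$START_DATE <= days[i] & df$END_DATE >= days[i]
  sum(df$DAILY_USAGE[covers])
}, numeric(1))

result <- data.frame(DAY = days, USAGE = usage)
print(result)
```

Because the loop runs over days rather than over rows, a day that no record ends on still gets a total (zero if nothing covers it).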
Related
I have a dataset where the date is reported by week and year in the form YYWW. I have split it into two columns: year and week.
I need to get a date from the week: Week_start_date. My weeks start on Mondays, so I would like to get the Monday and Sunday dates for each week.
ID  YYWW  year  week  Week_start_date  Week_end_date
1   1504  2015  04    ?                ?
2   1651  2016  51    ?                ?
3   1251  2012  51    ?                ?
4   1447  2014  47    ?                ?
How do I extract the week start date from just a week number and a year?
I have looked at several threads on SO but ran into problems using their solutions. Most searches for "convert week number and year to date" on Google and SO return the opposite: getting a week number from a date. This thread, answered by Vince, may cover a similar issue, but I can't get the code to do the job: https://communities.sas.com/t5/SAS-Programming/Converting-week-number-to-start-date/td-p/106456
Use INTNX() with the WEEK interval and increment from the first of the year.
The WEEK interval runs from Sunday to Saturday by default, so use +1 to get Monday/Sunday dates.
You may need to tweak to match the dates you need.
data have;
infile cards dlm='09'x;
input ID $ YYWW year week ;
format year 8. week z2.;
cards;
1 1504 2015 04
2 1651 2016 51
3 1251 2012 51
4 1447 2014 47
;;;;
data want;
set have;
week_start = intnx('week', mdy(1, 1, year), week, 'b')+1;
week_end = intnx('week', mdy(1, 1, year), week, 'e')+1;
format week_: date9.;
run;
Use one of the WEEK... informats. But you will need to insert the letter W between the YEAR and WEEK number.
data have;
input ID $ YYWW year week ;
cards;
1 1504 2015 04
2 1651 2016 51
3 1251 2012 51
4 1447 2014 47
;;;;
data want;
set have;
week_start=input(cats(year,'W',put(week,Z2.)),weekv.);
week_end=week_start+6;
format week_: yymmdd10.;
run;
Results
Obs ID YYWW year week week_start week_end
1 1 1504 2015 4 2015-01-19 2015-01-25
2 2 1651 2016 51 2016-12-19 2016-12-25
3 3 1251 2012 51 2012-12-17 2012-12-23
4 4 1447 2014 47 2014-11-17 2014-11-23
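For readers working in R rather than SAS, the dates in the results above match ISO 8601 week numbering, where week 1 is the week containing January 4th. The following is a minimal base R sketch under that assumption; iso_week_start is a hypothetical helper name, not a function from any package.

```r
# Monday of a given ISO week: week 1 is the week that contains January 4th
iso_week_start <- function(year, week) {
  jan4 <- as.Date(sprintf("%d-01-04", as.integer(year)))
  weekday <- as.integer(format(jan4, "%u"))   # 1 = Monday, ..., 7 = Sunday
  week1_monday <- jan4 - (weekday - 1L)       # step back to that week's Monday
  week1_monday + 7L * (as.integer(week) - 1L)
}

have <- data.frame(ID = 1:4,
                   year = c(2015, 2016, 2012, 2014),
                   week = c(4, 51, 51, 47))
have$week_start <- iso_week_start(have$year, have$week)
have$week_end   <- have$week_start + 6   # the Sunday that closes the week
print(have)
```

If the weeks in the data follow a different convention (for example, week 1 starting on January 1st), the offset would need adjusting.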
This probably seems straightforward, but I am pretty stumped.
I have a set of dates (around August 1 of each year) and need to sum sales by week number. The earliest date is 2008-12-08 (YYYY-MM-DD). I need to create a "week_id" field where week #1 begins on 2008-12-08. And the date 2011-09-03 is week 142. Note that this differs from the usual week number, because the count does not reset every year.
I am putting up a small example dataset here:
data <- data.frame(
dates = c("2008-12-08", "2009-08-10", "2010-03-31", "2011-10-16", "2008-06-03", "2009-11-14" , "2010-05-05", "2011-09-03"))
data$date = as.Date(data$dates)  # the column is named "dates"; data$date only matched it through partial matching
Any help is appreciated
data$week_id = as.numeric(data$date - as.Date("2008-12-08")) %/% 7 + 1
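This takes the difference in days between each date and the start date and uses integer division to count the whole 7-day periods elapsed. I add one so that dates in the first seven days count as week 1 instead of week 0. Note that with week 1 starting on 2008-12-08, 2011-09-03 falls in week 143 rather than 142: 999 days have elapsed, which is 142 complete weeks.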
dates date week_id
1 2008-12-07 2008-12-07 0 # added for testing
2 2008-12-08 2008-12-08 1
3 2008-12-09 2008-12-09 1 # added for testing
4 2008-12-14 2008-12-14 1 # added for testing
5 2008-12-15 2008-12-15 2 # added for testing
6 2009-08-10 2009-08-10 36
7 2010-03-31 2010-03-31 69
8 2011-10-16 2011-10-16 149
9 2008-06-03 2008-06-03 -26
10 2009-11-14 2009-11-14 49
11 2010-05-05 2010-05-05 74
12 2011-09-03 2011-09-03 143
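Since the goal was to sum sales by week, the formula above can be wrapped in a small helper and fed to tapply(). This is only a sketch: week_id() as a function is an illustrative wrapper, and the sales figures are made up for the example because the question does not supply any.

```r
# Running week number counted from a fixed origin (week 1 starts on the origin)
week_id <- function(dates, origin = as.Date("2008-12-08")) {
  as.numeric(dates - origin) %/% 7 + 1
}

dates <- as.Date(c("2008-12-08", "2008-12-09", "2008-12-15"))
sales <- c(10, 5, 7)   # hypothetical values, not from the question

# Total sales per running week
weekly <- tapply(sales, week_id(dates), sum)
print(weekly)
```

Dates before the origin get zero or negative week numbers, as in the test rows of the output above, so it may be worth filtering them out first.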
Date Sales
3/11/2017 1
3/12/2017 0
3/13/2017 40
3/14/2017 47
3/15/2017 83
3/16/2017 62
3/17/2017 13
3/18/2017 58
3/19/2017 27
3/20/2017 17
3/21/2017 71
3/22/2017 76
3/23/2017 8
3/24/2017 13
3/25/2017 97
3/26/2017 58
3/27/2017 80
3/28/2017 77
3/29/2017 31
3/30/2017 78
3/31/2017 0
4/1/2017 40
4/2/2017 58
4/3/2017 32
4/4/2017 31
4/5/2017 90
4/6/2017 35
4/7/2017 88
4/8/2017 16
4/9/2017 72
4/10/2017 39
4/11/2017 8
4/12/2017 88
4/13/2017 93
4/14/2017 57
4/15/2017 23
4/16/2017 15
4/17/2017 6
4/18/2017 91
4/19/2017 87
4/20/2017 44
Here the current date is 20/04/2017. My question is: how can I group the data from 11/03/2017 to 19/04/2017 into 4 equal parts and sum the sales for each part in R?
E.g.:
library("xts")
ep <- endpoints(data, on = 'days', k = 4)
period.apply(data,ep,sum)
This is not working. It takes the data from the start date to the current date, but I need to gather the data from the start date up to yesterday (19/4/2017) and split it into 4 equal parts.
Could anyone kindly guide me?
Thank you
Base R has a function cut.Date() which is built for the purpose.
However, the question is not fully clear on what the OP intends. My understanding of the requirements given in the question and the additional comment is:
1. Take the sales data per day in Book1 but leave out the current day, i.e., use only completed days.
2. Group the data in four equal parts, i.e., four periods containing an equal number of days. (Note that the title of the question and the attempt to use xts::endpoints() with k = 4 indicate that the OP might instead intend to group the data in periods of four days each.)
3. Summarize the sales figures by period.
For the sake of brevity, data.table is used here for data manipulation and aggregation, and lubridate for date manipulation.
library(data.table)
library(lubridate)
# coerce to data.table, convert Date column from character to class Date,
# exclude the actual date
temp <- setDT(Book1)[, Date := mdy(Book1$Date)][Date != today()]
# cut the date range in four parts
temp[, start_date_of_period := cut.Date(Date, 4)]
temp
# Date Sales start_date_of_period
# 1: 2017-03-11 1 2017-03-11
# 2: 2017-03-12 0 2017-03-11
# 3: 2017-03-13 40 2017-03-11
# ...
#38: 2017-04-17 6 2017-04-10
#39: 2017-04-18 91 2017-04-10
#40: 2017-04-19 87 2017-04-10
# Date Sales start_date_of_period
# aggregate sales by period
temp[, .(n_days = .N, total_sales = sum(Sales)), by = start_date_of_period]
# start_date_of_period n_days total_sales
#1: 2017-03-11 10 348
#2: 2017-03-21 10 589
#3: 2017-03-31 10 462
#4: 2017-04-10 10 507
Thanks to chaining, this can be put together in one statement without using a temporary variable:
setDT(Book1)[, Date := mdy(Book1$Date)][Date != today()][
, start_date_of_period := cut.Date(Date, 4)][
, .(n_days = .N, total_sales = sum(Sales)), by = start_date_of_period]
Note: If you want to reproduce the result in the future, you will have to replace the call to today(), which excludes the current day, with mdy("4/20/2017"), which is the last day in the sample data set supplied by the OP.
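For comparison, here is a base R sketch of the same grouping that does not depend on data.table, lubridate, or today(). It is a swapped-in technique rather than the method above: it assigns rows to four equal-sized groups by position instead of calling cut.Date(), which assumes one row per day in date order. The sales values are retyped from the question for 2017-03-11 through 2017-04-19, and the names n_parts, grp, and result are illustrative.

```r
# Completed days only: 2017-03-11 through 2017-04-19 (the current day is left out)
dates <- seq(as.Date("2017-03-11"), as.Date("2017-04-19"), by = "day")
sales <- c(1, 0, 40, 47, 83, 62, 13, 58, 27, 17,
           71, 76, 8, 13, 97, 58, 80, 77, 31, 78,
           0, 40, 58, 32, 31, 90, 35, 88, 16, 72,
           39, 8, 88, 93, 57, 23, 15, 6, 91, 87)

# Assign each row to one of four equal-sized groups by its position
n_parts <- 4
grp <- ceiling(seq_along(dates) * n_parts / length(dates))

result <- data.frame(
  start_date_of_period = dates[!duplicated(grp)],          # first day of each group
  n_days      = as.integer(table(grp)),
  total_sales = as.numeric(tapply(sales, grp, sum))
)
print(result)
```

When the number of days is not a multiple of four, the groups differ in size by at most one row.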
This is probably a very simple question that has already been asked, but...
I have a data frame that I constructed from a CSV file generated in Excel. The observations are not homogeneously sampled, i.e., they cover only the "On Peak" times of electricity usage, which means they exclude different days each year. I have 20 years of data (1993-2012) and am running both non-robust and robust LOESS to extract seasonal and linear trends.
After the decomposition has been done, I want to focus only on the observations from June through September.
How can I create a new data frame of just those results?
Sorry about the formatting, too.
Date MaxLoad TMAX
1 1993-01-02 2321 118.6667
2 1993-01-04 2692 148.0000
3 1993-01-05 2539 176.0000
4 1993-01-06 2545 172.3333
5 1993-01-07 2517 177.6667
6 1993-01-08 2438 157.3333
7 1993-01-09 2302 152.0000
8 1993-01-11 2553 144.3333
9 1993-01-12 2666 146.3333
10 1993-01-13 2472 177.6667
As Joran notes, you don't need anything other than base R:
## Reproducible data
df <-
data.frame(Date = seq(as.Date("2009-03-15"), as.Date("2011-03-15"), by="month"),
MaxLoad = floor(runif(25,2000,3000)), TMAX=runif(25,100,200))
## One option (months() returns month names in the current locale, so this assumes an English locale)
df[months(df$Date) %in% month.name[6:9],]
# Date MaxLoad TMAX
# 4 2009-06-15 2160 188.4607
# 5 2009-07-15 2151 164.3946
# 6 2009-08-15 2694 110.4399
# 7 2009-09-15 2460 150.4076
# 16 2010-06-15 2638 178.8341
# 17 2010-07-15 2246 131.3283
# 18 2010-08-15 2483 112.2635
# 19 2010-09-15 2174 160.9724
## Another option: strftime() will be more _generally_ useful than months()
df[as.numeric(strftime(df$Date, "%m")) %in% 6:9,]
I have a data frame with start dates and end dates, along with the number of people registered for an event. I would like to calculate the number of hours each party is present for within a specific timeframe (e.g., 07:00 - 17:00)
If I use the following example data.frame...
d <- data.frame(startDate = c(as.POSIXct("2011-06-04 08:00:00"), as.POSIXct("2011-06-03 08:00:00"),
as.POSIXct("2011-09-12 10:00:00")),
endDate = c(as.POSIXct("2011-06-06 11:00:00"), as.POSIXct("2011-06-04 11:00:00"),
as.POSIXct("2011-09-12 18:00:00")),
partysize = c(124,442,323))
open <- "07:00"
close <- "17:00"
I would like my result set to look something like this:
day numhours partysize
2011-06-04 9 124
2011-06-05 10 124
2011-06-06 4 124
2011-06-03 9 442
2011-06-04 4 442
2011-09-12 7 323
note: numhours is the number of hours the date was included between the open and close times
Thanks in advance,
--JT
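Sorry, it's very messy, and I used 7 and 17 instead of your open and close:
# Start of each day a party spans (start day plus 0, 1, ... whole days)
app.days <- mapply(function(x, y) {x + y*60*60*24}, as.POSIXct(format(d$startDate, "%Y-%m-%d")), lapply(floor(-(d$startDate - d$endDate)/24), seq, from = 0))
# Later of 07:00 and the party's start; earlier of 17:00 and the party's end
start.date <- mapply(function(x, y) {pmax(x + 7*60*60, y)}, app.days, d$startDate)
end.date <- mapply(function(x, y) {pmin(x + 17*60*60, y)}, app.days, d$endDate)
# Time present within opening hours on each day
app.hours <- mapply(function(x, y) {as.numeric(x - y)}, end.date, start.date)
# Assemble one block per party, then flatten into a single data frame
res <- mapply(function(x, y, z) {data.frame(day = as.Date(x), numhours = y, partysize = z)}, app.days, app.hours, as.list(d$partysize))
res1 <- data.frame(day = as.Date(unlist(res[1,]), origin = "1970-01-01"), numhours = unlist(res[2,]), partysize = unlist(res[3,]))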
> res1
day numhours partysize
1 2011-06-04 9 124
2 2011-06-05 10 124
3 2011-06-06 4 124
4 2011-06-03 9 442
5 2011-06-04 4 442
6 2011-09-12 7 323
Basically, we identify which days each party stays for. For each of those days we find the applicable opening and closing times, clipped to the party's own start and end, and then subtract the clipped opening from the clipped closing. The data frame is formed at the end, but it could probably have been created in the res <- step.
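The same clipping idea can be written more explicitly. The following is a sketch, not the code above: it fixes the time zone to UTC (an assumption, made to keep daylight-saving changes out of the arithmetic), works in seconds since the epoch, and uses the illustrative names open_hour, close_hour, and hours_per_day.

```r
d <- data.frame(
  startDate = as.POSIXct(c("2011-06-04 08:00:00", "2011-06-03 08:00:00",
                           "2011-09-12 10:00:00"), tz = "UTC"),
  endDate   = as.POSIXct(c("2011-06-06 11:00:00", "2011-06-04 11:00:00",
                           "2011-09-12 18:00:00"), tz = "UTC"),
  partysize = c(124, 442, 323)
)
open_hour  <- 7    # 07:00
close_hour <- 17   # 17:00

# One row per calendar day a party spans, with the hours inside opening times
hours_per_day <- function(start, end, size) {
  days <- seq(as.Date(start), as.Date(end), by = "day")
  midnight <- as.numeric(as.POSIXct(format(days), tz = "UTC"))  # seconds since epoch
  from <- pmax(midnight + open_hour * 3600, as.numeric(start))  # clip to party start
  to   <- pmin(midnight + close_hour * 3600, as.numeric(end))   # clip to party end
  data.frame(day = days,
             numhours = pmax(to - from, 0) / 3600,  # zero if no overlap that day
             partysize = size)
}

res <- do.call(rbind, lapply(seq_len(nrow(d)), function(i) {
  hours_per_day(d$startDate[i], d$endDate[i], d$partysize[i])
}))
print(res)
```

Indexing the rows with d$startDate[i] keeps the POSIXct class intact, which avoids converting back from raw numbers at the end.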