SQL Server: Count integers with high precision

I'm struggling with a query in SQL Server. I have the following columns:
Member Number; Dependant Number; Provider Number; Service Date
I concatenate the above to create a unique ID; see the extract below:
MbrNo DepNo PracticeNo ServiceDt UniqueProviderConsults
100001077 1 243264 2014-07-02 00:00:00.000 243264100001077141820
100001077 1 243264 2014-07-02 00:00:00.000 243264100001077141820
100001077 1 243264 2014-07-02 00:00:00.000 243264100001077141820
100001077 1 243264 2014-07-02 00:00:00.000 243264100001077141820
100001077 1 243264 2014-07-02 00:00:00.000 243264100001077141820
100001077 1 243264 2014-07-02 00:00:00.000 243264100001077141820
100001077 1 243264 2014-07-02 00:00:00.000 243264100001077141820
100001077 0 243264 2014-07-02 00:00:00.000 243264100001077041820
100001077 0 243264 2014-07-02 00:00:00.000 243264100001077041820
100001077 0 243264 2014-07-02 00:00:00.000 243264100001077041820
100001077 0 243264 2014-07-02 00:00:00.000 243264100001077041820
100001077 0 243264 2014-07-02 00:00:00.000 243264100001077041820
100001077 0 243264 2014-07-02 00:00:00.000 243264100001077041820
100001077 0 243264 2014-07-07 00:00:00.000 243264100001077041825
100000838 1 243264 2014-07-09 00:00:00.000 243264100000838141827
100000838 5 243264 2014-07-14 00:00:00.000 243264100000838541832
100000838 3 243264 2014-07-17 00:00:00.000 243264100000838341835
100000838 0 243264 2014-07-17 00:00:00.000 243264100000838041835
100000838 5 243264 2014-07-18 00:00:00.000 243264100000838541836
100001077 0 243264 2014-07-14 00:00:00.000 243264100001077041832
100001077 0 243264 2014-07-14 00:00:00.000 243264100001077041832
100001077 0 243264 2014-07-14 00:00:00.000 243264100001077041832
100001077 0 243264 2014-07-14 00:00:00.000 243264100001077041832
100001077 0 243264 2014-07-14 00:00:00.000 243264100001077041832
100001077 0 243264 2014-07-14 00:00:00.000 243264100001077041832
100001480 1 243264 2014-07-17 00:00:00.000 243264100001480141835
My Unique ID is a numeric(30) data type. I then want to count how many times each Unique ID appears. Using count_big, I do not get any overflow warnings, but it still does not give me the right count. I think it is because a precision of 30 is too high, so the value gets cut off when counting. Is there another alternative? Unfortunately, the components above are the minimum needed to define the Unique ID. I have tried taking the log of my Unique ID, but the count is also incorrect.
Can someone please help :)
Code:
ALTER TABLE [Claims Edited] ADD [UniqueProviderConsults] NUMERIC(30)
GO
UPDATE [Claims Edited] SET [UniqueProviderConsults] = CONCAT(CONVERT(DECIMAL(38,0),ProviderNo),CONVERT(DECIMAL(38,0),MbrNo),CONVERT(VARCHAR(MAX),DepNo),CONVERT(DECIMAL(38,0),ServiceDt))
GO
Select PracticeNo,
count_big(DISTINCT case when [ServiceMth]='2014-06-30' THEN [UniqueProviderConsults] else 0 end) as [Jun-14 Consults],
count_big(DISTINCT case when [ServiceMth]='2014-07-31' THEN [UniqueProviderConsults] else 0 end) as [Jul-14 Consults],
count_big(DISTINCT case when [ServiceMth]='2014-08-31' THEN [UniqueProviderConsults] else 0 end) as [Aug-14 Consults],
count_big(DISTINCT case when [ServiceMth]='2014-09-30' THEN [UniqueProviderConsults] else 0 end) as [Sep-14 Consults],
count_big(DISTINCT case when [ServiceMth]='2014-10-31' THEN [UniqueProviderConsults] else 0 end) as [Oct-14 Consults],
count_big(DISTINCT case when [ServiceMth]='2014-11-30' THEN [UniqueProviderConsults] else 0 end) as [Nov-14 Consults],
count_big(DISTINCT case when [ServiceMth]='2014-12-31' THEN [UniqueProviderConsults] else 0 end) as [Dec-14 Consults]
Into [EM Consultation Count temp]
from [EM Claims Edited]
Group by PracticeNo
Using the data from the extract above for Provider Number 243264:
For the month of June 2014, there are no lines, yet my code counts 1.
For the month of July 2014, there are 10 unique IDs, yet my code counts 11.

Related

RODBC No data available in table

I use RODBC to get data from SQL Server:
sql <- paste0("
with cte as (
Select *,datePart(WEEKDAY,Dt) as WeekDay,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY SaleCount) Over (partition by ItemRelation,
DocumentNum, DocumentYear) as PERCENTILE,
avg(SaleCount) over (Partition by ItemRelation,
DocumentNum, DocumentYear,datePart(WEEKDAY,Dt), IsPromo) as AVG_WeekDay
From [Action].[dbo].[promo_data_copy])
Update a
Set SaleCount = cte.AVG_WeekDay
From CTE
join [Action].[dbo].[promo_data_copy] a
on a.Dt = cte.dt
and a.ItemRelation=cte.ItemRelation
and a.DocumentNum = cte.DocumentNum
and a.DocumentYear = cte.DocumentYear
and a.ispromo = cte.ispromo
Where CTE.PERCENTILE < CTE.SaleCount
and datePart(WEEKDAY,CTE.Dt) < 5
and CTE.ispromo = 0 ;")
df <- sqlQuery(dbHandle, sql)
View(df)
and df is an empty dataset:
No data available in table
Can anybody help me understand why the data wasn't returned?
Edit
Dt ItemRelation SaleCount DocumentNum DocumentYear IsPromo
2017-10-12 00:00:00.000 13322 7 36 2017 0
2017-10-12 00:00:00.000 13322 35 4 2017 0
2017-10-12 00:00:00.000 158121 340 41 2017 0
2017-10-12 00:00:00.000 158122 260 41 2017 0
2017-10-13 00:00:00.000 13322 3 36 2017 0
2017-10-13 00:00:00.000 13322 31 4 2017 0
2017-10-13 00:00:00.000 158121 420 41 2017 0
2017-10-13 00:00:00.000 158122 380 41 2017 0
2017-10-14 00:00:00.000 11592 45 33 2017 0
2017-10-14 00:00:00.000 13189 135 33 2017 0
2017-10-14 00:00:00.000 13191 852 33 2017 0
2017-10-14 00:00:00.000 13322 1 36 2017 0
2017-10-14 00:00:00.000 13322 34 4 2017 0
2017-10-14 00:00:00.000 158121 360 41 2017 0
2017-10-14 00:00:00.000 158122 140 41 2017 0
Here are the top 15 observations of the table, so I expect my query to return this data.frame.
I'm not sure about the percentile stuff; I'll leave it to you to get that part straightened out. Anyway, here is how I use R to query a database.
library(RODBC)
dbconnection <- odbcDriverConnect("Driver=ODBC Driver 11 for SQL Server;Server=Server_Name; Database=DB_Name;Uid=; Pwd=; trusted_connection=yes")
initdata <- sqlQuery(dbconnection, paste("select * from MyTable;"))
odbcClose(dbconnection)  # close the handle that was opened above
Here are a couple of very useful resources for cross-reference.
http://stackoverflow.com/questions/15420999/rodbc-odbcdriverconnect-connection-error
https://andersspur.wordpress.com/2013/11/26/connect-r-to-sql-server-2012-and-14/
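One thing worth keeping in mind about why df comes back empty: the statement built in sql is an UPDATE, and an UPDATE without an OUTPUT clause returns no result set, so sqlQuery has nothing to hand back. A minimal sketch (reusing the dbHandle connection and the sql string from the question; the TOP 15 query is only for illustration) that runs the update first and then inspects the table in a separate call:

library(RODBC)
# the UPDATE itself returns no rows; run it first
sqlQuery(dbHandle, sql)
# then select rows in a second call to inspect the updated values
df <- sqlQuery(dbHandle, "select top 15 * from [Action].[dbo].[promo_data_copy]")
View(df)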

insert a new row in R based on time interval checking

All,
I just have a data file with two columns: one is a time series, the other is values. Normally, the time interval between two rows is exactly 5 minutes, but sometimes it is larger than 5 minutes.
A sample is as below:
dd <- data.table(date = c("2015-07-01 00:00:00", "2015-07-01 00:05:00", "2015-07-01 00:20:00","2015-07-01 00:25:00","2015-07-01 00:30:00"),
value = c(9,1,10,12,0))
What I want to do is check the time interval between two rows; when the interval is larger than 5 minutes, insert new rows below with value 0 to fill the gap, so the result would be:
date value
2015-07-01 00:00:00 9
2015-07-01 00:05:00 1
2015-07-01 00:10:00 0
2015-07-01 00:15:00 0
2015-07-01 00:20:00 10
2015-07-01 00:25:00 12
2015-07-01 00:30:00 0
Any suggestion or idea is welcome :)
We can do a join after converting 'date' to POSIXct:
dd[, date := as.POSIXct(date)][]
dd[dd[, .(date=seq(min(date), max(date), by = "5 min"))], on = 'date'
][is.na(value), value := 0][]
# date value
#1: 2015-07-01 00:00:00 9
#2: 2015-07-01 00:05:00 1
#3: 2015-07-01 00:10:00 0
#4: 2015-07-01 00:15:00 0
#5: 2015-07-01 00:20:00 10
#6: 2015-07-01 00:25:00 12
#7: 2015-07-01 00:30:00 0
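For comparison, a similar gap fill can be written with tidyr (just a sketch, assuming dd$date has already been converted to POSIXct as in the first line above):

library(tidyr)
# expand to the full 5-minute grid and fill the new rows' value with 0
complete(dd, date = seq(min(date), max(date), by = "5 min"), fill = list(value = 0))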

xts::apply.weekly thinks Monday is the last day of the week

I have an R data.frame containing one value for every quarter of an hour:
Date A B
1 2015-11-02 00:00:00 0 0 //day start
2 2015-11-02 00:15:00 0 0
3 2015-11-02 00:30:00 0 0
4 2015-11-02 00:45:00 0 0
...
96 2015-11-02 23:45:00 0 0 //day end
97 2015-11-03 00:00:00 0 0 //new day
...
6 2016-03-23 01:15:00 0 0 //last record
I use xts to construct a time series
xtsA <- xts(data$A,data$Date)
by using apply.daily I get the result I expect
apply.daily(xtsA, sum)
Date A
1 2015-11-02 23:45:00 400
2 2015-11-03 23:45:00 400
3 2015-11-04 23:45:00 500
but apply.weekly seems to use Monday as the last day of the week:
Date A
19 2016-03-07 00:45:00 6500 //Monday
20 2016-03-14 00:45:00 5500 //Monday
21 2016-03-21 00:45:00 5000 //Monday
and I do not understand why it uses 00:45:00. Does anyone know?
Data is imported from a CSV file; the Date column looks like this:
data <- read.csv("...", header=TRUE)
Date A
1 151102 0000 0
...
The error is in the date-time interpretation, and using
data$Date <- as.POSIXct(strptime(data$Date, "%y%m%d %H%M"), tz = "GMT")
solves it, and now apply.weekly returns
Date A
1 2015-11-08 23:45:00 3500 //Sunday
2 2015-11-15 23:45:00 4000 //Sunday
...
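A small synthetic example (a sketch, not the original data) makes the labelling behaviour visible: apply.weekly labels each weekly sum with the timestamp of the last observation that falls inside that week, which is why mis-parsed timestamps produced odd-looking Monday 00:45 endpoints.

library(xts)
# one observation per quarter hour over three full weeks, all values 1
idx <- seq(as.POSIXct("2015-11-02 00:00", tz = "GMT"),
           as.POSIXct("2015-11-22 23:45", tz = "GMT"), by = "15 min")
x <- xts(rep(1, length(idx)), order.by = idx)
apply.weekly(x, sum)
# each row is labelled with the last observation of its week, i.e. Sunday 23:45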

cut by interval and aggregate over one month in R

I have the given data: all bike trips that started from a particular station over the month of October 2013. I'd like to count the number of trips that occurred within ten-minute time intervals. There should be a total of 144 rows, each with the sum of all trips that occurred within that interval for the entire month. How would one cut the data.frame and then aggregate by interval (so that trips occurring between 00:00:01 and 00:10:00 are counted in the second row, between 00:10:01 and 00:20:00 in the third row, and so on)?
head(one.station)
tripduration starttime stoptime start.station.id start.station.name
59 803 2013-10-01 00:11:49 2013-10-01 00:25:12 521 8 Ave & W 31 St
208 445 2013-10-01 00:40:05 2013-10-01 00:47:30 521 8 Ave & W 31 St
359 643 2013-10-01 01:25:57 2013-10-01 01:36:40 521 8 Ave & W 31 St
635 388 2013-10-01 05:30:30 2013-10-01 05:36:58 521 8 Ave & W 31 St
661 314 2013-10-01 05:38:00 2013-10-01 05:43:14 521 8 Ave & W 31 St
768 477 2013-10-01 05:54:49 2013-10-01 06:02:46 521 8 Ave & W 31 St
start.station.latitude start.station.longitude end.station.id end.station.name
59 40.75045 -73.99481 2003 1 Ave & E 18 St
208 40.75045 -73.99481 505 6 Ave & W 33 St
359 40.75045 -73.99481 508 W 46 St & 11 Ave
635 40.75045 -73.99481 459 W 20 St & 11 Ave
661 40.75045 -73.99481 462 W 22 St & 10 Ave
768 40.75045 -73.99481 457 Broadway & W 58 St
end.station.latitude end.station.longitude bikeid usertype birth.year gender
59 40.73416 -73.98024 15139 Subscriber 1985 1
208 40.74901 -73.98848 20538 Subscriber 1990 2
359 40.76341 -73.99667 19935 Customer \\N 0
635 40.74674 -74.00776 14781 Subscriber 1955 1
661 40.74692 -74.00452 17976 Subscriber 1982 1
768 40.76695 -73.98169 19022 Subscriber 1973 1
So that the output looks like this
output
interval total_trips
1 00:00:00 0
2 00:10:00 1
3 00:20:00 2
4 00:30:00 3
5 00:40:00 4
Here it is using only start time:
library(lubridate)
library(dplyr)
tripduration <- floor(runif(6) * 1000)
start_times <- as.POSIXlt(
c("2013-10-01 00:11:49"
,"2013-10-01 00:40:05"
,"2013-10-01 01:25:57"
,"2013-10-01 05:30:30"
,"2013-10-01 05:38:00"
,"2013-10-01 05:54:49")
)
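# floor each start time to the start of its 10-minute bucket by stripping the leftover minutes and seconds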
time_bucket <- start_times - minutes(minute(start_times) %% 10) - seconds(second(start_times))
df <- data.frame(tripduration, start_times, time_bucket)
summarized <- df %>%
group_by(time_bucket) %>%
summarize(trip_count = n())
summarized <- as.data.frame(summarized)
out_buckets <- data.frame(out_buckets = seq(as.POSIXlt("2013-10-01 00:00:00"), as.POSIXct("2013-10-01 06:0:00"), by = 600))
out <- left_join(out_buckets, summarized, by = c("out_buckets" = "time_bucket"))
out$trip_count[is.na(out$trip_count)] <- 0
out
out_buckets trip_count
1 2013-10-01 00:00:00 0
2 2013-10-01 00:10:00 1
3 2013-10-01 00:20:00 0
4 2013-10-01 00:30:00 0
5 2013-10-01 00:40:00 1
6 2013-10-01 00:50:00 0
7 2013-10-01 01:00:00 0
8 2013-10-01 01:10:00 0
9 2013-10-01 01:20:00 1
10 2013-10-01 01:30:00 0
11 2013-10-01 01:40:00 0
12 2013-10-01 01:50:00 0
13 2013-10-01 02:00:00 0
14 2013-10-01 02:10:00 0
15 2013-10-01 02:20:00 0
16 2013-10-01 02:30:00 0
17 2013-10-01 02:40:00 0
18 2013-10-01 02:50:00 0
19 2013-10-01 03:00:00 0
20 2013-10-01 03:10:00 0
21 2013-10-01 03:20:00 0
22 2013-10-01 03:30:00 0
23 2013-10-01 03:40:00 0
24 2013-10-01 03:50:00 0
25 2013-10-01 04:00:00 0
26 2013-10-01 04:10:00 0
27 2013-10-01 04:20:00 0
28 2013-10-01 04:30:00 0
29 2013-10-01 04:40:00 0
30 2013-10-01 04:50:00 0
31 2013-10-01 05:00:00 0
32 2013-10-01 05:10:00 0
33 2013-10-01 05:20:00 0
34 2013-10-01 05:30:00 2
35 2013-10-01 05:40:00 0
36 2013-10-01 05:50:00 1
37 2013-10-01 06:00:00 0
The lubridate library can provide one solution. It has a nice function for interval overlap logic. The code below uses lapply to loop through the intervals in the data and then buckets them accordingly.
library(lubridate)
start_times <- as.POSIXlt(
c("2013-10-01 00:11:49"
,"2013-10-01 00:40:05"
,"2013-10-01 01:25:57"
,"2013-10-01 05:30:30"
,"2013-10-01 05:38:00"
,"2013-10-01 05:54:49")
)
stop_times <- as.POSIXlt(
c("2013-10-01 00:25:12"
,"2013-10-01 00:47:30"
,"2013-10-01 01:36:40"
,"2013-10-01 05:36:58"
,"2013-10-01 05:43:14"
,"2013-10-01 06:02:46")
)
start_bucket <- seq(as.POSIXct("2013-10-01 00:00:00"), as.POSIXct("2013-10-01 06:0:00"), by = 600)
end_bucket <- start_bucket + 600
bucket_interval <- interval(start_bucket, end_bucket)
data_interval <- interval(start_times, stop_times)
int_list <- lapply(data_interval, function(x) ifelse(int_overlaps(x, bucket_interval),1,0))
rides_per_bucket <- rowSums(do.call(cbind, int_list))
out_df <- data.frame(bucket_interval, rides_per_bucket)
out_df
bucket_interval rides_per_bucket
1 2013-10-01 00:00:00 PDT--2013-10-01 00:10:00 PDT 0
2 2013-10-01 00:10:00 PDT--2013-10-01 00:20:00 PDT 1
3 2013-10-01 00:20:00 PDT--2013-10-01 00:30:00 PDT 1
4 2013-10-01 00:30:00 PDT--2013-10-01 00:40:00 PDT 0
5 2013-10-01 00:40:00 PDT--2013-10-01 00:50:00 PDT 1
6 2013-10-01 00:50:00 PDT--2013-10-01 01:00:00 PDT 0
7 2013-10-01 01:00:00 PDT--2013-10-01 01:10:00 PDT 0
8 2013-10-01 01:10:00 PDT--2013-10-01 01:20:00 PDT 0
9 2013-10-01 01:20:00 PDT--2013-10-01 01:30:00 PDT 1
10 2013-10-01 01:30:00 PDT--2013-10-01 01:40:00 PDT 1
11 2013-10-01 01:40:00 PDT--2013-10-01 01:50:00 PDT 0
12 2013-10-01 01:50:00 PDT--2013-10-01 02:00:00 PDT 0
13 2013-10-01 02:00:00 PDT--2013-10-01 02:10:00 PDT 0
14 2013-10-01 02:10:00 PDT--2013-10-01 02:20:00 PDT 0
15 2013-10-01 02:20:00 PDT--2013-10-01 02:30:00 PDT 0
16 2013-10-01 02:30:00 PDT--2013-10-01 02:40:00 PDT 0
17 2013-10-01 02:40:00 PDT--2013-10-01 02:50:00 PDT 0
18 2013-10-01 02:50:00 PDT--2013-10-01 03:00:00 PDT 0
19 2013-10-01 03:00:00 PDT--2013-10-01 03:10:00 PDT 0
20 2013-10-01 03:10:00 PDT--2013-10-01 03:20:00 PDT 0
21 2013-10-01 03:20:00 PDT--2013-10-01 03:30:00 PDT 0
22 2013-10-01 03:30:00 PDT--2013-10-01 03:40:00 PDT 0
23 2013-10-01 03:40:00 PDT--2013-10-01 03:50:00 PDT 0
24 2013-10-01 03:50:00 PDT--2013-10-01 04:00:00 PDT 0
25 2013-10-01 04:00:00 PDT--2013-10-01 04:10:00 PDT 0
26 2013-10-01 04:10:00 PDT--2013-10-01 04:20:00 PDT 0
27 2013-10-01 04:20:00 PDT--2013-10-01 04:30:00 PDT 0
28 2013-10-01 04:30:00 PDT--2013-10-01 04:40:00 PDT 0
29 2013-10-01 04:40:00 PDT--2013-10-01 04:50:00 PDT 0
30 2013-10-01 04:50:00 PDT--2013-10-01 05:00:00 PDT 0
31 2013-10-01 05:00:00 PDT--2013-10-01 05:10:00 PDT 0
32 2013-10-01 05:10:00 PDT--2013-10-01 05:20:00 PDT 0
33 2013-10-01 05:20:00 PDT--2013-10-01 05:30:00 PDT 0
34 2013-10-01 05:30:00 PDT--2013-10-01 05:40:00 PDT 2
35 2013-10-01 05:40:00 PDT--2013-10-01 05:50:00 PDT 1
36 2013-10-01 05:50:00 PDT--2013-10-01 06:00:00 PDT 1
37 2013-10-01 06:00:00 PDT--2013-10-01 06:10:00 PDT 1
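Note that bucket 37 shows a count of 1 even though no trip starts after 06:00: int_overlaps() counts a trip in every bucket it touches, so a ride that spans a boundary (here 05:54:49 to 06:02:46) contributes to more than one bucket. A quick check of that one pair (a sketch, reusing the lubridate functions from above):

# does the last ride overlap the 06:00-06:10 bucket? (yes)
int_overlaps(
  interval(as.POSIXct("2013-10-01 05:54:49"), as.POSIXct("2013-10-01 06:02:46")),
  interval(as.POSIXct("2013-10-01 06:00:00"), as.POSIXct("2013-10-01 06:10:00"))
)
#> [1] TRUE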

rollsum with fixed dates

I have a data frame that looks like this:
user_id date price
2375 2012/12/12 00:00:00.000 47.900000
2375 2013/01/16 00:00:00.000 47.900000
2375 2013/01/16 00:00:00.000 47.900000
2375 2013/05/08 00:00:00.000 47.900000
2375 2013/06/01 00:00:00.000 47.900000
2375 2013/10/02 00:00:00.000 26.500000
2375 2014/01/22 00:00:00.000 47.900000
2375 2014/03/21 00:00:00.000 47.900000
2375 2014/05/24 00:00:00.000 47.900000
2375 2015/04/11 00:00:00.000 47.900000
7419 2012/12/12 00:00:00.000 7.174977
7419 2013/01/02 00:00:00.000 27.500000
7419 2013/01/18 00:00:00.000 22.901482
7419 2013/02/08 00:00:00.000 27.500000
7419 2013/03/06 00:00:00.000 8.200000
7419 2013/04/03 00:00:00.000 22.901482
7419 2013/04/03 00:00:00.000 8.200000
7419 2013/04/03 00:00:00.000 6.900000
7419 2013/04/17 00:00:00.000 7.500000
7419 2013/04/17 00:00:00.000 7.500000
7419 2013/05/23 00:00:00.000 7.500000
7419 2013/06/07 00:00:00.000 27.500000
7419 2013/06/07 00:00:00.000 7.500000
7419 2013/06/07 00:00:00.000 7.500000
7419 2013/06/07 00:00:00.000 5.829188
7419 2013/07/10 00:00:00.000 27.500000
7419 2013/08/21 00:00:00.000 7.500000
7419 2013/08/21 00:00:00.000 27.500000
7419 2013/09/06 00:00:00.000 27.500000
7419 2013/12/27 00:00:00.000 7.500000
7419 2014/01/10 00:00:00.000 27.500000
7419 2014/02/16 00:00:00.000 27.500000
7419 2014/05/14 00:00:00.000 41.900000
7419 2014/07/03 00:00:00.000 26.500000
7419 2014/09/26 00:00:00.000 26.500000
7419 2014/09/26 00:00:00.000 7.500000
7419 2014/10/22 00:00:00.000 27.500000
7419 2014/11/15 00:00:00.000 6.900000
7419 2014/11/27 00:00:00.000 26.500000
7419 2014/12/12 00:00:00.000 40.900000
7419 2015/01/14 00:00:00.000 27.200000
7419 2015/02/24 00:00:00.000 26.500000
7419 2015/03/17 00:00:00.000 40.900000
7419 2015/05/02 00:00:00.000 27.200000
7419 2015/05/02 00:00:00.000 26.500000
7419 2015/05/15 00:00:00.000 7.900000
7419 2015/05/20 00:00:00.000 27.500000
7419 2015/06/20 00:00:00.000 7.500000
7419 2015/06/26 00:00:00.000 7.500000
7419 2015/06/30 00:00:00.000 41.900000
7419 2015/07/16 00:00:00.000 78.500000
11860 2012/12/12 00:00:00.000 7.174977
11860 2012/12/12 00:00:00.000 21.500000
11860 2013/03/02 00:00:00.000 22.901482
11860 2013/03/02 00:00:00.000 8.200000
11860 2013/05/25 00:00:00.000 29.500000
11860 2013/05/25 00:00:00.000 7.500000
In reality, I have more than 40000 user_ids. For each user, I want to calculate the sum of price over the previous 4 weeks (not counting the present week). However, the date period is fixed, from 12/12/2012 to 22/09/2015. In order to avoid a loop over each user, I thought of something like
df <- df %>% group_by(user_id) %>%
mutate(price.lag1 = lag(prod_price, n = 1)) %>%
mutate(amount4weeks = rollsum(x=price, 4, align = "right", fill = NA))
However, it gives me an error, and it only takes as "dates" the rows present in the data.
How can I give rollsum specific dates and/or how can I do what I want in a one-liner? My result should look like:
df$price4weeks = c(NA, 0.000000, 0.000000, 0.000000, 47.900000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, NA, 7.174977, 27.500000, 22.901482, 27.500000, 8.200000, 8.200000, 8.200000, 6.900000, 6.900000, 0.000000, 7.500000, 7.500000, 7.500000, 7.500000, 0.000000, 0.000000, 0.000000, 27.500000, 0.000000, 7.500000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 7.500000, 27.500000, 6.900000, 33.400000, 0.000000, 0.000000, 26.500000, 0.000000, 0.000000, 26.500000, 34.400000, 27.500000, 7.500000,15.000000, 56.900000, NA, NA, 0.000000, 0.000000, 0.000000, 0.000000)
Let me know if I am missing something in my explanation.
Thank you!
rollsum calculates the sum over a rolling k number of data points. To use dplyr with weeks, you could add a week_number column to your data and then calculate the rolling sum using sapply over week_number. The code could look like:
df <- mutate(df, week_number = cut.POSIXt(date, breaks = "week", labels = FALSE))  # 'date' must already be Date/POSIXct
df_new <- df %>% group_by(user_id) %>%
  do(mutate(., total_4wk = sapply(week_number, function(n)
    sum(.$price[between(.$week_number, n - 4, n - 1)], na.rm = TRUE))))
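As a quick sanity check of those week codes (a sketch; cut.POSIXt needs the date column to be Date or POSIXct, and labels = FALSE returns sequential integer week numbers, so n - 4 to n - 1 indexes the four calendar weeks before week n):

# three of the question's dates land in calendar weeks 1, 4 and 6 of the period
cut.POSIXt(as.POSIXct(c("2012/12/12", "2013/01/02", "2013/01/16")), breaks = "week", labels = FALSE)
#> [1] 1 4 6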
