RODBC No data available in table - r

I use RODBC to get data from SQL Server:
sql <- paste0("
with cte as (
Select *,datePart(WEEKDAY,Dt) as WeekDay,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY SaleCount) Over (partition by ItemRelation,
DocumentNum, DocumentYear) as PERCENTILE,
avg(SaleCount) over (Partition by ItemRelation,
DocumentNum, DocumentYear,datePart(WEEKDAY,Dt), IsPromo) as AVG_WeekDay
From [Action].[dbo].[promo_data_copy])
Update a
Set SaleCount = cte.AVG_WeekDay
From CTE
join [Action].[dbo].[promo_data_copy] a
on a.Dt = cte.dt
and a.ItemRelation=cte.ItemRelation
and a.DocumentNum = cte.DocumentNum
and a.DocumentYear = cte.DocumentYear
and a.ispromo = cte.ispromo
Where CTE.PERCENTILE < CTE.SaleCount
and datePart(WEEKDAY,CTE.Dt) < 5
and CTE.ispromo = 0 ;")
df <- sqlQuery(dbHandle, sql)
View(df)
but df is an empty data set:
No data available in table
Can anybody help me understand why no data was returned?
Edit
Dt ItemRelation SaleCount DocumentNum DocumentYear IsPromo
2017-10-12 00:00:00.000 13322 7 36 2017 0
2017-10-12 00:00:00.000 13322 35 4 2017 0
2017-10-12 00:00:00.000 158121 340 41 2017 0
2017-10-12 00:00:00.000 158122 260 41 2017 0
2017-10-13 00:00:00.000 13322 3 36 2017 0
2017-10-13 00:00:00.000 13322 31 4 2017 0
2017-10-13 00:00:00.000 158121 420 41 2017 0
2017-10-13 00:00:00.000 158122 380 41 2017 0
2017-10-14 00:00:00.000 11592 45 33 2017 0
2017-10-14 00:00:00.000 13189 135 33 2017 0
2017-10-14 00:00:00.000 13191 852 33 2017 0
2017-10-14 00:00:00.000 13322 1 36 2017 0
2017-10-14 00:00:00.000 13322 34 4 2017 0
2017-10-14 00:00:00.000 158121 360 41 2017 0
2017-10-14 00:00:00.000 158122 140 41 2017 0
Here are the top 15 observations of the table, so I expect that my query will return this data frame.

I'm not sure about the percentile stuff; I'll leave it to you to get that part straightened out. Anyway, here is how I use R to query a database.
library(RODBC)
dbconnection <- odbcDriverConnect("Driver=ODBC Driver 11 for SQL Server;Server=Server_Name; Database=DB_Name;Uid=; Pwd=; trusted_connection=yes")
initdata <- sqlQuery(dbconnection,paste("select * from MyTable;"))
odbcClose(dbconnection)
Here are a couple of very useful resources for cross-reference.
http://stackoverflow.com/questions/15420999/rodbc-odbcdriverconnect-connection-error
https://andersspur.wordpress.com/2013/11/26/connect-r-to-sql-server-2012-and-14/
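Note also that the query in the question is an UPDATE statement, and an UPDATE does not produce a result set, so sqlQuery() has no rows to return from it. A minimal sketch (assuming the same dbHandle connection and table as in the question; the follow-up SELECT is just illustrative): run the update first, then fetch the rows you want to inspect with a separate query.
# run the UPDATE; this call returns no rows
sqlQuery(dbHandle, sql)
# then query the updated table to see the data
df <- sqlQuery(dbHandle, "select top 15 * from [Action].[dbo].[promo_data_copy] order by Dt;")
View(df)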

Related

Eight-hour averages in R studio

I want to get eight-hour averages (based on the Time column) of the maximum values of the O column. It is important that the Time column covers each hour of one day. The dataset contains the following columns: Place (several places), date (365 days), Time (24 hours per day), O (ozone values), month, and year.
The sample dataset is:
data%>%select("Place","date","Time", "O",
"month","year")
Place Date Time O month year
Al 2016-01-01 1 23 enero 2016
Al 2016-01-01 2 15 enero 2016
Al 2016-01-01 3 18 enero 2016
Al 2016-01-01 4 18 enero 2016
Al 2016-01-01 5 20 enero 2016
Al 2016-01-01 6 21 enero 2016
Al 2016-01-01 7 24 enero 2016
Al 2016-01-01 8 24 enero 2016
Al 2016-01-01 9 22 enero 2016
Al 2016-01-01 10 24 enero 2016
Al 2016-01-01 11 33 enero 2016
Al 2016-01-01 12 53 enero 2016
Al 2016-01-01 13 54 enero 2016
Al 2016-01-01 14 54 enero 2016
Al 2016-01-01 15 58 enero 2016
Al 2016-01-01 16 60 enero 2016
Al 2016-01-01 17 57 enero 2016
Al 2016-01-01 18 55 enero 2016
Al 2016-01-01 19 50 enero 2016
Al 2016-01-01 20 51 enero 2016
Al 2016-01-01 21 51 enero 2016
Al 2016-01-01 22 55 enero 2016
Al 2016-01-01 23 46 enero 2016
Al 2016-01-01 24 57 enero 2016
I want to get the maximum values of the O column over eight-hour periods within each day, i.e. for 2016-01-01 I want to get the eight-hour maximum values of O and take their average.
But I don't know how to do it.
You can use zoo::rollapply for this.
Default usage:
zoo::rollapply(dat$O, 8, max, fill = NA)
# [1] NA NA NA 24 24 24 33 53 54 54 58 60 60 60 60 60 60 60 60 57 NA NA NA NA
This is "centering" the window, where the first 24 is the max of positions 1-8. I already added fill=NA, since we need the output to be the same size as the input vector.
You can change the alignment, so that each value is the max of itself and the 7 values to its left or to its right. For instance,
zoo::rollapply(dat$O, 8, max, fill = NA, align = "left")
# [1] 24 24 24 33 53 54 54 58 60 60 60 60 60 60 60 60 57 NA NA NA NA NA NA NA
zoo::rollapply(dat$O, 8, max, fill = NA, align = "right")
# [1] NA NA NA NA NA NA NA 24 24 24 33 53 54 54 58 60 60 60 60 60 60 60 60 57
I'll assume that we need the latter (align="right").
Finally, we can do a partial max, where the second value is the max of indices 1-2; third value is the max of indices 1-3; etc. In that case,
zoo::rollapply(dat$O, 8, max, align = "right", partial = TRUE)
# [1] 23 23 23 23 23 23 24 24 24 24 33 53 54 54 58 60 60 60 60 60 60 60 60 57
(Notice we don't technically need fill=NA anymore.)
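If the data spans several days and the eight-hour windows should not cross day boundaries, one option (a sketch, assuming the day column is named Date and the ozone column O as in the sample) is to compute the rolling maximum within each day and then average it per day:
library(zoo)
# right-aligned 8-hour rolling max, computed separately within each day
dat$roll_max <- ave(dat$O, dat$Date,
                    FUN = function(x) rollapplyr(x, 8, max, partial = TRUE))
# daily average of those rolling maxima
aggregate(roll_max ~ Date, data = dat, FUN = mean)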

Aggregate on a daily basis in R

I'm borrowing the reproducible example given here:
Aggregate daily level data to weekly level in R
since it's pretty close to what I want to do.
Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176
In his question, he asks to aggregate over weekly intervals; what I'd like to do is aggregate on a day-of-the-week basis.
So I'd like to have a table similar to that one, summing the values for each day of the week:
Day of the week value
1 "Sunday" 60000
2 "Monday" 50000
3 "Tuesday" 60000
4 "Wednesday" 50000
5 "Thursday" 60000
6 "Friday" 50000
7 "Saturday" 60000
You can try:
aggregate(d$value, list(weekdays(as.Date(d$Interval))), sum)
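One caveat: aggregate() orders the result alphabetically by day name. If calendar order matters, a small sketch (assuming an English locale for the day names) using a factor with the levels in the desired order:
d$wday <- factor(weekdays(as.Date(d$Interval)),
                 levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
                            "Thursday", "Friday", "Saturday"))
aggregate(d$value, list(Day_of_the_week = d$wday), sum)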
We can group by day of the week using weekdays():
library(dplyr)
df %>%
group_by(Day_Of_The_Week = weekdays(as.Date(Interval))) %>%
summarise(value = sum(value))
# Day_Of_The_Week value
# <chr> <int>
#1 Friday 16903
#2 Monday 26368
#3 Saturday 4738
#4 Sunday 2975
#5 Thursday 17858
#6 Tuesday 23772
#7 Wednesday 13560
We can do this with data.table
library(data.table)
setDT(df1)[, .(value = sum(value)), .(Dayofweek = weekdays(as.Date(Interval)))]
# Dayofweek value
#1: Sunday 2975
#2: Monday 26368
#3: Tuesday 23772
#4: Wednesday 13560
#5: Thursday 17858
#6: Friday 16903
#7: Saturday 4738
Using lubridate (https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html):
library(lubridate)
library(data.table)
df1$Weekday = wday(as.Date(df1$Interval), label = TRUE)
df1 = data.table(df1)
df1[, sum(value), Weekday]

Insert new rows into time-series data, with dates added automatically

I have a time-series data frame that looks like:
TS.1
2015-09-01 361656.7
2015-09-02 370086.4
2015-09-03 346571.2
2015-09-04 316616.9
2015-09-05 342271.8
2015-09-06 361548.2
2015-09-07 342609.2
2015-09-08 281868.8
2015-09-09 297011.1
2015-09-10 295160.5
2015-09-11 287926.9
2015-09-12 323365.8
Now, what I want to do is add some new data points (rows) to the existing data frame, say,
320123.5
323521.7
How can I add the corresponding date to each row? The dates should just continue sequentially from the last row.
Is there any package that can do this automatically, so that the only thing I have to do is insert the new data points?
Here's some play data:
df <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "days"), x = seq(31))
new.x <- c(32, 33)
This adds the extra observations along with the proper sequence of dates:
new.df <- data.frame(date=seq(max(df$date) + 1, max(df$date) + length(new.x), "days"), x=new.x)
Then just rbind them to get your expanded data frame:
rbind(df, new.df)
date x
1 2015-01-01 1
2 2015-01-02 2
3 2015-01-03 3
4 2015-01-04 4
5 2015-01-05 5
6 2015-01-06 6
7 2015-01-07 7
8 2015-01-08 8
9 2015-01-09 9
10 2015-01-10 10
11 2015-01-11 11
12 2015-01-12 12
13 2015-01-13 13
14 2015-01-14 14
15 2015-01-15 15
16 2015-01-16 16
17 2015-01-17 17
18 2015-01-18 18
19 2015-01-19 19
20 2015-01-20 20
21 2015-01-21 21
22 2015-01-22 22
23 2015-01-23 23
24 2015-01-24 24
25 2015-01-25 25
26 2015-01-26 26
27 2015-01-27 27
28 2015-01-28 28
29 2015-01-29 29
30 2015-01-30 30
31 2015-01-31 31
32 2015-02-01 32
33 2015-02-02 33
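If you need to do this repeatedly, the same logic can be wrapped in a small helper (a sketch; append_obs and its column names date and x are illustrative, not from an existing package):
# append new values, continuing the daily date sequence from the last row
append_obs <- function(df, new.x) {
  new.dates <- seq(max(df$date) + 1, by = "days", length.out = length(new.x))
  rbind(df, data.frame(date = new.dates, x = new.x))
}
df <- append_obs(df, c(320123.5, 323521.7))
tail(df)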

Mutate with dplyr strange error

I'm trying to create new variables with mutate in dplyr and I can't understand my error; I've tried everything and have not stumbled upon this issue in the past.
I have a large data set with over a million observations. I'm only providing the first 20 observations here.
This is what my data looks like:
data1 <- read.table(header=TRUE, text="IDnr visit time year end event survival
7 1 04/09/06 2006 31/12/06 0 118
7 2 04/09/06 2007 31/12/07 0 483
7 3 04/09/06 2008 31/12/08 0 849
7 4 04/09/06 2009 31/12/09 0 1214
7 5 04/09/06 2010 31/12/10 0 1579
7 6 04/09/06 2011 31/12/11 0 1944
20 1 24/10/03 2003 31/12/03 0 68
20 2 24/10/03 2004 31/12/04 0 434
20 3 24/10/03 2005 31/12/05 0 799
20 4 24/10/03 2006 31/12/06 0 1164
20 5 24/10/03 2007 31/12/07 0 1529
20 6 24/10/03 2008 31/12/08 0 1895
20 7 24/10/03 2009 31/12/09 0 2260
20 8 24/10/03 2010 31/12/10 0 2625
20 9 24/10/03 2011 31/12/11 0 2990
87 1 17/01/06 2006 31/12/06 0 348
87 2 17/01/06 2007 31/12/07 0 713
87 3 17/01/06 2008 31/12/08 0 1079
87 4 17/01/06 2009 31/12/09 0 1444
87 5 17/01/06 2010 31/12/10 0 1809")
I must say that the date and time variables do not have this format in my dataset; they are coded as POSIXct with the format "%Y-%m-%d". They somehow reformat themselves when I paste the data into Stack Overflow and apply the code formatting.
Okay, the problem is that I'm trying to create new survival-time variables in the same dataset. One is for a Cox regression model with start and stop times (survival is the stop time, and the new start variable should be called survcox).
I'm also trying to do a Poisson regression, where the offset variable (i.e. the survival-time variable) should be called survpois. This is the code I'm trying to use:
data2 <- data1 %>%
group_by(IDnr) %>%
mutate(survcox = ifelse(visit==1, 0, lag(survival)),
year_aar = substr(data1$year, 1,4), first_day = as.POSIXct(paste0(year_aar, "-01-01-")),
survpois = as.numeric(data1$end - first_day)+1) %>%
mutate(survpois = ifelse(year_aar > first_day, as.numeric(end - year_aar),
survpois)) %>%
ungroup()
I receive an error in this step!
Error: incompatible size (1345000), expecting 6 (the group size) or 1
I have no idea why I get this error, what it means, or why my code doesn't work.
All the help I can get is appreciated, thanks in advance!
It's because you reference the variable as data1$year, which doesn't respect the grouping (the same applies to data1$end).
I teased apart your code and found a few issues. One was the thing I mentioned in the comment above. The second was the class of end: with the data you provided, end is a factor, so if this is also the case in your own data you need to convert end to a date object. The other issue was year_aar > first_day: first_day is a date object whereas year_aar is a character. Given those, I modified your code.
data1 %>%
group_by(IDnr) %>%
mutate(survcox = ifelse(visit == 1, 0, lag(survival)),
year_aar = substr(year, 1,4),
first_day = as.POSIXct(paste0(year_aar, "-01-01-")),
survpois = as.numeric(as.POSIXct(end, format = "%d/%m/%y") - first_day) + 1) %>%
mutate(survpois = ifelse(as.numeric(year_aar) > as.numeric(format(first_day, "%Y")),
as.numeric(as.POSIXct(end, format = "%d/%m/%y") - year_aar), survpois)) %>%
ungroup()
Here is a bit of the outcome.
# IDnr visit time year end event survival survcox year_aar first_day survpois
#1 7 1 04/09/06 2006 31/12/06 0 118 0 2006 2006-01-01 365
#2 7 2 04/09/06 2007 31/12/07 0 483 118 2007 2007-01-01 365
#3 7 3 04/09/06 2008 31/12/08 0 849 483 2008 2008-01-01 366
#4 7 4 04/09/06 2009 31/12/09 0 1214 849 2009 2009-01-01 365
#5 7 5 04/09/06 2010 31/12/10 0 1579 1214 2010 2010-01-01 365
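A minimal illustration of the underlying problem (toy data, not from the question): inside a grouped mutate, data1$x returns the full ungrouped column, whose length does not match the group size, whereas a bare column name is evaluated within each group.
library(dplyr)
toy <- data.frame(g = c(1, 1, 2), x = 1:3)
# fails: toy$x has length 3, but the groups have sizes 2 and 1
# toy %>% group_by(g) %>% mutate(bad = toy$x * 2)
# works: x is evaluated group-wise
toy %>% group_by(g) %>% mutate(ok = x * 2)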

Aggregate daily level data to weekly level in R

I have a huge dataset similar to the following reproducible sample data.
Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176
I want to aggregate this data to weekly level to get the output similar to the following:
Interval value
1 Week 2, June 2012 *aggregate value for day 10 to day 14 of June 2012*
2 Week 3, June 2012 *aggregate value for day 15 to day 21 of June 2012*
3 Week 4, June 2012 *aggregate value for day 22 to day 28 of June 2012*
4 Week 5, June 2012 *aggregate value for day 29 to day 30 of June 2012*
5 Week 1, July 2012 *aggregate value for day 1 to day 7 of July 2012*
6 Week 2, July 2012 *aggregate value for day 8 to day 10 of July 2012*
How do I achieve this easily without writing a lot of code?
If you mean the sum of 'value' by week, I think the easiest way to do it is to convert the data into an xts object as GSee suggested:
library(xts)
data <- as.xts(data$value, order.by = as.Date(data$Interval))
weekly <- apply.weekly(data, sum)
[,1]
2012-06-10 552
2012-06-17 23629
2012-06-24 23872
2012-07-01 23667
2012-07-08 23552
2012-07-10 10902
I leave the formatting of the output as an exercise for you :-)
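For reference, one way to turn that xts result back into a labelled data frame (a sketch, continuing from the weekly object above; index() returns the week-ending date of each bucket):
data.frame(week_ending = as.Date(index(weekly)),
           value = as.numeric(weekly))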
If you were to use week from lubridate, you would only get five weeks to pass to by. Assume dat is your data,
> library(lubridate)
> do.call(rbind, by(dat$value, week(dat$Interval), summary))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 24 552 4146 4188 3759 4529 4850
# 25 490 2498 4256 3396 4438 5156
# 26 564 2578 4206 3355 4346 4866
# 27 698 993 4868 3366 5122 5770
# 28 671 1086 3200 3200 5314 5726
This shows a summary for the 24th through 28th week of the year. Similarly, we can get the means with aggregate with
> aggregate(value~week(Interval), data = dat, mean)
# week(Interval) value
# 1 24 3758.667
# 2 25 3396.286
# 3 26 3355.000
# 4 27 3366.429
# 5 28 3199.500
I just came across this old question because it was used as a dupe target.
Unfortunately, all the upvoted answers (except the one by konvas and a now deleted one) present solutions for aggregating the data by week of the year while the OP has requested to aggregate by week of the month.
The definition of week of the year and week of the month is ambiguous as discussed here, here, and here.
However, the OP has indicated that he wants to count days 1 to 7 of each month as week 1 of the month, days 8 to 14 as week 2 of the month, etc. Note that week 5 is then a stub of only 2 or 3 days for most months (except February in non-leap years, which has no week 5).
Having prepared the ground, here is a data.table solution for this kind of aggregation:
library(data.table)
DT[, .(value = sum(value)),
by = .(Interval = sprintf("Week %i, %s",
(mday(Interval) - 1L) %/% 7L + 1L,
format(Interval, "%b %Y")))]
Interval value
1: Week 2, Jun 2012 18366
2: Week 3, Jun 2012 24104
3: Week 4, Jun 2012 23348
4: Week 5, Jun 2012 5204
5: Week 1, Jul 2012 23579
6: Week 2, Jul 2012 11573
We can verify that we have picked the correct intervals by
DT[, .(value = sum(value),
date_range = toString(range(Interval))),
by = .(Week = sprintf("Week %i, %s",
(mday(Interval) -1L) %/% 7L + 1L,
format(Interval, "%b %Y")))]
Week value date_range
1: Week 2, Jun 2012 18366 2012-06-10, 2012-06-14
2: Week 3, Jun 2012 24104 2012-06-15, 2012-06-21
3: Week 4, Jun 2012 23348 2012-06-22, 2012-06-28
4: Week 5, Jun 2012 5204 2012-06-29, 2012-06-30
5: Week 1, Jul 2012 23579 2012-07-01, 2012-07-07
6: Week 2, Jul 2012 11573 2012-07-08, 2012-07-10
which is in line with OP's specification.
Data
library(data.table)
DT <- fread(
"rn Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176", drop = 1L)
DT[, Interval := as.Date(Interval)]
If you are using a data frame, you can easily do this with the tidyquant package. Use the tq_transmute function, which applies a mutation and returns a new data frame. Select the "value" column and apply the xts function apply.weekly. The additional argument FUN = sum will get the aggregate by week.
library(tidyquant)
df
#> # A tibble: 31 x 2
#> Interval value
#> <date> <int>
#> 1 2012-06-10 552
#> 2 2012-06-11 4850
#> 3 2012-06-12 4642
#> 4 2012-06-13 4132
#> 5 2012-06-14 4190
#> 6 2012-06-15 4186
#> 7 2012-06-16 1139
#> 8 2012-06-17 490
#> 9 2012-06-18 5156
#> 10 2012-06-19 4430
#> # ... with 21 more rows
df %>%
tq_transmute(select = value,
mutate_fun = apply.weekly,
FUN = sum)
#> # A tibble: 6 x 2
#> Interval value
#> <date> <int>
#> 1 2012-06-10 552
#> 2 2012-06-17 23629
#> 3 2012-06-24 23872
#> 4 2012-07-01 23667
#> 5 2012-07-08 23552
#> 6 2012-07-10 10902
When you say "aggregate" the values, do you mean take their mean (or their sum)? Let's say your data frame is d; assuming d$Interval is of class Date, you can try
# if d$Interval is not of class Date:
# d$Interval <- as.Date(d$Interval)
formatdate <- function(date)
paste0("Week ", as.numeric(format(date, "%d")) %/% 7 + 1,
", ", format(date, "%b %Y"))
# change "mean" to whatever aggregation you need, e.g. sum
aggregate(d$value, by = list(formatdate(d$Interval)), mean)
# Group.1 x
# 1 Week 1, Jul 2012 3725.667
# 2 Week 2, Jul 2012 3199.500
# 3 Week 2, Jun 2012 3544.000
# 4 Week 3, Jun 2012 3434.000
# 5 Week 4, Jun 2012 3333.143
# 6 Week 5, Jun 2012 3158.667
