I'm solving the exercises from the book Forecasting: Principles and Practice (3rd edition).
In chapter 7, exercise 1, I want to filter January 2014 from tsibbledata::vic_elec and summarise the data by day. Here's the code:
jan14_vic_elec <- vic_elec %>%
  filter(yearmonth(Time) == yearmonth("2014 Jan")) %>%
  index_by(Date = as_date(Time)) %>%
  summarise(
    Demand = sum(Demand),
    Temperature = max(Temperature)
  )
This chunk gives an error at the filter() step:
Error: Problem with `filter()` input `..1`.
i Input `..1` is `yearmonth(Time) == yearmonth("2014 Jan")`.
x function 'Rcpp_precious_remove' not provided by package 'Rcpp'
Can somebody help?
Open a new R session and run the code below; it should work.
The main issue is that you have some package clashes, hence the fresh session.
library(fpp3)

jan14_vic_elec <- vic_elec %>%
  filter(yearmonth(Time) == yearmonth("2014 Jan")) %>%
  index_by(Date = as_date(Time)) %>%
  summarise(
    Demand = sum(Demand),
    Temperature = max(Temperature)
  )
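If a fresh session alone does not clear the 'Rcpp_precious_remove' error, it usually means some installed packages were compiled against a newer Rcpp than the one currently loaded; reinstalling Rcpp and restarting R typically resolves it (a suggestion beyond the original answer):
install.packages("Rcpp")  # rebuild/refresh Rcpp
# then restart R and reload the packages
library(fpp3)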
I am connected to Impala:
con <- DBI::dbConnect(odbc::odbc(), "impala connector", schema = "some_schema")
library(dplyr)
library(dbplyr) # I have to load both of them, otherwise tbl() won't work
table <- tbl(con, 'serverTable')
I would like to use Pearson's r to track the change of a measure over time, as a quick-and-dirty prediction model.
Locally it works quite well, but I have problems implementing it on the server.
Here's the code:
library(corrr)

table %>%
  filter(!is.na(VAR) | VAR > -10 | VAR < -32) %>%
  # VAR is the measure; values over -10 or under -32 are already out of the threshold,
  # and I want to intercept the subjects before that
  mutate(num_date = as.numeric(as.POSIXct(date))) %>%
  # convert the date string into the number of seconds since 1970
  group_by(id) %>%
  # the measure is taken daily for various subjects; I am interested in isolating
  # the subjects approaching the thresholds
  mutate(corr = corrr::correlate(VAR, num_date)) %>%
  ungroup() %>%
  # here I calculate Pearson's r; I must specify corrr:: or I get an error
  filter(abs(corr) > 0.9) %>%
  # locally I found that 0.9 works well for isolating the subjects whose measure
  # is approaching the thresholds
  select(id) %>%
  collect()
If I run this, though, I get the error:
Error in corrr::correlate(VAR, num_date) : object 'VAR' not found.
So I tried to substitute that line with
mutate(corr = corrr::correlate(.$VAR, .$num_date)) %>%
and like this I get the error
Error in stats::cor(x = x, y = y, use = use, method = method) : supply both 'x' and 'y' or a matrix-like 'x'
If instead I try to use cor() from stats, cor(VAR, num_date), I get the error
Error in new_result(connection@ptr, statement, immediate) : nanodbc/nanodbc.cpp:1412: HY000: [Cloudera][ImpalaODBC] (370) Query analysis error occurred during query execution: [HY000] : AnalysisException: some_schema.cor() unknown
as if dbplyr can't translate cor() into SQL (I can see this if I run show_query() instead of collect()).
EDIT: I solved the problem using SQL:
SELECT id, cor
FROM (
  SELECT id,
         ((tot_sum - (VAR_sum * date_sum / _count)) /
          sqrt((VAR_sq - pow(VAR_sum, 2.0) / _count) * (date_sq - pow(date_sum, 2.0) / _count))) AS cor
  FROM (
    SELECT id,
           sum(VAR) AS VAR_sum,
           sum(CAST(CAST(date AS TIMESTAMP) AS DOUBLE)) AS date_sum,
           sum(VAR * VAR) AS VAR_sq,
           sum(CAST(CAST(date AS TIMESTAMP) AS DOUBLE) * CAST(CAST(date AS TIMESTAMP) AS DOUBLE)) AS date_sq,
           sum(VAR * CAST(CAST(date AS TIMESTAMP) AS DOUBLE)) AS tot_sum,
           count(*) AS _count
    FROM (
      SELECT id, VAR, date
      FROM (
        SELECT id, VAR, date
        FROM schema
        WHERE VAR IS NOT NULL) AS a
      WHERE VAR < -10 OR VAR > -32) AS b
    GROUP BY id) AS c) AS d
WHERE ABS(cor) > 0.9 AND ABS(cor) <= 1
thanks to this article:
https://chartio.com/learn/postgresql/correlation-coefficient-pearson/
cor is not in the list of functions that dbplyr can translate - see here: https://dbplyr.tidyverse.org/articles/sql-translation.html#known-functions
You can try the following in your code:
mutate(corr = translate_sql(corr(VAR, num_date)))
This should translate directly to CORR(VAR, num_date). These translations don't work in all database types. If you can't get this working in your case, you likely have no choice but to collect your data before you try to run non-translatable functions.
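For reference, a minimal sketch of the whole pipeline under this approach, using the column and table names from the question and assuming the database actually exposes a CORR() aggregate (not every Impala version does). Because dbplyr leaves functions it does not recognise untranslated, corr() is sent to the database as-is:
table %>%
  filter(!is.na(VAR) | VAR > -10 | VAR < -32) %>%
  mutate(num_date = as.numeric(as.POSIXct(date))) %>%
  group_by(id) %>%
  summarise(corr = corr(VAR, num_date)) %>%  # passed through verbatim as CORR(VAR, num_date)
  filter(abs(corr) > 0.9) %>%
  collect()
If the aggregate is missing on your server, show_query() will still build the statement but the database will reject it, and collecting the data first is the fallback.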
My solution was to use dplyr's functions to replicate the correlation formula:
temp_cor <- d_price_w_db %>%    # your remote table from tbl(con, "NAME OF TABLE")
  group_by(GroupA, GroupB) %>%  # your groups
  # Then use summarise to create the correlation; you can create as many as you like:
  summarise(
    cor_temp_ab = (avg(temp_a * temp_b) - (avg(temp_a) * avg(temp_b))) /
                  (sd(temp_a) * sd(temp_b)),
    .groups = "drop"
  )
This creates the SQL query that builds your correlation coefficients; you can see it with show_query(temp_cor). Finally, just do
local_object <- temp_cor %>%
  collect()
to save the result of your query in a local object.
The formula for correlation from this post: https://www.red-gate.com/simple-talk/blogs/statistics-sql-pearsons-correlation/
Everything good? This week I spent time writing a script that even this morning seemed to work, but when I tried to run it again, an error I had never seen appeared in exactly the part that uses the summarise() function from the dplyr package.
Below is an excerpt of the code I used and the error on the console:
library(tidyverse)

a <- c(1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0)
b <- c(0.9157101, 0.4854955, 0.8853174, 0.4373646, 0.3855175, 0.8603407,
       0.9193342, 0.4693117, 0.9849855, 0.4458159, 0.4379776)
c <- c(8, 2, 7, 1, 0, 6, 8, 1, 9, 1, 1)

treated_data <- data.frame(Risk = a,
                           Model_Predicted = b,
                           Grupo = c)

calculo <- treated_data %>%
  group_by(Grupo) %>%
  summarise(Quantidade = n(),
            Non_event = sum(Risk),
            Event = n() - sum(Risk))
Console Result:
---------------------------------------------------------
Error in n() : argument "vec" is missing, with no default
---------------------------------------------------------
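One common cause of this exact message is that another attached package masks dplyr's n(), so summarise() ends up calling the wrong function. A possible fix (my own suggestion, not from the thread) is to qualify the call explicitly:
calculo <- treated_data %>%
  group_by(Grupo) %>%
  summarise(Quantidade = dplyr::n(),        # explicit namespace avoids a masked n()
            Non_event = sum(Risk),
            Event = dplyr::n() - sum(Risk))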
I am new to the tsibble package. I have monthly data that I coerced to a tsibble in order to use the fable package. A few issues I am having:
1. It appears the index variable (from my testing) is not of class Date, even though I applied lubridate's ymd() function to it.
2. The has_gaps() function returns FALSE, but when I model on the data, I get the error that ".data contains implicit gaps in time".
library(dplyr)
library(fable)
library(lubridate)
library(tsibble)
test <- data.frame(
YearMonth = c(20160101, 20160201, 20160301, 20160401, 20160501, 20160601,
20160701, 20160801, 20160901, 20161001, 20161101, 20161201),
Claims = c(13032647, 1668005, 24473616, 13640769, 17891432, 11596556,
23176360, 7885872, 11948461, 16194792, 4971310, 18032363),
Revenue = c(12603367, 18733242, 5862766, 3861877, 15407158, 24534258,
15633646, 13720258, 24944078, 13375742, 4537475, 22988443)
)
test_ts <- test %>%
mutate(YearMonth = ymd(YearMonth)) %>%
as_tsibble(
index = YearMonth,
regular = FALSE #because it picks up gaps when I set it to TRUE
)
# Are there any gaps?
has_gaps(test_ts, .full = T)
model_new <- test_ts %>%
model(
snaive = SNAIVE(Claims))
Warning messages:
1: 1 error encountered for snaive
[1] .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using `tsibble::fill_gaps()` if required.
Any help will be appreciated.
You have a daily index, but you want a monthly index. The simplest way is to use the tsibble::yearmonth() function, but you will need to convert the date to character first.
library(dplyr)
library(tsibble)
test <- data.frame(
YearMonth = c(20160101, 20160201, 20160301, 20160401, 20160501, 20160601,
20160701, 20160801, 20160901, 20161001, 20161101, 20161201),
Claims = c(13032647, 1668005, 24473616, 13640769, 17891432, 11596556,
23176360, 7885872, 11948461, 16194792, 4971310, 18032363),
Revenue = c(12603367, 18733242, 5862766, 3861877, 15407158, 24534258,
15633646, 13720258, 24944078, 13375742, 4537475, 22988443)
)
test_ts <- test %>%
mutate(YearMonth = yearmonth(as.character(YearMonth))) %>%
as_tsibble(index = YearMonth)
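With a yearmonth index the interval is recognised as monthly, so the seasonal naive model from the question should now fit without the implicit-gaps error (a quick check, assuming fable is attached):
library(fable)
has_gaps(test_ts, .full = TRUE)   # no gaps on the monthly index
test_ts %>%
  model(snaive = SNAIVE(Claims))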
Looks like as_tsibble() isn't able to recognize the interval properly in the YearMonth column because it is a Date class object. It's hinted in the 'Index' section of the help page that this might be the problem:
For a tbl_ts of regular interval, a choice of index representation has to be made. For example, a monthly data should correspond to time index created by yearmonth or zoo::yearmon, instead of Date or POSIXct.
As that excerpt suggests, you can get around the problem with yearmonth(). But that requires a little string manipulation first to get it into a format that will parse properly.
test_ts <- test %>%
  mutate(YearMonth = gsub("(.{2})01$", "-\\1", YearMonth) %>%
           yearmonth()) %>%
  as_tsibble(index = YearMonth)
Now the model should run error-free! Not sure why the has_gaps() test says everything is okay in your example...
I want to be able to modify the resource capacity inside a trajectory as a function of the queue length.
The following (simplified) code does not work: when I try to call get_mon_resources(simStore) inside the function, the code crashes with the error:
Error in run_(private$sim_obj, until) :
Expecting a single value: [extent=0].
Thank you for your help.
simStore <- simmer()

fUpdateNumberOfCashiers <- function() {
  dtLastRes <- simStore %>% get_mon_resources() %>% tail(1)
  nCapacityNow <- dtLastRes$capacity  # same result with get_capacity(simStore)
  nQueueNow <- dtLastRes$queue        # same result with get_queue_count(simStore)
  print(dtLastRes)                    # prints an empty data frame!
  return(5)                           # crashes here! (eventually 5 will be replaced with a more meaningful formula)
}

trajClient <- trajectory("Client's path") %>%
  log_("Arrived to cashier") %>%
  set_capacity("Cashier", value = fUpdateNumberOfCashiers) %>%
  seize("Cashier") %>%
  timeout(function() {rexp(1, 30)}) %>%  # one cashier processes 30 clients / hour
  release("Cashier") %>%
  log_(function(attr) {sprintf("In total spent %.2f", now(simStore) - attr["start_time"])})

simStore <- simmer("Store") %>%
  add_resource("Cashier", 1) %>%
  add_generator("Store Clients", trajClient, function() {rexp(1, 120)}) %>%  # 120 clients / hour
  run(until = nHoursObserved <- 1); simStore
See the discussion related to troubleshooting this problem here: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/simmer-devel/NgIikOpHpss
What causes the problem is that another package (lubridate) masks objects from simmer, as seen below:
Attaching package: ‘lubridate’
The following objects are masked from ‘package:simmer’:
now, rollback
Once I replaced
library(simmer); library(lubridate);
with
library(lubridate); library(simmer);
the problem disappeared!
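An alternative (my own note, not from the original answer) is to keep the original load order and instead call the masked function with its namespace, so lubridate's now() can no longer shadow simmer's. A minimal sketch, changing only the log_() step from the question's trajectory:
log_(function(attr) {
  sprintf("In total spent %.2f", simmer::now(simStore) - attr["start_time"])
})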
From a Google search, I am trying to understand the "simmer" package in R for discrete-event simulation.
As per the example given in the link, I am trying to create the simulator using the create_simulator() function, but I am getting the error below.
Code used:
trajectory <- read.table(header = TRUE, text = "
  event_id description  resource       amount duration       successor
  1        registration administration 1      runif(1,3,10)  2
  2        intake       nurse          1      runif(1,10,20) 3
  3        consultation doctor         1      runif(1,5,15)  NA
")

sim <- create_simulator(name = "SuperDuperSim") %>%
  add_trajectory(name = "simple_trajectory", trajectory_df = trajectory) %>%
  add_resource(name = "administration", capacity = 1) %>%
  add_resource(name = "nurse", capacity = 1) %>%
  add_resource(name = "doctor", capacity = 2) %>%
  add_entities_with_interval(n = 10, name_prefix = "patient",
                             trajectory_name = "simple_trajectory",
                             interval = "rnorm(1,10)") %>%
  replicator(15)
Error:
Error in eval(expr, envir, enclos) :
could not find function "create_simulator"
I have never used this package before, but it appears to me that the create_simulator() function does not exist in the current version of the package (the add_trajectory() function doesn't seem to exist either). Information about the simmer package is available at https://github.com/r-simmer/simmer. More specifically, an introduction is given at https://cran.r-project.org/web/packages/simmer/vignettes/A-introduction.html. From that site it appears that the creation of the simulator is now done by typing simmer(nameOfMySimulator).
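For orientation, a rough sketch of how roughly the same model might look with the current simmer API (the timings mirror the old trajectory table; the run length of 120 is my own choice, not from the original example):
library(simmer)

patient <- trajectory("simple_trajectory") %>%
  seize("administration", 1) %>%
  timeout(function() runif(1, 3, 10)) %>%
  release("administration", 1) %>%
  seize("nurse", 1) %>%
  timeout(function() runif(1, 10, 20)) %>%
  release("nurse", 1) %>%
  seize("doctor", 1) %>%
  timeout(function() runif(1, 5, 15)) %>%
  release("doctor", 1)

env <- simmer("SuperDuperSim") %>%
  add_resource("administration", capacity = 1) %>%
  add_resource("nurse", capacity = 1) %>%
  add_resource("doctor", capacity = 2) %>%
  add_generator("patient", patient, function() rnorm(1, 10)) %>%
  run(until = 120)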