Error in n() inside Summarise Function dplyr - r

Hi everyone,
This week I wrote a script that was still working this morning. When I ran it again, the part that calls summarise() from the dplyr package failed with an error I had never seen before.
Below is an excerpt of the code and the error shown on the console:
library(tidyverse)
a <- c(1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0)
b <- c(0.9157101, 0.4854955, 0.8853174, 0.4373646, 0.3855175, 0.8603407,
       0.9193342, 0.4693117, 0.9849855, 0.4458159, 0.4379776)
c <- c(8, 2, 7, 1, 0, 6, 8, 1, 9, 1, 1)
treated_data <- data.frame(Risk = a,
                           Model_Predicted = b,
                           Grupo = c)
calculo <- treated_data %>%
  group_by(Grupo) %>%
  summarise(Quantidade = n(),
            Non_event = sum(Risk),
            Event = n() - sum(Risk))
Console Result:
---------------------------------------------------------
Error in n() : argument "vec" is missing, with no default
---------------------------------------------------------
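One possible explanation, which the post itself does not confirm: dplyr's n() takes no arguments, so an error about a missing "vec" argument suggests that a different function named n, from another attached package or from the global environment, is being found first. The sketch below assumes that is the cause. It checks where a bare n resolves to and then qualifies the call as dplyr::n(), using only the Risk and Grupo columns from the excerpt.

```r
library(dplyr)

# Same Risk and Grupo values as in the question.
treated_data <- data.frame(
  Risk  = c(1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0),
  Grupo = c(8, 2, 7, 1, 0, 6, 8, 1, 9, 1, 1)
)

# Diagnostic: which namespace does a bare `n` resolve to in this session?
# Anything other than "dplyr" means another n() is masking dplyr's.
print(environmentName(environment(n)))

# Qualify the call so dplyr's n() is used regardless of masking.
calculo <- treated_data %>%
  group_by(Grupo) %>%
  summarise(Quantidade = dplyr::n(),
            Non_event  = sum(Risk),
            Event      = dplyr::n() - sum(Risk))
print(calculo)
```

If the diagnostic names another package, attaching dplyr after it or keeping the dplyr:: prefix should be enough. Restarting R is worth trying too, because a script that worked in the morning points to something attached later in the same session.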

Related

Advise to filter yearmonth

I'm working through the exercises in the book Forecasting: Principles and Practice, 3rd edition.
In chapter 7, exercise 1, I want to filter January 2014 from tsibbledata::vic_elec and summarise the data by day. Here's the code:
jan14_vic_elec <- vic_elec %>%
filter(yearmonth(Time) == yearmonth("2014 Jan")) %>%
index_by(Date = as_date(Time)) %>%
summarise(
Demand = sum(Demand),
Temperature = max(Temperature)
)
The filter() call in this chunk gives an error:
Error: Problem with filter() input ..1.
i Input ..1 is yearmonth(Time) == yearmonth("2014 Jan").
x function 'Rcpp_precious_remove' not provided by package 'Rcpp'
Can somebody help?
Open a new R session and run the code below; it should work.
The main issue is that some of your packages clash, so start from a fresh session:
library(fpp3)
jan14_vic_elec <- vic_elec %>%
filter(yearmonth(Time) == yearmonth("2014 Jan")) %>%
index_by(Date = as_date(Time)) %>%
summarise(
Demand = sum(Demand),
Temperature = max(Temperature)
)

Issue creating statcast database with BaseballR Package

I am trying to create a database of all MLB statcast outcomes. For this, I am using the baseballr package made by Bill Petti https://billpetti.github.io/2020-05-26-build-statcast-database-rstats-version-2.0/. I am not connecting to a SQL database but simply making a data frame in R. I want to collect all statcast data from 2019 and 2020. First, I loaded in the necessary packages.
library(baseballr)
library(tidyverse)
Then I executed the annual_statcast_query function:
annual_statcast_query <- function(season) {
  dates <- seq.Date(as.Date(paste0(season, '-03-01')),
                    as.Date(paste0(season, '-12-01')), by = 'week')
  date_grid <- tibble(start_date = dates,
                      end_date = dates + 6)
  safe_savant <- safely(scrape_statcast_savant)
  payload <- map(.x = seq_along(date_grid$start_date),
                 ~{message(paste0('\nScraping week of ', date_grid$start_date[.x], '...\n'))
                   payload <- safe_savant(start_date = date_grid$start_date[.x],
                                          end_date = date_grid$end_date[.x], type = 'pitcher')
                   return(payload)
                 })
  payload_df <- map(payload, 'result')
  number_rows <- map_df(.x = seq_along(payload_df),
                        ~{number_rows <- tibble(week = .x,
                                                number_rows = length(payload_df[[.x]]$game_date))}) %>%
    filter(number_rows > 0) %>%
    pull(week)
  payload_df_reduced <- payload_df[number_rows]
  combined <- payload_df_reduced %>%
    bind_rows()
  return(combined)
}
When I ran his code for the 2019 season payload <- annual_statcast_query(2019), I could scrape the data without any problems. However, when I tried it for 2020 payload <- annual_statcast_query(2020) I encountered the error:
Error: Can't combine `spin_rate_deprecated` <logical> and `spin_rate_deprecated` <character>.
This error occurs in the last part of the annual_statcast_query function:
combined <- payload_df_reduced %>%
bind_rows()
When reading through the Statcast documentation (https://baseballsavant.mlb.com/csv-docs), it appears that the variable spin_rate_deprecated was replaced by release_spin. Perhaps this is why I am encountering this error. I do not need this variable for my analysis, and the error tracing I did made it clear that fixing the problem is beyond my skill set as a college student.
> rlang::last_error()
<error/vctrs_error_incompatible_type>
Can't combine `spin_rate_deprecated` <logical> and `spin_rate_deprecated` <character>.
Backtrace:
1. global::annual_statcast_query(2020)
3. dplyr::bind_rows(.)
4. vctrs::vec_rbind(!!!dots, .names_to = .id)
6. vctrs::vec_default_ptype2(...)
7. vctrs:::vec_ptype2_df_fallback(x, y, opts)
8. vctrs:::vec_ptype2_params(...)
9. vctrs:::vec_ptype2_opts(x, y, opts = opts, x_arg = x_arg, y_arg = y_arg)
11. vctrs::vec_default_ptype2(...)
12. vctrs::stop_incompatible_type(...)
13. vctrs:::stop_incompatible(...)
14. vctrs:::stop_vctrs(...)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/vctrs_error_incompatible_type>
Can't combine `spin_rate_deprecated` <logical> and `spin_rate_deprecated` <character>.
Backtrace:
x
1. +-global::annual_statcast_query(2020)
2. | \-payload_df_reduced %>% bind_rows()
3. \-dplyr::bind_rows(.)
4. \-vctrs::vec_rbind(!!!dots, .names_to = .id)
5. \-(function () ...
6. \-vctrs::vec_default_ptype2(...)
7. \-vctrs:::vec_ptype2_df_fallback(x, y, opts)
8. \-vctrs:::vec_ptype2_params(...)
9. \-vctrs:::vec_ptype2_opts(x, y, opts = opts, x_arg = x_arg, y_arg = y_arg)
10. \-(function () ...
11. \-vctrs::vec_default_ptype2(...)
12. \-vctrs::stop_incompatible_type(...)
13. \-vctrs:::stop_incompatible(...)
14. \-vctrs:::stop_vctrs(...)
Therefore, I tried to drop this variable from my database before the bind rows operation to avoid the error.
combined <- payload_df_reduced %>%
payload_df_reduced[ , !names(payload_df_reduced) %in% c("spin_rate_deprecated")] %>%
bind_rows()
However, this returned the error message:
Error in .[payload_df_reduced, , !names(payload_df_reduced) %in% c("spin_rate_deprecated")] :
incorrect number of dimensions
I am running baseballr 0.8.3 (from packageVersion("baseballr")) on R 4.0.3.
If anyone could help me find a way to do this, that would be amazing. I am not picky about how I get this data, so I am all ears if anyone has an idea. Thank you so much!
To drop a column from a data.frame, you can do this:
payload_df_reduced %>%
  select(-c(spin_rate_deprecated))
Or, keeping your current approach, it should look like this:
payload_df_reduced[ , !names(payload_df_reduced) %in% c("spin_rate_deprecated")]
Your current code does not work because the syntax is incorrect: piping payload_df_reduced into an expression that indexes payload_df_reduced again passes the object twice, which is what the ".[payload_df_reduced, , ...]" in the error message shows.
It seems that your payload_df_reduced is a list of data.frames rather than a single data.frame. I tried to run your code, but it seems to depend on other functions, so it is not reproducible. Here is an untested sketch that you may need to adjust a bit:
combined <- map(payload_df_reduced, select, -c(spin_rate_deprecated)) %>%
bind_rows()
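To make the list-of-data-frames point concrete, here is a small self-contained sketch. The two tibbles are made-up stand-ins for weekly scrapes, not real Statcast output, and the pitch_type values are invented. any_of() is used so that a week lacking the column does not raise an error; this assumes a dplyr version that re-exports the tidyselect helper (1.0 or later).

```r
library(dplyr)
library(purrr)

# Hypothetical stand-ins for two weekly results whose column types disagree.
weekly <- list(
  tibble(game_date = "2020-07-24", pitch_type = "FF", spin_rate_deprecated = FALSE),
  tibble(game_date = "2020-07-31", pitch_type = "SL", spin_rate_deprecated = "2200")
)

# With dplyr 1.0 or later, bind_rows(weekly) stops with a "Can't combine" error
# on this toy list, because one column is logical and the other is character.

# Drop the column from every element first, then bind.
combined <- weekly %>%
  map(~ select(.x, -any_of("spin_rate_deprecated"))) %>%
  bind_rows()
print(combined)
```

If the column were needed, converting it to one type in every element before binding (for example with as.character) would be the alternative to dropping it.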

How to create a variable that uses previous (lagged) instance of its own value?

There is a panel dataset. I have to compute a new variable:
cases_new = 3^(lag(cases, n=1L))
for the first date only. From the second date onwards, it should use the lag (i.e. the previous value) of itself:
cases_new = 3^lag(cases_new, n=1L)
I tried to implement using the following code:
df2 <- df1 %>%
  mutate(cases_new = if_else(date == "2020-01-01", 3^(lag(cases, n=1L)), 3^(lag(cases_new, n=1L))))
But it threw an error stating that object 'cases_new' was not found. I even tried initializing cases_new before the assignment above with:
df1$cases_new <- NA
But this did not update cases_new correctly (I get all NAs). Can someone help me out with this particular recursive implementation?
You could solve your initialization problem like this:
df2 <-df1 %>%
mutate(
cases_new = 3^(lag(cases, n=1L)),
cases_new = if_else(date == "2020-01-01", cases_new, 3^(lag(cases_new, n=1L)))
)
However, I don't think this will give you the values you want. You will have to iterate over the rows like this:
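# Seed row 1 from the lagged cases value (NA if row 1 has no earlier observation).
seed <- 3^(dplyr::lag(df1$cases, n = 1L))[1]
cases_new <- c(seed, rep(0, nrow(df1) - 1))
# Each later row depends only on the previous row's cases_new.
for (i in 2:nrow(df1)) {
  cases_new[i] <- 3^cases_new[i - 1]
}
df1$cases_new <- cases_new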
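The same row-by-row recursion can also be written without an explicit loop. The sketch below uses purrr::accumulate(), which feeds each result back in as the input of the next step. The seed value and the number of rows are made up for illustration and do not come from the question's data.

```r
library(purrr)

n_rows <- 4     # hypothetical number of dates
seed   <- 3^0   # hypothetical first value, i.e. a lagged cases value of 0

# Each step receives the previous result (prev) and returns 3^prev.
# The second argument (the step index) is only there to drive the iteration.
cases_new <- accumulate(seq_len(n_rows - 1),
                        function(prev, i) 3^prev,
                        .init = seed)
print(cases_new)
```

Because every value is 3 raised to the previous one, the series grows extremely fast and exceeds what a double can represent after only a few steps, so it is worth checking whether this recursion is really the intended formula.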

Why do I get this error using biomod2:response.plot2, and is it important? Error in ncol(dat_) : could not find function "ncol"

When I run the example for the response.plot2 function (biomod2 package), I get the above error. The code produces some plots but does not save an object.
Here's the example (including the code that I ran): https://www.rdocumentation.org/packages/biomod2/versions/3.3-7.1/topics/response.plot2
[edit:]
The source code for the function response.plot2 is here:
https://r-forge.r-project.org/scm/viewvc.php/checkout/pkg/biomod2/R/response.plot.R?revision=728&root=biomod
It includes these lines:
.as.ggdat.1D <-
function (rp.dat)
{
# requireNamespace('dplyr')
out_ <- bind_rows(lapply(rp.dat, function(dat_) {
dat_$id <- rownames(dat_)
id.col.id <- which(colnames(dat_) == "id")
expl.dat_ <- dat_ %>% dplyr::select(1, id.col.id) %>%
tidyr::gather("expl.name", "expl.val", 1)
pred.dat_ <- dat_ %>% dplyr::select(-1, id.col.id) %>%
tidyr::gather("pred.name", "pred.val", (1:(ncol(dat_)-2)))
out.dat_ <- dplyr::full_join(expl.dat_, pred.dat_)
out.dat_$expl.name <- as.character(out.dat_$expl.name)
out.dat_$pred.name <- as.character(out.dat_$pred.name)
return(out.dat_)
}))
out_$expl.name <- factor(out_$expl.name, levels = unique(out_$expl.name))
return(out_)
}
I tried changing ncol(dat_) to base::ncol(dat_) and then running the whole lot to redefine the function response.plot2 for my R session, but I got a different error message:
Error in base::ncol : could not find function "::"

simmer: reading resources from inside trajectory functions

I want to be able to modify the resource capacity inside a trajectory as a function of queue length.
The following (simplified) code does not work. When I try to call get_mon_resources(simStore) inside the function, the code crashes with the error:
Error in run_(private$sim_obj, until) :
Expecting a single value: [extent=0].
Thank you for your help.
simStore <- simmer()
fUpdateNumberOfCashiers <- function() {
dtLastRes <- simStore %>% get_mon_resources %>% tail(1)
nCapacityNow <- dtLastRes$capacity # same result with get_capacity(simStore),
nQueueNow <- dtLastRes$queue # same result with get_queue_count(simStore)
print(dtLastRes) # prints empty data-frame !
return (5) # crashes here ! (eventually 5 will be replaced with more meaningful formula
}
trajClient <- trajectory("Client's path") %>%
log_("Arrived to cashier") %>%
set_capacity("Cashier", value = fUpdateNumberOfCashiers ) %>%
seize("Cashier") %>%
timeout(function() {rexp(1, 30)}) %>% # One Cashier processes 30 clients / hour
release("Cashier") %>%
log_(function(attr) { sprintf("In total spent %.2f", now(simStore) - attr["start_time"])})
simStore <- simmer("Store") %>%
add_resource("Cashier", 1) %>%
add_generator("Store Clients", trajClient, function() {rexp(1, 120)}) %>% # 120 clients / hour
run(until=nHoursObserved <- 1) ; simStore
See the discussion related to troubleshooting this problem here: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/simmer-devel/NgIikOpHpss
The problem is caused by another package (lubridate) masking objects from simmer, as shown in the message below:
Attaching package: ‘lubridate’
The following objects are masked from ‘package:simmer’:
now, rollback
Once I replaced
library(simmer); library(lubridate);
with
library(lubridate); library(simmer);
The problem disappeared!
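Changing the load order works, but it is easy to undo by accident. A more defensive option is to qualify the masked call with its namespace. The sketch below is only an illustration: it defines a global function named now as a stand-in for lubridate::now(), so that it runs without lubridate installed, and then shows that simmer::now() still reaches simmer's version.

```r
library(simmer)

# Stand-in for lubridate::now(): a global function that masks simmer's now().
# (Hypothetical; in the original session the masking came from library(lubridate).)
now <- function(tzone = "") Sys.time()

simStore <- simmer("Store")

# The qualified call reaches simmer's now() regardless of what masks the bare name.
sim_time <- simmer::now(simStore)
print(sim_time)
```

In the trajectory above, this would mean writing simmer::now(simStore) inside the log_() function instead of now(simStore).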
