simmer: reading resources from inside trajectory functions

I want to be able to modify the resource capacity inside a trajectory as a function of the queue length.
The following (simplified) code does not work: when I try to call get_mon_resources(simStore) inside the function, the code crashes with this error:
Error in run_(private$sim_obj, until) :
Expecting a single value: [extent=0].
Thank you for your help.
simStore <- simmer()

fUpdateNumberOfCashiers <- function() {
  dtLastRes    <- simStore %>% get_mon_resources %>% tail(1)
  nCapacityNow <- dtLastRes$capacity # same result with get_capacity(simStore)
  nQueueNow    <- dtLastRes$queue    # same result with get_queue_count(simStore)
  print(dtLastRes)                   # prints an empty data frame!
  return(5)                          # crashes here! (eventually 5 will be replaced with a more meaningful formula)
}
trajClient <- trajectory("Client's path") %>%
  log_("Arrived to cashier") %>%
  set_capacity("Cashier", value = fUpdateNumberOfCashiers) %>%
  seize("Cashier") %>%
  timeout(function() rexp(1, 30)) %>% # One Cashier processes 30 clients / hour
  release("Cashier") %>%
  log_(function(attr) sprintf("In total spent %.2f", now(simStore) - attr["start_time"]))

simStore <- simmer("Store") %>%
  add_resource("Cashier", 1) %>%
  add_generator("Store Clients", trajClient, function() rexp(1, 120)) %>% # 120 clients / hour
  run(until = nHoursObserved <- 1); simStore

See the discussion related to troubleshooting this problem here: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/simmer-devel/NgIikOpHpss
What causes the problem is that another package (lubridate) masks objects from simmer, as seen below:
Attaching package: ‘lubridate’
The following objects are masked from ‘package:simmer’:
now, rollback
Once I replaced
library(simmer); library(lubridate)
with
library(lubridate); library(simmer)
the problem disappeared!
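If reordering the library() calls is not convenient, a minimal hedged alternative (reusing the trajectory and objects from the question) is to namespace-qualify the calls that lubridate masks, so the attach order no longer matters:
library(lubridate)
library(simmer)

trajClient <- trajectory("Client's path") %>%
  log_("Arrived to cashier") %>%
  set_capacity("Cashier", value = fUpdateNumberOfCashiers) %>%
  seize("Cashier") %>%
  timeout(function() rexp(1, 30)) %>%
  release("Cashier") %>%
  # simmer::now() is spelled out, so lubridate::now() cannot shadow it
  log_(function(attr) sprintf("In total spent %.2f", simmer::now(simStore) - attr["start_time"]))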

Related

Choose command order in a function based on an error [R]

I have three files in a folder with the following names:
./multiqc_data$ ls
file1.json
file2.json
file3.json
When I open the files with the TidyMultiqc package, existing NaN values in the files can lead to the following error:
files <- dir(path, pattern = "*.json") # locate files
files %>%
  map(~ load_multiqc(file.path(path, .))) # parse them
## the error
Error in parse_con(txt, bigint_as_char) :
lexical error: invalid char in json text.
"mapped_failed_pct": NaN, "paired in
(right here) ------^
I want to create a function to handle this error.
Every time this error pops up, I want to be able to apply this sed command to all files in the folder:
system(paste("gsed -i 's/NaN/null/g'", paste0(path, "*.json")))
Any ideas how I can achieve this?
You could use this wrapper:
safe_load_multiqc <- function(path, file) {
  tryCatch(load_multiqc(file.path(path, file)), error = function(e) {
    system(paste("gsed -i 's/NaN/null/g'", paste0(path, "*.json")))
    # retry after the files have been cleaned
    load_multiqc(file.path(path, file))
  })
}
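A possible usage sketch with the same path and files setup as in the question (purrr's map is assumed for the iteration):
library(purrr)
files <- dir(path, pattern = "*.json")              # locate files, as above
reports <- map(files, ~ safe_load_multiqc(path, .)) # retries after sed on failure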
A good way to handle errors in work pipelines like this is using restarts, with withCallingHandlers and withRestarts.
You establish the condition handlers and the recovery protocols (restarts), then you can choose which protocols to use and in which order. Calling handlers allow much finer control over error conditions than a common try/catch.
In the example I wrote two handlers: removeNaN (works at folder level) and skipFile (works at file level); if the first fails, the second is executed (simply skipping the file). This is of course just an example.
I think in your case you could simply run sed in every case; nevertheless, I hope this answer meets your need for a canonical way.
Inspiration and extra reading: Beyond Exception Handling: Conditions and Restarts
path <- "../your_path"
# function that does the error_prone task
do_task <- function(path){
files <- dir(path,pattern = "*.json") #locate files
files %>%
map(~ withRestart( # set an alternative restart
load_multiqc(file.path(path, .)), # parsing
skipFile = function() { # if fails, skip only this file
message(paste("skipping ", file.path(path, .)))
return(NULL)
}))
}
# error handler that invokes "removeNaN"
removeNaNHandler <- function(e) tryInvokeRestart("removeNaN")
# error handler that invokes "skipFile"
skipFileHandler <- function(e) tryInvokeRestart("skipFile")
# run the task with handlers in case of error
withCallingHandlers(
condition = removeNaNHandler, # call handler (on generic error)
# condition = skipFileHandler, # if previous fails skips file
{
# run with recovery protocols (can define more than one)
withRestarts({
do_task(path)},
removeNaN = function() # protocol "removeNaN"
{
system(paste("gsed -i 's/NaN/null/g'",paste0(path,"*.json")))
do_task(path) # try again
}
)
}
)
Based on this open GitHub issue, a potential solution provided by Peter Diakumis is to use RJSONIO::fromJSON() in place of jsonlite::read_json(). You could adapt this solution to your use case by e.g. creating your own load_multiqc() function:
library(RJSONIO)

load_multiqc_bugfix <- function(paths,
                                plots = NULL,
                                find_metadata = function(...) {
                                  list()
                                },
                                plot_parsers = list(),
                                sections = "general") {
  assertthat::assert_that(all(sections %in% c(
    "general", "plot", "raw"
  )), msg = "Only 'general', 'plot' and 'raw' (and combinations of those) are valid items for the sections parameter")

  # Vectorised over paths
  paths %>%
    purrr::map_dfr(function(path) {
      parsed <- RJSONIO::fromJSON(path)

      # The main data is plots/general/raw
      main_data <- sections %>%
        purrr::map(~ switch(.,
          general = parse_general(parsed),
          raw = parse_raw(parsed),
          plot = parse_plots(parsed, plots = plots, plot_parsers = plot_parsers)
        )) %>%
        purrr::reduce(~ purrr::list_merge(.x, !!!.y), .init = list()) %>%
        purrr::imap(~ purrr::list_merge(.x, metadata.sample_id = .y))

      # Metadata is defined by a user function
      metadata <- parse_metadata(parsed = parsed, samples = names(main_data), find_metadata = find_metadata)

      purrr::list_merge(metadata, !!!main_data) %>%
        dplyr::bind_rows()
    }) %>%
    # Only arrange the columns if we have at least 1 column
    `if`(
      # Move the columns into the order: metadata, general, plot, raw
      ncol(.) > 0,
      (.) %>%
        dplyr::relocate(dplyr::starts_with("raw")) %>%
        dplyr::relocate(dplyr::starts_with("plot")) %>%
        dplyr::relocate(dplyr::starts_with("general")) %>%
        dplyr::relocate(dplyr::starts_with("metadata")) %>%
        # Always put the sample ID at the start
        dplyr::relocate(metadata.sample_id),
      .
    )
}
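A hedged usage sketch for the folder from the question; note that load_multiqc_bugfix still calls TidyMultiqc internals (parse_general, parse_raw, parse_plots, parse_metadata), so this assumes those unexported helpers exist in the installed version and are bound first from the package namespace:
# assumption: pull the unexported parsers out of TidyMultiqc before calling the fix
parse_general  <- TidyMultiqc:::parse_general
parse_raw      <- TidyMultiqc:::parse_raw
parse_plots    <- TidyMultiqc:::parse_plots
parse_metadata <- TidyMultiqc:::parse_metadata

files  <- dir(path, pattern = "*.json")
report <- load_multiqc_bugfix(file.path(path, files))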

curl error (Could not resolve host: NA) while scraping in a loop

While this code for scraping prices from a webshop has worked perfectly fine for me over the last few months, today I got the following error message:
Error in curl::curl_fetch_memory(url, handle = handle) :
Could not resolve host: NA
The code I use is as follows.
This part gets the full URLs:
# Scrape Galaxus
vec_galaxus <- vector()
i <- 0
input_galaxus <- input %>%
  filter(`Galaxus Artikel` != 0)
input_galaxus2 <- paste0('https://www.galaxus.ch/', input_galaxus$`Galaxus Artikel`)
This is the scraping loop:
sess <- session(input_galaxus2[1]) # to start the session

for (j in input_galaxus2) {
  sess <- sess %>% session_jump_to(j) # jump to URL
  i <- i + 1
  try(vec_galaxus[i] <- read_html(sess) %>% # can read direct from sess
    html_nodes('.sc-1aeovxo-1.gvrGle') %>%
    html_text() %>%
    str_extract("[0-9]+") %>%
    as.integer())
  Sys.sleep(runif(1, min = 0.2, max = 0.5))
}
where part of my input "input_galaxus2" looks like this:
c("https://www.galaxus.ch/15758734", "https://www.galaxus.ch/7362734",
"https://www.galaxus.ch/12073455", "https://www.galaxus.ch/20841274",
"https://www.galaxus.ch/20589944 ", "https://www.galaxus.ch/13595276",
"https://www.galaxus.ch/16255768", "https://www.galaxus.ch/6296373",
"https://www.galaxus.ch/14513900", "https://www.galaxus.ch/14465626",
"https://www.galaxus.ch/10592707", "https://www.galaxus.ch/19958785",
"https://www.galaxus.ch/9858343", "https://www.galaxus.ch/14513913")
Does anybody know why this code suddenly gives me the above error message?
Thanks in advance for your responses!
If it were a different error, I'd think it was throttling, but this error does not really support that. However, to troubleshoot that (and the possibility that you are hitting rate limits on the server), try introducing a delay between pulls, perhaps a few seconds or a minute, just to see if that resolves things.
Here's a method that will allow you to repeat your code until all URLs are pulled without error. Note that this may also need the "delay" I suggested above in order not to anger the server admins on the remote end (or firewall or whatever).
Create a list in which we'll store the results. Run this code only once; all the remaining steps should be repeatable without consequence.
out <- vector("list", length(input_galaxus2))
Prep the session. This may be repeatable depending on whether you have authentication or other attributes.
sess <- session(input_galaxus2[1]) #to start the session
Iterate over the empty elements of your URLs and query as needed. If you get any errors, feel free to wait a little bit and re-run this code. If a particular URL succeeded, it will not be re-attempted, so repeat as needed; eventually (assuming the failures are intermittent and all URLs are valid) you will get all results.
I don't think you need read_html in this pipe, but I'm not testing it for fear of "slashdotting" the website. The point of this answer is to suggest a mechanism that allows you to reattempt efficiently.
empties <- which(sapply(out, is.null))
for (i in empties) {
  res <- tryCatch({
    sess %>%
      session_jump_to(input_galaxus2[i]) %>%
      html_nodes('.sc-1aeovxo-1.gvrGle') %>%
      html_text() %>%
      str_extract("[0-9]+") %>%
      as.integer()
  }, error = function(e) e)
  if (inherits(res, "error")) {
    warning(sprintf("failed (%i, %s): %s", i, input_galaxus2[i], conditionMessage(res)))
    # optional
    Sys.sleep(3)
  } else out[[i]] <- res
}
Note: this assumes that a NULL value means the previous attempt failed, was interrupted, or ... was not attempted. If NULL can be a valid and successful return value from your pull, then you should likely prefill out with some other "canary" value: choose something that you are more confident will "never" appear in real results, and change how you define empties above.
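A minimal sketch of that prefill, with a hypothetical sentinel string standing in as the canary:
canary <- "__not_attempted__"                       # hypothetical sentinel value
out <- as.list(rep(canary, length(input_galaxus2))) # prefill instead of NULLs
# an element counts as "empty" until it has been overwritten with a real result
empties <- which(vapply(out, identical, logical(1), canary))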
Using purrr::map instead of a loop, without any Sys.sleep():
library(tidyverse)
library(rvest)

df <- tibble(
  links = c("https://www.galaxus.ch/15758734", "https://www.galaxus.ch/7362734",
            "https://www.galaxus.ch/12073455", "https://www.galaxus.ch/20841274",
            "https://www.galaxus.ch/20589944 ", "https://www.galaxus.ch/13595276",
            "https://www.galaxus.ch/16255768", "https://www.galaxus.ch/6296373",
            "https://www.galaxus.ch/14513900", "https://www.galaxus.ch/14465626",
            "https://www.galaxus.ch/10592707", "https://www.galaxus.ch/19958785",
            "https://www.galaxus.ch/9858343", "https://www.galaxus.ch/14513913")
)

get_prices <- function(link) {
  link %>%
    read_html() %>%
    html_nodes(".sc-1aeovxo-1.gvrGle") %>%
    html_text2() %>%
    str_remove_all("–")
}

df %>%
  mutate(price = map(links, get_prices) %>%
           as.numeric)
# A tibble: 14 × 2
links price
<chr> <dbl>
1 "https://www.galaxus.ch/15758734" 17.8
2 "https://www.galaxus.ch/7362734" 500.
3 "https://www.galaxus.ch/12073455" 173
4 "https://www.galaxus.ch/20841274" 112
5 "https://www.galaxus.ch/20589944 " 25.4
6 "https://www.galaxus.ch/13595276" 313
7 "https://www.galaxus.ch/16255768" 40
8 "https://www.galaxus.ch/6296373" 62.9
9 "https://www.galaxus.ch/14513900" 539
10 "https://www.galaxus.ch/14465626" 466.
11 "https://www.galaxus.ch/10592707" 63.5
12 "https://www.galaxus.ch/19958785" NA
13 "https://www.galaxus.ch/9858343" 7.3
14 "https://www.galaxus.ch/14513913" 617

How can I fix this issue in R with webscraping?

I am trying to pull data from over 800 links and put it into a table. I have tried using the Chrome SelectorGadget but cannot work out how to get it to loop. I must have spent 40 hours and keep getting error codes. I need to pull the same information from li:nth-child(8), li:nth-child(8) strong and a couple of other text boxes. I have tried following a YouTube video; I just changed the names and links but otherwise kept everything consistent, and it just will not work.
library(tidyverse)
library(rvest)
library(htmltools)
library(xml2)
library(dplyr)

results <- read_html("https://www.artemis.bm/deal-directory/")
issuers <- results %>% html_nodes("#table-deal a") %>% html_text()
url     <- results %>% html_nodes("#table-deal a") %>% html_attr("href")

get_modelling <- function(url_link) {
  issuer_page <- read_html(url_link)
  modelling <- issuer_page %>%
    html_nodes("#info-box li:nth-child(4)") %>%
    html_text()
  return(modelling)
}

issuer_modelling <- sapply(url, FUN = get_modelling)
I get these issues:
Warning message:
In for (i in seq_along(specs)) { :
closing unused connection 4 (https://www.artemis.bm/deal-directory/bellemeade-re-2022-1-ltd/)
Called from: open.connection(x, "rb")
Browse[1]> data.table::data.table(placement = unlist(issue_placement))[,.N, placement]
Error during wrapup: object 'issue_placement' not found
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Browse[1]> c
> data.table::data.table(placement = unlist(issue_placement))[,.N, placement]
Error in unlist(issue_placement) : object 'issue_placement' not found
We can use a simple for loop:
# create an empty vector
df <- c()

for (i in head(url)) {
  dd <- i %>%
    read_html() %>%
    html_nodes("#info-box li:nth-child(4)") %>%
    html_text()
  df <- c(dd, df) # note: prepending, so df ends up in reverse order of head(url)
}
df
[1] "Risk modelling / calculation agents etc: AIR Worldwide" "Risk modelling / calculation agents etc: AIR Worldwide"
[3] "Risk modelling / calculation agents etc: RMS" "Risk modelling / calculation agents etc: AIR Worldwide"
[5] "Risk modelling / calculation agents etc: AIR Worldwide" "Risk modelling / calculation agents etc: AIR Worldwide"

Error in n() inside Summarise Function dplyr

Hi, I hope everything is good with you.
During the week I spent time writing a script that even this morning seemed to work, but then I tried to run it again and an error I had never seen appeared in the part that uses the summarise function from the dplyr package.
Below is an excerpt of the code I used and the error on the console:
library(tidyverse)

a <- c(1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0)
b <- c(0.9157101,
       0.4854955,
       0.8853174,
       0.4373646,
       0.3855175,
       0.8603407,
       0.9193342,
       0.4693117,
       0.9849855,
       0.4458159,
       0.4379776)
c <- c(8, 2, 7, 1, 0, 6, 8, 1, 9, 1, 1)

treated_data <- data.frame(Risk = a,
                           Model_Predicted = b,
                           Grupo = c)

calculo <- treated_data %>%
  group_by(Grupo) %>%
  summarise(Quantidade = n(),
            Non_event = sum(Risk),
            Event = n() - sum(Risk))
Console Result:
---------------------------------------------------------
Error in n() : argument "vec" is missing, with no default
---------------------------------------------------------
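The error signature (an n() that expects a vec argument) suggests dplyr's n() is being masked by another attached package or object; a minimal hedged workaround is to qualify the call explicitly:
# call dplyr's n() explicitly so any masking n() is ignored (assumes dplyr >= 1.0)
calculo <- treated_data %>%
  group_by(Grupo) %>%
  summarise(Quantidade = dplyr::n(),
            Non_event = sum(Risk),
            Event = dplyr::n() - sum(Risk))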

Why do I get this error using biomod2:response.plot2, and is it important? Error in ncol(dat_) : could not find function "ncol"

When I run the example for the response.plot2 function (biomod2 package) I get the above error. The code produces some plots but does not save an object.
Here's the example (including the code that I ran): https://www.rdocumentation.org/packages/biomod2/versions/3.3-7.1/topics/response.plot2
[edit:]
The source code for the function response.plot2 is here:
https://r-forge.r-project.org/scm/viewvc.php/checkout/pkg/biomod2/R/response.plot.R?revision=728&root=biomod
It includes these lines:
.as.ggdat.1D <- function(rp.dat) {
  # requireNamespace('dplyr')
  out_ <- bind_rows(lapply(rp.dat, function(dat_) {
    dat_$id <- rownames(dat_)
    id.col.id <- which(colnames(dat_) == "id")
    expl.dat_ <- dat_ %>%
      dplyr::select(1, id.col.id) %>%
      tidyr::gather("expl.name", "expl.val", 1)
    pred.dat_ <- dat_ %>%
      dplyr::select(-1, id.col.id) %>%
      tidyr::gather("pred.name", "pred.val", (1:(ncol(dat_) - 2)))
    out.dat_ <- dplyr::full_join(expl.dat_, pred.dat_)
    out.dat_$expl.name <- as.character(out.dat_$expl.name)
    out.dat_$pred.name <- as.character(out.dat_$pred.name)
    return(out.dat_)
  }))
  out_$expl.name <- factor(out_$expl.name, levels = unique(out_$expl.name))
  return(out_)
}
I tried changing ncol(dat_) to base::ncol(dat_) and then running the whole lot to redefine the function response.plot2 for my R session, but I got a different error message:
Error in base::ncol : could not find function "::"
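For patching a package-internal helper like this for the current session, one commonly used approach (a sketch only, not tested against biomod2) is to give the edited copy the package namespace as its environment and assign it back into the namespace; .as.ggdat.1D_fixed is a hypothetical name for the edited function above:
# install the edited helper into biomod2's namespace for this session
environment(.as.ggdat.1D_fixed) <- asNamespace("biomod2")
utils::assignInNamespace(".as.ggdat.1D", .as.ggdat.1D_fixed, ns = "biomod2")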

Resources