Is there a way to set a break time for each vehicle? Code using SetBreakIntervalsOfVehicle doesn't work and results in the solver hanging.
for v in range(data['num_vehicles']):
    vehicle_break = data['breaks'][v]
    break_intervals[v] = [
        routing.solver().FixedDurationIntervalVar(
            15, 100, vehicle_break[0], vehicle_break[1],
            'Break for vehicle {}'.format(v))]
    time_dimension.SetBreakIntervalsOfVehicle(
        break_intervals[v], v, node_visit_transit)
I have to retrieve a large dataset from a web API (NCBI entrez) that limits me to a certain number of requests per second, say 10 (the example code will limit you to three without an API key). I'm using furrr's future_* functions to parallelize the requests to get them as quickly as possible, like this:
library(tidyverse)
library(rentrez)
library(furrr)
plan(multiprocess)
api_key <- "<api key>"
# this will return a crap-ton of results
srch <- entrez_search("nuccore", "Homo sapiens", use_history=T, api_key=api_key)
total <- srch$count
per_request <- 500 # get 500 records per parallel request
nrequest <- total %/% per_request + as.logical(total %% per_request)
result <- future_map(seq(nrequest),function(x) {
rstart <- (x - 1) * per_request
return(entrez_fetch(
"nuccore",
web_history = srch$web_history,
rettype="fasta",
retmode="xml",
retstart=rstart,
retmax=per_request,
api_key=api_key
))
})
Obviously for cases where nrequest > 10 (or whatever the limit is), we will immediately run afoul of the rate limiting.
I see two seemingly obvious simple solutions to this, both of which seem to work.
One is to introduce a random short delay before making the request, like so:
future_map(seq(nrequest),function(x) {
Sys.sleep(runif(1,0,5))
# ...do the request...
})
The second is to limit the number of concurrent requests to the rate limit, either by plan(multiprocess,workers=<max_concurrent_requests>) or by using the semaphore package with a semaphore set to the rate limit, like this:
# this assumes individual requests take long enough that waiting on the
# semaphore is itself enough to keep the request rate down;
# for this case, they do
rate_limit <- 10
lock <- semaphore(rate_limit)
result <- future_map(seq(nrequest),function(x) {
rstart <- (x - 1) * per_request
acquire(lock)
s <- entrez_fetch(
"nuccore",
web_history = srch$web_history,
rettype="fasta",
retmode="xml",
retstart=rstart,
retmax=per_request,
api_key=api_key
)
release(lock)
return(s)
})
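For comparison, a minimal sketch of the worker-cap variant mentioned above (same srch, per_request and nrequest as before; nothing new beyond capping the number of workers):
rate_limit <- 10
plan(multiprocess, workers = rate_limit)  # at most rate_limit requests in flight
result <- future_map(seq(nrequest), function(x) {
  rstart <- (x - 1) * per_request
  entrez_fetch(
    "nuccore",
    web_history = srch$web_history,
    rettype = "fasta",
    retmode = "xml",
    retstart = rstart,
    retmax = per_request,
    api_key = api_key
  )
})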
However, what I would really like to do is limit the request rate rather than the number of concurrent requests. There's a great post by Quentin Pradet on how to do this using async IO HTTP requests in Python. I made an attempt to adapt this to R, but ran into the problem that any variable shared across processes in the future_* function is copied rather than actually shared. Modifications (even if protected by a semaphore lock) are therefore not visible to the other processes, so it's not possible to implement the counter bucket this method relies on.
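A minimal illustration of that copy behaviour, assuming a multiprocess plan (the counter here is hypothetical, not part of the code above):
plan(multiprocess)
counter <- 0
invisible(future_map(1:4, function(x) {
  counter <<- counter + 1   # updates a copy of counter inside the worker process
  counter
}))
counter   # still 0 in the parent session: the workers' updates are never copied back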
Is there a clever way to rate-limit parallel requests without necessarily capping the number of simultaneous requests? Or am I overthinking this and should just stick to limiting the number?
I am running a query in a loop for each store in a dataframe. Typically there are 70 or so stores, so each complete pass through the loop runs about 70 iterations.
Maybe 75% of the time this loop works all the way through with no errors.
About 25% of the time I get the following error during any one of the loop iterations:
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached
Then I have to figure out which iteration bombed, and repeat the loop excluding iterations that completed successfully.
I can't find anything on the web to help me understand what is causing this seemingly random error. Perhaps it is a BQ technical issue? There does not seem to be any relation to the size of the result set it crashes on.
Here is the part of my code that does the loop; again, it works all the way through most of the time. The cartesian product across IDs is intentional, as I want every combination of each Test ID with all possible Control IDs within a store.
sql<-"SELECT pstore as store, max(pretrips) as pretrips FROM analytics.campaign_ids
group by 1 order by 1"
store_maxtrips<-query_exec(sql,project=project, max_pages = 1)
store_maxtrips
for (i in seq_along(store_maxtrips$store)) {
  # pull back all ids shopping in the same primary store as each test ID, with their pre metrics
  sql <- paste("SELECT a.pstore as pstore, a.id as test_id,
    b.id as ctl_id,
    (abs(a.zpbsales-b.zpbsales)*", wt_pb_sales, ")+(abs(a.zcatsales-b.zcatsales)*", wt_cat_sales, ")+
    (abs(a.zsales-b.zsales)*", wt_retail_sales, ")+(abs(a.ztrips-b.ztrips)*", wt_retail_trips, ") as zscore
    FROM analytics.campaign_ids a inner join analytics.pre_zscores b
    on a.pstore=b.pstore
    where a.id<>b.id and a.pstore=", store_maxtrips$store[i], " order by a.pstore, a.id, zscore")

  print(paste("processing store", store_maxtrips$store[i]))

  query_exec(sql, project = project, destination_table = "analytics.campaign_matches",
             write_disposition = "WRITE_APPEND", max_pages = 1)
}
Solved!
It turns out I was using query_exec, but I should have been using insert_query_job, since I do not want to retrieve any results. The errors were all happening while R was trying to retrieve results from BigQuery that I didn't want anyhow.
Using insert_query_job + wait_for(job) in my loop instead of the query_exec command eliminated all issues with the loop finishing.
I also needed to wrap the call in try() to get past some rare errors that still popped up with this approach. Thanks to MarkeD for this tip. So my final solution looked like this:
try(job <- insert_query_job(sql, project = project,
                            destination_table = "analytics.campaign_matches",
                            write_disposition = "WRITE_APPEND"))
wait_for(job)
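For context, a sketch of how that replacement sits inside the store loop from above; the sql string is built exactly as before, and nothing else changes:
for (i in seq_along(store_maxtrips$store)) {
  # build sql for store i exactly as in the original loop
  print(paste("processing store", store_maxtrips$store[i]))
  try(job <- insert_query_job(sql, project = project,
                              destination_table = "analytics.campaign_matches",
                              write_disposition = "WRITE_APPEND"))
  wait_for(job)
}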
Thanks to everyone who commented and helped me research the issue.
I have the following aggregation rule:
abc.prod.ALL.<service>.<metric>.count (60) = sum abc.local.*.<service>.<<metric>>.count
Given metrics like:
abc.prod.host1.aservice.ametric.count
abc.prod.host2.aservice.ametric.count
I would expect them to be aggregated to
abc.prod.ALL.aservice.ametric.count
But that metric is never created. In aggregator logs, I see
Allocating new metric buffer for abc.prod.ALL.aservice.ametric.count
but it's not created. If I add a layer to the generated metric like:
abc.prod.extralayer.ALL.<service>.<metric>.count (60) = sum abc.local.*.<service>.<<metric>>.count
then we seem to get a recursive explosion of created metrics like:
abc.prod.extralayer.ALL.aservice.ametric.count
abc.prod.extralayer.ALL.ALL.aservice.ametric.count
abc.prod.extralayer.ALL.ALL.ALL.aservice.ametric.count
abc.prod.extralayer.ALL.ALL.ALL.ALL.aservice.ametric.count
Which led me to believe that the generated metric is then aggregated again...
I added a logging line to AggregationProcessor.process:
else:
    log.clients("Found aggregate " + aggregate_metric + " for " + metric)
    aggregate_metrics.add(aggregate_metric)
I then tried with my original, desired rule, and eventually started to see log lines like:
Found aggregate abc.prod.ALL.aservice.ametric.count for abc.prod.ALL.aservice.ametric.count
It matched itself, as if it were a new incoming metric. Why is it being fed back into the aggregator?
This appears to have been a bug. It was not present in older versions but was in master at the time of my question.
If you are seeing this behaviour, follow the issue on GitHub:
https://github.com/graphite-project/carbon/issues/560
https://github.com/graphite-project/carbon/issues/455
There is no point in continuing the question here on SO.
Note: I am using the older version, 0.9.15, and not seeing the problem, so I recommend that until the issue is confirmed to be resolved in master.
I am trying to write a manual rate-limiting function for the rgithub package. So far this is what I have:
library(rgithub)
library(plyr)      # provides try_default()
library(magrittr)  # provides %>%

pull <- function(i){
  commits <- get.pull.request.commits(owner = owner, repo = repo, id = i, ctx = get.github.context(), per_page = 100)
  links <- digest_header_links(commits)
  number_of_pages <- links[2,]$page
  if (number_of_pages != 0)
    try_default(for (n in 1:number_of_pages){
      if (as.integer(commits$headers$`x-ratelimit-remaining`) < 5)
        Sys.sleep(as.integer(commits$headers$`x-ratelimit-reset`) - as.POSIXct(Sys.time()) %>% as.integer())
      else
        get.pull.request.commits(owner = owner, repo = repo, id = i, ctx = get.github.context(), per_page = 100, page = n)
    }, default = NULL)
  else
    return(commits)
}
list <- c(500, 501, 502)
pull_lists <- lapply(list, pull)
The intention is that if the x-ratelimit-remaining variable goes below a certain threshold, the script should wait until the time specified in x-ratelimit-reset has passed and then continue. However, I'm not sure whether that is the actual behaviour of the if/else setup I have here.
The function runs fine, but I have some doubts about whether it actually does the rate limiting or whether it somehow skips that step. Hence I ask: a) how can I find out if it actually does rate limiting, and b) if not, how can I rewrite it so that it does? Would a while condition/loop perhaps be better?
You can test whether it does the rate limiting by changing 5 to a large enough number and displaying the timing of Sys.sleep using:
print(system.time(Sys.sleep(...)))
That said, the function seems OK to me; unfortunately, I cannot test it easily, as rgithub is not available for my version of R (3.1.3).
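For instance, a hedged sketch of where that check could go inside pull(), with the threshold bumped so the sleep branch always fires (5000 is just an arbitrarily large value for testing):
if (as.integer(commits$headers$`x-ratelimit-remaining`) < 5000)
  # timing the sleep makes it obvious whether the rate limiting actually happens
  print(system.time(
    Sys.sleep(as.integer(commits$headers$`x-ratelimit-reset`) -
                as.integer(as.POSIXct(Sys.time())))
  ))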
Not a canonical answer, but a working example.
You should add some logging to your script, even something as simple as write.csv(append=TRUE).
I've implemented an automatic anti-DDoS process which prevents your IP from being banned by the exchange market. You can find it in jangorecki/Rbitcoin/R/utils.R.
Rbitcoin.last_api_call is an environment object stored in the package namespace, a kind of session-level package cache.
This can help you set up the same thing in your package.
You should also consider an optional version with parallel support, for example linking to a database with concurrent reads. My function can easily be modified to queue calls and recheck the timing every X seconds.
Edit
I forgot to add that the mentioned function supports multiple source systems. That allows you, for example, to extend your rgithub code to Bitbucket, etc., and still manage API rate limiting effectively.
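To illustrate the idea (a sketch only, not the actual Rbitcoin implementation; .api_cache and wait_api_limit are hypothetical names):
# keep the timestamp of the last API call in an environment, a simple session-level cache
.api_cache <- new.env()
.api_cache$last_call <- Sys.time() - 3600

wait_api_limit <- function(interval = 1) {  # minimum seconds between calls
  elapsed <- as.numeric(difftime(Sys.time(), .api_cache$last_call, units = "secs"))
  if (elapsed < interval) Sys.sleep(interval - elapsed)
  .api_cache$last_call <- Sys.time()
}
wait_api_limit() would then be called right before each get.pull.request.commits() call.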
I am writing the code for a population MCMC. I will try to provide as much information as I think could help, so please bear with me.
I am using tempered distributions and I want to perform exchange moves, i.e. moves that propose to swap the values of two chains.
What I have done (exchange happening in the master)
I had initially written this by:
letting each chain mutate for a specified number of iterations n;
at every n-th iteration, sending the slaves' results to the master and attempting an exchange of parameters between chains there;
then sending the updated values back to the slaves and repeating the process.
What I want to achieve (exchange directly between slaves)
This is working fine, but I wanted to clean up my code and remove unnecessary communication between master and slaves. That is, let the slaves communicate directly with each other.
So assuming I am spawning 10 slaves,
at iteration n, I want slave1-slave2, slave3-slave4, ..., slave9-slave10 to communicate with each other
at iteration 2*n, I want slave2-slave3, slave4-slave5, ..., slave8-slave9 to communicate with each other
and so on, so that I let samples travel through the temperature ladder.
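A small sketch of the intended pairing, mirroring the oddFlag logic in the code below (partner_of is a hypothetical helper just to spell out the indices; a result of 0 or noChains+1 means "no partner this round"):
# odd rounds pair (1,2), (3,4), ...; even rounds pair (2,3), (4,5), ...
partner_of <- function(ind, round) {
  if (ind %% 2 == round %% 2) ind + 1 else ind - 1
}
partner_of(1, 1)  # 2
partner_of(2, 2)  # 3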
The problem
And this is where I am facing a problem.
I think I am managing to send a value from one slave to another (my "Succesfully sent" print statement appears in the log files of the sending slaves), but it doesn't seem to be received (my "Succesfully received" statement doesn't appear in the logs of the partner slaves).
And the program just hangs. I think maybe I have caused a deadlock, but I am not sure what I have done wrong.
Could you please advise? I have used this Parallel Tempering R code as a guide:
http://www.lindonslog.com/mathematics/parallel-tempering-r-rmpi/
Please see below my code
Many thanks!
Sofia
ind <- mpi.comm.rank()
oddFlag <- 0  ### object to flag code suitable for odd/even numbered slaves

for (i in 1:TotalIter) {
  ##### normal MCMC move (single chain mutation) - logL.current

  if (i %% exchangeInterval == 0) {  ### every nth (right now 5th) iteration, attempt an exchange
    message("\n\nAttempt an exchange move")
    oddFlag <- oddFlag + 1
    exchange <- 0
    logL.partner <- 0

    if (ind %% 2 == oddFlag %% 2) {  ### when oddFlag is even this branch concerns even-numbered slaves; when odd, odd-numbered slaves
      ind.partner <- ind + 1
      if (0 < ind.partner && ind.partner < (noChains + 1)) {
        message("This is the slave: ", ind, " and its partner is: ", ind.partner)
        message("The tag for receiving logL.partner is: ", ind.partner)
        logL.partner <- mpi.recv.Robj(source = ind.partner, tag = ind.partner)  #### receive the logL of the partner
        message("Succesfully received")
        message("This is the logL.partner: ", logL.partner)
        exchanges.attempted <- exchanges.attempted + 1
        if (runif(1) < min(1, exp((logL.partner - estimatorSelf) * (temper[ind] - temper[ind.partner])))) {  ############# exp((chain2 - chain1)*(T1 - T2))
          message("I exchanged the values")
          exchange <- 1
          print(exchange)
          exchanges.accepted <- exchanges.accepted + 1
        }
        mpi.send.Robj(obj = exchange, dest = ind.partner, tag = 15 * ind)
      }
      if (exchange == 1) {
        ### exchange parameters with mpi.send.Robj/mpi.recv.Robj functions
      }
    } else {  ### when oddFlag is even this branch concerns odd-numbered slaves; when odd, even-numbered slaves
      ind.partner <- ind - 1
      if (0 < ind.partner && ind.partner < (noChains + 1)) {
        message("This is the slave: ", ind, " and its partner is: ", ind.partner)
        message("The tag for sending logL.current is: ", ind)
        mpi.send.Robj(obj = logL.current, dest = ind.partner, tag = ind)  ### send logL to partner
        message("Succesfully sent")
        exchange <- mpi.recv.Robj(source = ind.partner, tag = 15 * ind.partner)
        message("I received the exchange message")
      }
      if (exchange == 1) {
        ### exchange parameters with send/receive functions
      }
    }
  }
}
This seems very strange. To my knowledge the send and receive commands are blocking, so for the send to work without the receive working strikes me as a bit odd. Have you tried the code from the guide? If you like, I can send you the entire R code just to see if it works on your system. If the code from the guide works, then you can safely say that it isn't a problem with your MPI setup.
One thing you might try is, for the receive command, using mpi.any.source() instead of ind.partner (I think that is the correct one), which will accept a message from any source. If this resolves your deadlock, it could be a problem with the source/tag matching (but by eye there doesn't seem to be anything wrong to me).
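A minimal sketch of that suggestion applied to the receive in the first branch (mpi.any.tag() relaxes the tag on the same principle; both are standard Rmpi helpers, though this is untested against your setup):
# accept the partner's log-likelihood regardless of source and tag
logL.partner <- mpi.recv.Robj(source = mpi.any.source(), tag = mpi.any.tag())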
Another thing you might try is removing the "source=" and "tag=" named arguments on the receive commands. I notice that in my code I don't have them there, whilst I do for the send commands. Perhaps that was causing me problems too, but I can't quite remember. Let me know how it goes; I hope it all works out.