Preamble
Inspired by the presentation by Barret Schloerke at rstudio::global(2021) (I will add the link as soon as it becomes available), I tried to implement a Shiny app to see the differences between using {future}, {plumber}, both, or neither, for a sequence of fast-slow-slow-fast computations (both in distinct outputs and in sequence within a single output).
Gist
Here you can find my attempt, including the app.R Shiny app and the plumber.R APIs.
Results
Each execution type is described by its result, my expectation, and my comments (each "slow" computation takes 5 s).
Standard run
Result: ~20 seconds before anything appears, then everything appears at the same moment.
Expectation: ~20 seconds, with the outputs appearing one after another.
Comment: Why didn't they appear sequentially?
{future} only
Result: ~5 seconds before anything appears, then everything appears at the same moment.
Expectation: ~20 seconds overall, with "first fast" and "second fast" appearing almost immediately, then (after ~5 s) "first slow" or "second slow", then (after ~10 s) the other one, and finally (after ~20 s) "Sequential".
Comment: I would have expected a result like this one for the combined run instead... how is it possible that "Sequential" completed in 5 seconds???
{plumber} only
Result: same as the standard run (R should remain busy until each API call is resolved, right?)
Expectation: the same time (~20 seconds), but with the outputs appearing sequentially.
Comment: Why did Shiny render everything at the same time?
{future} and {plumber}
Result: same as the standard run.
Expectation: I did not expect this at all! What I expected here is to have the "_fast" outputs appear immediately, the "_slow" ones at about the same time after ~5 seconds, and "Sequential" after ~10 seconds from the start (i.e., ~10 seconds overall).
Comment: I am totally confused here :-(
Doubts
One of the main things I do not understand is why, when {future} is activated (with or without {plumber}), "first fast" does not appear immediately. More generally, why don't the outputs appear one after another when {future} is not involved? And how is it possible that with {future} alone, "Sequential" takes only ~5 seconds?
So clearly I did something the wrong way, and there is something I am not understanding correctly.
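One possible explanation for "Sequential" finishing in ~5 seconds is only a guess, because the app code is not reproduced here. If its two slow steps were each wrapped in their own future() and both were started before either value was collected, they run at the same time on separate workers. The output then takes about as long as one slow step rather than two. The sketch below uses {future}'s future()/value() with plan(multisession) to show that difference. slow_step() is a hypothetical stand-in for the app's slow computation, shortened to 2 s.

```r
library(future)

# Hypothetical stand-in for one "slow" computation in the app.
slow_step <- function(label, secs = 2) {
  Sys.sleep(secs)
  label
}

plan(multisession, workers = 2)  # two background R sessions

# Run the two slow steps one after the other in the main session.
t_seq <- system.time({
  a <- slow_step("first slow")
  b <- slow_step("second slow")
})[["elapsed"]]

# Start both futures first, then collect: they run concurrently on the workers.
t_par <- system.time({
  f1 <- future(slow_step("first slow"))
  f2 <- future(slow_step("second slow"))
  par_res <- c(value(f1), value(f2))
})[["elapsed"]]

plan(sequential)  # shut the workers down again

cat(sprintf("sequential: %.1f s, two futures: %.1f s\n", t_seq, t_par))
```

The results are identical either way. Only the wall-clock time changes, and it depends on worker start-up and machine load.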
Questions
Can someone help me understand where and what I did wrong (and maybe why): in the app, in the API, or in how they interact?
Thank you,
Corrado.
Related
I am very new to R and trying to modify a script to help my end users.
Every week a group of files is produced, and my modified script reaches out to the network, makes the necessary changes, and puts the files back, all nice and tidy. However, every quarter there is a second set of files that needs the EXACT same transformation. My idea was to check whether those files exist on the network with file.exists(), run them through the script, and then continue with the normal weekly run. With my limited experience, I can only think of writing it this way ("lots of stuff" is a couple hundred lines), and I'm sure there's something I can do other than doubling the size of the program:
if (file.exists("quarterly.txt")) {
  # do lots of stuff
} else {
  # do lots of stuff
}
Both starja and lemonlin were correct. My solution was basically to turn my program into a function and write a short program that calls the function with each dataset. I also dropped the 'else' branch of my if statement, which works perfectly (for me).
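Here is a minimal sketch of that "wrap it in a function" approach, under some assumptions. The transformation (adding a total column), the file names, and the helpers process_file() and process_available() are all hypothetical stand-ins for the real couple hundred lines. Each input that exists is processed, so the quarterly file is simply skipped when it is absent, and no else branch is needed.

```r
# Hypothetical transformation standing in for the real "lots of stuff".
process_file <- function(in_path, out_path) {
  dat <- read.delim(in_path, stringsAsFactors = FALSE)
  dat$total <- dat$a + dat$b
  write.table(dat, out_path, sep = "\t", row.names = FALSE, quote = FALSE)
  invisible(dat)
}

# Run the same transformation on every input file that actually exists.
process_available <- function(in_paths, out_dir) {
  present <- in_paths[file.exists(in_paths)]
  for (p in present) {
    process_file(p, file.path(out_dir, paste0("clean_", basename(p))))
  }
  invisible(present)
}

# Demo in a temporary folder: only the weekly file exists this time.
dir <- tempfile("demo")
dir.create(dir)
weekly <- file.path(dir, "weekly.txt")
write.table(data.frame(a = 1:3, b = 4:6), weekly, sep = "\t",
            row.names = FALSE, quote = FALSE)
done <- process_available(c(weekly, file.path(dir, "quarterly.txt")), dir)
```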
I have code that runs a Monte Carlo simulation between A/B/C, and I modified it so that it runs a Monte Carlo simulation between A/B/C/D. The code works fine with A/B/C, but when I add the D category it just dies. It brings up the UI as it is supposed to, but when I click 'compute' (an action button) the program stops doing anything. It doesn't close; it just doesn't do anything.
I wonder whether the issue isn't the code itself but what I am asking it to do. An MCMC with 100k samples and 4 conditions becomes 100000^4, and things get computationally heavy very quickly.
I am hoping someone can recommend a way to see what the R code is doing in a sort of 'live stream'. It doesn't return any errors, but when I add the D category I would like the code to tell me 'I'm working on it, but it's taking a long time.' I don't even know where to begin looking for something like that.
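Base R's txtProgressBar() and setTxtProgressBar() give that kind of "live stream" in the console. The sketch below is built on assumptions, since the original simulation isn't shown. mc_best() is a hypothetical Monte Carlo that counts how often each category draws the highest normal value, and it updates the bar every report_every iterations.

```r
mc_best <- function(n_sims, means, sds, report_every = 10000) {
  k <- length(means)
  wins <- integer(k)
  names(wins) <- names(means)
  pb <- txtProgressBar(min = 0, max = n_sims, style = 3)
  on.exit(close(pb))                     # close the bar even on error/interrupt
  for (i in seq_len(n_sims)) {
    draw <- rnorm(k, mean = means, sd = sds)  # one draw per category
    best <- which.max(draw)
    wins[best] <- wins[best] + 1L
    if (i %% report_every == 0) setTxtProgressBar(pb, i)  # live feedback
  }
  wins / n_sims                          # share of simulations each category won
}

set.seed(1)
p <- mc_best(2000, means = c(A = 0, B = 0, C = 0, D = 5), sds = rep(1, 4),
             report_every = 500)
```

In this sketch the work grows linearly with n_sims and with the number of categories. Inside a Shiny app, shiny::withProgress() and incProgress() play the same role as the console progress bar.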
I have two moderately sized datasets in R. I want to check, for each reference number in one dataset, whether it matches a reference number in the other dataset and, if so, fill a column in the second dataset with the value from the corresponding column of the first.
ghi2$state=ifelse(b1$accntnumber %in% ghi2$referencenumber,b1$address,0)
Every time I run this code, RStudio hangs and is unresponsive for a long time. Is it because the command takes that long to process, or is my command wrong?
I am using a system with 2 GB of RAM, so I think that is why R hangs. Should I use the == operator instead of %in%? Would I get the same result?
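The ifelse() in the question produces one value per row of b1, not per row of ghi2, so the assigned column is misaligned unless both tables line up row for row. A lookup keyed on the reference number avoids that. Below is a minimal sketch using base R's match() on toy data. The column names come from the question, but the values are made up. Unmatched rows get NA here rather than 0, because mixing 0 with character addresses would coerce it to "0".

```r
b1 <- data.frame(accntnumber = c(101, 102, 103),
                 address = c("1 Main St", "2 Oak Ave", "3 Elm Rd"),
                 stringsAsFactors = FALSE)
ghi2 <- data.frame(referencenumber = c(103, 999, 101))

# == compares element by element, position by position:
eq <- b1$accntnumber == ghi2$referencenumber     # FALSE FALSE FALSE
# %in% asks "does this value occur anywhere in the other vector?":
inn <- b1$accntnumber %in% ghi2$referencenumber  # TRUE FALSE TRUE

# match(): for each ghi2 row, the b1 row with the same number (NA if none).
idx <- match(ghi2$referencenumber, b1$accntnumber)
ghi2$state <- b1$address[idx]                    # NA where there is no match
```

match() uses hashing, so it stays fast even when both columns are long.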
1. Should I use the == operator instead of %in%?
No (!). See #2.
2. Would I get the same result?
No. With ==, the order and position of the elements have to match. Also, see @Akrun's comment.
3. How to make it faster and/or deal with RStudio freezing
If RStudio freezes, you can save your log file info and send it to the RStudio team, who will respond quickly; you could also bring your log files here for help.
Beyond that, general Big Data rules apply. Here are some tips:
Try data.table
Try it on the command line instead of RStudio
Watch your Resource Monitor (or whatever you use to monitor resources) and observe the memory and CPU usage
If it's a RAM issue you can
a. use a cloud account to get more RAM
b. buy some more RAM (just sayin')
c. use 64-bit R and increase the RAM available to R to the maximum, if it isn't already
If it's a CPU issue you can consider parallelization
If any of these IDs are repeated (and this makes sense in the context of your specific use case), you can use unique() to avoid redundant comparisons
There are lots of other tips you can find in pre-existing Big Data Q&A's on SO as well.
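For the data.table suggestion, here is a minimal sketch of an update join on the same kind of toy data. It assumes the {data.table} package is installed. The column names come from the question and the values are made up. Rows of ghi2 without a matching account number get NA in the new column.

```r
library(data.table)

b1 <- data.table(accntnumber = c(101, 102, 103),
                 address = c("1 Main St", "2 Oak Ave", "3 Elm Rd"))
ghi2 <- data.table(referencenumber = c(103, 999, 101))

# Update join: add ghi2$state by reference, taking address from the matching b1 row.
ghi2[b1, on = .(referencenumber = accntnumber), state := i.address]
```

Because := modifies ghi2 in place, no extra copy of the table is made. That helps on a machine with 2 GB of RAM.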
I am using the following snippet to show the relative time in a Modular Large complication.
textTemplate.body1TextProvider = [CLKRelativeDateTextProvider
textProviderWithDate:timeOfEntry
style:CLKRelativeDateStyleNatural
units:(NSCalendarUnitMinute)];
Usually (but not always), the relative time is shown correctly the first time, something like,
42 MIN (with plenty of room for more text on the same line)
...but soon after, updates appear as shown in the screenshot below,
42 M... (again, with plenty of room for more text on the same line)
As shown, it is prematurely truncated with an ellipsis, always after the first letter of the time unit (M... for minutes, H... for hours).
The body2 line is empty, in case the text wants to overflow (I've even tried setting it to nil and to @"", just to make absolutely sure of that).
The problem appears on the simulator (38mm and 42mm), and on my actual 38mm watch.
If this were a watchOS 2 bug, I'd expect it to be obvious and fixed by now.
Anyone else seeing this, or know the solution?
Thanks.
I have an Rscript being called from a Java program. The purpose of the script is to automatically generate a bunch of graphs with ggplot and then splat them onto a PDF. It has grown fairly large, with maybe 30 graphs, each of which is called from its own script.
The input is a tab-delimited file of 5-20 MB, but the R session sometimes grows to 12 GB of RAM (on Mac OS X 10.6.8, by the way, but this will be run on all platforms).
I have read about how to check the memory size of objects, and nothing is ever over 25 MB. Even if R deep-copied everything for every function and every filter step, it shouldn't get close to this level.
I have also tried gc() to no avail. If I run gcinfo(TRUE) and then gc(), it tells me R is using something like 38 MB of RAM. But Activity Monitor goes up to 12 GB, and things slow down, presumably due to paging to the hard disk.
I tried calling it via a bash script in which I did ulimit -v 800000 but no good.
What else can I do?
In the process of making assignments, R will always make temporary copies, sometimes more than one or even two. Each temporary assignment requires contiguous memory for the full size of the allocated object. So the usual advice is to plan to have at least three times that amount of contiguous memory available. This means you also need to be concerned about how many other non-R programs are competing for system resources, as well as being aware of how your memory is being used by R. You should try restarting your computer, running only R, and seeing whether you succeed.
An input file of 20mb might expand quite a bit (8 bytes per double, and perhaps more per character element in your vectors) depending on what the structure of the file is. The pdf file object will also take quite a bit of space if you are plotting each point within a large file.
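To see how much a text file expands once it is loaded, you can compare file.size() on disk with object.size() in memory. The sketch below uses made-up data: 100,000 random doubles plus an ID string per row, written to a temporary tab-delimited file. The exact ratio depends on the column types, so no figures are claimed here.

```r
n <- 1e5
x <- runif(n)
ids <- sprintf("id%06d", seq_len(n))

# A double vector needs 8 bytes per element plus a small header.
print(object.size(x), units = "Kb")
# Character vectors usually cost more per element than their text length suggests.
print(object.size(ids), units = "Kb")

# Write the same data as a tab-delimited file and read it back.
path <- tempfile(fileext = ".tsv")
write.table(data.frame(x = x, id = ids), path, sep = "\t",
            row.names = FALSE, quote = FALSE)
on_disk <- file.size(path)
in_memory <- as.numeric(object.size(read.delim(path, stringsAsFactors = FALSE)))
cat("file:", on_disk, "bytes; data frame:", in_memory, "bytes\n")
```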
My experience is not the same as others who have commented. I do issue gc() before doing memory intensive operations. You should offer code and describe what you mean by "no good". Are you getting errors or observing the use of virtual memory ... or what?
I apologize for not posting a more comprehensive description with code. It was fairly long, as was the input. But the responses I got here were still quite helpful. Here is how I mostly fixed my problem.
I had a variable number of columns, which became very numerous when there were outliers. But I didn't need the extreme outliers, so I just excluded them and cut off those extra columns. This alone greatly decreased memory usage. I hadn't looked at virtual memory usage before, but sometimes it was as high as 200 GB, lol. This brought it down to at most 2 GB.
Each graph was created in its own function. So I rearranged the code such that every graph was first generated, then printed to pdf, then rm(graphname).
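A minimal sketch of that generate, print, rm() pattern for a multi-page PDF follows. It assumes {ggplot2} is installed. make_hist(), make_box(), write_report(), and the random data are hypothetical stand-ins for the ~30 real graph scripts. Only one plot object exists at a time.

```r
library(ggplot2)

# Hypothetical graph builders; in the real script each lives in its own file.
make_hist <- function(d) ggplot(d, aes(x = value)) + geom_histogram(bins = 30)
make_box  <- function(d) ggplot(d, aes(x = group, y = value)) + geom_boxplot()

write_report <- function(d, path, plot_fns) {
  pdf(path)
  on.exit(dev.off())          # close the PDF device even if a plot fails
  for (make_plot in plot_fns) {
    g <- make_plot(d)         # build one graph
    print(g)                  # render it as a page of the PDF
    rm(g)                     # drop it before building the next one
  }
  invisible(path)
}

dat <- data.frame(value = rnorm(1000),
                  group = sample(c("a", "b"), 1000, replace = TRUE))
out <- write_report(dat, tempfile(fileext = ".pdf"), list(make_hist, make_box))
```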
Further, I had many loops in which I was creating new columns in data frames. Instead, I just created vectors not attached to data frames for these calculations. This actually had the benefit of greatly simplifying some of the code.
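The change described above can be sketched as follows on made-up data. Intermediate results are kept as standalone vectors, and dat is never widened with dat$scaled <- ... or similar. How much memory this saves in practice will depend on the data and on how often the original code copied the data frame.

```r
n <- 1e5
dat <- data.frame(x = runif(n),
                  g = sample(c("a", "b", "c"), n, replace = TRUE))

# Standalone vectors instead of new columns on dat:
scaled <- dat$x / max(dat$x)
flag <- scaled > 0.5
share_by_group <- tapply(flag, dat$g, mean)  # fraction flagged per group
```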
Then, by making standalone column vectors instead of adding columns to the existing data frames, I reduced memory usage to 400 MB. While this is still more than I would expect, it is well within my limits. My users are all in my company, so I have some control over which computers it gets run on.