I'm downloading data from a webserver which limits the number of queries to 100 per hour.
Do you know an effective way to insert a time lag between lines of an R script, so that the script runs automatically and gathers all the data over (approximately) 10 hours?
Many thanks in advance!
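A minimal sketch of one way to do this, assuming a hypothetical fetch function get_page(i) stands in for whatever query you are actually making; the script sleeps for an hour after every batch of 100 queries:
library(httr)

# Hypothetical fetcher: replace the body with your actual query code.
get_page <- function(i) {
  GET(paste0("https://example.com/data?page=", i))  # illustrative URL
}

results <- vector("list", 1000)   # assuming roughly 1000 queries in total
for (i in seq_along(results)) {
  results[[i]] <- get_page(i)
  if (i %% 100 == 0) {
    Sys.sleep(60 * 60)            # pause one hour after every 100 queries
  }
}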
Related
I need to create an empty loop that runs for a given time, for example 2 hours. The loop does nothing useful; what matters is that it keeps R busy for exactly 2 hours.
For example, take a script like this:
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)
After this line there should be an empty loop that does something for exactly 2 hours:
for i....
After the empty loop has completed its 2 hours, execution continues with the subsequent lines:
summary(model)
predict(model, iris)
(It doesn't matter which line; what matters is that at a certain place in the code the loop spends 2 hours.)
How can it be done?
Thanks for your help.
There is no need to do this using a loop.
You can simply suspend all execution for n seconds by calling Sys.sleep(n). So to suspend execution for 2 hours you can use Sys.sleep(2 * 60 * 60).
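Applied to the example above, a minimal sketch looks like this:
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)
Sys.sleep(2 * 60 * 60)   # suspend execution for 2 hours
summary(model)
predict(model, iris)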
I am following this answer to calculate the number of hot days in a year (daily maximum temperature exceeding 35 °C) from daily tmax data.
I am using tmax from CHELSA for 2000-2016, and I have cropped it based on my bounding-box requirement.
Here are the steps I have done (example using 2001 data; one nc file per month):
Merge monthly data to annual: cdo mergetime chelsa_daily_2001*.nc chelsa_annual_2001.nc
Calculate hot days: cdo gec,308.15 chelsa_annual_2001.nc chelsa_hotdays_2001.nc (CHELSA's temperature is in Kelvin, so the threshold for hot days is 308.15 K)
Sum number of days in a year: cdo yearsum chelsa_hotdays_2001.nc chelsa_hotdays_yearsum_2001.nc
And below is the result, which is unfortunately not what I expected.
Why is the number of days not an integer? Did I miss something in the script?
UPDATE 1 (following the response from Adrian)
I have installed ncview via Homebrew but unfortunately it won't open. I got the following error:
Note: could not open file /Users/xxx/.ncviewrc for reading
Error: Can't open display:
I tried to open the nc output using QGIS, and the result is still float.
UPDATE 2
OK, I managed to check it using ncdump, and here is the first line that contains values. I'm a bit confused, because I tried using one year of data and the total is more than 365. How did that happen?
I strongly suspect Panoply is performing some kind of spatial interpolation on the data on import.
Please take a look at the raw field directly using ncdump, like this:
ncdump chelsa_hotdays_yearsum_2001.nc | less
(I pipe to less so you can scroll down through the data.) Alternatively you can open the file in ncview and move the cursor over the data; the field values will be displayed in the dialog box.
ncview chelsa_hotdays_yearsum_2001.nc
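If you prefer to check the raw values from R instead, here is a minimal sketch using the ncdf4 package; the variable to read is an assumption (inspect names(nc$var) to find the right one in your file):
library(ncdf4)

nc <- nc_open("chelsa_hotdays_yearsum_2001.nc")
print(names(nc$var))                      # list the variables in the file
vals <- ncvar_get(nc, names(nc$var)[1])   # read the first variable, assumed to be the hot-day count
summary(as.vector(vals))                  # check whether the values are whole numbers
nc_close(nc)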
I have a bunch of sales opportunities in various Excel files, broken down by region, type, etc., that are one column each and simply list the dollar amounts of each opportunity. In R I have run a simulation to determine the likelihood of each opportunity closing with a sale or not, and repeated the simulation 100,000 times. I know that I can't pass the full results table back to Tableau, because it has 100,000 rows (one total for each simulation), while the data I'm pulling into Tableau has just the dollar value of each opportunity, so its length is only the number of opportunities of that type.
What I have in R is basically this first block of code, repeated a number of times with varying inputs and probabilities; the totals vectors are ultimately combined to get a quarterly total vector.
APN <- ncol(APACPipelineNew)                          # number of opportunities
APNSales <- matrix(rbinom(100000 * APN, 1, 0.033), 100000, APN)  # one win/loss draw per opportunity per run
APNSales <- sweep(APNSales, 2, APACPipelineNew, '*')  # scale wins by each opportunity's dollar value
APNTotals <- rowSums(APNSales)                        # total sales per simulation run
...
Q1APACN <- APNTotals + ABNTotals + AFNTotals
...
Q1Total <- Q1APACT + Q1EMEAT + Q1NAMT
What I'd like to do is set this up as a dashboard in Tableau so that it can automatically update each week, but I'm not sure how to pass the simulation results back into Tableau given the difference in the length of the data.
Some suggestions:
For R you can use the Windows Task Scheduler to run a job at any given interval (or use the taskscheduleR package; see the sketch after these suggestions).
After you save the R data you can manually update your dashboard if it is on a desktop version (I do not know if you can schedule an extract refresh with a desktop dashboard).
However, if your dashboard lives on a Tableau Server you can schedule an extract refresh every week. Obviously, I would schedule the R update before the Tableau extract refresh.
If you only want the data to update when the number of rows differs from the previous weekly run, you can build that logic into R. Although saving the R data and refreshing the extract with the same data and number of rows should not cause any problems.
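A minimal sketch of the scheduling side, assuming your simulation lives in a script called simulate.R that ends by writing a fixed-size summary for Tableau to pick up; the file names, path, and schedule are illustrative:
# At the end of simulate.R: reduce the 100,000 simulation totals to a
# summary table whose size no longer depends on the number of runs.
probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)
summary_df <- data.frame(quantile = probs,
                         q1_total = quantile(Q1Total, probs))
write.csv(summary_df, "q1_simulation_summary.csv", row.names = FALSE)

# Run once to register the weekly job (Windows only):
library(taskscheduleR)
taskscheduler_create(
  taskname  = "weekly_sales_simulation",   # illustrative task name
  rscript   = "C:/path/to/simulate.R",     # illustrative path to your script
  schedule  = "WEEKLY",
  starttime = "06:00"
)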
I need some help.
I have a script in R that is performing tasks on a data frame that will have either 20148000, 4029600, or 50370000 rows. I have a machine that can handle performing the tasks at these sizes in a couple of minutes, depending on the size I select. However, I need to loop this 3002, 1501, or 1201 times respectively. That total run time is fine.
The problem I am having is that at the end of every loop I need to export this huge data frame. When I use write.csv() in R, it turns my run time for one iteration from 2 minutes to 15.5 minutes. Is there something more efficient than write.csv()?
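One commonly suggested alternative is data.table::fwrite(), which writes CSVs using multiple threads and is typically much faster than write.csv(); a minimal sketch, assuming your data frame is called df:
library(data.table)

# fwrite produces the same CSV content as write.csv, but multithreaded
fwrite(df, "output.csv")

# If the output only needs to be read back into R later, a binary
# format is usually faster and smaller still:
saveRDS(df, "output.rds")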
I want an R function which makes my loop run every 5 minutes.
I have a loop that downloads market data from Google Finance. I want this loop to run at an interval of every 30 minutes.
Is it possible?
An alternative to making your script loop: use an external job-scheduling tool to call your script at the desired interval. If you are on Linux, I recommend checking out cron. Here's an SO answer describing how to set up a cron job to kick off an R script: https://stackoverflow.com/a/10116439/819544
You can use Sys.sleep(100) to stop execution for 100 seconds. It's a little inefficient versus running some other process in the same instance and setting up a proper timer, but it's pretty easy.
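A minimal sketch of the looping approach, assuming a hypothetical download_quotes() function stands in for your Google Finance download code:
# Hypothetical downloader: replace the body with your actual download code.
download_quotes <- function() {
  # e.g. fetch and save the latest market data
}

repeat {
  download_quotes()
  Sys.sleep(30 * 60)   # wait 30 minutes before the next download
}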