Schedule R Script Every 15 min (cronR)

I need to schedule/run an R script every 15 min. Running on an AWS RStudio instance.
I have played a bit with 'cronR', including loading the add-in. I can figure out how to get it to run "minutely", "hourly", etc., but not every 15 minutes.
What's the best way to get this done, either in RStudio via cronR or via some other method?

So I followed @r2evans' advice and opened an issue on GitHub. It was addressed almost immediately with a fix to the code and an update to the README. Figured I would answer this for completeness in case someone else ever finds their way here...
You can now schedule a job every 15 min with 'cronR' either in the RStudio add-in, or with the following code:
cron_add(script, frequency = '*/15 * * * *', id = 'Job ID', description = 'Every 15 min')
One note: you might need to reinstall the package from GitHub using devtools to pick up the most recent changes:
devtools::install_github("bnosac/cronR")
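For reference, here's a fuller sketch of the scheduling flow (the script path and job id below are placeholders; cron_rscript builds the shell command that cron will run):
library(cronR)
cmd <- cron_rscript("/home/rstudio/my_script.R")  # placeholder path to your own script
cron_add(cmd,
         frequency = '*/15 * * * *',  # standard cron syntax: every 15 minutes
         id = 'my_15min_job',
         description = 'Every 15 min')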

Related

Notification when an R process is finished [iTerm2]

I'm running R within an HPC environment and I'd like to be notified when a long-running process is finished (within the R terminal, not as an Rscript).
I know the way to do it is through iTerm2 triggers, but I have no idea how... any ideas, please?
It's simpler than I thought, but since no one answered, here's my (possible) solution:
Write a simple function called fin(): fin <- function(){print("fin")}
Then go to Preferences > Profiles > Advanced > Triggers and add a new trigger based on the function's output: Regular expression = \[1\] "fin", Action = Post Notification... et voilà!
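For completeness, a minimal usage sketch (long_job() is a placeholder for whatever long-running call you are waiting on):
fin <- function() print("fin")  # sentinel whose printed output the iTerm2 trigger matches
result <- long_job()  # placeholder for the long-running computation
fin()                 # prints [1] "fin"; iTerm2 matches it and posts the notification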

R get_followers() more than 75,000 not possible, even with retryonratelimit = TRUE

I want to download all followers of a Twitter user with the function get_followers from the rtweet package.
With an older version of the rtweet package, this worked without any problems, although it took a long time.
Since I updated the package (current version is 1.0.2), I only get the first 75,000 followers, even though I set retryonratelimit = TRUE.
The function downloads the first 75,000 followers, waits 15 minutes, and then ends the download process without any message.
Here you can see my example code:
library(rtweet)
# I have authenticated myself with auth_setup_default()
df_follower <- get_followers("CDU", n = 800000, retryonratelimit = TRUE)
> df_follower
# A tibble: 75,000 × 2
from_id to_id
<chr> <chr>
Can someone explain to me where the problem is and how I can download all the followers?
Many thanks in advance!
Sorry, this is a bug that has been fixed in the devel branch of the package (I think you asked in the issue tracker: https://github.com/ropensci/rtweet/issues/732).
The fix will be in the next release; as of rtweet version 1.1 it is included.
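If you need the fix before it reaches CRAN, one option is to install the development version from GitHub (this assumes you have the remotes package installed):
remotes::install_github("ropensci/rtweet")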

TCGAbiolinks: GDCprepare never terminates and crashes

I recently started using TCGAbiolinks to process some gene expression from the TCGA database. All I need to do is download the data into an R file, and there are many examples online. However, every time I try the example codes, it crashes my R workspace and sometimes my PC entirely.
Here's the code I'm using:
library(TCGAbiolinks)
queryLUAD <- GDCquery(project = "TCGA-LUAD",
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      sample.type = "Primary Tumor",
                      legacy = FALSE,
                      workflow.type = "HTSeq - FPKM-UQ")
GDCdownload(queryLUAD)
LUADRNAseq <- GDCprepare(queryLUAD,
                         save = TRUE,
                         save.filename = "LUAD.R")
As you can see, it's very simple and (as far as I can tell) identical to examples like this one.
When I run this code, it downloads fully (I've checked the folder with the files). Then I run GDCprepare. The progress bar starts and goes to 100%. Then the command never terminates; eventually either RStudio or my machine crashes.
Here's the terminal output:
> GDCdownload(queryLUAD)
Downloading data for project TCGA-LUAD
Of the 533 files for download 533 already exist.
All samples have been already downloaded
> LUADRNAseq <- GDCprepare(queryLUAD,
+ save = TRUE,
+ save.filename = "LUAD.R")
|==============================================================================================|100% Completed after 13 s
Although it says "Completed", it never actually finishes. To solve this, I've tried reinstalling TCGAbiolinks, updating R to the latest version, and even running it on an entirely different machine (a Mac instead of Windows). I've tried other datasets ("LUSC") and got exactly the same behavior. Nothing has solved the issue, and I haven't found it mentioned anywhere online.
I am sincerely grateful for any and all advice on why this is happening and how I can fix it.
I experienced exactly the same problem. I tried a variety of things and noticed it doesn't crash when the dataset has fewer than 100 samples, or when running with summarizedExperiment = FALSE for datasets of fewer than 300 samples.
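Based on that observation, a workaround sketch for the original query (same objects as in the question):
# skip building the SummarizedExperiment object, which appears to be
# where memory usage blows up on larger cohorts
LUADRNAseq <- GDCprepare(queryLUAD,
                         summarizedExperiment = FALSE,
                         save = TRUE,
                         save.filename = "LUAD.R")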
I am facing the same issue here. It looks like some kind of memory leak is happening, because my RAM usage goes to 100%. I managed to GDCprepare 500 samples without crashing with ~64 GB of RAM, but even after it finishes, the memory is still occupied by the R session, even if I run garbage collection and remove everything in the environment.
I didn't have this issue with TCGAbiolinks around a year ago...

Running all examples in an R package

I am developing a package in RStudio. Many of my examples need updating, so I am going through each one. The only way I know to check the examples is by running devtools::check(), but of course this runs all the checks and takes a while.
Is there a way of just running the examples so I don't have to wait?
Try the following code to run all the examples:
devtools::run_examples()
You can also do this without devtools, though admittedly it's a bit more circuitous.
package = "rgl"
# this gives a key-value mapping of the various `\alias{}`es
# in each Rd file to that file's canonical name
aliases <- readRDS(system.file("help", "aliases.rds", package=package))
# or sapply(unique(aliases), example, package=package, character.only=TRUE),
# but I think the for loop is superior in this case.
for (topic in unique(aliases)) example(topic, package=package, character.only = TRUE)

Update Packages Automatically at Start-up

I find it annoying that I have to click Tools -> Update Packages every time I load RStudio. I could use update.packages(c("ggplot2")), for instance, to update my packages in .Rprofile, but the issue is that it won't look for other packages (dependencies). For instance, I have to update the "seriation" and "digest" packages every time I start RStudio, and these packages are not loaded by me at start-up.
Does anyone have code to automatically check and update all packages at start-up? If so, can you please share it here? I extensively googled this topic and searched through SO, and it seems the popular opinion is to use RStudio's menu. Here's the thread I am referring to: How to update R2jags in R?
One way I can think of doing this is in .Rprofile:
a <- installed.packages()
b <- data.frame(a[, 1])
and then calling this function: https://gist.github.com/stevenworthington/3178163
However, I am not quite sure whether this is the best method.
Another linked thread is: Load package at start-up
I created the thread above.
I'd appreciate any thoughts.
I found this on the internet (I don't remember where) when I was struggling with the same problem, though you still need to run it yourself. Hope this helps.
all.packages <- installed.packages()
# the running R version as a string, e.g. "4.2.1"
r.version <- paste(version[['major']], '.', version[['minor']], sep = '')

for (i in 1:nrow(all.packages)) {
  package.name <- all.packages[i, "Package"]
  # the "Built" column records which R version the package was built under;
  # reinstall any package built under a different R version
  built.under <- all.packages[i, "Built"]
  if (built.under != r.version) {
    print(paste('Installing', package.name))
    install.packages(package.name)
  }
}
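A simpler alternative sketch for .Rprofile, using only base R (update.packages and these arguments are standard; whether you want this running at every startup is a judgment call):
# update everything non-interactively at startup;
# checkBuilt = TRUE also reinstalls packages built under an older R version
options(repos = c(CRAN = "https://cloud.r-project.org"))
update.packages(ask = FALSE, checkBuilt = TRUE)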
