control dropbox in R without authentication every time? - r

Hi I wanna using R to check dropbox and get the latest file. Now using library(rdrop2)
library(rdrop2)
# drop_auth() # username/password for 1st time
drop.file = drop_dir('daily_export')
which1 = grepl("^daily_export/hat.*.gz$", drop.file$path) # files begin with hat and end with .gz
drop.file = drop.file[which1, ]
drop.file = drop.file[ drop.file==max(drop.file$path), 'path']# max file name indicates latest
drop_get(drop.file$path) #download to current folder
It works but when I restart R, drop_dir needs my authentication - I need click agree in the browser.
I want to automate and schedule the R code. So I'm wondering if there's a way to avoid authentication every time. - Ways using other tools are welcomed too. Thanks!

Related

Schedule a task (update data) each monday in Shiny

I have a dashboard living in a Shiny Server pro that shows different analysis. The data is coming from a long query that takes around 20 minutes to be completed.
In my current set up, I have a button that updates the data:
queries new data
transforms the data
saves the data in a file .RData
saves the data in a global object (using data <<-)
Just in case, outside the server and ui functions I have a statement that checks if data object exists. In case that does not exists, it reads the data from the .RData file instead of doing the query again.
Now I would like to update the data each Monday at 5:00pm (I do not want to open the app and push the button each Monday). I think that the best way to do it is using a cron job using cronR. The code will be located in the app.R outside the server and ui functions. Now I have the following questions:
If I am using Shiny server pro how many times, the app, will create the cron job if it is located in the app.R outside the server and ui functions?
How can I replace the object data in the shiny app? In such a way that if a user open the app on Monday after 5:00 pm the data will be in place, without the need of reading the .RData file and of course not doing the query again.
What is the best practice?
Just create your cron process with cronR completely outside the shiny application and make sure it saves your data to the correct place.
Create the R code which gets your data:
library(...)
# ...
# x <- mydata
save(x, file = "NewData.Rda")
Create the cron job:
cmd <- cron_rscript("path/to/getdata.R")
cron_add(cmd, frequency = 'daily', id = 'job5', at = '05:00')
I cant't see your point 1. The app will not create the cron job if it is not named "global.R" or "ui.R" or "server.R", I think. Also, you don't have to put your code under the /srv/shiny-server/ directory.
For your point 2., check the reactiveFileReader function from the shiny library. This function checks a file's last modified time and the file is re-read if changed
data <- reactiveFileReader(5*60*1000, filePath="NewData.Rda", readFunc = load)

Reading excel files from sharepoint folder in R

Currently I am building the automated process to clean and transform excel data from sharepoint using R. I have trouble reading excel files from sharepoint in R. I read a couple of posts (Accessing Excel file from Sharepoint with R, for instance), and tried a couple of suggestions, but none worked for me. The all error message are "Path" does not exist. Could someone give me some light for that?
I ran GET() and the link works:
r <- GET(url, authenticate("window_username","window_password",type="any"))
I run into the same issue using the following code to get the info from an excel on this sharepoint site with the same error as the one in the original question:
data <- read_excel(url)
Any feedback would be greatly appreciated.
To make access to SharePoint files easy you should sync the sites from the web app to File Explorer. Addresses for these cloud resources that have been synced are commonly of the form: C:\Users\username\My Org\My Teams Group - General\Project\My Excel.xlsx This can create a problem when the code is run multiple users. Whilst https addresses for cloud locations may work in File Explorer they do not work directly within R packages. If relative addresses don't work you can make the code user agnostic by setting the username as a variable or returning the homepath with Sys.getenv() function.
library(openxlsx)
username <- Sys.getenv("USERNAME")
sharepoint_address <- "/My Org/My Teams Group – General/Project/My Excel.xlsx"
df <- read.xlsx(xlsxFile = paste0("C:/Users/",username,sharepoint_address), sheet = "Raw Data”)
# More elegantly
df <- read.xlsx(xlsxFile = paste0(Sys.getenv("HOMEPATH"),sharepoint_address), sheet = "Raw Data”)

Use R package "googledrive" to load in R a file from my googledrive

I have a file in my google drive that is an xlsx. It is too big so it is not automatically converted to a googlesheet (that's why using googlesheets package did not work). The file is big and I can't even preview it through clicking on it on my googledrive. The only way to see it is to download is as an .xlsx . While I could load it as an xlsx file, I am trying instead to use the googledrive package.
So far what I have is:
library(googledrive)
drive_find(n_max = 50)
drive_download("filename_without_extension.xlsx",type = "xlsx")
but I got the following error:
'file' does not identify at least one Drive file.
Maybe it is me not specifying the path where the file lives in the Drive. For example : Work\Data\Project1\filename.xlsx
Could you give me an idea on how to load in R the file called filename.xlsx that is nested in the drive like that?
I read the documentation but couldn't figure out how to do that.Thanks in advance.
You should be able to do this by:
library(googledrive)
drive_download("~/Work/Data/Project1/filename.xlsx")
The type parameter is only for Google native spreadsheets, and does not apply to raw files.
I want to share my way.
I do this way because I keep on updating the xlsx file. It is a query result that comes from an ERP.
So, when I tried to do it by googleDrive Id, it gave me errors because each time the ERP update the file its Id change.
This is my context. Yours can be absolutely different. This file changes just 2 or three times at month. Even tough it is a "big" xlsx file (78-80K records with 19 factors), I use it for just seconds to calculate some values and then I can trash it. It does not have any sense to store it. (to store is more expensive than upload)
library(googledrive)
library(googlesheets4) # watch out: it is not the CRAN version yet 0.1.1.9000
drive_folder_owner<-"carlos.sxxx#xxxxxx.com" # this is my account in this gDrive folder.
drive_auth(email =drive_folder_owner) # previously authorized account
googlesheets4::sheets_auth(email =drive_folder_owner) # Yes, I know, should be the same, but they are different.
d1<-drive_find(pattern = "my_file.xlsx",type = drive_mime_type("xlsx")) # This is me finding the file created by the ERP, and I do shorten the search using the type
meta<-drive_get(id=d1$id)[["drive_resource"]] # Get the id from the file in googledrive
n_id<-glue("https://drive.google.com/open?id=",d1$id[[1]]) # here I am creating a path for reading
meta_name<- paste(getwd(),"/Files/",meta[[1]]$originalFilename,sep = "") # and a path to temporary save it.
drive_download(file=as_id(n_id),overwrite = TRUE, path = meta_name) # Now read and save locally.
V_CMV<-data.frame(read_xlsx(meta_name)) # store to data frame
file.remove(meta_name) # delete from R Server
rm(d1,n_id) # Delete temporary variables

"File is still in use": How to free all resources / force deletion of files?

Question: How to free all file handlers / connections R is using ? In Python, one could have a look which objects are still alive. Is there anything comparable in R?
Within a function, I create a directory with some files. At the end of the function, it should be deleted again. I am facing the problem that I am unable to delete the files, presumably because a file handler is still open. The example is with the MetaSKAT package, but I'm interested in a general solution. The example data can be found here: https://groups.google.com/group/skat_slee/attach/28a76339619d8358/Datasets.zip?part=4&authuser=0
# Code author: Seunggeun (Shawn) Lee
setwd('./Datasets')
foo <- function(dir.name) {
###### Preparation stuff ################################################
if (!require(MetaSKAT)) {install.packages('MetaSKAT'); require(MetaSKAT)}
dir.create(file.path('.',dir.name),showWarnings=F)
dir.path <- paste("./",dir.name,sep="")
file.copy(c("01.fam","01.bed", "01.bim", "01_3.SetID"), dir.path)
setwd(dir.path)
FAM<-read.table("01.fam", header=FALSE)
y<-FAM[,6]
N.Sample<-length(y)
x1<-rnorm(N.Sample)
x2<-rbinom(N.Sample,1, 0.5)
obj <-SKAT_Null_Model(y~cbind(x1, x2))
re <-Generate_Meta_Files(obj, "01.bed", "01.bim", "01_3.SetID", "01.MSSD", "01.MInfo", N.Sample)
###### Problem #######################################################
print(file.remove(list.files(), force = T)) # problem: cannot delete
# curiously, sometimes there is 1, sometimes 2 False...
###### my different tries to solve it ################################
rm(re)
closeAllConnections()
sink.number() # shows 0
rm(list = ls())
gc()
###### problem is still there ######################################
print(file.remove(list.files()))
setwd('..')
# print(unlink(dir.path, recursive = T)) # I want finally delete the directory
}
debug(foo)
foo("temp2")
I am using R Studio. Even if a try to delete it manually in Windows while R is still open, it tells me that the file is be used by a program. I can only delete it after I closed R.
So how can I force R to free these files? I will try to solve the problem at the root and look at the source code of Generate_Meta_Files(), but I thought there must be a global function in R which forces to free everything (Note: I am well aware that it does not make sense to create the files and delete it directly afterwards, it's just an example.)
Edit: After a hint, I tried it under Linux. It turns out that though it shows me that there was a problem with deletion of one (of the 6) files, all is properly deleted, hence I guess this is a windows-specific problem. Any hints what this is?

Retain valid workspace reference after project transfer.

I've been working on a R project (projectA) that I want to hand over to a colleague, what would be the best way to handle workspace references in the scripts? To illustrate, let's say projectA consists of several R scripts that each read input and write output to certain directories (dirs). All dirs are contained within my local dropbox. The I/O part of the scripts look as follows:
# Script 1.
# Give input and output names and dirs:
dat1Dir <- "D:/Dropbox/ProjectA/source1/"
dat1In <- "foo1.asc"
dat2Dir <- "D:/Dropbox/ProjectA/source2/"
dat2In <- "foo2.asc"
outDir <- "D:/Dropbox/ProjectA/output1/"
outName <- "fooOut1.asc"
# Read data
setwd(dat1Dir)
dat1 <- read.table(dat1In)
setwd(dat2Dir)
dat2 <- read.table(dat2In)
# do stuff with dat1 and dat2 that result in new data foo
# Write new data foo to file
setwd(outDir)
write.table(foo, outName)
# Script 2.
# Give input and output names and dirs
dat1Dir <- "D:/Dropbox/ProjectA/output1/"
dat1In <- "fooOut1.asc"
outDir <- "D:/Dropbox/ProjectA/output2/"
outName <- "fooOut2.asc"
Etc. Each script reads and write data from/to file and subsequent scripts read the output of previous scripts. The question is: how can I ensure that the directory-strings remain valid after transfer to another user?
Let's say we copy the ProjectA folder, including subfolders, to another PC, where it is stored at, e.g., C:/Users/foo/my documents/. Ideally, I would have a function FindDir() that finds the location of the lowest common folder in the project, here "ProjectA", so that I can replace every directory string with:
dat1Dir <- paste(FindDir(), "ProjectA/source1", sep= "")
So that:
# At my own PC
dat1Dir <- paste(FindDir(), "ProjectA/source1", sep= "")
> "D:/Dropbox/ProjectA/source1/"
# At my colleagues PC
dat1Dir <- paste(FindDir(), "ProjectA/source1", sep= "")
> "C:Users/foo/my documents/ProjectA/source1/"
Or perhaps there is a different way? Our work IT infrastructure currently does not allow using a shared disc. I'll put helper-functions in an 'official' R project (ie, hosted on R forge), but I'd like to use scripts when many I/O parameters are required and because the code can easily be viewed and commented.
Many thanks in advance!
You should be able to do this by using relative directory paths. This is what I do for my R projects that I have in Dropbox and that I edit/run on both my Windows and OS X machines where the Dropbox folder is D:/Dropbox and /Users/robin/Dropbox respectively.
To do this, you'll need to
Set the current working directory in R (either in the first line of your script, or interactively at the console before running), using setwd('/Users/robin/Dropbox;) (see the full docs for that command).
Change your paths to relative paths, which mean they just have the bit of the path from the current directory, in this case the 'ProjectA/source1' bit if you've set your current directory to your Dropbox folder, or just 'source1' if you've set your current directory to the ProjectA folder (which is a better idea).
Then everything should just work!
You may also be interested in an R library that I love called ProjectTemplate - it gives you really nice functionality for making self-contained projects for this sort of work in R, and they're entirely reproducible, moveable between computers and so on. I've written an introductory blog post which may be useful.

Resources