Save .RData environment with timestamp in R

I am trying to loop some code every 24 hours to download and process data from a database that updates every day. I would like each run to save automatically with the date.
This is the line I'm using to save the environment:
save.image("~/Bus data/leeds bus live/timetable data/itm_gtfs/yorkshire.data.RData")
How can I modify it so it captures the current date as well?
Thanks

You can paste() them together. I also prefer to setwd() in a command beforehand rather than repeating the path:
setwd("~/Bus data/leeds bus live/timetable data/itm_gtfs")
save.image(paste(Sys.Date(), "yorkshire.data.RData", sep = "."))
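If you'd rather not change the working directory, the same idea works with the full path (a minimal sketch; format(Sys.Date()) just makes the date string explicit):
# builds e.g. "2018-03-26.yorkshire.data.RData" in the original directory
fname <- paste(format(Sys.Date()), "yorkshire.data.RData", sep = ".")
save.image(file.path("~/Bus data/leeds bus live/timetable data/itm_gtfs", fname))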

Related

if(nrow(results) > 0), save with file name to include date and time, and email file as an attachment using R

I have an R script that runs an SQL query and saves the output in xlsx format with the date and time in the file name; hence, the file names in the directory are unique. Whenever the query result has more than 0 rows, I would also like to email the file as an attachment. What is the simplest way to accomplish this?
Since the file names are unique, is attaching the newest file in the directory the best or only way to do this? I am able to identify the newest file with:
df <- file.info(list.files("\\\\file\\path\\", full.names = TRUE))
rownames(df)[which.max(df$mtime)]
However, I don't know how to successfully combine it with attached.file =. Is that even possible or is there a better way?
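One way this could work, sketched with the mailR package (the package choice, addresses, and SMTP details are assumptions, not from the question):
library(mailR)  # assumed package; gmailr or blastula would work similarly
if (nrow(results) > 0) {
  df <- file.info(list.files("\\\\file\\path\\", full.names = TRUE))
  newest <- rownames(df)[which.max(df$mtime)]  # newest file in the directory
  send.mail(from = "me@example.com", to = "you@example.com",  # hypothetical addresses
            subject = "Query results", body = "Results attached.",
            smtp = list(host.name = "smtp.example.com", port = 25),  # hypothetical server
            attach.files = newest, send = TRUE)
}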

program a GET function to repeat over time, saving data with different names

I am working with an API that updates its data every 5 seconds. I want my code to repeat every minute to get different files with updated data. Making it loop is not enough; I need each file to be saved with a different name (data1, data2, data3...). I have not been able to find a good way of doing that.
library(httr)  # for GET() and authenticate()
library(XML)   # for xmlParse() and xmlRoot()
url <- "<API URL>"
res <- GET(url, authenticate("username", "password"))
result <- xmlParse(content(res, "text"), asText = TRUE)  # parse the XML body of the response
node <- xmlRoot(result)
That's the code I need to replicate and save to different files, so I would need the names of res, result and node to change every time the code is run by the timed loop.
You may use the current date and time in the file name while writing the data to CSV so that the name is unique every time.
url <- "<API URL>"
res <- GET(url, authenticate("username", "password"))
result <- xmlParse(content(res, "text"), asText = TRUE)
node <- xmlRoot(result)
# convert the parsed XML to a data frame (assumes a flat record structure), and
# format Sys.time() so the name has no spaces or colons (colons are not allowed in Windows file names)
write.csv(xmlToDataFrame(result), sprintf('file_%s.csv', format(Sys.time(), "%Y%m%d_%H%M%S")), row.names = FALSE)
You may use cron to run this script at any interval.
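If writing a crontab entry by hand is unfamiliar, the cronR package can register the job from R (the script path and job id below are placeholders):
library(cronR)
cmd <- cron_rscript("/path/to/get_api_data.R")  # hypothetical script location
cron_add(cmd, frequency = 'minutely', id = 'api_poll')  # run the script every minute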

Schedule a task (update data) each monday in Shiny

I have a dashboard living on a Shiny Server Pro instance that shows different analyses. The data comes from a long query that takes around 20 minutes to complete.
In my current setup, I have a button that updates the data. It:
- queries new data
- transforms the data
- saves the data in a .RData file
- saves the data in a global object (using data <<-)
Just in case, outside the server and ui functions I have a statement that checks whether the data object exists. If it does not exist, it reads the data from the .RData file instead of running the query again.
Now I would like to update the data each Monday at 5:00 pm (I do not want to have to open the app and push the button every Monday). I think the best way to do this is a cron job created with cronR. The code would be located in app.R, outside the server and ui functions. Now I have the following questions:
1. Since I am using Shiny Server Pro, how many times will the app create the cron job if the code is located in app.R outside the server and ui functions?
2. How can I replace the data object in the Shiny app, so that if a user opens the app on Monday after 5:00 pm the new data is in place, without reading the .RData file and of course without running the query again?
3. What is the best practice?
Just create your cron process with cronR completely outside the shiny application and make sure it saves your data to the correct place.
Create the R code which gets your data:
library(...)
# ...
# x <- mydata
save(x, file = "NewData.Rda")
Create the cron job:
cmd <- cron_rscript("path/to/getdata.R")
cron_add(cmd, frequency = 'daily', id = 'job5', at = '05:00')
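Note that the question asks for Mondays at 5:00 pm rather than daily at 05:00; if I recall the cronR API correctly, cron_add also accepts a days_of_week argument (cron numbering, 1 = Monday), so a closer fit might be:
cron_add(cmd, frequency = 'daily', at = '17:00', days_of_week = 1, id = 'job5')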
Regarding your point 1: the app will not create the cron job if the script is not named "global.R", "ui.R", or "server.R", I think. Also, you don't have to put your code under the /srv/shiny-server/ directory.
For your point 2, check the reactiveFileReader function from the shiny package. This function checks a file's last-modified time and re-reads the file if it has changed:
data <- reactiveFileReader(5*60*1000, session = NULL, filePath = "NewData.Rda", readFunc = load)  # session = NULL since this runs outside server()
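One caveat: load() used as a readFunc returns the names of the restored objects, not the data itself, so in practice a small wrapper helps (a sketch, assuming getdata.R saved a single object named x, as above):
loadRda <- function(path) {
  e <- new.env()
  load(path, envir = e)  # restore into a private environment
  e$x                    # return the object that getdata.R saved
}
data <- reactiveFileReader(5 * 60 * 1000, session = NULL,
                           filePath = "NewData.Rda", readFunc = loadRda)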

automatic update of filename for read.csv in r

I run a monthly data import process in R, using something similar to this:
Data <- read.csv("c:/Data/March 2018 Data.csv")
However, I want to fully automate the process and hence find a way to change the date of the file being loaded, in this case 'March 2018', using a variable from a lookup table. This lookup table is changed externally every month, and the Date variable, which indicates the month of production, is updated during this.
I've tried to use the paste() function, but didn't get very far:
Data <- read.csv(paste("C:/Data Folder",Date,"Data.csv"))
It keeps saying "No such file or directory" (Error in file). I've checked that the file names and path are fine. The only issue I can detect is that the constructed path appears like this:
'c:/Data folder/ March 2018 Data.csv'
I'm not sure if that extra space is the issue.
Any ideas?
Thanks to both bobbel and jalazbe for this solution. I used paste0():
Data <- read.csv(paste0("C:/Data folder/", Date, " Data.csv"))
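For reference, the root cause was that paste() defaults to sep = " ", while paste0() inserts nothing. An equivalent that keeps the separator explicit (the Date value shown is hypothetical):
Date <- "March 2018"  # hypothetical value from the lookup table
Data <- read.csv(file.path("C:/Data folder", paste0(Date, " Data.csv")))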

Append new data to an existing dataframe (RDS) in R

I have an Rscript that is reading in a constant stream of data in the form of a flat file. Another script picks up this flat file, does some parsing and processing, then saves the result as a data.frame in RDS format. It then sleeps, and repeats the process.
saveRDS(tmp.df, file="H:/Documents/tweet.df.rds") #saving the data.frame
On the second... nth iteration, I have the code only process the new lines added to the flat file since the previous iteration. However, in order to append the delta lines to the permanent data frame, I have to read it in, append, and then save it back out, overwriting the original.
df2 <- readRDS("H:/Documents/tweet.df.rds") #read in permanent
tmp.df2 <- rbind(df2, tmp.df) #append new to existing
saveRDS(tmp.df2, file="H:/Documents/tweet.df.rds") #save it
rm(df2) #housecleaning
rm(tmp.df2) #housecleaning
This approach is risky because whenever the RDS is open for reading/writing, another process wanting to touch that file has to wait. As the base file gets bigger, the risk increases.
Is there something like an appendRDS (I know literally there isn't) that can achieve what I want: iterative updating of a single data frame, saved to a file, using appending rather than complete replacement?
I think you can safeguard your process by using connections, opening the connection and closing it before the next process takes over:
con <- file("tmp.rds")
open(con)                    # hold the connection while we read and write
df <- readRDS(con)           # read in the existing data
df.new <- rbind(df, tmp.df)  # append the new rows from this iteration
saveRDS(df.new, con)
close(con)
Update:
You can test if a connection to the file is open and tell it to wait for a bit if you're having problems with concurrency.
while (isOpen(con)) {  # untested, but something of this nature should work
  Sys.sleep(2)
}
Is there anything wrong with using a series of numbered RDS files in a directory instead of a single RDS file? I don't think it is possible to append to a data frame in an RDS file without rewriting the entire file, since data frames are simply lists of columns, so presumably they are serialized one column at a time, and only the last column ends near the end of the file.
If you want to stick with a single file but minimize the risk of reading inconsistent data from an RDS file, you can read it in, do the append operation, write the result to a temp file, and then rename the temp file to the original name once it is finished. Then at least your period of risk does not depend on the size of the file. I'm not familiar with what kind of atomicity various filesystems guarantee when renaming a file over an existing name, but it's probably better than the time taken by saveRDS.
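A minimal sketch of that write-then-rename idea (the path and helper name are placeholders, not an existing function):
appendRDS <- function(new_rows, path) {
  old <- if (file.exists(path)) readRDS(path) else NULL
  combined <- rbind(old, new_rows)  # rbind(NULL, df) is just df
  tmp <- tempfile(fileext = ".rds", tmpdir = dirname(path))
  saveRDS(combined, tmp)            # write the full object to a temporary file
  file.rename(tmp, path)            # swap it in; rename is near-atomic on most filesystems
}
appendRDS(tmp.df, "H:/Documents/tweet.df.rds")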
