Load a csv file in R and share in all sessions? - r

I have a shiny application that uses a csv file to generate different figures. I deploy my application on my personal Linux server using shiny-server.
I use this structure for my application:
global.R
ui.R
server.R
Inside my global.R file I have this line, which helps me load and read my csv file:
df <- read_csv("../Desktop/covid_2021-02-15.csv")
But my application is very slow. I read that objects in the global.R script are loaded only once and shared across all sessions.
Is there other way to load this data frame to have a more efficient application?

In addition to the comments from Gregor and HubertL:
Loading large CSV files can be slow. I had the same issue and switched to R's binary format (.rds) using saveRDS() and readRDS(). As a first step you can try an .rds file and see if that solves the issue.
To check whether there is a performance difference you can use system.time(), which returns the time taken to evaluate any R expression.
In your case:
df <- read_csv("../Desktop/covid_2021-02-15.csv")
# Save an object to a file
saveRDS(df, file = "my_data.rds")
# Restore the object
df <- readRDS(file = "my_data.rds")
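A quick way to see the difference (a sketch, reusing the paths above) is to wrap both reads in system.time():
library(readr)
system.time(df <- read_csv("../Desktop/covid_2021-02-15.csv"))  # CSV read
system.time(df <- readRDS("my_data.rds"))                       # binary read, usually faster for large files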

Related

Vemco Acoustic Telemetry Data (vrl files) in R

Does anyone know a good way to read .vrl files from Vemco acoustic telemetry receivers directly into R as an object? Converting .vrl files to .csv files in the program VUE prior to analyzing the data in R seems like a waste of time if there is a way to bring them in directly. My internet searches have not turned up anything that worked for me.
I figured out a way using glatos to convert all .vrl files to .csv files and then read the .csv files in and bind them.
glatos has to be installed from GitHub.
Convert all .vrl files to .csv files using vrl2csv(). The help page has info on finding the path for vueExePath.
library(glatos)
vrl2csv(vrl = "VRLFileInput",outDir = "VRLFilesToCSV", vueExePath = "C:/Program Files (x86)/VEMCO/VUE")
The following will pull in all .csv files in the output folder from vrl2csv and rbind them together. I had to add the paste0 function to create the full file path for each .csv in the list.
library(data.table)
AllDetections <- do.call(rbind, lapply(paste0("VRLFilesToCSV/", list.files(path = "VRLFilesToCSV")), read.csv))
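Since data.table is already loaded, a slightly leaner variant (a sketch; same output folder assumed) is to build the full paths with list.files() and stack the files with fread() and rbindlist():
library(data.table)
csv_files <- list.files(path = "VRLFilesToCSV", pattern = "\\.csv$", full.names = TRUE)
AllDetections <- rbindlist(lapply(csv_files, fread))  # fread() is typically faster than read.csv()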

Regarding reading data information saved in an .rda file

For an .rda file, after loading it, other than viewing it in the RStudio Environment pane, are there any functions that can list all the data stored in the .rda file?
When I want to see what's in an .rda file without loading it into my current environment, I load it into a temp env:
e <- new.env(parent = emptyenv())
load("path/to/file.rda", envir=e)
ls(e) # shows names of variables stored in it
ls.str(e) # shows a `str` presentation of all variables within it
Not the most efficient way, as it requires that you load the contents before listing them. I don't think it's easy to look at a raw .rda file on disk and know its contents without loading it in some fashion.
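Building on the e environment above, you can then copy a single object out with get() (the object name here is hypothetical; it is whatever ls(e) reported):
mydata <- get("mydata", envir = e)  # copy one stored object into your workspace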

Saving H2o data frame

I am working with a 10GB training data frame. I use the H2O library for faster computation. Each time I load the dataset, I have to convert the data frame into an H2O object, which takes a lot of time. Is there a way to store the converted H2O object? (So that I can skip the as.h2o(trainingset) step each time I experiment with building models.)
After the first transformation with as.h2o(trainingset) you can export / save the file to disk and later import it again.
my_h2o_training_file <- as.h2o(trainingset)
path <- "whatever/my/path/is"
h2o.exportFile(my_h2o_training_file, path = path)
And when you want to load it use either h2o.importFile or h2o.importFolder. See the function help for correct usage.
Or save the file as csv / txt before you transform it with as.h2o and load it directly into h2o with one of the above functions.
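A minimal sketch of reloading the exported file later (same path as above; assumes an H2O cluster is already running):
library(h2o)
h2o.init()  # connect to (or start) the local H2O cluster
my_h2o_training_file <- h2o.importFile(path = "whatever/my/path/is")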
as.h2o(d) works like this (even when client and server are the same machine):
In R, export d to a csv file in a temp location.
Call h2o.uploadFile(), which does an HTTP POST to the server, then a single-threaded import.
Return the handle from that import.
Delete the temp csv file it made.
Instead, prepare your data in advance somewhere(*), then use h2o.importFile() (See http://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.importFile.html). This saves messing around with the local file, and it can also do a parallelized read and import.
*: For speediest results, the "somewhere" should be as close to the server as possible. For it to work at all, the "somewhere" has to be somewhere the server can see. If client and server are the same machine, then that is automatic. At the other extreme, if your server is a cluster of machines in an AWS data centre on another continent, then putting the data into S3 works well. You can also put it on HDFS, or on a web server.
See http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-munging/importing-data.html for some examples in both R and Python.
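As a rough sketch of that approach (the locations below are hypothetical; anything the H2O server can read works):
library(h2o)
h2o.init()
train <- h2o.importFile("s3://my-bucket/trainingset.csv")           # e.g. S3 for a cluster in AWS
# train <- h2o.importFile("hdfs://namenode/data/trainingset.csv")   # or HDFS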

R Can I use .rds files for my data in a package?

I'm trying to convert some code into a package. According to the documentation, only .RData files should be in the data directory, but I'd rather use .rds files because they don't retain the object name. There are times when I save with a different name than I want to use when reading it in later. And I really only want to have one data set per file, so the ability of .RData files to store more is actually a negative.
So my question is why not allow .rds files in the package data directory? Or is there another way to solve this problem?
The only acceptable data files in /data are those saved with save(), which means they are in the .RData format. Hadley's link, which @r2evans points to, says this. As does section 1.1.6, which @rawr points to.
Old question - but you can. It is a two-step process:
Save your data as an .rds file.
Create an R file in the data directory which loads the .rds data.
I am doing this as follows:
rdsFile <- paste0(schemeName, "_example.rds")
saveRDS(
  dmdScheme_example,
  file = here::here("data", rdsFile)
)
cat(
  paste0(schemeName, "_example <- readRDS(\"./", rdsFile, "\")"),
  file = here::here("data", paste0(schemeName, "_example.R"))
)

Editing a .r file from within another .r file

I am trying to make my current project reproducible, and so am creating a master document (eventually a .rmd file) that will be used to call and execute several other documents. This way, other investigators and I only need to open and run one file.
There are three layers to the current setup: master file, 2 read-in files, 2 databases. The master file calls the read-in files using source(), and the read-in files parse the .csv databases and apply labels.
The read-in files and the databases are generated automatically with the data management software I'm currently using (REDCap) each time I download the updated data.
However, the read-in files have a line of code that removes all of the objects in my environment. I would like to edit the read-in files directly from the master file so that I do not have to open the read-in files individually each time I run my report. Specifically, since all the read-in files are the same, I would like to remove line #2 in each.
I've tried searching Google, and tried file.edit(), but have been unable to find anything. Not even sure it is possible, but figured I would ask. Let me know if I can improve this question or if you need any additional code to answer it. Thanks!
Current relevant master code (edited for generality):
source("read-in1")
source("read-in2")
Current relevant read-in file code (same in each file, except for the database name):
#Clear existing data and graphics
rm(list=ls())
graphics.off()
#Load Hmisc library
library(Hmisc)
#Read Data
data=read.csv('database.csv')
#Setting Labels
[read-in code truncated]
Additional details:
OS: Windows 7 Professional x86
R version: 3.1.3
R Studio version: 0.99.441
You might try readLines() and something like the following (which was simplified greatly by a suggestion from @Hong Ooi below):
eval(parse(text = readLines("read-in1.R")[-2]))
My original solution, which was much more pedantic:
f <- file("read-in1.R", open="r")
t <- readLines(f)
close(f)
for (l in t[-2]) { eval(parse(text=l)) }
The for() loop just parses and evaluates each line from the text file except for the second one (that's what the -2 index does). If you're reading and writing longer files then the following will be much faster than the second option, though still less preferable than @Hong Ooi's:
f <- file("read-in1.R", open="r")
t <- readLines(f)
close(f)
f <- file("out.R", open="w")
o <- writeLines(t[-2], f)
close(f)
source("out.R")
Sorry I'm so late in noticing this question, but you may want to investigate getting access to the REDCap API and using either the redcapAPI package or the REDCapR package. Both of those packages will allow you to export the data from REDCap directly into R without having to use the download scripts. redcapAPI will even apply all the formats and dates (REDCapR might do this now too; it was in the plan, but I haven't used it in a while).
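If you go the API route, a minimal sketch with redcapAPI might look like this (the URL and token are placeholders for your project's values):
library(redcapAPI)
rcon <- redcapConnection(url = "https://redcap.example.edu/api/", token = "YOUR_API_TOKEN")
data <- exportRecords(rcon)  # exports the records straight into a labelled data frame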
You could try this. It just calls some shell commands: (1) renames the file, then (2) copies all lines not containing rm(list=ls()) to a new file with the same name as the original file, then (3) removes the renamed original.
files_to_change <- c("read-in1.R", "read-in2.R")
for (f in files_to_change) {
  old <- paste0(f, ".old")
  system(paste("cmd.exe /c ren", f, old))
  system(paste("cmd.exe /c findstr /v rm(list=ls())", old, ">", f))
  system(paste("cmd.exe /c del", old))  # del rather than rm, which cmd.exe does not have
}
After calling this loop you should have
#Clear existing data and graphics
graphics.off()
#Load Hmisc library
library(Hmisc)
#Read Data
data=read.csv('database.csv')
#Setting Labels
in your read-in*.R files. You could put this in a batch script
@echo off
ren "%~f1" "%~nx1.old"
findstr /v "rm(list=ls())" "%~f1.old" > "%~f1"
del "%~f1.old"
say, "example.bat", and call that in the same way using system.
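For example, calling the batch file from R might look like this (a sketch; assumes example.bat and the read-in files are in the working directory):
for (f in c("read-in1.R", "read-in2.R")) {
  system(paste("cmd.exe /c example.bat", f))
}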
