R extension: write local data

I am creating a package and would like to store settings data locally, since it is unique for each user of the package and so that the setting does not have to be set each time the package is loaded.
How can I do this in the best way?

You could keep the settings in an object and save it with saveRDS() whenever a change is made, or when the user quits or explicitly asks to save.
saveRDS() serializes a single R object as-is to a file at the specified path.
saveRDS(<obj>, "path/to/filename.rds")
And you can load it the next time the package starts using readRDS() (note: the function is readRDS(), not loadRDS()).
A nice property of readRDS() is that the object's original name is not stored with it, so you can assign the result to any name you like without old names polluting your namespace:
newly.assigned.name <- readRDS("path/to/filename.rds")
(That behavior differs from save()/load() of a workspace image, which does restore objects under their old names.)
Where to store
Windows
Maybe here:
You can use the %systemdrive% and %homepath% environment variables to accomplish this.
Concatenated, the two give you the desired user's home directory path, as below:
Running echo %systemdrive% on command prompt gives:
C:
Running echo %homepath% on command prompt gives:
\Users\
When used together it becomes:
C:\Users\
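From R itself you can read the same variables with Sys.getenv(); a minimal sketch (standard Windows variable names assumed):
home.dir <- paste0(Sys.getenv("SYSTEMDRIVE"), Sys.getenv("HOMEPATH"))
# or let R resolve the home directory itself, which also works on Unix:
normalizePath("~")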
Linux/OS X
Either in the package location of the user,
path.to.package <- find.package("name.of.your.package",
                                lib.loc = NULL, quiet = FALSE,
                                verbose = getOption("verbose"))
# and then construct the path to the final destination with
destination.folder.path <- file.path(path.to.package,
                                     "subfoldername", "filename")
# Use file.path() to construct such paths: it inserts the correct
# file separator automatically, so the same code works on Unix-derived
# systems (Linux/Mac OS X) and on Windows.
Or use the user's $HOME directory and store the settings there in a file whose name begins with "." (e.g. ".your-packages-name.rds") - the convention in Unix systems (Linux/Mac OS X) for files that hold a program's configuration.
If anybody has a better solution, please help!

Related

Attempting to access images using R

So I am following the guide here which indicates the way to access photos is as follows:
flags <- c(
system.file("img", "flag", "au.png", package = "ggpattern"),
system.file("img", "flag", "dk.png", package = "ggpattern")
)
My goal is to now use this code for my own uses, so I saved a few images in a folder. Here is my directory:
"C:/Users/Thom/Docs/Misc/Testy"
And within the Testy folder, there is a folder called image, holding 3 images. But the following doesn't seem to work and idk why...
images <- c(
system.file("image", "image1.png", package = "ggpattern"),
system.file("image", "image2.png", package = "ggpattern")
)
system.file() is for use when a file is included in a package. It looks for the file underneath the installation directory of the given package (which can vary from user to user) and returns the resolved local path to it.
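That is why the call in the question fails: the images live in a personal folder, not inside the installed ggpattern package, so system.file() cannot find them and (by default) returns the empty string:
# looking for a file that was never shipped with the package:
system.file("image", "image1.png", package = "ggpattern")
#> [1] ""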
If you already know the absolute path on your local computer (i.e. "C:/Users/Thom/Docs/Misc/Testy") you can pass that directly to a read function. Note that the path should point at a file rather than a folder, and that readBin() also needs its what and n arguments, e.g. readBin("C:/Users/Thom/Docs/Misc/Testy/image/image1.png", what = "raw", n = 1e6)
If you want to get a little fancy or are like me and can't ever remember which direction of a slash to use on which OS, you can also do something like this which will add in the OS specific path separator:
readBin(file.path("C:", "Users", "Thom", "Docs", "Misc", "Testy",
                  "image", "image1.png"),
        what = "raw", n = 1e6)
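So for the images in the question, the paths can be built directly from the known folder (file names assumed from the question):
images <- file.path("C:/Users/Thom/Docs/Misc/Testy/image",
                    c("image1.png", "image2.png", "image3.png"))
file.exists(images) # quick sanity check that the paths resolve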

Use R package "googledrive" to load in R a file from my googledrive

I have a file in my Google Drive that is an .xlsx. It is too big, so it is not automatically converted to a Google Sheet (which is why the googlesheets package did not work). The file is so big that I can't even preview it by clicking on it in my Drive; the only way to see it is to download it as an .xlsx file. While I could load it as an xlsx file, I am trying instead to use the googledrive package.
So far what I have is:
library(googledrive)
drive_find(n_max = 50)
drive_download("filename_without_extension.xlsx",type = "xlsx")
but I got the following error:
'file' does not identify at least one Drive file.
Maybe it is me not specifying the path where the file lives in the Drive, for example: Work\Data\Project1\filename.xlsx
Could you give me an idea of how to load into R the file called filename.xlsx that is nested in the Drive like that?
I read the documentation but couldn't figure out how to do it. Thanks in advance.
You should be able to do this by:
library(googledrive)
drive_download("~/Work/Data/Project1/filename.xlsx")
The type parameter is only for Google native spreadsheets, and does not apply to raw files.
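Once downloaded (by default to a local file with the same name), the file can be read like any other local xlsx, for example with readxl (assuming that package is installed):
library(readxl)
dat <- read_xlsx("filename.xlsx")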
I want to share my way.
I do it this way because I keep updating the xlsx file: it is a query result that comes from an ERP. When I tried to do it by Google Drive id, I got errors, because each time the ERP updates the file its id changes.
This is my context; yours can be absolutely different. The file changes just two or three times a month, and even though it is a "big" xlsx file (78-80K records with 19 factors), I use it for just seconds to calculate some values and then I can trash it. It does not make any sense to store it (storing is more expensive than uploading).
library(googledrive)
library(googlesheets4) # watch out: it is not the CRAN version yet (0.1.1.9000)
library(glue)   # used below to build the download URL
library(readxl) # used below to read the downloaded xlsx
drive_folder_owner <- "carlos.sxxx@xxxxxx.com" # this is my account in this gDrive folder
drive_auth(email = drive_folder_owner) # previously authorized account
googlesheets4::sheets_auth(email = drive_folder_owner) # Yes, I know, it should be the same, but they are different.
d1 <- drive_find(pattern = "my_file.xlsx", type = drive_mime_type("xlsx")) # find the file created by the ERP; restricting the type shortens the search
meta <- drive_get(id = d1$id)[["drive_resource"]] # get the metadata for the file in Google Drive
n_id <- glue("https://drive.google.com/open?id=", d1$id[[1]]) # build a path for reading
meta_name <- paste(getwd(), "/Files/", meta[[1]]$originalFilename, sep = "") # and a path to save it temporarily
drive_download(file = as_id(n_id), overwrite = TRUE, path = meta_name) # now download and save locally
V_CMV <- data.frame(read_xlsx(meta_name)) # read into a data frame
file.remove(meta_name) # delete the local copy
rm(d1, n_id) # delete temporary variables

R: possible truncation of >= 4GB file

I have a 370MB zip file and the content is a 4.2GB csv file.
I did:
unzip("year2015.zip", exdir = "csv_folder")
And I got this message:
1: In unzip("year2015.zip", exdir = "csv_folder") :
possible truncation of >= 4GB file
Have you experienced that before? How did you solve it?
I agree with @Sixiang.Hu's answer: R's unzip() won't work reliably with files greater than 4GB.
To get at "how did you solve it?": I've tried a few different tricks, and in my experience anything that relies on R's built-ins (almost) invariably mis-identifies the end-of-file (EOF) marker before the actual end of the file.
I deal with this issue in a set of files I process on a nightly basis, and to deal with it consistently and in an automated fashion, I wrote the function below to wrap the UNIX unzip. This is basically what you're doing with system(unzip()), but gives you a bit more flexibility in its behavior, and allows you to check for errors more systematically.
decompress_file <- function(directory, file, .file_cache = FALSE) {
  if (.file_cache == TRUE) {
    print("decompression skipped")
  } else {
    # Set working directory for decompression;
    # simplifies unzip directory location behavior
    wd <- getwd()
    setwd(directory)

    # Run decompression
    decompression <-
      system2("unzip",
              args = c("-o", # include override flag
                       file),
              stdout = TRUE)

    # uncomment to delete archive once decompressed
    # file.remove(file)

    # Reset working directory
    setwd(wd); rm(wd)

    # Test for success criteria;
    # change the search depending on your implementation
    if (grepl("Warning message", tail(decompression, 1))) {
      print(decompression)
    }
  }
}
Notes:
The function does a few things, which I like and recommend:
- uses system2 over system because the documentation says "system2 is a more portable and flexible interface than system"
- separates the directory and file arguments, and moves the working directory to the directory argument; depending on your system, unzip (or your choice of decompression tool) gets really finicky about decompressing archives outside the working directory
  - it's not pure, but resetting the working directory is a nice step toward the function having fewer side effects
  - you can technically do it without this, but in my experience it's easier to make the function more verbose than to deal with generating filepaths and remembering unzip CLI flags
- I set it to use the -o flag to automatically overwrite when rerun, but you could supply any number of arguments
- includes a .file_cache argument which allows you to skip decompression
  - this comes in handy if you're testing a process which runs on the decompressed file, since 4GB+ files tend to take some time to decompress
- commented out in this instance, but if you know you don't need the archive after decompressing, you can remove it inline
- the system2 command redirects the stdout to decompression, a character vector
  - an if + grepl check at the end looks for warnings in the stdout, and prints the stdout if it finds that expression
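For the archive in the question, a call might look like this (the directory shown is illustrative):
decompress_file(directory = ".", file = "year2015.zip")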
Checking ?unzip, I found the following comment in the Note section:
It does have some support for bzip2 compression and > 2GB zip files
(but not >= 4GB files pre-compression contained in a zip file: like
many builds of unzip it may truncate these, in R's case with a warning
if possible).
You can try to unzip it outside of R (using 7-Zip for example).
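If 7-Zip is installed and on your PATH, you can also drive it from R without leaving your session; a minimal sketch (the executable name may differ per install):
# 'x' extracts with full paths; -o<dir> sets the output directory (no space after -o)
system2("7z", args = c("x", "year2015.zip", "-ocsv_folder"))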
To add to the list of possible solutions: in case you have Java (JDK) available on your machine, you can wrap jar xf in an R function with an interface similar to utils::unzip(). A very simple example:
unzipLarge <- function(zipfile, exdir = getwd()) {
  zipfile <- normalizePath(zipfile) # resolve before changing directory, so relative paths still work
  oldWd <- getwd()
  on.exit(setwd(oldWd))
  setwd(exdir)
  system2("jar", args = c("xf", zipfile))
}
And then use:
unzipLarge("year2015.zip", exdir = "csv_folder")

R Logging display name of the script

Here is a minimal example of my current issue:
At the moment I have a project containing several R scripts (all in the same directory, named DIR). I have a main script in DIR that sources all the R files and contains a basicConfig() call:
basicConfig()
Take two scripts in DIR, dog.r and cat.r, each currently containing a single function. In dog.r:
feedDog <- function(){
loginfo("The dog is happy to eat!", logger="dog.r")
}
And in cat.r :
feedCat <- function(){
loginfo("The cat is voracious", logger="cat.r")
}
It's fine with this example. But in reality I have something like 20 scripts with 20 possible error messages in each. So instead of writing:
loginfo("some message", logger="name of script")
I would like to write:
loginfo("some message", logger=logger)
And configure different loggers.
The issue is that if I declare a logger in each R script, only one will be taken into account when I source all the files with my main script. I don't know how to get around this issue.
PS: in Python it is possible to define a logger in each file that automatically takes the name of the script, like this:
logger = logging.getLogger(__name__)
But I am afraid this is not possible in R?
If you source() a file, the functions created in that file will have an attribute called srcref that stores the location in the sourced file that the function came from. If you have a name that points to such a function, you can use getSrcFilename() to get the filename the function came from. For example, create a file that we can source:
# -- testthis.R --
testthis <- function() {
loginfo("Hello")
}
Now if we enter R, we can run
loginfo <-function(msg) {
fnname <- sys.call(-1)[[1]]
fnfile <- getSrcFilename(eval(fnname))
paste(msg, "from", deparse(fnname), "in", fnfile)
}
source("testthis.R")
testthis()
# [1] "Hello from testthis in testthis.R"
The function loginfo() uses sys.call(-1) to see what function it was called from. Then it extracts the name from that call (with [[1]]), and eval() turns that "name" object into the actual function. Once we have the function, we can get its source file name. This is the same as running
getSrcFilename(testthis)
if you already knew the name of the function. So it is possible, it's just a bit tricky. I believe this special attribute is only added to functions. Other than that, each source file doesn't get its own namespace or anything, so they can't each have their own logger.

Retain valid workspace reference after project transfer

I've been working on an R project (projectA) that I want to hand over to a colleague; what would be the best way to handle workspace references in the scripts? To illustrate, let's say projectA consists of several R scripts that each read input from and write output to certain directories (dirs). All dirs are contained within my local Dropbox. The I/O part of the scripts looks as follows:
# Script 1.
# Give input and output names and dirs:
dat1Dir <- "D:/Dropbox/ProjectA/source1/"
dat1In <- "foo1.asc"
dat2Dir <- "D:/Dropbox/ProjectA/source2/"
dat2In <- "foo2.asc"
outDir <- "D:/Dropbox/ProjectA/output1/"
outName <- "fooOut1.asc"
# Read data
setwd(dat1Dir)
dat1 <- read.table(dat1In)
setwd(dat2Dir)
dat2 <- read.table(dat2In)
# do stuff with dat1 and dat2 that result in new data foo
# Write new data foo to file
setwd(outDir)
write.table(foo, outName)
# Script 2.
# Give input and output names and dirs
dat1Dir <- "D:/Dropbox/ProjectA/output1/"
dat1In <- "fooOut1.asc"
outDir <- "D:/Dropbox/ProjectA/output2/"
outName <- "fooOut2.asc"
Etc. Each script reads and writes data from/to file, and subsequent scripts read the output of previous scripts. The question is: how can I ensure that the directory strings remain valid after transfer to another user?
Let's say we copy the ProjectA folder, including subfolders, to another PC, where it is stored at, e.g., C:/Users/foo/my documents/. Ideally, I would have a function FindDir() that finds the location of the lowest common folder in the project, here "ProjectA", so that I can replace every directory string with:
dat1Dir <- paste(FindDir(), "ProjectA/source1", sep = "")
So that:
# At my own PC
dat1Dir <- paste(FindDir(), "ProjectA/source1", sep = "")
> "D:/Dropbox/ProjectA/source1/"
# At my colleague's PC
dat1Dir <- paste(FindDir(), "ProjectA/source1", sep = "")
> "C:/Users/foo/my documents/ProjectA/source1/"
Or perhaps there is a different way? Our work IT infrastructure currently does not allow using a shared disc. I'll put helper functions in an 'official' R project (i.e., hosted on R-Forge), but I'd like to use scripts when many I/O parameters are required, and because the code can easily be viewed and commented.
Many thanks in advance!
You should be able to do this by using relative directory paths. This is what I do for my R projects that I have in Dropbox and that I edit/run on both my Windows and OS X machines where the Dropbox folder is D:/Dropbox and /Users/robin/Dropbox respectively.
To do this, you'll need to
Set the current working directory in R (either in the first line of your script, or interactively at the console before running), using setwd('/Users/robin/Dropbox') (see the full docs for that command).
Change your paths to relative paths, which means they just contain the part of the path below the current directory: in this case the 'ProjectA/source1' bit if you've set your current directory to your Dropbox folder, or just 'source1' if you've set your current directory to the ProjectA folder (which is a better idea).
Then everything should just work!
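Applied to Script 1, the I/O block from the question could shrink to something like this (assuming the working directory has been set to the ProjectA folder):
# Script 1, with relative paths
dat1 <- read.table(file.path("source1", "foo1.asc"))
dat2 <- read.table(file.path("source2", "foo2.asc"))
# do stuff with dat1 and dat2 that results in new data foo
write.table(foo, file.path("output1", "fooOut1.asc"))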
You may also be interested in an R library that I love called ProjectTemplate - it gives you really nice functionality for making self-contained projects for this sort of work in R, and they're entirely reproducible, moveable between computers and so on. I've written an introductory blog post which may be useful.
