I'm new to R and frankly the amount of documentation is overwhelming, and I haven't been able to find the answer to this question.
I have created a number of .R script files, all stored in a folder that I can access on my server (let's say the folder is \\servername\Paige\myscripts, written with Windows backslashes)
I know that in R you can call each script individually, for example (using the forward slash required in R)
source(file="//servername/Paige/myscripts/con_mdb.r")
and now this script, con_mdb, is available for use.
If I want to make all the scripts in this folder available at startup, how do I do this?
Briefly:
Use your ~/.Rprofile in the directory found via Sys.getenv("HOME") (or if that fails, in R's own Rprofile.site)
Loop over the contents of the directory via dir() or list.files().
Source each file.
e.g. via this one-liner:
sapply(dir("//servername/Paige/myscripts/", pattern = "\\.[rR]$", full.names = TRUE), source)
but the real story is that you should not do this. Create a package instead, and load that. Bazillion other questions here on how to build a package. Research it -- it is worth it.
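That said, if you do go the startup route, a minimal ~/.Rprofile sketch of the loop-and-source approach (path from the question):

local({
  scripts <- list.files("//servername/Paige/myscripts/",
                        pattern = "\\.[rR]$", full.names = TRUE)
  invisible(lapply(scripts, source))  # source() fills the global environment
})

The local() wrapper keeps the helper variable scripts out of your workspace; source() still evaluates each file in the global environment by default.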
By far the best way is to create a package! But as a first step, you could also create one R script file (collection.r) in your script directory which sources all the other scripts using relative paths.
In your separate project scripts you can then include just that one script with
source(file="//servername/Paige/myscripts/collection.r", chdir = TRUE)
which changes the directory before sourcing. That way you only have to include one file per project.
In the collection file you could loop over all files (except collection.r itself) or simply list them all.
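A hedged sketch of such a collection.r, assuming it is always sourced with chdir = TRUE as above, so that "." is the script directory:

# collection.r -- source every other .R file in this directory
files <- setdiff(list.files(".", pattern = "\\.[rR]$"), "collection.r")
invisible(lapply(files, source))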
Related
I have a configure script that sets up some paths for my R package during installation. I want to edit a file based on some conditions. Is there any way to edit a file from within configure.ac? It would be great if the solution worked on all operating systems.
Is there any way to edit a file from within the configure.ac?
configure.ac is not executable, but I suppose you mean that you want the configure script generated from it to edit a file. The configure script is a shell script, and you can cause arbitrary shell code to be included in it, more or less just by including that code at the corresponding point in configure.ac.
The question, then, is how you would automate editing a file with a shell script. There is a variety of alternatives, but sed is high on my list. You will find it on every system that can support Autoconf configure scripts, because such scripts use it internally.
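For example (a sketch only; the file and variable names are hypothetical), the generated configure script could rewrite a packaged file like this:

# sed -i is not portable, so write to a new file instead of editing in place
sed "s|@MYLIB_DIR@|${MYLIB_DIR}|g" R/settings.R.in > R/settings.R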
On the other hand, this sort of thing is one of the main activities of a configure script, in the form of creating files (especially makefiles, but not limited to those) from templates. You should consider building your target file of interest from a template in this way, instead of making custom-programmed edits to a file packaged in your program distribution. This would involve
setting output variables containing the chosen content for the parts of the file that need to be configured;
designating the target file as one for configure to build; and
providing the template, maybe by taking a complete example file and replacing each variable part with a reference to the appropriate @output_variable@ (see the sketch below).
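A sketch of that template approach, with hypothetical file and variable names:

# configure.ac fragment: publish a value and register the output file
AC_SUBST([DATA_DIR], [$with_data_dir])
AC_CONFIG_FILES([R/settings.R])
AC_OUTPUT

# R/settings.R.in, the template that configure fills in:
data_dir <- "@DATA_DIR@"

When configure runs, it writes R/settings.R with @DATA_DIR@ replaced by the configured value.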
I'm providing a .zip with a .R file and a .xlsx file to some people
I need code that can read this .xlsx file from any directory on any PC.
But since the directories vary from computer to computer, I couldn't find a solution.
IMPORTANT: I'm not using RStudio to run this .R file, so I can only use base functions.
Using R - How do I search for a file/folder on all drives (hard drives as well as USB drives) - this question doesn't solve my problem.
Take a look at the here package. When you load the library (library("here")), it establishes a "base" working directory, and you can then use the package to construct relative file paths from that location. For example, if inside your .zip file you have an R script (e.g., My Data Analysis.R) that analyzes data kept within a folder called data, you could read it in using read.csv(here("data", "my_csv_file.csv")), and it will construct the full appropriate file path no matter what computer it is on. Of course, the file structure needs to stay the same across computers.
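A minimal sketch, assuming the .zip unpacks into a folder with that layout (the data/ subfolder and file name are from the example above):

library(here)  # here() builds paths from the detected project root
dat <- read.csv(here("data", "my_csv_file.csv"))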
A while back I was reading an article about improving project workflow. The advice was not to use setwd or my computer would burn:
If the first line of your R script is
setwd("C:\Users\jenny\path\that\only\I\have")
I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.
I started using the here package and it worked great until I started to schedule scripts using cronR. After asking this question my laptop was again threatened with arson:
If the first line of your #rstats script is wd <- here(), I will come into your lab and SET YOUR COMPUTER ON FIRE.
Fearing for my laptop's safety I started using the method suggested in the answer to get relative file paths:
wd <- Sys.getenv("HOME")
wd <- file.path(wd, "projects", "my_proj")
This worked for me, but not for the people I was working with, who didn't have the same projects directory. So now I'm confused. What is the safest / best way to get relative file paths so that a project can be portable?
There are quite a few options: 1, 2. My requirements are to source functions/scripts and read/write csv files. Perhaps the rprojroot package is the best bet?
Create an RStudio project and then reference all files with relative paths from the project's root folder. That way, all users will open the project and automatically have the correct working directory.
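If the code must also run outside an open RStudio session (e.g. in batch jobs), the rprojroot package mentioned in the question can find the same root by a marker file; a minimal sketch with hypothetical file names:

library(rprojroot)
root <- find_root(is_rstudio_project)                  # walks up to the .Rproj file
source(file.path(root, "R", "functions.R"))            # hypothetical script
dat <- read.csv(file.path(root, "data", "input.csv"))  # hypothetical data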
There are many ways to organize code and data for use with R. Given that the "arsonist" described in the OP has rejected at least two approaches for locating the project files in an R script, the best next step is to ask the arsonist how s/he performs this function, and adjust your code and file structures accordingly.
UPDATE: Since the "arsonists" appear to be someone who writes on Tidyverse.org (see Tidyverse article in OP) and an answer on SO (see additional links in OP), your computer appears to be relatively safe.
If you are sharing code or executing it with batch processes where the "user" is someone other than you, a useful approach is to place the code, data, and configuration under version control, and develop a runbook to explain how others can retrieve the components and execute them on another computer.
As noted in the comments to the OP, there's nothing wrong with here::here() if its use can be made reliable through documentation in a runbook.
I structure all of my R code into Projects within RStudio, which are organized into a gitrepositories directory. All of the projects can be accessed as subdirectories from the gitrepositories directory. If I need to share a project, I make the project accessible to other users on GitHub.
In my R code I reference external files as subdirectories from the project root directory, such as ./data/gen01.csv.
There are two parts to this question:
how to load data from a relative path, and
how to load code from a relative path
For most use cases (including when invoking tools from a CRON job or similar) the location of the data should either be specified by the user (via command line arguments, standard input or environment variables) or should be relative to the current working directory (getwd() in R).
… Unless the data is a fixed part of the project itself — more on this below.
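A minimal sketch of that user-specified approach (the MYPROJ_DATA variable and file names are hypothetical):

# invoked as: Rscript analysis.R /path/to/data
args <- commandArgs(trailingOnly = TRUE)
data_dir <- if (length(args) >= 1) args[[1]] else Sys.getenv("MYPROJ_DATA", unset = getwd())
dat <- read.csv(file.path(data_dir, "input.csv"))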
Loading code from a path that’s relative to other code is simply not supported by base R. For example, source('xyz.r') won’t source an xyz.r file from the project. It will always try to load it from the current working directory, whatever that happens to be. Which is pretty much never what you want. And as you’ve noticed, the ‘here’ package also doesn’t always work.
R basically only works well when code is loaded from packages. But packages aren't suitable for all types of projects. R has no built-in solution for those other cases. I recommend using 'box' modules to solve this. 'box' provides a modern module system for R, which means that you can have R projects consisting of multiple code files (and nested sub-projects), without having to wrap them in packages. Loading code from a relative path inside a module is as simple as
box::use(./xyz)
This always works, as you’d expect from a modern module system, and doesn’t require ‘here’ or similar hacks.
OK, back to the point about data that’s bundled with a project itself. If your project is an R package, you’d use system.file() to load that data. However, this once again doesn’t work for non-package projects. But if you use ‘box’ modules to structure your project, you can use box::file() to load data that’s associated with a module.
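A short sketch of both ideas (the file names are hypothetical):

box::use(./xyz)                          # resolved relative to this module, not getwd()
path <- box::file("data", "lookup.csv")  # data shipped next to this module
lookup <- read.csv(path)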
Packages such as ‘here’ or ‘rprojroot’, while well-intended, are essentially hacks to work around limitations in R’s handling of non-package code. The proper solution is to make non-package code into a first-class citizen of the R world, and ‘box’ does that.
You can check the docs of the RSuite package (https://RSuite.io). It works with a script_path that points to the currently running R script; I use it to make relative paths using the file.path command.
I'm trying to build an R package whose goal is to run a series of analyses by taking input data and writing output data to an external database (PostgreSQL).
Specifically, I need a set of operations to be scheduled to run on a daily basis. Therefore, I have written some executable scripts containing R code (using the header #!/usr/bin/env Rscript) and saved them in the exec/ folder of the R package. The scripts make several calls to the package's core functions in the R/ folder.
At this point, once the package is installed on a Linux server, how do I set up a crontab that can directly access the scripts in the exec/ folder?
Is this way of proceeding correct or is there a different best practice for such operations?
We do this all the bleeping time at work. Here at home I also have a couple of recurring cronjobs, eg for CRANberries. The exec/ folder you reference works, but my preferred solution is to use, say, inst/scripts/someScript.R.
Then, one initial time, you need to create a softlink from your package library, say /usr/local/lib/R/site-library/myPackage/scripts/someScript.R, to a directory in the $PATH, say /usr/local/bin.
The key aspect is that the softlink persists even as you update the package. So now you are golden. All you now need is your crontab entry referencing someScript.R. We use a mix of Rscript and littler scripts.
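For concreteness, a sketch of the one-time setup and a matching crontab entry (paths as above; the schedule and log file are hypothetical):

# one-time softlink into a $PATH directory
ln -s /usr/local/lib/R/site-library/myPackage/scripts/someScript.R /usr/local/bin/someScript.R

# crontab entry: run daily at 02:15, appending output to a log
15 2 * * * /usr/local/bin/someScript.R >> /var/log/someScript.log 2>&1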
I have a .Rmd which I use to report on data quality in a number of different R projects. It splits the data to remove subsets with missing data, and interpolates missing results where appropriate. It writes the cleansed output via a write.csv command to a file path of the form "./Cleansed_data/".
To make an example:
open RStudio
go to the RHS 'Project' menu, and select and make a new project wherever you'd like
go to the LHS 'New Script' drop-down and select 'New .Rmd'
change the output to .pdf and hit OK
in the last R chunk, include write.csv(mtcars, file = "mtcars.csv")
hit the 'Knit PDF' button, save the report as "writeFile.Rmd" to your project working directory, and let it run
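For reference, a minimal writeFile.Rmd along those lines might look like this:

---
title: "writeFile"
output: pdf_document
---

```{r}
write.csv(mtcars, file = "mtcars.csv")
```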
Previously I moved this .Rmd from place to place; now, however, I would like to build it into an internal package. I have included it (as the documentation indicates) in inst/rmd within the package directory.
To do this, build or open any package you have access to:
add the file to inst/rmd (create the directory if it doesn't exist)
rebuild the package
I then rebuild the package and open a new project. I load my new package and attempt to run the document via the render command, using the system.file command to locate the .Rmd, like so:
rmarkdown::render(input = system.file("rmd/writeFile.Rmd", package = "MyPackage"),
                  output_file = "writeFile.pdf", output_dir = "./Cars/")
This renders the report from the package build into the folder given by output_dir. There are, however, a number of pitfalls here. First, if I omit the output_dir argument, the report renders into the package library, usually located inside the R installation's library folder on the C: drive. This is, however, fixable.
What I can't get around is that when the .Rmd hits the write.csv(), the document is (I believe) being rendered in the package environment, whose working directory is the package library folder, not the current project directory.
The Questions
How can I inform the template in the package what the current working directory of the RStudio project is? I'm vaguely aware there is an rstudioapi package; I have nearly no understanding of what it does, though, or whether it would provide a solution.
If this is either outright impossible or just potentially a very bad idea, how can I modify the workflow to retrieve a number of R object outputs into the environment or the working directory on the call to the report, without having to modify the report for each different project? Further, why specifically is this approach such a bad plan?
In order to close this off:
I have opted to keep the .Rmd included in the package. The .Rmd files need to move and be versioned with the package, since the package holds the functions they use to run.
To meet my requirements, I style the documents to grab the working directory via the RStudio API, in the following form:
write.csv(mtcars, file = file.path(rstudioapi::getActiveProject(), "mtcars.csv"))
Having tested #CL's answer, this also runs and is not dependent on RStudio as an IDE; however, I know that these documents will
always be accessed via the RStudio IDE
always be accessed from within a specific project
I fear (though have not tested) that there could be other side effects from artificially booting the file into a different working directory. Potentially this could affect things like child documents I might want to include later, or other code that needs paths relative to the package installation rather than the project. In this way I think (if I interpreted Yihui correctly) the R doc is still the centre of its own universe. It just writes its data into another one :)