I'm trying to set up the testthat unit test framework and am having trouble getting the source file locations right.
My package folder structure is like below:
.\R\abc.R
.\R\def.R
.\tests\testthat\test_01.R
In my test case file test_01.R, I need to import abc.R. I managed to get this working by specifying a relative path like below:
'../../R/abc.R'
Now the abc.R file can be sourced successfully from my test cases. However, it failed at the step where abc.R tries to source def.R. I think this is because the working directory is set to ./tests/testthat by testthat.
The fix I can think of is to add the relative path '../../R/' when sourcing def.R, but that looks like a terrible solution: it will break when I run abc.R directly, and there are many more files like abc.R and def.R in my package.
Is there a more graceful way to handle this?
Sorry if this is a straightforward question as I'm still new to R.
Inside ./tests/ there should be a file named testthat.R
Within this file you can add 3 lines:
library(testthat)
library(yourLibraryName)
test_check("yourLibraryName")
Of course replace "yourLibraryName" with the name of your package.
Then all the functions exported by your package will be loaded and tests will be able to use them.
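For illustration, a test file under tests/testthat/ can then use the package's exported functions directly, with no source() calls at all (my_fun and its behaviour are hypothetical stand-ins here):

```r
# tests/testthat/test_01.R -- hypothetical example
# my_fun is assumed to be a function exported by your package.
test_that("my_fun doubles its input", {
  expect_equal(my_fun(2), 4)
})
```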
I have an RStudio project with a main.R file (see sample code) that sources a few other scripts within the project using here::here(), and those scripts also use here() themselves. This first RStudio project produces a dataset that I would like to use in a second RStudio project with a similar structure built around its own main.R script.
First project
library(here)
here::here()
#1. load packages
source(paste0(here::here(),"/R/load_packages.R"))
#2. load UDF functions
source(paste0(here::here(),"/R/functions.R"))
#3. Load BA data
source(paste0(here::here(),"/analysis/load_ba.R"))
#4. Load CDS data
source(paste0(here::here(),"/analysis/load_cds.R"))
#5. Calculate
source(paste0(here::here(),"/analysis/calculate.R"))
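As an aside, here() accepts path components directly, so the paste0() calls above can be written more simply (assuming the same project layout, with a .Rproj or .here file at the project root):

```r
library(here)
# here() joins its arguments onto the project root, so paste0() is unnecessary:
source(here::here("R", "load_packages.R"))
source(here::here("analysis", "load_ba.R"))
```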
Second project
library(here)
here::here()
#load packages
source(base::paste0(here::here(),"/analysis/packages.R"))
#load and manipulate pop/ds data
source("first project full file path/main.R")
So my question is: what is the best way to source the first project's main.R (which produces the data set I want) from the second RStudio project without the here() paths breaking?
One option is to write the output dataset to csv and then read it in, but maybe there is a better way?
The Right Way to do this is to build any code you want to re-use as a package that exports functions (and maybe fixed datasets) for other code to re-use.
I can think of hacky ways to do what you want, most of them rely on passing a folder from the calling code in a global variable or changing the current folder to the called code using setwd (or withr::with_dir) but these are messy. Make packages and create functions instead.
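A minimal sketch of that approach: the shared logic lives in a package (the name 'mydatapkg' and the function are assumptions), and the second project calls the exported function instead of source()-ing the first project's main.R:

```r
# R/make_dataset.R inside the hypothetical 'mydatapkg' package
make_dataset <- function() {
  # ... build and return the shared data set ...
  data.frame(x = 1:3, y = c("a", "b", "c"))
}

# The second project then needs no knowledge of the first project's paths:
# library(mydatapkg)
# ds <- make_dataset()
```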
You might be tempted by the whereami package and its thisfile function. But read the help - even the authors don't want you to use it:
*CAVEAT*: Use this function only if your workflow does not permit other solution: if a script needs to know its location, it should be set outside the context of the script if possible.
I'm trying to test a function of mine with the testthat package. The function is supposed to create an Excel file, and I want to test whether that file exists and contains what it's supposed to contain. I tried running the function to create a simple mockup Excel file and then reading the output back in, but that seems to do nothing, I'm guessing because of the localized testing environment. I also feel that writing code outside of the tests is a bad idea, since testthat gave me a warning message about it.
Is there a way to test an external output with testthat that I don't know about? I'm new to unit testing and this package so any help would be appreciated.
I actually solved this myself by accident: code run in test_that DOES produce output, just in the test folder. I could read it from there and then test.
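A sketch of that pattern: write the file to tempdir() inside the test, then assert on it (write_report() is a hypothetical stand-in for the function that creates the Excel file; plain text is used here to keep the example self-contained):

```r
library(testthat)

# Hypothetical stand-in for the function under test: writes output to a file.
write_report <- function(path) {
  writeLines(c("header", "value"), path)
  invisible(path)
}

test_that("write_report creates the expected file", {
  path <- file.path(tempdir(), "report.txt")
  write_report(path)
  expect_true(file.exists(path))              # the external output exists
  expect_equal(readLines(path)[1], "header")  # and contains what it should
  unlink(path)                                # clean up after the test
})
```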
Using python, if I need the absolute path from the context of the current running script all I need to do is to add the following in the code of that script:
import os
os.path.abspath(__file__)
This is very useful: with the absolute path I can use os.path.join to form new absolute paths for my project components (inside the project directory tree), and everything keeps working no matter where the package directory is moved.
I need to achieve the very same thing in R: obtaining the absolute path of the currently running R script (i.e., the absolute path of its file on disk). Doing this in R turns out to be quite challenging, at least for a relative beginner like me.
After a lot of googling I tried the reticulate package to call Python from R, but __file__ is not available there. I then found a few Stack Overflow threads suggesting inspecting the call stack, and others suggesting normalizePath. However, none of these kept working once the entire project was moved from one directory to another.
Therefore, I would like to know if for example you have the following file/directory tree
base_dir ( = /home/usr1/apps/R/base_dir)
|
|
|___ myscript.R (this is my R script to be run)
|___ data (this is a directory)
|___ sql (this is a directory)
Is there any solution that lets me add something to the code of myscript.R so that, inside the script, the program always knows the base directory is /home/usr1/apps/R/base_dir, and if this base directory is later moved elsewhere, the program can find the new base directory without any code changes?
R has in general no way of finding this path, because there is no equivalent to Python’s __file__ in R.
The closest you can get is to look at commandArgs() and laboriously extract the script filename (which requires different handling depending on how the script was launched!). But this will fail if the script was executed in RStudio, and it will fail after calling setwd().
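The commandArgs() approach can be sketched like this; as noted, it only works for plain `Rscript path/to/script.R` invocations, not in RStudio:

```r
# Sketch: recover this script's own path when launched via Rscript.
args <- commandArgs(trailingOnly = FALSE)
# Rscript passes the script as a --file=... argument:
file_arg <- grep("^--file=", args, value = TRUE)
if (length(file_arg) == 1) {
  script_path <- normalizePath(sub("^--file=", "", file_arg))
  script_dir  <- dirname(script_path)
}
```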
Other solutions (such as the ‘here’ package) rely on heuristics and specific project structures.
But luckily there’s actually a solution that will always work: use ‘box’ modules.
With modules, you’ll always be able to get the path of the current script/module via box::file(). This is the closest equivalent to Python’s __file__ you’ll get in R, and it always works — as long as you’re using ‘box’ modules consistently.
(Internally the ‘box’ package requires complex logic to determine the value of the file() function in all circumstances; I don’t recommend replicating it, it’s too complex. For the curious, the bulk of the relevant logic is in R/loaded.r.)
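A minimal sketch of this (the module file name is an assumption):

```r
# path_utils.r -- a 'box' module
#' @export
script_dir <- function() {
  # box::file() returns the absolute path of the directory containing
  # this module file, independent of the caller's working directory.
  box::file()
}
```

Callers then load it with `box::use(./path_utils)` and call `path_utils$script_dir()`.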
If you are running the script using Rscript you can use getwd().
#!/usr/bin/Rscript
getwd()
# or assign it to a variable
base_dir = getwd()
You can run it from the command line using one of the following:
./yourscript.R
# or
Rscript yourscript.R
Note, however, that this only works if you run the script from inside the folder the file is in:
cd ~
./script.R
# "/home/usr1"
cd /
/home/usr1/script.R
# "/"
For a more elaborate option you could consider https://stackoverflow.com/a/55322344/3250126
I built my own package in R and created all my functions. Everything worked very well. Then I wanted to include .C files in my package.
I followed the structure in this link: compiled code. Once I had done that, my package stopped working and I could not use it anymore.
I tried to fix it more than once, but nothing happened. So I built another package and loaded my functions into it (I had saved a copy of my files).
Now I would like to start again without losing my functions again. Any ideas?
Try writing your files first and making sure that they work! Then build your package following the structure here.
Follow the steps one by one and you will be fine. The build process will set up the src directory for you, along with all your other files.
I'm building a package that uses two main functions. One of the functions, model.R, requires a special type of simulation, sim.R, and a way to set up the results in a table, table.R.
In a sharable package, how do I call both the sim.R and table.R files from within model.R? I've tried source("sim.R") and source("R/sim.R") but that call doesn't work from within the package. Any ideas?
Should I just copy and paste the codes from sim.R and table.R into the model.R script instead?
Edit:
I have all the scripts in the R directory, and the DESCRIPTION and NAMESPACE files are all set; I just have multiple scripts in the R directory. R/ contains premodel.R, model.R, sim.R, and table.R. I need the model.R script to use the sim.R and table.R functions, which are located in the same directory of the package (i.e., R/).
To elaborate on joran's point, when you build a package you don't need to source functions.
For example, imagine I want to make a package named TEST. I will begin by generating a directory (i.e., folder) named TEST. Within TEST I will create another folder named R; in that folder I will include all R script(s) containing the different functions in the package.
At a minimum you need to also include a DESCRIPTION and NAMESPACE file. A man (for help files) and tests (for unit tests) are also nice to include.
Making a package is pretty easy. Here is a blog with a straightforward introduction: http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/
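So a minimal TEST package, following the structure described above, looks like:

```
TEST/
|___ DESCRIPTION
|___ NAMESPACE
|___ R/
      |___ functions.R   (your function definitions; no source() calls needed)
```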
As others have pointed out, you don't have to source R files in a package. The package loading mechanism will take care of loading the namespace and making all exported functions available. So usually you don't have to worry about any of this.
There are exceptions however. If you have multiple files with R code situations can arise where the order in which these files are processed matters. Often it doesn't matter or the default order used by R happens to be fine. If you find that there are some dependencies within your package that aren't resolved properly you may be faced with a situation where a custom processing order for the R files is required. The DESCRIPTION file offers the optional Collate field for this purpose. Simply list all your R files in the order they should be processed to satisfy the dependencies.
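Using the file names from the question, a Collate field in DESCRIPTION could look like this (the order shown is illustrative; list the files so that dependencies are defined before use):

```
Collate:
    'premodel.R'
    'sim.R'
    'table.R'
    'model.R'
```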
If all your files are in the R directory, every function will be in memory after you do a package build or load_all().
You may have issues, though, if those files contain code that is not inside a function.
R loads files in alphabetical order.
Usually this is not a problem, because functions are evaluated when they are called for execution, not at loading time (i.e., a function can refer to another function not yet defined, even in the same file).
But if you have code outside a function in model.R, this code will be executed immediately when the file is loaded, and your package build will fail, usually with:
ERROR: lazy loading failed for package 'yourPackageName'
If this is the case, wrap the loose code in model.R into a function so you can call it later, once the package (and any external libraries) has fully loaded.
If this piece of code is there to initialize some value, consider using use_data() to have R take care of loading the data into the environment for you.
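For example (use_data() comes from the usethis package; the object name here is hypothetical):

```r
# Run once, interactively, from the package root:
default_params <- list(alpha = 0.05, iterations = 1000)
usethis::use_data(default_params, overwrite = TRUE)
# The object is saved to data/default_params.rda and loaded with the package.
```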
If this piece of code is just interactive code written to test and develop the package itself, consider putting it elsewhere or wrapping it in a function anyway.
If you really need that code to be executed at loading time, or really have dependencies to resolve, then you must add the Collate field to the DESCRIPTION file, as Peter Humburg already stated, to force the order in which R loads the files.
Roxygen2 can help you here. Put this before your code:
#' @include sim.R table.R
Then call roxygenize(), and the Collate field will be generated for you in the DESCRIPTION file.
But even then, external libraries you depend on may not yet be loaded by the package, leading to failure again at build time.
In conclusion, you'd better not leave code outside of functions in a .R file located inside a package.
Since you're building a package, the reason why you're having trouble accessing the other functions in your /R directory is because you need to first:
library(devtools)
document()
from within the working directory of your package. Now each function in your package should be accessible to any other function. Then, to finish up, do:
build()
install()
although it should be noted that a simple document() call will already be sufficient to solve your problem.
Make your functions global by defining them with <<- instead of <- and they will become available to any other script running in that environment.
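A minimal demonstration of this behaviour:

```r
make_global <- function() {
  # `<<-` walks up the enclosing environments; finding no `g`,
  # it assigns in the global environment.
  g <<- function() "hi"
}
make_global()
g()  # "hi"
```

Note that inside a package this bypasses the normal namespace/export mechanism, so it is generally a last resort.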