I am developing a package in R (3.4.3) that has some internal data, and I would like to let users refresh it without too much work. Let me be more specific.
The package contains several functions that rely on multiple parameters, but I only export wrapper functions that do all the work. Because there are a large number of parameters to pass to those wrapper functions, I keep them in a .csv file and pass them as a list. The data is therefore a simple .csv file called param.csv, with the parameter names in one column and their values in the next:
parameter_name,parameter_value
ticker,SPX Index
field, PX_LAST
rolling_window,252
upper_threshold,1
lower_threshold,-1
signal_positive,1
signal_neutral,0
signal_negative,-1
I process it by running
param.df <- read.csv(file = "param.csv", header = TRUE, sep = ",")
param.list <- split(param.df[, 2], param.df[, 1])  # named list keyed by parameter_name
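A quick sanity check of the result (output illustrative). Note that because the value column mixes text and numbers, every element comes back as a factor under R 3.4's read.csv defaults (or as a character string with stringsAsFactors = FALSE), so numeric parameters such as rolling_window are not numeric yet:
names(param.list)
# [1] "field"           "lower_threshold" "rolling_window"  "signal_negative"
# [5] "signal_neutral"  "signal_positive" "ticker"          "upper_threshold"
as.numeric(as.character(param.list$rolling_window))  # 252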
Then I put the list inside the package like this:
usethis::use_data(param.list, overwrite = TRUE)
Then, when I reset the R interactive window and execute
devtools::document()
devtools::load_all()
devtools::install()
require(pkg)
everything works out great: the data is available and can be passed to functions.
First problem: when I change param.csv, save it, and repeat the above four lines of code, the internal param.list is not updated. Is there something I am missing here? Or is it by design that the package developer should run usethis::use_data(param.list, overwrite = TRUE) each time the .csv file the data comes from changes?
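For reference, the kind of regeneration script in question, kept in one place, might look like this (the data-raw/ location is the usual usethis convention; paths illustrative):
# data-raw/param.R -- rerun by hand whenever param.csv changes, then
# rebuild and reinstall; use_data() does not watch the source .csv.
param.df <- read.csv(file = "data-raw/param.csv", header = TRUE, sep = ",")
param.list <- split(param.df[, 2], param.df[, 1])
usethis::use_data(param.list, overwrite = TRUE)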
Because it is intended as a sort of map, users will want to tweak parameters for (manual) model calibration. To work around this issue, I let users provide the functions with their own parameter list (as a .csv file, like before): a function get_param_list("path/to/file.csv") does exactly the same processing as above and returns the list. De facto, the internal data param.list is treated as the default parameter settings.
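Its body is essentially the same two lines of processing shown earlier (a sketch):
get_param_list <- function(path) {
  param.df <- read.csv(file = path, header = TRUE, sep = ",")
  split(param.df[, 2], param.df[, 1])
}
# my_params <- get_param_list("path/to/file.csv")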
Second problem: I'd like to let users modify this default parameter list param.list from outside the package in a safe way.
Since the internal object is a list, users can simply modify the element of their choice from outside the package. But that strikes me as dangerous, because users could
- forget parameters, so I have default values inside the functions for that case
- replace a parameter's value with one of a different type, which can provoke an error or, worse, silent side effects.
Is there a way to let users modify the internal list of parameters without opening up the package? For example, I had thought about a function that replaces the .csv file with a new one provided by the user, but that brings me back to the first problem: when I restart the R prompt and reinstall the package, nothing happens unless I run usethis::use_data(param.list, overwrite = TRUE) from inside the package. Would another type of data be more useful (e.g. making the data internal)?
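For illustration, the kind of guarded in-session update I would consider safe might look like this (set_param is a hypothetical name; it checks names and types against the defaults):
set_param <- function(params, name, value) {
  if (!name %in% names(params))
    stop("unknown parameter: ", name)
  if (!identical(class(value), class(params[[name]])))
    stop("wrong type for ", name, ": expected ", class(params[[name]])[1])
  params[[name]] <- value
  params  # returns a modified copy; the packaged defaults stay untouched
}
# my_params <- set_param(param.list, "ticker", "NDX Index")  # hypothetical usage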
This is my first question on this website, and I realize it's quite long and might be taken as a coding-style question. But if anyone can spot an obvious mistake or a misunderstanding of R package development on my side, that would really help me.
Cheers
Related
Suddenly, RStudio has been giving me hard-to-reproduce errors in which it references old variable names and code that no longer exist in my script. Here is a short reproducible example:
LoadInfectionsDrugs <- function() {
  # intentionally empty in the current version of the script
}
meta <- read.csv("tblhctmeta.csv")
infections <- read.csv("tblInfectionsCidPapers.csv")
drugs <- read.csv("tbldrug.csv")
workspace <- "w4v3.RData"
load(workspace)  # restores everything saved earlier, including function definitions
print(head(input))
print(head(meta))
print(head(drugs))
print(head(infections))
LoadInfectionsDrugs() # This line generates the error; it loads and prints correctly if commented out.
If I comment out the last line, it runs and prints just fine. If I leave the last line in, though, I get the error: Error in unique(input$PatientID): argument "input" is missing, with no default.
Thing is, the LoadInfectionsDrugs function used to (and normally does) have several arguments, including "input", which is a data frame with a column called PatientID. So this clearly seems to be referencing old code.
If I comment out the "load(workspace)" and "print(head(input))" lines, it also runs fine.
So my hypothesis is that when I call save.image(workspace) in my main code, it saves not only objects, such as the data frames I'm creating, but also functions. Then, when I change a function in the code but load the workspace, the load replaces the functions I've defined in code with the definitions as they were when I last saved the workspace.
I didn't know that .RData files saved function definitions. I thought they just saved variables. Is there a way to save only the objects you've defined (i.e. data frames), and not the functions?
https://www.oreilly.com/library/view/r-in-a/9781449377502/ch12s02.html
Today I discovered the difference between saving objects and saving my entire workspace. If you save the workspace, you save everything, including function definitions. If you then change the code and load the workspace, R will start using your old functions, not the ones in the most recent version of your script.
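A sketch of one way to save only the data and skip the functions (file name illustrative):
objs <- ls(envir = .GlobalEnv)
is_fun <- vapply(objs, function(nm) is.function(get(nm, envir = .GlobalEnv)),
                 logical(1))
save(list = objs[!is_fun], file = "w4v3.RData")  # data only, no functions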
Thanks for all your advice. My remaining question is this:
Can I replace the column name 'sulfate' in the following statement ...
dataclean <- datatable$sulfate[!datanas]
... with a reference to a parameter 'pollutant', which may or may not have the value 'sulfate'?
When you pass values to arguments, they behave as if they were objects in your workspace; but the environment they live in is not the workspace, it is the function's own environment.
So in your case, directory would be a character string and it would work, but only the first time: your working directory has now changed, and you need to revert to the previous one for the function to work again. This can get pretty messy, so what I like to do is refer to raw files by their full path. See ?list.files for more info.
For your second question, your best bet for referring to a particular column of the variable is to do
x[, pollutant]
It is convenient to add the drop = FALSE argument there, in order to keep what I'm assuming is a data.frame.
You could improve your function by also implementing the datatable argument. That way you have all the objects bundled together nicely.
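A sketch of what that might look like (function and argument names are illustrative):
clean_pollutant <- function(datatable, pollutant) {
  datanas <- is.na(datatable[, pollutant])
  datatable[!datanas, pollutant, drop = FALSE]  # drop = FALSE keeps a data.frame
}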
The most important thing to note here would be "debugging". You should learn to use at least browser(). This function stops the execution of your function at the very step where it was called. This enables you, in the R console, to inspect elements inside the function and run code to see what's going on. This way you can speed up the development of code, at least initially, when you usually haven't internalized all the data structures and paradigms yet.
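For example (a toy function to illustrate):
f <- function(x) {
  browser()  # execution pauses here; inspect x, run code, type c to continue
  mean(x)
}
f(1:10)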
There's a bit of a preamble before I get to my question, so hang with me!
For an R package I'm working on, I'd like to make it as easy as possible for users to partially apply functions inline. I had the idea of using the [] operator to call my partial application function, which I've named "partialApplication". What I aim to achieve is this:
dnorm[mean = 3](1:10)
# Which would be exactly equivalent to:
dnorm(1:10, mean = 3)
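For readers who want a concrete mental model, a minimal partialApplication could look like this (a sketch, not necessarily the exact implementation):
partialApplication <- function(f, ...) {
  fixed <- list(...)                             # arguments fixed now
  function(...) do.call(f, c(list(...), fixed))  # the rest supplied later
}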
To achieve this I tried defining a new [] method for objects of class function, i.e.
`[.function` <- function(...) partialApplication(...)
However, R gives a warning that the [] method for function objects is "locked." (Is there any way to override this?)
My idea seemed to be thwarted, but I thought of one simple solution: I can invent a new S3 class "partialAppliable" and create a [] method for it, i.e.
`[.partialAppliable` = function(...) partialApplication(...)
Then, I can take any function I want and append 'partialAppliable' to its class, and now my method will work.
class(dnorm) = append(class(dnorm), 'partialAppliable')
dnorm[mean = 3](1:10)
# It works!
Now here's my question/problem: I'd like users to be able to use any function they want, so I thought, what if I loop through all the objects in the active environment (using ls) and append 'partialAppliable' to the class of all functions? For instance:
allobjs = unlist(lapply(search(), ls))
# This lists all objects defined in all attached packages
for (i in allobjs) {
  if (is.function(get(i))) {
    curfunc = get(i)
    class(curfunc) = append(class(curfunc), 'partialAppliable')
    assign(i, curfunc)
  }
}
Voilà! It works. (I know, I should probably assign the modified functions back into their original package environments, but you get the picture.)
Now, I'm not a professional programmer, but I've picked up that doing this sort of thing (globally modifying all variables in all packages) is generally considered unwise/risky. However, I can't think of any specific problems that will arise. So here's my question: what problems might arise from doing this? Can anyone think of specific functions/packages that will be broken by doing this?
Thanks!
This is similar to what the Defaults package did. That package is archived because the author decided that modifying other packages' code is a "very bad thing". I think most people would agree. Just because you can do it does not mean it's a good idea.
And, no, you most certainly should not assign the modified functions back into their original package environments. CRAN does not like it when packages modify the user's search path unnecessarily, so I would be surprised if they allowed a package to modify other packages' function arguments.
You could work around that by putting all the modified functions in an environment on the search path. But then you have to ensure that environment is always searched first, which means modifying the search path every time another package is loaded.
Changing arguments for functions in other packages also has the potential to make it very difficult for others to reproduce your results because they must have all your argument settings. Unless you always call functions with all their arguments specified, which defeats the purpose of what you're trying to do.
I'm an instructor, and my students have expressed interest in a record of the code I run in class. Since this is often off-the-cuff, I would like to have an easy function I could run at the end of class which would save everything I had run in the session. I know savehistory() will save my entire history, but that is not what I'm looking for.
Similar to this question, except I believe I have a pedagogically sound reason for wanting the history, and I'm looking to limit what gets saved on a per-session basis, rather than on the basis of the number of lines.
I think if you invoke R with --no-restore-history (so that history from previous sessions isn't appended to the record for this one) and add
.Last <- function() {
  if (interactive())
    try(savehistory(paste0("~/.Rhistory_", Sys.time())))
}
to your Rprofile, you should get self-contained, timestamped history files every time R closes normally.
The .Last function, when defined in the global environment, is called directly before a normal close. See ?.Last.
NB: this won't preserve your history in the case of a fatal error (crash) in R itself, though that shouldn't come up much in a teaching context.
NB2: the above code will generate file names with spaces in them. Depending on your OS, this could range from no big deal to hair-pulling nightmarescape. If it's a problem, wrap Sys.time() in your favorite datetime formatting code, e.g. format(Sys.time(), "<format string>"), or something from lubridate (probably; I don't actually know, as I don't use it myself).
In the development version of rite on GitHub (>= v0.3.6), you can use the sinkstart() function to dump all of your code, all of your results, or both to a little interactive Tcl/tk widget, from which you could then just copy or save the output.
To make it work you can do this:
devtools::install_github("leeper/rite")
library("rite")
sinkstart(print.eval = FALSE, prompt.echo = "", split = TRUE)
## any code here
sinkstop() # stop printing to the widget
You can dynamically change what is printed in the widget from the context (right-click) menu on the widget. You can also switch dynamically between sinkstart() and sinkstop() if you only want some code and/or results output there.
Full Disclosure: This is a package I developed.
Let's say some of my users cannot alter their R environments, but I need them to be able to open up RData files. These RData files require a package to be loaded (httpuv, to be exact). We don't care about the package, we don't need its capabilities, we just need to get at the data. Is there a way either to force R to bypass loading namespaces when loading the RData file, or to force it to save without namespace dependencies at the originating end? Thanks.
To reproduce, install Shiny. Create and save some R objects to the server's file system from within a Shiny applet as an RData file. Copy the file over to a computer that doesn't have Shiny or the httpuv package installed. Try loading the RData file, even if the actual objects you saved are completely ordinary data.frames that have nothing to do with Shiny or httpuv.
I ran strings on the RData file, and the damn thing is full of references to httpuv. The software loads the file and then actively decides not to continue, in the internal loadFromConn2() function. Therefore there must be a way to make it stop doing so.
Really, @baptiste should get credit for the link in his comment to some general solutions, especially the R CMD INSTALL --fake trick, and I will accept that if he reposts it as an answer. That is why I am not accepting the following answer of my own to the specific problem that caused the error in my case, but I am posting it in case it helps someone else.
Some of the objects I was saving were lm fitted objects. Those contain formula/terms objects (at least two each, for some reason; maybe because they've been through stepAIC), and those formulas in turn each have an environment attribute. The environment attribute is .GlobalEnv, which probably does contain copies of package functions someplace. When I dug through the objects inside the fitted models, then the objects inside all the attributes of those objects, and then the objects inside the attributes of the attributes of those objects... and set every environment attribute I could find to NULL, I was eventually able to save a fitted model to a file that could be opened from a different R installation without the error about not being able to load a namespace.
I suppose I could also write a function to iterate through the objects within a fitted model and their attributes and remove the environments, but that sounds ugly and dangerous. Maybe there is a way to force formulas and fitted models not to retain environments, which would be better. For the time being, instead of saving fitted models, I will save their call attributes after scrubbing any environment attributes I find there. If that doesn't work, I'll deparse them into character strings.
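For what it's worth, a sketch of that scrubbing for a plain lm fit (attribute names follow base R's lm/terms conventions; this is the manual approach described above, not a general solution):
scrub_envs <- function(fit) {
  attr(fit$terms, ".Environment") <- NULL
  if (!is.null(fit$model))
    attr(attr(fit$model, "terms"), ".Environment") <- NULL
  fit
}
# saveRDS(scrub_envs(fit), "fit.rds")  # fit: an existing lm object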
PS: I used the RDS format and haven't yet tested this with RData, but I suspect the problem was the saving of the evaluation environment in some of the attributes, and had nothing to do with the format in which the objects are saved. I'll post an update if it turns out this doesn't also work with RData.
PPS: I suspect I'm not the only one here who's hearing about the R CMD INSTALL --fake trick for the first time, and perhaps the word should be spread about this... because to the extent other R users don't know about it, this remains an obvious vector for denial-of-service attacks against R!
I will accept my own answer to get rid of the SO auto-nagger, but will unaccept it and accept @baptiste's if they make it possible for me to do so by posting it as an answer. Thanks.