How can I certify a file is exactly the same? (R)

Is there a package or function that can be applied to a whole, heavy data object to get back a measure of changes in the file? Something based on hash keys would be great, so I can keep track of a shared file.

The digest package (its digest() function) lets you compute hashes of R objects (supported algorithms: "md5", "sha1", "crc32", "sha256", "sha512", "xxhash32", "xxhash64"). You can also run external programs from R (e.g. md5sum on Linux) with the system() command (see e.g. here).
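A minimal sketch with the digest package (the object and file names below are placeholders):

library(digest)
dat <- mtcars                                         # stand-in for your large shared object
digest(dat, algo = "md5")                             # hash of the serialized R object
digest(dat, algo = "sha256")                          # any supported algorithm works the same way
digest("shared.RData", algo = "sha256", file = TRUE)  # hash the file on disk instead
tools::md5sum("shared.RData")                         # base R alternative for hashing files

Recomputing the hash after each sync and comparing it to the stored value tells you whether the shared file changed.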

Related

Data acquisition using QuantTools with R

I am using the QuantTools package in R.
When get_finam_data() is used, how can I obtain a list of symbols that can be acquired?
You should go into the package internals to get the list.
Just download data for an arbitrary symbol so the list is fetched from the Finam server and saved for later use.
Keep in mind this is not documented, so it may change in future versions.
get_finam_data( 'GAZP', Sys.Date() )               # download any symbol once; this caches the instrument list
QuantTools:::finam_downloader_env$instruments_info # inspect the cached list in the package's internal environment
I suppose there is no way to get it from the QuantTools package itself. You can get it from https://www.finam.ru/profile/moex-akcii/mosenrg/export/?market=1 by hand or use external web sources.

How to compress saves in R package build

I'm trying to include a (somewhat) large dataset in an R package. I keep getting this warning during the check in RStudio saying that I could save space with compression:
* checking data for ASCII and uncompressed saves ... WARNING
  Note: significantly better compression could be obtained
        by using R CMD build --resave-data
           old_size new_size compress
  slp.rda     499Kb    310Kb    bzip2
  sst.rda     1.3Mb    977Kb       xz
I've tried adding --resave-data to RStudio's "Configure Build Tools" to no effect.
Another alternative, if you have a large dataset that you don't want to re-create, is to use tools::resaveRdaFiles from within R. Point it at the dataset file, or the entire data directory, and it will compress your data in a format of your choosing. See its manual page for more information.
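For example (the paths are placeholders):

tools::resaveRdaFiles("data/sst.rda", compress = "xz")  # recompress a single file
tools::resaveRdaFiles("data", compress = "xz")          # or the entire data directory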
The devtools function use_data() takes a parameter for the type of compression and makes adding data to packages much easier in general. Whether you use it or just save() on your own, use xz compression when you save your data (for save() it's the compress argument).
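A short sketch of both routes, reusing the sst object from the warning above (newer setups expose use_data via usethis with the same interface):

devtools::use_data(sst, compress = "xz")           # writes data/sst.rda with xz compression
save(sst, file = "data/sst.rda", compress = "xz")  # equivalent with base R save()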
If you want to use --resave-data then you can try --resave-data=best since just using --resave-data defaults to gzip (gaining you pretty much nothing in this case).
See Building package tarballs for more information.

Building R packages - using environment variables in DESCRIPTION file?

At our site, we have a large amount of custom R code that is used to build a set of packages for internal use and distribution to our R users. We try to maintain the entire library in a versioning scheme so that the version numbers and the date are the same. The problem is that we've gotten to the point where the number of packages is substantial enough that manual modification of the DESCRIPTION file and the package .Rd file is very time consuming, and it would be nice to automate these pieces.
We could write a pre-build script that goes through the full set of files and writes the current date and version number. This could be done without a lot of pain, but it would modify our current build chain and we would have to adapt the various steps.
Is there a way that this can be done without having to do a pre-build file modification step? In other words, can the DESCRIPTION file and the .Rd file contain something akin to an environment variable that will be substituted with the current information when called upon by R CMD build?
You cannot use environment variables, because R, when running R CMD build ... or R CMD INSTALL ..., sees the file as fixed.
But the saying that there is no problem that cannot be fixed by another layer of indirection remains true. Your R source files could simply live within another layer in which you do text substitution according to some pattern. If you like autoconf, you could have a DESCRIPTION.in file and a configure script that queries the environment variables, or a meta-config file or database, or something else, and writes the result out. Similarly, you could have a sed or perl or python or R or ... script doing the textual substitution.
I used to let svn fill in the argument to Date: in DESCRIPTION, and also encoded revision numbers in an included header file. It's all scriptable to your heart's content.
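A minimal sketch of such a substitution script in R (DESCRIPTION.in and the @VERSION@/@DATE@ placeholders are an assumed convention, not a standard):

# Pre-build step: generate DESCRIPTION from a template
desc <- readLines("DESCRIPTION.in")
desc <- gsub("@VERSION@", Sys.getenv("PKG_VERSION", "1.0.0"), desc)  # version from the environment
desc <- gsub("@DATE@", format(Sys.Date()), desc)                     # today's date
writeLines(desc, "DESCRIPTION")                                      # fed to R CMD build afterwards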

With Roxygen and testthat, what is the proper way to make internal helper functions available to testcases called during R CMD check?

I am creating an R package, and found it useful to break parts of the logic in one file into internal helper functions, which I define in the same file. I have sort of a special case where my function decides which helper function to use via match.fun(). Since they won't be useful to other functions or people, I don't want to put these in separate files, and I don't want to export them.
All my testthat cases pass using test_dir(). When I don't export these functions, my testthat cases fail during R CMD check.
"object 'helperfunction1' of mode 'function' was not found", quote(get(as.character(FUN),
mode = "function", envir = envir)))
After looking at this post, I am able to get things to work if I explicitly export or add export entries to NAMESPACE, but again I don't want to export these.
Is there a better way to do this and doesn't require me to export? (I'll admit that the source of the issue may be match.fun() and am open to other ways of calling functions at runtime.)
From memory, it wasn't clear in the documentation last time I read it (it may have changed), but it will work correctly (without having to export) as long as everything is in the right directories:
You should have a file:
tests/run-all.R
That looks like:
library(testthat)
library(myPackage)
test_package("myPackage")
Then your individual test files should be in the directory inst/tests
These will be run when you do R CMD check, otherwise you can call test_package("myPackage") in R manually.
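A hedged example of such a test file (inst/tests/test-helpers.R is a hypothetical name; helperfunction1 is the unexported helper from the question):

library(testthat)
test_that("unexported helpers resolve during R CMD check", {
  # test_package() evaluates tests with the package namespace reachable,
  # so internal functions resolve without being exported
  expect_true(is.function(match.fun("helperfunction1")))
})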

Can I load an RData file while bypassing loading the namespaces?

Let's say some of my users cannot alter their R environments, but I need them to be able to open up RData files. These environment files require a package to be loaded (httpuv to be exact). We don't care about the package, we don't need its capabilities, we just need to get at the data. Is there a way to either force R to bypass loading namespaces when loading the RData file, or force it to save it without namespace dependencies at the originating end? Thanks.
To reproduce, install Shiny. Create and save some R objects to the server's file system from within a Shiny applet as an RData file. Copy the file over to a computer that doesn't have Shiny or the httpuv package installed. Try loading the RData file, even if the actual objects you saved are completely ordinary data.frames that have nothing to do with Shiny or httpuv.
I ran strings on the RData file, and the damn thing is full of references to httpuv. The software is loading the file and then actively deciding not to continue in the internal loadFromConn2() function. Therefore there must be a way to make it stop doing so.
Really @baptiste should get credit for the link in his comment to some general solutions, especially the R CMD INSTALL --fake trick, and I will accept that if he reposts it as an answer. That is why I am not accepting the following answer of my own to the specific problem that caused the error in my case, but I am posting it in case it helps someone else.
Some of the objects I was saving were lm fitted objects. Those contain formula/terms objects (at least two each, for some reason... maybe because they've been through stepAIC), and those formulas in turn each have an environment attribute. The environment attribute is .GlobalEnv which probably does contain copies of package functions someplace. When I dug through the objects inside the fitted models, and then the objects inside all the attributes of those objects, and then the objects inside the attributes of the attributes of those objects... and set every environment attribute I could find to NULL, eventually I was able to save that fitted model to a file that could be opened from a different R installation without getting the error about not being able to load a namespace.
I suppose I could also write a function to iterate through the objects within a fitted model, and their attributes, and remove environments (as sketched below), but that sounds ugly and dangerous. Maybe there is a way to force formulas and fitted models not to retain environments, and that would be better. For the time being, instead of saving fitted models, I will save their call attributes after scrubbing any environment attributes I find there. If that doesn't work, I'll deparse them into character strings.
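A rough sketch of that scrubbing for an lm-like fit (an assumption on my part, and it may break predict() or update() on the reloaded object):

strip_envs <- function(fit) {
  # formulas and terms store their evaluation environment in the
  # ".Environment" attribute; dropping it keeps save()/saveRDS() from
  # serializing everything that environment references
  for (nm in intersect(c("terms", "formula"), names(fit))) {
    attr(fit[[nm]], ".Environment") <- NULL
  }
  if (!is.null(fit$model)) {
    attr(attr(fit$model, "terms"), ".Environment") <- NULL
  }
  fit
}
fit <- lm(mpg ~ wt, data = mtcars)
saveRDS(strip_envs(fit), "fit.rds")  # loads cleanly without the captured environment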
PS: I used the RDS format and haven't yet tested this with RData, but I suspect the problem was the saving of the evaluation environment in some of the attributes and had nothing to do with the format in which the objects get saved. I'll post an update if it turns out that this doesn't also work with RData.
PPS: I suspect I'm not the only one here who's hearing about the R CMD INSTALL --fake trick for the first time, and perhaps the word should be spread about this... because to the extent other R users don't know about it, this remains an obvious vector for denial-of-service attacks against R!
I will accept my own answer to get rid of the SO auto-nagger, but will unaccept it and accept @baptiste's if they make that possible by posting the comment as an answer. Thanks.
