How to include a full R distribution in my GitHub repository - r

I build transport models for various government agencies. My model is managed through GitHub, and it depends on R to perform certain calculations. I currently have my entire r installation folder in the repository. This can't be the right solution, but here are some of my constraints:
My clients are usually even less sophisticated programmers then I am. When they download/clone the model, it just needs to work.
This needs to be the case 10 years from now - regardless of what the current build of R and all the package dependencies are.
Placing my entire R folder in the repo solves these two problems, but creates some new ones:
The repository is much larger than it needs to be / longer download time.
If the transport model is updated to a new version (say v2.0), I'd want to update R and its packages to the latest versions. I'm afraid this would increase the size of the repo even further.
One solution I understand is submodules. I could place the full R folder in a separate repo and bring it in as a submodule. This, at the very least, cleans up the model repository.
What about zipping the R folder? Some early testing showed that git can diff the zip file, but I don't know if it is doing it as a flat file or reading the contents. Also, is GitHub going to complain about 100MB+ zip file? I'd like to avoid GitLFS if I can, but asking my clients to unzip that file wouldn't be a problem.
I also looked at packrat, but as far as I can tell, that only works for R projects.
Lastly, I don't entirely understand makefiles / recipes, but it would be nice if there was a script I could run that would download specific versions of R and it's libraries. One complicating thing is that some of the R packages are private GitHub repos.
Anyway, I'm happy to provide more info if needed. Thank you for your help!

Related

How to host example data for a R-packages on Github

I'm experimenting with GitHub and I created a little package for my colleagues to use. They install it with the devtools package and install_github() function directly in R. I also have some example data and a R-Markdown file that shows the usage of all functions in the package and can be published via GitHub Pages.
I would like to know what would be the best practice to enable others to use this example data to learn the package.
I can think of two different options:
Host the data in a separate directory which is not part of the installation and tell people to download it manually or use something like the download.file() function from R at the beginning of the example script to download all data that could be packed into a .zip.
Make the data part of the package installation, however this would require the data to be fairly small which is difficult in my particular case (data is 10MB).
Ideally the examples in the R-documentation (.Rd files in the man folder) could also use the same examples as in the markdown file. also in this case, option (2) seems to be favorable.
Could anybody give me some advice what would be the best way to go, sort of the "industry standard" if there is any.

Correct configuration for collaborative development of an R package on a shared drive

We have two people on two computers. Computer A has a shared drive with Computer B. We would like to collaborate on an R package. My understanding is that we have three ingredients:
The R Project source code
The installed R package
The git repository
Currently we share all three over the shared drive. However, this doesn't allow us to work on separate Git branches (unless there's something I'm missing), since we're working on the same source files.
What would be the correct way of arranging these files to allow both users to work on separate Git branches? Would both have local versions of the R Project, and work on a common git repository? If so, would it be better to also have separate installations of the package, or a share a single one on the shared drive?
Thanks!
The capabilities you require are best supported within Git. Ideally you should structure your work to minimize the degree to which multiple people are working on a given file at the same time to reduce the effort to merge multiple versions of the same file back into the master branch. An alternate strategy would be to make frequent commits and pulls to reduce the complexity of merges. A thorough treatment of branching strategies in Git are available in Chapter 3 of Pro Git.

Big R project with several packages and developers: Best setup for easy version controll based on packages

I have to restructure a big project written in R, which is later consisting several packages as well as developers. Everything is set up on a git server.
The question is: How do I manage frequent changes inside packages without having to build them every time and developers updating them after they made a new pull? Is there any best practice or automation for that? I don't want source() with unbuilt packages and R.files but would like to stick with a package like structure as much as possible. We will work in a Windows environment.
Thanks.
So I fiddled around a while, tried different setups and came up with an arrangement which fits my needs.
It basically consists two git repositories. The first on (let's call it base-repo) of them contains most scripts on which all later packages are based on. The second repo we will call the "package-repo".
Most development work should be done on the base-repo. The base-repo is under CI control via a build server and unit tests.
The package-repo contains folders for each package we want to build and the base-repo as a git-submodule.
Each package can now be constructed via a very simple bash/shell script (“build script”):
check out a commit/tag of the submodule base-repo on which the stable
package build should be based on
copy files which are necessary for the package into the specific package folder
checks and builds the package
script can also create a history file of package
script can either be invoked manually or by a build server
This approach can also be combined with packrat. Additional code which is very package specific can now be also added to the package-repo and is under version control while independed from the base-repo
The approach could be further extended to trigger the build of packages from the package-repo based on pushes to the base-repo. Packages with a build script pointing to master as a commit will always be up to date and if under control of a build server it will ensure that changes to the base-repo will not break the package. Also it is possible to create several packages containing the same scripts from base-repo.
See also: git: symlink/reference to a file in an external repository

should I be (git)ignoring my knitr cache?

In a vein similar to this question: I'm writing a package and am using knitr to write a few documents in inst/doc/. Since I'm using github to host my repo (and I intend to point to people to that repo to get the package), I'm wondering if I should be version controlling my the caches of my various documents.
I ask this question because it's unclear where cache falls in the guidelines provided by this other question (which addresses when certain file types should and shouldn't be in the .gitignore of a repo).
Can anyone shed some light on to how package developers that use knitr and git are handling their caches?
If R CMD check passes without the knitr cache, and I think it would, I wouldn't include them. In fact, I suspect R CMD check would give a note about the cache files being present in the package. I know for LaTeX files, you only want to include the .tex file in the R package and in the version control. The other required files should be automatically generated on install.

Including patches with an installer using a basic msi Installshield project

I'm currently stuck with an Installshield project for installing our ASP.Net Application and need to implement upgrading. From my initial investigation it seems extremely complicated for what is essentially copying over a number of files.
Of the options available: patches and small, minor and major upgrades, what seems to most suit our needs is a patch but it is done as a separate .exe.
Is there a way to include patches in the full setup.exe or another recommendation that makes the whole process less complicated.
EDIT
Any alternative recommendation still needs to be done as part of an installer.
No, there is no way to include patches in the installer setup.exe. Patches, as well as small and minor updates, are applied to already installed application. I mean users already used the original installation package to install your application. And patch update contains only small set of files that are modified.
What you want is a major update. This kind of package contains all the required files, and it can be used to install the application for the first time. In case where the application is already installed, this kind of installation package will automatically remove the old version and install the new one.
If it involves only copying files then IMO, the best option is to give the bunch of files in needed directory structure and ask to overwrite existing copies. A slightly more user-friendly measure would be to zip up the directory structure along with a batch file and ask to unzip it in the app directory under some designated folder and then run the batch file to overwrite files.

Resources