Code organisation in R package development

Code organisation in R package development - r

When developing packages in R all R source files are put in the subdirectory R/, and all compiled code is put in the subdirectory src/.
I would like to add some organisation to files within these folders, rather than have everything dumped at the top level. For example, lets say I'm hypothetically developing a client-server application. Logically, I would like to organise all my client R source files in R/client/ and all my server R source files in R/server/.
Is it possible to organise code in subfolders when developing a package, and if so, how? The Writing R Extensions manual doesn't offer any guidance, nor does R CMD build detect files stored in subfolders under R/.

You can't use subfolders without additional setup (like defining a custom makefile). The best you can do is to use prefixes: client-a.r, client-b.r, server-a.r, server-b.r, etc.

Expanding the comment to Hadley's IMHO incorrect answer:
Look at the Matrix package (written by R Core members) which has five folders below src/, and two of these contain other subfolders. Other example is the Rsymphony packages (co-)written and maintained by an R Core member.
Doing this is not for the faint of heart. R strongly prefers a src/Makevars fragment over a full src/Makefile in order to be able to construct its own Makefile versions for the different subarchitectures. But if you know a little make and are willing to put the effort in, this is entirely doable -- and being done.
That still does not make it recommended though.

I argued with R core team Allow for sub-folders in "package/R/" directory . They seem not really want improve it. So my workflow is as follows.
1) Create an R project same as other packages but allow sub-directories in folder R/ such as
R/mcmc/a.R
R/mcmc/b.R
R/prediction/p1.R
R/predection/p2.R
2) When I need to pack them, I convert all files under R/ as
R/mcmc_a.R
R/mcmc_b.R
R/prediction_p1.R
R/predection_p2.R
...
with my package.flatten() function
3) Then I install the flattened version to R.
I wrote a simple script for Linux to do everything
https://github.com/feng-li/flutils/blob/master/inst/bin/install.HS

Recognizing the thread is a bit old, I just thought I'd throw in my solution to this problem. Note that my issue is similar, but I am only concerned with preserving folder hierarchies in development.
In development, I organize my script files in subfolders to my heart's content, but rather than fight R's flat hierarchy in production, I added my own "compile-time constant", so to speak.
That is, in every file located in a subfolder (not in top-level scripts/), I add the following:
if (!exists("script.debug"))
script.debug = FALSE
Then, I load whatever other dependencies are required as follows:
source.list <- c(
"script_1.R",
"script_2.R",
"script_3.R",
"script_4.R"
)
if (script.debug)
source.list <- paste("./script_subfolder/", source.list, sep="")
lapply(source.list, source)
The default assumption is that the code is in production, (source.debug = FALSE), so when in development, just ensure that source.debug = TRUE and the project's script/ folder is set as the working directory before loading any script files.
Of course, this example's a bit simple - it assumes that all script file dependencies exist in the same folder, but it seems a simple issue to devise a system that would suit more complicated development folder hierarchies.

Related

How can I parse subdirectories in an R package? [duplicate]

When developing packages in R all R source files are put in the subdirectory R/, and all compiled code is put in the subdirectory src/.
I would like to add some organisation to files within these folders, rather than have everything dumped at the top level. For example, lets say I'm hypothetically developing a client-server application. Logically, I would like to organise all my client R source files in R/client/ and all my server R source files in R/server/.
Is it possible to organise code in subfolders when developing a package, and if so, how? The Writing R Extensions manual doesn't offer any guidance, nor does R CMD build detect files stored in subfolders under R/.

You can't use subfolders without additional setup (like defining a custom makefile). The best you can do is to use prefixes: client-a.r, client-b.r, server-a.r, server-b.r, etc.

Expanding the comment to Hadley's IMHO incorrect answer:
Look at the Matrix package (written by R Core members) which has five folders below src/, and two of these contain other subfolders. Other example is the Rsymphony packages (co-)written and maintained by an R Core member.
Doing this is not for the faint of heart. R strongly prefers a src/Makevars fragment over a full src/Makefile in order to be able to construct its own Makefile versions for the different subarchitectures. But if you know a little make and are willing to put the effort in, this is entirely doable -- and being done.
That still does not make it recommended though.

I argued with R core team Allow for sub-folders in "package/R/" directory . They seem not really want improve it. So my workflow is as follows.
1) Create an R project same as other packages but allow sub-directories in folder R/ such as
R/mcmc/a.R
R/mcmc/b.R
R/prediction/p1.R
R/predection/p2.R
2) When I need to pack them, I convert all files under R/ as
R/mcmc_a.R
R/mcmc_b.R
R/prediction_p1.R
R/predection_p2.R
...
with my package.flatten() function
3) Then I install the flattened version to R.
I wrote a simple script for Linux to do everything
https://github.com/feng-li/flutils/blob/master/inst/bin/install.HS

Recognizing the thread is a bit old, I just thought I'd throw in my solution to this problem. Note that my issue is similar, but I am only concerned with preserving folder hierarchies in development.
In development, I organize my script files in subfolders to my heart's content, but rather than fight R's flat hierarchy in production, I added my own "compile-time constant", so to speak.
That is, in every file located in a subfolder (not in top-level scripts/), I add the following:
if (!exists("script.debug"))
script.debug = FALSE
Then, I load whatever other dependencies are required as follows:
source.list <- c(
"script_1.R",
"script_2.R",
"script_3.R",
"script_4.R"
)
if (script.debug)
source.list <- paste("./script_subfolder/", source.list, sep="")
lapply(source.list, source)
The default assumption is that the code is in production, (source.debug = FALSE), so when in development, just ensure that source.debug = TRUE and the project's script/ folder is set as the working directory before loading any script files.
Of course, this example's a bit simple - it assumes that all script file dependencies exist in the same folder, but it seems a simple issue to devise a system that would suit more complicated development folder hierarchies.

Can you package multiple forge mods into a single jar?

When you are using Minecraft forge, it creates an external /mods/ folder that you place your mods in. Is there a way to package all the mods, configuration settings (like splash.propreties) and assets into a single .jar file for ease of distribution?
I am making a custom mod pack, and I don't like the fact that you have to install forge, then download the mod pack, then install the mods in order to run my mod pack. Is there a way to package it into a single jar so that you can just add it as a profile in the launcher and not have to do anything else?
I also need to be able to modify configuration files like splash.propreties, and I would like these to be packaged into the jar as well.
Note: I do not want to use premade launchers like Twitch or Technic.

I'm not aware of a way to package all of your mods into a single jar; there is such thing as a 'fat jar', but it's not what you're after. StackOverflow is intended more for code so you might find some more helpful advice over on the Forge forums.
That being said, you could utilise a Self Extracting Archive (SFX) such as IExpress or 7zip to package all of your mods into one, compressed, file which when executed can extract the mods into the right place. As for having one profile, as I'm sure you're aware, within your /mods directory, you can have another directory for profile-specific mods. So you could have your mods extract to a /mods/modpackname/... directory. Forge doesn't have great documentation but this writeup explains the process step by step.
Hope that helps

Can UNIX shared libraries be merged into a single library?

I am experimenting with my own BSD or Linux distribution. I want to organize the system files in a way that makes sense to an end user. I want them to have access to the system without all the file clutter that *nixes leave around.
Is there a way to merge several dynamic libraries into a single file without losing dynamic linking? I will have access to all of the source files.

It might be system-dependent, but at least with ELF (the executable format used by Linux), this is not possible. With ELF, shared libraries are a bit like executables: they are a final product of the linking process and are not designed to be decomposed or relinked into a different arrangement.
If you have the source for all of the components that go into a bunch of shared libraries, I suppose you could link them all together into one giant shared library, but you would use object files (*.o) or archive libraries (*.a) as the input to produce such a library.
As alluded to in comments, there is unlikely to be a good reason to want to actually do this.

What folders are commonly used by version control systems?

I need to know what folder names are commonly used by Version Control systems. Many VCSs will create a hidden folder, usually in the top level of the source tree, to store the information that they use.
So far, I only know that Git uses .git/ and SVN uses .svn/.
What are the folder names that other popular VCSs use?

We could probably divide the VCSs into three groups:
Special Subdirectory in each directory
CVS
Subversion (.svn)
The advantage of this is that each directory in the working copy is a self-contained working copy: you can copy it out somewhere else and it will still work. The obvious disadvantage is the clutter. Using automatic tools to scan over one of these working copies need special filtering or they will return spurious results.
Single special directory for each working copy
Mercurial (.hg)
SVK
(Maybe Git, I'm not sure?)
Special file system support
ClearCase (dynamic view is a mounted FS; snapshot view is more similar to the single directory case)

Mercurial = a single .hg directory

What is your experience with non-recursive make? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
A few years ago, I read the Recursive Make Considered Harmful paper and implemented the idea in my own build process. Recently, I read another article with ideas about how to implement non-recursive make. So I have a few data points that non-recursive make works for at least a few projects.
But I'm curious about the experiences of others. Have you tried non-recursive make? Did it make things better or worse? Was it worth the time?

We use a non-recursive GNU Make system in the company I work for. It's based on Miller's paper and especially the "Implementing non-recursive make" link you gave. We've managed to refine Bergen's code into a system where there's no boiler plate at all in subdirectory makefiles. By and large, it works fine, and is much better than our previous system (a recursive thing done with GNU Automake).
We support all the "major" operating systems out there (commercially): AIX, HP-UX, Linux, OS X, Solaris, Windows, even the AS/400 mainframe. We compile the same code for all of these systems, with the platform dependent parts isolated into libraries.
There's more than two million lines of C code in our tree in about 2000 subdirectories and 20000 files. We seriously considered using SCons, but just couldn't make it work fast enough. On the slower systems, Python would use a couple of dozen seconds just parsing in the SCons files where GNU Make did the same thing in about one second. This was about three years ago, so things may have changed since then. Note that we usually keep the source code on an NFS/CIFS share and build the same code on multiple platforms. This means it's even slower for the build tool to scan the source tree for changes.
Our non-recursive GNU Make system is not without problems. Here are some of biggest hurdles you can expect to run into:
Making it portable, especially to Windows, is a lot of work.
While GNU Make is almost a usable functional programming language, it's not suitable for programming in the large. In particular, there are no namespaces, modules, or anything like that to help you isolate pieces from each other. This can cause problems, although not as much as you might think.
The major wins over our old recursive makefile system are:
It's fast. It takes about two seconds to check the entire tree (2k directories, 20k files) and either decide it's up to date or start compiling. The old recursive thing would take more than a minute to do nothing.
It handles dependencies correctly. Our old system relied on the order subdirectories were built etc. Just like you'd expect from reading Miller's paper, treating the whole tree as a single entity is really the right way to tackle this problem.
It's portable to all of our supported systems, after all the hard work we've poured into it. It's pretty cool.
The abstraction system allows us to write very concise makefiles. A typical subdirectory which defines just a library is just two lines. One line gives the name of the library and the other lists the libraries this one depends on.
Regarding the last item in the above list. We ended up implementing a sort of macro expansion facility within the build system. Subdirectory makefiles list programs, subdirectories, libraries, and other common things in variables like PROGRAMS, SUBDIRS, LIBS. Then each of these are expanded into "real" GNU Make rules. This allows us to avoid much of the namespace problems. For example, in our system it's fine to have multiple source files with the same name, no problem there.
In any case, this ended up being a lot of work. If you can get SCons or similar working for your code, I'd advice you look at that first.

After reading the RMCH paper, I set out with the goal of writing a proper non-recursive Makefile for a small project I was working on at the time. After I finished, I realized that it should be possible to create a generic Makefile "framework" which can be used to very simply and concisely tell make what final targets you would like to build, what kind of targets they are (e.g. libraries or executables) and what source files should be compiled to make them.
After a few iterations I eventually created just that: a single boilerplate Makefile of about 150 lines of GNU Make syntax that never needs any modification -- it just works for any kind of project I care to use it on, and is flexible enough to build multiple targets of varying types with enough granularity to specify exact compile flags for each source file (if I want) and precise linker flags for each executable. For each project, all I need to do is supply it with small, separate Makefiles that contain bits similar to this:
TARGET := foo
TGT_LDLIBS := -lbar
SOURCES := foo.c baz.cpp
SRC_CFLAGS := -std=c99
SRC_CXXFLAGS := -fstrict-aliasing
SRC_INCDIRS := inc /usr/local/include/bar
A project Makefile such as the above would do exactly what you'd expect: build an executable named "foo", compiling foo.c (with CFLAGS=-std=c99) and baz.cpp (with CXXFLAGS=-fstrict-aliasing) and adding "./inc" and "/usr/local/include/bar" to the #include search path, with final linking including the "libbar" library. It would also notice that there is a C++ source file and know to use the C++ linker instead of the C linker. The framework allows me to specify a lot more than what is shown here in this simple example.
The boilerplate Makefile does all the rule building and automatic dependency generation required to build the specified targets. All build-generated files are placed in a separate output directory hierarchy, so they're not intermingled with source files (and this is done without use of VPATH so there's no problem with having multiple source files that have the same name).
I've now (re)used this same Makefile on at least two dozen different projects that I've worked on. Some of the things I like best about this system (aside from how easy it is to create a proper Makefile for any new project) are:
It's fast. It can virtually instantly tell if anything is out-of-date.
100% reliable dependencies. There is zero chance that parallel builds will mysteriously break, and it always builds exactly the minimum required to bring everything back up-to-date.
I will never need to rewrite a complete makefile again :D
Finally I'd just mention that, with the problems inherent in recursive make, I don't think it would have been possible for me to pull this off. I'd probably have been doomed to rewriting flawed makefiles over and over again, trying in vain to create one that actually worked properly.

Let me stress one argument of Miller's paper: When you start to manually resolve dependency relationships between different modules and have a hard time to ensure the build order, you are effectively reimplementing the logic the build system was made to solve in the first place. Constructing reliable recursive make build systems is very hard. Real life projects have many interdependent parts whose build order is non-trivial to figure out and thus, this task should be left to the build system. However, it can only resolve that problem if it has global knowledge of the system.
Furthermore, recursive make build-systems are prone to fall apart when building concurrently on multiple processors/cores. While these build systems may seem to work reliably on a single processor, many missing dependencies go undetected until you start to build your project in parallel. I've worked with a recursive make build system which worked on up to four processors, but suddenly crashed on a machine with two quad-cores. Then I was facing another problem: These concurrency issues are next to impossible to debug and I ended up drawing a flow-chart of the whole system to figure out what went wrong.
To come back to your question, I find it hard to think of good reasons why one wants to use recursive make. The runtime performance of non-recursive GNU Make build systems is hard to beat and, quite the contrary, many recursive make systems have serious performance problems (weak parallel build support is again a part of the problem). There is a paper in which I evaluated a specific (recursive) Make build system and compared it to a SCons port. The performance results are not representative because the build system was very non-standard, but in this particular case the SCons port was actually faster.
Bottom line: If you really want to use Make to control your software builds, go for non-recursive Make because it makes your life far easier in the long run. Personally, I would rather use SCons for usability reasons (or Rake - basically any build system using a modern scripting language and which has implicit dependency support).

I made a half-hearted attempt at my previous job at making the build system (based on GNU make) completely non-recursive, but I ran into a number of problems:
The artifacts (i.e. libraries and executables built) had their sources spread out over a number of directories, relying on vpath to find them
Several source files with the same name existed in different directories
Several source files were shared between artifacts, often compiled with different compiler flags
Different artifacts often had different compiler flags, optimization settings, etc.
One feature of GNU make which simplifies non-recursive use is target-specific variable values:
foo: FOO=banana
bar: FOO=orange
This means that when building target "foo", $(FOO) will expand to "banana", but when building target "bar", $(FOO) will expand to "orange".
One limitation of this is that it is not possible to have target-specific VPATH definitions, i.e. there is no way to uniquely define VPATH individually for each target. This was necessary in our case in order to find the correct source files.
The main missing feature of GNU make needed in order to support non-recursiveness is that it lacks namespaces. Target-specific variables can in a limited manner be used to "simulate" namespaces, but what you really would need is to be able to include a Makefile in a sub-directory using a local scope.
EDIT: Another very useful (and often under-used) feature of GNU make in this context is the macro-expansion facilities (see the eval function, for example). This is very useful when you have several targets which have similar rules/goals, but differ in ways which cannot be expressed using regular GNU make syntax.

I agree with the statements in the refered article, but it took me a long time to find a good template which does all this and is still easy to use.
Currenty I'm working on a small research project, where I'm experimenting with continuous integration; automatically unit-test on pc, and then run a system test on a (embedded) target. This is non-trivial in make, and I've searched for a good solution. Finding that make is still a good choice for portable multiplatform builds I finally found a good starting point in http://code.google.com/p/nonrec-make
This was a true relief. Now my makefiles are
very simple to modify (even with limited make knowledge)
fast to compile
completely checking (.h) dependencies with no effort
I will certainly also use it for the next (big) project (assuming C/C++)

I have developed a non-recursive make system for a one medium sized C++ project, which is intended for use on unix-like systems (including macs). The code in this project is all in a directory tree rooted at a src/ directory. I wanted to write a non-recursive system in which it is possible to type "make all" from any subdirectory of the top level src/ directory in order to compile all of the source files in the directory tree rooted at the working directory, as in a recursive make system. Because my solution seems to be slightly different from others I have seen, I'd like to describe it here and see if I get any reactions.
The main elements of my solution were as follows:
1) Each directory in the src/ tree has a file named sources.mk. Each such file defines a makefile variable that lists all of the source files in the tree rooted at the directory. The name of this variable is of the form [directory]_SRCS, in which [directory] represents a canonicalized form of the path from the top level src/ directory to that directory, with backslashes replaced by underscores. For example, the file src/util/param/sources.mk defines a variable named util_param_SRCS that contains a list of all source files in src/util/param and its subdirectories, if any. Each sources.mk file also defines a variable named [directory]_OBJS that contains a list of the the corresponding object file *.o targets. In each directory that contains subdirectories, the sources.mk includes the sources.mk file from each of the subdirectories, and concatenates the [subdirectory]_SRCS variables to create its own [directory]_SRCS variable.
2) All paths are expressed in sources.mk files as absolute paths in which the src/ directory is represented by a variable $(SRC_DIR). For example, in the file src/util/param/sources.mk, the file src/util/param/Componenent.cpp would be listed as $(SRC_DIR)/util/param/Component.cpp. The value of $(SRC_DIR) is not set in any sources.mk file.
3) Each directory also contains a Makefile. Every Makefile includes a global configuration file that sets the value of the variable $(SRC_DIR) to the absolute path to the root src/ directory. I chose to use a symbolic form of absolute paths because this appeared to be the easiest way to create multiple makefiles in multiple directories that would interpret paths for dependencies and targets in the same way, while still allowing one to move the entire source tree if desired, by changing the value of $(SRC_DIR) in one file. This value is set automatically by a simple script that the user is instructed to run when the package is dowloaded or cloned from the git repository, or when the entire source tree is moved.
4) The makefile in each directory includes the sources.mk file for that directory. The "all" target for each such Makefile lists the [directory]_OBJS file for that directory as a dependency, thus requiring compilation of all of the source files in that directory and its subdirectories.
5) The rule for compiling *.cpp files create a dependency file for each source file, with a suffix *.d, as a side-effect of compilation, as described here: http://mad-scientist.net/make/autodep.html. I chose to use the gcc compiler for dependency generation, using the -M option. I use gcc for dependency generation even when using another compiler to compile the source files, because gcc is almost always available on unix-like systems, and because this helps standardize this part of the build system. A different compiler can be used to actually compile the source files.
6) The use of absolute paths for all files in the _OBJS and _SRCS variables required that I write a script to edit the dependency files generated by gcc, which creates files with relative paths. I wrote a python script for this purpose, but another person might have used sed. The paths for dependencies in the resulting dependency files are literal absolute paths. This is fine in this context because the dependency files (unlike the sources.mk files) are generated locally and rather than being distributed as part of the package.
7) The Makefile in each director includes the sources.mk file from the same directory, and contains a line "-include $([directory]_OBJS:.o=.d)" that attempts to include a dependency files for every source file in the directory and its subdirectories, as described in the URL given above.
The main difference between this an other schemes that I have seen that allow "make all" to be invoked from any directory is the use of absolute paths to allow the same paths to be interpreted consistently when Make is invoked from different directories. As long as these paths are expressed using a variable to represent the top level source directory, this does not prevent one from moving the source tree, and is simpler than some alternative methods of achieving the same goal.
Currently, my system for this project always does an "in-place" build: The object file produced by compiling each source file is placed in the same directory as the source file. It would be straightforward to enable out-of place builds by changing the script that edits the gcc dependency files so as to replace the absolute path to the src/ dirctory by a variable $(BUILD_DIR) that represents the build directory in the expression for the object file target in the rule for each object file.
Thus far, I've found this system easy to use and maintain. The required makefile fragments are short and comparatively easy for collaborators to understand.
The project for which I developed this system is written in completely self-contained ANSI C++ with no external dependencies. I think that this sort of homemade non-recursive makefile system is a reasonable option for self-contained, highly portable code. I would consider a more powerful build system such as CMake or gnu autotools, however, for any project that has nontrivial dependencies on external programs or libraries or on non-standard operating system features.

I know of at least one large scale project (ROOT), which advertises using [powerpoint link] the mechanism described in Recursive Make Considered Harmful. The framework exceeds a million lines of code and compiles quite smartly.
And, of course, all the largish projects I work with that do use recursive make are painfully slow to compile. ::sigh::

I written a not very good non-recursive make build system, and since then a very clean modular recursive make build system for a project called Pd-extended. Its basically kind of like a scripting language with a bunch of libraries included. Now I'm also working Android's non-recursive system, so that's the context of my thoughts on this topic.
I can't really say much about the differences in performance between the two, I haven't really paid attention since full builds are really only ever done on the build server. I am usually working either on the core language, or a particular library, so I am only interested in building that subset of the whole package. The recursive make technique has the huge advantage of making the build system be both standalone and integrated into a larger whole. This is important to us since we want to use one build system for all libraries, whether they are integrated in or written by an external author.
I'm now working on building custom version of Android internals, for example an version of Android's SQLite classes that are based on the SQLCipher encrypted sqlite. So I have to write non-recursive Android.mk files that are wrapping all sorts of weird build systems, like sqlite's. I can't figure out how to make the Android.mk execute an arbitrary script, while this would be easy in a traditional recursive make system, from my experience.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex