Organizing Drupal Code

How do you like to organize your Drupal code? One giant module? Separate modules per feature? Separate modules per code type (theme functions, menu hooks, etc...)?
I've started by trying to organize by feature, treating modules as if they were libraries. Ultimately, though, things are never perfectly contained: modules want to use each other's theme functions, and modules all contribute various tabs to a common page -- two examples of it not always being clear where to find code. This tempts me to keep all the theme functions together and all the hook_menu implementations together, but that would be awkward for other reasons...
Assume that all the code is too site-specific to ever share, so there's no attempt here to make self-contained contributed modules. I'm mostly concerned with maintaining sanity and cleanliness in a large-scale Drupal site.

I tend to have a folder with one main module containing all the shared functions, and a variety of sub-modules broken up along logical functional divisions. I've found the single huge module approach makes finding stuff in it rather unfun.
It really doesn't make much of a difference if you're not distributing it on Drupal.org, though, so whatever makes sense to you is fine.

I load all customizations into a single module per project (menu/form/link alters, etc.). If enough customization is done, I will fork the original module or create a new module with the original module as a dependency. It's at this point that it gets pretty subjective: I have no hard and fast rule saying 'fork a module when I reach this many function points or lines of code'.
Anything that adds functionality (meaning that it doesn't override something else) goes into its own module.
If any newly created or forked modules can be used in other projects or contexts, I will publish them to my personal repository.
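As an illustration of the "new module with the original module as a dependency" approach, a sub-module's .info file might look something like this (Drupal 7 syntax; the module names are hypothetical):

; mysite_events.info -- a feature sub-module that depends on a shared base module
name = Mysite Events
description = Event-specific functionality for the site.
core = 7.x
dependencies[] = mysite_main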

I most often use a single module and a set of include files where I store my classes. Although Views uses more than one module, it is a great example of this strategy. Take a look at the includes folder of the Views module to see what I mean.

Related

What is the best practice for transferring objects across R projects?

I would like to use R objects (e.g., cleaned data) generated in one git-versioned R project in another git-versioned R project.
Specifically, I have multiple git-versioned R projects (that hold drake plans) that do various things for my thesis experiments (e.g., generate materials, import and clean data, generate reports/articles).
The experiment-specific projects should ideally be:
Connectable - so that I can get objects (mainly data and materials) that I generated in these projects into another git-versioned R project that generates my thesis report.
Self-contained - so that I can use them in other non-thesis projects (such as presentations, reports, and journal manuscripts). When sharing such projects, I'd ideally like not to need to share a monolithic thesis project.
Versioned - so that their use in different projects can be independent (e.g., if I make changes to the data cleaning for a manuscript after submitting the thesis, I still want the thesis to be reproducible as it was originally compiled).
At the moment I can see three ways of doing this:
Re-create the data cleaning process
But: this involves copy/paste, which I'd like to avoid, especially if things change upstream.
Access the relevant scripts/functions by changing the working directory
But: even if I used the here package, it seems that this would introduce poor reproducibility.
Make the source projects into packages and make the objects I want to "export" into exported data (as per the data section of Hadley's R packages guide)
But: I'd like to avoid the unnecessary metadata, artefacts, and noise (e.g., see Miles McBain's "Project as an R package: An okay idea") if I can.
Is there any other way of doing this?
Edit: I tried @landau's suggestion of using a single drake plan, which worked well for a while, until (similar to @vrognas' case) I ended up with too many sub-projects (e.g., conference presentations and manuscripts) that relied on the same objects. Therefore, I added some clarifications above to my intentions with the question.
My first recommendation is to use a single drake plan to unite the stages of the overall project that need to share data. drake is designed to handle a lot of moving parts this way, and it will be more seamless when it comes to drake's decisions about what to rerun downstream. But if you really do need different plans in different sub-projects that share data, you can track each shared dataset as a file_out() file in one plan and track it with file_in() in another plan.
library(drake)
library(readr)  # for write_csv() and read_csv()

upstream_plan <- drake_plan(
  export_file = write_csv(dataset, file_out("exported_data/dataset.csv"))
)
downstream_plan <- drake_plan(
  dataset = read_csv(file_in("../upstream_project/exported_data/dataset.csv"))
)
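One caveat implied by this setup (my reading of how file_in()/file_out() work, not something stated above): the downstream plan only watches the exported file, so you have to run make(upstream_plan) in the upstream project before the downstream project's make() can see current data.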
You fundamentally misunderstood Miles McBain’s critique. He isn’t saying that you shouldn’t write reusable code nor that you shouldn’t use packages. He’s saying that you shouldn’t use packages for everything. But reusable code (i.e. code that you want to reuse) absolutely belongs in packages (or, better, modules), which can then be used in multiple projects.
That being said, first off, pay attention to Will Landau’s advice.
Secondly, you can make your RStudio projects configurable such that they can load data based on paths given in a configuration. Once that’s accomplished, nothing speaks against hard-coding paths to data in different projects inside that config file.
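For example, here is a minimal sketch using the config package (the config.yml layout and the dataset_path field are assumptions for illustration, not taken from the answer above):

# config.yml (assumed contents):
# default:
#   dataset_path: "../upstream_project/exported_data/dataset.csv"

library(readr)

cfg <- config::get()                   # reads config.yml from the project directory
dataset <- read_csv(cfg$dataset_path)  # the hard-coded path lives in the config, not the code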
I am in a similar situation. I have many projects that are spawned from one raw dataset. Previously, when the project was young and small, I had it all in one version-controlled project. This got out of hand as more sub-projects were spawned and my git history got cluttered from working on projects in parallel. This could be due to my lack of skill with git. My folder structure looked something like this:
project/.git
project/main/
project/sub-project_1/
project/sub-project_2/
project/sub-project_n/
I contemplated having each project in its own git branch, but then I could not access them simultaneously. If I had to change something in the main dataset (e.g., parts I might not have cleaned properly), then project 1 could become outdated and nonfunctional. Once I had finished project 1, I would have liked it to be isolated and contained for reproducibility. This is easier to achieve if the projects are separated. I don't think a drake/targets plan would solve this?
I also looked briefly into having the projects as git submodules but it seemed to add too much complexity. Again, my git ignorance might shine through here.
My current solution is to have the main data as an R package, and each sub-project as a separate git-versioned folder (they are actually packages as well, but this is not necessary). This way I can load in a specific version of the data (using renv for package versions).
My folder structure now looks something like this:
main/.git
sub-project_1/.git
sub-project_2/.git
sub-project_n/.git
And inside each sub-project, I call library(main) to load the cleaned data. Within each sub-project, a drake/targets plan could be used.
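A minimal sketch of what this looks like inside a sub-project (the package name main comes from the layout above; cleaned_data is a hypothetical name for the exported dataset):

library(main)             # the data package; its version is pinned via renv
head(main::cleaned_data)  # hypothetical cleaned dataset exported by the package
# ...the sub-project's drake/targets plan then uses cleaned_data as input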

Julia: packaging things into modules vs include()-ing them

I'm building a simulation in Julia and I have my code split across a bunch of files. Are there any benefits to wrapping everything in modules versus simply include()-ing them in the runscript?
I have something like the following at the top of my runscript right now:
using Distributed  # needed for @everywhere on current Julia
for filename in split(read(`git ls-files`, String))  # readall() on older Julia
    @everywhere include($filename)  # $ ships the local value to each worker
end
I'm not planning to use the code outside of this immediate project, but I am running the simulation in parallel. Is there any benefit in creating modules?
I would say that the most important benefit is modularity :)
If you have different files that deal with different things, splitting the code into modules lets you keep track of the dependencies between the modules:
Which functions are purely implementation details of the given module and subject to change?
Which modules depend on which other modules?
It also lets you reuse the same name for different things in different modules if you need to, as long as you're a little careful about what you export. (You can still access those names from the outside as qualified names.)
For an example of such organisation, you can look at my repo https://github.com/toivoh/Debug.jl
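A minimal sketch of the idea (the module and function names are made up):

module Physics
export step!             # the public interface; everything else is internal
force(x) = -x            # implementation detail, free to change
step!(xs, dt) = (xs .+= dt .* force.(xs))
end # module

module Recorder
export record            # Recorder could define its own step! without clashing
record(xs) = println(xs)
end # module

using .Physics, .Recorder
xs = [1.0, 2.0]
step!(xs, 0.1)           # exported names come in with `using`
Physics.force(1.0)       # internals are still reachable as qualified names
record(xs)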

Adding modules to my system

I'm designing an information system (in ASP.NET) which will handle different modules once it's done.
I don't have enough time or money to build all of the modules at once, so I've decided to do a few modules first and continue with the rest of them later, when I have time or money.
Now the question is: is there a generic way to call a module from a list? For example, I would create a directory where I plan to drop each module's .dll, so when I make a new module I will put its .dll there. On the other hand, I want to build something like a skeleton that generically calls all the modules in that directory via code, without having to rewrite the skeleton's code whenever new modules are dropped into the directory. Finally, I've planned for each module to have three layers: one for DB access, another for logic, and a last one for interface drawing, so each module is independent of the others.
Is this possible? How should I do it? I've been looking but can't find anything yet.
Is there a better way you would suggest?
You would definitely need to create common interfaces that modules implement, and common data contracts. If you need to load DLLs dynamically, it is possible, but you would need to use reflection. Look here:
http://dranaxum.wordpress.com/2008/02/25/dynamic-load-net-dll-files-creating-a-plug-in-system-c/

Documentation Generation - What boxes should I aim to tick?

For some major upcoming projects, I'm looking at requiring my team to document their code more thoroughly. To make life a little less painful, I am steering towards XML documentation generators such as Sandcastle, Doxygen or Box Live Documenter.
What are the key considerations I should keep in mind when evaluating the best option and what experiences have led you to a particular decision?
For me the key considerations would be:
Fully automated: Can it be set up in such a way that pretty much no outside work is required to create or edit the documentation?
Fully styled: Can the documentation be fully styled so that it looks great in a wiki or PDF after it's generated? I should be able to change colors, font sizes, layouts, etc.
Good filtering: Can I select only the items I want to be generated? I should be able to filter the namespaces, file types, classes, etc.
Customization: Can I include headers, footers, custom elements, etc.?
I found Doxygen could do all of this. Our workflow is as follows:
Developer makes a change to the code
They update the documentation tags right above the code they just changed
We click a generate button
Doxygen will then extract all the XML documentation from the code, filter it to only include the classes and methods we want, and apply the CSS styling we’ve pre-made for it. Our end result is an internal wiki that looks the way we want, and doesn’t require editing.
Extra: We have all our projects in various git repositories. We pull all these down to one root folder and generate the docs from this root folder.
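For reference, the handful of Doxyfile settings that drive this kind of workflow might look something like this (illustrative values, not our actual configuration):

# Doxyfile sketch (illustrative values)
PROJECT_NAME          = "Internal Docs"
INPUT                 = .                 # the root folder the repositories are pulled into
RECURSIVE             = YES
EXCLUDE_PATTERNS      = */tests/*         # filter out items that shouldn't be generated
GENERATE_HTML         = YES
HTML_EXTRA_STYLESHEET = custom.css        # the pre-made CSS styling
GENERATE_LATEX        = NO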
I'd be interested to know how others are automating even further.
Who is paying for the documentation, and why? (Is the system stable enough? Does it add enough value?)
Who is going to read it, and why is she not using a more effective communication channel? (If documentation really is the right channel, it's usually because of distance in time or place.)
Who is going to keep it up to date?
When are you going to destroy it? (Automatically, if it hasn't been read or updated in the past three months?)
I mostly prefer better code over more documentation to make my life less painful, but I like scenario & unit tests and a high-level architecture description.
[edit] Documentation costs time and money to write and to keep up to date. JavaDoc-style documentation has a serious detrimental effect on the amount of code simultaneously visible, and might be a good idea for the developers using the code, but not for those writing it.

Module Multi-instance in OrchardCMS

Assuming I have a contacts Orchard module which manages contacts,
can I have two instances, like so:
mysite.com/WorkContacts/...
mySite.com/HomeContacts/....
and have the data partitioned by instance/location type, etc.?
I assume it should be possible, but I want to be sure before I dig any deeper.
It's not possible by default (although I'm not saying it's impossible at all).
Each module has its own unique, hardcoded Id, which prevents multi-instancing of modules by design. There are also many other reasons why it wouldn't be a good idea...
Achieving such behavior is possible, of course, but in a slightly different way. As Orchard is mainly about content, you are free to build your own, different content types for different contact types from existing parts and fields. And then you're free to create instances of those. It's described very well here.
HTH
This would probably be better asked over on the Orchard sites.
If you look at the blogs functionality, you can have multiple of those; following a similar pattern of code, you could have multiple instances of the contacts module.
The path /HomeContacts ... etc would be set through the routing functionality of Orchard.
I think what you're looking for might be the multi-tenancy module, available from the gallery. The only difference from what you describe is that the instances would need different server names rather than subfolders like you described.
Then again it's not quite clear whether you only want to separate just the data for that module (in which case the suggestion to model it after blog is a good one) or for the whole site (that would be multi-tenancy).
