node_load or direct query? - drupal

What rule of thumb do you use for deciding to use node_load() or just writing a direct db_query()?
In a situation I'm looking at right now I need to get some node data and resolve data on two nodereference fields. So that would be 3 calls to node_load(). At some point here, would it be more efficient to construct the query with Joins directly?
This is for use in a self contained module that won't be distributed or used anywhere else, so I don't believe I need to worry about subverting node modification hooks (or do I?).
Edit:
Thinking about my question more, node_load() is only really applicable when you have one node to grab (and then maybe drilling down further into nodereferences like in my example). But as soon as you need to return more than one node based on some criteria, you're pretty much forced to use db_query right? Does Drupal have any abstracted API for writing queries like this?

Not a full answer (Not sure myself), just some hints.
node_load() is using a static cache (in Drupal 7, you can even use the entity_cache module to make it a permanent cache). If the nodes you are loading are being used a second time on the same page, that call will be free.
Querying CCK-tables is tricky. The schema structure can change completely based on configuration, for example when using a single or multiple values.

The reasoning behind using API methods for DB calls over direct DB calls is to provide a DB abstraction layer so that your app could move between supported database engines etc, also it enables your app to gracefully handle any schema changes (however unlikely) that core/module may make to the tables in question. It's also likely easier as #Berdir says for CCK fields and Node_Ref fields, but that depends on which you are more confident with Drupal API& PHP or MySQL...the payoff of doing it the Drupal way is increased future productivity and understanding of the codebase and what is possible :)
Oh and my rule of thumb is - Do it the Drupal way if at all possible (possible being variable depending on app time/cost/performance/whatever requirements)

Related

What are the rules for deciding when a new catalog should be created?

I'd like to learn about using catalogs correctly.
I have about 30 useful content types, about 50 indexes in catalog.xml, and about 45 metadatas. There are just three types which account for most of the site's data - and I may need millions of these. I've been reading, and there's lots to do, but I want to have the basic configuration right before I begin all that.
This page told me that any non-default indexes should not be added to the portal_catalog. I've even read people explaining how removing one, or two of the default indexes makes a performance difference.
My question is: what are the rules for dividing up the indexes into different catalogs, and for selecting which catalog(s) index which type(s)?
So far I have created one additional catalog, used to catalog all indexes for my 'site-setup' objects (which I have caused to no longer be indexed in portal_catalog). The site-setup indexes are very often used, but more rarely modified than others, so I thought it was correct to separate them from objects which are reindexed more often. I'm not sure if that's the main consideration though.
Another similar question (a good example of the kind of thing I want to solve): how would you handle something like secondary workflow review_state variables? I give each workflow's review_state variable an index (and search on them quite often), but some of my workflows are only used on just a few types. (my most prolific objects have secondary workflows...)
I'd be very grateful for advice!
Campbell
This won't cover everything but I'll bring up some points..
Anything not in the portal_catalog won't work with collections, folder_contents view, getFolderContents method, search, portlet collections, related items(I think) and anything else the assumes you're using the portal_catalog.
I like to use an additional catalog when I need to be able to query the data but it only affects a sub-set of the content objects.
Use collective.indexing to speed up indexing operations.
Mount the catalogs on their own mount points so you can cache them differently from the rest of the site(so you can cache the whole catalog). Then, you can even serve the the catalogs from dedicated zeoserver.
Also, if your content doesn't have to be cataloged by the portal_catalog(with all the constraints listed), you may even want to think about if you need it as a full-fledged (archetype|dexterity) type in the first place. You can use a more slim repoze.catalog to catalog arbitrary objects(which could be very simple data) for whatever your purpose is and get even more performance. Or better yet, look into Solr for indexing it for VERY good performance.
On more thing, depending on the type of data you're storing, you could even look into using a relational database for a data store. But I don't know what kind of queries, indexes, data, etc you have...
30 different types seems like a lot but I don't know what your use case is. Care to share? Perhaps there is a better way to do it.

How do I do this in Drupal?

Im currently evaluating Drupal to see if we can use it to replace our framework. My problem is I have this legacy tables which I would want to try to reflect in Drupal. It involves a join table. There's quite a lot of this kind of relationship in our existing web app so I am looking for possible ways to solve it.
Thank you for your insight!
There are several ways to do this, and it's hard to know which is best with no context about what you're actually doing with the data, but here are some options:
One way to do this is to make a content type representing each table (using CCK) with the foreign keys represented by type-specific node reference fields. Doing everything as nodes gives you a bunch of prebuilt functionality around nodes, but has a bit of overhead you may want to avoid.
Another option is to leave your database just like it is now. Drupal can do direct database queries, or you can use Data to expose your tables to Views.
Another option, if those referenced tables really only have 1 non-ID field, is to do the project_companies_assignments as nodes and do the other 3 as taxonomies. But this won't work if those are really more complex entities, and wouldn't be very flexible if they might become more complex.
What about using hook_views_api and exposing your legacy tables in hook_views_data? i tried something like this myself - not sure if that is what you want...
try and let me know if that works for you.
http://drupalwalla.blogspot.com/2011/09/how-do-you-expose-your-legacy-database.html
Going with Views and CCK, optionally with the additional Data module has one huge disadvantage: it comes with complexity.
My preferred alternative, is to write your own module. Drupal offers little help wrt database abstraction, it comes not with a proper ORM or such. But with some simple CRUD functions for the data in the database, a few simple forms in front, and a menu-callback with some pages to present the data, you can -quite often- get your datamodel worked out much faster then going the route of the overly complex, often poorly documented CCK and views modules. KISS.

When not to use a Drupal node?

I've recently created a very simple CRUD table where the user stores some data. For the data, I created a custom node. The functionality works great for creating, editing, and deleting data in the CRUD table using the basic node functionality (I'm actually amazed how fast and easy it was to program the basic functionality with proper access controls using only a tiny bit of code)....
Since the data isn't meant to be treated the same way as 'content' such as a blog post (no title, no body, no commments, no revisions, shouldn't show up on ?q=node page, no previews, no teasers, etc)... I find that I'm spending most of my time 'turning off' and modifying the stuff that drupal does automatically for nodes.
I know its a matter of taste, but where should one draw the line on what should be treated as a node and what shouldn't? In other words, would it be better to program this stuff from scratch without using nodes?
Using nodes for custom data has quite some additional benefits besides easy edit/update/delete functionality:
possible categorization via taxonomy
implicit 'ownership' via author tracking
implicit tracking of creation/modification time
basic access control by default, expandable by a huge selection of modules
flexible query generation/listing/filtering via views
possible ad hoc extensions/annotations via CCK fields
possible definition of workflows, actions and the like
a huge number of hooks to programmatically intercept/adjust almost every usage aspect/scenario
commenting, voting, rating and tons of other functionality provided by all contributed modules that work on/with nodes ...
Given all this, I'd say you need a very good reason to not use nodes to store data in Drupal. Nodes are simply the fundamental building blocks for just about everything in the Drupal ecosystem, and the overhead of removing some unwanted default 'features' seems pretty small in comparison to the gains.
That said, one possible reason/argument to handle data separate from the node system might be if that data is directly aimed at annotating other nodes (think taxonomy). But since you can easily reference nodes from other nodes (with lots of different options on how to do this), the argument is not to strong.
Another (much stronger) argument would be data integrity - Drupal is not very strong (to put it politely) concerning normalized, relational data storage, referential integrity, transaction handling and other related topics. If you have requirements in that direction, you might have no choice but to skip the node concept and create and maintain a separate data island within the system on your own.
It helps to think also that a node doesn't need to be public either. Some nodes are private/internal and can be controlled further with access controls. The way you are doing it, whatever you're doing, makes all the scalability and extending it on your shoulders.
I would probably approach it with CCK/Taxonomy depending on what I was doing. That way, I get the added benefit of Views/Panels/etc module integration without writing any additional code.

How can I handle parameterized queries in Drupal?

We have a client who is currently using Lotus Notes/Domino as their content management system and web server. For many reasons, we are recommending they sunset their Notes/Domino implementation and transition onto a more modern platform--such as Drupal.
The client has several web applications which would be a natural fit for Drupal. However, I am unsure of the best way to implement one of the web applications in Drupal. I am running into a knowledge barrier and wondered if any of you could fill in the gaps.
Situation
The client has a Lotus Domino application which serves as a front-end for querying a large DB2 data store and returning a result set (generally in table form) to a user via the web. The web application provides access to approximately 100 pre-defined queries--50 of which are public and 50 of which are secured. Most of the queries accept some set of user selected parameters as input. The output of the queries is typically returned to users in a list (table) format. A limited number of result sets allow drill-down through the HTML table into detail records.
The query parameters often involve database queries themselves. For example, a single query may pull a list of company divisions into a drop-down. Once a division is selected, second drop-down with the departments from that division is populated--but perhaps only departments which meet some special criteria--such as those having taken a loss within a specific time frame. Most queries have 2-4 parameters with the average probably being 3.
The application involves no data entry. None of the back-end data is ever modified by the web application. All access is purely based around querying data and viewing results.
The queries change relatively infrequently, and the current system has been in place for approximately 10 years. There may be 10-20 query additions, modifications, or other changes in a given year. The client simply desires to change the presentation platform but absolutely does not want to re-do the 100 database queries.
Once the project is implemented, the client wants their staff to take over and manage future changes. The client's staff have no background in Drupal or PHP but are somewhat willing to learn as necessary.
How would you transition this into Drupal? My major knowledge void relates to how we would manage the query parameters and access the queries themselves. Here are a few specific questions but feel free to chime in on any issue related to this implementation.
Would we have to build 100 forms by hand--with each form containing the parameters for a given query? If so, how would we do this?
Approximately how long would it take to build/configure each of these forms?
Is there a better way than manually building 100 forms? (I understand using CCK to enter data into custom content types but since we aren't adding any nodes, I am a little stuck as to how this might work.)
Would it be possible for the internal staff to learn to create these query parameter forms--even if they are unfamiliar with Drupal today? Would they be required to do any PHP programming?
How would we take the query parameters from a form and execute a query against DB2? Would this require a custom module? If so, would it require one module total or one module per query? (Note: There is apparently a DB2 driver available for Drupal. See http://groups.drupal.org/node/5511.)
Note: I am not looking for CMS recommendations other than Drupal as Drupal nicely fits all of the client's other requirements, and I hope to help them standardize on a single platform.
Any assistance you can provide would be helpful. Thank you in advance for your help!
Have a look at the Data module - it might be able to get you a long way towards a solution.
The biggest problem you are likely to have is connecting to the DB2 server through Drupal since it's DBA layer doesn't support it without patches (as you've discovered).
A couple things come to mind.
You can use Table Wizard to expose any custom tables to views. Views basically gives you a UI for writing custom sql queries. Once your tables are exposed to views you can use filters to "parameterize" the query. Also, views supports many display options including tables and pagers.
If you do need to write your own forms for advanced queries, take a look at the Drupal Form API which makes creating custom php forms a peice of cake.
Views is fairly easy to learn and, in my experience, most clients pick it up quickly. It really depends on the complexity of the query.
Form API should be easy for a developer to pick up, but does require a basic knowledge of php and writing Drupal code.
Does Lotus provide a web service to query it? If so, you can easily do this using a custom form (in your own module) and use Services module to query the data.

Best way to save extra data for user in Drupal 6

I am developing a site that is saving non visible user data upon login (e.g. external ID on another site). We are going to create/save this data as soon as the account is created.
I could see us saving data using
the content profile module (already in use on our side)
the profile module
the data column in the user table
creating our own table
I feel like #1 is probably the most logical place, however creating a node within a module does not seem to be a trivial thing.
#3 feels like a typical way to solve this, but just having a bunch of serialized data in a catchall field does not feel like the best design.
Any suggestions?
IMO, each option has its pro's and con's, and you should be the one to make the final call, given that you are the only one to know what your project is about, what are the critical points of the project, what is the expected typical user pattern, what are the resources available, etc...
If I was totally free to chose, my personal favourites would be option #4, #1 and #5 (wait! #5? Yes: see below!). My guiding principles in making the choice would be:
Keep it clean
Keep it simple
Make it extensible
#1 - The content profile module
This would be a clean solution in that you would make easier for developer to maintain your code, as all the alteration to the user would pass through the same channel, and it would be easier to track down problems or add new functionality.
I do not find it particularly simple as it requires you to interact with the custom API of that module.
As for extensibility that depends from how well the content profile module API is designed. The temptation could be to simply use the tables done by said module for your purpose, bypassing the API's but that exposes you to the possibility that on a critical security update one day in which you are in a hurry the entire system will break down because the schema has changed...
#4 - Creating your own table
This would be a clean solution because you could design your table (and your module to do exactly what you need to), and you could create your own API to be used by other modules. On the other hand you would introduce yet another piece of code altering the registration process, and this might make it more difficult for devs to track problems and expand the system in a consistent way.
This would be very simple to implement code-wise. Also the DB design would benefit though: another thing to consider is that the tables would be very easy to inspect and query. Creating a new handler for views is dead easy in most of the cases: 4 out of 5 you simply use one of the prototype objects shipping with views.
This would be extremely easy to extend, of course. Once you created the module for one field, you could manage as many as you want by mostly copy/pasting the code for one field to another (or to inherit from the same ancestor if you go OOP).
I understand you are already knowledgeable about drupal, but if you need a pointer on how to do that, I gave some direction in this other answer.
#5 - Creating your own table and porting already existing fields there
That would essentailly be #4 minus the drawback of scattering functionality across various modules... Of course if you are already managing 200 fields the other way this is not a viable option, but if you are early into your design, you could consider this.
In my experience nearly every project that requires system integration (meaning: synchronising data for the same user in multiple systems) has custom needs for user registration, and I found this solution the one that suits my need best for two reasons:
I found I reuse a lot of the custom code I wrote from project to project.
It's the most flexible way to integrate data with the other system (in some case I even split data for the user in two custom tables managed by the same module: one that contains custom fields used by drupal only and one that contains the "non visible fields", as you call them. I find this very handy in a lot of scenarios, as it makes very easy to inspect and manipulate the data of the two systems separately.
HTH!
If you're already using the Content Profile module, then I'd really suggest continuing to use it and attach the field to it. You're saving the data when the account is created, and creating a node for the user at the same time isn't that hard. Really.
$node = new stdClass();
$node->title = $user->name; // what I'd use, or you can have node auto title handle the title for you. Up to you!
$node->field_hidden_field[0]['value'] = '$*#$f82hff';
$node->uid = $user->uid;
node_save($node);
Bob's your uncle!
I would go for option 3. Eventually even other modules will store the data in the database against a user. So you could directly save it yourself probably in a more efficient way than those modules.
Of course depending on the size of the data, you can take a call whether to add a new column in the users table or to create a new table for your data with the user's id as the foreign key.

Resources