Threaded comment system nested set - multiple roots?

I'm implementing a threaded comment system for topics, i.e. there is a topic and then threaded comments on each topic.
Nested set 'seems' like the popular way to go, but how should I implement the roots of each thread? For example:
The comments could be one massive nested set. I may be wrong, but it seems like it would be slower with everything in one tree.
The comments could have one root thread for each topic. But then it seems I would have to make a blank root for each topic, and having blank roots just seems odd.
Each first-level comment could be a root. This eliminates the blank root, but it seems like there would be a ton of root threads, and to render a page with 50 first-level comments, would I have to do 50 queries?
Am I missing something here? Is there a better way to do this? I'm leaning towards the blank root, but it doesn't quite seem right.
Thanks.

I think you'd usually have a separate foreign key for ‘owner topic’ in the comments table, rather than hide that information in the nested-set structure. IMO the left/right pair should only dictate nesting/ordering inside the owner context.
Then it doesn't matter if you have a single root or multiple-root nested set structure. (FWIW, I use multiple-root.) You can certainly grab all the comments for one topic in a single query either way.
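A minimal sketch of that layout, using Python's built-in sqlite3 purely for illustration. The table name `comments`, the columns `topic_id`, `lft` and `rgt`, and the helper `thread_for_topic` are all made-up names, and this is the multiple-root variant (each first-level comment is its own root, numbered within its topic). It is one way to do it under those assumptions, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE comments (
        id       INTEGER PRIMARY KEY,
        topic_id INTEGER NOT NULL,   -- owner topic, kept outside the nested set
        lft      INTEGER NOT NULL,   -- nested-set bounds, numbered per topic
        rgt      INTEGER NOT NULL,
        body     TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX comments_topic_lft ON comments (topic_id, lft)")

conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?, ?)",
    [
        # topic 1: two first-level comments (two roots), no blank root
        (1, 1, 1, 6, "first comment"),
        (2, 1, 2, 5, "reply to first"),
        (3, 1, 3, 4, "reply to reply"),
        (4, 1, 7, 8, "second comment"),
        # topic 2: lft/rgt numbering starts again
        (5, 2, 1, 2, "other topic"),
    ],
)


def thread_for_topic(conn, topic_id):
    """Return (depth, body) for every comment of one topic, in display order.

    One query fetches the whole topic; depth is derived from the lft/rgt
    bounds while walking the rows in lft order.
    """
    rows = conn.execute(
        "SELECT lft, rgt, body FROM comments WHERE topic_id = ? ORDER BY lft",
        (topic_id,),
    ).fetchall()
    result = []
    open_rgts = []  # rgt values of the ancestors of the current row
    for lft, rgt, body in rows:
        # Drop ancestors whose interval ended before this row starts.
        while open_rgts and open_rgts[-1] < lft:
            open_rgts.pop()
        result.append((len(open_rgts), body))
        open_rgts.append(rgt)
    return result


thread = thread_for_topic(conn, 1)
for depth, body in thread:
    print("  " * depth + body)
```

Rendering a page with 50 first-level comments is still one query here, because the filter is on `topic_id` rather than on a root node.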

Related

Drupal Views Exposed Filters with approximate matches

I've got a view with exposed filters to help find matches out of thousands of entries. What I'm looking for is exact matches up top (this is done and working) followed by "approximate" matches underneath. The approximate matches may have one or two elements that are not what the user specified, but should be presented as options anyway. Are there any modules that support this functionality?
You could create 2 different views. One being the page you already created with the exposed filter, and the other being a block with almost the exact same settings, but with a less strict filter (e.g. "contains" vs. "is equal to"). Then you could print the block below your results on the page, possibly in the footer, and call it something along the lines of "Approximate Matches". If you're not sure how to print a block, there's a good description here.
There may be a more efficient way of doing this, but this is the first thing that came to mind.

What are the rules for deciding when a new catalog should be created?

I'd like to learn about using catalogs correctly.
I have about 30 useful content types, about 50 indexes in catalog.xml, and about 45 metadatas. There are just three types which account for most of the site's data - and I may need millions of these. I've been reading, and there's lots to do, but I want to have the basic configuration right before I begin all that.
This page told me that any non-default indexes should not be added to the portal_catalog. I've even read people explaining how removing one or two of the default indexes makes a performance difference.
My question is: what are the rules for dividing up the indexes into different catalogs, and for selecting which catalog(s) index which type(s)?
So far I have created one additional catalog, used to catalog all indexes for my 'site-setup' objects (which I have caused to no longer be indexed in portal_catalog). The site-setup indexes are very often used, but more rarely modified than others, so I thought it was correct to separate them from objects which are reindexed more often. I'm not sure if that's the main consideration though.
Another similar question (a good example of the kind of thing I want to solve): how would you handle something like secondary workflow review_state variables? I give each workflow's review_state variable an index (and search on them quite often), but some of my workflows are only used on just a few types. (my most prolific objects have secondary workflows...)
I'd be very grateful for advice!
Campbell
This won't cover everything but I'll bring up some points..
Anything not in the portal_catalog won't work with collections, the folder_contents view, the getFolderContents method, search, portlet collections, related items (I think), and anything else that assumes you're using the portal_catalog.
I like to use an additional catalog when I need to be able to query the data but it only affects a sub-set of the content objects.
Use collective.indexing to speed up indexing operations.
Mount the catalogs on their own mount points so you can cache them differently from the rest of the site (so you can cache the whole catalog). Then you can even serve the catalogs from a dedicated ZEO server.
Also, if your content doesn't have to be cataloged by the portal_catalog (with all the constraints listed), you may even want to think about whether you need it as a full-fledged (archetype|dexterity) type in the first place. You can use a slimmer repoze.catalog to catalog arbitrary objects (which could be very simple data) for whatever your purpose is and get even more performance. Or better yet, look into Solr for indexing it for VERY good performance.
One more thing: depending on the type of data you're storing, you could even look into using a relational database as a data store. But I don't know what kind of queries, indexes, data, etc. you have...
30 different types seems like a lot but I don't know what your use case is. Care to share? Perhaps there is a better way to do it.

node_load or direct query?

What rule of thumb do you use for deciding to use node_load() or just writing a direct db_query()?
In a situation I'm looking at right now I need to get some node data and resolve data on two nodereference fields. So that would be 3 calls to node_load(). At some point here, would it be more efficient to construct the query with Joins directly?
This is for use in a self contained module that won't be distributed or used anywhere else, so I don't believe I need to worry about subverting node modification hooks (or do I?).
Edit:
Thinking about my question more, node_load() is only really applicable when you have one node to grab (and then maybe drilling down further into nodereferences like in my example). But as soon as you need to return more than one node based on some criteria, you're pretty much forced to use db_query right? Does Drupal have any abstracted API for writing queries like this?
Not a full answer (Not sure myself), just some hints.
node_load() is using a static cache (in Drupal 7, you can even use the entity_cache module to make it a permanent cache). If the nodes you are loading are being used a second time on the same page, that call will be free.
Querying CCK-tables is tricky. The schema structure can change completely based on configuration, for example when using a single or multiple values.
The reasoning behind using API methods over direct DB calls is that they provide a DB abstraction layer, so your app can move between supported database engines, and they let your app gracefully handle any schema changes (however unlikely) that core or a module may make to the tables in question. It's also likely easier, as @Berdir says, for CCK fields and nodereference fields, but that depends on whether you are more confident with the Drupal API and PHP or with MySQL. The payoff of doing it the Drupal way is increased future productivity and a better understanding of the codebase and what is possible :)
Oh, and my rule of thumb is: do it the Drupal way if at all possible ("possible" varies with the app's time/cost/performance/whatever requirements).

How to mark "seen" RSS entries?

So I have played with the idea of making a specialized RSS reader for some time now, but I have never gotten around to it. I have several projects that could benefit from reading feeds in one way or another.
One project for this is an RSS bot for an IRC channel I'm on. But I haven't quite wrapped my mind around how I can "mark as read" a story, so that it doesn't spit out all the stories in the feed every time it runs.
Now, I haven't read the specs extensively yet either, so there might be some kind of unique ID I could use to mark the entry as read using a database of some kind. But is this the right way to do it?
From reading the specs for RSS 2.0 at http://cyber.law.harvard.edu/rss/rss.html#hrelementsOfLtitemgt it seems each item can have a guid element, which you can use to know which articles have been read or not.
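A rough sketch of that idea in Python, using only the standard library. The `seen` table, the helpers `item_key` and `new_items`, and the sample feed are made up for illustration. Since guid is optional in RSS 2.0, the sketch assumes falling back to the item's link and then its title is acceptable for the bot.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the second item has no <guid>.
FEED = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item>
    <title>Story A</title>
    <link>http://example.com/a</link>
    <guid>http://example.com/a</guid>
  </item>
  <item>
    <title>Story B</title>
    <link>http://example.com/b</link>
  </item>
</channel></rss>"""


def item_key(item):
    """Pick a stable identifier: guid if present, else link, else title."""
    return (
        item.findtext("guid")
        or item.findtext("link")
        or item.findtext("title")
    )


def new_items(conn, feed_xml):
    """Return titles of items not seen before and remember them."""
    fresh = []
    for item in ET.fromstring(feed_xml).iter("item"):
        key = item_key(item)
        if not key:
            continue  # nothing usable to identify this item
        already = conn.execute(
            "SELECT 1 FROM seen WHERE key = ?", (key,)
        ).fetchone()
        if already is None:
            conn.execute("INSERT INTO seen (key) VALUES (?)", (key,))
            fresh.append(item.findtext("title"))
    conn.commit()
    return fresh


conn = sqlite3.connect(":memory:")  # a file path would persist between runs
conn.execute("CREATE TABLE seen (key TEXT PRIMARY KEY)")

first_run = new_items(conn, FEED)
second_run = new_items(conn, FEED)
print(first_run, second_run)
```

With a database file instead of `:memory:`, the bot would only announce items whose key is not yet in `seen` each time it runs.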

Drupal question: Views, arguments and nodequeues

Hello :) I posted this same question on a drupal-oriented site, but didn't get any replies at all. I grumbled to myself and wished that the site was more like StackOverflow, so I thought, why not try asking it here :)
I'm playing around with a view that displays nodes belonging to a taxonomy term. The vocabulary also has a taxonomy nodequeue with subqueues for all the terms.
So far the view has one argument, taxonomy term ID, and is sorted by post date. But what if I wanted to display all of the nodes of a particular term, with all the nodequeue nodes on top, and all the non-nodequeue nodes (but still under this particular taxonomy term) below, sorted by date?
To clarify, say this is my vocabulary, we'll call it 'living stuff'
Plant
--Fruit
--Vegetable
Animal
--Fish
--Dinosaurs
The following nodes are found under Dinosaurs:
Tyrannosaurus Rex (added 2009-01-01)
Megalosaurus (added 2009-01-02)
Velociraptor (added 2009-01-03)
Brachiosaurus (added 2009-01-04)
Since tyrannosauruses and velociraptors are extra awesome dinosaurs, they're also added to the nodequeue living stuff, subqueue dinosaurs:
The subqueue:
Velociraptor
Tyrannosaurus rex
The final view should display them in this order:
Velociraptor (it's first in the NQ)
Tyrannosaurus Rex (2nd in NQ)
Brachiosaurus (of the remaining dinosaurs, this is the newest)
Megalosaurus (oldest non-queue dinosaur)
I created a relationship to a nodequeue, but it wouldn't let me pick a subqueue, I could only limit to the 'living stuff' nodequeue.
My first view argument is term ID, so I thought that if I added "Nodequeue: subqueue reference" as the second argument, I'd get the expected behavior, but this only shows the dinosaurs listed in the nodequeue.
Any help or suggestions on this problem would be highly appreciated. Thanks!
I haven't really tried much with nodequeues' subqueues, so I'm not completely certain of this. But from my experience with nodequeues, it seems that when using views you are limited to the basic things they support and can't really do the type of customization you are looking for. I think your best bet would be to create your own views sort handler, where you can sort it like this. It will probably be quite tricky to make such a handler, since you have to figure out both views and nodequeues in order to make it work. Unless you have done this sort of thing with views before, you should really give some thought to whether it would be worth it before venturing down that path.
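To pin down what such a handler would have to produce, here is the ordering from the question as a plain Python sketch. This is not Drupal or Views code; the function `queue_then_newest` and the data structures are made up for illustration, and nodes are matched to the subqueue by title only to keep the example short.

```python
from datetime import date

# The dinosaurs from the question: (title, date added).
nodes = [
    ("Tyrannosaurus Rex", date(2009, 1, 1)),
    ("Megalosaurus", date(2009, 1, 2)),
    ("Velociraptor", date(2009, 1, 3)),
    ("Brachiosaurus", date(2009, 1, 4)),
]

# Subqueue order, first entry on top.
subqueue = ["Velociraptor", "Tyrannosaurus Rex"]


def queue_then_newest(nodes, subqueue):
    """Queued nodes first, in queue order; the rest newest first."""
    position = {title: i for i, title in enumerate(subqueue)}
    queued = sorted(
        (n for n in nodes if n[0] in position),
        key=lambda n: position[n[0]],
    )
    rest = sorted(
        (n for n in nodes if n[0] not in position),
        key=lambda n: n[1],
        reverse=True,
    )
    return [title for title, _ in queued + rest]


ordered = queue_then_newest(nodes, subqueue)
print(ordered)
```

In SQL terms this would amount to an optional (left) join from the nodes to the subqueue, sorting unqueued rows after queued ones, then by queue position, then by date descending. Whether Views can be made to express that sort is the open part.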
More hacks:
A work-around for the behavior you're trying to accomplish might be to forgo using nodequeues at all. I'm not sure of the entire impetus for using the nodequeues nor the importance of dates, but faced with similar issues before, I've been able to tackle it using the following:
Sticky
Modified dates
If you sticky your super-cool dinosaurs, and modify the published dates of the elements so that they match your order, you could produce what you're looking for in a single view. It's sorta hokey, and it's predicated on not really caring about publishing dates (something that always depends on situation) nor having a more pressing reason for using a nodequeue. That said, if you don't need the nodequeue or the dates, it's a workable solution.
The 2-view solution by Jeremy should be workable, too, and I'd say that's another common way to handle the given scenario.
Hacky solution warning!
Have your primary view in your page with the nodequeue items.
Create another view which is exposed as a block for the non nodequeue items. Put this block in the main content region and limit it to only show on URLs which are the same as the first view.
You may have to do some fiddling with the url variables but I think it will work.
Why don't you concatenate the views behind each nodequeue? (each nodequeue generates a view)
You can add a header (see 'Basic Settings' in view edit page) to the second nodequeue that contains php code that invokes views_embed_view('first nodequeue') (you just need to change the header's input format to 'php'). Or rather, create a custom view that includes each nodequeue by invoking views_embed_view(). This would effectively place one nodequeue on top of another, and if they are of the same format/content type you don't even need to mess around with fields: you can use Row Style == Node. As far as your arguments, they can be passed to views_embed_view, as the third parameter (the docs don't say that AFAICR, but I found a post in the forums (http://drupal.org/node/99721) that indicated args can be sent as '$current_view->args' to the view being embedded).
HTH

Resources