I’m currently reading a book on website programming, and the author mentions that he will code DLL objects to use the lazy load pattern. I think I conceptually understand the lazy load pattern, but I’m not sure I understand its usefulness in the way the author implemented it.
BTW - I’m not asking about the usefulness of the lazy load pattern in general here, but whether it is useful in the way this particular book implements it:
1) When a DLL object is created, a DB query is performed (via the DAL) that retrieves data from various columns and populates the properties of our DLL object. Since one of the fields (call it "L") may contain quite a substantial amount of text, the author decided to retrieve that field only when the corresponding property is read for the first time.
A) In our situation, what exactly do we gain by applying the lazy load pattern? Just lower memory usage?
B) But on the other hand, doesn’t the way the author implemented the lazy load pattern make the CPU do more work and take longer to complete? If L is retrieved separately from the other fields, the application has to make an additional call to SQL Server to retrieve "L", whereas without the lazy load pattern only one call to SQL Server would be needed, since we would get all the fields at once.
BTW - I realize that the lazy load pattern may be extremely beneficial in situations where retrieving a particular piece of data requires heavy computation, but that’s not the case in the example above.
Thanks
It makes sense if the DLL objects can be used without the L field most of the time. If that is the case, your program can work with the available data while waiting for L to load. If L is always needed, then the pattern just increases complexity. I do not think it will significantly slow things down, especially if loading L takes more time than anything else, but that is just a guess. Write it both ways, with and without lazy loading, and see which is better.
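To make that concrete, here is a minimal sketch of how such a lazily loaded property might look. The loader delegate stands in for whatever DAL call the book actually uses to fetch L; none of these names come from the book.

```csharp
using System;

public class Article
{
    private readonly int _id;
    private readonly Func<int, string> _loadL; // stands in for the DAL call that fetches "L"
    private string _l;                         // the large text field, not loaded yet
    private bool _lLoaded;

    public Article(int id, Func<int, string> loadL)
    {
        _id = id;
        _loadL = loadL;
        // In the book's version, the cheap columns would be populated here
        // by the single up-front query.
    }

    public string L
    {
        get
        {
            if (!_lLoaded)
            {
                // Extra round trip to SQL Server, executed only on first access.
                _l = _loadL(_id);
                _lLoaded = true;
            }
            return _l;
        }
    }
}
```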
I think this is pretty useful when applied to the correct columns. For instance, let's say you have a Customers table in your database, and in that table you have a column CustomerPublicProfile, a text column that can be pretty big.
If you have a screen (let's call it Customers.aspx) that displays a list of the customers (but not the CustomerPublicProfile column), then you should avoid populating that column.
For instance, if your Customers.aspx page shows 50 customers at a time, you shouldn't have to get the CustomerPublicProfile column for each customer. If the user decides to drill down into a specific customer, then you would go and get the CustomerPublicProfile column.
About B: yes, this does make N extra calls, where N is the number of customers that the user decided to drill down into. But the advantage is that you saved a lot of unneeded overhead by skipping the column in the first place. Specifically, you avoided retrieving M-N values of the CustomerPublicProfile column, where M is the number of customers that were retrieved on the Customers.aspx page to begin with.
If in your scenario M has a value close to N, then it is not worth it. But in the situation I described, M is usually much larger than N, so it makes sense.
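To illustrate, the split ends up being two queries; the column names other than CustomerPublicProfile are made up:

```csharp
internal static class CustomerQueries
{
    // Grid on Customers.aspx: skip the big column entirely.
    public const string ListSql =
        "SELECT CustomerId, CustomerName FROM Customers";

    // Drill-down page: fetch the big column only for the one customer the user picked.
    public const string DetailSql =
        "SELECT CustomerPublicProfile FROM Customers WHERE CustomerId = @CustomerId";
}
```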
Sayed Ibrahim Hashimi
I had a situation like this recently where I was storing large binary objects in the database. I certainly didn't want these loading into the DLL object every time it was initialised, especially when the object was part of a collection. So there are cases when lazy loading a field would make sense. However, I don't think there is any general rule you can follow - you know your data and how it will be accessed. If you think it's more efficient to make one trip to the database and use a little more memory, then that is what you should do.
I've recently been assigned to a project using Teradata.
I've been told to strictly use DROP+CREATE instead of DELETE ALL, because the latter "leaves some space allocated someway". This is counter-intuitive to me, and I think it's probably wrong. I searched the web for a comparison between the two methods, but I found nothing.
This only reinforces my belief that DELETE ALL doesn't suffer from the issue above.
However, if this is the case, I must prove it (both practically and theoretically).
So, my question is: is there a difference in space allocation between the two methods? If not, is there an official document (user guide, technical specification, whatever else) that proves it?
Thank you!
There's a discussion here: http://teradataforum.com/teradata/20120403_105705.htm about the very same subject (although it does not really answer the "leaves some space allocated someway" part). They actually recommend DELETE ALL but for other (performance) reasons:
I'll quote just in case the link goes dead:
"Delete all" will be quicker, although being practical there often isn't a lot of difference in the performance of them.
However, especially for a process that is run regularly (say a daily batch process) then I recommend the "delete all" approach. This will do less work as it only removes the data and leaves the definition in place. Remember that if you remove the definition then this requires accessing multiple dictionary tables, and of course you then have to access those same tables (typically) when you re-create the object.
Apart from the performance aspect, the downside of the drop/create approach is that every time you create an object Teradata inserts "default rows" into the AccessRights table, even if subsequent access to the object is controlled via Role security and/or database level security. As you may well know the AccessRights table can easily get large and very skewed. In my experience many sites have a process which cleans this table on a regular basis, removing redundant rows. If your (typically batch) processes regularly drop/create objects, then you're simply adding rows into the table which have previously been removed by a clean process, and which will be removed in the future by the same process. This all sounds like a complete waste of time to me.
Your impression is correct: you didn't find any reference to "DELETE leaves some space allocated" anywhere, because it's simply wrong :-)
DELETE ALL is similar to a TRUNCATE in other DBMSes and in most cases uses fastpath processing.
First of all, you cannot do DROP/CREATE in one transaction in Teradata (in Oracle there are other problems with everyday DDL), so when ETL processes become complicated you might end up with dependencies where more important business processes depend on less important ones (for example, you might see the customers table empty just because the interest rates were not refreshed, or because a varchar value in just one minor column was too long).
My opinion: Use transactions and modular programming. In Teradata this means avoiding DDL where possible and using DELETE/UPDATE/MERGE/INSERT instead of DROP/CREATE.
We have a slightly different situation in Postgres where DDL statements are transactional.
How do you usually work with the data contained in a RecordStore:
Do you always "query" the RecordStore directly when you have to perform some operations over its records (searching, sorting, etc.), or
Do you "cache" those records in a vector or array so that you query that vector or array later, instead of the RecordStore?
Personally, I was following the second approach until yesterday, when I got a nasty exception reminding me that memory is a luxury we should be really careful with when developing J2ME apps :S
Taking memory into consideration, I'm now not really sure that keeping arrays around would be such a good idea.
In any case, I would like to hear your opinions, guys. After all, you've got more experience.
Thanks for your time.
That depends on the number of records and the size of each record.
If you have already had an OutOfMemoryError (OOME) with the Vector approach, then try to work with only a single record at a time.
If you structure your records well, you can do some fast searches on them. String searches will probably be slower.
Keep in mind that, although RMS has no fixed maximum size, it is advisable to call RecordStore.getSizeAvailable to get an idea of how much information you can store on a given device.
Here is a good tutorial on RMS:
http://www.ibm.com/developerworks/library/j-j2me3/
First of all, we are building an ASP.NET web application. I do not want to use Lucene as a database per se, but rather as the primary look-up for displaying lists to the user. This would be a canned search against Lucene where we would pull, say, all user information to be displayed in a grid list. Is it a good idea to pull, from Lucene initially, a list of items (that can be paged) to display to the user in some sort of grid format? The only time we would call the database is when a user selects a specific record to view or update.
My concern is stale data coming from Lucene. I have been looking for information about adds and updates to an index, but it is unclear to me whether my scenario is better suited to a database than to Lucene. The other developers and I have been going back and forth about this, but unfortunately, we don't know enough about how Lucene handles writes and reads.
I'm not sure if it's a good or bad fit for your use case. Hopefully I can give you some insight on how Lucene stores its data, and you can make a decision from that.
Lucene is extremely quick if you want to search for an item in its index. Indexing items isn't as quick. It's by no means slow if you look at everything it's doing, but it adds complexity that you need to know how to handle.
Lucene is essentially a document store. Each item in Lucene is a Document, which can hold a number of fields. Those fields are essentially key/value pairs, though right now Lucene only supports string and byte[] as value types, and strings only as keys. Each field can be indexed and/or analyzed (or neither). Indexing simply means you can search against that field's data, generally only via exact matches and wildcards. Analyzing gives you better searching capabilities, since it takes the string and tokenizes it. Depending on the analyzer, it will tokenize it differently. The most common approach is whitespace plus stopwords: essentially marking each word as a term unless it's something like (a, an, the, as, etc.).
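Roughly, with the Lucene.Net 2.9-era API, building and indexing a document looks like this; the index path and field names are just examples, not anything from your application:

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

var dir = FSDirectory.Open(new DirectoryInfo(@"C:\indexes\users")); // example path
var analyzer = new StandardAnalyzer(Version.LUCENE_29);
var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

var doc = new Document();
// Indexed but not analyzed: good for exact-match keys such as an id.
doc.Add(new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
// Indexed and analyzed: tokenized by the analyzer, good for free-text search.
doc.Add(new Field("name", "Jane Smith", Field.Store.YES, Field.Index.ANALYZED));

writer.AddDocument(doc);
writer.Commit();
```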
The real killer for many use cases is that you can't update a document in an index in place. When you pull out a document to update it and change a field, the call to UpdateDocument() actually marks the old document as deleted and inserts a new document.
Notice I said it marks it as deleted. That introduces another thing related to Lucene indexes: optimization of the index. When you write to an index, every so often a segment of the index is written to disk (it's temporarily stored in RAM for fast indexing). When you run a search on an index, Lucene needs to open all those different segments to find the terms to search against (it has to order them in a way, too). This means that if you have many segments, searching can be slow. A call to Optimize() will not only merge the segments together, it will also remove any documents marked for deletion, thus reducing your index size as well.
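Continuing the sketch above (same writer), the update-and-optimize cycle looks roughly like this; remember that UpdateDocument() is really a delete plus an add:

```csharp
// "Updating" really means: delete whatever matches the term, then add the new document.
var updated = new Document();
updated.Add(new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
updated.Add(new Field("name", "Jane A. Smith", Field.Store.YES, Field.Index.ANALYZED));
writer.UpdateDocument(new Term("id", "42"), updated);

// Merge segments and physically drop documents that were only marked as deleted.
writer.Optimize();
writer.Commit();
```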
However, optimizing your index requires around 1.5x more space while the optimization is being done, sometimes more. Fortunately, Lucene.Net is transactional during an optimization, which means that not only will your index not be corrupted if an optimization fails, but any existing IndexReader you have open will still be able to search and read from the index while you're optimizing it.
In short, if it were me: if you're expecting to get only one result from a search each time, I might not recommend Lucene. Lucene especially shines when you're searching through many documents for many documents. It's an inverted index and it's good at that. For a single lookup, you may be better off with a database. Unfortunately, the only way you'll really find out is to benchmark it. Fortunately, at least Lucene.Net is very easy to set up for something like that.
Also, if you do use Lucene.Net, consider our 2.9.4g branch. You may not be able to use it, since it is technically not release code, but it is a bit faster than normal Lucene, as we've added generics and removed a bit of the costly boxing done in previous versions.
Lucene is not a good fit for the scenario you're describing. You're looking at caching data.
Why not use the ASP.NET cache? If you need a more robust caching solution, there's memcached and a whole host of other ones... even NoSQL stores like Mongo, Redis, etc.
Obviously, you'll need to manually remove items from the cache on updates to stop serving stale data.
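For example, with the ASP.NET cache that can be as simple as the following; the cache key scheme here is purely illustrative:

```csharp
using System.Web;

public static class UserCache
{
    // Call this from the update/save code path so the next read repopulates the cache.
    public static void Invalidate(int userId)
    {
        HttpRuntime.Cache.Remove("user-grid-row:" + userId);
    }
}
```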
I think this is a viable solution, and I say this because there is a major open source content management system that is using a technique very similar to what you've described. It's called Umbraco, and its version 5 is going to be using a customized version of Lucene.NET as a sort of cache.
You can look at the project and source here: http://umbraco.codeplex.com/SourceControl/changeset/view/5a7c9af9bbf9
Say I need to populate 4 or 5 dropdowns with items from a database. Each dropdown will have < 15 items in it. These items almost never change.
Now, I could query the DB each time the page is accessed, or I could grab the values from a custom class that checks whether they already exist in ASP.NET's cache and, only if they don't, queries the DB to update the cache.
It's trivial for me to write, but I'm unsure whether the performance would actually be better. I think it would be (although likely nothing huge).
What do you think?
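For what it's worth, here is a minimal sketch of the kind of cache-checking helper described above, assuming the standard ASP.NET cache; the key, the expiry, and the LoadStatusesFromDb stub are all made up:

```csharp
using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public static class DropdownData
{
    public static IList<string> GetStatuses()
    {
        const string key = "dropdown:statuses";

        var items = HttpRuntime.Cache[key] as IList<string>;
        if (items == null)
        {
            items = LoadStatusesFromDb();          // stand-in for the real DB query
            HttpRuntime.Cache.Insert(
                key, items,
                null,                              // no cache dependency
                Cache.NoAbsoluteExpiration,
                TimeSpan.FromHours(1));            // sliding expiration; tune as needed
        }
        return items;
    }

    private static IList<string> LoadStatusesFromDb()
    {
        // Placeholder for the real data access code.
        return new List<string> { "Active", "Inactive", "Pending" };
    }
}
```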
When dealing with performance issues you should always:
Do things the simplest way first (avoid premature optimisation)
Performance test your code against set performance goals (e.g. 200 ms response time under a load of N concurrent users)
Then, if your code doesn't perform, profile it to determine what is slow, and profile your proposed performance fixes to accurately measure what the real-world performance change will be.
Having said that, then yes, what you are suggesting seems sensible (you would usually expect an in-memory cache to be quicker than a database); however, it also depends on what data is being returned, what the memory load of your application is, how expensive the query is, what the query parameters are, etc.
You should performance test your changes before and after to determine the actual effect of your changes (including things like memory load), and you should only really be doing things like this once you have identified that these dropdowns are the cause of an unacceptable performance problem.
That's what the System.Web.Helpers.WebCache class exists for.
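If you have that helpers assembly referenced, usage looks roughly like this; the key and the stand-in data are made up:

```csharp
using System.Collections.Generic;
using System.Web.Helpers;

public static class CountryList
{
    public static List<string> Get()
    {
        List<string> items = WebCache.Get("dropdown:countries");
        if (items == null)
        {
            items = new List<string> { "Austria", "Belgium", "Croatia" }; // stand-in for the DB query
            WebCache.Set("dropdown:countries", items, minutesToCache: 60, slidingExpiration: false);
        }
        return items;
    }
}
```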
I/O is usually more expensive than memory operations (by orders of magnitude). Especially if your database is on another machine, you would also be using network resources, and it will definitely be faster to just use the cache.
But indeed, optimize only at the end, once you have identified a real performance bottleneck by measuring.
Quick answer to your question:
Use the built-in .NET cache.
Additional points to ponder:
Preferably, retrieve all master data in a single database call (think stored procedure and dataset), though I do not advocate the use of stored procs in all scenarios.
As you rightly said, ensure that your data access layer checks the cache before making a round trip to the database.
Also, as your dropdown values do not change very often, remember to use a long expiry duration.
Finally, based on your page design, you could also look at fragment caching (partial-page caching with user controls), which could give you bigger benefits, since then you access neither the data cache nor the database.
Performance:
Again, performance depends more on the application's load than on your direct round trips for fetching the master data. Put simply, as Thomas suggested, use the cache class!
I have a Dictionary<int, string> cached (for 20 minutes) that has ~120 ID/Name pairs for a reference table. I iterate over this collection when populating dropdown lists and I'm pretty sure this is faster than querying the DB for the full list each time.
My question is more about whether it makes sense to use this cached dictionary when displaying records that have a foreign key into this reference table.
Say this cached reference table is an EmployeeType table. If I were to query and display a list of employee names and types, should I query for EmployeeName and EmployeeTypeID and use my cached dictionary to look up each EmployeeTypeID's name as each record is displayed, or is it faster to just have the DB grab the EmployeeName and JOIN to get the EmployeeType string, bypassing the cached dictionary altogether?
I know both will work, but I'm interested in which will perform faster. Thanks for any help.
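To make the first option concrete, here is a rough sketch; the type and method names are made up, and employeeTypes stands for the cached dictionary:

```csharp
using System.Collections.Generic;

public sealed class EmployeeRow
{
    public string Name { get; set; }
    public int EmployeeTypeId { get; set; }
}

public static class EmployeeDisplay
{
    // Option 1: query only EmployeeName + EmployeeTypeID and resolve the type name
    // from the cached dictionary while rendering each row.
    // (Option 2 would instead be a single SQL JOIN against the EmployeeType table.)
    public static IEnumerable<string> Format(
        IEnumerable<EmployeeRow> employees,
        IDictionary<int, string> employeeTypes)
    {
        foreach (var e in employees)
        {
            string typeName;
            if (!employeeTypes.TryGetValue(e.EmployeeTypeId, out typeName))
            {
                typeName = "Unknown";
            }
            yield return e.Name + " (" + typeName + ")";
        }
    }
}
```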
Optimization 101 says don't do it unless you need to: see Tips for optimizing C#/.NET programs.
But yes, if this really is a totally static lookup for the lifetime of the application AND it takes up very little RAM, then caching it seems fairly harmless, and a Dictionary lookup from RAM will be faster than a trip to the database.
As for the second part, you might as well let the database do the join; it'll probably have that table in RAM already, and the increased network payload would seem small.
But again, if you don't need to do it, don't do it! The danger here is that you do this one, then another, then another; the code grows ever more complex, and RAM fills up with things you think you might need but which are in fact rarely used, leaving less space for the OS/ORM/DB to do its work. Let the compiler, ORM and database decide what to keep in memory instead - they have much bigger teams focused on optimization!
I know you won't like the answer, but common sense dictates that you do the easiest thing first and, if it's too slow, then remedy it.
I'll explain myself. As a matter of fact, if you cache it, it'll probably be faster, since you wouldn't be hitting the database every time you load the page, but the gain might not be noticeable for what you're doing (i.e. you might have some other bottleneck that makes that gain insignificant), defeating the purpose of caching in the first place.
The only way, again, is to do it the easiest way (no caching) and, only if you're not happy with it, then go the extra mile.