which index strategy choose for my web application using hibernate search and very huge database? - glassfish-3

what i have :
1 - big database which is in constant growth.
2 - Web application made in glassfish and hibernate search.
what's the problems :
1 - what the best index strategy to use knowing i tried the manuel index and it takes too long each time .
2 - the database is filled also from out of the web application (second application), then automatic index will not index these data

If the database gets changed via another non Hibernate Search application you need to use manual indexing. Have you tried the mass indexer? There are many parameters you can play with to adjust indexing speed. To avoid full re-indexing every time, I recommend to use some sort of timestamp based approach. At least the indexed entity should have a last update column of some sorts which can be queried to determine the entities which have changed since the last index run. Only these entities need to be indexed.

Related

Performance issue in oracle coherence cache using LikeFilter wildcard search

I have implemented wildcard search using oracle coherence API. When I execute the search on string fields(four fields) using
1) "LikeFilter" with "fIgnoreCase" as true and
2) search text is % patterns(eg: "%test%") and
3) accumulated those using " AnyFilter", and
4) the volume of data in the cache is huge then the searches become very slow.
Applying the standard index does not have any effect on the performance, as it appears that this index works only for exact matches or comparisons.
Is there any special type of index in Coherence for wildcard searches (similar to the new indexes in Oracle TEXT)? If not, is there any other way to improve wildcard query performance on Coherence, with large data sets in the cache?
Please provide code snippet to understand the current solution applied. Also, hope following practices already applied:
Explain plan to see the query performance
Leveraging data-grid wide execution for parallel processing considering volume of data
Also, need information on volume of data (in GB) along with Coherence setup in place (no. of nodes, size of each node) to understand sizing of the cluster.

Doctrine 2 optimistic locking only on changed fields

Is anyone aware of any bundle, or of any plans for Doctrine to implement a new optimistic locking strategy like the one in Telerik framework?
http://docs.telerik.com/data-access/developers-guide/crud-operations/concurrency-control/data-access-tasks-define-model-concurrency-optimistic#checking_for_any_changes
This strategy is great because it's not using version numbers, or timestamps, it compares the old values of the inputs with new values, and only for the changed inputs.
So if I have two different forms updating the same entity, using the current strategies (timestamp or versioning), 2 users will get into conflict even if the do not update the same data.
I know this is a rather old question on the interwebs, so, I'm going to glaze over how this can be done in theory.
The per column lock implies there is a mechanism for detecting per column changes - therefore either the row is hashed (or the columns we are concerned about) and that hash comparison is used to detect/track changes OR we track each column individually.
This tracking would then have to be applied on a per request basis to determine what's changed, when. This is where the normal #Version decorator logic would apply.
Telerik is able to handle the per column changes because the information is compared (array comparisons, most likely) in the browser. This is an isolated state of the data that is not in sync with other browsers/database until update.
Easiest option here is to not allow the two forms to collide (in terms of data) and remove the lock - OR - decouple the forms data from the primary table and allow them to be updated separately (and merged with a sql join).

Is it possible to create a database/table/view alias?

Let's say there is a database owned by someone else called theirdb with a very slow view named slowview. I have an app that queries this view regularly, but, because it takes too long, I want to materialize it to a table within a database that I own (mydb.materializedview).
Is there a way in Teradata to create an alias database object so that I can go like select * from theirdb.slowview, but actually be selecting from mydb.materializedview?
I need to do some rigorous testing against their view, but it's so slow that I hardly have time to test anything. The other option is to edit the code so that it reads from mydb.materializedview, but that is, unfortunately, not an option in this particular case.
Teradata does not allow for you to create aliases or symbolic links between objects.
If the object is fully qualified by database name and view name in the application your options are a little more restricted. You have have to create a backup of their view definition and them place your materialized table in the same database. This would obviously be best done during a planned application outage.
If the object is not fully qualified by database name and view name in the application and relies on a default database setting or application variable you have a little more flexibility. If all the work is done at a view level you can duplicate the environment in another database where you plan to have a materialized version of their slowview. Then by changing the users default database or application variable you can point it at the duplicate environment to complete your testing.
Additionally, you can try to cover (partially or fully) the query that makes up the slowview by using a join index. This allows you to leave the codebase as it is in the application but for queries that can be satisfied by the join index the optimizer will use the join index. Keep in mind that a join index does incur a cost as it is in essence a materialized version of the SQL which was used to construct it. This means additional IO and change management issues have to be taken in to account.
Lastly, you could try to create additional secondary or hash indexes on the objects within the slowview to improve it's performance.

Caching result of SELECT statement for reuse in multiple queries

I have a reasonably complex query to extract the Id field of the results I am interested in based on parameters entered by the user.
After extracting the relevant Ids I am using the resulting set of Ids several times, in separate queries, to extract the actual output record sets I want (by joining to other tables, using aggregate functions, etc).
I would like to avoid running the initial query separately for every set of results I want to return. I imagine my situation is a common pattern so I am interested in what the best approach is.
The database is in MS SQL Server and I am using .NET 3.5.
It would definitely help if the question contained some measurements of the unoptimized solution (data sizes, timings). There is a variety of techniques that could be considered here, some listed in the other answers. I will assume that the reason why you do not want to run the same query repeatedly is performance.
If all the uses of the set of cached IDs consist of joins of the whole set to additional tables, the solution should definitely not involve caching the set of IDs outside of the database. Data should not travel there and back again if you can avoid it.
In some cases (when cursors or extremely complex SQL are not involved) it may be best (even if counterintuitive) to perform no caching and simply join the repetitive SQL to all desired queries. After all, each query needs to be traversed based on one of the joined tables and then the performance depends to a large degree on availability of indexes necessary to join and evaluate all the remaining information quickly.
The most intuitive approach to "caching" the set of IDs within the database is a temporary table (if named #something, it is private to the connection and therefore usable by parallel independent clients; or it can be named ##something and be global). If the table is going to have many records, indexes are necessary. For optimum performance, the index should be a clustered index (only one per table allowed), or be only created after constructing that set, where index creation is slightly faster.
Indexed views are cleary preferable to temporary tables except when the underlying data is read only during the whole process or when you can and want to ignore such updates to keep the whole set of reports consistent as far as the set goes. However, the ability of indexed views to always accurately project the underlying data comes at a cost of slowing down those updates.
One other answer to this question mentions stored procedures. This is largely a way of organizing your code. However, it if you go this way, it is preferable to avoid using temporary tables, because such references to a temporary table prevent pre-compilation of the stored procedure; go for views or indexed views if you can.
Regardless of the approach you choose, do not guess at the performance characteristics and query optimizer behavior. Learn to display query execution plans (within SQL Server Management Studio) and make sure that you see index accesses as opposed to nested loops combining multiple large sets of data; only add indexes that demonstrably and drastically change the performance of your queries. A well chosen index can often change the performance of a query by a factor of 1000, so this is somewhat complex to learn but crucial for success.
And last but not least, make sure you use UPDATE STATISTICS when repopulating the database (and nightly in production), or your query optimizer will not be able to put the indexes you have created to their best uses.
If you are planning to cache the result set in your application code, then ASP.NET has cache, Your Winform will have the object holding the data with it with which you can reuse the data.
If planning to do the same in SQL Server, you might consider using indexed views to find out the Id's. The view will be materialized and hence you can get the results faster. You might even consider using a staging table to hold the id's temporarily.
With SQL Server 2008 you can pass table variables as params to SQL. Just cache the IDs and then pass them as a table variable to the queries that fetch the data. The only caveat of this approach is that you have to predefine the table type as UDT.
http://msdn.microsoft.com/en-us/library/bb510489.aspx
For SQL Server, Microsoft generally recommends using stored procedures whenever practical.
Here are a few of the advantages:
http://blog.sqlauthority.com/2007/04/13/sql-server-stored-procedures-advantages-and-best-advantage/
* Execution plan retention and reuse
* Query auto-parameterization
* Encapsulation of business rules and policies
* Application modularization
* Sharing of application logic between applications
* Access to database objects that is both secure and uniform
* Consistent, safe data modification
* Network bandwidth conservation
* Support for automatic execution at system start-up
* Enhanced hardware and software capabilities
* Improved security
* Reduced development cost and increased reliability
* Centralized security, administration, and maintenance for common routines
It's also worth noting that, unlike other RDBMS vendors (like Oracle, for example), MSSQL automatically caches all execution plans:
http://msdn.microsoft.com/en-us/library/ms973918.aspx
However, for the last couple of versions of SQL Server, execution
plans are cached for all T-SQL batches, regardless of whether or not
they are in a stored procedure
The best approach depends on how often the Id changes, or how often you want to look it up again.
One technique is to simply store the result in the ASP.NET object cache, using the Cache object (also accessible from HttpRuntime.Cache). For example (from a page):
this.Cache["key"] = "value";
There are many possible variations on this theme.
You can use Memcached to cache values in the memory.
As I see there are some .net ports.
How frequently does the data change that you'll be querying? To me, this sounds like a perfect scenario for data warehousing, where you flatting the data for quicker data retrieval and create the tables exactly as your 'DTO' wants to see the data. This method is different than an indexed view in that it's simply a table which will have quick seek operations, and could especially be improved if you setup the indexes properly on the columns that you plan to query
You can create Global temporary Table. Create the table on the fly. Now insert the records as per your request. Access this table in your next request in your joins... for reusability

Create new database programmatically in Asp.Net MVC application?

I have worked on a timesheet application application in MVC 2 for internal use in our company. Now other small companies have showed interest in the application. I hadn't considered this use of the application, but it got me interested in what it might imply.
I believe I could make it work for several clients by modifying the database (Sql Server accessed by Entity Framework model). But I have read some people advocating multiple databases (one for each client).
Intuitively, this feels like a good idea, since I wouldn't risk having the data of various clients mixed up in the same database (which shouldn't happen of course, but what if it did...). But how would a multiple database solution be implemented specifically?
I.e. with a single database I could just have a client register and all the data needed would be added by the application the same way it is now when there's just one client (my own company).
But with a multiple database solution, how would I create a new database programmatically when a user registers? Please note that I have done all database stuff using Linq to Sql, and I am not very familiar with regular SQL programming...
I would really appreciate a clear detailed explanation of how this could be done (as well as input on whether it is a good idea or if a single database would be better for some reason).
EDIT:
I have also seen discussions about the single database alternative, suggesting that you would then add ClientId to each table... But wouldn't that be hard to maintain in the code? I would have to add "where" conditions to a lot of linq queries I assume... And I assume having a ClientId on each table would mean that each table would have need to have a many to one relationship to the Client table? Wouldn't that be a very complex database structure?
As it is right now (without the Client table) I have the following tables (1 -> * designates one to many relationship):
Customer 1 -> * Project 1 -> * Task 1 -> * TimeSegment 1 -> * Employee
Also, Customer has a one to many relationship directly with TimeSegment, for convenience to simplify some queries.
This has worked very well so far. Wouldn't it be possible to simply have a Client table (or UserCompany or whatever one might call it) with a one to many relationship with Customer table? Wouldn't the data integrity be sufficient for the other tables since the rest is handled by the relationships?
as far as whether or not to use a single database or multiple databases, it really all depends on the use cases. more databases means more management needs, potentially more diskspace needs, etc. there are alot more things to consider here than just how to create the database, such as how will you automate the backup process creation, etc. i personally would use one database with a good authentication system that would filter the data to the appropriate client.
as to creating a database, check out this blog post. it describes how to use SMO (sql management objects) in c#.net to create a database. they are a really neat tool, and you'll definitely want to familiarize yourself with them.
to deal with the follow up question, yes, a single, top level relationship between clients and customers should be enough to limit the new customers to their appropriate data.
without any real knowledge about your application i can't say how complex adding that table will be, but assuming your data layer is up to snuff, i would assume you'd really only need to limit the customers class by the current client, and then get all the rest of your data based on the customers that are available.
did that make any sense?
See my answer here, it applies to your case as well: c# database architecture

Resources