MariaDB Engine-Independent Table Statistics after upgrading from 10.3 to 10.5

We upgraded MariaDB from 10.3 to 10.5, and some EXPLAIN plans changed for the worse.
According to the documentation at https://mariadb.com/kb/en/engine-independent-table-statistics/,
use_stat_tables='preferably_for_queries'
is the default configuration. Our engine-independent statistics tables are empty, so I expected the optimizer to fall back to the InnoDB statistics, but if I change the setting to never or complementary_for_queries I get a different (usually better) execution plan. That surprised me.
Next, if I want to use engine-independent statistics, I need to collect them for some (or all) tables with ANALYZE TABLE tbl PERSISTENT FOR ALL;
But that freezes the database, so I understand I could run it on a slave and copy the resulting statistics over (not very friendly with varbinary and blob columns).
And finally, I don't know what the explain plans will look like afterwards.
In fact I did try it, and it did not give me a better explain plan (the same plan as with empty statistics tables) with use_stat_tables='preferably_for_queries', even after FLUSH TABLES.
So I'm stuck with use_stat_tables='never'. Do I actually need engine-independent statistics, and if so, how do I set them up?
Update:
With the engine-independent statistics tables empty, why do I get different explain plans when I switch from use_stat_tables='never' to use_stat_tables='preferably_for_queries'? Is it a bug?
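For reference, here is a minimal sketch of collecting and inspecting engine-independent statistics; the table orders, its columns and its index are hypothetical, and the statistics themselves live in mysql.table_stats, mysql.column_stats and mysql.index_stats:
-- Collect engine-independent statistics for one table (this is the step that can block the server)
ANALYZE TABLE orders PERSISTENT FOR ALL;
-- Or limit collection to the columns/indexes that actually matter, to shorten the run
ANALYZE TABLE orders PERSISTENT FOR COLUMNS (customer_id, created_at) INDEXES (idx_customer);
-- Verify that the statistics tables are now populated
SELECT * FROM mysql.table_stats WHERE table_name = 'orders';
SELECT * FROM mysql.column_stats WHERE table_name = 'orders';
-- Tell the optimizer to prefer these statistics, and check the related selectivity setting
SET GLOBAL use_stat_tables = 'preferably_for_queries';
SHOW GLOBAL VARIABLES LIKE 'optimizer_use_condition_selectivity';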

Related

Performance issue in oracle coherence cache using LikeFilter wildcard search

I have implemented wildcard search using the Oracle Coherence API. When I execute the search on four string fields using
1) LikeFilter with fIgnoreCase set to true,
2) search text made of % patterns (e.g. "%test%"),
3) the filters accumulated with an AnyFilter, and
4) a huge volume of data in the cache,
the searches become very slow.
Applying the standard index does not have any effect on the performance, as it appears that this index works only for exact matches or comparisons.
Is there any special type of index in Coherence for wildcard searches (similar to the new indexes in Oracle TEXT)? If not, is there any other way to improve wildcard query performance on Coherence, with large data sets in the cache?
Please provide a code snippet so we can understand the current solution. Also, I hope the following practices are already applied:
Using the explain plan to see the query performance
Leveraging grid-wide execution for parallel processing, given the volume of data
Also, we need information on the volume of data (in GB) along with the Coherence setup in place (number of nodes, size of each node) to understand the sizing of the cluster.

How well does Teradata deal with Foreign Keys?

I'm starting a new project and one of the requirements is to use Teradata. I'm proficient in many different database systems but Teradata is fairly new to me.
On the client end they have removed all foreign keys from their database under the recommendations of "a consultant".
Every part of me cringes.
I'm using a new database instance so I'm not constrained by what they've already done on other databases. I haven't been explicitly told not to use foreign keys and my relation with the customer is such that they will at the very least hear me out. However, my decision and case should be well-informed.
Is there any intrinsic, technological reason that I should not use FKs in Teradata to maintain referential integrity, based upon Teradata's design, performance, side-effects, etc.?
Of note, I'm accessing Teradata using the .Net Data Provider v16 which only supports up to EF5.
Assuming that the new project is implementing a Data Warehouse, there's a simple reason (and this is true for any DWH, not only Teradata): a DWH is not the same as an OLTP system.
Of course you still have Primary & Foreign Keys in the logical data model, but they may not be implemented in the physical model (although Teradata supports them). There are several reasons:
Data is usually loaded in batches into a DWH, and both PKs & FKs must be validated by the loading process before any Insert/Update/Delete. Otherwise you load 1,000,000 rows and a single row fails the constraints: now you get a rollback and an error message, and good luck finding the bad data. But when all the validation is already done during the load, there's no reason to do the same checks a second time within the database.
Some tables in the DWH will be Slowly Changing Dimensions, and there's no way to define a PK/FK on those using standard SQL syntax; you need something like TableA.column references TableB.column and TableA.Timestamp between TableB.ValidFrom and TableB.ValidTo (it is possible if you create a temporal table).
Sometimes a table is recreated or reloaded from scratch, which is hard to do if there's a FK referencing it.
Some PKs are never used for any access/join, so why implement them physically? It's just a huge overhead in CPU/IO/storage.
Knowledge about PKs/FKs is still important for the optimizer, so there's a so-called Soft Foreign Key (REFERENCES WITH NO CHECK OPTION), which is a kind of dummy: it is applied during optimization but never actually checked by the DBMS (it's like telling the optimizer "trust me, it's correct"). A sketch follows below.
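As a rough illustration, a soft foreign key in Teradata DDL looks something like the following; the fact/dimension tables and column names are made up, and it assumes customer_id is the unique primary index of dim_customer:
-- The optimizer can use this relationship for join planning,
-- but the constraint is never enforced on load
CREATE TABLE fact_sales (
    sale_id     INTEGER NOT NULL,
    customer_id INTEGER NOT NULL,
    amount      DECIMAL(12,2),
    CONSTRAINT fk_sales_customer
        FOREIGN KEY (customer_id)
        REFERENCES WITH NO CHECK OPTION dim_customer (customer_id)
) PRIMARY INDEX (sale_id);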

Simulate records in database without entering any

I've nearly finished the development of a project and would like to test its performance, especially the database query calls. I'm using Linq to SQL to search by usernames, but I've only got around 10 'users' in my database, so I can't really get a decent speed reading. How can I simulate thousands/millions of users in the database without actually creating new records? I've read about Selenium, but it seems that is better suited to repeated actions (simulating concurrent users?). Are there any other tools I should look into, or are there any options in VS 2008 (Professional Edition)?
Thanks
You can "trick" SQL Server into thinking there are more records than there actually are in a table using the approach outlined in this article. See the section on False SQL Server Statistics
e.g.
UPDATE STATISTICS TableName WITH ROWCOUNT=100000
will create statistics for the table as if it had 100000 rows in it. You can then see what effect this has on the execution plan. But note that this is undocumented functionality, so it may give quirky behaviour.
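For example, the following sketch (against a hypothetical dbo.Users table, and only on a test database) fakes the row and page counts, lets you compare the plan, and then restores the real counts:
-- Pretend the table holds a million rows across 10,000 pages (undocumented options)
UPDATE STATISTICS dbo.Users WITH ROWCOUNT = 1000000, PAGECOUNT = 10000;
-- ...inspect the execution plan of the Linq-generated query here...
-- Put the real row/page counts back and refresh normal statistics
DBCC UPDATEUSAGE (0, 'dbo.Users') WITH COUNT_ROWS;
UPDATE STATISTICS dbo.Users;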
You could just populate your table with sample data. There are various tools available to help with that, like Red Gate's SQL Data Generator. I prefer actually having large data volumes, as I think that will be more accurate.

SQL Server hosting only offers 1GB databases. How do I split my data up?

Using ASP.NET and Windows Stack.
Purpose:
I've got a website that takes in over 1GB of data about every 6 months, so as you can tell my database can become huge.
Problem:
Most hosting providers only offer databases in 1GB increments. This means that every time I go over another 1GB, I will need to create another database. I have absolutely no experience with this type of setup and I'm looking for some advice on what to do.
Wondering:
Do I move the membership stuff over to a separate database? This still won't solve much because of the size of the other data I have.
Do I archive data into another database? If I do, how do I allow users to access it?
If I split the data between two databases, do I name the tables the same?
I query all my data with LINQ. So establishing a few different connections wouldn't be a horrible thing.
Is there a hosting provider that anyone knows of that can scale their databases?
I just want to know what to do. How can I solve this dilemma? So far I don't have the advertising dollars coming in to spend more than $50 a month...
While http://www.ultimahosts.net/windows/vps/ seems to offer the best solution for the best price, they still split the databases up. So where do I go from here?
Again, I am a total amateur with multiple databases. I've only used one at a time.
I'd be genuinely surprised if they actually impose a hard 1GB per DB limit and create a new one for each additional GB, but on the assumption that that actually is the case -
Designate a particular database as your master database. This is the only one your app will directly connect to.
Create a clone of all the tables you'll need in your second (and third, fourth etc) databases.
Within your master database, create a view that does a UNION on the tables as a cross-DB query - SELECT * FROM Master..TableName UNION SELECT * FROM DB2..TableName UNION SELECT * FROM DB3..TableName
For writing, you'll need to use sprocs to locate the relevant records and update them, but you shouldn't have a major problem there. In principle you could extend the view above to return which DB the record was in if you wanted.
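A sketch of that view, assuming the databases live on the same server and a hypothetical Orders table exists with an identical definition in each; UNION ALL is used here because the rows are distinct across databases and it avoids the duplicate-elimination cost of plain UNION:
-- Run this in the master database; DB2 and DB3 hold the overflow rows
CREATE VIEW dbo.AllOrders AS
    SELECT * FROM dbo.Orders
    UNION ALL
    SELECT * FROM DB2.dbo.Orders
    UNION ALL
    SELECT * FROM DB3.dbo.Orders;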
Answering this question is very hard, for it requires knowing at least some basic facts about the data model, the way the data is queried, etc. Also, as suggested by rexem, a better understanding of the usage model may allow using normalization to limit the growth (and it may also allow introducing compression, if applicable).
I'm more puzzled by the general approach and business model (and I do understand the need to keep costs down with a startup application based on ad revenues). Wouldn't you be able to contract an amount of storage that will fit your needs for the next 6 months and then, when you start outgrowing this space, purchase additional storage (for an extra 6 months/year; by then you may be "rich")? Such a move may not even require anything on your end (depending on the way the hosting service manages its racks etc.), or at worst may require you to copy the old database to the new (bigger) storage.
In this fashion, you wouldn't need to split the database in any artificial way, and you could focus on customer-oriented features rather than on optimizing queries that need to compile info from multiple servers.
I believe the solution is much simpler than that: even if your provider manages databases in 1 GB units, it does not mean that you end up with N databases of 1 GB each; it means that once you reach 1 GB the database can be increased to 2 GB, 3 GB and so on...
Regards
Massimo
You would have multiple questions to answer:
It seems the current hosting provider cannot be very reliable if it works the way you say: creating a new database every time the initial one grows past 1GB sounds strange... at the very least they should increase the storage for the current db and notify you that you'll be charged more... Find other hosting solutions with better options...
Is there any information in your current DB that could be archived? That's a very important question, since you may be carrying "useless" data that could be archived into separate databases and queried only on special requests. As other colleagues told you already, that is difficult for us to evaluate since we do not know the data model.
Can you split the data model into two totally different stores and only replicate the common information between them? You could use SQL Server Replication (http://technet.microsoft.com/en-us/library/ms151198.aspx) to maintain the same membership information between the databases.
If the data model cannot be split, then I do not see any practical way to use multiple databases - just find a bigger storage solution.
You may want to look for a better hosting provider.
Even SQL Express supports a 4GB database, and it's free. Some hosts don't like using SQL Express in a shared environment, but disk space is so cheap these days that finding a plan that starts at or grows in chunks of more than 1GB should be pretty easy.
You should go for a Windows VPS solution. Most Windows VPS providers offer SQL 2008 Web Edition, which can support up to 10 GB of database space...

Profiling SQL Server and/or ASP.NET

How would one go about profiling a few queries that are being run from an ASP.NET application? There is some software where I work that runs extremely slow because of the database (I think). The tables have indexes but it still drags because it's working with so much data. How can I profile to see where I can make a few minor improvements that will hopefully lead to larger speed improvements?
Edit: I'd like to add that the webserver likes to timeout during these long queries.
SQL Server has some excellent tools to help you with this situation. These tools are built into Management Studio (which used to be called Enterprise Manager + Query Analyzer).
Use SQL Profiler to show you the actual queries coming from the web application.
Copy each of the problem queries out (the ones that eat up lots of CPU time or IO). Run the queries with "Display Actual Execution Plan". Hopefully you will see some obvious index that is missing.
You can also run the tuning wizard (the button is right next to "Display Actual Execution Plan"). It will run the query and make suggestions.
Usually, if you already have indexes and queries are still running slow, you will need to re-write the queries in a different way.
Keeping all of your queries in stored procedures makes this job much easier.
To profile SQL Server, use the SQL Profiler.
And you can use ANTS Profiler from Red Gate to profile your code.
Another .NET profiler which plays nicely with ASP.NET is dotTrace. I have personally used it and found lots of bottlenecks in my code.
I believe you have the answer you need to profile the queries. However, this is the easiest part of performance tuning. Once you know it is the queries and not the network or the app, how do you find and fix the problem?
Performance tuning is a complex thing. But there are some places to look at first. You say you are returning lots of data? Are you returning more data than you need? Are you really returning only the columns and records you need? Returning 100 columns by using select * can be much slower than returning the 5 columns you are actually using.
Are your indexes and statistics up to date? Look up how to update statistics and re-index in BOL if you haven't done this in a while. Do you have indexes on all the join fields? How about the fields in the where clause?
Have you used a cursor? Have you used subqueries? How about UNION - if you are using it, can it be changed to UNION ALL?
Are your queries sargable? (Google the term if it's unfamiliar; there is an example after this list.)
Are you using distinct when you could use group by?
Are you getting locks?
There are many other things to look at; these are just a starting place.
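As one illustration of the sargability point, with made-up table and column names:
-- Non-sargable: wrapping the column in a function prevents an index seek on OrderDate
SELECT OrderID FROM dbo.Orders WHERE YEAR(OrderDate) = 2009;
-- Sargable rewrite: a plain range predicate lets the optimizer seek the index
SELECT OrderID FROM dbo.Orders
WHERE OrderDate >= '20090101' AND OrderDate < '20100101';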
If there is a particular query or stored procedure I want to tune, I have found turning on statistics before the query to be very useful:
SET STATISTICS TIME ON
SET STATISTICS IO ON
When you turn on statistics in Query Analyzer, the statistics are shown in the Messages tab of the Results pane.
IO statistics have been particularly useful for me, because it lets me know if I might need an index. If I see a high read count from the IO statistics, I might try adding different indexes to the affected tables. As I try an index, I run the query again to see if the read count has gone down. After a few iterations, I can usually find the best index(es) for the tables involved.
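A typical round trip looks like this; the query itself is just a stand-in for whichever statement you are tuning, and the table names are hypothetical:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
-- the statement under investigation
SELECT u.UserName, COUNT(*) AS OrderCount
FROM dbo.Users AS u
JOIN dbo.Orders AS o ON o.UserId = u.UserId
GROUP BY u.UserName;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;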
Here are links to MSDN for these statistics commands:
SET STATISTICS TIME
SET STATISTICS IO
