Does Vertica DBD take inner queries into account when configuring projections?

I created a COMPREHENSIVE design using a list of 80-85 queries. Most of them are very large, around 300-400 lines each, and most contain many inner queries. My question is: does DBD also take the inner queries into account when creating projections? The explain plans for most of the inner queries seem to suggest that it does.

It does take into account queries with joins, in the sense that it tries to look for a common key. After DBD is finished with the design, it's always a good idea to review the design before deploying.
The best way to optimize for joins is by using primary & foreign keys, and possibly pre-join projections.
Another approach is to look at your actual schema design, since the goal should be to have joins performed locally. You may want to replicate smaller tables or have a single, very large table replicated. This will still allow joins to happen locally.
Some articles which may help in optimizing joins:
Join Operator Overview
Pre-join Projections Overview

Related

Does DynamoDB GSI overloading give performance benefits or just flexibility

Does GSI Overloading provide any performance benefits, e.g. by allowing cached partition keys to be more efficiently routed? Or is it mostly about preventing you from running out of GSIs? Or maybe opening up other query patterns that might not be so immediately obvious.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html
For example, if you have a base table and you want to partition it so you can query a specific attribute (which becomes the PK of the GSI) over two dimensions, does it make any difference whether you create 1 overloaded GSI or 2 non-overloaded GSIs?
For an example of what I'm referring to see the attached image:
https://drive.google.com/file/d/1fsI50oUOFIx-CFp7zcYMij7KQc5hJGIa/view?usp=sharing
The base table has documents which can be in a published or draft state. Each document is owned by a single user. I want to be able to query by user to find:
Published documents by date
Draft documents by date
I'm asking in relation to the more recent DynamoDB best practice that implies that all applications only require one table. Some of the techniques being shown in this documentation show how a reasonably complex relational model can be squashed into 1 DynamoDB table and 2 GSIs and yet still support 10-15 query patterns.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-relational-modeling.html
I'm trying to understand why someone would go down this route as it seems incredibly complicated.
The idea – in a nutshell – is to not have the overhead of doing joins on the database layer or having to go back to the database to effectively try to do the join on the application layer. By having the data sliced already in the format that your application requires, all you really need to do is basically do one select * from table where x = y call which returns multiple entities in one call (in your example that could be Users and Documents). This means that it will be extremely efficient and scalable on the db level. But also means that you'll be less flexible as you need to know the access patterns in advance and model your data accordingly.
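As a rough illustration of that idea, here is a minimal sketch in plain Python that mimics one table holding several entity types under the same partition key. It makes no AWS calls; the list `TABLE`, the `query` helper, and the `PK`/`SK` value layout are all hypothetical stand-ins chosen for this example, not anything prescribed by DynamoDB.

```python
# In-memory stand-in for a single DynamoDB table; no AWS calls are made.
# The key layout (USER#..., DOC#PUBLISHED#<date>) is a hypothetical illustration.
TABLE = [
    {"PK": "USER#1", "SK": "PROFILE", "name": "Ann"},
    {"PK": "USER#1", "SK": "DOC#DRAFT#2019-03-02", "title": "Notes"},
    {"PK": "USER#1", "SK": "DOC#PUBLISHED#2019-01-15", "title": "Intro"},
    {"PK": "USER#1", "SK": "DOC#PUBLISHED#2019-02-20", "title": "Guide"},
    {"PK": "USER#2", "SK": "PROFILE", "name": "Bob"},
]


def query(pk, sk_prefix=""):
    """Mimic a key-condition query: exact partition key, optional sort-key prefix."""
    rows = [
        item for item in TABLE
        if item["PK"] == pk and item["SK"].startswith(sk_prefix)
    ]
    # Results come back ordered by sort key, so documents are ordered by date.
    return sorted(rows, key=lambda item: item["SK"])


# One call returns the user profile and all of that user's documents.
everything_for_user = query("USER#1")

# The same key, narrowed by prefix, gives published documents by date.
published = query("USER#1", "DOC#PUBLISHED#")
```

Because the state and the date are both encoded in the sort key, "published documents by date" and "draft documents by date" are served by the same key structure, which is the flexibility (rather than raw speed) that overloading is after.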
See Rick Houlihan's excellent talk on this https://www.youtube.com/watch?v=HaEPXoXVf2k for why you'd want to do this.
I don't think it has any performance benefits, at least none that's not called out – which makes sense since it's the same query and storage engine.
That being said, I think there are some practical reasons for why you'd want to go with a single table as it allows you to keep your infrastructure somewhat simple: you don't have to keep track of metrics and/or provisioning settings for separate tables.
My opinion would be cost of storage and provisioned throughput.
Apart from that, I'm not sure it matters much with the new limit of 20 GSIs per table.

Doctrine Query performance

I am pretty new to Doctrine and I have a question about DQL performance. I work for a company with a pretty big database, and to get the data I want I have to write queries with multiple JOINs and WHERE conditions.
My question is:
Is it possible to write "better performing" Queries? An example query would be:
SELECT u
FROM User u
JOIN u.tags t
JOIN u.profile p
JOIN p.picture pic
JOIN u.city c
WHERE p.Finished = 1
AND pic.Active = 1
AND u.created < :eDate
AND u.finished = :sDate
AND (t = :tag1 OR t = :tag2)
ORDER BY u.name ASC
This is just an example of a query. Some are much longer and have much more JOINS.
I googled a little, but the best I could find was "5 Doctrine ORM Performance Traps You Should Avoid".
Number 2 in that article says that you should split the query and fetch each piece of data yourself. Can someone explain what this would look like for the example above?
Also, is there anything else I can do with the query to boost performance?
Thank you !
This is a superbroad question and it won't be possible to answer it "correctly". So this is what you should do when you are concerned about Doctrine performance:
1. Doctrine ORM isn't meant to be used for reading or manipulating large datasets. It's an "object" manager, so you read/update/delete objects (or a list of objects), but you shouldn't read or update thousands of items at once. Use Doctrine ODM or native SQL for that.
   From Doctrine's documentation: "An ORM tool is not primarily well-suited for mass inserts, updates or deletions. Every RDBMS has its own, most effective way of dealing with such operations"
2. No one will be able to tell you where the "edge" is, so the only thing you can do is: Monitor! Log duration, memory usage etc. and check whether you have an issue at all, or when you start to get one. Additionally, this will help you identify whether Doctrine is part of your issue if you face one.
3. One option we do use to avoid overly complex joins and not lose too much performance (especially when hydrating objects with Doctrine) is to run single queries and fill our objects / relations with the results. BUT see points 1 and 2, AND note that such an approach contradicts a certain amount of Doctrine's purpose.
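To make the "single queries, then fill the relations yourself" idea concrete, here is a minimal sketch. It uses Python with an in-memory SQLite database rather than Doctrine, and the `users` / `user_tag` tables and their columns are hypothetical, so treat it as an illustration of the pattern only, not of Doctrine's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_tag (user_id INTEGER, tag TEXT);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO user_tag VALUES (1, 'php'), (1, 'sql'), (2, 'php'), (3, 'go');
""")

# Query 1: find the matching users only; no related data is loaded yet.
users = {
    user_id: {"name": name, "tags": []}
    for user_id, name in conn.execute(
        "SELECT DISTINCT u.id, u.name FROM users u "
        "JOIN user_tag t ON t.user_id = u.id "
        "WHERE t.tag IN (?, ?) ORDER BY u.name",
        ("php", "sql"),
    )
}

# Query 2: fetch the tags for just those users and attach them in application code.
placeholders = ",".join("?" * len(users))
for user_id, tag in conn.execute(
    f"SELECT user_id, tag FROM user_tag "
    f"WHERE user_id IN ({placeholders}) ORDER BY tag",
    tuple(users),
):
    users[user_id]["tags"].append(tag)
```

Each query stays simple and returns one row per entity instead of a wide, multiplied join result; the price is that the stitching logic now lives in your own code.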
Summary: Try it, monitor it - and only if you have an issue: check what you can do specifically to tackle your issue.
If you need any help for a "real" issue please come back.

SQL. Re: inner joins & foreign keys

Good day.
I have a basic question on SQL and table structure.
What we have now: 17 tables. These tables include 1 admin table. The other 13 tables are all branched off 3 "main" tables: customers, CareWorkers, Staff.
If I want to adhere to the ACID ideology, I want to create tables that each house unique information.
My question, and what I'm trying to wrap my head around, is this: when I create each of these "nested-deeper" (not sure what to call them) tables, I simply use an inner join statement to grab the foreign key in my ASP.NET app, correct?
First, inner join is how you get your tables "back together", and @SpectralGhost's example is how you do it. But you might want to consider doing it in the database rather than in your ASP code. The way you do that is with views. If you create a view (the syntax is CREATE VIEW and there are plenty of examples out there) then you can make the database schema as complex as you need to without making it hard to use in your ASP application. You can even make views updatable (you define an "INSTEAD OF" trigger; again, there are many examples if you search).
But you probably don't want to update a view, or a table, directly from your ASP code. You probably want to define stored procedures that update your data, and call those from your ASP code. This allows you to restrict access to your tables and views to read-only and force any writes to come through a stored procedure that you control. As long as the procedure treats its parameters as values and does not build dynamic SQL from them, this guards against SQL injection and makes your ASP application much more secure. If the service account that your ASP page's application pool runs under can pass raw queries to the database, then any compromise can do tremendous damage to your database. If all it can do is execute a stored procedure where the parameters can be changed but not the functionality, an attacker can only put some junk values in, or maybe not even that if you range-check well.
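The "parameters can change, functionality cannot" point can be shown with a small sketch. SQLite has no stored procedures, so this Python example uses a bound parameter as a stand-in for the same principle; the `customers` table and the input string are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customers (name) VALUES ('Ann'), ('Bob');
""")

malicious = "x' OR '1'='1"

# Unsafe: the input is spliced into the SQL text and rewrites the WHERE clause.
unsafe_rows = conn.execute(
    "SELECT name FROM customers WHERE name = '" + malicious + "'"
).fetchall()

# Safe: the input is bound as a value and cannot alter the statement itself.
safe_rows = conn.execute(
    "SELECT name FROM customers WHERE name = ?", (malicious,)
).fetchall()
```

The concatenated query returns every customer, while the bound version simply finds no customer with that odd name. A stored procedure that uses its parameters only as values gives the same guarantee on the database side.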
The last bit of advice is that you are not preserving "ACID", you are preserving "NORMALIZED". It's definitely a tough concept to wrap your head around, here's a resource that helped me out a great deal when I was starting out. http://www.marcrettig.com/data-normalization-poster/ I still have a copy on my wall. You shouldn't obsess over normalization, but you should definitely keep it in mind and stick to it when you reasonably can. Again, there are numerous resources a search will get you, but the basic benefit is a normalized database is much more resistant to consistency problems, and is more storage efficient. And since disk IO is slow, storage efficient is usually query efficient too.
They are related tables. You should have at least one table with a primary key, and often several others that relate back to that table through a foreign key.
TableOne
    TableOneID
TableTwo
    TableTwoID
    TableOneID
TableTwo relates to TableOne via TableOneID. An inner join would give you where there are records in both tables based on your join. Example:
SELECT *
FROM TableOne t1
INNER JOIN TableTwo t2 ON t1.TableOneID=t2.TableOneID
Specifically how to do this in your application depends on your design. If you are using an ORM, then actual SQL is not terribly important. If you are using stored procedures, then it is.
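For a runnable version of the two-table example, here is a sketch using Python's built-in `sqlite3` module instead of SQL Server. The `Name` and `Detail` columns and the sample rows are assumptions added so there is something to select.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE TableOne (TableOneID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE TableTwo (
        TableTwoID INTEGER PRIMARY KEY,
        TableOneID INTEGER NOT NULL REFERENCES TableOne(TableOneID),
        Detail TEXT
    );
    INSERT INTO TableOne VALUES (1, 'first'), (2, 'second');
    INSERT INTO TableTwo VALUES (10, 1, 'a'), (11, 1, 'b');
""")

# The inner join keeps only rows that have a match in both tables.
rows = conn.execute("""
    SELECT t1.Name, t2.Detail
    FROM TableOne t1
    INNER JOIN TableTwo t2 ON t1.TableOneID = t2.TableOneID
    ORDER BY t2.TableTwoID
""").fetchall()
```

The row named 'second' has no matching rows in TableTwo, so it does not appear in the result; a LEFT JOIN would be needed to keep it.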

How can I speed up search/browse/filter with 10 M products?

Background:
I'm using SQL Server 2008 and ASP.NET 4 on Windows 2008
I have one table with about 10 million rows of products that I make available online for users to browse -- not search. Each of the 10 million products has extra attributes -- like categories -- that I keep in lookup tables -- there are three or four lookup tables.
Problem
When someone browses and starts using filters (shipping location, price, quality, brand), I need to join the tables, apply all the filters, and return the results. It's very slow and I want to make it faster. Sometimes users will apply a very broad filter, resulting in 800,000 results, and though I only return the first 10 of those for browsing, I still need to run the query for the full 800,000.
What I've Tried Already
I've joined all the information from the various tables into one physical table and then created a covering index for the table.
The queries are much faster, but there is a good bit of maintenance I have to do on the table behind the scenes with jobs to make sure if something goes out of stock I take it out within a reasonable time frame (5 mins or so).
I don't use materialized/indexed views b/c I've got aggregates in the results which SQL Server doesn't seem to like.
Question
How can I speed up browse results beyond the indexing and table optimization that I've already done? I'm not doing any full-text searches -- I'm filtering with exact parameters.
Possible Solutions I've Thought Of
Large caching solution -- AppFabric or MemCached. I know next to nothing about these and don't know whether they are appropriate.
Small caching solution -- Maybe leveraging ASP.NET caching -- but every person is going to apply different filters so I'm not sure how much this will give me.
SSDs -- as a larger-scale solution I've thought about getting SSDs but that will be down the road
CDN -- I don't think a CDN will help b/c the bottleneck here is my database's search capabilities, not the bandwidth/distance to the requester.
I had a similar problem with a complex join query causing horrible response times. I was able to solve it via using Lucene.NET. It's a .NET implementation of the Lucene search index. Basically, you build indexes on data fields (your categories) and then you can search via those categories and return thousands of rows very quickly. Basically, it takes the join operation out of the equation because it already knows, via the indexes, which records fit your criteria.
The following is a very good article on Lucene.NET. I highly recommend it. It took a search result that was taking 20 seconds using standard joins and reduced the response time to less than a second.
http://www.codeproject.com/Articles/29755/Introducing-Lucene-Net
Also, feel free to ping me if you have specific Lucene.NET implementation questions. I just got through a lot of research/learning in order to implement it properly on my site, so if you have specific questions on how to make it work I may be able to help with that as well.
"I perform the full query b/c I need to populate the new filters and
the number of results along with the search results. For example, if
someone filters on category of "Shoes", and location of TX, some of
the other filters are going to be restricted based on the previous
filter."
Try executing two queries: One to count all results and one to select the top N. Maybe your bottleneck is copying 800,000 rows to the client. Doing two queries would fix this at the cost of an additional query. The cost is likely to be less than 2x though due to optimizations for few rows and for count-only queries.
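A minimal sketch of the two-query approach, using Python with an in-memory SQLite database as a stand-in for SQL Server (where `TOP 10` would take the place of `LIMIT 10`). The `products` table, its columns, and the generated data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO products (category, price) VALUES (?, ?)",
    [("Shoes" if i % 2 == 0 else "Hats", float(i)) for i in range(1000)],
)

# Query 1: count only, so no product rows are shipped to the client.
(total,) = conn.execute(
    "SELECT COUNT(*) FROM products WHERE category = ?", ("Shoes",)
).fetchone()

# Query 2: just the first page of results.
first_page = conn.execute(
    "SELECT id, price FROM products WHERE category = ? ORDER BY price LIMIT 10",
    ("Shoes",),
).fetchall()
```

The page shows ten rows and the "500 results" label comes from the count, so the application never has to receive the full matching set.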

SQL Server 2005 - Select From Multiple DB's & Compile Results as Single Query

Ok, the basic situation: Due to a few mixed up starts, a project ends up with not one, but three separate databases, each containing a portion of the overall project data. All three databases are the same, it's just that, say 10% of the project was run into the first, then a new DB was made due to a code update and 15% of the project was run into the new one, then another code change required another new database for the rest of the project. Again, the pertinent tables are all exactly the same across all three databases.
Now, assume I wanted to take all three of those databases - bearing in mind that they can't just be merged into a single database due to primary key issues and so on - and run a single query that would look through all three of them, select a given set of data from each, then combine those three sets into one single result and return it to the reporting page I'm working on.
For reference, at its endpoint the data is output to an ASP.Net/VB.Net backed page, specifically a Gridview object. It doesn't need to be edited, fortunately, just displayed.
What would be the best way to approach this mess? I'm thinking that creating a temporary table would be my best bet, but honestly I'm stepping into a portion of SQL that I'm not familiar with here, and would appreciate any guidance somebody more experienced might have.
I'd say your best bet is to suck it up and combine the databases, even if it is a major pain to combine the primary keys. It may be a major pain now, but it is going to be 10x as painful over the life of the project.
You can do a union across multiple databases as Scott has pointed out, but you are in for a world of trouble as the application gets more complex. For example, even if you circumvent the technical limitations by having multiple tables/databases for the same entity, having duplicates in the PK for a logical entity is a world of trouble.
Implement the workaround solution if you must, but I guarantee you will hate yourself for it later.
Why not just use 3 part naming on the tables and union them all together?
select db1.dbo.Table1.Field1,
db1.dbo.Table1.Field2
from db1.dbo.Table1
UNION
select db2.dbo.Table1.Field1,
db2.dbo.Table1.Field2
from db2.dbo.Table1
UNION
select db3.dbo.Table1.Field1,
db3.dbo.Table1.Field2
from db3.dbo.Table1
-- where ...
-- order by ...
You should create what is called a Partitioned View for each of your tables of interest. These views do a union of the underlying base tables and can add a synthetic column to make the rows unique:
CREATE VIEW vTableXDB
AS
SELECT 'DB1' as db_key, *
FROM DB1.dbo.table
UNION ALL
SELECT 'DB2' as db_key, *
FROM DB2.dbo.table
UNION ALL
SELECT 'DB3' as db_key, *
FROM DB3.dbo.table;
You create one such view for each table and then design your reports on these views, not on the base tables. You must add the db_key to your join conditions. The query optimizer has some understanding of partitioned views and might be able to create plans that do the right thing and avoid joins that span multiple databases, but that is not guaranteed. If things go haywire and the optimizer does not recognize the partitioning, resulting in very bad execution times, you may have to move the db_key into the tables themselves and add some artificial check constraints on the base tables so that the optimizer can understand the partitioning (see the article I linked for details).
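The db_key idea can be tried out without SQL Server. The sketch below uses Python's `sqlite3` module and three attached in-memory databases as stand-ins for DB1, DB2 and DB3; the `orders` table and the view name `v_orders` are hypothetical. In SQLite the view has to be a TEMP view to reference other attached databases, which is a detail of this stand-in rather than of the original advice.

```python
import sqlite3

# isolation_level=None keeps the connection in autocommit mode,
# so ATTACH never runs inside an implicit transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)

for name in ("db1", "db2", "db3"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")
    conn.execute(f"CREATE TABLE {name}.orders (id INTEGER PRIMARY KEY, item TEXT)")
    # Every database reuses primary key 1, which is the clash described above.
    conn.execute(f"INSERT INTO {name}.orders VALUES (1, ?)", (f"item from {name}",))

conn.execute("""
    CREATE TEMP VIEW v_orders AS
    SELECT 'DB1' AS db_key, id, item FROM db1.orders
    UNION ALL
    SELECT 'DB2' AS db_key, id, item FROM db2.orders
    UNION ALL
    SELECT 'DB3' AS db_key, id, item FROM db3.orders
""")

rows = conn.execute(
    "SELECT db_key, id FROM v_orders ORDER BY db_key"
).fetchall()
```

The pair (db_key, id) is unique even though id alone is not, which is why any join between two such views needs to match on db_key as well as on the original key.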
You can actually join tables on different databases. If I remember right the syntax is changed from "tablename.columnName" to "Server.Owner.tablename.columnName". You will need to run some stored procedures as an admin to allow this connectivity. It's also pretty slow but the effort to get it working is low.
If you have time to do it right look at data warehouse concepts. That's basically a temp table that collects the data you need to report on.
Building on Scott Ivey's excellent example above,
Use table name aliasing to simplify your code
Use UNION ALL instead of UNION assuming that your data is unique between the three databases
Code:
select
d1t1.Field1,
d1t1.Field2
from db1.dbo.Table1 AS d1t1
UNION ALL
select
d2t1.Field1,
d2t1.Field2
from db2.dbo.Table1 AS d2t1
UNION ALL
select
d3t1.Field1,
d3t1.Field2
from db3.dbo.Table1 AS d3t1
-- where ...
-- order by ...
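The difference between the two operators is easy to check on a toy dataset. This sketch uses Python with an in-memory SQLite database; the tables `t_a` and `t_b` and their rows are made up, with one row deliberately duplicated across them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_a (Field1 TEXT, Field2 INTEGER);
    CREATE TABLE t_b (Field1 TEXT, Field2 INTEGER);
    INSERT INTO t_a VALUES ('x', 1), ('y', 2);
    INSERT INTO t_b VALUES ('x', 1), ('z', 3);
""")

# UNION removes duplicate rows, which costs extra work on large inputs.
union_rows = conn.execute(
    "SELECT Field1, Field2 FROM t_a UNION SELECT Field1, Field2 FROM t_b"
).fetchall()

# UNION ALL simply concatenates the inputs and keeps every row.
union_all_rows = conn.execute(
    "SELECT Field1, Field2 FROM t_a UNION ALL SELECT Field1, Field2 FROM t_b"
).fetchall()
```

If the rows really are unique across the three databases, both forms return the same data and UNION ALL skips the de-duplication step; if they are not, UNION ALL will show the repeats.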
