What is the performance of subqueries vs two separate select queries? [closed] - sqlite

Is one generally faster than the other with SQLite file databases?
Do subqueries benefit from some kind of internal optimization, or are they handled internally as if you ran two separate queries?
I did some testing but I don't see much difference, probably because my table is still too small (fewer than 100 records).

It depends on many factors. Two separate queries mean two requests. A request has a little overhead, but this weighs more heavily if the database is on a different server. For subqueries and joins, the data needs to be combined. Small amounts can easily be combined in memory, but if the data gets bigger, it might not fit, forcing temporary data to be swapped to disk and degrading performance.
So there is no general rule for which one is faster. It's good to do some testing and find out how these factors apply to you. Do some tests with 'real' data, and with the amount of data you expect to have a year from now.
And keep testing in the future. A query that performs well today might suddenly become slow when the environment or the amount of data changes.
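A quick way to test this yourself is to run both shapes against the same SQLite file and compare timings. The sketch below is only an illustration, using a made-up customers/orders schema; adjust the table and column names to your own data.

```python
import sqlite3
import time

# Hypothetical schema for illustration: customers(id, country) and
# orders(id, customer_id, total). Adjust names to your own tables.
conn = sqlite3.connect("test.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")

def with_subquery():
    # One statement: SQLite plans and executes the subquery itself.
    return conn.execute(
        "SELECT * FROM orders WHERE customer_id IN "
        "(SELECT id FROM customers WHERE country = ?)",
        ("NL",),
    ).fetchall()

def with_two_queries():
    # Two statements: fetch the ids first, then filter orders in a second query.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM customers WHERE country = ?", ("NL",))]
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))  # beware SQLite's bound-parameter limit
    return conn.execute(
        f"SELECT * FROM orders WHERE customer_id IN ({placeholders})", ids
    ).fetchall()

for fn in (with_subquery, with_two_queries):
    start = time.perf_counter()
    fn()
    print(fn.__name__, round(time.perf_counter() - start, 6), "seconds")
```

Running EXPLAIN QUERY PLAN on the subquery form also shows whether SQLite flattens the subquery into the outer statement, which is where the internal optimization you are asking about would appear.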

Related

Best way to store non-numeric Ids for wildcard search? [closed]

Azure Data Explorer is a closed-source Microsoft Azure service with good documentation, but it sometimes needs further explanation.
As it is closed source, I cannot check the code myself to understand its inner workings, so I must rely on the community.
This question is very specific to Azure Data Explorer's implementation of string columns and the way it creates indexes for them. In a general sense, I feel I'd know how to implement this in other databases.
This is a real-life scenario. I can't go into much detail, but right now we are using Postgres for this. Since the vast majority of queries target the last 90 days, Postgres is getting pretty expensive for the 5+ years of data we have. Data Explorer seems like a perfect fit because of its cold storage feature.
In this case, a single event (~170 bytes) has a field holding an identifier (device id) composed of 9 alphanumeric characters.
I want to be able to find events from a device using wildcard queries like:
*A*2
A*2
A*2*
*2
A*
That is: multiple wildcards, prefix and suffix, prefix only, and suffix only.
What is a good approach for this in Azure Data Explorer? This field has ~150 million unique values, and we get on the order of 50 million rows per day.
Our users want to run queries like: give me the events involving devices with an id in the format "A*2*" in the last 90 days.
What I'm considering doing is:
Alter the column's encoding_policy to Identifier, to avoid creating a term search index, as recommended in the Encoding policy documentation.
But I'm not sure how that affects this use case of wildcard searches, as the documentation is very vague about what the Identifier encoding profile does. For example: is it good for high-cardinality columns? How does it affect performance for regex queries?
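For what it's worth, each of those wildcard shapes can be rewritten as an anchored regular expression, which is the form a regex-based query would take. The sketch below only illustrates the matching semantics and says nothing about how Data Explorer indexes the column; the device ids are made up.

```python
import re
from fnmatch import translate

# Made-up 9-character device ids, only to show which patterns would match.
device_ids = ["A1B2C3D42", "X9Y8Z7W60"]

for pattern in ["*A*2", "A*2", "A*2*", "*2", "A*"]:
    regex = translate(pattern)  # '*' becomes '.*', result is anchored at both ends
    matching = [d for d in device_ids if re.match(regex, d)]
    print(f"{pattern!r:10} -> {regex!r:22} matches {matching}")
```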

Webservice performance - Separate ASMX for each function or both functions inside the same ASMX? [closed]

I am using ASMX web services.
I have two functions, so should I put both functions in a single ASMX or create a separate ASMX for each function?
Does that impact performance? Which choice will perform better?
As with all things performance, you need to profile before making changes; otherwise you could end up optimizing the wrong thing.
Most of the time the Pareto principle applies: a small portion of the code, or a few modules in the entire application, is responsible for most of the execution time. Optimizing there will have the greatest impact on performance.
Have you optimized everything that could be optimized and concluded that the service endpoint is causing performance issues?
You should write the code in whatever way is easiest to maintain. Do those two functions belong together, or are they completely unrelated? Does it make sense to expose them through one ASMX or two? That should be your criterion for how to define your endpoints.
My guess is that both choices will have similar performance, but if you absolutely need to know, build them both ways, profile them, and see which one performs better.
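If you do want to measure it, the comparison is easy to script: call each layout the same number of times and compare wall-clock time. A rough sketch, assuming hypothetical endpoint URLs and that the web methods are reachable over plain HTTP GET:

```python
import time
import urllib.request

# Hypothetical endpoint URLs for the two layouts; substitute your real .asmx paths.
SINGLE_FILE = ["http://localhost/Service.asmx/FunctionA",
               "http://localhost/Service.asmx/FunctionB"]
SEPARATE_FILES = ["http://localhost/ServiceA.asmx/FunctionA",
                  "http://localhost/ServiceB.asmx/FunctionB"]

def time_layout(urls, iterations=100):
    # Call every function in the layout `iterations` times and measure the total.
    start = time.perf_counter()
    for _ in range(iterations):
        for url in urls:
            urllib.request.urlopen(url).read()
    return time.perf_counter() - start

print("single file:   ", time_layout(SINGLE_FILE))
print("separate files:", time_layout(SEPARATE_FILES))
```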

Are graph databases better suited to store trees than key-val stores? [closed]

My application data consists of a huge tree that grows as users interact with the system. Are graph databases better suited to storing big trees than key-value stores? Is the loss in scalability (since graph databases are usually harder to shard) compensated for by other features?
It depends.
If you use a key-value store, your key would be the parent node and your value the list of children. Walking the tree then means a lot of lookups for children, which can be long lists, so you end up with a lot of movement and repeated querying of the table. This is essentially the same problem you have with table joins in relational databases.
A graph database is a good fit because you do not do joins but traversals: you start at the root, specify a depth or an end condition, and let the traversal follow outgoing relationships to reach your end result.
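To make that access pattern concrete, here is a minimal sketch of the parent-to-children layout, using an in-memory dict as a stand-in for the key-value store (node names are made up):

```python
# Key = parent node, value = list of child nodes, as described above.
kv_store = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
}

def descendants(store, node, depth):
    """Collect all nodes up to `depth` levels below `node`.

    Every level costs another batch of lookups against the store,
    which is the repeated-querying overhead described above."""
    if depth == 0:
        return []
    children = store.get(node, [])
    result = list(children)
    for child in children:
        result.extend(descendants(store, child, depth - 1))
    return result

print(descendants(kv_store, "root", 2))  # ['a', 'b', 'a1', 'a2', 'b1']
```

In a graph database the same question is a single traversal from the root with a depth limit, so the per-level lookup cost stays inside the database engine instead of going back and forth to the client.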
I agree with you that sharding is not a good option for graph databases, at least not in the sense of cross-store relationship traversals. But I believe that with proper modeling of your data this shouldn't be a problem, at least not if the graph database is smart.
Neo4j has a problem with dense nodes, where a node with many (500k+) relationships can slow down traversals, but you can use indexing to get around this. Aside from that, it's great for large data, as its storage on disk is efficient and its traversals are very fast.
Define "huge". If you can fit within the confines/limitations of Neo4j, or have a natural and logical sharding model, Neo4j would be a much cleaner/simpler/more powerful approach and would require a lot less code. As Nicholas said, if your database will have a lot of "hot spot" nodes (many relationships), you might have some challenges with Neo4j, though there are generally application design approaches you can use to work around that limitation.

Datatable Archival SQL Server 2008 [closed]

I have a database that is 54 GB in size, and some of the tables have loads of rows in them.
Querying these tables now eats up performance, and it takes ages for queries to execute.
I have created indexes, but now I am planning to archive records in certain tables and only query them when the user asks to.
For example, let's say I have a table called Transaction and I rarely need rows older than 3 months.
What I want to do is move all the rows of the Transaction table that are more than 3 months old into some other table, and query that table only when the user chooses to view archived transactions in the UI.
Options I can think of:
Creating an ArchivedTransaction table in the same database, but the problem is that the database size will keep growing and at some point I will have to start deleting rows.
Moving rows to a different database altogether, but in that case how do I manage the database requests? A lot of change would be required, and I'm also not sure about the performance when someone asks to view archived rows.
Adding an Archived column to the tables and checking the flag when needed, but the size issue stays the same and performance doesn't improve much.
I am not sure which way to go; I am sure there is a much better way to handle this that I am not aware of.
Any ideas on which way to go? Is there an approach you can suggest?
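For the first option, the move itself is usually a periodic insert-then-delete inside one transaction. A rough sketch, assuming a hypothetical connection string and an assumed TransactionDate column; batching and index details are left out:

```python
from datetime import datetime, timedelta
import pyodbc

# Hypothetical connection string and column name (TransactionDate); adjust to your schema.
conn = pyodbc.connect("DRIVER={SQL Server};SERVER=.;DATABASE=MyDb;Trusted_Connection=yes")
cursor = conn.cursor()

cutoff = datetime.now() - timedelta(days=90)

# Copy rows older than the cutoff into the archive table, then remove them from
# the hot table; pyodbc keeps both statements in one transaction until commit().
cursor.execute(
    "INSERT INTO ArchivedTransaction SELECT * FROM [Transaction] WHERE TransactionDate < ?",
    cutoff)
cursor.execute(
    "DELETE FROM [Transaction] WHERE TransactionDate < ?",
    cutoff)
conn.commit()
```

On a table this size you would normally run the delete in batches (for example, DELETE TOP (10000) in a loop) to keep the transaction log and lock durations manageable.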

How to deal with huge amounts of data, such as grouping operations? [closed]

If a table has hundreds of millions of rows, how do I do grouping operations in SQL Server 2008?
Can anyone give me some suggestions?
Thanks!
If a table has hundreds of millions of rows, how do I do grouping operations in SQL Server 2008?
Misunderstandings.
Hundreds of millions = small. A table we work with on my project grows by around 60 million rows per day, and I have a data set that grows by 600 million entries per day.
SQL has grouping operations, you know.
Now, you use "entity-framework" as a tag. Here is the deal: you do not use business objects for groups, as groups have no functionality anyway and are purely read-only projections.
Go with SQL (either directly or through a capable LINQ provider). GROUP BY is a SQL command you may want to read up on.
If you need repeatable reads, then a materialized view on the server may work, or inserting the data into another table for fast access; it depends a lot on the usage pattern. And make sure you actually have hardware capable enough for your required usage (which, I agree, many people will never understand). Given proper hardware (Exadata shines here, but I think it starts at a quarter million USD or so per cluster) you can pull off billion-row aggregations in near real time.
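As a concrete example of the "go SQL" advice: the aggregation itself is just a GROUP BY, optionally persisted into a summary table for repeated reads. A sketch with made-up table and column names (Events(CustomerId, Amount)) and a hypothetical connection string:

```python
import pyodbc

# Hypothetical connection string and schema: Events(CustomerId, Amount).
conn = pyodbc.connect("DRIVER={SQL Server};SERVER=.;DATABASE=MyDb;Trusted_Connection=yes")
cursor = conn.cursor()

# The grouping runs on the server; only the aggregated rows come back to the client.
cursor.execute("""
    SELECT CustomerId, COUNT(*) AS EventCount, SUM(Amount) AS Total
    FROM Events
    GROUP BY CustomerId
""")
for customer_id, event_count, total in cursor.fetchall():
    print(customer_id, event_count, total)

# For repeated reads, persist the aggregation into a summary table
# (or use an indexed view) and query that instead of the base table.
cursor.execute("""
    SELECT CustomerId, COUNT(*) AS EventCount, SUM(Amount) AS Total
    INTO EventSummary
    FROM Events
    GROUP BY CustomerId
""")
conn.commit()
```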
