As someone who is fairly new to SQL (I don't use it much), I'm sure there is an answer to this question out there, but I don't know what to search for to find it, so I apologize.
Question: if I have a table with many columns but only need to get back the IDs, which of these is faster, or are they the same speed?
SELECT * FROM...
vs
SELECT ID FROM...
You asked about performance in particular, as opposed to all the other reasons to avoid SELECT *, so I will limit my answer to performance.
On my system, SQL Profiler initially indicated less CPU overhead for the ID-only query, but with the small number of rows involved, each query took the same amount of time.
I think this was really just because the ID-only query was run first, though. On re-run (in the opposite order), both showed equally little CPU overhead.
[Screenshot: SQL Profiler trace comparing the two queries]
With extremely high column and row counts, or extremely wide rows, there may be a perceptible difference inside the database engine, but there is nothing glaring here.
Where you will really see the difference is in sending the result set back across the network! The ID-only result set will typically be much smaller of course - i.e. less to send back.
Never use * to return all columns in a table; it's lazy. You should only extract the data you need.
So SELECT field FROM ... is faster.
There are several reasons you should never (never ever) use SELECT * in production code:
since you're not giving your database any hints as to what you want, it will first need to check the table's definition in order to determine the columns on that table. That lookup will cost some time - not much in a single query - but it adds up over time.
in SQL Server (not sure about other databases), if you need a subset of columns, there's always a chance a non-clustered index might be covering that request (contain all columns needed). With a SELECT *, you're giving up on that possibility right from the get-go. In this particular case, the data would be retrieved from the index pages (if those contain all the necessary columns) and thus disk I/O and memory overhead would be much less compared to doing a SELECT *.... query.
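As a concrete sketch of the covering-index point above (hypothetical table and column names, SQL Server syntax):
-- A non-clustered index that covers a narrow Status + ID lookup:
CREATE NONCLUSTERED INDEX IX_Orders_Status
    ON dbo.Orders (Status)
    INCLUDE (ID);

-- This query can be answered entirely from the index pages:
SELECT ID FROM dbo.Orders WHERE Status = 'Open';

-- SELECT * must still go to the clustered index / heap for all the other columns:
SELECT * FROM dbo.Orders WHERE Status = 'Open';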
Long story short: selecting only the columns you need will always be at least as fast, and often faster. SELECT * pulls back every column of every qualifying row and can never be satisfied from a narrower covering index. This is a best practice you should adopt very early on.
For the second part, you should probably post a separate question instead of piggybacking off this one. That makes it easier to distinguish what you are asking about.
Related
I wanted to improve the performance on my SQLite3 database. I went with the most extreme course of action first (just to see what would happen) and added an index to every column of every table in the database.
The database size more than doubled, and to my surprise, performance dropped drastically. Where I had previously gotten 4000 selects per second I now get ~50 selects per second.
This question is not specifically about my case. My question is; is it possible that adding indexes will decrease SELECT performance in SQLite3? I'm asking because I want to know if my problem is that I added too many indexes, or if I've made a mistake somewhere that is causing the slowdown.
To be more specific about my case: the database increased from 140 MB to 280 MB and I have an SSD.
There are mechanisms by which additional indexes could cause a slowdown:
Most optimization decisions are designed for the worst case – when you're accessing data that is too large to fit into any cache and has to be loaded from disk.
If the data itself fits into the caches, but all the various indexes used by your queries are so large that the entire working set becomes too large, you will get more swapping.
SELECT queries simply ignore any indexes that do not help them, so unused indexes do not slow reads directly.
However, INSERT/UPDATE/DELETE statements must update all indexes of the changed table, so every additional index will slow down such changes.
Use EXPLAIN QUERY PLAN to check which indexes are actually used by a query.
Read Query Planning and The SQLite Query Planner to understand how indexes can be used.
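For example, a quick check in the sqlite3 shell might look like this (table and index names are hypothetical, and the exact plan text varies by SQLite version):
-- Hypothetical table with one extra index on the "category" column
CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT, price REAL);
CREATE INDEX idx_items_category ON items(category);

-- Ask SQLite how it plans to execute each query
EXPLAIN QUERY PLAN
SELECT id FROM items WHERE category = 'books';
-- expected to report something like: SEARCH items USING COVERING INDEX idx_items_category (category=?)

EXPLAIN QUERY PLAN
SELECT id FROM items WHERE price > 10;
-- expected to report something like: SCAN items  (the category index is simply ignored)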
I want to create a database. It's simple, I think: just to store a phone number, date, time, and a note.
Is it better (for database performance) to use a new table for every phone number and its notes, or one table with all the information in it?
The right way is to normalize your data (hence, use as many tables as needed).
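A minimal normalized sketch, with hypothetical names, might look like this:
-- One row per phone number
CREATE TABLE phones (
    id     INTEGER PRIMARY KEY,
    number TEXT NOT NULL UNIQUE
);

-- One row per note, linked to its phone number
CREATE TABLE notes (
    id       INTEGER PRIMARY KEY,
    phone_id INTEGER NOT NULL REFERENCES phones(id),
    noted_at TEXT NOT NULL,   -- date/time of the note
    note     TEXT
);

-- Index the foreign key so per-phone lookups stay fast
CREATE INDEX idx_notes_phone_id ON notes(phone_id);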
If you split your data into several tables (assuming you use indexes), write performance will be better.
Regarding read performance, it depends on the size of the data (namely the notes), but I would argue that having more tables is also better, except if indexing is out of the question (there is no real reason for that) and if you would otherwise need to join tables to get the data. Even then, I don't think it would be a big trade-off.
SQLite can write millions of rows per second and read even more; are you sure you need to ask this question?
Background:
I'm using SQL Server 2008 and ASP.NET 4 on Windows 2008
I have one table with about 10 million rows of products that I make available online for users to browse -- not search. Each of the 10 million products have extra attributes -- like categories -- that I keep in lookup tables -- there are three or four lookup tables.
Problem
When someone browses and starts using filters (shipping location, price, quality, brand), I need to join the tables, apply all the filters, and return the results. It's very slow and I want to make it faster. Sometimes users will apply a very broad filter, resulting in 800,000 results, and though I only return the first 10 of those for browsing, I still need to run the query for the full 800,000.
What I've Tried Already
I've joined all the information from the various tables into one physical table and then created a covering index for the table.
The queries are much faster, but there is a good bit of maintenance I have to do on the table behind the scenes, with jobs to make sure that if something goes out of stock it gets removed within a reasonable time frame (5 minutes or so).
I don't use materialized/indexed views b/c I've got aggregates in the results which SQL Server doesn't seem to like.
Question
How can I speed up browse results beyond the indexing and table optimization that I've already done? I'm not doing any full-text searches -- I'm filtering with exact parameters.
Possible Solutions I've Thought Of
Large caching solution -- AppFabric or MemCached. I know next to nothing about these and don't know whether they are appropriate.
Small caching solution -- Maybe leveraging ASP.NET caching -- but every person is going to apply different filters so I'm not sure how much this will give me.
SSDs -- as a larger-scale solution I've thought about getting SSDs but that will be down the road
CDN -- I don't think a CDN will help b/c the bottleneck here is my database's search capabilities, not the bandwidth/distance to the requester.
I had a similar problem with a complex join query causing horrible response times. I was able to solve it via using Lucene.NET. It's a .NET implementation of the Lucene search index. Basically, you build indexes on data fields (your categories) and then you can search via those categories and return thousands of rows very quickly. Basically, it takes the join operation out of the equation because it already knows, via the indexes, which records fit your criteria.
The following is a very good article on Lucene.NET. I highly recommend it. It took a search result that was taking 20 seconds using standard joins and reduced the response time to less than a second.
http://www.codeproject.com/Articles/29755/Introducing-Lucene-Net
Also, feel free to ping me if you have specific Lucene.NET implementation questions. I just got through a lot of research/learning in order to implement it properly on my site, so if you have specific questions on how to make it work I may be able to help with that as well.
"I perform the full query b/c I need to populate the new filters and
the number of results along with the search results. For example, if
someeone filters on category of "Shoes", and location of TX, some of
the other filters are going to be restricted based on the previous
filter."
Try executing two queries: one to count all results and one to select the top N. Maybe your bottleneck is copying 800,000 rows to the client. Doing two queries would fix this at the cost of an additional query, and that cost is likely to be less than 2x thanks to optimizations for few-row and count-only queries.
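Roughly, that two-query approach might look like this (T-SQL, with a hypothetical Products table and filter parameters):
-- Query 1: only the total number of matching products (no rows shipped to the client)
SELECT COUNT(*) AS Total
FROM dbo.Products
WHERE Category = @Category AND ShipLocation = @Location AND Price <= @MaxPrice;

-- Query 2: only the first page of results actually crosses the network
SELECT TOP (10) ProductID, Name, Price
FROM dbo.Products
WHERE Category = @Category AND ShipLocation = @Location AND Price <= @MaxPrice
ORDER BY Price;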
I am relatively new to sql(ite), and I'm learning as I go while working on a new project.
We have got millions of transaction rows in one "data" table, one field being a "sessionid" field.
Since I want to concentrate on in-session activity for now, I primarily need to look only at transactions from the same sessions.
My intuition is that it would be a lot faster to split the database by session into many single-session tables, rather than always querying for a single sessionid and then proceeding. My question: is that correct? Will it make a difference?
Even if not: could you help me out and tell me how I could split the one "data" table's rows into many session-specific tables, with the rows staying the same? Plus one table which relates sessionIds to their tables?
Thanks!
A friend just told me that the splitting-into-tables approach would be extremely inflexible, and that I should instead add an index on the sessionId column so single sessions can be accessed faster. Any thoughts on that and how to do it best?
First of all, are you having any specific performance bottleneck with it till now? If yes, please describe it.
Having one table per session will probably speed things up (lookups, and index maintenance for INSERTs).
SQLite doesn't impose a limit on the number of tables, so you should be okay.
One other solution that provides easier maintenance is to create one table per day/week.
Depending on how long your sessions last, this may or may not be feasible.
Related: https://stackoverflow.com/a/811862/89771
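For reference, the index approach mentioned in the question's edit is essentially a one-liner (a sketch, assuming the "data" table and "sessionid" column names from the question):
-- Index the sessionid column so single-session lookups become an index search
CREATE INDEX idx_data_sessionid ON data(sessionid);

-- Typical per-session query; EXPLAIN QUERY PLAN should now report something like
-- SEARCH data USING INDEX idx_data_sessionid (sessionid=?)
SELECT * FROM data WHERE sessionid = :session;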
Let's say I have a table in a database with 10k records. I don't actually need to use those 10k records anymore, but I still need to keep them in the database. That very table is now going to be used to store new data, so there are going to be more records coming in on top of the 10k already present. As opposed to the "old" 10k records, I do need to work with the newly inserted data. Right now I'm doing this to get the data I need:
List<Stuff> l = (from x in db.Table
where x.id > id
select x).ToList();
My question now is: how does the where clause in LINQ (or in SQL in general) work under the covers? Is the ENTIRE table going to be searched until (x.id > id) is true? Because let's say the table grows from 10k records to 20k. It'd be a little silly to look through all 20k records if I know I only have to start looking from a certain point.
I've had performance problems (not dramatic, but bad enough to be agitated by it) with this while using LINQ to Entities, which I kind of don't understand, because it should be no problem at all for a modern computer to sift through a mere 20k records. I've been advised to use a stored procedure instead of a LINQ query, but I don't know whether or not this will boost performance.
Any feedback will be appreciated.
It's going to behave just like a similarly worded SQL query would. The question is whether the overhead you're experiencing is happening in the query or in the conversion of the query to a list. The query as you've written it should equate literally to:
Select ID, Column1, Column2, Column3, ... , Column(n+1)
From db.Table
Where ID > id
This query should be fairly fast depending on the nature of the data. The query itself will not be executed until it is acted upon, however. In this case, you're converting it to a list, which is the equivalent of acting upon it. I can't find the comment someone made to me about this practice, but I've found it to be quite helpful in keeping performance clean. Unless you have some very specific need, you should leave your queries as IQueryable. Converting them to lists doubles the effort, because first the query must be executed and then the result set must be converted into an appropriate IEnumerable (a List in this case).
So you have two potential bottlenecks. The simple query could be taking a long time to query a massive collection of data, or the number of records could be bottlenecking at the point where the List is created. Another possibility is the nature of ID in this case. If it is numeric, that will save you some time; if it's performing a text-based search, then it's going to be heavier.
To answer your specific question: yes, it's going to search every record in the database and return all of the records that match the expression. Edit: if the database has a proper index on the column in question, it will not search EVERY record but will instead use the index to perform the search (from a comment by #Pleun).
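As an illustration of that edit (T-SQL, with names mirroring the LINQ snippet rather than any real schema), an index on ID lets the engine seek straight to the first qualifying row instead of scanning:
-- Hypothetical: only needed if ID is not already the (clustered) primary key
CREATE INDEX IX_Table_ID ON dbo.[Table] (ID);

-- The generated query then becomes an index seek on ID > @id
SELECT ID, Column1, Column2
FROM dbo.[Table]
WHERE ID > @id;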
As for using a stored procedure: the idea that it will boost performance by itself is mostly hogwash, but it's a perfectly acceptable alternative. I have several programs that routinely run similar queries against a database with over 40 million records, and the only performance issue I've run into so far has been CPU usage when multiple users are firing off queries rapidly. To solve your specific issue, I'd recommend that you tune the query a little in SQL Server Management Studio until it returns to your interface at an acceptable speed, and then convert that query into a compatible LINQ statement. As long as you leave it as an IQueryable, it should exhibit similar results.