Linq to entity performance issues with large set of data

I am currently working with EF4. In one scenario I use a join to retrieve data, but the result set is so large that EF4 fails even to generate the query plan. As a workaround I tried loading the data into simple generic lists (selecting all rows from both tables) and then joining the two lists, but I still get an OutOfMemoryException: one table contains around 100k records and the second around 50k. I want to join them in a query, but I have had no luck with EF. Please suggest a workaround.

I can't think of any scenario where you would need a result set containing 100k+ records. It may not be the answer you want, but the best way to improve performance is to reduce the number of records that you're dealing with.

We wrote custom SQL and executed it with Context.Database.SqlQuery(sql, params).
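The idea behind this answer, letting the database perform the join and reading the result in pieces instead of loading both tables into memory, can be sketched outside EF. The following is a minimal Python/sqlite3 illustration, not EF code; the tables table_a and table_b, their columns, and the batch size are hypothetical.

```python
import sqlite3

def build_demo_db():
    """Create a small in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE table_b (id INTEGER PRIMARY KEY, a_id INTEGER, value INTEGER)"
    )
    conn.executemany(
        "INSERT INTO table_a VALUES (?, ?)",
        [(i, f"a{i}") for i in range(1, 101)],
    )
    conn.executemany(
        "INSERT INTO table_b VALUES (?, ?, ?)",
        [(i, (i % 100) + 1, i * 10) for i in range(1, 251)],
    )
    conn.commit()
    return conn

def stream_join(conn, batch_size=1000):
    """Run the join in the database and yield the result in batches."""
    cur = conn.execute(
        "SELECT a.id, a.name, b.value "
        "FROM table_a AS a JOIN table_b AS b ON b.a_id = a.id "
        "ORDER BY a.id, b.id"
    )
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield rows

conn = build_demo_db()
batches = list(stream_join(conn, batch_size=50))
total_rows = sum(len(batch) for batch in batches)
print(total_rows, "joined rows in", len(batches), "batches")
```

Only one batch is held by the consumer at a time if it iterates over stream_join directly rather than collecting the batches into a list as the demo does.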

Related

Should I use WITH instead of a JOIN on a table with a lot of data?

I have a MariaDB table which contains a lot of metadata and is very big in terms of bytes.
That table has columns A and B, along with other columns.
I would like to join that table with another table (stuff) in order to get column C from it.
So I have something like:
SELECT metadata.A, metadata.B, stuff.C
FROM metadata JOIN stuff ON metadata.D = stuff.D;
This query sometimes takes a very long time. I suspect (AFAIK, please correct me if I'm wrong) that this is because JOIN stores its result in some side table, and because the metadata table is very big, it has to copy a lot of data that I don't even use. So I thought about optimizing it with WITH, as follows:
WITH m as (SELECT A,B,D FROM metadata),
s as (SELECT C,D FROM stuff)
SELECT * FROM m JOIN s ON m.D = s.D;
The execution plan (from EXPLAIN) is the same, but I think it will be faster, since the side tables created by WITH (again AFAIK WITH also creates side tables, please correct me if I'm wrong) will be smaller and contain only the needed data.
Is my logic correct? Is there some way I can test that in MariaDB?

More likely, there is some form of cache speeding up one query or the other.
The Query cache is usually recognizable by a query time that is only about 1ms. It can be turned off via SELECT SQL_NO_CACHE ... to get a timing to compare against.
The other likely cache is the buffer_pool. Data is read from disk into the buffer_pool unless it is already there. The simple workaround for strange timings is to run the query twice and take the second 'time'.
Your hypothesis that WITH creates 'small' temp tables falls apart because the work needed to read the original tables is the same with or without WITH.
Please provide SHOW CREATE TABLE for the two tables. There are a couple of datatype issues that may be involved -- big TEXTs or BLOBs.
The newly-added WITH opens up the possibility of recursive CTEs (and other things). And it provides a way to materialize a temp table that is used more than once. Neither of those applies in your query, so I would not expect any performance improvement.
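The "run it twice and take the second time" advice can be turned into a small comparison harness. This is a rough sketch using Python's sqlite3 rather than MariaDB, so absolute timings and caching behaviour will differ; the table contents and the helper time_second_run are made up for illustration. It checks that both query shapes return the same rows and reports the warm timing of each.

```python
import sqlite3
import time

JOIN_SQL = """
SELECT metadata.A, metadata.B, stuff.C
FROM metadata JOIN stuff ON metadata.D = stuff.D
ORDER BY metadata.A
"""

WITH_SQL = """
WITH m AS (SELECT A, B, D FROM metadata),
     s AS (SELECT C, D FROM stuff)
SELECT m.A, m.B, s.C
FROM m JOIN s ON m.D = s.D
ORDER BY m.A
"""

def time_second_run(conn, sql):
    """Run the query once to warm caches, then time the second run."""
    conn.execute(sql).fetchall()
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

conn = sqlite3.connect(":memory:")
# 'payload' stands in for the bulky columns the query never selects.
conn.execute("CREATE TABLE metadata (A INTEGER, B TEXT, D INTEGER, payload TEXT)")
conn.execute("CREATE TABLE stuff (C TEXT, D INTEGER)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    [(i, f"b{i}", i % 50, "x" * 200) for i in range(500)],
)
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?)",
    [(f"c{d}", d) for d in range(50)],
)

join_rows, join_seconds = time_second_run(conn, JOIN_SQL)
with_rows, with_seconds = time_second_run(conn, WITH_SQL)
print("same rows:", join_rows == with_rows)
print(f"JOIN: {join_seconds:.6f}s  WITH: {with_seconds:.6f}s")
```

On MariaDB the same structure applies with a MariaDB client library, and adding SQL_NO_CACHE to the SELECT keeps the query cache out of the measurement.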

PLSQL : To show data from multiple tables efficiently on page loading

I am using data from multiple tables (6 tables) across multiple schemas to show a grid when an application launches. Because of the many outer joins, the initial page takes very long to load. How can I make my program more efficient?

There are many ways to optimize:
Paginate the query results, i.e. show only a few results per page and let the user click a page number to see the rest.
Optimize your query. Tools such as the query plan can give you an idea of how the query performs. Also, prefer plain SQL queries and avoid procedures where you can.
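Pagination as described above can be sketched as follows. This uses Python's sqlite3 and its LIMIT/OFFSET syntax purely for illustration; the table grid_rows and the function fetch_page are hypothetical, and Oracle spells the same idea differently (for example OFFSET ... ROWS FETCH NEXT ... ROWS ONLY from 12c onward).

```python
import sqlite3

def fetch_page(conn, page, page_size=20):
    """Return only the rows for one 1-based page of the grid."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT id, label FROM grid_rows ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grid_rows (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO grid_rows VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 96)],  # 95 demo rows
)

first_page = fetch_page(conn, 1)
last_page = fetch_page(conn, 5)
past_the_end = fetch_page(conn, 6)
print(len(first_page), len(last_page), len(past_the_end))
```

A stable ORDER BY matters here: without it, the same row could appear on two pages or on none.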

websql performance, can we shard tables

I am using WebSQL to store data in a PhoneGap application. One of the tables has a lot of data, from 2,000 to 10,000 rows, so reading from it is very slow even with a simple SELECT statement. While debugging I found that as the table grows, performance decreases exponentially. I read somewhere that to improve performance you have to divide the table into smaller chunks. Is that possible, and how?

One idea is to look for something to group the rows by and consider breaking into separate tables based on some common category - instead of a shared table for everything.
I would also consider fine tuning the queries to make sure they are optimal for the given table.
Make sure you're not just running a simple Select query without a where clause to limit the result set.

LINQ to entities performance regarding where clause

Let's say I have a table in a database with 10k records. I don't actually need to use those 10k records anymore, but I still need to keep them in the database. That same table is now going to be used to store new data, so more records will be added on top of the 10k already present. Unlike the "old" 10k records, I do need to work with the newly inserted data. Right now I'm doing this to get the data I need:
List<Stuff> l = (from x in db.Table
where x.id > id
select x).ToList();
My question now is: how does the where clause in LINQ (or in SQL in general) work under the covers? Is the ENTIRE table going to be searched for rows where (x.id > id) is true? Let's say the table grows from 10k records to 20k. It would be a little silly to look through all 20k records if I know that I only have to start looking from a certain point.
I've had performance problems with this while using LINQ to Entities (not dramatic, but bad enough to be annoying), which I don't really understand, because it should be no problem at all for a modern computer to sift through a mere 20k records. I've been advised to use a stored procedure instead of a LINQ query, but I don't know whether that will boost performance.
Any feedback will be appreciated.

It's going to behave just like a similarly worded SQL query would. The question is whether the overhead you're experiencing is happening in the query or in the conversion of the query to a list. The query itself as you've written should equate literally to:
Select ID, Column1, Column2, Column3, ... , Column(n+1)
From db.Table
Where ID > id
This query should be fairly fast depending on the nature of the data. The query itself will not be executed until it is acted upon, however. In this case, you're converting it to a list, which is the equivalent of acting upon it. I can't find the comment someone made to me about this practice, but I've found it to be quite helpful in keeping performance clean. Unless you have some very specific need, you should leave your queries as IQueryable. Converting them to lists doubles the effort because first the query must be executed and then the result set must be converted into an appropriate IEnumerable (List in this case).
So you have 2 potential bottlenecks. The simple query could be taking a long time to query a massive collection of data, or the number of records could be bottlenecking at the point where the List is created. Another possibility is the nature of ID in this case. If it is numeric, that will save you some time. If it's performing a text-based search then it's going to be heavier.
To answer your specific question: yes, it's going to search every record in the table and return all of the records that match the expression. Edit: if the database has a proper index on the column in question, it will not search EVERY record but will use the index to perform the search instead (from a comment by Pleun).
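The difference an index makes to a range filter like "id > ?" can be observed from the query plan. The sketch below uses Python's sqlite3 and SQLite's EXPLAIN QUERY PLAN as a stand-in for the SQL Server setup in the question; the table records and the index name idx_records_id are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

QUERY = "SELECT id, payload FROM records WHERE id > ?"

def plan_for(conn, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

conn = sqlite3.connect(":memory:")
# 'id' is deliberately NOT declared as a primary key, so it starts unindexed.
conn.execute("CREATE TABLE records (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(i, f"p{i}") for i in range(1, 20001)],
)

plan_before = plan_for(conn, QUERY, (10000,))
count_before = len(conn.execute(QUERY, (10000,)).fetchall())

conn.execute("CREATE INDEX idx_records_id ON records (id)")

plan_after = plan_for(conn, QUERY, (10000,))
count_after = len(conn.execute(QUERY, (10000,)).fetchall())

print("before:", plan_before)  # a full table scan
print("after: ", plan_after)   # a search using idx_records_id
```

The rows returned are the same either way; only the amount of work the engine does to find them changes.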
As for a stored procedure boosting performance, that's a load of hogwash, although using one is a perfectly acceptable alternative. I have several programs that routinely run similar queries against a database with over 40 million records, and the only performance issue I've run into so far has been CPU usage when multiple users fire off queries in rapid succession. To solve your specific issue, I'd recommend that you tune the query in SQL Server Management Studio until it returns with an acceptable speed. Then you can convert that query into an equivalent LINQ statement. As long as you leave it as an IQueryable it should exhibit similar results.

SQL Server 2005 - Select From Multiple DB's & Compile Results as Single Query

Ok, the basic situation: Due to a few mixed up starts, a project ends up with not one, but three separate databases, each containing a portion of the overall project data. All three databases are the same, it's just that, say 10% of the project was run into the first, then a new DB was made due to a code update and 15% of the project was run into the new one, then another code change required another new database for the rest of the project. Again, the pertinent tables are all exactly the same across all three databases.
Now, assume I wanted to take all three of those databases (bearing in mind that they can't just be merged into a single database due to primary key issues and so on) and run a single query that would look through all three of them, select a given set of data from each, then compile those three sets into one single result and return it to the reporting page I'm working on.
For reference, at its endpoint the data is output to an ASP.Net/VB.Net backed page, specifically a Gridview object. It doesn't need to be edited, fortunately, just displayed.
What would be the best way to approach this mess? I'm thinking that creating a temporary table would be my best bet, but honestly I'm stepping into a portion of SQL that I'm not familiar with here, and would appreciate any guidance somebody more experienced might have.

I'd say your best bet is to suck it up and combine the databases, even if it is a major pain to combine the primary keys. It may be a major pain now, but it is going to be 10x as painful over the life of the project.
You can do a union across multiple databases as Scott has pointed out, but you are in for a world of trouble as the application gets more complex. For example, even if you circumvent the technical limitations by having multiple tables/databases for the same entity, having duplicates in the PK for a logical entity is a world of trouble.
Implement the workaround solution if you must, but I guarantee you will hate yourself for it later.

Why not just use 3 part naming on the tables and union them all together?
select db1.dbo.Table1.Field1,
db1.dbo.Table1.Field2
from db1.dbo.Table1
UNION
select db2.dbo.Table1.Field1,
db2.dbo.Table1.Field2
from db2.dbo.Table1
UNION
select db3.dbo.Table1.Field1,
db3.dbo.Table1.Field2
from db3.dbo.Table1
-- where ...
-- order by ...

You should create what is called a Partitioned View for each of your tables of interest. These views do a union of the underlying base tables and add a synthetic column to make the rows unique:
CREATE VIEW vTableXDB
AS
SELECT 'DB1' as db_key, *
FROM DB1.dbo.table
UNION ALL
SELECT 'DB2' as db_key, *
FROM DB2.dbo.table
UNION ALL
SELECT 'DB3' as db_key, *
FROM DB3.dbo.table;
You create one such view for each table and then design your reports on these views, not on the base tables. You must add the db_key to your join conditions. The query optimizer has some understanding of partitioned views and might be able to create plans that do the right thing and avoid joins that span multiple databases, but that is not guaranteed. If things go haywire and the optimizer does not recognize the partitioning, resulting in very bad execution times, you may have to move the db_key into the tables themselves and add some artificial check constraints on the base tables so that the optimizer can understand the partitioning (see the article I linked for details).
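The role of the db_key column can be seen in a small simulation. The sketch below uses Python's sqlite3 with attached in-memory databases in place of three SQL Server databases, and runs the UNION ALL as a plain query rather than as a view; the table items and its rows are made up. It shows that primary keys collide across databases while the (db_key, id) pair stays unique.

```python
import sqlite3

COMBINED_SQL = """
SELECT 'DB1' AS db_key, id, name FROM main.items
UNION ALL
SELECT 'DB2' AS db_key, id, name FROM db2.items
UNION ALL
SELECT 'DB3' AS db_key, id, name FROM db3.items
ORDER BY db_key, id
"""

conn = sqlite3.connect(":memory:")  # 'main' plays the role of DB1
conn.execute("ATTACH DATABASE ':memory:' AS db2")
conn.execute("ATTACH DATABASE ':memory:' AS db3")

demo_rows = {
    "main": [(1, "alpha"), (2, "beta")],
    "db2": [(1, "gamma")],
    "db3": [(1, "delta"), (2, "epsilon"), (3, "zeta")],
}
for schema, rows in demo_rows.items():
    # Same table definition in every database, with overlapping ids.
    conn.execute(f"CREATE TABLE {schema}.items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(f"INSERT INTO {schema}.items VALUES (?, ?)", rows)

combined = conn.execute(COMBINED_SQL).fetchall()
distinct_ids = {row[1] for row in combined}
distinct_keys = {(row[0], row[1]) for row in combined}
print(len(combined), len(distinct_ids), len(distinct_keys))
```

This is why joins between two such combined sets need db_key in the join condition: joining on id alone would match rows that come from different databases.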

You can actually join tables on different databases. If I remember right the syntax is changed from "tablename.columnName" to "Server.Owner.tablename.columnName". You will need to run some stored procedures as an admin to allow this connectivity. It's also pretty slow but the effort to get it working is low.
If you have time to do it right look at data warehouse concepts. That's basically a temp table that collects the data you need to report on.

Building on Scott Ivey's excellent example above,
Use table name aliasing to simplify your code
Use UNION ALL instead of UNION assuming that your data is unique between the three databases
Code:
select
d1t1.Field1,
d1t1.Field2
from db1.dbo.Table1 AS d1t1
UNION ALL
select
d2t1.Field1,
d2t1.Field2
from db2.dbo.Table1 AS d2t1
UNION ALL
select
d3t1.Field1,
d3t1.Field2
from db3.dbo.Table1 AS d3t1
-- where ...
-- order by ...
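The practical difference between UNION and UNION ALL is that UNION removes duplicate rows from the combined result (extra work for the engine), while UNION ALL returns every row as-is. A minimal sketch in Python's sqlite3, with two hypothetical tables t1 and t2 standing in for the per-database tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (Field1 INTEGER, Field2 TEXT)")
conn.execute("CREATE TABLE t2 (Field1 INTEGER, Field2 TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, "x"), (2, "y")])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(2, "y"), (3, "z")])  # (2, 'y') repeats

union_rows = conn.execute(
    "SELECT Field1, Field2 FROM t1 "
    "UNION "
    "SELECT Field1, Field2 FROM t2 "
    "ORDER BY Field1"
).fetchall()

union_all_rows = conn.execute(
    "SELECT Field1, Field2 FROM t1 "
    "UNION ALL "
    "SELECT Field1, Field2 FROM t2 "
    "ORDER BY Field1"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

If the rows are already known to be unique across the three databases, UNION ALL gives the same result without the deduplication step; if they are not, UNION ALL will surface the duplicates.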
