I have an issue and I have looked long and hard over the Internet for an answer, but I can't find anything.
I have a little app that sucks in a web service. It then passes this web service's results to another application via its own web service and stores the request in a table.
What I am trying to do is quickly import the results from the dataset into a table.
Option 1 is to loop through all the rows in the dataset and run an insert for each one. The issue is that this is slow and will slow down the response of the little app's web service.
Option 2 is to bulk upload the dataset into SQL. The bad news for me is that I have no idea how to do this! Can anyone help?
You can use SqlBulkCopy. A simple SqlBulkCopy might look like:
DataTable dtMyData = ... // retrieve records from the web service
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString)) {
    bulkCopy.BulkCopyTimeout = 120; // timeout in seconds; the default is 30
    bulkCopy.DestinationTableName = "MyTable";
    bulkCopy.WriteToServer(dtMyData);
}
If you are processing a great deal of data, you may also want to set the BatchSize property.
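For example, a hedged sketch extending the snippet above (the ResultId/ResultValue column names are invented for illustration; substitute your own):
// Sketch only: table and column names are placeholders for your schema.
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "MyTable";
    bulkCopy.BulkCopyTimeout = 120;  // seconds
    bulkCopy.BatchSize = 1000;       // rows sent per batch; 0 (the default) sends everything in one batch

    // Explicit mappings protect you if the DataTable column order differs from the table.
    bulkCopy.ColumnMappings.Add("ResultId", "ResultId");
    bulkCopy.ColumnMappings.Add("ResultValue", "ResultValue");

    bulkCopy.WriteToServer(dtMyData);
}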
It sounds like a mighty busy little web service.
You might instead (or even in addition) consider making it asynchronous or multi-threaded. Otherwise you've just built in the coupling between the two web apps that you may have split them to avoid in the first place.
Using a bulk load can be a pain for various reasons, starting with the fact that how you do it is specific to a particular DBMS. (The next thing that usually gets you is where the file with the data needs to be located.)
However, you can often improve performance dramatically in a portable way merely by doing all of your INSERTs in a single transaction:
BEGIN TRANSACTION
INSERT ...
INSERT ...
...
COMMIT TRANSACTION
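In ADO.NET the same idea looks roughly like this. This is only a sketch: the Results table, its columns, and the dtMyData DataTable are assumptions standing in for your own schema and data.
// using System.Data; using System.Data.SqlClient;
using (SqlConnection conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (SqlTransaction tx = conn.BeginTransaction())
    using (SqlCommand cmd = new SqlCommand(
        "INSERT INTO Results (RequestId, Payload) VALUES (@requestId, @payload)", conn, tx))
    {
        SqlParameter requestId = cmd.Parameters.Add("@requestId", SqlDbType.Int);
        SqlParameter payload = cmd.Parameters.Add("@payload", SqlDbType.NVarChar, 4000);

        foreach (DataRow row in dtMyData.Rows)
        {
            requestId.Value = row["RequestId"];
            payload.Value = row["Payload"];
            cmd.ExecuteNonQuery();   // each insert rides on the one open transaction
        }

        tx.Commit();                 // a single commit instead of one per row
    }
}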
If time is on your side, you could look into using SQL Server Service Broker technology; however, there is initially quite a steep learning curve.
http://msdn.microsoft.com/en-us/library/ms166043(SQL.90).aspx
You can create a web service that fires, say, a stored procedure to insert the results of your process.
I have been looking for advice for a while on how to handle a project I am working on, but to no avail. I am pretty much on my fourth iteration of improving an "application" I am working on; the first two times were in Excel, the third time in Access, and now in Visual Studio. The field is manufacturing.
The basic idea is that I take read-only data from a massive Sybase server, filter it, and create much smaller tables in Access daily (using delete and append queries), and then do a bunch of stuff. More specifically, I use a series of queries to either combine data from multiple tables or group data in specific ways (aggregate functions), and then I place this data into a table (so I can sort and manipulate it using DAO.Recordset and run multiple custom algorithms). This process is then repeated multiple times throughout the database until a set of relevant tables is created.
Many times I will create a field in a query with a value such as 1.1 so that when I append it to a table I can store information from the algorithms in that field. So as the process continues, the number of fields in the tables changes.
The overall application consists of 4 "back-end" databases linked together on a shared drive, with various outputs (either front-end Access applications or Excel).
So my question is: is this essentially how many data-driven applications that solve problems work? Each back-end database is updated with fresh data daily; updating takes around 10 seconds for three of them and 2 minutes for the fourth.
Project objectives: I am moving to SQL Server soon. The front end will be a web application (I know basic web development and like the administration flexibility), and Visual Studio with C#/.NET will be the IDE.
Should these algorithms be run "inside the database," or as a series of C# functions on each server request? I know you're not supposed to store data in a database unless it is an actual data point, and in Access I have many columns that just hold calculations from algorithms in VBA.
The truth is, I have seen multiple professional Access applications and have never seen one that has the complexity of mine or does even close to what mine does (for better or worse). But I know some professional software applications are 1000 times better than mine.
So please, please, please give me a suggestion of some sort. I have been completely on my own and need some guidance on how to approach this project the right way.
If you are going to SQL Server, or any other full client-server DBMS for that matter, the trick (generally) is to do as much on the server as possible.
It depends on how you've written the code, really. In general, the optimisations for a desktop are the inverse of those for a server.
For instance, suppose you have a Find Customer facility.
On a desktop you'd fetch the entire table and then use, say, Locate to find the record by name, post/zip code, etc., because effectively your application is both server and client.
In a client-server setup, you pass the customer name etc. to the DBMS, let it find the customer(s) that match, and have it pass only those back.
So in your situation, forgetting the web application bit, you've got to look at what your application does and ask, "Can I write this in SQL?"
So, if you had:
// get orders
foreach (Order order in clientOrders)
{
    if (order.Discount > 0)
    {
        order.Value = order.ItemCount * order.ItemPrice * order.Discount;
    }
}
// save orders
you'd replace that with a query that did
Update Orders Set Value = ItemCount * ItemPrice * Discount
Where ClientID = #ClientID and Discount > 0
Let the server do the work instead of pulling and pushing loads of data into and out of the application.
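From C#, that set-based update becomes one parameterised command instead of a loop. This is a sketch, assuming the Orders columns from the example and an existing connectionString and clientId:
// using System.Data.SqlClient;
const string sql =
    @"UPDATE Orders
         SET Value = ItemCount * ItemPrice * Discount
       WHERE ClientID = @clientId AND Discount > 0";

using (SqlConnection conn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand(sql, conn))
{
    cmd.Parameters.AddWithValue("@clientId", clientId);
    conn.Open();
    int rowsUpdated = cmd.ExecuteNonQuery();  // the server does all the work in one round trip
}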
If I were you, though, I'd either do the SQL Server piece or the web server piece, not both at the same time. In terms of client-server there's a lot of overlap; neither one precludes the other, but a lot of the time you'll be able to use either to solve the same problem in slightly different ways.
As more details emerge, it appears one piece of your application involves storing 15K rows in your Access db file(s) so that you may later perform computations on those data.
However, it's not clear why you feel those data must be stored in Access to perform the computations.
Ideally, we would create a query to ask the server to perform those calculations. If that's not possible with your server's capabilities, or so computationally intensive as to place an unacceptable processing load on the server, you still should not need to download all the raw data to Access in order to use it for your computations. Instead, you could open a recordset populated by a query on the server, move through the recordset rows to perform your computation and store only the results in your Access table (via a second recordset).
Public Sub next_level_outline()
    Dim db As DAO.Database
    Dim rsLocal As DAO.Recordset
    Dim rsServer As DAO.Recordset
    Dim varLastValue As Variant

    Set db = CurrentDb
    Set rsLocal = db.OpenRecordset("AccessTable", dbOpenTable, dbAppendOnly)
    Set rsServer = db.OpenRecordset("ServerQuery", dbOpenSnapshot)

    Do While Not rsServer.EOF
        rsLocal.AddNew
        rsLocal!computed_field = YourAlgorithm(varLastValue)
        rsLocal.Update
        varLastValue = rsServer!indicator_field.Value
        rsServer.MoveNext
    Loop

    rsLocal.Close
    Set rsLocal = Nothing
    rsServer.Close
    Set rsServer = Nothing
    Set db = Nothing
End Sub
That is only a crude outline. Much depends on the nature of YourAlgorithm(). From a comment, I gathered it has something to do with a previous row ... so I included varLastValue as a placeholder.
Part of your approach was to filter 2 million source rows to the 15K rows which apply to your selected factory. Do that with a WHERE clause in ServerQuery:
WHERE factory_id = 'foo'
If the row ordering is important for YourAlgorithm(), include an ORDER BY clause in ServerQuery.
The driver for this suggestion is to avoid redundantly storing data in Access. And, if you can't eliminate the redundancy completely, at least limit the extent of it.
You may then find you can consolidate the Access storage into a single db file rather than four. The single db file could simplify other aspects of your application and should also offer improved performance.
I think you should make certain you've thoroughly addressed this issue before you move on to the next stage of your application's evolution. I don't believe this challenge will become any easier in ASP.Net.
The application you describe appears to be an example of "ETL" - extract, transform, load.
It was one of the first projects I ever worked on as a professional programmer - and it's distinctly non-trivial. There are a bunch of tools you can use to help with this process (including one from Microsoft), but they are aimed mostly at populating a data warehouse - it's not clear that's what you're building, so that may not be hugely useful. Nevertheless, read through the Wikipedia article, and perhaps look at some of the ETL tools to get some ideas.
If you go your own way, I'd suggest writing a Windows service to automatically run your ETL process. I assume you run the import on some kind of trigger (nightly, hourly, when the manufacturing system sends you a message, or whatever); write your Windows service to poll for this trigger.
I'd then execute from the service whatever database commands you need to move the data around, run your algorithms, etc.; pay attention to error handling and logging (services don't have a user interface, so you have to write errors to the system log and make sure someone is paying attention). Consider wrapping your database code in stored procedures; it makes them easier to invoke from the service.
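A bare-bones sketch of that shape, where everything is invented for illustration: the service name, the connection string, and a usp_RunEtl stored procedure standing in for whatever actually moves and transforms your data.
// using System; using System.Data; using System.Data.SqlClient;
// using System.Diagnostics; using System.ServiceProcess;
public class EtlService : ServiceBase
{
    private readonly System.Timers.Timer _timer = new System.Timers.Timer(60 * 60 * 1000); // poll hourly
    private const string ConnectionString = "...";  // read from config in a real service

    protected override void OnStart(string[] args)
    {
        _timer.Elapsed += (sender, e) => RunEtl();
        _timer.Start();
    }

    protected override void OnStop()
    {
        _timer.Stop();
    }

    private void RunEtl()
    {
        try
        {
            using (var conn = new SqlConnection(ConnectionString))
            using (var cmd = new SqlCommand("dbo.usp_RunEtl", conn))  // hypothetical ETL proc
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.CommandTimeout = 0;   // ETL steps can legitimately run for a long time
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
        catch (Exception ex)
        {
            // No UI: log failures where someone will actually see them.
            EventLog.WriteEntry("EtlService", ex.ToString(), EventLogEntryType.Error);
        }
    }
}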
It sounds like this is a fairly complex app, so pay attention to code quality and consider unit tests (though it's hard to unit test database code). Buy Code Complete by Steve McConnell and read it cover to cover if you're not a professional coder.
Background: I am using an SQLite database in my Flex application. The database is 4 MB and has 5 tables:
Table 1 has 2,500 records
Table 2 has 8,700 records
Table 3 has 3,000 records
Table 4 has 5,000 records
Table 5 has 2,000 records
Problem: Whenever I run a SELECT query on any table, it takes around 50 seconds to fetch the data. This has made the application quite slow and unresponsive while it fetches data from the tables.
How can I improve the performance of the SQLite database so that the time taken to fetch data from the tables is reduced?
Thanks
As I said in a comment, without knowing what structures your database consists of and what queries you run against the data, there is nothing we can infer to suggest why your queries take so much time.
However, here is an interesting read about indexes: Use the index, Luke!. It tells you what an index is, how you should design your indexes, and what benefits you can harvest.
Also, if you can post the queries and the table schemas and cardinalities (not the contents), maybe it could help.
Are you using asynchronous or synchronous execution modes? The difference between them is that asynchronous execution runs in the background while your application continues to run. Your application will then have to listen for a dispatched event and then carry out any subsequent operations. In synchronous mode, however, the user will not be able to interact with the application until the database operation is complete since those operations run in the same execution sequence as the application. Synchronous mode is conceptually simpler to implement, but asynchronous mode will yield better usability.
The first time SQLStatement.execute() is called on a SQLStatement instance, the statement is prepared automatically before executing. Subsequent calls will execute faster as long as the SQLStatement.text property has not changed. Reusing the same SQLStatement instance is better than creating new instances again and again. If you need to change your queries, then consider using parameterized statements.
You can also use techniques such as deferring what data you need at runtime. If you only need a subset of data, pull that back first and then retrieve other data as necessary. This may depend on your application scope and what needs you have to fulfill though.
Specifying the database name along with the table name will prevent the runtime from checking each database to find a matching table if you have multiple databases, and it also keeps the runtime from choosing the wrong database. Use SELECT email FROM main.users; instead of SELECT email FROM users; even if you only have one database. (main is automatically assigned as the database name when you call SQLConnection.open.)
If you happen to be writing lots of changes to the database (multiple INSERT or UPDATE statements), then consider wrapping them in a transaction. The changes will be made in memory by the runtime and then written to disk. If you don't use a transaction, each statement will result in multiple disk writes to the database file, which can be slow and consume lots of time.
Try to avoid schema changes. The table definition data is kept at the start of the database file, and the runtime loads these definitions when the database connection is opened. Data added to tables is kept after the table definition data in the database file. If you make changes such as adding columns or tables, the new table definitions will be mixed in with table data in the database file. The effect of this is that the runtime has to read the table definition data from different parts of the file rather than just from the beginning. The SQLConnection.compact() method restructures the table definition data so that it is at the beginning of the file, but its downside is that it can also consume a lot of time, more so if the database file is large.
Lastly, as Benoit pointed out in his comment, consider improving your own SQL queries and the table structure you're using. It would be helpful to know whether your database structure and queries are the actual cause of the slow performance. My guess is that you're using synchronous execution; if you switch to asynchronous mode you'll see better performance, but that doesn't mean it has to stop there.
The Adobe Flex documentation online has more information on improving database performance and best practices for working with local SQL databases.
You could try indexing some of the columns used in the WHERE clause of your SELECT statements. You might also try minimizing usage of the LIKE keyword.
If you are joining your tables together, you might try simplifying the table relationships.
Like others have said, it's hard to get specific without knowing more about your schema and the SQL you are using.
Hi guys,
I've developed a web application for an attendance management system using ASP.NET and SQL Server 2005. As you would know, attendance activities are carried out daily. Inserting records one by one is a bad idea, I know. My questions are:
Is SqlBulkCopy the only option for me when using SQL Server, given that I want to insert 100 records on a click event (i.e. inserting attendance for a class which contains 100 students)?
I want to insert the attendance for each class one at a time.
Unless you have a particularly huge number of attendance records to add each day, the best way to do it is with insert statements (I don't know why exactly you've got it into your head that this is a bad idea; our databases frequently handle tens of millions of rows being added throughout the day).
If your attendance records are more than that, you're on a winner, getting that many people to attend whatever functions or courses you're running :-)
Bulk copies and imports are generally meant for transferring sizeable quantities of data, and I mean sizeable as in the entire contents of a database to a disaster recovery site (and other things like that). I've never seen them used in the wild as a way to get small amounts of data into a database.
Update 1:
I'm guessing based on the comments that you're actually entering the attendance records one by one into your web app and 1,500 is taking too long.
If that's the case, it's not the database slowing you down, nor the web app. It's how fast you can type.
The solution to that problem (if indeed it is the problem) is to provide bulk import functionality in your web application (or in the database directly if you wish, but in my opinion you're better off having the application do all the work).
This is of course assuming that the data you're entering can be accessed electronically. If all you're getting is pieces of paper with attendance details, you're probably out of luck (OCR solutions notwithstanding), although if you could get multiple people doing it concurrently, you may have some chance of getting it done in a timely manner. Hiring 1,500 people to do one each should knock it over in about five minutes :-)
You can add functionality to your web application to accept a file containing the attendance details and process each entry, inserting a row into your database for each one. This will be much faster than manually entering the information.
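As a rough sketch of that import step, assuming a simple comma-separated upload of "studentId,status" lines; the Attendance table and its columns are invented for illustration:
// using System; using System.Data; using System.Data.SqlClient; using System.IO;
public static void ImportAttendance(string connectionString, Stream uploadedFile, int classId, DateTime day)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tx = conn.BeginTransaction())
        using (var cmd = new SqlCommand(
            "INSERT INTO Attendance (ClassId, StudentId, Day, Status) " +
            "VALUES (@classId, @studentId, @day, @status)", conn, tx))
        {
            cmd.Parameters.AddWithValue("@classId", classId);
            cmd.Parameters.AddWithValue("@day", day);
            SqlParameter studentId = cmd.Parameters.Add("@studentId", SqlDbType.Int);
            SqlParameter status = cmd.Parameters.Add("@status", SqlDbType.VarChar, 20);

            using (var reader = new StreamReader(uploadedFile))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    string[] parts = line.Split(',');
                    studentId.Value = int.Parse(parts[0]);
                    status.Value = parts[1].Trim();
                    cmd.ExecuteNonQuery();   // a class of 100 rows inserted this way is effectively instant
                }
            }
            tx.Commit();
        }
    }
}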
Update 2:
Based on your latest information that it's taking too long to process the data after starting it from the web application: I'm not sure how much data you have, but 100 records should take essentially no time at all.
Where the bottleneck is I can't say, but you should be investigating that.
I know in the past we've had long-running operations from a web UI where we didn't want to hold up the user. There are numerous solutions for that, two of which we implemented:
take the operation off-line (i.e., run it in the background on the server), giving the user an ID to check on the status from another page.
same thing but notify user with email once it's finished.
This allowed them to continue their work asynchronously.
Ah, with your update I believe the problem is that you need to add a bunch of records after some click, but it takes too long.
I suggest one thing that won't help you immediately:
Reconsider your design slightly, as this doesn't seem particularly great (from a DB point of view). But that's just a general guess; I could be wrong.
The more helpful suggestion is:
Do this offline (via a Windows service, or similar).
If it's taking too long, you want to do it asynchronously and then later inform the user that the operation is complete. They probably don't even need to be around; you just don't let them use whatever functions need the data until it's completed. Hope that idea makes sense.
The fastest general way is to reuse a single parameterised command and call ExecuteNonQuery for each row inside one transaction.
internal static void FastInsertMany(DbConnection cnn)
{
    using (DbTransaction dbTrans = cnn.BeginTransaction())
    {
        using (DbCommand cmd = cnn.CreateCommand())
        {
            cmd.Transaction = dbTrans;  // enlist the command in the transaction
            // "?" is the positional parameter syntax used by this provider;
            // SQL Server's SqlClient would use a named parameter such as @MyValue.
            cmd.CommandText = "INSERT INTO TestCase(MyValue) VALUES(?)";
            DbParameter Field1 = cmd.CreateParameter();
            cmd.Parameters.Add(Field1);
            for (int n = 0; n < 100000; n++)
            {
                Field1.Value = n + 100000;
                cmd.ExecuteNonQuery();  // one prepared insert per row, all inside one transaction
            }
        }
        dbTrans.Commit();  // a single commit instead of one per statement
    }
}
Even on a slow computer this should take far less than a second for 1500 inserts.
[reference]
Context
My current project is a large-ish public site (2 million pageviews per day) running a mixture of classic ASP and ASP.NET with a SQL Server 2005 back end. We're heavy on reads, with occasional writes and virtually no updates/deletes. Our pages typically concern a single 'master' object with a stack of dependent (detail) objects.
I like the idea of returning all the data required for a page in a single proc (and absolutely no unnecessary data). True, this requires a dedicated proc for such pages, but some pages receive double-digit percentages of our overall site traffic, so it's worth the time/maintenance hit. We typically only consume multiple recordsets from our .NET code, using System.Data.SqlClient.SqlDataReader and its NextResult method. Oh, and I'm not doing any updates/inserts in these procs either (except to table variables).
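A simplified sketch of that consuming pattern, with a placeholder proc, parameter, and columns rather than real ones, looks something like:
// using System.Data; using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.GetPageData", conn))  // hypothetical page proc
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@masterId", masterId);
    conn.Open();

    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        // First recordset: the single 'master' row.
        if (reader.Read())
        {
            string title = reader.GetString(reader.GetOrdinal("Title"));
        }

        // Second recordset: the dependent detail rows.
        reader.NextResult();
        while (reader.Read())
        {
            // ... map each detail row ...
        }
    }
}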
The question
SQL Server (2005) procs which return multiple recordsets are working well (in prod) for us so far, but I am a little worried that multi-recordset procs are my new favourite hammer that I'm hitting every problem (nail) with. Are there any multi-recordset SQL Server proc gotchas I should know about? Anything that's going to make me wish I hadn't used them? Specifically, anything about them affecting connection pooling, memory utilization, etc.
Here are a few gotchas for multiple-recordset stored procs:
They make it more difficult to reuse code. If you're doing several queries, odds are you'd be able to reuse one of those queries on another page.
They make it more difficult to unit test. Every time you make a change to one of the queries, you have to test all of the results. If something changed, you have to dig through to see which query failed the unit test.
They make it more difficult to tune performance later. If another DBA comes in behind you to help performance improve, they have to do more slicing and dicing to figure out where the problems are coming from. Then, combine this with the code reuse problem - if they optimize one query, that query might be used in several different stored procs, and then they have to go fix all of them - which makes for more unit testing again.
They make error handling much more difficult. Four of the queries in the stored proc might succeed, and the fifth fails. You have to plan for that.
They can increase locking problems and incur load in TempDB. If your stored procs are designed in a way that needs repeatable reads, then the more queries you stuff into a stored proc, the longer it's going to take to run, and the longer it's going to take to return those results back to your app server. That increased time means higher contention for locks, and more for SQL Server to store in TempDB for row versioning. You mentioned that you're heavy on reads, so this particular issue shouldn't be too bad for you, but you want to be aware of it before you reuse this hammer on a write-intensive app.
I think multi-recordset stored procedures are great in some cases, and it sounds like yours may be one of them.
The bigger (more traffic) your site gets, the more that 'extra' bit of performance is going to matter. If you can combine 2-3-4 calls to the database (and possibly new connections) into one, you could be cutting your database hits by 4-6-8 million per day, which is substantial.
I use them sparingly, but when I have, I have never had a problem.
I would recommend having one stored procedure make several inner invocations of stored procedures that each return one result set:
create proc foo
as
    execute foobar  -- returns one result
    execute barfoo  -- returns one result
    execute bar     -- returns one result
That way, when requirements change and you only need the 3rd and 5th result sets, you have an easy way to invoke them without adding new stored procedures and regenerating your data access layer. My current app returns all reference tables (e.g. a US states table) whether I want them or not. The worst is when you need to get a reference table and the only access is via a stored procedure that also runs an expensive query as one of its six result sets.
I don't know when to add a TableAdapter to a dataset versus a query from the toolbox. Does it make any difference?
I also don't know where to create instances of the adapters.
Should I do it in the Page_Load?
Should I just do it when I'm going to use it?
Am I opening a new connection when I create a new instance?
This doesn't seem very important, but every time I create a query a little voice in my brain asks me these questions.
Should I just do it when I'm going to use it?
I would recommend that you only retrieve the data when you are going to use it. If you are not going to need it, there is no reason to waste resources by retrieving it in Page_Load. If you are going to need it multiple times throughout the page load, consider saving the query results to a private variable or collection so that the same data can be reused.
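As a sketch of that "load once, reuse" idea (the CustomersTableAdapter name is just a stand-in for whatever typed adapter or query your dataset generates):
// Cached for the lifetime of this page instance; loaded only on first use.
private DataTable _customers;

private DataTable Customers
{
    get
    {
        if (_customers == null)
        {
            using (var adapter = new CustomersTableAdapter())  // placeholder generated adapter
            {
                _customers = adapter.GetData();
            }
        }
        return _customers;
    }
}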
Am I opening a new connection when I create a new instance?
ASP.NET handles connection pooling and opens and closes connections in an efficient way. You shouldn't have to worry about this.
One other thing to consider from a performance perspective is to avoid using DataSets and TableAdapters. In many cases, they add extra overhead to data retrieval that does not exist when using LINQ to SQL, stored procedures, or DataReaders.
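For comparison, a plain DataReader version of a simple fetch might look like this (a sketch; the query and connectionString are placeholders):
// using System.Collections.Generic; using System.Data.SqlClient;
var customerNames = new List<string>();
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT Name FROM Customers", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            customerNames.Add(reader.GetString(0));  // stream rows straight into your own objects
        }
    }
}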