I currently have a problem where the performance of my database is impacted by several million lines of updates that run (it takes more or less 3 days, so we usually run them over a weekend)
However since the site is live, search performance is impacted. A 3 second query to pull 1.3 million records and page through them takes in excess of the timeout values by default in sql server sometimes. This obviously creates a user experience no one wants (or can afford to) to have happen.
My question now. If I setup replication on the Master to a Slave on the same server; Would I be able to point the website to the Slave and avoid that performance impact? Or would it just be duplicating the same problem since the Master will push any updates through to the Slave in any case?
I don't think replication is going to help you here, it is only going to make things on the source system worse IMHO.
Is it possible you can make a static copy of the data for the users running queries while the updates are going on? For reporting solutions that don't need to be up-to-the-minute, I've done this in several cases using two schemas - one that holds the static versions of the tables for querying, and one where the work is being done; when the work is done, switch. I go into this methodology in a little more detail here: What is the best way to refresh a rollup table under load?
Perhaps another thought is to make your updates more efficient, such that they don't take 3 days? Do you only do this on long weekends?
Question is, what is nature of your site ?? Do users use it just to "Search" or it does CRUD operations ?? If it is just for "Search" and Report generation then I agree with #Aaron. You can have some database just for reporting purposes, you can even use Log Shipping to automatically update your reporting database at very brief interval.
Is it possible that user can change data at the same time while records are being updated by your update process ?? In that case, you will have to update your Primary Database using your update job, and then again update Primary database for changes made by users using Slave database.
Related
Here's the scenario...
We have an internal website that is running the latest version of the ODAC (Oracle Client). It opens database connections, runs a stored procedure or packaged method, then disconnects. Connection pooling is turned on, and we are currently under version 11g in both our development and test environments, but under 10gR2 in our production environment. This happens on Production.
A few days ago, a process began firing off a ORA-2020 error. The process is called from a webpage on our internal website. The user simply sets a date, hits a button, and a job is started on another system that is separate from the website. The call itself, however, uses a database link to run a function.
We've scoured the SQL to find that it only uses that one database link. And since these links are on a per session basis and the user isn't exceeding the default limit of 4, how is it possible that we are getting a ORA-2020 error.
We have ran a number of tests to try to push over the default limit of 4. ODAC, from what I recall, runs a commit after each connection, and I can't seem to run 4 DB links, then run a piece of SQL with 1 DB link directly after with any errors. The only way I can bring up this error is if I run a query with 4 DB links, then a function or piece of dynamic SQL with a database link within it. We don't have that problem as this issue is sporadic. It isn't ALWAYS happening.
Questions
Is it possible that connection pooling is allowing User B to use User A's connection after the initial process was run, thus adding to the open links number if User B runs a SQL statement with more database links?
Is this a scenario where we should up our limit past 4? What are the disadvantages of increasing the number?
Do I need to explicitly close open database links before disconnecting from the database? Oracle documentation seems to suggest it should automatically happen, but "on occasion"... doesn't.
Firstly, the simple solution: I'd double check that in the production database the number of default links is actually 4.
select *
from v$system_parameter
where name = 'OPEN_LINKS'
Assuming you're not going to get off that lightly:
Is it possible that connection pooling is allowing User B to use User
A's connection after the initial process was run, thus adding to the
open links number if User B runs a SQL statement with more database
links?
You say that you explicitly close the session, which, according to the documentation, should mean that all links associated with that session are closed. Other than that I confess complete ignorance on this point.
Is this a scenario where we should up our limit past 4? What are the
disadvantages of increasing the number?
There aren't any disadvantages that I can think of. Tom Kyte suggests, albeit a long time ago, that each open database link uses 500k of PGA memory. If you don't have any then this will obviously cause a problem but it should be more than fine for most situations.
There are, however, unintended consequences: Imagine that you up this number to 100. Somebody codes something that continually opens links and draws a lot of data through all them select * from my_massive_table or similar. Instead of 4 sessions doing this you have 100, which is attempting to transfer hundreds of gigabytes simultaneously. Your network dies under the strain...
There's probably more but you get the picture.
Do I need to explicitly close open database links before disconnecting
from the database? Oracle documentation seems to suggest it should
automatically happen, but "on occasion"... doesn't.
As you've noted the best answer is "probably not", which isn't much help. You don't mention exactly how you're terminating the session but if you're killing it rather than closing gracefully then definitely.
Using a database link spawns a child process on the remote server. Because your server is no longer in absolute charge of this process there's a myriad of things that could cause it to become orphaned or otherwise not close on termination of the parent process. By no means does this happen the whole time but it can and does.
I would do two things.
In your process, if an exception is encountered, e-mail the results of the following query to yourself.
select *
from v$dblink
As a minimum at least you will know what database links are open in the session and give you some way of tracing them.
Follow the documentations advice; specifically the following:
"You may have occasion to close the link manually. For example, close
links when:
The network connection established by a link is used infrequently in an application.
The user session must be terminated."
The first seems to exactly fit your situation. Unless your process is time-sensitive, which doesn't seem to be the case, then what have you got to lose? The syntax is:
alter session close database link <linkname>
We ended up increasing the link amount, but we never did find the root cause.
My question is same as this question - but a little to add on to it. My problem is that the users of my web app are allowed to create new versions of a record. Every new version of a record results in creating corresponding "new versions" in 50 other related tables to record individual changes to that particular version. This creation of a new version with about 50 tables involved is running within a transaction (to rollback all changes in the event of an error). Many a times this procedure is slow, understandably due to a "lengthy transaction" with too many table "insertions" involved.
I am looking at a better solution/design to implement such a scenario.
Is there a better way to maintain "versions" of the same record, particularly when it creates too many duplicates in multiple tables
I don't feel the design in itself is good with too many records getting inserted for "every row version", but would at least like to address the immediate problem, the "lengthy transaction" - which causes delay at times. There may not be way way out, but wanted to still ask out - If I don't put the "versioning" inside a transaction, is there a better way to rollback in the event of an error (because transaction appears to block other OLTP queries - due to inserting new versions on all primary tables)
The versioning query runs for about 10 seconds right now, but gets worse at times. Any thoughts are appreciated
Do you have to return the "created" or "failed" message in real time? The following might also be overkill for your solution but it does create a scalable solution.
The web-server could post a message (to a queue of some sort) requesting the action. At this point the user could continue using the site to do other stuff. A windows service in the background can process the message (out of context of the website) and then notify the user (by a message inside the website, similar to stack overflow notifications) or by email that the task has either run or failed.
If you can get away with doing processing decoupled in near real time then you can modify your windows service to scale. You can have a thread pool to manage requests - so maybe you only have 5 threads running at any one time to limit load. If you run into more performance problems you can scale out and develop a system that can have 2 or more processors of the queue (that does add it's own problems / complexity).
I have developed a CRM for my company. Next I would like to take that system and make it available for others to use in a hosted format. Very much like a salesforce.com. The question is what type of database structure would I use. I see two options:
Option 1. Each time a company signs up, I clone the master database for them.
The disadvantage of this is that I could end up with thousands of databases. Thats a lot of databases to backup every night. My CRM uses cron jobs for maintanance, those jobs would have to run on all databases.
Each time I upgrade the system with a new feature, and need to add a new column to the database, I will have to add that column to thousands of databases.
Option 2. Use only one database.
At the beginning of EVERY table add "CompanyID". In EVERY SQL statement add "and
companyid={companyid}".
The advantage of this method is the simplicity of only one database. Just one database to backup each night. Just one database to update when needed.
The disadvantage is what if I get 1000 companies signing up, and each wants to store data on 100,000 leads, that 100,000,000 rows in the lead table, which worries me.
Does anyone know how the online hosted CRMs like salesforce.com do this?
Thanks
Wouldn't you clone a table structure style to each new database id all sheets archived in master base indexed client clone is hash verified to access specific sheet run through a host machine at the front end of the master system. Then directing requests as primary role. Internal access is batched to read/write slave systems in groups. Obviously set raid configurations to copy real time and scheduled. Balance requests for load to speed of system resources. That way you separated the security flawed from ui and connection to the retention schema. it seems like simplified structures and reducing policy requests cut down request rss in the query processing. or Simply a man in the middle approach from inside out.
1) Splitting your data into smaller groups in a safe, unthinking way (such as one database per grouping) is almost always best if you want to scale. In this case, unless for some reason you want to query between companies, keeping them in separate databases is best.
2) If you are updating all of the databases by hand, you are doing something wrong if you want to scale. You'd want to automate the process.
3) Ultimately, salesforce.com uses this as the basis of their DB infrastructure:
http://blog.database.com/blog/2011/08/30/database-com-is-open-for-business/
I'm running an ASP.NET app in which I have added an insert/update query to the [global] Page_Load. So, each time the user hits any page on the site, it updates the database with their activity (session ID, time, page they hit). I haven't implemented it yet, but this was the only suggestion given to me as to how to keep track of how many people are currently on my site.
Is this going to kill my database and/or IIS in the long run? We figure that the site averages between 30,000 and 50,000 users at one time. I can't have my site constantly locking up over a database hit with every single page hit for every single user. I'm concerned that's what will happen, however this is the first time I have attempted a solution like this so I may just be overly paranoid.
Do it Async.
Create a dll that handles the update, and in the page load do a fire and forget with parameters.
Insert-Based designs have less locking than Update-Based designs.
So if a user logged-in and then logged-out, in an Insert-Based design you would have multiple rows with a SessionID in each, one for each activity whereas in an Update-Based design, you would have a SessionId, LoginTime and a LogoutTime column and you would update the LogoutTime based on the SessionId.
I have seen many more locking and contention problems caused by Update activity more than Insert activity.
Activities such as counting and linking logins to logouts etc take more complex queries and a little more resources.
It goes without saying that your queries, especially the ones that run on every page, should be as fast as possible so that the site doesn't appear slow to users.
To keep track of how many users are currently on your site you could use performance counters. What you describe though sounds more like a full fledged logging of every page hit.
Lets say you realy have 50k users connected at any one time.
As long as you don't have contention between the updates (trying to lock the same record) a database can track a very high number of inserts and updates. You need to do some capacity planning to assure the load can be carried. 50k users visiting a page every minute will give you 50k inserts and 50k updates per minute, roughly 850 inserts and 850 updates per second, which have to commit (flush the log). Does your DB I/O subsytem support such a write pressure load, in addition to responding to all the requests (reads)?
Also 50k users doing 1 page hit per minute adds up to 72 mil hits per day, 72 mil. logging inserts, at such a rate you need to carefully plan the size capacity of the database and consider what kind of analysis you'll do on the collected data since querying ad-hoc 2 billion rows (one month data) will get you nowhere fast (actually... quite slow).
Doing it async can give you some relief over very short spikes, but not on the long run. If your DB system cannot handle the load then doing async calls will just create a backlog queue in the application process (in the ASP app pool) and this will grow until out of memory, at which moment the all vigilant IIS will 'recycle' the app pool, thus loosing all pending async updates.
I think updating the database in the begin session and end session will do the job. that will reduce the count of statements dramatically.
I think it makes no difference if you track hits or begin/end session. with hits you'll also need additional logic to subtract inactive users
EDIT: session end is not fired always. I would suggest to call an update statement/stored procedure in another session begin event (in addition to the other insert statement) that will fix invalid sessions.
I don't think that calling this "fix routine" is necessary in every page load event because I think you cant exactly count "current no. of visitors".
I would keep this in Application state instead - if possible. On ApplicationStart create some data structure saved to App state that you can update from anywhere in your application - session start, page load, wherever. Keep it out of the database. You are just using it to track "currently online" info anyway it sounds like.
If you have multiple instances of your app, or if there is a requirement to maintain historical info beyond the IIS logs, this won't work obviously. Go with chris' fire-and-forget solution in that case.
What's wrong with IIS Logs?
2009-05-01 12:30:31 207.219.27.35 GET /assocadmin/ibb-reg.asp - usernameremoved 544.566.570.575 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+6.0;+SLCC1;+.NET+CLR+2.0.50727;+Media+Center+PC+5.0;+.NET+CLR+3.5.30729;+.NET+CLR+3.0.30618) 200 0 0 40058
EDIT: I'd like to close this answer, but I want the comments to stay. Consider this answer withdrawn.
How about adding a small object to the session?
Something like LoggedInUserFlag:IDisposable
In the constructor, increment your counter however you decide to implement it.
Then in the Dispose method, decrement the counter.
This way, regardless of how the session is ended, the counter will always be (eventually) decremented.
see:
http://weblogs.asp.net/cnagel/archive/2005/01/23/359037.aspx
for info on using IDisposable.
I am not an ASP guy at all, but what about rather than logging all that other info, and insert their IP address?
If they have an IP address already in there, have a last_seen timestamp, and on each refresh just delete any row that isn't 10 minutes ago?
This is how I would take a shot at it. It is much more space efficient, but I am not sure about the checking and deleting so much on such a high profile site.
As a direct answer to your question, yes, running a database query in-line with every request is a bad idea:
Synchronous requests will tie up a thread, which will reduce your scalability (fewer simultaneous activities)
DB inserts (or updates) are writes to the DB, which will put a load on your log volume
DB accesses shouldn't be required in a single server / single AppPool scenario
I answered your question about how to count users in the other thread:
Best way to keep track of current online users
If you are operating in a multi-server / load-balanced environment, then DB accesses may in fact be required. In that case:
Queue them to a background thread so the foreground request thread doesn't have to wait
Use Resource Governor in SQL 2008 to reduce contention with other DB accesses
Collect several updates / inserts together into a single batch, in a single transaction, to minimize log disk I/O pressure
Return the current count with each DB access, to minimize round-trips
In case it's of any interest, I cover sync/async threading issues and the techniques above in detail in my book, along with code examples: Ultra-Fast ASP.NET.
I have a form with a list that shows information from a database. I want the list the update in run time (or almost real time) every time something changes in the database. These are the three ways I can think of to accomplish this:
Set up a timer on the client to check every few seconds: I know how to do this now, but it would involve making and closing a new connection to the database hundreds of times an hour, regardless of whether there was any change
Build something sort of like a TCP/IP chat server, and every time a program updates the database it would also send a message to the TCP/IP server, which in turn would send a message to the client's form: I have no idea how to do this right now
Create a web service that returns the date and time of when the last time the table was changed, and the client would compare that time to the last time the client updated: I could figure out how to build a web service, but I don't how to do this without making a connection to the database anyway
The second option doesn't seem like it would be very reliable, and the first seems like it would consume more resources than necessary. Is there some way to tell the client every time there is a change in the database without making a connection every few seconds, or is it not that big of a deal to make that many connections to a database?
I would imagine connection pooling would make this a non-issue. Depending on your database, it probably won't even notice it.
Are you making the update to the database? Or is the update happening from an external source?
Generally, hundreds of updates per hour won't even bother the DB. Even Access, which is pretty slow, won't cause a performance issue.
Here's a rough idea if you really want to optimize it and you're doing the data updates. Store an application variable on the server side called, say, LastUpdateTime. When you make updates to the database, you can update the LastUpdateTime variable with the current time. Since LastUpdateTime is a very lightweight object in server memory, your clients can technically request the last update time hundreds if not thousands of times per second without any round trip to the database. Based on the last time the client retrieved new information vs. the last update time on the server, you can then go fetch the updated info.
We have a similar question Polling database for updates from C# application. Another idea (may be not a proper solution) would be to use Microsoft Sync Framework. You can use a timer to sync the DB.