I have had a BizTalk Server 2010 environment up and running for a few months.
Recently we noticed that the MessageBox keeps growing and throttling has kicked in.
In production, the MessageBox is > 60 GB and the trackingdata view returns 1.5m records.
In our acceptance environment, the trackingdata view returns about 300k records.
We neglected to create a dedicated tracking host in the first place, but managed to add one in acceptance last week. The dedicated tracking host has not changed anything in our acceptance environment, so I have not yet created one in production.
All jobs are enabled and run on schedule without errors.
In acceptance, I do not have any running/suspended messages.
I also can't find any exceptions in the event log.
I'd appreciate any hint on how to improve the setup and reduce the MessageBox size.
Thanks & best regards
Michael
Have you checked whether the SQL Server jobs that purge records from the database are running?
Look at the SQL Server Agent job named 'MessageBox_Message_Cleanup_BizTalkMsgBoxDb' (there are others there as well). They normally run periodically to clean things up. You might run the job manually, see if there are errors, and check the Job Activity Monitor.
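If you'd rather check and kick off the cleanup job from code than from the SSMS UI, here is a rough sketch (the connection string is a placeholder pointing at the instance hosting msdb; sp_help_job and sp_start_job are the standard msdb procedures, and the job name is the one mentioned above):

// Hedged sketch: reads the last run outcome of the BizTalk MessageBox cleanup job
// and (optionally) starts it. Adjust the connection string and job name as needed.
using System;
using System.Data;
using System.Data.SqlClient;

class MsgBoxJobCheck
{
    static void Main()
    {
        const string connStr = "Server=YOUR_SQL_SERVER;Database=msdb;Integrated Security=SSPI;";
        const string jobName = "MessageBox_Message_Cleanup_BizTalkMsgBoxDb";

        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();

            // sp_help_job returns job info including last_run_outcome
            // (1 = succeeded, 0 = failed).
            using (var cmd = new SqlCommand("msdb.dbo.sp_help_job", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@job_name", jobName);
                using (var reader = cmd.ExecuteReader())
                {
                    if (reader.Read())
                    {
                        Console.WriteLine("enabled: {0}, last_run_outcome: {1}",
                            reader["enabled"], reader["last_run_outcome"]);
                    }
                }
            }

            // Uncomment to kick the job off manually, then watch the job history
            // and the Job Activity Monitor for errors.
            // using (var start = new SqlCommand("msdb.dbo.sp_start_job", conn))
            // {
            //     start.CommandType = CommandType.StoredProcedure;
            //     start.Parameters.AddWithValue("@job_name", jobName);
            //     start.ExecuteNonQuery();
            // }
        }
    }
}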
I have used Quartz.NET to schedule some jobs in my ASP.NET application.
I am using the default job store; I think it is RAMJobStore, but I'm not sure, I only know I didn't configure anything about the store.
The problem is that I want to store all jobs in a database so that the remaining jobs can run again after the server restarts or shuts down and comes back up.
I know I can store jobs in a database using AdoJobStore, but I want to know: if my application server restarts and the application is not running in IIS, how can those remaining jobs be triggered again?
Is there any tutorial about this? Has anyone had experience with triggering stored jobs after a server restart? Any help would be appreciated.
A. You will use AdoJobStore (as mentioned in comments).
B. You need to understand and tweak the "misfire" configuration. There is not a simple "yes" or "no" answer to this question.
http://royontechnology.blogspot.com/2009/03/so-you-think-you-understand.html
Since Quartz.NET is a port of the Java version, you can also find helpful information here:
http://www.nurkiewicz.com/2012/04/quartz-scheduler-misfire-instructions.html
Here is a sample paragraph (in case that link dies at some point):
Sometimes Quartz is not capable of running your job at the time you desired. There are three reasons for that:
(1) all worker threads were busy running other jobs (probably with higher priority)
(2) the scheduler itself was down
(3) the job was scheduled with a start time in the past (probably a coding error)
The second one "(2)" (in the above quote) is your scenario.
The article then goes on to explain the scenarios and options.
Things to search for if any of the articles disappear in the future:
Java-ish: org.quartz.jobStore.misfireThreshold
C#: "So you think you understand misfireThreshold"
Ahoy,
We have two BizTalk applications in BizTalk 2013 R2 that seem to be having random issues. Both applications follow the same process:
Pull data from a WCF endpoint.
Delete data from a database via a stored procedure.
Insert the new data that was pulled via WCF-SQL call.
Both applications worked great during our testing for quite a while. But, over time, we've had a few issues crop up with the insert via the WCF-SQL call.
A fatal error occurred while reading the input stream from the network. The session will be terminated (input error: 64, output error: 0).
This error showed up in the SQL Server logs. We had it for about a day and then it just went away. Everything else continued to work fine on that target SQL Server; it was only BizTalk that had issues.
Our latest issue is that the request for the WCF-SQL insert happens (the data is actually inserted), but there is never a response. So the send port keeps retrying until its retry count is exhausted and the orchestration just dehydrates.
We tinkered with every setting throughout the application to try to solve this, but only deleting the application and redeploying it fixed things (for now at least).
So, I guess my question is whether anyone else has had these sorts of issues with BizTalk throwing "random" errors like this, where it works great for a while and then goes downhill like we've seen?
I'd really prefer to have something stable that is minimal maintenance. This is an enterprise product after all.
I've had issues similar to this happen when moving between environments where there were data differences, e.g. a column full of NULLs in QA and a column full of actual data in PROD. There are a few things you can try.
Use SQL Server Profiler to capture the RPC call coming from BizTalk, and try running it directly on the SQL Server that BizTalk is calling remotely (wrap it in a transaction you roll back at the end if this is production); a sketch of replaying the call from .NET follows this list. Does it take longer than expected to run? Debug the procedure to find the pain points and optimize if possible. I've written a blog about how to do this here: http://blog.tallan.com/2015/01/09/capturing-and-debugging-a-sql-stored-procedure-call-from-biztalk/
Up the timeout settings in the binding configuration for the send port to ensure that it is not timing out before SQL can finish doing its work.
Up the System.Transactions timeout in Machine.config to ensure that MSDTC isn't causing issues: http://blogs.msdn.com/b/madhuponduru/archive/2005/12/16/how-to-change-system-transactions-timeout.aspx and http://blog.brandt-lassen.dk/2012/11/overriding-default-10-minutes.html
If possible, do a data compare between the TEST/QA and PROD databases. Look for significant differences, especially in columns that you are using in JOIN conditions and WHERE clauses.
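For the first point, if you would rather replay and time the captured call from .NET instead of SSMS, something along these lines works; the procedure name, parameter, and connection string are placeholders, and the whole call runs inside a transaction that is rolled back so nothing on the target server is actually changed:

// Hedged sketch: replays a stored procedure call captured from Profiler inside a
// rolled-back transaction and reports how long it took. Names are placeholders.
using System;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;

class ReplayProcCall
{
    static void Main()
    {
        const string connStr = "Server=TARGET_SQL;Database=TargetDb;Integrated Security=SSPI;";

        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();
            using (SqlTransaction tx = conn.BeginTransaction())
            using (var cmd = new SqlCommand("dbo.usp_InsertPulledData", conn, tx))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.CommandTimeout = 600; // generous, so we can see the real duration
                cmd.Parameters.AddWithValue("@BatchId", 12345); // substitute the captured values

                var sw = Stopwatch.StartNew();
                cmd.ExecuteNonQuery();
                sw.Stop();

                Console.WriteLine("Procedure completed in {0} ms", sw.ElapsedMilliseconds);

                // Roll back so nothing is actually changed on the target server.
                tx.Rollback();
            }
        }
    }
}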
We have a system running Windows Server 2008 R2 x64 and SQL Server 2008 R2 x64 with SSRS installed and configured. This is a shared reporting server used by a large number of people, with some fairly large, inefficient databases (roughly 400-500 GB of data), and these users use the system to generate ad-hoc reports based on a reporting model that sits on top of the aforementioned databases. Note that the users authenticate via NTLM when running reports.
Most reports are quick, but if you run a report covering 1 or 2 years' worth of data, it can take a while to return (5 minutes or so). This is fine for most users; however, some of the users are stuck behind a proxy that has a connection timeout set at 2 minutes. As SSRS 2008 R2 does not seem to send back a "keep-alive" signal (confirmed via Wireshark), when running one of these long reports the proxy server thinks the connection has died, so it just gives up and kills the connection. This gives the user a 401 or 503 error and obviously cancels the report (the incorrect error code is a known bug in SSRS which Microsoft refuses to fix).
We're getting a lot of flak from the users about this, even though it's not really our issue, so I am looking for a creative solution.
So far I have come up with:
1) Discovering some as yet unknown setting for SSRS that can make it keep the connection alive.
2) installing our own proxy in between the users and our reports server, which WILL send a keep-alive back (not sure this will work and it's a bit hacky, just thinking creatively!)
3) re-writing our reports databases to be more efficient (yes this is the best solution, but also incredibly expensive)
4) ask the experts :) :)
We have a call booked in with Microsoft Support to see if they can help - but can any experts on Stack help out? I appreciate that this may be a better question for server fault (and I may post it there) but it's a development question too really :)
Thanks!
A few things:
A. For the SSRS service overall:
I personally use a keep-alive service, as I believe the default recycle interval for the SSRS service is 12 hours. I use a tool someone turned me onto called 'VisualCron' that can run many task processes automatically, but you could also just make a call from a WCF service or similar. Basically, the first report a user runs for the day is generally slow; usually you need to hit http://(servername)/ReportServer to keep it alive.
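If you don't want a third-party tool, a small scheduled ping does the same job. A rough sketch (the URL and interval are assumptions; UseDefaultCredentials passes the Windows/NTLM identity of whatever account runs it, and it assumes .NET 4.5+ for HttpClient; on older frameworks WebClient with UseDefaultCredentials works the same way):

// Hedged sketch: keeps the SSRS web service warm by hitting the ReportServer
// endpoint on a timer. URL and interval are assumptions.
using System;
using System.Net.Http;
using System.Threading.Tasks;

class ReportServerKeepAlive
{
    static async Task Main()
    {
        var handler = new HttpClientHandler { UseDefaultCredentials = true }; // NTLM
        using (var client = new HttpClient(handler))
        {
            while (true)
            {
                try
                {
                    HttpResponseMessage response =
                        await client.GetAsync("http://YOUR_SERVER/ReportServer");
                    Console.WriteLine("{0:T} keep-alive returned {1}",
                        DateTime.Now, (int)response.StatusCode);
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Keep-alive failed: " + ex.Message);
                }
                await Task.Delay(TimeSpan.FromMinutes(10));
            }
        }
    }
}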
B. For caching report-level items:
If this does not help, I would suggest caching datasets where possible. Some people need data that is up to the moment, but for a lot of people that is not the case. You can create a shared dataset in SSRS and then cache it on a schedule. So if you have domain-like tables that only need to be updated once in a blue moon, put them there; the same goes for data that is refreshed nightly or in batches. If you are a transactional shop that needs up-to-the-moment data this may not help, but for batch-based businesses it can help tremendously.
As a continuation of this, you can also cache the reports' data. Under the 'Manage' drop-down for a report on the /Reports landing page, you can set the data to be refreshed on a specific schedule. You can also set up a snapshot, which is an extension of this: it executes with some default parameters on a schedule and is a copy of the report as of when it was run.
You mention ASP.NET, so I am not certain how much of this will apply if you are doing all of this through a site you are setting up internally as a pass-through. But you could also email or save files on a schedule through SSRS's subscription service.
C. Change how you store your data for reporting.
You can create a report warehouse of selected item-level values from your queries. Create a small database that holds just a few recent years of data and only certain fields from certain tables, then index it to death and report off of that. In my experience this method flies in terms of performance, but it does take extra overhead to set up. Most companies will whine about this, but it often takes a single day to set up; then you create one SQL Server Agent job or SSIS package that refreshes it nightly and you don't worry about it. I personally like this method because I know my reporting is isolated and not running against production.
Here's the scenario...
We have an internal website that runs the latest version of ODAC (the Oracle client). It opens database connections, runs a stored procedure or packaged method, then disconnects. Connection pooling is turned on, and we are currently on version 11g in both our development and test environments, but on 10gR2 in our production environment. The issue below happens in production.
A few days ago, a process began firing off an ORA-02020 ("too many database links in use") error. The process is called from a webpage on our internal website. The user simply sets a date, hits a button, and a job is started on another system that is separate from the website. The call itself, however, uses a database link to run a function.
We've scoured the SQL and found that it only uses that one database link. And since these links are on a per-session basis and the user isn't exceeding the default limit of 4, how is it possible that we are getting an ORA-02020 error?
We have run a number of tests to try to push past the default limit of 4. ODAC, from what I recall, issues a commit after each connection, and I can't produce any errors by running 4 DB links and then a piece of SQL with 1 DB link directly afterwards. The only way I can trigger the error is by running a query with 4 DB links followed by a function or piece of dynamic SQL that has a database link inside it. That doesn't match our situation, though, as this issue is sporadic; it isn't ALWAYS happening.
Questions
Is it possible that connection pooling is allowing User B to use User A's connection after the initial process was run, thus adding to the open links number if User B runs a SQL statement with more database links?
Is this a scenario where we should up our limit past 4? What are the disadvantages of increasing the number?
Do I need to explicitly close open database links before disconnecting from the database? Oracle documentation seems to suggest it should automatically happen, but "on occasion"... doesn't.
Firstly, the simple solution: I'd double-check that the open links limit in the production database really is 4:
select *
from v$system_parameter
where name = 'OPEN_LINKS'
Assuming you're not going to get off that lightly:
Is it possible that connection pooling is allowing User B to use User A's connection after the initial process was run, thus adding to the open links number if User B runs a SQL statement with more database links?
You say that you explicitly close the session, which, according to the documentation, should mean that all links associated with that session are closed. Other than that I confess complete ignorance on this point.
Is this a scenario where we should up our limit past 4? What are the disadvantages of increasing the number?
There aren't any disadvantages that I can think of. Tom Kyte suggested, albeit a long time ago, that each open database link uses about 500k of PGA memory. If you don't have that to spare then this will obviously cause a problem, but it should be more than fine for most situations.
There are, however, unintended consequences: imagine that you up this number to 100. Somebody codes something that continually opens links and draws a lot of data through all of them (select * from my_massive_table or similar). Instead of 4 open links per session doing this you have 100, attempting to transfer hundreds of gigabytes simultaneously. Your network dies under the strain...
There's probably more but you get the picture.
Do I need to explicitly close open database links before disconnecting from the database? Oracle documentation seems to suggest it should automatically happen, but "on occasion"... doesn't.
As you've noted, the best answer is "probably not", which isn't much help. You don't mention exactly how you're terminating the session, but if you're killing it rather than closing it gracefully, then definitely yes.
Using a database link spawns a child process on the remote server. Because your server is no longer in absolute charge of that process, there are a myriad of things that could cause it to become orphaned or otherwise not close when the parent process terminates. By no means does this happen all the time, but it can and does.
I would do two things.
In your process, if an exception is encountered, e-mail the results of the following query to yourself.
select *
from v$dblink
At a minimum you will at least know which database links are open in the session, which gives you some way of tracing them.
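Since the caller is a .NET site on ODAC, that could be done from the exception handler along these lines; a rough sketch only, where the ODP.NET namespace, SMTP details, and addresses are assumptions and the application account needs select access on v$dblink:

// Hedged sketch: when the ORA-02020 (or any) exception is caught, dump the
// session's open database links and mail them to yourself for tracing.
// SMTP host and addresses are placeholders.
using System.Net.Mail;
using System.Text;
using Oracle.DataAccess.Client; // ODP.NET unmanaged driver; adjust if using the managed driver

static class DbLinkDiagnostics
{
    public static void EmailOpenDbLinks(OracleConnection conn)
    {
        var body = new StringBuilder("Open database links in this session:\n");

        using (var cmd = new OracleCommand(
            "select db_link, logged_on, open_cursors, in_transaction from v$dblink", conn))
        using (OracleDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                body.AppendFormat("{0} logged_on={1} open_cursors={2} in_transaction={3}\n",
                    reader["DB_LINK"], reader["LOGGED_ON"],
                    reader["OPEN_CURSORS"], reader["IN_TRANSACTION"]);
            }
        }

        using (var mail = new MailMessage("app@example.com", "me@example.com",
            "ORA-02020 diagnostics", body.ToString()))
        using (var smtp = new SmtpClient("smtp.example.com"))
        {
            smtp.Send(mail);
        }
    }
}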
Follow the documentation's advice; specifically the following:
"You may have occasion to close the link manually. For example, close
links when:
The network connection established by a link is used infrequently in an application.
The user session must be terminated."
The first seems to fit your situation exactly. Unless your process is time-sensitive, which doesn't seem to be the case, what have you got to lose? The syntax is:
alter session close database link <linkname>
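From the .NET side that statement can be issued just before the connection goes back to the pool. A rough sketch (the link name is a placeholder, and Oracle will refuse to close a link that still has a pending transaction, so commit or roll back first):

// Hedged sketch: explicitly close a database link before the pooled connection
// is reused by someone else. "MY_LINK" below is a placeholder for your link name.
using Oracle.DataAccess.Client;

static class DbLinkCleanup
{
    public static void CloseLink(OracleConnection conn, string linkName)
    {
        // Note: Oracle will not close a link that is still in use or has a
        // pending transaction; commit or roll back first. Only pass trusted
        // link names here, since the name is concatenated into the statement.
        using (var cmd = new OracleCommand(
            "alter session close database link " + linkName, conn))
        {
            try
            {
                cmd.ExecuteNonQuery();
            }
            catch (OracleException)
            {
                // e.g. the link was never opened in this session; safe to ignore here
            }
        }
    }
}

// usage, just before conn.Close():
// DbLinkCleanup.CloseLink(conn, "MY_LINK");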
We ended up increasing the open links limit, but we never did find the root cause.
Part of the application I'm working on is an SWF that presents a test with some 80 questions. Each question is saved to SQL Server through WebORB and ASP.NET.
When a candidate finishes the test, the session needs to be validated. The problem is that sometimes 350 candidates finish their tests at the same moment, and the CPU on the web server and SQL Server explodes (350 validations running concurrently).
Now, how should I implement queuing here? In the database there's a table that has a record for each session. One column holds the status: 1 is finished, 2 is validated.
I could implement queuing in two ways (as I see it; maybe you have other suggestions):
A process that checks the table for records with status 1. If it finds one, it validates the session. So sessions are validated one after another.
When a candidate finishes a session, a message is sent to an MSMQ queue. Another process listens to the queue and validates sessions one after another.
Now:
What would be the best approach?
Where do you start the process that will validate sessions? In your Global.asax (Application_Start)? As a Windows service? As an exe in the root of the website that is started in Application_Start?
To me, using the table and looking for records with status 1 seems the easiest way.
The MSMQ approach decouples your web-facing application from the validation logic service and the database.
This brings many advantages, a few of which are:
It would be easier to handle situations where the validation logic can only handle 5 sessions per second and it receives 300 all at once. Otherwise you would have to handle complicated timeouts, re-attempts, etc.
It would be easier to do maintenance on the validation service without having to interrupt the rest of the application. When the validation service is brought down, messages queue up in MSMQ and get processed again as soon as it is brought back up.
The same applies to database maintenance.
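For reference, the MSMQ variant (your option 2) would look roughly like this; a hedged sketch using System.Messaging, where the queue path and the ValidateSession call are placeholders:

// Hedged sketch: the web app drops the finished session id onto a private queue,
// and a separate worker pulls ids off one at a time and validates them.
// Requires a reference to System.Messaging.dll.
using System;
using System.Messaging;

static class SessionQueue
{
    const string QueuePath = @".\private$\sessionvalidation";

    // Called from the web application when a candidate finishes the test.
    public static void Enqueue(int sessionId)
    {
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath);

        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Send(sessionId, "session " + sessionId);
        }
    }

    // Runs in the listener process (Windows service, console app, etc.).
    public static void Listen()
    {
        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(int) });
            while (true)
            {
                Message message = queue.Receive(); // blocks until a message arrives
                int sessionId = (int)message.Body;
                ValidateSession(sessionId);        // your existing validation logic
            }
        }
    }

    static void ValidateSession(int sessionId)
    {
        Console.WriteLine("Validating session {0}", sessionId);
    }
}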
If you don't have experience using MSMQ and have no infrastructure set up for it, I would advise against it. Sure, it might be the "proper" way of doing queueing on the Microsoft platform, but it is not very straightforward and has quite a learning curve.
The same goes for creating a Windows service; don't do it unless you are familiar with it. For simple cases such as this I would argue that the pain outweighs the reward.
The simplest solution would probably be to use the table and run the process on a background thread that you start in Global.asax. You probably also want an admin page that reports some status information about the process (number of pending jobs, etc.) and maybe a button to restart the process if it fails for some reason.
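A rough sketch of that polling thread (the table and column names, connection string, and validation call are all placeholders):

// Hedged sketch: background worker started from Application_Start that polls the
// session table for rows with Status = 1, validates them one by one, and marks
// them as Status = 2.
using System;
using System.Data.SqlClient;
using System.Threading;

public static class SessionValidationWorker
{
    const string ConnStr = "Server=.;Database=Tests;Integrated Security=SSPI;";

    // Call this once from Application_Start in Global.asax.
    public static void Start()
    {
        var worker = new Thread(Poll) { IsBackground = true };
        worker.Start();
    }

    static void Poll()
    {
        while (true)
        {
            try
            {
                using (var conn = new SqlConnection(ConnStr))
                {
                    conn.Open();

                    // Grab one finished session at a time.
                    int? sessionId;
                    using (var cmd = new SqlCommand(
                        "select top 1 SessionId from Sessions where Status = 1", conn))
                    {
                        sessionId = (int?)cmd.ExecuteScalar();
                    }

                    if (sessionId.HasValue)
                    {
                        ValidateSession(sessionId.Value); // your existing validation logic

                        using (var done = new SqlCommand(
                            "update Sessions set Status = 2 where SessionId = @id", conn))
                        {
                            done.Parameters.AddWithValue("@id", sessionId.Value);
                            done.ExecuteNonQuery();
                        }
                        continue; // look for the next pending session immediately
                    }
                }
            }
            catch (Exception)
            {
                // log and carry on; an admin page could surface these failures
            }

            Thread.Sleep(TimeSpan.FromSeconds(5)); // nothing pending, wait a bit
        }
    }

    static void ValidateSession(int sessionId) { /* ... */ }
}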
What does "validating" involve? Before working on your queuing strategy, I would try to make validation as fast as possible, including making it set-based if it isn't already.
I have recently been investigating this myself, so I wanted to mention my findings. The location of the database relative to your application is a big factor in deciding which option is faster.
I tested the time it took to insert 100 database entries versus logging the exact same data to a local MSMQ queue, and averaged the results of running the test several times.
What I found was that when the database is on the local network, inserting a row was up to 4 times faster than logging to an MSMQ.
When the database was being accessed over a decent internet connection, inserting a row into the database was up to 6 times slower than logging to an MSMQ.
So:
If the database is on the local network, the DB is faster; otherwise MSMQ is.