Asynchronous logging of server-side activities

I have a web server that on every request produces a log that should be persistently saved in a DB.
The request rate is too high for me to perform a DB operation on every request.
I thought of doing the following:
Any request to the web server produces a log entry.
The log entry is placed somewhere it can be stored quickly (Redis?).
Another service (a cron job?) periodically flushes the data from that place, removes duplicates (yes, there can be duplicates that don't need to be stored in the DB) and makes a single MySQL query to save the data permanently.
What would be the most efficient way to achieve this?

Normally you would use a common logging library (e.g. log4j), which will manage safely writing your log statements to a file. However, you should note that log verbosity can still impact application performance.
After the file is on disk, you can do whatever you like with it - it would be completely normal to ingest that file into Splunk for further processing and ad-hoc searches/alerting.

If you want better performance for this operation, you should send your logs to some kind of queue and have a service read that queue and write the logs to the database in batches.
I found some information about queues in MySQL:
MySQL Insert Statement Queue
Regards.
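A minimal sketch of that queue-and-flush pattern in C#, assuming the StackExchange.Redis and MySql.Data packages; the Redis key "log_queue" and the table "request_log" are made-up placeholders, and INSERT IGNORE is just one way to skip duplicates:

    using System;
    using System.Collections.Generic;
    using System.Text;
    using MySql.Data.MySqlClient;
    using StackExchange.Redis;

    class LogFlusher
    {
        // Producer side (per request): push the log entry onto a Redis list. Fast, no DB involved.
        public static void Enqueue(IDatabase redis, string logLine)
        {
            redis.ListRightPush("log_queue", logLine);
        }

        // Consumer side (cron job / scheduled task): drain the list, de-duplicate,
        // and persist everything with a single multi-row INSERT.
        public static void Flush(IDatabase redis, string mysqlConnectionString)
        {
            var unique = new HashSet<string>();
            RedisValue item;
            while ((item = redis.ListLeftPop("log_queue")).HasValue)
                unique.Add((string)item);

            if (unique.Count == 0) return;

            var sql = new StringBuilder("INSERT IGNORE INTO request_log (entry) VALUES ");
            using (var conn = new MySqlConnection(mysqlConnectionString))
            using (var cmd = new MySqlCommand())
            {
                int i = 0;
                foreach (var entry in unique)
                {
                    if (i > 0) sql.Append(",");
                    sql.Append("(@p").Append(i).Append(")");
                    cmd.Parameters.AddWithValue("@p" + i, entry);
                    i++;
                }
                cmd.CommandText = sql.ToString();
                cmd.Connection = conn;
                conn.Open();
                cmd.ExecuteNonQuery();   // one round trip for the whole batch
            }
        }
    }

The same consumer could just as well read from MSMQ or an in-process queue; the important part is that the web request only touches Redis, and the database sees one batched write per flush interval.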

Related

How to implement synchronized Memcached with database

AFAIK, Memcached does not support synchronization with a database (at least SQL Server and Oracle). We are planning to use Memcached (it is free) with our OLTP database.
In some business processes we do some heavy validations which require a lot of data from the database. We cannot keep a static copy of this data, as we don't know whether it has been modified, so we fetch it every time, which slows the process down.
One possible solution could be
Write triggers on the database to create/update prefixed-postfixed (table-PK1-PK2-PK3-column) files when records change
Monitor these file changes using FileSystemWatcher and expire the key (table-PK1-PK2-PK3-column) so updated data is fetched
Problem: There would be around 100,000 users using any combination of data for 10 hours. So we will end up having a lot of files e.g. categ1-subcateg5-subcateg-78-data100, categ1-subcateg5-subcateg-78-data250, categ2-subcateg5-subcateg-78-data100, categ1-subcateg5-subcateg-33-data100, etc.
I am expecting 5 million files at least. Now it looks like a pathetic solution :(
Other possibilities are
call a web service asynchronously from the trigger, passing the key to be expired
call an exe from the trigger without waiting for it to finish, and have this exe expire the key (I have had some success with this approach on SQL Server using xp_cmdshell to call an exe; calling an exe from an Oracle trigger looks a bit more difficult)
Still sounds pathetic, doesn't it?
Any intelligent suggestions, please?
It's not clear (to me) whether the use of Memcached is mandatory or not. I would personally avoid it and instead use SqlDependency and OracleDependency. Both allow you to pass a DB command and get notified when the data that the command would return changes.
If Memcached is mandatory, you can still use these two classes to trigger the invalidation.
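A rough sketch of the SqlDependency approach; the connection string and the dbo.Products table/columns are placeholders. Note that query notifications require Service Broker to be enabled on the database, the query must use an explicit column list with two-part table names, and each notification fires only once, so you re-subscribe after handling it:

    using System;
    using System.Data.SqlClient;

    class CacheInvalidator
    {
        public static void Watch(string connectionString)
        {
            // Must be called once per AppDomain before creating dependencies.
            SqlDependency.Start(connectionString);

            using (var conn = new SqlConnection(connectionString))
            // Notification queries need explicit columns and two-part table names.
            using (var cmd = new SqlCommand("SELECT Id, Price FROM dbo.Products", conn))
            {
                var dependency = new SqlDependency(cmd);
                dependency.OnChange += (sender, e) =>
                {
                    // e.Info indicates what happened (Insert/Update/Delete);
                    // expire or refresh the corresponding Memcached/local cache entries here.
                    Console.WriteLine("Data changed: " + e.Info);
                };

                conn.Open();
                cmd.ExecuteReader().Dispose();   // executing the command registers the notification
            }
        }
    }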
MS SQL Server has "Change Tracking" features that may be of use to you. You enable the database for change tracking and configure which tables you wish to track. SQL Server then creates change records on every update, insert and delete on a table, and lets you query for changes to records made since the last time you checked. This is very useful for syncing changes and is more efficient than using triggers. It's also easier to manage than making your own tracking tables. This has been a feature since SQL Server 2008.
How to: Use SQL Server Change Tracking
Change tracking only captures the primary keys of the tables and lets you query which fields might have been modified. Then you can query the tables, joining on those keys, to get the current data. If you want it to capture the data as well, you can use Change Data Capture, but that requires more overhead and at least SQL Server 2008 Enterprise Edition.
Change Data Capture
I have no experience with Oracle, but I believe it has some tracking functionality as well. This article might get you started:
20 Using Oracle Streams to Record Table Changes
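A hedged sketch of a Change Tracking sync query from C#; it assumes change tracking has already been enabled for the database and for a table dbo.Orders (the table and column names are placeholders):

    using System.Data.SqlClient;

    class ChangeTrackingSync
    {
        // Processes everything changed since lastSyncVersion and returns
        // the current version to remember for the next sync.
        public static long GetChanges(string connectionString, long lastSyncVersion)
        {
            const string sql = @"
                SELECT CT.Id, CT.SYS_CHANGE_OPERATION, O.CustomerId, O.Total
                FROM CHANGETABLE(CHANGES dbo.Orders, @lastSyncVersion) AS CT
                LEFT JOIN dbo.Orders AS O ON O.Id = CT.Id;
                SELECT CHANGE_TRACKING_CURRENT_VERSION();";

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@lastSyncVersion", lastSyncVersion);
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Invalidate/refresh the cache entry keyed by reader["Id"] here.
                        // For deletes the joined columns come back NULL.
                    }
                    reader.NextResult();
                    reader.Read();
                    return reader.GetInt64(0);   // version to pass in on the next call
                }
            }
        }
    }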

sqlite setup for immediate writes but can't corrupt old data

I am trying to figure out how to best configure sqlite3. I need writes to be very fast but I can't risk the entire database getting corrupt in the event of a power failure. I don't care if the last write or last few writes are lost in the event of a power failure. I just don't want all the data to be lost. What would be the best settings to use to achieve this?
What you are looking for is the write-ahead log, or WAL, journaling mode. Otherwise, there is also the asynchronous I/O module. You will find information about it here: An Asynchronous I/O Module For SQLite.
It saves writes to a queue which is dispatched to the filesystem in a background thread. The transactional guarantees still apply so as long as your transactions are composed correctly, there's no danger of corrupting the database.
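A minimal sketch of the WAL setup in C#, assuming the Microsoft.Data.Sqlite package (System.Data.SQLite would look almost identical); the pragma values are the relevant part:

    using Microsoft.Data.Sqlite;

    class FastButSafeSqlite
    {
        public static SqliteConnection Open(string path)
        {
            var conn = new SqliteConnection("Data Source=" + path);
            conn.Open();

            using (var cmd = conn.CreateCommand())
            {
                // WAL keeps the main database file intact while writes go to the log.
                // synchronous=NORMAL means a power failure may lose the last transaction(s),
                // but the database itself stays consistent and uncorrupted.
                cmd.CommandText = "PRAGMA journal_mode=WAL; PRAGMA synchronous=NORMAL;";
                cmd.ExecuteNonQuery();
            }
            return conn;
        }
    }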

long running http process - how to put in separate process?

I know that similar questions have been asked all over the place, but I'm having trouble finding one that relates directly to what I'm after.
I have a website where a user uploads a data file, which is then transformed and imported into SQL. The file could be up to 50 MB in size, and sometimes this process can take 30 minutes or even longer.
I realise I need to palm off the actual work to another process, and poll that process on the web page. I'm wondering what the best approach would be though? Being a web developer by trade, I'm finding all this new Windows Service stuff a bit confusing, and I just wanted somewhere to start.
So:
Can I / should I be doing this with a Windows service? If so, how?
Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
clarifications
The data is imported into sql, there's no file distribution taking place.
If there is a failure, it absolutely MUST be reported to the user. The web page will poll every, let's say, 5 seconds from the time the async task begins to get the 'status' of the import. Once it's finished, another response will tell the page to stop polling for status updates.
queries on final decision
OK, so as I thought, it seems that a Windows service is the best idea. As to HOW to get it to work, it seems the 'put the file there and wait for the service to pick it up' idea is the generally accepted way. Is there a way I can start a process run by the service without it having to constantly check a database table / folder? As I said earlier, I don't have any experience with Windows services - I wondered, if I put a public method in the service, can I call it somehow?
well ...
var thread = new Thread(() =>
{
    // your action
});
thread.Start();
but you will have problems with that:
What if the import to SQL fails? Should there be any response to the client?
If it fails, how do you ensure the file is processed on a later request?
What if the application shuts down? This newly created and started thread will be killed as well.
...
It's not always a good idea to store everything in SQL (especially files...). If you want to make the file available to several servers, why not distribute them via FTP...?
I believe that your whole concept is a bit messed up (sorry for assuming this), and it might be helpful if you elaborate and give us more information about your intentions!
edit:
Can I / should I be doing this with a Windows service? If so, how?
You can :) I advise you to create a simple console program and convert it with srvany and sc. You can get a rough overview of how to do it here (note: insert blanks after =... that's a silly pitfall).
The term should is relative, because you did not answer the most important question:
What if a record is persisted to the database, telling a consumer that file test.img should be persisted, but your service hasn't captured it or hasn't transformed it yet?
so ... next on
Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
You probably could create a WCF service which receives some binary data and then stores it to a database. This request could be async, yes. But what for?
Once again:
Please give us more insight into your workflow: what exactly are you trying to achieve? Which "environmental conditions" do you have (e.g. app A polls the DB and expects file records which are referenced in table x to be persisted)?
edit:
So you want to import a .csv file. Well, that changes everything :)
But I won't advise you to use a WCF service (there could be a usage: e.g. a WCF service which has a method to insert a single row; your iteration through the file would then be implemented in another app... not that good, though).
I would suggest the following:
At first, do everything in your web app (as you've already done), but use some sort of bulk insert and do your transformation/logic in the database (see the sketch below).
If you then hit some sort of bottleneck, I would suggest something like a minor job service, e.g.:
The web app uploads the file and inserts a row into a job table. The job service is continuously polling the table, or gets informed via WCF by the web app (hey, hey, finally some sort of usage for WCF in your scenario... :) ), and then does the import job, writing a finish note to a table or setting the state of the job to finished...
but this is a bit overkill :)
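For the bulk-insert suggestion above, a rough sketch using SqlBulkCopy; it assumes the uploaded CSV has already been parsed into a DataTable, and the staging table name dbo.ImportStaging is a placeholder:

    using System.Data;
    using System.Data.SqlClient;

    class CsvImporter
    {
        // rows: the parsed CSV loaded into a DataTable whose columns match the staging table.
        public static void BulkInsert(string connectionString, DataTable rows)
        {
            using (var bulk = new SqlBulkCopy(connectionString))
            {
                bulk.DestinationTableName = "dbo.ImportStaging";
                bulk.BatchSize = 5000;        // commit in chunks instead of one huge batch
                bulk.BulkCopyTimeout = 0;     // no timeout; a 50 MB file can take a while
                bulk.WriteToServer(rows);
            }
            // Transformation/validation can then run as set-based SQL against dbo.ImportStaging.
        }
    }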
Please see if my comments below help you to resolve your issue:
• Can I / should I be doing this with a Windows service? If so, how?
Yes, you can do this with a Windows service, and I think that is the way you should be doing it. You can implement your own service to process your requests, or you can use the open-source Job Processor code.
Basically the idea is:
You submit a request for processing the CSV file in a database table with some status such as 'not started'.
Then your Windows service picks up the requests from the database table which are not started and updates them to an 'in progress' status.
Once the processing completes successfully/unsuccessfully, your service updates the database table with a status of Completed / Failed.
And your ASP.NET page can poll the database table for the current status every 5 seconds or so.
• Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
You should not be using WCF for this purpose.
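A minimal sketch of that workflow as it might look inside the Windows service; the dbo.Jobs table, its columns and the status values are placeholders, and ImportFile stands in for the actual parse/import logic:

    using System;
    using System.Data.SqlClient;
    using System.Threading;

    class JobPoller
    {
        // Runs inside a Windows service (or a console app wrapped with srvany/sc).
        public static void Run(string connectionString, CancellationToken token)
        {
            while (!token.IsCancellationRequested)
            {
                using (var conn = new SqlConnection(connectionString))
                {
                    conn.Open();

                    // Claim one 'NotStarted' job and mark it 'InProgress' in a single statement.
                    var claim = new SqlCommand(@"
                        UPDATE TOP (1) dbo.Jobs
                        SET Status = 'InProgress'
                        OUTPUT inserted.Id, inserted.FilePath
                        WHERE Status = 'NotStarted';", conn);

                    using (var reader = claim.ExecuteReader())
                    {
                        if (reader.Read())
                        {
                            int jobId = reader.GetInt32(0);
                            string filePath = reader.GetString(1);
                            reader.Close();

                            string finalStatus;
                            try { ImportFile(filePath); finalStatus = "Completed"; }
                            catch { finalStatus = "Failed"; }

                            var done = new SqlCommand(
                                "UPDATE dbo.Jobs SET Status = @s WHERE Id = @id;", conn);
                            done.Parameters.AddWithValue("@s", finalStatus);
                            done.Parameters.AddWithValue("@id", jobId);
                            done.ExecuteNonQuery();
                        }
                    }
                }
                Thread.Sleep(TimeSpan.FromSeconds(5));   // the web page polls the same table for status
            }
        }

        static void ImportFile(string path) { /* parse + bulk insert, as sketched earlier */ }
    }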

Sending out 20,000+ emails with asp.net

I am writing an application that will need to send a massive number of emails to our students, who will be selected from our database (each email will be personalized to the extent that it will include their name, course of study, etc... so it needs to be sent one at a time).
I could do this looping over an SmtpClient, but I'm afraid that with the numbers I'm trying to send, I'll ultimately run into timeout issues or my thread being killed because of lack of machine resources.
At this point I'm just looking for suggestions of a better way to handle this, or if looping over SmtpClient is an ok solution, how I should go about handling it to prevent what I posted above.
Would a web service be a better alternative?
Please advise,
TIA
My suggestion would be to batch this out. Do not try to run ASP.NET code to create 20K emails and send them out, as you are bound to run into timeout and performance issues. Instead, populate a table with recipient, subject and body, and begin a batch process from a Windows service. This way your code is executed in a way where lag can be managed, instead of having a web page wait for the request to return.
OK, first - this is hardly massive; I have been handling 50,000+ emails.
Let the emails be written to a FOLDER - a directory. Then use the local SMTP service you can install, pointing it at that folder as the pickup directory. This will at least make sure you have a decent buffer in between (i.e. you can finish the ASP.NET side while the emails are not yet sent at that point).
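A minimal sketch of the pickup-directory approach with the standard SmtpClient; the folder path and sender address are placeholders for whatever your local SMTP service is configured with:

    using System.Net.Mail;

    class PickupSender
    {
        public static void Send(string to, string subject, string body)
        {
            // Writes the message as a .eml file instead of talking to an SMTP server;
            // the local SMTP service then delivers whatever lands in this folder.
            var client = new SmtpClient
            {
                DeliveryMethod = SmtpDeliveryMethod.SpecifiedPickupDirectory,
                PickupDirectoryLocation = @"C:\inetpub\mailroot\Pickup"   // placeholder path
            };

            using (var message = new MailMessage("noreply@school.example", to, subject, body))
            {
                client.Send(message);   // returns as soon as the file is written to disk
            }
        }
    }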
What about using the Database Mail tool provided by SQL Server itself? You can still use ASP.NET, but the email will be sent by SQL Server; all you have to do is call the stored procedure and leave things to SQL Server.
I wouldn't recommend the batch option, since there won't be any optimization or queueing unless you put a lot of effort into it.
The Database Mail tool provides queue and priority options as well. If a problem occurs, the e-mail will be queued again and sent later, but the other ones will still be sent.
It is called Database Mail in SQL Server 2008 and SQL Mail in previous versions.
For more information, please check the links below :
http://blog.sqlauthority.com/2008/08/23/sql-server-2008-configure-database-mail-send-email-from-sql-database/
http://msdn.microsoft.com/en-us/library/ms175887.aspx
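A sketch of calling Database Mail from C# via the msdb.dbo.sp_send_dbmail procedure; the profile name "StudentMailProfile" is a placeholder for whatever profile you configure in SQL Server:

    using System.Data;
    using System.Data.SqlClient;

    class DatabaseMailSender
    {
        public static void Queue(string connectionString, string recipient, string subject, string body)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("msdb.dbo.sp_send_dbmail", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@profile_name", "StudentMailProfile");  // placeholder profile
                cmd.Parameters.AddWithValue("@recipients", recipient);
                cmd.Parameters.AddWithValue("@subject", subject);
                cmd.Parameters.AddWithValue("@body", body);

                conn.Open();
                cmd.ExecuteNonQuery();   // the message is queued inside SQL Server and sent asynchronously
            }
        }
    }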
You could use JavaScript on a timer to request the script which sends mail in small chunks that won't time out. Then you'd call the script from the browser. Add a progress bar, authentication, etc.
You don't need a web service, but rather a way to batch the emails into a queue (perhaps an "Email_Queue" table in the DB). Then a Windows service running on a server could read through the queue on a first-in/first-out basis, in reasonable chunks at a time, sending them via SmtpClient and removing items from the queue as they are processed. You would probably need to run some measurements to determine what the chunk size and delay should be for your mail server.

Queuing using the Database or MSMQ?

A part of the application I'm working on is an SWF that shows a test with some 80 questions. Each question is saved in SQL Server through WebORB and ASP.NET.
If a candidate finishes the test, the session needs to be validated. The problem is that sometimes 350 candidates finish their test at the same moment, and the CPU on the web server and SQL Server explodes (350 validations concurrently).
Now, how should I implement queuing here? In the database, there's a table that has a record for each session. One column holds the status. 1 is finished, 2 is validated.
I could implement queuing in two ways (as I see it, maybe you have other propositions):
A process that checks the table for records with status 1. If it finds one, it validates the session. So sessions are validated one after another.
If a candidate finishes their session, a message is sent to an MSMQ queue. Another process listens to the queue and validates sessions one after another.
Now:
What would be the best approach?
Where do you start the process that will validate sessions? In your global.asax (Application_Start)? As a Windows service? As an exe in the root of the website that is started in Application_Start?
To me, using the table and looking for records with status 1 seems the easiest way.
The MSMQ approach decouples your web-facing application from the validation logic service and the database.
This brings many advantages, a few of which:
It would be easier to handle situations where the validation logic can handle 5 sessions per second and it receives 300 all at once. Otherwise you would have to handle complicated timeouts, re-attempts, etc.
It would be easier to do maintenance on the validation service without having to interrupt the rest of the application. When the validation service is brought down, messages would queue up in MSMQ and would get processed again as soon as it is brought back up.
The same as above applies for database maintenance.
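A bare-bones sketch of what the MSMQ option could look like with System.Messaging; the queue path is a placeholder, and ValidateSession stands in for the existing validation logic:

    using System;
    using System.Messaging;

    class SessionValidationQueue
    {
        const string Path = @".\private$\sessionValidation";   // placeholder queue name

        // Web side: called when a candidate finishes the test.
        public static void EnqueueSession(int sessionId)
        {
            if (!MessageQueue.Exists(Path))
                MessageQueue.Create(Path);

            using (var queue = new MessageQueue(Path))
            {
                queue.Send(sessionId);
            }
        }

        // Worker side: drains the queue and validates one session at a time.
        public static void ProcessQueue()
        {
            using (var queue = new MessageQueue(Path))
            {
                queue.Formatter = new XmlMessageFormatter(new[] { typeof(int) });
                while (true)
                {
                    var message = queue.Receive();      // blocks until a message arrives
                    int sessionId = (int)message.Body;
                    ValidateSession(sessionId);         // the expensive part, now throttled to one at a time
                }
            }
        }

        static void ValidateSession(int sessionId) { /* existing validation logic */ }
    }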
If you don't have experience using MSMQ and no infrastructure set up, I would advise against it. Sure, it might be the "proper" way of doing queueing on the Microsoft platform, but it is not very straightforward and has quite a learning curve.
The same goes for creating a Windows service; don't do it unless you are familiar with it. For simple cases such as this, I would argue that the pain is greater than the rewards.
The simplest solution would probably be to use the table and run the process on a background thread that you start in global.asax. You probably also want to create an admin page that can report some status information about the process (number of pending jobs, etc.) and maybe a button to restart the process if it fails for some reason.
What does validating involve? Before working on your queuing strategy, I would try to make validation as fast as possible, including making it set-based if it isn't already.
I have recently been investigating this myself, so I wanted to mention my findings. The location of the database relative to your application is a big factor in deciding which option is faster.
I tested the time it took to insert 100 database entries versus logging the exact same data as local MSMQ messages. I then took the average of the results of performing this test several times.
What I found was that when the database is on the local network, inserting a row was up to 4 times faster than logging to an MSMQ.
When the database was being accessed over a decent internet connection, inserting a row into the database was up to 6 times slower than logging to an MSMQ.
So:
With a local database, the DB is faster; otherwise, MSMQ is.
