Exporting Large Amounts of Data from SQL Server using ASP.Net - asp.net

At the company I work for, we are building a data warehousing application that will be a web-based front end (using ASP.NET) for a lot of queries we run.
Some of these queries will bring back over a million records (and by year end maybe around 2 million records); most will return thousands.
What is a good way to architect the application so that you can browse to the query you want, export it, and have a CSV file generated of the requested data?
I was thinking of a web-based interface that calls BCP to generate a file, shows you when the report has been created so that it can be downloaded, and expires the file within 24 hours of its creation.
Any ideas?
Sas
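
For reference, the BCP route the question proposes could be as simple as shelling out to bcp.exe from a background job. A rough sketch, assuming bcp.exe is on the PATH; the server name and query are placeholders:

    // Rough sketch of the BCP approach: shell out to bcp.exe and let it
    // stream the query results straight to a delimited file.
    // -c = character mode, -t, = comma delimiter, -T = Windows authentication.
    // The query should use fully qualified object names (db.schema.table).
    using System.Diagnostics;

    class BcpExport
    {
        public static void Run(string query, string outputPath)
        {
            var args = string.Format(
                "\"{0}\" queryout \"{1}\" -c -t, -S MyReportServer -T",
                query, outputPath);
            using (var bcp = Process.Start(new ProcessStartInfo("bcp", args)
            {
                UseShellExecute = false,
                CreateNoWindow = true
            }))
            {
                bcp.WaitForExit(); // long-running queries are fine in a background job
            }
        }
    }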

I architected something like this in a former life. Essentially a C# command-line app that ran queries against our reporting server.
Some key points:
It didn't matter how long they took because it was in the background - when a file was ready, it would show up in the UI and the user could download it. They weren't waiting in real time.
It didn't matter how inefficient the query was - both because of the point above, and because the reports were geared to the previous day. We ran them off of a reporting copy of production, not against production, which was kept on a delay via log shipping.
We didn't set any expiration on the files because the user could request a report on a Friday and not expect to look at it until Monday. We'd rather have a file sitting around on the disk than run the report again (file server disk space is relatively cheap). We let them delete reports once they were done with them, and they would do so on their own to prevent clutter in the UI.
We used C# and a DataReader rather than bulk export methods because of various requirements for the data. We needed to provide the flexibility to include column headers or not, to quote data or not, to deal with thousand separators and decimal points supporting various cultures, to apply different line endings (\r\n, \n, \r), different extensions, Unicode / non-Unicode file formats, and different delimiters (comma, tab, pipe, etc). There were also different requirements for different customers and even for different reports - some wanted an e-mail notification as soon as a report was finished, others wanted a copy of the report sent to some FTP server, etc.
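A minimal sketch of that DataReader-to-CSV approach; the class and option names here are illustrative, not the original tool:

    // Streams a result set to CSV row by row, so even multi-million-row
    // exports never hold the full result set in memory. Options shown are a
    // subset of the requirements described above (headers, quoting, delimiter,
    // line endings); connection string and query come from the caller.
    using System;
    using System.Data.SqlClient;
    using System.IO;
    using System.Text;

    class CsvExporter
    {
        public static void Export(string connectionString, string sql, string outputPath,
                                  string delimiter = ",", bool quoteValues = true,
                                  bool includeHeaders = true, string lineEnding = "\r\n")
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(sql, conn) { CommandTimeout = 0 }) // background job: no timeout
            using (var writer = new StreamWriter(outputPath, false, Encoding.UTF8))
            {
                conn.Open();
                writer.NewLine = lineEnding;
                using (var reader = cmd.ExecuteReader())
                {
                    if (includeHeaders)
                    {
                        var headers = new string[reader.FieldCount];
                        for (int i = 0; i < reader.FieldCount; i++)
                            headers[i] = Quote(reader.GetName(i), quoteValues);
                        writer.WriteLine(string.Join(delimiter, headers));
                    }
                    var values = new object[reader.FieldCount];
                    while (reader.Read()) // one row at a time; never buffers the result set
                    {
                        reader.GetValues(values);
                        var fields = new string[values.Length];
                        for (int i = 0; i < values.Length; i++)
                            fields[i] = Quote(Convert.ToString(values[i]), quoteValues);
                        writer.WriteLine(string.Join(delimiter, fields));
                    }
                }
            }
        }

        static string Quote(string value, bool quote)
        {
            if (!quote) return value;
            return "\"" + value.Replace("\"", "\"\"") + "\""; // escape embedded quotes
        }
    }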
There are probably a lot of requirements you haven't thought of yet, but I hope that gives you a start.

Related

Do I need to create a new SQLite database every time an application is updated?

I have a Xamarin Forms application I would like to develop. It will have a SQLite database and I wish to make this available on iOS and Android. The database will be populated with data from a SQL Server database on the cloud with initial seed data. I'm thinking this will be about 500 rows of data with each row about 1Kb.
What I don't understand is when and how to populate this. Should I put the data into a CSV file and have it populate the database when the application is installed, or when it first starts? What's the normal way to populate seed data, other than hard-coding a huge number of insert statements?
Any help or advice on how this is normally done (I'm thinking most people do it the same way) would be much appreciated.
Thanks
Let's break the problem down.
Is the initial data that you wish to use in your app going to change over time?
If you include any pre-populated data (a SQLite, Realm, or CSV-based file, ...) and the data that you are including goes stale and you have to update it on a routine basis, you will need to publish an application update (.apk/.ipa) so your new user installs receive the updated data (more on this below).
Note: This assumes that your current users get the updated data by actually running your app, which handles the local data updates on a routine basis (background service, push notifications, data polling, etc.).
Is this a Line of Business (LoB) application published via Ad-Hoc, private Store, and/or iOS Enterprise publishing?
If you control the user base, then forcing an update install so your users get your new/updated pre-populated data might be an acceptable approach. It's not a great user experience if they're forced to update the application all the time... but it works...
Is this application going to be distributed via the public Apple and Google App Stores?
This is where you need to be very careful on what pre-populated data you include within your application.
If the data goes stale and you need to push an updated app version to the Stores for your new installs, beware that it could take days (or weeks, or even a month or more) to get that new app into the store.
The Play Store usually takes less than 24 hours to publish app updates, and while the Apple Store can be just as fast, do not bet on it.
We routinely see 48-72 hour delays, and apps randomly get rejected, so it can take a week or more to get an updated app into the Apple Store. We have had rejections delay an app update for over a month, gone through the appeal process, and even removed already-existing features to get re-published.
Note: Every app update to the Apple store resets your user reviews... :-(
Bottom line: You want to publish to the Stores when you are bug fixing and/or adding features, not to update some "static" data that is stored within your app bundle...
What does this data cost your end-user and you?
Negative costs to you as an app developer are bad reviews and uninstalls. Look at how this "data" affects the end-user's access to your application and how they react. A longer download time is usually acceptable; a longer initial app startup time, less so... etc....
What markets will your app be used in? Network speeds and the cost of data transfer in many markets across the world are slow and costly...
What really is the true size of the data?
I "pre-populate" a Realm data instance with thousand of rows with 5MB of JSON data in under a second. SQLite takes longer, but it is still not bad. The data itself is stored in a zip and accessed as a static file (https-based get) and at a 80% compression factor, the 1MB of compressed data is pulled from a server (AWS S3) in under one second using LTE cellular data speeds and uncompressing it as stream while deserializing the JSON on-fly to update the Realm instance adds another second...
So the user impact is very small, and I "hide" this initial pre-population behind a first-time welcome screen and some text that the user hopefully reads before getting to the first "real" app screen...
Note: This does assume that the user will have network data access the first time they open the app... In many markets around the world, this is not true, so factor this into your app design.
I also architect the app so its data can be updated on background threads during launch (the initial one or not), so the user does not stand there watching a spinning busy indicator; they can at least interact with the data that they do have.
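A rough sketch of that download-and-populate flow, here using sqlite-net since the question is about SQLite. The URL, the SeedRow model, and the gzip+JSON payload format are hypothetical; the answer above used Realm and a zipped file on S3, but the shape of the work is the same:

    // Downloads a compressed JSON seed file, deserializes it while the stream
    // is still decompressing, and bulk-inserts into SQLite in one transaction.
    using System.Collections.Generic;
    using System.IO;
    using System.IO.Compression;
    using System.Net.Http;
    using System.Threading.Tasks;
    using Newtonsoft.Json;
    using SQLite;

    public class SeedRow
    {
        [PrimaryKey] public int Id { get; set; }
        public string Name { get; set; }
    }

    public static class SeedDataLoader
    {
        public static async Task PopulateAsync(string dbPath, string seedUrl)
        {
            using (var http = new HttpClient())
            using (var raw = await http.GetStreamAsync(seedUrl))
            using (var gzip = new GZipStream(raw, CompressionMode.Decompress))
            using (var text = new StreamReader(gzip))
            using (var json = new JsonTextReader(text))
            {
                var serializer = new JsonSerializer();
                var rows = serializer.Deserialize<List<SeedRow>>(json);

                var db = new SQLiteConnection(dbPath);
                db.CreateTable<SeedRow>();
                db.RunInTransaction(() => db.InsertAll(rows)); // one transaction = fast bulk insert
                db.Close();
            }
        }
    }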
So should you include any pre-populated data in your app bundle?
Sure, when that data is absolutely required to get the user up and running as fast as possible to enhance the user experience. Games are a great example of this, bundling hundreds of megabytes or even gigabytes (via .obb files on Android) of levels, media files, etc. into the app so the user does not face a 10+ minute wait upon opening the app the first time.
This does mean the initial download takes longer since that data is bundled within the app, but the overall user experience is better: users accept download/install times (and view them as a carrier/phone/service-plan issue) far more readily than a long wait to reach a functional screen the first time they open your app.
So what to do?
Personally, I look at this issue on a case-by-case basis. I look at the data, and if it is not going to change, only be added to (and possibly pruned) over time, I include it as a pre-populated SQLite or Realm store or the like. Why make the user wait for web requests and database updates, and incur the additional network data usage and associated costs? If the data is going to go stale, do not bundle it in your app.
As for the mechanics of installing pre-populated data:
See my answer on this SO Question about "Bundle prebuilt Realm files"
You don't have to create your SQLite database every time the app is updated.
Actually SQLiteOpenHelper provides the following two methods:
OnCreate(): you should implement this method and create your SQLite database, populated with data from the server. It is called when the app is started for the first time.
OnUpgrade(): you should implement this method if you want to modify the database (add a new table or a column to a table) or populate additional data.
The database is preserved between app updates and you don't need to create it each time.
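An illustrative Xamarin.Android subclass; the table and seed statements are placeholders for your own schema and server data:

    // Minimal SQLiteOpenHelper sketch. OnCreate runs once on first launch
    // after install; OnUpgrade runs when DbVersion is increased, preserving
    // existing data across app updates.
    using Android.Content;
    using Android.Database.Sqlite;

    public class AppDbHelper : SQLiteOpenHelper
    {
        const string DbName = "app.db";
        const int DbVersion = 2; // bump this to trigger OnUpgrade on existing installs

        public AppDbHelper(Context context) : base(context, DbName, null, DbVersion) { }

        public override void OnCreate(SQLiteDatabase db)
        {
            // Create the schema and seed it with initial data.
            db.ExecSQL("CREATE TABLE Leads (Id INTEGER PRIMARY KEY, Name TEXT)");
            db.ExecSQL("INSERT INTO Leads (Id, Name) VALUES (1, 'Seed row')");
        }

        public override void OnUpgrade(SQLiteDatabase db, int oldVersion, int newVersion)
        {
            // Migrate the schema without losing the user's existing data.
            if (oldVersion < 2)
                db.ExecSQL("ALTER TABLE Leads ADD COLUMN Email TEXT");
        }
    }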
Check the following examples, which explain how to use a SQLite database with Xamarin:
Using Sqlite in a Xamarin.Android Application Developed using Visual Studio
and
An Introduction to Xamarin.Forms and SQLite

Caching of data in a text file — Better options

I am working on an application that, as its caching strategy, reads and writes data to text files in a read/write directory within the application.
My gut reaction is that this is sooooo wrong.
I am of the opinion that these values should be stored in the ASP.NET Cache or another dedicated in-memory cache such as Redis or something similar.
Can you provide any data to back up my belief that writing to and reading from text files as a form of cache on the webserver is the wrong thing to do? Or provide any data to prove me wrong and show that this is the correct thing to do?
What other options would you provide to implement this caching?
EDIT:
In one example, a complex search is performed based on a keyword. The result from this search is a list of GUIDs, which is then turned into a concatenated, comma-delimited string, usually less than 100,000 characters. This is then written to a file using that keyword as its name, so that other requests using this keyword will not need to perform the complex search. There is an expiry of, I think, three days or so, but I don't think it needs to (or should) be that long.
I would normally use the ASP.NET Server Cache to store this data.
I can think of four reasons:
Web servers are likely to have many concurrent requests. While you can write logic that manages file locking (mutexes, volatile objects), implementing it is a pain and requires abstraction (an interface) if you plan to refactor it in the future, which you will want to do, because eventually the demand on the filesystem resource will be heavier than what can be addressed in a multithreaded context.
Speaking of which, unless you implement paging, you will be reading and writing the entire file every time you access it. That's slow. Even paging is slow compared to an in-memory operation. Compare what you think you can get out of the disks you're using with the Redis benchmarks from New Relic. Feel free to perform your own calculation based on the estimated size of the file and the number of threads waiting to write to it. You will never match an in-memory cache.
Moreover, as previously mentioned, asynchronous filesystem operations have to be managed while waiting for synchronous I/O operations to complete. Meanwhile, you will not have data consistent with the operations the web application executes unless you make the application wait. The only way I know of to fix that problem is to write to and read from a managed system that's fast enough to keep up with the requests coming in, so that the state of your cache will almost always reflect the latest changes.
Finally, since you are talking about a text file, and not a database, you will either be determining your own object notation for key-value pairs, or using some prefabricated format such as JSON or XML. Either way, it only takes one failed operation or one improperly formatted addition to render the entire text file unreadable. Then you either have the option of restoring from backup (assuming you implement version control...) and losing a ton of data, or throwing away the data and starting over. If the data isn't important to you anyway, then there's no reason to use the disk. If the point of keeping things on disk is to keep them around for posterity, you should be using a database. If having a relational database is less important than speed, you can use a NoSQL context such as MongoDB.
In short, by using the filesystem and text, you have to reinvent the wheel more times than anyone who isn't a complete masochist would enjoy.
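A minimal sketch of the in-memory alternative the question leans toward, using System.Runtime.Caching; the search delegate is a stand-in for the expensive keyword search:

    // Caches keyword search results in process memory with an expiration,
    // replacing the file-per-keyword scheme: no file locking, no disk I/O,
    // no hand-rolled serialization format to corrupt.
    using System;
    using System.Collections.Generic;
    using System.Runtime.Caching;

    public static class SearchResultCache
    {
        static readonly MemoryCache Cache = MemoryCache.Default;

        public static IList<Guid> GetResults(string keyword, Func<string, IList<Guid>> runComplexSearch)
        {
            var cached = Cache.Get(keyword) as IList<Guid>;
            if (cached != null)
                return cached; // hit: no locking or parsing to manage

            var results = runComplexSearch(keyword);
            Cache.Set(keyword, results, new CacheItemPolicy
            {
                // Same idea as the three-day file expiry, minus the disk.
                AbsoluteExpiration = DateTimeOffset.Now.AddHours(12)
            });
            return results;
        }
    }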

Handle ActionResults as cacheable, "static content" in ASP.NET MVC (4)

I have a couple of ActionMethods that return content from the database that does not change very often (e.g. a polygon list of available ZIP areas, returned as JSON; it changes twice per year).
I know there is the [OutputCache(...)] attribute, but it has some disadvantages (long client-side caching is not good, and if the server/IIS/process gets restarted, the server-side cache also stops).
What I want is for MVC to store the result in the file system, calculate a hash, and if the hash hasn't changed, return HTTP status code 304, like it is done with images by default.
Does anybody know a solution for that?
I think it's a bad idea to try to cache data on the file system because:
It is not going to be much faster to read your data from the file system than to get it from the database, even if you already have it in JSON format.
You are going to add a lot of logic to calculate and compare the hash, and to read data from a file. That means new bugs and more complexity.
If I were you, I would keep it as simple as possible. Store your data in the Application container. Yes, you will have to reload it every time the application starts, but that should not be a problem at all, as the application is not supposed to restart often. Also consider using a distributed cache like AppFabric if you have a web farm, in order not to end up with different data in the Application containers on different servers.
And one more important note: caching means really fast access, and you can't achieve that with file system or database storage; it is memory storage you should consider.
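A sketch of combining the two ideas: keep the JSON in the Application container as suggested, but still serve an ETag so clients get 304s as the question asks. The controller, key, and loader names are illustrative:

    // Serves in-memory JSON with an ETag derived from its content hash.
    // Clients that already hold the current version receive 304 Not Modified
    // with no body. LoadZipAreasJson is a placeholder for the real query.
    using System;
    using System.Net;
    using System.Security.Cryptography;
    using System.Text;
    using System.Web.Mvc;

    public class ZipAreasController : Controller
    {
        public ActionResult Polygons()
        {
            // Kept in the Application container; reloaded after an app restart.
            var json = HttpContext.Application["ZipAreasJson"] as string;
            if (json == null)
            {
                json = LoadZipAreasJson();
                HttpContext.Application["ZipAreasJson"] = json;
            }

            string etag;
            using (var md5 = MD5.Create())
                etag = "\"" + Convert.ToBase64String(md5.ComputeHash(Encoding.UTF8.GetBytes(json))) + "\"";

            // If the client already has this version, answer with 304 and no body.
            if (Request.Headers["If-None-Match"] == etag)
                return new HttpStatusCodeResult(HttpStatusCode.NotModified);

            Response.AppendHeader("ETag", etag);
            return Content(json, "application/json");
        }

        static string LoadZipAreasJson()
        {
            return "[]"; // placeholder for the real database query/serialization
        }
    }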

I'm building a salesforce.com type of CRM - what is the right database architecture?

I have developed a CRM for my company. Next I would like to take that system and make it available for others to use in a hosted format. Very much like a salesforce.com. The question is what type of database structure would I use. I see two options:
Option 1. Each time a company signs up, I clone the master database for them.
The disadvantage of this is that I could end up with thousands of databases. That's a lot of databases to back up every night. My CRM uses cron jobs for maintenance; those jobs would have to run on all the databases.
Each time I upgrade the system with a new feature, and need to add a new column to the database, I will have to add that column to thousands of databases.
Option 2. Use only one database.
At the beginning of EVERY table add "CompanyID". In EVERY SQL statement add "and companyid={companyid}".
The advantage of this method is the simplicity of only one database. Just one database to backup each night. Just one database to update when needed.
The disadvantage is: what if I get 1,000 companies signing up, and each wants to store data on 100,000 leads? That's 100,000,000 rows in the lead table, which worries me.
Does anyone know how the online hosted CRMs like salesforce.com do this?
Thanks
Wouldn't you clone a table structure for each new database/client ID, with all schemas archived and indexed in a master database? Each client clone is hash-verified before access to its specific data, run through a host machine at the front end of the master system that directs requests in the primary role. Internal access is batched to read/write slave systems in groups, with RAID configured for both real-time and scheduled copies, and requests load-balanced against system resources. That way you separate the security-sensitive parts from the UI and from the connection to the retention schema. Simplified structures and fewer policy checks seem to cut down the per-request overhead in query processing - essentially a man-in-the-middle approach from the inside out.
1) Splitting your data into smaller groups in a safe, unthinking way (such as one database per grouping) is almost always best if you want to scale. In this case, unless for some reason you want to query between companies, keeping them in separate databases is best.
2) If you are updating all of the databases by hand, you are doing something wrong if you want to scale. You'd want to automate the process.
3) Ultimately, salesforce.com uses this as the basis of their DB infrastructure:
http://blog.database.com/blog/2011/08/30/database-com-is-open-for-business/
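
For reference, option 2's per-tenant filtering is mostly a matter of discipline: every query must carry the company filter. A small sketch of one way to enforce that in C#; the schema and names are hypothetical:

    // Routes every query through a helper that always binds the tenant's
    // CompanyID, so no statement can accidentally omit the filter.
    using System.Data;
    using System.Data.SqlClient;

    public class TenantDb
    {
        readonly string _connectionString;
        readonly int _companyId;

        public TenantDb(string connectionString, int companyId)
        {
            _connectionString = connectionString;
            _companyId = companyId;
        }

        public DataTable Query(string sqlWithCompanyFilter)
        {
            // By convention every statement must reference @CompanyID, e.g.
            // "SELECT * FROM Leads WHERE CompanyID = @CompanyID AND Status = 'open'".
            using (var conn = new SqlConnection(_connectionString))
            using (var cmd = new SqlCommand(sqlWithCompanyFilter, conn))
            {
                cmd.Parameters.AddWithValue("@CompanyID", _companyId);
                var table = new DataTable();
                new SqlDataAdapter(cmd).Fill(table);
                return table;
            }
        }
    }

With indexes that lead on CompanyID, a 100,000,000-row leads table stays navigable, since every query touches only one tenant's slice.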

XML parsing / querying performance question for ASP.NET

I have to port a small Windows Forms application (a product configurator) to an ASP.NET app, which will be used on a large company's website. Demand should be moderate because it's for a specialized product line.
I don't have access to a database and using XML is a requirement from their web developers.
There are roughly 30 different products with roughly 300 different possible configurations stored in the XML files, plus linked questions/answers that lead to a product recommendation, and some production options. The app is available in 6 languages.
How would you solve the 'data access' layer, if you can call it that? I thought of reading/deserializing the XML files into their objects, storing them in ASP.NET's cache if they're not there already, and reading from the cache on subsequent requests. But that would mean all objects live in memory all day and night.
Is that even necessary, or smart, performance-wise? As I said before, the app is not that big and the XML files are not that large. Could I just create some Repository class that reads the XML files whenever an object is requested (i.e. 'Product Details', or 'Next question') and returns it that way, driving memory consumption down?
The whole approach seems to assume a single server. First consider whether that is appropriate: you mentioned a "large company's website", and that raises a red flag for me. If you need the site to scale, you will end up with more than a single server, which rules out a simple local file.
If you are constrained to using one, analyze which data is more appropriate to keep in cache (it does not change often, it's long-lived, the same info is requested many times). Try to keep the cached data separated from the non-cached, which will reduce the amount of info in the more dynamic files. If you expect large amounts of information, consider splitting the files in whatever way is appropriate to your domain.
I use the Cache whenever I can. I cache objects upon their first request. If memory is of any concern, I set an expiration policy; and whether it is or not, when short on memory the framework will unload cache entries anyway.
Since it is per application and not per user, it makes sense to have it, especially if the relative footprint is small.
If you have to expand to multiple servers later, you can access the same file over the network or modify the DA layer to retrieve data by other means (services, a DB, etc.). The caching code will stay the same, and performance will be virtually unaffected.
If you set a cache dependency (e.g. on the underlying XML file), objects will always stay current.
I'm for it.
Using the cache, and setting an appropriate expiration policy as advised by others is a sound approach. I'd suggest you look at using LINQ to XML as the basis for your data access code as it is so much easier to use than traditional methods of querying XML. You can find a decent introduction here.
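A short sketch combining the suggestions above: LINQ to XML for querying, and HttpRuntime.Cache with a file dependency so entries refresh whenever the XML changes on disk. The element and attribute names are hypothetical:

    // Repository that queries a product XML file with LINQ to XML and caches
    // the result; the CacheDependency evicts the entry the moment the file
    // is edited, so cached objects always stay current.
    using System.Collections.Generic;
    using System.Linq;
    using System.Web;
    using System.Web.Caching;
    using System.Xml.Linq;

    public class ProductRepository
    {
        readonly string _xmlPath;

        public ProductRepository(string xmlPath) { _xmlPath = xmlPath; }

        public IList<string> GetProductNames(string language)
        {
            string cacheKey = "products-" + language;
            var cached = HttpRuntime.Cache[cacheKey] as IList<string>;
            if (cached != null)
                return cached;

            var names = XDocument.Load(_xmlPath)
                .Descendants("Product")
                .Where(p => (string)p.Attribute("lang") == language)
                .Select(p => (string)p.Element("Name"))
                .ToList();

            HttpRuntime.Cache.Insert(cacheKey, names, new CacheDependency(_xmlPath));
            return names;
        }
    }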
