Save image URL or save image file in SQL database? - asp.net

We can save an image in two ways:
upload the image to the server and save the image URL in the database, or
save the image directly in the database.
Which one is better?

There's a really good paper by Microsoft Research called To Blob or Not To Blob.
Their conclusion after a large number of performance tests and analysis is this:
if your pictures or documents are typically below 256 KB in size, storing them in a database VARBINARY column is more efficient
if your pictures or documents are typically over 1 MB in size, storing them in the filesystem is more efficient (and with SQL Server 2008's FILESTREAM attribute, they're still under transactional control and part of the database)
in between those two, it's a bit of a toss-up depending on your use case
If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee photo in the employee table - keep them in a separate table. That way, the Employee table can stay lean and mean and very efficient, assuming you don't always need to select the employee photo, too, as part of your queries.
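As a rough sketch of what that separate table could look like (table and column names are just an illustration, assuming an existing dbo.Employee table keyed on EmployeeID):

CREATE TABLE dbo.EmployeePhoto
(
    -- one row per employee; the photo lives here, not in dbo.Employee
    EmployeeID INT NOT NULL
        CONSTRAINT PK_EmployeePhoto PRIMARY KEY
        CONSTRAINT FK_EmployeePhoto_Employee REFERENCES dbo.Employee (EmployeeID),
    Photo VARBINARY(MAX) NOT NULL
);

Queries against dbo.Employee then never touch the photo bytes unless they explicitly join to this table.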
For filegroups, check out Files and Filegroup Architecture for an intro. Basically, you would either create your database with a separate filegroup for large data structures right from the beginning, or add an additional filegroup later. Let's call it "LARGE_DATA".
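A minimal sketch of adding such a filegroup to an existing database (the database name, file path and sizes below are placeholders):

ALTER DATABASE YourDatabase ADD FILEGROUP LARGE_DATA;

ALTER DATABASE YourDatabase
ADD FILE
(
    NAME = 'YourDatabase_LargeData1',
    FILENAME = 'D:\SqlData\YourDatabase_LargeData1.ndf',  -- placeholder path
    SIZE = 100MB,
    FILEGROWTH = 100MB
)
TO FILEGROUP LARGE_DATA;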
Now, whenever you have a new table to create which needs to store VARCHAR(MAX) or VARBINARY(MAX) columns, you can specify this file group for the large data:
CREATE TABLE dbo.YourTable
(
    -- example columns; define your own fields here
    ID INT IDENTITY(1,1) PRIMARY KEY,
    Picture VARBINARY(MAX) NULL
)
ON Data                  -- the basic "Data" filegroup for the regular row data
TEXTIMAGE_ON LARGE_DATA  -- the filegroup for large chunks of data
Check out the MSDN intro on filegroups, and play around with it!

Like many questions, the answer is "it depends." Systems like SharePoint use option 2. Many ticket-tracking systems (I know for sure Trac does this) use option 1.
Think also of any (potential) limitations. As your volume increases, are you going to be limited by the size of your database? This has particular relevance to hosted databases and applications where increasing the size of your database is much more expensive than increasing your storage allotment.

Saving the image to the server will work better for a website, given that these are incidental to your website, like per-customer branding images - if you're setting up the next Flickr, obviously the answer would be different :). You'd want to set up one server to act as a file server, share out the /uploaded_images directory (or whatever you name it), and set up an application variable defining the base URL of uploaded images.
Why is it better? Cost. File servers are dirt-cheap commodity hardware. You can back up the file contents using dirt-cheap commodity (even just consumer-grade) backup software. And if your file server croaks and someone loses a day of uploaded images? Who cares. They just upload them again.
Our database server is an enterprise cluster running on SSD SAN. Our backups and tran logs are shipped to remote sites over expensive bandwidth and maintained even on tape for x period. We use it for all the data where we need the ACID (atomicity, consistency, isolation, durability) benefits of an RDBMS. We don't use it for company logos.

Store them in the database unless you have a good reason not to.
Storing them in the filesystem is premature optimization.
With a database you get referential integrity, you can back everything up at once, integrated security, etc.
The book SQL Antipatterns calls storing files in the filesystem an antipattern.

Related

Plain JSON vs in-app DB for React Native?

I have a 50-100 MB dataset that users need to have access to. It's static, so it doesn't make sense to host a server for it. There are two kinds of operations I'll perform on the data:
Reading objects by unique ObjectId. Each object is ~3 KB.
Full-text search through ~300,000 strings. Each string is 4-60 characters.
I'm considering storing the data as JSON files. The 300k strings will be stored separately. I'll use https://github.com/nextapps-de/flexsearch or something similar to perform search over it. I've done something similar before with a ~10 MB dataset back in 2016. I used just regex search and it worked flawlessly.
Are there reasons to use RealmDB, SQLite, PouchDB or something else instead of just JSON?
I wish this question had been asked a year ago...
At the office where I currently work, we tried creating an app using PouchDB and React Native. We basically saw PouchDB as an advantage because it wouldn't require our API to send all the data over and over again on every refresh triggered by the user; it would only send the data that had changed since the client's checkpoint. As the data on the server was quite heavy (around 6k entries with more than 200 attributes each), we tried at all costs to go easy on the client's data plan.
Months after this implementation was in place, we implemented a search functionality with many different options for sorting and filtering, and not only did we have to throw away our entire PouchDB implementation, but we had to start from scratch, replacing all its logic with indexed JSON values. PouchDB's performance was extremely slow; it was taking more than 5 seconds or so to retrieve results, and we just couldn't afford that kind of delay in our scope.
In the end we managed to get a very quick search by running FlexSearch over our indexed JSON. Don't make the same mistake we did; PouchDB cost us too much budget and precious time. It was a terrible choice.
Unfortunately I cannot offer proof or more details from a reputable source; I can only share the terrible personal experience I had when I thought we were reaching the end of a project and we had to start from scratch. It was a mess.
Oh boy, a bountied, opinion-based question!
I have about 5 years' experience with PouchDB specifically, a little with SQLite. I have but a cursory experience with RealmDB - I tried it out and decided it was not a good fit for my hybrid/mobile needs.
PouchDB excels in one area hands down - synchronization/replication, just like its big brother CouchDB. Providing interaction with an offline database that synchronizes with a remote database is huge for many mobile apps. PouchDB is schemaless, leveraging JSON documents. With PouchDB one may choose among several data stores via adapters. As there can be quota headaches [1] for your data size, the right choice may likely be the SQLite adapter. PouchDB does not support full-text search.
SQLite is what its name implies - a relational database, requiring a schema. An advantage of SQLite is platform support, and the size of the database is not subject to quota headaches like web storage (e.g. IndexedDB). SQLite supports full-text search, and apps can deploy with a canned database.
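As an illustration of the full-text point, SQLite's FTS5 module keeps the search side small (a minimal sketch; table and column names are assumptions):

-- built once, when preparing the canned database shipped with the app
CREATE VIRTUAL TABLE strings_fts USING fts5(object_id UNINDEXED, body);
INSERT INTO strings_fts (object_id, body) VALUES ('abc123', 'example searchable string');

-- at runtime: full-text query returning the ids of matching objects
SELECT object_id FROM strings_fts WHERE strings_fts MATCH 'searchable';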
Between PouchDB and SQLite lies RealmDB - it is a schema-based object database that supports synchronization/replication. Like PouchDB, it does not support full-text search.
Now, your requirements:
Looking up objects by id
300k static text strings
Full-text search
I read 'static' to mean immutable.
Since your data does not change and full-text search is required, PouchDB and RealmDB would not be good choices. If there were a requirement to enhance, remove or add to the data, either would make sense, as changes to the data on a single server would replicate to the local database in a practically seamless fashion.
SQLite might be a reasonable choice since it supports search and it is possible to deploy a canned database with the app. However, SQLite can be slow in hybrid apps.
So,
PouchDB and RealmDB would be massive overkill and not a good fit.
SQLite would add a fair bit of complexity.
For your specific requirements I'd stay on your path, though I have one concern: it appears FlexSearch loads its index into memory. If its performance turns out to carry some penalty, then SQLite, with its ability to deploy a canned database and its built-in search facility, may prove a reasonable trade-off versus the complexity.
Good luck!
[1] Quota Headaches
I would say it really just depends on whether you want and NEED to leverage the power of relational queries. Because your data is never changing, I would use JSON unless you are trying to perform complex comparisons between your data. In your case it sounds like you are just going to be searching for a particular ObjectId, so JSON is your best bet, especially because you are saying you won't need to change the data later.
If you organize your JSON so that your ObjectIds are in sorted order, you will easily be able to search quickly.

Serving Lazy Thumbnail Images from Azure Blob Storage - What is the overhead of Exists?

I have a website where users upload images. These images are shown on various sections of the site with various thumbnail dimensions. Since the site is still under rapid development, I don't yet want to commit to a set number of thumb sizes. Thus I believe I should be generating thumbnails on a lazy basis.
Of the two options, which is the more performant way to do this:
When I go to serve the thumbnail, convert the dimensions into a canonical filename (like "bighouse-thumb-160x120"). Check if the file exists in blob storage using client.GetContainerReference(containerName).GetBlockBlobReference(key).Exists(); If it does not exist, generate it and save it.
When I go to serve the thumbnail, query my SQL database to see if the thumbnail exists. If it exists, get the blob URI from the DB and emit that as HTML. If it does not exist, generate it and update the SQL database.
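For concreteness, option 2 would mean a small lookup table along these lines (a hypothetical sketch; names are placeholders):

CREATE TABLE dbo.Thumbnail
(
    ImageId INT NOT NULL,
    Width   INT NOT NULL,
    Height  INT NOT NULL,
    BlobUri NVARCHAR(400) NOT NULL,
    CONSTRAINT PK_Thumbnail PRIMARY KEY (ImageId, Width, Height)
);

-- "does the 160x120 thumb for image 42 exist?" becomes a single indexed lookup
SELECT BlobUri FROM dbo.Thumbnail WHERE ImageId = 42 AND Width = 160 AND Height = 120;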
I've used #2 in the past, but design-wise it is duplicating state, which is bad. If querying Azure for the existence of blobs is scalable, I'd rather do that. I don't really understand the threading model in ASP.NET. If I have 200 users requesting thumbs, will my Azure Exists calls all happen in parallel? Even if they do, two round trips seem like a lot of overhead. I assume round-tripping the database is faster and lends itself more easily to generic caching solutions.
What is the right answer?
Regardless of the overhead, I would pre-generate thumbnails when you upload/store the image. This way you move the burden of generating thumbnails from something that is done many times (retrieving an image) to one that is much less often executed (storing an image).
Consider the following scenario, when you lazily generate thumbnails on the first view:
Check for an existing thumbnail (this is false - it's the first view, remember ;))
Generate a thumbnail
Store the thumbnail
Send the thumbnail to the client
With pre-generated thumbnails the process is much shorter:
Send the thumbnail to the client
Done.
With 'lazy generating', the check for an existing thumbnail can be expensive due to network overhead (on every hit!), generating the thumbnail can be hugely expensive memory- & CPU-wise, and then you have to store it, with network overhead again. You can even offload generating the thumbnail(s) to a separate process, possibly started by queue messages, to take the burden of generating the images even further away from your web servers.
However, this brings up the question of what you should do when you introduce a new thumbnail/image size. When you pre-generate the thumbnails you can write a simple tool to create the new sizes and store them, and if you went the separate process route it's even simpler. Just upgrade the separate process, generate a queue message for every existing image and just let it do its work.

Is it good to store Images in DB and retrieve? [duplicate]

Possible Duplicate:
Storing Images in DB - Yea or Nay?
Friends,
I have a requirement to show dynamically changing images in a DataList. What I did was store the images in the DB using the image datatype and retrieve them from there. Is storing images in the DB a good technique?
FYI, users can upload the images.
Regards,
Abhi
The answer is: it depends... Studies have been done (http://research.microsoft.com/pubs/64525/tr-2006-45.pdf) which basically concluded that if objects are larger than one megabyte on average, NTFS has a clear advantage over SQL Server. If the objects are under 256 kilobytes, the database has a clear advantage. Inside this range, it depends on how write-intensive the workload is, and the storage age of a typical replica in the system.
I would store them as physical files on the server, but store the file path in the database, not the actual image. Storing the image in the database will increase its size dramatically over time.
Image data will increase your DB size unnecessarily, so it is not good practice to store it; instead store the file path in your DB, which is not that big.
Storing images in the DB should only be done if you have some strong requirement or use case for it.
The thing you have to address if you store paths etc. is maintaining referential integrity with your images. What if somebody moves files? What if somebody uploads a new file with the same name? (I'd suggest uploads get renamed to reflect some kind of key rather than keeping their original name of bob.jpg.) You'll need to look at segmenting your directories etc. to keep the listing sensible. Finding the images may also be harder than if you store them in a DB.
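A hypothetical sketch of that key-based renaming, keeping the user's original name only as metadata (all names here are made up):

CREATE TABLE dbo.UploadedImage
(
    ImageId      INT IDENTITY(1,1) PRIMARY KEY,
    OriginalName NVARCHAR(260) NOT NULL,   -- what the user called it, e.g. bob.jpg
    Extension    NVARCHAR(10)  NOT NULL,
    -- name actually used on disk, derived from the key so duplicate uploads can't collide
    StoredName   AS CAST(ImageId AS NVARCHAR(20)) + Extension
);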
However, on the upside, you can form a CDN based on distributing your images over diverse servers, subdomains, the cloud etc. if you don't jam them all into your database.
Depends on the size of the images and the DB you use.
For SQL Server it is a pretty bad idea if they are larger than 1 MB and you do not use NTFS FILESTREAM storage for your BLOB fields.
See for example http://www.simple-talk.com/sql/learn-sql-server/an-introduction-to-sql-server-filestream/
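For reference, a FILESTREAM-backed column looks roughly like this (this sketch assumes FILESTREAM is enabled on the instance and the database already has a FILESTREAM filegroup):

CREATE TABLE dbo.PictureStore
(
    PictureId   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    PictureData VARBINARY(MAX) FILESTREAM NULL   -- bytes live on NTFS, managed by SQL Server
);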
If you have a document-oriented database like CouchDB, it might be OK.
I would store them as physical files on the server, but store the file path in the database, not the actual image, and then locate the file based on the path stored in the database. Storing the image in the database will increase its size dramatically over time.
Storing them in the database is also useful if you need to scale your site across multiple web servers.
If they are static then there is no benefit, as they can be deployed with your site, but things like avatars are generally better stored in the DB so they are available to all cluster members.

Saving pictures on server

I'm programming a site for a rent-a-car company, and I need to save exactly three pictures for every car. Which is better: storing the path to the images in the database (in the car description table) and saving the images in some folder, OR saving the pictures in a table (MS SQL)?
The main question is: how big are those pictures on average?
A thorough examination by Microsoft Research (To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem) has shown:
that it's better to store stuff inside your SQL tables as long as the data is typically smaller than 256 KB
that it's better to store on filesystem if the elements are typically larger than 1 MB
The in-between is a bit of a gray area....
So: are your pictures mostly more than 1 MB? Then store them on disk and keep a reference. Otherwise, I'd recommend storing them inside your database table.
Better to save the path to the picture (and usually, probably even better, just the filename). By storing the picture in the table, you are increasing the size of the tables and therefore increasing lookup time, insert time, etc. for no apparent gain.
The other thing is that you say you need to save exactly 3 pictures for each posting, so this makes me think you're using fields in your posts table such as pic1, pic2, pic3. You might want to normalize this so that you have a post_pictures table that links each post with a picture.
For instance:
post_pictures:
post_picture_id | post_id | picture_filename. (you can even get away with having just two fields).
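A minimal sketch of that table (it assumes a posts table keyed on post_id already exists):

CREATE TABLE post_pictures
(
    post_picture_id  INT IDENTITY(1,1) PRIMARY KEY,
    post_id          INT NOT NULL REFERENCES posts (post_id),
    picture_filename NVARCHAR(260) NOT NULL
);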
Depending on the horsepower of your server farm, save them in SQL. I have set up the fields as varbinary(max).
What advantage do you have in storing the images in the database? Can you think of any use for the image data being stored in a DB?
I suggest you store the images in the file system rather than the database. You can use the database to store metadata about the images (path, keywords, etc.).

SQL Server hosting only offers 1GB databases. How do I split my data up?

Using ASP.NET and Windows Stack.
Purpose:
I've got a website that takes in over 1 GB of data about every 6 months, so as you can tell my database can become huge.
Problem:
Most hosting providers only offer databases in 1 GB increments. This means that every time I go over another 1 GB, I will need to create another database. I have absolutely no experience in this type of setup and I'm looking for some advice on what to do.
Wondering:
Do I move the membership stuff over to a separate database? This still won't solve much because of the size of the other data I have.
Do I archive data into another database? If I do, how do I allow users to access it?
If I split the data between two databases, do I name the tables the same?
I query all my data with LINQ. So establishing a few different connections wouldn't be a horrible thing.
Is there a hosting provider that anyone knows of that can scale their databases?
I just want to know what to do. How can I solve this dilemma? I don't have the advertising dollars coming in to spend more than $50 a month so far...
While http://www.ultimahosts.net/windows/vps/ seems to offer the best solution for the best price, they still split the databases up. So where do I go from here?
Again, I am a total amateur with multiple databases. I've only used one at a time...
I'd be genuinely surprised if they actually impose a hard 1 GB per DB limit and create a new one for each additional GB, but on the assumption that this actually is the case:
Designate a particular database as your master database. This is the only one your app will directly connect to.
Create a clone of all the tables you'll need in your second (and third, fourth, etc.) databases.
Within your master database, create a view that does a UNION over the tables as a cross-DB query - SELECT * FROM Master..TableName UNION SELECT * FROM DB2..TableName UNION SELECT * FROM DB3..TableName (a fuller sketch follows below this list).
For writing, you'll need to use sprocs to locate the relevant records and update them, but you shouldn't have a major problem there. In principle you could extend the view above to return which DB the record was in if you wanted.
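A minimal sketch of such a view, using the same placeholder names as above (UNION ALL skips the duplicate-elimination step, assuming the databases hold disjoint rows):

CREATE VIEW dbo.AllTableName
AS
    SELECT * FROM Master.dbo.TableName
    UNION ALL
    SELECT * FROM DB2.dbo.TableName
    UNION ALL
    SELECT * FROM DB3.dbo.TableName;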
Answering this question is very hard, for it requires knowing at least some basic facts about the data model, the way the data is queried, etc. Also, as suggested by rexem, a better understanding of the usage model may allow using normalization to limit the growth (and it may also allow introducing compression, if applicable).
I'm more puzzled at the general approach and business model (and I do understand the need to keep cost down with a startup application based on ad revenues). Wouldn't you be able to contract an amount that will fit your needs for the next 6 months and then, when you start outgrowing this space, purchase additional storage (for an extra 6 months/a year, by which time you may be "rich")? Such a move may not even require anything on your end (it depends on the way the hosting service manages racks etc.), or at worst may require you to copy the old database to the new (bigger) storage.
In this way, you wouldn't need to split the database in any artificial fashion, and could hence focus on customer-oriented features rather than on optimizing queries that need to compile info from multiple servers.
I believe the solution is much simpler than that: even if your provider manages databases in 1 GB units, it does not mean that you have N databases of 1 GB each; it means that once you reach 1 GB the database can be increased to 2 GB, 3 GB and so on...
Regards
Massimo
You would have multiple questions to answer:
It seems the current hosting provider cannot be very reliable if it works the way you say: they create a new database every time the initial one grows beyond 1 GB - this sounds strange... At the very least they should increase the storage for the current DB and notify you that you'll be charged more... Find other hosting solutions with better options...
Is there any information in your current DB that could be archived? That's a very important question, since you may be carrying over "useless" data that could be archived into separate databases and queried only on special requests. As other colleagues told you already, that would be difficult for us to evaluate since we do not know the data model.
Can you split the data model into two totally different storages and only replicate the common information between them? You could use SQL Server Replication (http://technet.microsoft.com/en-us/library/ms151198.aspx) to maintain the same membership information between the databases.
If the data model cannot be split, then I do not see any practical choice other than multiple databases - just find a bigger storage solution.
You may want to look for a better hosting provider.
Even SQL Express supports a 4GB database, and it's free. Some hosts don't like using SQL Express in a shared environment, but disk space is so cheap these days that finding a plan that starts at or grows in chunks of more than 1GB should be pretty easy.
You should go for a Windows VPS solution. Most Windows VPS providers will offer SQL 2008 Web Edition, which can support up to 10 GB of database space...
