About 3 days ago our Realtime Database storage doubled really fast. We don't have much data, since we only store users' data and we don't have many users, but this made me wonder whether someone found a way to write data in a way they should not.
Our database structure is:
/users/
-/UID_1/profile
-/UID_2/profile
I strongly believe that one of these users wrote a bunch of data, but I don't know how to find the size of each "record" (or whatever it is called).
Here I should write what I have tried, but I haven't tried anything yet. I am not sure what to do: it's a production DB, and I am a bit afraid to touch it.
PS.
I have the backup file, it's about 25mb large when unzipped.
Looking forward to receiving any suggestions,
Cheers
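One way to look at this without touching the production DB is to inspect the backup file offline. Below is a minimal sketch, assuming the backup is a single JSON export with the /users/UID/profile layout shown above; the file name backup.json and the function name are hypothetical. It reports the approximate serialized size of each user node, largest first.

```python
import json

def sizes_by_user(db: dict) -> list[tuple[str, int]]:
    """Return (uid, approximate serialized size in bytes) pairs, largest first."""
    users = db.get("users", {})
    sizes = [
        # Compact separators approximate how the node is stored as JSON text.
        (uid, len(json.dumps(node, separators=(",", ":")).encode("utf-8")))
        for uid, node in users.items()
    ]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)

# Usage (the file name is an assumption):
# with open("backup.json", encoding="utf-8") as f:
#     for uid, size in sizes_by_user(json.load(f))[:10]:
#         print(uid, size)
```

If one or two UIDs dominate the list, those are the nodes worth inspecting first.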
Forgive me, but this is a super basic question and I couldn't really find a good explanation.
Say I am querying a collection looking for certain attributes, and I return some results that match that.
Then I requery looking for some different attributes, some of which will have the same results.
Is there a way to avoid redownloading that data? Some of my results will already have been loaded onto the user's phone. Does that make sense? Basically I'm looking for a way to cache data and not have to worry about pulling the same data twice.
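To illustrate the idea being asked about, here is a minimal sketch of an in-memory cache keyed by document ID. It assumes each query can first return just the matching IDs, and that there is some function that downloads full documents for a list of IDs; fetch_by_ids is a hypothetical stand-in for that call, not an API of any particular database.

```python
from typing import Callable

class DocumentCache:
    """Keeps already-downloaded documents in memory, keyed by document ID."""

    def __init__(self, fetch_by_ids: Callable[[list[str]], dict[str, dict]]):
        self._fetch_by_ids = fetch_by_ids  # hypothetical network call
        self._docs: dict[str, dict] = {}

    def get_many(self, ids: list[str]) -> dict[str, dict]:
        # Download only the documents that have not been seen before.
        missing = [doc_id for doc_id in ids if doc_id not in self._docs]
        if missing:
            self._docs.update(self._fetch_by_ids(missing))
        return {doc_id: self._docs[doc_id] for doc_id in ids if doc_id in self._docs}
```

With this shape, a second query whose results overlap the first only downloads the documents that are new.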
I currently use LinqToTwitter to send posts to Twitter. I'd like to convert words in the title of the post to hashtags when it gets fired off as a tweet, so that a blog post titled "Firefox is cool" becomes "#Firefox is cool http://myshortu.rl/dhsgeh" on Twitter.
So far, the way I see it, I need a database table with the words I want to convert to hashtags. I'd have to parse the title, compare its words to those in the DB, and add the pound sign. Is a DB table the best way? Or can I do it with an in-memory collection, or keep the words in web.config? Thanks....
The decision on whether to use a database or file (such as web.config) might depend on whether you want to write code that allows you to maintain the list. e.g. Add, Modify, Remove. If so, then a DB sounds like the easiest option. If the list is small and doesn't change, then adding a delimited list to web.config would work fine.
Since you're using ASP.NET you can't hold it in a memory variable, but you can hold the list in Cache. This can make for some very fast lookups, rather than multiple file or DB queries.
Just to put this into perspective though, it's tough to recommend a proper design in a forum because there might be details that aren't known. So, it's best to take my answer as something that helps think about what the tradeoffs are, rather than a definitive recommendation on what you should do.
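Wherever the word list ends up living, the conversion step itself is small. Here is a minimal sketch in Python rather than C#, purely to show the logic; the word list and the function name are hypothetical, and the list could just as well be loaded from a DB table or a config file.

```python
import re

# Hypothetical word list; in practice this would come from the DB or config.
HASHTAG_WORDS = {"firefox", "python"}

def add_hashtags(title: str, words: set[str] = HASHTAG_WORDS) -> str:
    """Prefix every whole word found in `words` with '#', ignoring case."""
    def tag(match: re.Match) -> str:
        word = match.group(0)
        return "#" + word if word.lower() in words else word
    # (?<!#) skips words that already carry a pound sign.
    return re.sub(r"(?<!#)\b\w+", tag, title)

print(add_hashtags("Firefox is cool"))  # prints "#Firefox is cool"
```

A set gives constant-time lookups, so for a small list that rarely changes there is little to gain from querying a table per word.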
I am caching data in an application I am currently writing and was wondering which would be better for holding the cached data: a regular text file or a SQLite database? Thanks.
EDIT:
I am using Zend_Cache, so relationships are handled without the need for a database. What I am caching is XML strings which, if saved as regular files, can be as big as 60 kB.
Depends on the kind of data you're planning on storing.
If the data is related, and you'd want to retrieve the data based on those relationships...then use SQLite.
If the data is completely unrelated and you're just looking for a way to store and retrieve plain text...then use a plain text file. It won't have the added overhead of calling into SQLite.
If you don't need to persist the data in a file for any reason, and the data lends itself to key/value pair storage...you should look into something like memcached so you don't have to deal with file IO.
If you are really CACHING data (not storing or transferring it), it may be better to use a cache solution like memcached.
60k is so small you probably wouldn't even bother caching the data, just keep it in memory.
The only reason to write it out to disk is if you want to persist the data between sessions. In this case, using text/XML or SQLite won't make much difference.
I'm trying to start using plain text files to store data on a server, rather than storing them all in a big MySQL database. The problem is that I would likely be generating thousands of folders and hundreds of thousands of files (if I ever have to scale).
What are the problems with doing this? Does it get really slow? Is it about the same performance as using a Database?
What I mean:
Instead of having a database that stores a blog table, then has a row that contains "author", "message" and "date" I would instead have:
A folder for the specific post, then *.txt files inside that folder that have "author", "message" and "date" stored in them.
Reading this would be immensely slower than reading from a database (file writes all happen at about the same speed--you can't store a write in memory).
Databases are optimized and meant to handle such large amounts of structured data. File systems are not. It would be a mistake to try to replicate a database with a file system. After all, you can index your database columns, but it's tough to index the file system without another tool.
Databases are built for rapid data access and retrieval. File systems are built for data storage. Use the right tool for the job. In this case, it's absolutely a database.
That being said, if you want to create HTML files for the posts and then store their locations in a DB so that you can easily get to them, then that's definitely a good solution (a la Movable Type).
But if you store these things on a file system, how can you find out your latest post? Most prolific author? Most controversial author? All of those things are trivial with a database, and very hard with a file system. Stick with the database, you'll be glad you did.
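To make the "trivial with a database" point concrete, here is a minimal sketch using Python's built-in sqlite3 module instead of MySQL. The table and column names follow the blog example from the question; the sample rows and the index name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blog (id INTEGER PRIMARY KEY, author TEXT, message TEXT, date TEXT)"
)
# An index on date lets the database find the newest post without a full scan.
conn.execute("CREATE INDEX idx_blog_date ON blog (date)")
conn.executemany(
    "INSERT INTO blog (author, message, date) VALUES (?, ?, ?)",
    [
        ("alice", "first post", "2010-01-01"),
        ("bob", "hello", "2010-01-05"),
        ("alice", "second post", "2010-01-09"),
    ],
)

# Latest post: one query.
latest = conn.execute(
    "SELECT author, message, date FROM blog ORDER BY date DESC LIMIT 1"
).fetchone()

# Most prolific author: one query.
most_prolific = conn.execute(
    "SELECT author, COUNT(*) AS posts FROM blog "
    "GROUP BY author ORDER BY posts DESC LIMIT 1"
).fetchone()

print(latest)         # ('alice', 'second post', '2010-01-09')
print(most_prolific)  # ('alice', 2)
```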
It really depends:
What is the file size?
What durability requirements do you have?
How many updates do you perform?
What is the file system?
It is not obvious that MySQL would be faster:
I once ran such a comparison for small objects, in order to use it as session storage for CppCMS, with one index (key only) and with two indexes (primary key and secondary timeout).
File System:    XFS     ext3
-----------------------------
Writes/s:       322     20,000

Database \ Indexes:       Key Only    Key+Timeout
-----------------------------------------------
Berkeley DB               34,400      1,450
Sqlite No Sync            4,600       3,400
Sqlite Delayed Commit     20,800      11,700
As you can see, a simple ext3 file system was faster than, or as fast as, Sqlite3 for storing data, because it does not give you the (D) of ACID.
On the other hand... DB gives you many, many important features you probably need, so
I would not recommend using files as storage unless you really need it.
Remember, the DB is not always the bottleneck of the system.
Forget about long-winded answers, here are the simplest reasons why storing data in plaintext files is a bad idea:
It's near-impossible to query. How would you sort blog posts by date? You'd have to read all the files and compare their date, or maintain your own index file (basically, write your own database system.)
It's a nightmare to backup. tar cjf won't cut it, and if you try you may end up with an inconsistent snapshot.
There are probably a dozen other good reasons not to use files: it's hard to monitor performance, very hard to debug, near-impossible to recover in case of error, there are no tools to handle them, etc...
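The querying point can be illustrated with a minimal sketch of what "sort blog posts by date" means for the folder-per-post layout from the question. It assumes every post folder holds a date.txt with an ISO-style date (so string order matches date order); the function name is hypothetical.

```python
from pathlib import Path

def posts_by_date(root: Path) -> list[tuple[str, str]]:
    """Return (date, post folder name) pairs, newest first.

    Every call opens and reads one file per post, because nothing is indexed.
    """
    entries = []
    for post_dir in root.iterdir():
        date_file = post_dir / "date.txt"
        if date_file.is_file():
            date = date_file.read_text(encoding="utf-8").strip()
            entries.append((date, post_dir.name))
    return sorted(entries, reverse=True)
```

The cost grows with the total number of posts on every request, unless you maintain a separate index file yourself.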
I think the key here is that there will be NO indexing on your data. So retrieving anything in, say, a search would be ridiculously slow compared to an indexed database. Also, IO operations are expensive; a database could be (partially) in memory, which makes the data available much faster.
You don't really say why you won't use a database... But in the scenario you are describing I would definitely use a DB over folders any day, for a couple of reasons. First of all, the blog scenario seems very simple, but it is very easy to imagine that someday you would like to expand it with more functionality such as search, more post details, categories etc.
I think that growing the model would be harder to do in a folder structure than in a DB.
Also, databases are usually MUCH faster than file access due to indexing and memory caching.
IIRC Fudforum used the file-storage for speed reasons, it can be a lot faster to grab a file than to search a DB index, retrieve the data from the DB and send it to the user. You're trading the filesystem interface with the DB and DB-library interfaces.
However, that doesn't mean it will be faster or slower. I think you'll find writing is quicker on the filesystem, but reading is faster on the DB in the general case. If, like fudforum, you have relatively immutable data and want to show several posts at once, then a file-based approach may be a lot faster: e.g. they don't have to search for every related post, they stick it all in 1 text file and display it once. If you can employ that kind of optimisation, then your file-based approach will work.
Also, mail servers work in the file-based approach too, the Maildir format stores each email message as a file in a directory, not in a database.
One thing I would say though: you'll be better off storing everything in 1 file, not 3. The filesystem is better at reading (and caching) a single file than it is with multiple ones. So if you want to store each message as 3 parts, save them all in a single file, read it to get any of the parts and just display the one you want to show.
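As a rough illustration of the single-file suggestion, here is a minimal sketch that stores the three parts of a post in one JSON file. JSON is just one possible container and is an assumption, as are the function names and the post_id naming scheme.

```python
import json
from pathlib import Path

def save_post(folder: Path, post_id: str, author: str, message: str, date: str) -> Path:
    """Write author, message and date into one file instead of three."""
    path = folder / f"{post_id}.json"
    record = {"author": author, "message": message, "date": date}
    path.write_text(json.dumps(record), encoding="utf-8")
    return path

def load_post(path: Path) -> dict:
    """One read returns all parts; the caller picks the one it needs."""
    return json.loads(path.read_text(encoding="utf-8"))
```

Usage would look like load_post(path)["author"] to display a single part.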
...and then you want to search all posts by an author and you get to read a million files instead of a simple SQL query...
Databases are NOT faster. Think about it: in the end they store the data in the filesystem as well. So the question of whether a database is faster depends strongly on the access path.
If you have only one access path, which correlates with your file structure, the file system might be way faster than a database. Just make sure you have some caching available for the filesystem.
Of course you do lose all the nice things of a database:
- transactions
- flexible ways to index data, and therefore access data in a flexible way reasonably fast.
- flexible (though ugly) query language
- high recoverability.
The scaling really depends on the filesystem used. AFAIK most file systems have some kind of upper limit on the number of files (in total or per directory), though on the newer ones this is often very high. For hundreds of thousands of files, with some directory structure to keep directories to a reasonable size, it should be possible to find a well-performing file system.
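One common way to keep directories to a reasonable size while still being able to compute a file's location directly from its key is to shard by a hash prefix. Here is a minimal sketch; the two-level layout, the use of SHA-1, and the function name are all illustrative assumptions, and the key is assumed to be safe to use as a file name.

```python
import hashlib
from pathlib import Path

def sharded_path(root: Path, key: str) -> Path:
    """Map a key to root/ab/cd/key.txt, where 'abcd' starts the key's SHA-1 hash.

    Two levels of 256 directories each spread the files over 65,536 folders,
    and the path can be recomputed from the key alone, without any lookup.
    """
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return root / digest[:2] / digest[2:4] / f"{key}.txt"
```

With a few hundred thousand files this leaves only a handful of files per directory on average.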
@Eric's comment:
It depends on what you need. If you only need the content of exactly one file per query, and you can determine the location and name of the file in a deterministic way, then direct access is faster than what a database does, which is roughly:
access a bunch of index entries, in order to
access a bunch of table rows (rdbms typically read blocks that contain multiple rows), in order to
pick a single row from the block.
If you look at it: you have indexes and additional rows in memory, which make your caching inefficient, so where is the speedup of a DB supposed to come from?
Databases are great for the general case. But if you have a special case, there is almost always a special solution that is better in some sense.
If you would prefer to move away from an RDBMS, why don't you try one of the other open source key-value or document DBs (non-relational DBs)?
From your post I understand that you are not going to rely on the ACID properties of a relational DB. It would be better to adopt one of the non-relational DBs (MongoDB, CouchDB or Hypertable) instead of your own file system implementation. It is likely to give better performance than the existing approaches.
Note: I am not an expert in this either. I just started working with MongoDB and find it useful in similar scenarios, so I wanted to share in case you are not aware of these approaches.