Accessing huge data from application - sqlite

Before starting on the application, I would just like to know whether this is feasible.
I have around 15 GB of data (text and some images) stored in an SQLite database on my SD card, and I need to access it from my application. The data will grow daily and may reach 64 GB.
Can anyone tell me what limitations there are on accessing such a huge database stored on an SD card from the application?

SQLite itself supports databases in that range, e.g. 16-32 GB (it may start working more slowly, but it should still work).
However, you are likely to hit the FAT32 maximum file size, which is just 4 GB, and this will be tough to overcome. SQLite lets you use attached databases, which allow you to split the data into smaller chunks, but this is really cumbersome.
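As a rough sketch of the attached-database workaround, the Python code below (standard sqlite3 module) spreads rows across several files that a single connection sees together, so that no single file has to grow past the FAT32 limit. The file names, the items table, and routing rows by id modulo the number of parts are all hypothetical choices for illustration; data that grows daily might be better split by date. Also note that a default SQLite build allows at most 10 attached databases per connection, which is part of why this gets cumbersome at 64 GB.

```python
import os
import sqlite3
import tempfile

N_PARTS = 3  # hypothetical; pick enough parts to keep every file under 4 GB


def open_split_store(directory, n_parts=N_PARTS):
    """Open one connection that spans several database files via ATTACH."""
    conn = sqlite3.connect(os.path.join(directory, "part0.db"))
    schemas = ["main"]
    for i in range(1, n_parts):
        # The file name can be bound as a parameter; the schema name cannot.
        path = os.path.join(directory, f"part{i}.db")
        conn.execute(f"ATTACH DATABASE ? AS part{i}", (path,))
        schemas.append(f"part{i}")
    for s in schemas:
        # The same hypothetical table lives in every file.
        conn.execute(f"CREATE TABLE IF NOT EXISTS {s}.items "
                     "(id INTEGER PRIMARY KEY, body TEXT)")
    # A TEMP view may reference tables in attached databases, so readers
    # can query one logical table instead of n_parts physical ones.
    union = " UNION ALL ".join(f"SELECT id, body FROM {s}.items" for s in schemas)
    conn.execute(f"CREATE TEMP VIEW all_items AS {union}")
    return conn, schemas


def insert_item(conn, schemas, item_id, body):
    # Route each row by id modulo the number of parts (an arbitrary choice).
    schema = schemas[item_id % len(schemas)]
    conn.execute(f"INSERT INTO {schema}.items (id, body) VALUES (?, ?)",
                 (item_id, body))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        conn, schemas = open_split_store(d)
        with conn:  # one transaction across all attached files
            for i in range(10):
                insert_item(conn, schemas, i, f"row {i}")
        print(conn.execute("SELECT COUNT(*) FROM all_items").fetchone()[0])  # 10
        conn.close()
```

Every query that must see all the data has to go through a view or a UNION like this, and the schema of every part has to be kept in sync by hand.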
If you can format your SD card as ext4, or use internal storage as ext4, then you should not really have big problems.

Related

MariaDB recommended RAM, disk, core capacity?

I am not able to find MariaDB's recommended RAM, disk, and number of CPU cores. We are setting up an initial deployment with a very small data volume, so I just need MariaDB's recommended capacity.
Appreciate your help!
Seeing that Micro-Service architecture has been spreading rapidly over the last few years, and each Micro-Service usually needs its own database, I think this type of question is actually becoming more relevant.
We were exploring the possibility of creating small databases on many servers, so I was looking for this answer and wondering, for interest's sake, what the minimum requirements for a MariaDB/MySQL DB would be...
Anyway, I got this helpful answer from here and thought I would also share it in case someone else is looking into it...
When starting up, it (the database) allocates all the RAM it needs. By default, it will use around 400MB of RAM, which isn't noticeable on a database server with 64GB of RAM, but it is quite significant for a small virtual machine. If you add in the default InnoDB buffer pool setting of 128MB, you're well over your 512MB RAM allotment, and that doesn't include anything from the operating system.
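To make the quoted figures concrete, here is a back-of-the-envelope RAM budget in Python. The 400 MB and 128 MB numbers are taken from the quote above and are rough defaults, not measurements of any particular MariaDB version; the 32 MB buffer pool in the second call is a purely hypothetical smaller setting.

```python
# Rough figures from the quote above, in MB; not measured on any specific version.
DEFAULT_SERVER_MB = 400      # approximate default allocation at startup
INNODB_BUFFER_POOL_MB = 128  # default innodb_buffer_pool_size
VM_RAM_MB = 512              # the small virtual machine's total RAM


def headroom_mb(vm_ram_mb, *consumers_mb):
    """RAM left for everything else, the OS included; negative means overcommitted."""
    return vm_ram_mb - sum(consumers_mb)


if __name__ == "__main__":
    # Defaults: already 16 MB over budget before the OS gets anything.
    print(headroom_mb(VM_RAM_MB, DEFAULT_SERVER_MB, INNODB_BUFFER_POOL_MB))  # -16
    # Hypothetical shrunken buffer pool of 32 MB.
    print(headroom_mb(VM_RAM_MB, DEFAULT_SERVER_MB, 32))  # 80
```

Shrinking the relevant my.cnf settings is what moves that number back above zero and leaves room for the operating system.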
1 CPU core is more than enough for most MySQL/MariaDB installations.
512MB of RAM is tight, but probably adequate if only MariaDB is running. But you would need to aggressively shrink various settings in my.cnf. Even 1GB is tiny.
1GB of disk is more than enough for the code and minimal data (I think).
Please experiment and report back.
There are minor differences in requirements between operating systems, and between versions of MariaDB.
Turn off most of the Performance Schema (performance_schema). If all of its flags are turned on, it consumes a lot of RAM.
20 years ago I had MySQL running on my personal Windows box with 256 MB of RAM. I suspect today's MariaDB might be too big to work on such a tiny machine. Today, the OS is the biggest occupant of any basic machine's disk. If you have only a few MB of data, then disk is not an issue.
Look at it this way -- What is the smallest smartphone you can get? A few GB of RAM and a few GB of "storage". If you cut either of those numbers in half, the phone probably cannot work, even before you add apps.
Both MariaDB and MySQL actually use very little memory. About 50 MB to 150 MB is the range I found on some of my servers. These servers run a few databases, with a handful of tables each and limited user load. MySQL documentation claims it needs 2 GB, which is very confusing to me. I understand why MariaDB does not specify any minimum requirements: if they say 50 MB, a lot of folks will want to disagree; if they say 1 GB, they are unnecessarily inflating the minimum requirements. Come to think of it, more memory means better caching and performance. However, a well-designed database can do disk reads every time without any performance issues. My Apache installs (on the same server) consistently use more memory (about double) than the database.

The Case of the Missing '14 second SQLite database' performance

I have developed a program which uses an SQLite 3.7 ... database. In it there is a rather extensive write/read module that imports, checks, and updates data. This process takes 14 seconds on my PC, and I'm pleased as punch with the performance.
I use transactions for everything, with parameters. My PC is an Intel i7 with 18 GB of RAM. I have not set anything in the database. I used SQLite Expert to create the database and the data structures, including tables and columns, and checked that all indexes are created. In other words, it's all OK.
I have since deployed the program/database to 2 other machines. That 14-second process takes over 5 minutes on the other machines. Same program, identical data, identical database. The machines are up to date: one is a 3rd-gen Intel i7 bought last week, and the other is quite fast as well, so hardware should not be an issue.
I just don't understand what the problem could be. Is it the database itself? I have not set anything other than encryption on it. Remember that when I run the same thing, it takes 14 seconds. Could it be that the database is 'optimised' for my PC, so when I give it to others it's not optimised?
I know I could turn off journaling to get better performance, but that would only speed up the process and would still leave the problem.
Any ideas would be welcome.
EDIT:
I have tested the program on my 7-year-old dual Athlon with 3 GB of RAM running XP on an HDD, and the procedure took 35 seconds. That is well within tolerable limits, considering. I just don't get what could be making 2 modern machines take 5 minutes.
I suspect it's a write issue, since when only reading, those machines are slower but quite acceptable.
SQLite speed is affected most by how well the disk does random reads and writes; any SSD is much better at this than any rotating disk.
Whenever changes overflow the internal cache, they must be written to disk. You should use PRAGMA cache_size to increase the cache to more than the default 2 MB.
Changed data must be written to disk at the end of every transaction. Make sure that there are as many changes as possible in one transaction.
If much of your processing involves temporary tables or indexes, the speed is affected by the speed of the main disk. If your machines have enough RAM, you can force temporary data to RAM with PRAGMA temp_store.
You should enable Write-Ahead Logging.
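The advice above can be sketched with Python's built-in sqlite3 module. The 64 MiB cache size, the file name, and the items table are assumptions for illustration, not values tuned for the poster's workload; the original program is not Python, but the same PRAGMA statements can be issued from any SQLite binding.

```python
import os
import sqlite3
import tempfile


def open_tuned(path, cache_mib=64):
    """Open a connection with the settings suggested above (values are assumptions)."""
    conn = sqlite3.connect(path)
    # A negative cache_size is read as KiB rather than pages: -65536 is ~64 MiB.
    conn.execute(f"PRAGMA cache_size = {-cache_mib * 1024}")
    # Keep temporary tables and indexes in RAM instead of on the main disk.
    conn.execute("PRAGMA temp_store = MEMORY")
    # Switch to Write-Ahead Logging; the pragma returns the mode now in effect.
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    return conn, mode


def bulk_import(conn, rows):
    # One transaction for the whole batch, so the expensive sync to disk
    # happens once at COMMIT rather than once per INSERT.
    with conn:
        conn.executemany("INSERT INTO items (name, qty) VALUES (?, ?)", rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        conn, mode = open_tuned(os.path.join(d, "app.db"))
        print(mode)  # wal
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")
        bulk_import(conn, [(f"item{i}", i) for i in range(10_000)])
        print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 10000
        conn.close()
```

WAL mode is stored persistently in the database file, whereas cache_size and temp_store apply per connection and have to be set every time a connection is opened.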
Note: the default SQLite distribution does not have encryption.

SQLite Abnormal Memory Usage

We are trying to integrate SQLite into our application and populate it as a cache. We are planning to use it as an in-memory database. This is our first time using it. Our application is C++ based.
Our application interacts with the master database to fetch data and performs numerous operations. These operations are generally concerned with one table, which is quite huge.
We replicated this table in SQLite, and the following are the observations:
Number of Fields: 60
Number of Records: 100,000
As data population starts, the application's memory shoots up drastically from 120 MB to ~1.4 GB. At this point our application is idle and not doing any major operations; normally, memory utilization shoots up further once the operations start. With SQLite as an in-memory DB and this high memory usage, we don't think we will be able to support this many records.
Q. Is there a way to find the size of the database when it is in memory?
When I create the DB on disk, the DB size comes to ~40 MB, but the memory usage of the application still remains very high.
Q. Is there a reason for this high usage? All buffers have been cleared and, as said before, the DB is not in memory.
Any help would be deeply appreciated.
Thanks and Regards
Sachin
A few questions come to mind...
What is the size of each record?
Do you have memory leak detection tools for your platform?
I used SQLite in a few resource-constrained environments in a way similar to how you're using it, and after fixing bugs it was small, stable, and fast.
IIRC, it was unclear when to clean up certain things used by the SQLite API, and once we used tools to find the memory leaks, it was fairly easy to see where the problem was.
See this:
PRAGMA shrink_memory
This pragma causes the database connection on which it is invoked to free up as much memory as it can, by calling sqlite3_db_release_memory().
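For the question of how big an in-memory database is, one rough answer is page_count times page_size, which SQLite reports through PRAGMAs for in-memory and on-disk databases alike. The sketch below (Python's sqlite3; the table and row counts are made up) computes that and then runs PRAGMA shrink_memory. It counts only database pages, not heap used by prepared statements, the allocator, or the application itself, so it will not explain a whole process's footprint; and for a :memory: database the pages are the data itself, so shrink_memory cannot be expected to free them.

```python
import sqlite3


def db_size_bytes(conn, schema="main"):
    """Bytes occupied by a database's pages (page_count * page_size)."""
    page_count = conn.execute(f"PRAGMA {schema}.page_count").fetchone()[0]
    page_size = conn.execute(f"PRAGMA {schema}.page_size").fetchone()[0]
    return page_count * page_size


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
    with conn:
        conn.executemany("INSERT INTO t (payload) VALUES (?)",
                         (("x" * 500,) for _ in range(10_000)))
    print(db_size_bytes(conn))
    # Ask this connection to free as much memory as it can.
    conn.execute("PRAGMA shrink_memory")
    print(db_size_bytes(conn))  # unchanged: the pages are the data
    conn.close()
```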

Why are SQLite transactions bound to harddisk rotation?

There's the following statement in the SQLite FAQ:
A transaction normally requires two complete rotations of the disk platter, which on a 7200RPM disk drive limits you to about 60 transactions per second.
As far as I know, there's a cache on the hard disk, and there might also be an extra cache in the disk driver, which abstracts the operation perceived by the software away from the actual operation on the disk platter.
Then why and how exactly are transactions so strictly bound to disk platter rotation?
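The FAQ's number is plain arithmetic: a 7200 RPM platter makes 120 rotations per second, and if every commit must wait for two full rotations, at most 60 commits per second can complete. The tiny Python helper below (its name is only illustrative) makes that assumption explicit. It deliberately ignores drive and driver write caches that acknowledge a write before it reaches the platter, which is exactly the case this question asks about.

```python
def max_commits_per_second(rpm, rotations_per_commit=2):
    """Upper bound on durable commits per second if each commit must wait
    for `rotations_per_commit` full platter rotations (the FAQ's assumption)."""
    rotations_per_second = rpm / 60
    return rotations_per_second / rotations_per_commit


if __name__ == "__main__":
    print(max_commits_per_second(7200))  # 60.0
    print(max_commits_per_second(5400))  # 45.0
```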
From Atomic Commit In SQLite
2.0 Hardware Assumptions
SQLite assumes that the operating system will buffer writes and that a write request will return before data has actually been stored in the mass storage device. SQLite further assumes that write operations will be reordered by the operating system. For this reason, SQLite does a "flush" or "fsync" operation at key points. SQLite assumes that the flush or fsync will not return until all pending write operations for the file that is being flushed have completed. We are told that the flush and fsync primitives are broken on some versions of Windows and Linux. This is unfortunate. It opens SQLite up to the possibility of database corruption following a power loss in the middle of a commit. However, there is nothing that SQLite can do to test for or remedy the situation. SQLite assumes that the operating system that it is running on works as advertised. If that is not quite the case, well then hopefully you will not lose power too often.
Because it ensures data integrity by making sure the data is actually written onto the disk rather than held in memory. Thus if the power goes off or something similar happens, the database is not corrupted.
This video http://www.youtube.com/watch?v=f428dSRkTs4 talks about reasons why (e.g. because SQLite is actually used in a lot of embedded devices where the power might well suddenly go off.)

Client and Cache configuration for Oracle coherence

I have a specific scenario in which we want to use Coherence as a distributed cache, which I will describe here.
I have 20+ standalone processes which will be putting data into the cache continuously. Their frequencies all differ, though that's not a concern.
And 2 processes which will be reading data from that cache.
I don't need any underlying DB beyond what Coherence provides. Data will be written to the cache and read from the cache.
I have a 4-node cluster at my disposal (cost constraint, whatever), the Coherence cluster will be on separate boxes (infrastructure constraint, whatever), and the cache-populating part and the reading part will be on different machines.
The daily peak memory size of the cache will hover around 6 GB at most, with a minimum of 2 GB.
The cache will hold only daily data, and I will have separate archiving processes that archive it simultaneously, so for now the cache will stay at this size. Let's say I am going to keep the date out of the key equation.
Still, I would like to explore whether I can store more in those 4 nodes. Right now it's simple serialization; I can explore other binary formats. Or should I definitely at this size of the cache?
My read and write operations are fairly spread out over the day, meaning reads and writes keep happening from those 2 reading clients and 20+ writing clients; neither side dominates. There is, however, a startup batch process in each background process that pushes more to the cache than the continuous pushing afterwards, though the continuous pushing pushes a fair amount of data too.
Now, my questions regarding the points above (partly arising from some confusion):
The biggest one: somebody told me that I can have only a limited number of connections, depending on the number of nodes we have bought. He said that with 4 nodes you should ideally have at most 4 connections, so you should develop a gatekeeper kind of application and whatnot, even if we use TCP Extend. From my reading so far, I don't think that's true. Is it? The point is, I don't want to go that way if it really is not a constraint.
In other words, is there a limit on connections through the Proxy Service depending on the number of nodes in the cluster?
Somewhat related to the above: at the very most, I will take some performance penalty while pushing to the cache only if I go the Extend way, right?
Partitioned cache vs. near cache: both the read time and having the most up-to-date cache are extremely critical (this is the most important question I have).
I would really like to see the benefit of going to POF instead of, let's say, serialization/Externalizable/protobuf. Can Coherence support protobuf out of the box? (Maybe for later on.)
There's no technical limitation to the number of connections a Coherence Extend proxy can support except normal network and hardware resource constraints. You will have to ask an Oracle sales person if there are licensing limitations.
There is some performance impact from using a proxy because you are adding an additional network hop (client to proxy to cluster). If you use POF serialization then the proxy does not have to serialize/deserialize values. It can just pass the object through in its serialized form. In most applications the performance impact of using a proxy is tiny because Coherence is highly optimized for network speed. You are not required to use a proxy unless your clients are .NET or C++, but there are advantages to using one so that client performance problems do not impact the cache.
Near cache will improve retrieval performance dramatically if there are a number of frequently retrieved items for a client, since they will be found in-process.
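To illustrate why a near cache helps with frequently read items, here is a generic sketch of the technique in Python: a bounded in-process front map sitting in front of a slower back cache, with a counter of trips to the back. This is not Coherence's API; the class, its methods, and the plain dict standing in for the partitioned cache are all hypothetical. It also shows the trade-off behind the "most up-to-date" requirement: a real near cache must invalidate its front copies when other clients write, which this sketch only does for its own writes.

```python
from collections import OrderedDict


class NearCache:
    """Hypothetical sketch: a bounded in-process front map over a slower back cache."""

    def __init__(self, back, front_size=1000):
        self.back = back              # stands in for the remote, partitioned cache
        self.front = OrderedDict()    # in-process copies, least recently used first
        self.front_size = front_size
        self.back_reads = 0           # trips to the back cache (the "network")

    def get(self, key):
        if key in self.front:
            self.front.move_to_end(key)   # mark as recently used
            return self.front[key]
        self.back_reads += 1
        value = self.back.get(key)
        if value is not None:
            self.front[key] = value
            if len(self.front) > self.front_size:
                self.front.popitem(last=False)  # evict least recently used
        return value

    def put(self, key, value):
        self.back[key] = value
        # Drop the local copy so the next read sees the latest value; writes
        # by *other* clients would need an invalidation message in real life.
        self.invalidate(key)

    def invalidate(self, key):
        self.front.pop(key, None)


if __name__ == "__main__":
    cache = NearCache({}, front_size=2)
    cache.put("a", 1)
    for _ in range(5):
        cache.get("a")
    print(cache.back_reads)  # 1: only the first read left the process
```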
POF offers performance improvements based on faster serialization/deserialization and more compact storage. It is always best to try with test data based on your real production data and measure the difference yourself. Coherence does not support protobuf out of the box.
