How Redis RDB persistance actually works behind the scene? - unix

i was going through Redis RDB persistence. I having some doubts regarding RDB persistence related to its disadvantage.
Understanding So far:
We should use rdb persistence when we need to save the snapshot of dataset currently in memory at some regular interval.
I can understand that in this way we can lose some data in case of server break down. But another disadvantage that i can't understand is how fork can be time consuming when persisting large dataset using rdb.
Quoting from Documentation
RDB needs to fork() often in order to persist on disk using a child
process. Fork() can be time consuming if the dataset is big, and may
result in Redis to stop serving clients for some millisecond or even
for one second if the dataset is very big and the CPU performance not
great. AOF also needs to fork() but you can tune how often you want to
rewrite your logs without any trade-off on durability.
I know how fork works as per my knowledge When parent process forks it create a new Child process and we can allow some code that child process will execute based on its pid or we can provide it some new executable that it will work on using exec() system call.
but things that i don't understand how it will be heavy task when size of dataset is larger?
I think i know the answer but i m not sure about that
Quoted from this link https://www.bottomupcs.com/fork_and_exec.xhtml
When a process calls fork then
the operating system will create a new process that is exactly the same as the parent process. This means all the state that was talked about previously is copied, including open files, register state and all memory allocations, which includes the program code.
As per above statement whole dataset of redis will be copied to child.
Am i understanding right?

When standard fork is called with copy-on-write the OS must still copy all the page table entries, which can take time time if you have small 4k pages and a huge dataset, this is what makes the actual fork() time slow.
You can also find a lot of time and memory is required if your dataset is changing a lot in a sparse way, as copy-on-write semantics triggers the actual memory pages to be copied as changes are made to the original. Redis also performs incremental rehashing and maintains expiry etc. so an instance that is more active will typically take longer to save to disk.
More reading:
Faster forking of large processes on Linux?
http://kirkwylie.blogspot.co.uk/2008/11/linux-fork-performance-redux-large.html

Related

Copying 100 GB with continues change of file between datacenters with R-sync is good idea?

I have a datacenter A which has 100GB of the file changing every millisecond. I need to copy and place the file in Datacenter B. In case of failure on Datacenter A, I need to utilize the file in B. As the file is changing every millisecond does r-sync can handle it at 250 miles far datacenter? Is there any possibility of getting the corropted file? As it is continuously updating when we call this as a finished file in datacenter B ?
rsync is a relatively straightforward file copying tool with some very advanced features. This would work great for files and directory structures where change is less frequent.
If a single file with 100GB of data is changing every millisecond, that would be a potential data change rate of 100TB per second. In reality I would expect the change rate to be much smaller.
Although it is possible to resume data transfer and potentially partially reuse existing data, rsync is not made for continuous replication at that interval. rsync works on a file level and is not as commonly used as a block-level replication tool. However there is an --inplace option. This may be able to provide you the kind of file synchronization you are looking for. https://superuser.com/questions/576035/does-rsync-inplace-write-to-the-entire-file-or-just-to-the-parts-that-need-to
When it comes to distance, the 250 miles may result in at least 2ms of additional latency, if accounting for the speed of light, which is not all that much. In reality this would be more due to cabling, routers and switches.
rsync by itself is probably not the right solution. This question seems to be more about physics, link speed and business requirements than anything else. It would be good to know the exact change rate, and to know if you're allowed to have gaps in your restore points. This level of reliability may require a more sophisticated solution like log shipping, storage snapshots, storage replication or some form of distributed storage on the back end.
No, rsync is probably not the right way to keep the data in sync based on your description.
100Gb of data is of no use to anybody without without the means to maintain it and extract information. That implies structured elements such as records and indexes. Rsync knows nothing about this structure therefore cannot ensure that writes to the file will transition from one valid state to another. It certainly cannot guarantee any sort of consistency if the file will be concurrently updated at either end and copied via rsync
Rsync might be the right solution, but it is impossible to tell from what you have said here.
If you are talking about provisioning real time replication of a database for failover purposes, then the best method is to use transaction replication at the DBMS tier. Failing that, consider something like drbd for block replication but bear in mind you will have to apply database crash recovery on the replicated copy before it will be usable at the remote end.

Hadoop - job submission time on large data

Did anyone face any problem with submitting job on large data. Data is around 5-10 TB uncompressed, it is in approximate 500K files. When we try to submit a simple java map reduce job, it's mostly spend more than hour on getsplits() function call. And takes multiple hour to appear in job tracker. Is there any possible solution to solve this problem?
with 500k files, you are spending a lot of time tree walking to find all these files, which then need to be assigned to list of InputSplits (the result of getSplits).
As Thomas points out in his answer, if your machine performing the job submission has a low amount of memory assigned to the JVM, then you're going to see issues with the JVM performing garbage collection to try and find the memory required to build up the splits for these 500K files.
To makes matters worse, if these 500K files are splittable, and larger than a single block size, then you'll get even more input splits to process the files (a file of size say 1GB, with a block size of 256MB, you'll by default get 4 map tasks to process this file, assuming the input format and file compression supports splitting the file). If this is applicable to your job (look at the number of map tasks spawned for your job, are there more than 500k?), then you can force less mappers to be created by amending the mapred.min.split.size configuration property to a size larger then the current block size (setting it to 1GB for the previous example means you'll get a single mapper to process the file, rather than 4). This will help the performance of getSplits method the resultant list of getSplits will be smaller, requiring less memory.
The second symptom of your problem is the time is takes to serialize the input splits to a file (client side), and then the deserialization time at the job tracker end. 500K+ splits is going to take time, and the jobtracker will have similar GC issues if it has a low JVM memory limit.
It largely depends on how "strong" your submission server is (or your laptop client), maybe you need to upgrade RAM and CPU to make the getSplits call faster.
I believe you ran into swap issues there and the computation takes therfore multiple times longer than usual.

Can you sacrifice performance to get concurrency in Sqlite on a NFS?

I need to write a client/server app stored on a network file system. I am quite aware that this is a no-no, but was wondering if I could sacrifice performance (Hermes: "And this time I mean really slash.") to prevent data corruption.
I'm thinking something along the lines of:
Create a separate file in the system everytime a write is called (I'm willing do it for every connection if necessary)
Store the file name as the current millisecond timestamp
Check to see if the file with that time or earlier exists
If the same one exists wait a random time between 0 to 10 ms, and try again.
While file is the earliest timestamp, do work, delete file lock, otherwise wait 10ms and try again.
If a file persists for more than a minute, log as an error, stop until it is determined that the data is not corrupted by a person.
The problem I see is trying to maintain the previous state if something locks up. Or choosing to ignore it, if the state change was actually successful.
Is there a better way of doing this, that doesn't involve not doing it this way? Or has anyone written one of these with a lot less problems than the Sqlite FAQ warns about? Will these mitigations even factor in to preventing data corruption?
A couple of notes:
This must exist on an NSF, the why is not important because it is not my decision to make (it doesn't look like I was clear enough on that point).
The number of readers/writers on the system will be between 5 and 10 all reading and writing at the same time, but rarely on the same record.
There will only be clients and a shared memory space, there is no way to put a server on there, or use a server based RDMS, if there was, obviously I would do it in a New York minute.
The amount of data will initially start off at about 70 MB (plain text, uncompressed), it will grown continuous from there at a reasonable, but not tremendous rate.
I will accept an answer of "No, you can't gain reasonably guaranteed concurrency on an NFS by sacrificing performance" if it contains a detailed and reasonable explanation of why.
Yes, there is a better way. Don't use NFS to do this.
If you are willing to create a new file every time something changes, I expect that you have a small amount of data and/or very infrequent changes. If the data is small, why use SQLite at all? Why not just have files with node names and timestamps?
I think it would help if you described the real problem you are trying to solve a bit more. For example if you have many readers and one writer, there are other approaches.
What do you mean by "concurrency"? Do you actually mean "multiple readers/multiple writers", or can you get by with "multiple readers/one writer with limited latency"?

Asynchronous programs showing locality of reference?

I was reading this excellent article which gives an introduction to Asynchronous programming here http://krondo.com/blog/?p=1209 and I came across the following line which I find hard to understand.
Since there is no actual parallelism(in asnyc), it appears from our diagrams that an asynchronous program will take just as long to execute as a synchronous one, perhaps longer as the asynchronous program might exhibit poorer locality of reference.
Could someone explain how locality of reference comes into picture here?
Locality of reference, like that Wikipedia article mentions, is the observation that when some data is accessed (on disk, in memory, whatever), other data near that location is often accessed as well. This observation makes sense since developers tend to group similar data together. Since the data are related, they're often processed together. Specifically, this is known as spatial locality.
For a weak example, imagine computing the sum of an array or doing a matrix multiplication. The data representing the array or matrix are typically stored in continguous memory locations, and for this example, once you access one specific location in memory, you will be accessing others close to it as well.
Computer architecture takes locality of reference into account. Operating systems have the notion of "pages" which are (roughly) 4KB chunks of data that can be paged in and out individually (moved between physical memory and disk). When you touch some memory that's not resident (not physically in RAM), the OS will bring the entire page of data off disk and into memory. The reason for this is locality: you're likely to touch other data around what you just touched.
Additionally, CPUs have the concept of caches. For example, a CPU might have an L1 (level 1) cache, which is really just a big block of on-CPU data that the CPU can access faster than RAM. If a value is in the L1 cache, the CPU will use that instead of going out to RAM. Following the principle of the locality of reference, when a CPU access some value in main memory, it will bring that value and all values near it into the L1 cache. This set of values is known as a cache line. Cache lines vary in size, but the point is that when you access the first value of an array, the CPU might have to get it from RAM, but subsequent accesses (close in proximity) will be faster since the CPU brought the whole bundle of values into the L1 cache on the first access.
So, to answer your question: if you imagine a synchronous process computing the sum of a very large array, it will touch memory locations in order one after the other. In this case, your locality is good. In the asynchronous case, however, you might have n threads each taking a slice of the array (of size 1/n) and computing the sub-sum. Each thread is touching a potentially very different location in memory (since the array is large) and since each thread can be switched in and out of execution, the actual pattern of data access from the point of view of the OS or CPU is poor. The L1 cache on a CPU is finite, so if Thread 1 brings in a cache line (due to an access), this might evict the cache line of Thread 2. Then, when Thread 2 goes to access its array value, it has to go to RAM, which will bring in its cache line again and potentially evict the cache line of Thread 1, and so on. Depending on the system resources and usage as a whole, this pattern could happen on the OS/page level as well.
The poorer locality of reference results in poorer cache usage -- each time you do a thread switch, you can expect that most of what's in the cache relates to that previous thread, not the current one, so most reads will get data from main memory instead of the cache.
He's ultimately wrong though, at least for quite a few programs. The reason is pretty simple: even though you gain nothing on CPU-bound code, when you can combine some CPU-bound code with some I/O bound code, you can expect an overall speed improvement. You can, for example, initiate a read or write, then switch to doing computation while the disk is busy, then switch back to the I/O bound thread when the disk finishes its work.

Object not garbage collected, but contains no gcroots

Running into a prickly problem with our web app here. (Asp.net 2.0 Win server 2008)
Our memory usage for the website, grows and grows even though I would expect it to remain at a fairly static level. (We have a small amount of data that gets stored in state).
Wanting to find out what the problem is, I've run a System.GC.Collect(); a few times, taken a memory dump and then loaded this memory dump into WinDbg.
When I do a DumpHeap -Stat I get an inordinately large number on particular type hanging around in memory.
0000064280580b40 713471 79908752 PaymentOption
so, doing a DumpHeap -MT for this type, I get a stack of object references. Picking a random number of these, I do a !gcroot and the command comes back reporting that no references are held to it.
To me, this is exactly when the GC should collect these items, but for some reason they have been left outstanding.
Can anybody offer an explanation as to what might be happening?
You could try using sosex.dll in Windbg, which is an extension written to help with .NET debugging. There is a command named !refs which is similar to !gcroot, in that it will show you all the objects referencing an object, plus it will show all the objects that it too is referencing.
In the example on the author's website, !refs is used against an object and the output looks like this:
0:000> !refs 0000000080000db8
Objects referenced by 0000000080000db8 (System.Threading.Mutex):
0000000080000ef0 32 Microsoft.Win32.SafeHandles.SafeWaitHandle
Objects referencing 0000000080000db8 (System.Threading.Mutex):
0000000080000e08 72 System.Threading.Mutex+<>c__DisplayClass3
0000000080000e50 64 System.Runtime.CompilerServices.RuntimeHelpers+CleanupCode
Few things:
GC.Collect won't help you do any debugging. The garbage collector is already being called: if any objects were available for collection it would have happened already.
Idle memory on a server is wasted memory. Are you sure memory is being 'leaked', or is it just that the framework is deciding it can keep more things in memroy or keep more memory around for faster access? In this case I suspect you are leaking memory, but it's something to double check for.
It sounds like something you don't expect is keeping a reference to PaymentOption objects. Perhaps a static collection somewhere? Or separate thread?
Does PaymentObject implement a finalizer by any chance? Does it call a STA COM object?
I'd be curious to see the output of !finalizequeue to see if the count of objects that are showing up on the heap are roughly the amount of any that might waiting to be finalized. Output should probably look something like this:
generation 0 has 57 finalizable objects (0409b5cc->0409b6b0)
generation 1 has 55 finalizable objects (0409b4f0->0409b5cc)
generation 2 has 0 finalizable objects (0409b4f0->0409b4f0)
Ready for finalization 0 objects (0409b6b0->0409b6b0)
If the number of Ready for finalization objects continues to grow, and your certain garbage collections are occuring (confirm via perfmon counters), then it might be a blocked finalizer thread. You might need to take several snapshots over the lifetime of the process (before a recycle) to confirm. I usually rely on the magic number of three, as long as the site is under some sort of load.
A bug in a finalizer can block the finalizer thread and prevent the objects from ever being collected.
If the PaymentOption object calls a legacy STA COM object, then this article ASP.NET Hang and OutOfMemory exceptions caused by STA components might point in the right direction.
Not without more info on your application. But we ran into some nasty memory problems a long time ago. Do you use ASP.NET caching? As Raymond Chen likes to say, "poor caching strategy is indisitinguishable from a memory leak."
Check out another tool - CLRProfiler.exe - it will help you traverse object reference trees to see where your objects are rooted. This is also good: link text
You've heard this before - if you have to GC.Collect, something is wrong.
Is the PaymentOption object created in an asynchronous process, by any chance? I remember something about, if you don't call EndInvoke, you can get problems like this.
I've been investigating the same issue myself and was asking why objects that had no references were not being collected.
Objects larger than 85,000 bytes are stored on the Large Object Heap, from which memory is freed up less frequently.
http://msdn.microsoft.com/en-us/magazine/cc534993.aspx
A single PaymentOption may not be that big, but are they contained within collections, or are they based on something like a DataSet? You should pick on few instances of the PaymentOption / collection / DataSet and then use the sos !objsize command to see big they are.
Unfortunately this doesn't really answer the question. I like to think I can trust the .net framework to take care of releasing unused memory whenever it needs to. However I see a lot of memory being used by the worker process running the app I am looking at, even when memory looks quite tight on the server.
FYI, SOS in .NET 4 supports a few new commands that might be of assistance, namely !gcwhere (locate the generation of an objection; sosex's gcgen) and !findroots (does what it says on the tin; sosex's !refs)
Both are documented on the SOS documentation and mentioned on Tess Ferrandez's blog.

Resources