Is solution without starvation also solution without deadlocks? And vice versa? - deadlock

I was just wondering...
Is solution without starvation also solution without deadlocks? And vice versa,
is solution without deadlocks also solution without starvation?

A solution without starvation would mean the system is "fair"; each thread has just enough access to a shared, limited resources to make progress. In this case I would assume there are no deadlocks. Deadlock is the end of the road for starving threads... they all starve and no one makes progress.
A solution with no deadlocks could still have cases where only a subset of the threads have "fair" access to a shared, limited resource. The remainder of threads would starve.

Related

Trigger a function after 5 API calls return (in a distributed context)

My girlfriend was asked the below question in an interview:
We trigger 5 independent APIs simultaneously. Once they have all completed, we want to trigger a function. How will you design a system to do this?
My girlfriend replied she will use a flag variable, but the interviewer was evidently not happy with it.
So, is there a good way in which this could be handled (in a distributed context)? Note that each of the 5 API calls are made by different servers and the function to be triggered is on a 6th server.
The other answers suggesting Promises seem to assume all these requests necessarily come from the same client. If the context here is distributed systems, as you said it is, then I don't think those are valid answers. If they were, then the interview question would have nothing to do with distributed systems, except to essay your girlfriend's ability to recognize something that isn't really a distributed systems problem.
And the question does have the shape of some classic problems in distributed systems. It sounds a lot like YouTube view counting: How do you achieve qualities like atomicity and consistency in a multi-threaded, multi-process, or multi-client environment? Failing to recognize this, thinking the answer could be as simple as "a flag", betrayed a lack of experience in distributed systems.
Another thing about that answer is that it leaves many ambiguities. Where does the flag live? As a variable in another (Java?) API? In a database? In a file? Even in a non-distributed context, these are important questions. And if she had gone on to address these questions, even being innocent of all the distributed systems complications, she might have happily fallen into a discussion of the kinds of D.S. problems that occur when you use, say, a file; and how using a ACID-compliant database might solve those problems, and what the tradeoffs might be there... And she might have corrected herself and said "counter" instead of "flag"!
If I were asked this, my first thought would be to use promises/futures. The idea behind them is that you can execute time-consuming operations asynchronously and they will somehow notify you when they've completed, either successfully or unsuccessfully, typically by calling a callback function. So the first step is to spawn five asynchronous tasks and get five promises.
Then I would join the five promises together, creating a unified promise that represents the five separate tasks. In JavaScript I might call Promise.all(); in Java I would use CompletableFuture.allOf().
I would want to make sure to handle both success and failure. The combined promise should succeed if all of the API calls succeed and fail if any of them fail. If any fail there should be appropriate error handling/reporting. What happens if multiple calls fail? How would a mix of successes and failures be reported? These would be design points to mention, though not necessarily solve during the interview.
Promises and futures typically have modular layering system that would allow edge cases like timeouts to be handled by chaining handlers together. If done right, timeouts could become just another error condition that would be naturally handled by the error handling already in place.
This solution would not require any state to be shared across threads, so I would not have to worry about mutexes or deadlocks or other thread synchronization problems.
She said she would use a flag variable to keep track of the number of API calls have returned.
One thing that makes great interviewees stand out is their ability to anticipate follow-up questions and explain details before they are asked. The best answers are fully fleshed out. They demonstrate that one has thought through one's answer in detail, and they have minimal handwaving.
When I read the above I have a slew of follow-up questions:
How will she know when each API call has returned? Is she waiting for a function call to return, a callback to be called, an event to be fired, or a promise to complete?
How is she causing all of the API calls to be executed concurrently? Is there multithreading, a fork-join pool, multiprocessing, or asynchronous execution?
Flag variables are booleans. Is she really using a flag, or does she mean a counter?
What is the variable tracking and what code is updating it?
What is monitoring the variable, what condition is it checking, and what's it doing when the condition is reached?
If using multithreading, how is she handling synchronization?
How will she handle edge cases such API calls failing, or timing out?
A flag variable might lead to a workable solution or it might lead nowhere. The only way an interviewer will know which it is is if she thinks about and proactively discusses these various questions. Otherwise, the interviewer will have to pepper her with follow-up questions, and will likely lower their evaluation of her.
When I interview people, my mental grades are something like:
S — Solution works and they addressed all issues without prompting.
A — Solution works, follow-up questions answered satisfactorily.
B — Solution works, explained well, but there's a better solution that more experienced devs would find.
C — What they said is okay, but their depth of knowledge is lacking.
F — Their answer is flat out incorrect, or getting them to explain their answer was like pulling teeth.

I need a short C programm that runs slower on a processor with HyperThreading than on one without it

I want to write a paper with Compiler Optimizations for HyperTreading. First step would be to investigate why a processor with HyperThreading( Simultaneous Multithreading) could lead to poorer performances than a processor without this technology. First step is to find an application that is better without HyperThreading, so i can run some hardware performance counters on it. Any suggest on how or where i could find one?
So, to summarize. I know that HyperThreading benefits are between -10% and +30%. I need a C application that falls in the 10% performance penalty.
Thanks.
Probably the main drawback of hyperthreading is the effective halving of cache sizes. Each thread will be populating the cache, and so each, in effect, has half the cache.
To create a programme which runs worse with hyperthreading than without, create a single threaded programme which performs a task which just fits inside L1 cache. Then add a second thread, which shares the workload, the works from "the other end" of the data, as it were. You will find performance falls through the floor - this is because both threads now must access L2 cache.
Hyperthreading can dramatically improve or worsen performance. It is completely dependent on use. None of this -10%/+30% stuff - that's ridiculous.
I'm not familiar with compiler optimizations for HT, nor the different between i7 HT and P4's as David pointed out. However, you can expect some general behaviors.
Context switching is very expensive. So if you have one core and run two threads on it simultaneously, switching back and forth one thread from the other always gives you performance penalty. However, threads do not use the core all the time. For example, if the thread reads or writes memory, it just waits for the memory access to be done, without using the core, usually for more than 100 cycles. There are many other cases that a thread need to stall like this, e.g., I/O operations, data dependencies, etc. Here HT helps, because it can ships out the waiting (or blocked) thread, and executes another thread instead.
Therefore, you can think if all threads are really unlikely to be blocked, then context switching will only cause overhead. Think about very computation-bounded application working on a small set of data.

Under what circumstances is a deadlock a good thing?

What is an example of when a deadlock is beneficial?
If the program you're deadlocking is a virus?
If you want to freeze up a process, I suppose that would be the only time you should do it... lol.
It's beneficial in that it clearly demonstrates you that your code is buggy and your synchronization methods need to be revised.
here is an example of exploiting a db deadlock in mysql.
It's more of a hack than a generalizable benefit of deadlocks, but it's the only thing I've ever come across that involves creating a deadlock for a beneficial effect other than for training purposes and for testing automated detection methods (which some may argue are both beneficial but where the benefit comes from helping avoid future deadlocks, so they are beneficial in the same sense as it's beneficial to study a deadly disease in a lab).
A deadlock is never beneficial. It is a huge problem in a program, because it causes the program to freeze under given circumstances!
A deadlock is never beneficial. It occurs when one or more processes are blocked forever because of requirements that cannot be satisfied. This will usually cause the program to appear to freeze as the processes will not continue unless the deadlock is broken. Programs must be crafted specifically to avoid deadlocks in all cases.

MSMQ - Message Queue Abstraction and Pattern

Let me define the problem first and why a messagequeue has been chosen. I have a datalayer that will be transactional and EXTREMELY insert heavy and rather then attempt to deal with these issues when they occur I am hoping to implement my application from the ground up with this in mind.
I have decided to tackle this problem by using the Microsoft Message Queue and perform inserts as time permits asynchronously. However I quickly ran into a problem. Certain inserts that I perform may need to be recalled (ie: retrieved) immediately (imagine this is for POS system and what happens if you need to recall the last transaction - one that still hasn’t been inserted).
The way I decided to tackle this problem is by abstracting the MessageQueue and combining it in my data access layer thereby creating the illusion of a single set of data being returned to the user of the datalayer (I have considered the other issues that occur in such a scenario (ie: essentially dirty reads and such) and have concluded for my purposes I can control these issues).
However this is where things get a little nasty... I’ve worked out how to get the messages back and such (trivial enough problem) but where I am stuck is; how do I create a generic (or at least somewhat generic) way of querying my message queue? One where I can minimize the duplication between the SQL queries and MessageQueue queries. I have considered using LINQ (but have very limited understanding of the technology) and have also attempted an implementation with Predicates which so far is pretty smelly.
Are there any patterns for such a problem that I can utilize? Am I going about this the wrong way? Does anyone have any of their own ideas about how I can tackle this problem? Does anyone even understand what I am talking about? :-)
Any and ALL input would be highly appreciated and seriously considered…
Thanks again.
For anyone interested. I decided in
the end to simply cache the
transaction in another location and
use the MSMQ as intended and described
below.
If the queue has a large-ish number of messages on it, then enumerating those messages will become a serious bottleneck. MSMQ was designed for first-in-first-out kind of access and anything that doesn't follow that pattern can cause you lots of grief in terms of performance.
The answer depends greatly on the sort of queries you're going to be executing, but the answer may be some kind of no-sql database (CouchDB or BerkeleyDB, etc)

How do you pre allocate memory to a process in solaris?

My problem is:
I have a perl script which uses lot of memory (expected behaviour because of caching). But, I noticed that the more I do caching, slower it gets and the process spends most of the time in sleep mode.
I thought pre-allocating memory to the process might speed up the performance.
Does someone have any ideas here?
Update:
I think I am not being very clear here. I will put question in clearer way:
I am not looking for the ways of pre-allocating inside the perl script. I dont think that would help me much here. What I am interested in is a way to tell OS to allocate X amount of memory for my perl script so that it does not have to compete with other processes coming in later.
Assume that I cant get away with the memory usage. Although, I am exploring ways of reducing that too but dont expect much improvement there.
FYI, I am working on a solaris 10 machine.
What I gathered from your posting and comments is this:
Your program gets slow when memory use rises
Your pogram increasingly spends time sleeping, not computing.
Most likely eplanation: Sleeping means waiting for a resource to become available. In this case the resource most likely is memory. Use the vmstat 1 command to verify. Have a look at the sr column. If it goes beyond ~150 consistently the system is desperate to free pages to satisfy demand. This is accompanied by high activity in the pi, po and fr columns.
If this is in fact the case, your best choices are:
Upgrade system memory to meet demand
Reduce memory usage to a level appropiate for the system at hand.
Preallocating memory will not help. In either case memory demand will exceed the available main memory at some point. The kernel will then have to decide which pages need to be in memory now and which pages may be cleared and reused for the more urgently needed pages. If all regularily needed pages (the working set) exceeds the size of main memory, the system is constantly moving pages from and to secondary storage (swap). The system is then said to be thrashing and spends not much time doing useful work. There is nothing you can do about this execept adding memory or using less of it.
From a comment:
The memory limitations are not very severe but the memory footprint easily grows to GBs and when we have competing processes for memory, it gets very slow. I want to reserve some memory from OS so that thrashing is minimal even when too many other processes come. Jagmal
Let's take a different tack then. The problem isn't really with your Perl script in particular. Instead, all the processes on the machine are consuming too much memory for the machine to handle as configured.
You can "reserve" memory, but that won't prevent thrashing. In fact, it could make the problem worse because the OS won't know if you are using the memory or just saving it for later.
I suspect you are suffering the tragedy of the commons. Am I right that many other users are on the machine in question? If so, this is more of a social problem than a technical problem. What you need is someone (probably the System Administrator) to step in and coordinate all the processes on the machine. They should find the most extravagant memory hogs and work with their programmers to reduce the cost on system resources. Further, they ought to arrange for processes to be scheduled so that resource allocation is efficient. Finally, they may need to get more or improved hardware to handle the expected system load.
Some questions you might ask yourself:
are my data structures really useful for the task at hand?
do I really have to cache that much?
can I throw away cached data after some time?
my #array;
$#array = 1_000_000; # pre-extend array to one million elements,
# http://perldoc.perl.org/perldata.html#Scalar-values
my %hash;
keys(%hash) = 8192; # pre-allocate hash buckets
# (same documentation section)
Not being familiar with your code, I'll venture some wild speculation here [grin] that these techniques aren't going to offer new great efficiencies to your script, but that the pre-allocation could help a little bit.
Good luck!
-- Douglas Hunter
I recently rediscovered an excellent Randal L. Schwartz article that includes preallocating an array. Assuming this is your problem, you can test preallocating with a variation on that code. But be sure to test the result.
The reason the script gets slower with more caching might be thrashing. Presumably the reason for caching in the first place is to increase performance. So a quick answer is: reduce caching.
Now there may be ways to modify your caching scheme so that it uses less main memory and avoids thrashing. For instance, you might find that caching to a file or database instead of to memory can boost performance. I've found that file system and database caching can be more efficient than application caching and can be shared among multiple instances.
Another idea might be to alter your algorithm to reduce memory usage in other areas. For instance, instead of pulling an entire file into memory, Perl programs tend to work better reading line by line.
Finally, have you explored the Memoize module? It might not be immediately applicable, but it could be a source of ideas.
I could not find a way to do this yet.
But, I found out that (See this for details)
Memory allocated to lexicals (i.e.
my() variables) cannot be reclaimed or
reused even if they go out of scope.
It is reserved in case the variables
come back into scope. Memory allocated
to global variables can be reused
(within your program) by using
undef()ing and/or delete().
So, I believe a possibility here could be to check if i can reduce the total memory print of lexical variables at a given point in time.
It sounds like you are looking for limit or ulimit. But I suspect that will cause a script that goes over the limit to fail, which probably isn't what you want.
A better idea might be to share cached data between processes. Putting data in a database or in a file works well in my experience.
I hate to say it, but if your memory limitations are this severe, Perl is probably not the right language for this application. C would be a better choice, I'd think.
One thing you could do is to use solaris zones (containers) .
You could put your process in a zone and allocate it resources like RAM and CPU's.
Here are two links to some tutorials :
Solaris Containers How To Guide
Zone Resource Control in the Solaris 10 08/07 OS
While it's not pre-allocating as you asked for, you may also want to look at the large page size options, so that when perl has to ask the OS for more memory for your program, it gets it in
larger chunks.
See Solaris Internals: Multiple Page Size Support for more information on the difference this makes and how to do it.
Look at http://metacpan.org/pod/Devel::Size
You could also inline a c function to do the above.
As far as I know, you cannot allocate memory directly from Perl. You can get around this by writing an XS module, or using an inline C function like I mentioned.

Resources