I am looking to update an application in which I can update either synchronously or asynchronously. Given the real-time nature of the app, which currently executes methods synchronously at frequencies from 1 to 60 Hz, do you see any advantage to updating asynchronously in response to user input? Or should I wait until the next synchronous cycle to incorporate the change?
My thoughts so far:
The advantage I see in introducing an asynchronous update is that if a member used by a 1Hz method is updated, the 60Hz method may execute 50+ times with the old value. I know this is still a relatively short amount of time to a user (< 1 second), but the principle of continuing calculations with a stale value for 50+ iterations seems bad to me.
The advantage I see in keeping it synchronous is that the flow of code execution stays easy to read and reason about.
Are there any repercussions I am not thinking of?
It's a little hard to say without more of a sense of your application. In general, I'd say it's preferable to stay synchronous for a real-time application where possible, just because it makes it easier to reason about timeliness (often the hardest thing to reason about.) If you can reasonably make something periodic, make it periodic and thank your lucky stars.
Moving to a partially synchronous or asynchronous model does have some appeal. Like you say, it can feel inelegant to keep operating on stale data. But consider: this is a real-time application. Presumably you have a requirement that states what the update latency from data input to your 60Hz task must be. As in any general-purpose performance work, don't go to extra effort to do better than that requirement unless it's easy, it makes the implementation clearer, or it becomes necessary for correctness.
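One common middle path, sketched below with entirely invented names (nothing here is from your app): let whatever produces the new value publish it into an AtomicReference immediately, and let each periodic task read whatever is current at the top of its cycle. Execution stays strictly periodic, but the 60Hz task never runs more than one frame behind the input.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class PeriodicUpdateSketch {
    // Latest input value. The 1 Hz task publishes here, but a user-input
    // handler could call set() at any time without disturbing the fast loop.
    private static final AtomicReference<Double> latestInput = new AtomicReference<>(0.0);

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);

        // 1 Hz task: stand-in for whatever updates the shared member.
        scheduler.scheduleAtFixedRate(
                () -> latestInput.set(Math.random()),
                0, 1000, TimeUnit.MILLISECONDS);

        // 60 Hz task (~16 ms period): picks up whatever value is current each cycle.
        scheduler.scheduleAtFixedRate(
                () -> {
                    double input = latestInput.get();   // no locking, no mid-cycle surprises
                    // ... run the fast-path calculation with `input` ...
                },
                0, 1000 / 60, TimeUnit.MILLISECONDS);

        // Sketch only: in a real app you would manage shutdown of the scheduler.
    }
}
```

The fast loop stays synchronous and easy to reason about, while the staleness window shrinks from "until the next 1Hz cycle" to "until the next 60Hz cycle".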
So, all that said, there are no hard and fast rules. Make sure your rationale is both written down and reflected in your design.
My girlfriend was asked the below question in an interview:
We trigger 5 independent APIs simultaneously. Once they have all completed, we want to trigger a function. How will you design a system to do this?
My girlfriend replied she will use a flag variable, but the interviewer was evidently not happy with it.
So, is there a good way in which this could be handled (in a distributed context)? Note that each of the 5 API calls is made by a different server, and the function to be triggered is on a 6th server.
The other answers suggesting promises seem to assume all these requests necessarily come from the same client. If the context here is distributed systems, as you said it is, then I don't think those are valid answers. If they were, the interview question would have nothing to do with distributed systems, except to test your girlfriend's ability to recognize something that isn't really a distributed systems problem.
And the question does have the shape of some classic problems in distributed systems. It sounds a lot like YouTube view counting: How do you achieve qualities like atomicity and consistency in a multi-threaded, multi-process, or multi-client environment? Failing to recognize this, thinking the answer could be as simple as "a flag", betrayed a lack of experience in distributed systems.
Another thing about that answer is that it leaves many ambiguities. Where does the flag live? As a variable in another (Java?) API? In a database? In a file? Even in a non-distributed context, these are important questions. And if she had gone on to address them, even being innocent of all the distributed systems complications, she might have happily fallen into a discussion of the kinds of distributed systems problems that occur when you use, say, a file, and how an ACID-compliant database might solve those problems, and what the tradeoffs might be there... And she might have corrected herself and said "counter" instead of "flag"!
If I were asked this, my first thought would be to use promises/futures. The idea behind them is that you can execute time-consuming operations asynchronously and they will somehow notify you when they've completed, either successfully or unsuccessfully, typically by calling a callback function. So the first step is to spawn five asynchronous tasks and get five promises.
Then I would join the five promises together, creating a unified promise that represents the five separate tasks. In JavaScript I might call Promise.all(); in Java I would use CompletableFuture.allOf().
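A minimal Java sketch of that step, where callApi and triggerFunction are hypothetical stand-ins for the five API calls and the function to trigger:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class FanInSketch {
    // Hypothetical stand-in for one of the five independent API calls.
    static CompletableFuture<String> callApi(int id) {
        return CompletableFuture.supplyAsync(() -> "result-" + id);
    }

    // Hypothetical function to trigger once all five have completed.
    static void triggerFunction(List<String> results) {
        System.out.println("All done: " + results);
    }

    public static void main(String[] args) {
        List<CompletableFuture<String>> calls =
                List.of(callApi(1), callApi(2), callApi(3), callApi(4), callApi(5));

        // allOf completes when every call has completed; then collect and hand off the results.
        CompletableFuture.allOf(calls.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> calls.stream()
                        .map(CompletableFuture::join)   // safe: all are already complete here
                        .collect(Collectors.toList()))
                .thenAccept(FanInSketch::triggerFunction)
                .join();   // block only so this demo's main thread waits
    }
}
```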
I would want to make sure to handle both success and failure. The combined promise should succeed if all of the API calls succeed and fail if any of them fail. If any fail there should be appropriate error handling/reporting. What happens if multiple calls fail? How would a mix of successes and failures be reported? These would be design points to mention, though not necessarily solve during the interview.
Promises and futures typically have a modular, composable design that allows edge cases like timeouts to be handled by chaining handlers together. Done right, a timeout becomes just another error condition that is naturally handled by the error handling already in place.
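In Java 9+, for example, orTimeout converts a slow call into an ordinary exceptional completion, so the error path already in place covers it. A small sketch, with callApiWithTimeout as a stand-in and an arbitrary five-second limit:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class TimeoutSketch {
    // Stand-in API call; orTimeout makes it fail with TimeoutException if it takes too long.
    static CompletableFuture<String> callApiWithTimeout(int id) {
        return CompletableFuture.supplyAsync(() -> "result-" + id)
                .orTimeout(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        CompletableFuture
                .allOf(callApiWithTimeout(1), callApiWithTimeout(2), callApiWithTimeout(3),
                       callApiWithTimeout(4), callApiWithTimeout(5))
                .thenRun(() -> System.out.println("all five completed in time"))
                .exceptionally(ex -> {   // one place handles failures and timeouts alike
                    System.err.println("at least one call failed or timed out: " + ex);
                    return null;
                })
                .join();
    }
}
```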
This solution would not require any state to be shared across threads, so I would not have to worry about mutexes or deadlocks or other thread synchronization problems.
She said she would use a flag variable to keep track of how many API calls have returned.
One thing that makes great interviewees stand out is their ability to anticipate follow-up questions and explain details before they are asked. The best answers are fully fleshed out. They demonstrate that one has thought through one's answer in detail, and they have minimal handwaving.
When I read the above I have a slew of follow-up questions:
How will she know when each API call has returned? Is she waiting for a function call to return, a callback to be called, an event to be fired, or a promise to complete?
How is she causing all of the API calls to be executed concurrently? Is there multithreading, a fork-join pool, multiprocessing, or asynchronous execution?
Flag variables are booleans. Is she really using a flag, or does she mean a counter?
What is the variable tracking and what code is updating it?
What is monitoring the variable, what condition is it checking, and what's it doing when the condition is reached?
If using multithreading, how is she handling synchronization?
How will she handle edge cases such as API calls failing or timing out?
A flag variable might lead to a workable solution or it might lead nowhere. The only way an interviewer can tell is if she thinks through and proactively discusses these questions. Otherwise, the interviewer will have to pepper her with follow-ups, and will likely lower their evaluation of her.
When I interview people, my mental grades are something like:
S — Solution works and they addressed all issues without prompting.
A — Solution works, follow-up questions answered satisfactorily.
B — Solution works, explained well, but there's a better solution that more experienced devs would find.
C — What they said is okay, but their depth of knowledge is lacking.
F — Their answer is flat out incorrect, or getting them to explain their answer was like pulling teeth.
In my program, I would like to heavily parallelize many mathematical calculations, the results of which are then written to an output file.
I successfully implemented this using collective communication (gather, scatter, etc.), but I noticed that with these synchronizing routines the slowest processor dominates the execution time and drags down overall performance, as the fast processors spend a lot of time waiting.
So I decided to switch to a scheme where one (master) processor is dedicated to receiving chunks of results and handling the file output, and all the other processors compute these results and send them to the master using non-blocking send routines.
Unfortunately, I don't really know how to implement the master code. Do I need to run an infinite loop with MPI_Recv(), listening for incoming messages? How do I know when to stop the loop? Can I combine MPI_Isend() and MPI_Recv(), or do both sides need to be non-blocking? How is this typically done?
MPI 3.1 provides non-blocking collectives. I would strongly recommend using those instead of implementing this on your own.
However, it may not help you after all. Eventually you need the data from all processes, even the slow ones. So you are likely to wait at some point again. Non-blocking communication overlaps communication and computation, but it doesn't fix your load imbalances.
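On the side question: yes, a non-blocking MPI_Isend on a worker can be matched by a plain blocking MPI_Recv on the master; the two sides do not have to use the same mode. As for the master loop itself and how it knows when to stop, the usual pattern is to loop on receives from any source and count "done" messages (or expected result chunks) from the workers. The following is not MPI at all, just a plain-Java analogue where threads stand in for ranks and a shared queue stands in for MPI_Recv; every name is invented for illustration:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MasterWorkerSketch {
    // A result chunk; `done` marks a worker's final message (the MPI analogue is a special tag).
    record Message(int worker, double[] chunk, boolean done) {}

    public static void main(String[] args) throws InterruptedException {
        int workers = 4;
        BlockingQueue<Message> inbox = new LinkedBlockingQueue<>();   // stands in for receiving from any source

        // Workers: compute chunks, send them, then send a final "done" message.
        for (int w = 0; w < workers; w++) {
            int id = w;
            new Thread(() -> {
                for (int i = 0; i < 3; i++) {
                    inbox.add(new Message(id, new double[]{id * 10.0 + i}, false));
                }
                inbox.add(new Message(id, null, true));
            }).start();
        }

        // Master: loop until every worker has reported "done" -- that is the stop condition.
        int remaining = workers;
        while (remaining > 0) {
            Message m = inbox.take();
            if (m.done()) {
                remaining--;
            } else {
                // ... write m.chunk() to the output file ...
                System.out.println("worker " + m.worker() + " sent a chunk");
            }
        }
        System.out.println("all workers finished; master exits its loop");
    }
}
```

The same termination logic carries over to MPI: either tag each worker's final message as "done", or have the master know in advance how many result chunks to expect.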
Update (more or less a long clarification comment)
There are several layers to your question; I may have been misled by the title as to what kind of answer you were expecting. Maybe the question is rather:
How do I implement a centralized work queue in MPI?
This pops up regularly, most recently here. But a centralized work queue is actually often undesirable, because a central component quickly becomes a bottleneck in large-scale programs. So the actual problem you have is that your work decomposition and mapping are imbalanced, and the more fundamental "X question" is:
How do I load balance an MPI application?
At that point you must provide more information about your mathematical problem and its current implementation, preferably in the form of a [mcve]. Again, there is no standard solution; load balancing is a huge research area. It may even be a topic for CS.SE rather than SO.
Following up on this question, I am able to store a large number (>50k) of entities in the datastore. Now I want to access all of them in my application and perform mathematical operations on them, but it always times out. One option is to use the TaskQueue again, but that would be an asynchronous job. I need a way to access these 50k+ entities in my application and process them without timing out.
Part of the accepted answer to your original question may still apply, for example a manually scaled instance with 24h deadline. Or a VM instance. For a price, of course.
Some speedup may be achieved by using memcache.
Side note: depending on the size of your entities you may need to keep an eye on the instance memory usage as well.
Another possibility would be to switch to a faster instance class (and with more memory as well, but also with extra costs).
But all such improvements might still not be enough. The best approach would still be to give your entity data processing algorithm a deeper thought - to make it scalable.
I'm having a hard time imagining a computation so monolithic that it can't be broken into smaller pieces, none of which needs all the data at once. I'm almost certain there has to be some way to use partial computations, perhaps storing intermediate results, so that the problem can be split and handled in smaller pieces across multiple requests.
As an extreme (academic) example think about CPUs doing pretty much any super-complex computation fundamentally with just sequences of simple, short operations on a small set of registers - it's all about how to orchestrate them.
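To make the split-into-pieces idea concrete, here is a rough sketch, not GAE-specific, with every name in it made up: process the entities in fixed-size batches behind an offset or cursor and carry a small accumulator forward, persisting it between requests so no single request has to touch all 50k+ entities.

```java
import java.util.List;
import java.util.stream.LongStream;

public class ChunkedAggregationSketch {
    // Hypothetical entity and accumulator types.
    record Entity(double value) {}
    record Partial(double sum, long count) {}

    // Hypothetical data source: returns up to `limit` entities starting at `offset`,
    // standing in for a datastore query behind a cursor.
    static List<Entity> fetchBatch(long offset, int limit) {
        long total = 1000;                        // pretend there are 1000 entities
        if (offset >= total) return List.of();    // no more entities
        return LongStream.range(offset, Math.min(offset + limit, total))
                .mapToObj(i -> new Entity(i * 0.5))
                .toList();
    }

    public static void main(String[] args) {
        Partial partial = new Partial(0.0, 0);    // could be loaded from storage between requests
        long offset = 0;
        while (true) {
            List<Entity> batch = fetchBatch(offset, 100);
            if (batch.isEmpty()) break;
            for (Entity e : batch) {
                partial = new Partial(partial.sum() + e.value(), partial.count() + 1);
            }
            offset += batch.size();
            // In a real app, persist `partial` and the cursor here and continue in the next request.
        }
        System.out.println("mean = " + partial.sum() / partial.count());
    }
}
```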
Here's a nice article describing a drastic reduction of the overall duration of a computation (no clue if it's anything like yours) by using a nice approach (also interesting because it's using the GAE Pipeline API).
If you post your code you might get some more specific advice.
I have an ASP.NET web application (.NET 2008) using MS SQL Server 2005. I want to increase the performance of the web site. Does anyone know of an article containing steps to do that, step by step, both in SQL (indexes, etc.) and in the code?
Performance tuning is a very specific process. I don't know of any articles that discuss directly how to achieve this, but I can give you a brief overview of the steps I follow when I need to improve performance of an application/website.
Profile.
Start by gathering performance data. At the end of the tuning process you will need some numbers to compare to actually prove you have made a difference. This means you need to choose some specific processes that you monitor and record their performance and throughput.
For example, on your site you might record how long a login takes. Keep this very narrow: pick a specific action that you want to record and time it. (Use a tool to do the timing, or put some Stopwatch code in your app to report times.) Also, don't just run it once; run it multiple times. Try to ensure you know the whole environment setup so you can duplicate it again at the end.
Try to make this as close to your production environment as possible. Make sure your code is compiled in release mode, and running on real separate servers, not just all on one box etc.
Instrument.
Now that you know what action you want to improve and have a target time to beat, you can instrument your code. This means injecting (manually or automatically) extra code that times each method call, or each line, and records times and/or memory usage right down the call stack.
There are lots of tools out there that can help you with this and automate some of it (Microsoft's CLR Profiler (free), Redgate's ANTS profiler (commercial), the higher editions of Visual Studio have profiling built in, and loads more). But you don't have to use automatic tools; it's perfectly acceptable to just use the Stopwatch class to time each block of your code. What you are looking for is a bottleneck. The likelihood is that you will find a high proportion of the overall time is spent in a very small bit of code.
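The Stopwatch mentioned above is the .NET class; as a rough sketch of the same idea in Java (runAction is a hypothetical stand-in for the operation being measured), just time the block several times and keep the numbers:

```java
import java.util.Arrays;

public class TimingSketch {
    // Hypothetical stand-in for the action under test (e.g. a login round trip).
    static void runAction() {
        try { Thread.sleep(25); } catch (InterruptedException ignored) {}
    }

    public static void main(String[] args) {
        int warmups = 5, runs = 20;
        for (int i = 0; i < warmups; i++) runAction();   // discard cold-start effects

        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            runAction();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(millis);
        System.out.printf("min=%dms median=%dms max=%dms%n",
                millis[0], millis[runs / 2], millis[runs - 1]);
    }
}
```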
Tune.
Now you have some timing data, you can start tuning.
There are two approaches to consider here. First, take an overall perspective: consider whether you need to redesign the whole call stack. Are you repeating something unnecessarily? Or are you doing something you don't need to do at all?
Second, now that you have an idea of where your bottleneck is, you can try to figure out ways to improve that bit of code. I can't offer much specific advice here, because it depends on what your bottleneck is, but look to optimise it. Perhaps you need to cache data so you don't have to loop over it twice, or batch up SQL calls so you make just one, or tighten your query filters so you return less data.
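For the "batch up SQL calls" point, here is a hedged JDBC sketch, assuming a SQL Server JDBC driver is on the classpath and using a made-up table and connection string: many small INSERTs become a single round trip.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertSketch {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection string and table; substitute your own.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:sqlserver://localhost;databaseName=app", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO audit_log (user_id, action) VALUES (?, ?)")) {
            conn.setAutoCommit(false);
            for (int i = 0; i < 500; i++) {
                ps.setInt(1, i);
                ps.setString(2, "login");
                ps.addBatch();            // queue instead of executing immediately
            }
            ps.executeBatch();            // one round trip instead of 500
            conn.commit();
        }
    }
}
```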
Re-profile.
This is the most important step that people often miss out. Once you have tuned your code, you absolutely must re-profile it in the same environment that you ran your initial profiling in. It is very common to make minor tweaks that you think might improve performance and actually end up degrading it because of some unknown way that the CLR handles something. This is much more common in managed languages because you often don't know exactly what is going on under the covers.
Now just repeat as necessary.
If you are likely to be performance tuning often, I find it useful to have a batch of automated performance tests that check the performance and throughput of various activities. That way I can run them with every release and record performance changes from release to release. It also means that after a performance-tuning session I can check that I haven't made the performance of some other area any worse.
When you are profiling, don't just think about the time to run a single action. Also consider profiling under load, with lots of users logged in. Sometimes apps perform great when there's just one user connected, but when they hit a certain number of users the whole thing suddenly grinds to a halt, perhaps because they are spending more time context switching or swapping memory in and out to disk. If it's throughput you want to improve, you need to figure out what is limiting that throughput.
Finally. Check out this huge MSDN article on Improving .NET Application Performance and Scalability. Specifically, you might want to look at chapter 6 and chapter 17.
I think the best we can do from here is give you some pointers:
query less data from the sql server (caching, appropriate query filters)
write better queries (indexing, joins, paging, etc)
minimise any inappropriate blockages such as locks between different requests
make sure session-state hasn't exploded in size
use bigger metal / more metal
use appropriate looping code etc
But to stress: from here, anything is guesswork. You need to profile to find the general area of the suckage, and then profile more to isolate the specific area(s); but start by looking at:
sql trace between web-server and sql-server
network trace between web-server and client (both directions)
cache / state servers if appropriate
CPU / memory utilisation on the web-server
First of all, you have to find your bottlenecks and then try to improve those.
This helps you focus your effort exactly where you have a serious problem.
In addition, you need to improve your connection to the DB, for example by using lazy initialization and a singleton pattern, and by creating batch requests instead of single requests.
That helps you reduce the number of DB connections.
Check your caching and use suitable loop structures.
Another thing is to use appropriate types; for example, if you need an int, don't create a long, and so on.
Finally, you can use a profiler (especially for SQL) and check whether your queries are implemented as well as possible.
I have been doing some reading on this subject, but I'm curious to see what the best ways are to optimize your use of the ASP.NET cache, and what the tips are for determining what should and should not go in the cache. Also, are there any rules of thumb for determining how long something should stay in the cache?
Some rules of thumb
Think in terms of the cache-miss-to-request ratio each time you contemplate using the cache. If cache requests for the item will miss most of the time, then the benefits may not outweigh the cost of maintaining that cache item.
Weigh the query expense against the cache retrieval expense (e.g. for simple reads, SQL Server is often faster than a distributed cache due to serialization costs).
Some tricks
Gzip strings before sticking them in the cache. This effectively expands the cache and reduces network traffic in a distributed-cache situation (see the sketches after this list).
If you're worried about how long to cache aggregates (e.g. counts), consider having non-expiring (or long-lived) cached aggregates and proactively updating them when the underlying data changes. This is a controversial technique and you should consider your request/invalidation ratio before proceeding, but in some cases the benefits can be worth it (e.g. SO rep for each user might be a good candidate depending on implementation details; the number of unanswered SO questions would probably be a poor candidate).
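To make the two tricks concrete, here are two small Java sketches; in both, the cache is just an in-process map standing in for whatever cache you actually use, and all the keys and names are invented. First, compressing a string before caching it:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCacheSketch {
    static final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();

    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    static String gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String bigHtmlFragment = "<ul>" + "<li>item</li>".repeat(5000) + "</ul>";
        cache.put("fragment:home", gzip(bigHtmlFragment));   // smaller payload in the cache
        System.out.println("compressed bytes: " + cache.get("fragment:home").length);
        System.out.println("round-trips OK: " + gunzip(cache.get("fragment:home")).equals(bigHtmlFragment));
    }
}
```

And the proactively maintained aggregate, kept current on the write path instead of being expired and recomputed on read:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class AggregateCacheSketch {
    // Non-expiring cached aggregates, e.g. answer counts per question, keyed by id.
    static final ConcurrentHashMap<Long, LongAdder> answerCounts = new ConcurrentHashMap<>();

    // Called from the write path: keep the aggregate current instead of invalidating it.
    static void onAnswerPosted(long questionId) {
        answerCounts.computeIfAbsent(questionId, id -> new LongAdder()).increment();
    }

    // Called from the read path: no query needed.
    static long answerCount(long questionId) {
        LongAdder adder = answerCounts.get(questionId);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        onAnswerPosted(42);
        onAnswerPosted(42);
        System.out.println(answerCount(42));   // 2
    }
}
```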
Don't implement caching yet.
Put it off until you've exhausted indexing, query tuning, page simplification, and the other more pedestrian means of boosting performance. If you flip caching on before it's truly the last resort, you're going to have a much harder time figuring out where the performance bottlenecks really live.
And, of course, if you have the backend tuned right when you finally do turn on caching, it will work a lot better for a lot longer than it would if you did it today.
The best quote I've heard about performance tuning and caching is that it's an art, not a science (sorry, I can't remember who said it). The point is that there are so many factors that can affect the performance of your app that you need to evaluate each situation case by case and make considered tweaks until you reach the desired outcome.
I realise I'm not giving any specifics here, but I don't really think you can.
I will give one example from a previous project, though. I worked on an app that made a lot of calls to web services to build up a client profile, e.g.:
GET client
GET client quotes
GET client quote
Each object returned by the web services contributed to a higher-level object that was then used to build the resulting page. At first we gathered all the objects into the master object and cached that. However, when things were not as quick as we would have liked, we realised it would make more sense to cache each called object individually, so it could be re-used on the next page the client sees (a sketch follows the example below), e.g.:
[Cache] client
[Cache] client quotes
[Cache] client quote
GET client quote upgrades
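A rough sketch of that shift, with made-up types and a plain map standing in for the real cache: each web-service object is cached under its own key, so a later page re-uses it instead of re-fetching the whole master object.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class PerObjectCacheSketch {
    static final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();

    // Cache-aside helper: load on a miss, reuse on later pages.
    @SuppressWarnings("unchecked")
    static <T> T getOrLoad(String key, Function<String, T> loader) {
        return (T) cache.computeIfAbsent(key, loader::apply);
    }

    // Hypothetical web-service result types.
    record Client(String id) {}
    record Quote(String id) {}

    public static void main(String[] args) {
        // First page: both calls go to the web service, results are cached individually.
        Client client = getOrLoad("client:123", k -> new Client("123"));
        Quote quote = getOrLoad("quote:123/7", k -> new Quote("7"));

        // Next page: the client and quote come straight from the cache; only new data is fetched.
        Client again = getOrLoad("client:123", k -> new Client("123"));
        System.out.println(client == again);   // true: reused, not re-fetched
    }
}
```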
Unfortunately there are no pre-established rules... but to give you some common-sense guidance, I would say that you can easily cache:
Application Parameters (list of countries, phone codes, etc...)
Any other application non-volatile data (list of roles even if configurable)
Business data that is often read and does not change much (or not a big deal if it is not 100% accurate)
What you should not cache:
Volatile data that change frequently (usually the business data)
As for the cache duration, I tend to use different durations depending on the type of data and its size. Application Parameters can be cached for several hours or even days.
For some business data, you may want to have smaller cache duration (minutes to 1h)
One last thing is always to challenge the amount of data you manipulate. Remember that the end-user won't read thousands of records at the same time.
Hope this will give you some guidance.
It's very hard to generalize this sort of thing. The only hard-and-fast rule to follow is not to waste time optimizing something unless you know it needs to be done. Then the proper course of action is going to be very much dependent on the nitty gritty details of your application.
That said... I'll almost always cache global applications parameters in some easy to use object. This is certainly more of a programming convenience rather than optimization.
The one time I've written specific data caching code was for an app that interfaced with a very slow accounting database, and then it was read-only for data that didn't change very often. All writes went to the DB. With SQL Server, I've never run into a situation where the built-in ASP.NET-to-SQL Server interface was the slow part of the equation.