I have windows service which need to execute around 10000 schedules (need to send/execute data to all the members).
For one member it's taking 3 to 5 seconds for 10000 schedules it's taking around 10 minutes or so..
But I need to execute all these schedules in one minutes.
Thanks In advance
Assuming you need to do parallel processing, you better read this here doc to get to know the paradigm and avoid common pitfalls (its for .net 4.0 but I am suggesting you ready it no matter what because it goes over basic concepts).
If you can push down processing time down to < 2 seconds per task then I'd suggest you don't mess with parallel processing (it's likely to complicate your life in ways you cannot imagine).
Related
...aside from the benefit in separate performance monitoring and logging.
For logging, I am confident I can get granularity through manually adding the name of the "routine" to each call. This is how it is now with several discrete Functions for different parts of the system:
There are multiple automatic logs: start and finish of the routine, for example. It would be more challenging to find out how expensive certain routines are, but it would not be impossible.
The reason I want the entire logic of the application handled by a single handle function is because of reducing cold starts: one function means only one container that can be persistently kept alive when there are very few users of the app.
If a month is ~2.6m seconds and we assume the system uses 1 GB RAM and 1 GHz CPU frequency at all times, that's:
2600000 * 0.0000025 + 2600000 * 0.000001042 = USD$9.21 a month
...for one minimum instance.
I should also state that all of my functions have the bare minimum amount of global scope code; it just sets up Firebase assets (RTDB and Firestore).
From a billing, performance (based on user wait time), and user/developer experience perspective, is there any reason why it would be smart to keep all my functions discrete?
I'd also accept an answer saying "one single function for all logic is reasonable" as long as there's a reason for it.
Thanks!
If you have very small app with ~5 end points and very low traffic. Sure you could do something like this. But why not do it:
billing and performance
The important thing to realize is that with every request a new instance of your function is created. Which means there could be 10s of them running at the same time.
If you would like to have just 1 instance handling all the traffic you should explore GCP Cloud run, where you have 1 container handling multiple requests and scaling only when it's not sufficient.
Imagine you have several end-points and every one of them have different performance requirements.
1 can need only 128MB or RAM
1 can need 1GB RAM
(FYI: You can control the CPU MHz of the function via the RAM settings too - which can speed up execution in some cases)
If you had only 1 function with 1GB of ram. Every request would allocate such function and in some cases most of the memory could go to waste.
But if you split it into multiple, some requests will require much less resources and can save you $ when we talk about bigger amount of executions / month. (tens of thousands+).
Let's imagine function, 3 second execution, 10k executions/month:
128MB would cost you $0.0693
1024MB would cost you $0.495
As you can see, with small app the difference could be nothing. But if you scale it matters. (*The cost can vary based on datacenter)
As for the logging, I don't think it matters. Usually in bigger systems there could be messages traveling trough several functions so you have to deal with that anyway.
As for the cold start. You just need good UI to facilitate that. At first I was worry about it in our apps but later on, you just get used to it that some action can take ~2s to execute (cold start). And you should have the UI "loading" regardless, because you don't know if the function will take ~100ms or 3s due to bad connection.
Recently i got the task to optimize a quite huge PLSQL script which prior to my changes took about 1 hour +/- 10mins.
So I got to do some reallocation of some methods and generally just some replacement of big views with simpler subquery or with statements. I noticed that if I ran the scheduled job by right-clicking it and execute job I would in most cases see the run duration change (in a positive way). But if I enabled the job and let it run by its schedule it takes the original hour no matter what changes you do to it.
Now my question here is: Is there any way to monitor the RAM or CPU usage of the session/job or is there a difference in general how many resources are allocated to background processes? Because my suspicion here is the "manual" run job somehow gets some priorities the scheduler doesn't get or doesn't take.
Either way for troubleshooting purposes you can't take a few hours a work day just to wait for results.
We have a project that deals with millions of transactions everyday which has some tight SLAs. As part of parsing the flat file that comes as input to a bean , we used beanio which was working better with out load. But with load its taking around 250ms to parse a flat file to a bean.
Requirement: Simple string has to converted to a single bean(nested and converted)
Heard the univocity can do better here - and tried the same with below settings.
FixedWidthParserSettings settings = new FixedWidthParserSettings();
settings.getFormat().setLineSeparator("\n");
settings.setRecordEndsOnNewline(false);
settings.setHeaderExtractionEnabled(false);
settings.setIgnoreLeadingWhitespaces(false);
settings.setIgnoreTrailingWhitespaces(false);
settings.setMaxColumns(100);
settings.setMaxCharsPerColumn(100);
settings.setNumberOfRecordsToRead(1);
settings.setReadInputOnSeparateThread(false);
settings.setInputBufferSize(10*1024);
settings.setLineSeparatorDetectionEnabled(false);
settings.setColumnReorderingEnabled(false);
When running with jmeter, with 200 parallel threads - the average time taken is 10ms(to parse and convert around 10 fields where in actual use case we have to the same for around 500 fields)
but when we increased it to 300 or 350 parallel threads , the average time was around 300ms. But our total SLA is around 10ms.
Any help here is highly appreciated!
Probably you are running out of memory on your JVM. Try increasing it with the -Xms and -Xmx flags. Also too many threads won't help you if you don't have enough cores available.
I want to use MPI to make my program parallel, and I want to send something to other computers. I want to know which one is better: sending a huge buffer one time or sending two smaller messages 3 times atrent times during the execution instead of all at once?
It's almost always going to be faster to send the one big message than the smaller one. Each time you do a Send/Receive pair, the two processes have to go through the entire process of sending a message to each other, including at least 6 roundtrip messages. If you are just sending one larger message, there is a minimum of 2 roundtrip messages. Each of those messages can be very expensive (compared to doing things locally like packing all of your data into one buffer).
I'd encourage you to try it out both ways though to be sure that this applies to your application. It could be different if you're doing something unexpected.
Depending on your problem, sending all data may be more efficient, because the nodes have to be synced, every time. That may cause a delay.
I always try to send as much data as I can in a single MPI call. In my experience, sending many small bits of data greatly increases the overhead and network traffic, and I have even run into problems where I overwhelmed the computers' ability to keep up with the number of requests, because I was sending a large member of a complicated class, one integer at a time, to many workers. Therefore, when possible, send the entire data at once, unless you have some reason to believe it is too large.
Further, I strive to use 100% of all the CPU's my program claims. When you are working on shared resources, if you use a CPU, you need to actually use it. Otherwise, someone else who wants to use that core, or node, is blocked out while your program sits and does nothing. For example, on a Cray which I have used, even if you call for only two 'cores', the manager will reserve a full bank of 24 cores, essentially wasting 22. Or, perhaps one worker has nothing to do, while another chugs away -- again, wasting time. Hopefully, there is a way to balance the load, so to speak, to avoid unintentional waste of resources.
Back to the topic at hand. Demonstrate timing and efficiency of vector sending to yourself -- write a program which breaks up the vector into varying sizes of packets, and do the sends/receives. Test it with varying numbers of workers, and on several different configurations of computers, if you can. Before writing production code, do proof of concept, and what optimization you can. Test and time it!
We know that it is better to use parallel computing for longer tasks than shorter tasks, in order to save OS time when switching many small tasks. I am looking for your comments and advise on the following scenario. I got 6 task I can parallel, each of them has a small task that can also be paralleled. Let's say I have 64 cores I can use.
Would it be prudent to use parallel for the 6 larger tasks, and then to parallel again within each task?
You already answered your own question. If the individual tasks are short, the overhead of parallelisation will cause the total calculation time to go up in stead of down. This does not really change when having this kind of hierarchy of parallel jobs. I would think the overhead is even bigger as the information has to be passed down form the smallest jobs, via the intermediate jobs to the top-level. If your smallest tasks do not take a significant amount of time, say at least a few seconds, parallelisation is not going help.