Is there a way to measure or monitor a scheduler job's resources used? - plsql

Recently I got the task of optimizing a quite large PL/SQL script which, prior to my changes, took about 1 hour +/- 10 mins.
So I reorganized some procedures and generally replaced big views with simpler subqueries or WITH clauses. I noticed that if I ran the scheduled job manually (right-click it and "run job"), I would in most cases see the run duration improve. But if I enabled the job and let it run on its schedule, it took the original hour no matter what changes I made.
Now my question is: is there any way to monitor the RAM or CPU usage of the session/job, and is there a difference in general in how many resources are allocated to background processes? My suspicion is that the "manual" job run somehow gets priorities that the scheduler doesn't get or doesn't take.
Either way, for troubleshooting purposes you can't spend a few hours of a work day just waiting for results.
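(Not from the original thread, but for reference: one way to compare the scheduled runs against the manual runs is Oracle's own per-run accounting in DBA_SCHEDULER_JOB_RUN_DETAILS, which records RUN_DURATION and CPU_USED per execution. A minimal sketch, assuming the cx_Oracle Python driver, access to the DBA_ views, and a placeholder job name MY_BIG_JOB and connection string:)

# Sketch: compare scheduled vs. manual runs of a DBMS_SCHEDULER job using
# Oracle's per-run history (RUN_DURATION and CPU_USED per execution).
# Assumes the cx_Oracle driver and privileges on the DBA_* views;
# "MY_BIG_JOB" and the connection details are placeholders.
import cx_Oracle

conn = cx_Oracle.connect("system", "password", "dbhost/ORCLPDB1")
cur = conn.cursor()

cur.execute("""
    SELECT log_date, status, run_duration, cpu_used, session_id
      FROM dba_scheduler_job_run_details
     WHERE job_name = :job_name
     ORDER BY log_date DESC
""", job_name="MY_BIG_JOB")

for log_date, status, run_duration, cpu_used, session_id in cur:
    # RUN_DURATION and CPU_USED are INTERVAL DAY TO SECOND values; if CPU_USED
    # is similar for scheduled and manual runs but RUN_DURATION differs, the
    # scheduled run is likely waiting on something rather than running slower.
    print(log_date, status, run_duration, cpu_used, session_id)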


Why wouldn't a small Firebase Functions app just use a single Function to handle logic?

...aside from the benefit in separate performance monitoring and logging.
For logging, I am confident I can get granularity through manually adding the name of the "routine" to each call. This is how it is now with several discrete Functions for different parts of the system:
There are multiple automatic logs: start and finish of the routine, for example. It would be more challenging to find out how expensive certain routines are, but it would not be impossible.
The reason I want the entire logic of the application handled by a single handler function is to reduce cold starts: one function means only one container that can be persistently kept alive when there are very few users of the app.
If a month is ~2.6m seconds and we assume the system uses 1 GB RAM and 1 GHz CPU frequency at all times, that's:
2600000 * 0.0000025 + 2600000 * 0.000001042 = USD$9.21 a month
...for one minimum instance.
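(Just to make that arithmetic explicit, here is the same estimate as a tiny script; the per-second rates are the ones used in the question, and the free tier is ignored:)

# Rough monthly cost of keeping one warm 1 GB / 1 GHz instance alive,
# reproducing the estimate above. Rates are the ones assumed in the question;
# actual Cloud Functions pricing varies by region and free tier is ignored.
SECONDS_PER_MONTH = 2_600_000          # ~30 days
GB_SECOND_RATE = 0.0000025             # USD per GB-second of memory (assumed)
GHZ_SECOND_RATE = 0.000001042          # USD per GHz-second of CPU (assumed)

memory_cost = SECONDS_PER_MONTH * 1.0 * GB_SECOND_RATE   # 1 GB held all month
cpu_cost = SECONDS_PER_MONTH * 1.0 * GHZ_SECOND_RATE     # 1 GHz held all month

print(f"~${memory_cost + cpu_cost:.2f} per month")        # ~$9.21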
I should also state that all of my functions have the bare minimum amount of global scope code; it just sets up Firebase assets (RTDB and Firestore).
From a billing, performance (based on user wait time), and user/developer experience perspective, is there any reason why it would be smart to keep all my functions discrete?
I'd also accept an answer saying "one single function for all logic is reasonable" as long as there's a reason for it.
Thanks!
If you have a very small app with ~5 endpoints and very low traffic, sure, you could do something like this. But here is why you might not want to:
billing and performance
The important thing to realize is that each function instance handles only one request at a time, so a new instance is spun up for every concurrent request, which means there could be tens of them running at the same time.
If you would like to have just one instance handling all the traffic, you should explore GCP Cloud Run, where a single container handles multiple requests and scales out only when that is no longer sufficient.
Imagine you have several endpoints and each of them has different performance requirements:
One might need only 128 MB of RAM
One might need 1 GB of RAM
(FYI: you can control the CPU MHz of the function via the RAM setting too, which can speed up execution in some cases.)
If you had only one function with 1 GB of RAM, every request would allocate that full amount, and in some cases most of the memory would go to waste.
But if you split it into multiple functions, some requests will require far fewer resources, which can save you money once you are talking about a bigger number of executions per month (tens of thousands and up).
Let's imagine a function with a 3-second execution time and 10k executions/month:
128MB would cost you $0.0693
1024MB would cost you $0.495
As you can see, with a small app the difference can be negligible, but at scale it matters. (*The cost can vary based on the data center.)
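(For what it's worth, those two figures seem to line up with per-100 ms compute tiers of roughly $0.000000231 at 128 MB and $0.000001650 at 1 GB; a quick sketch of that arithmetic, treating the tier prices as assumptions and ignoring invocation fees and the free tier:)

# Sketch of the cost comparison above: 10k invocations/month, 3 s each,
# billed in 100 ms increments. The per-100 ms tier prices are assumptions;
# invocation fees and the free tier are ignored, and prices vary by data center.
PRICE_PER_100MS = {
    "128MB": 0.000000231,   # USD per 100 ms of compute at the 128 MB tier
    "1024MB": 0.000001650,  # USD per 100 ms of compute at the 1 GB tier
}

invocations = 10_000
billed_units = 3.0 / 0.1    # 3 s execution -> 30 increments of 100 ms

for tier, price in PRICE_PER_100MS.items():
    print(f"{tier}: ${invocations * billed_units * price:.4f} per month")
# 128MB: $0.0693   1024MB: $0.4950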
As for the logging, I don't think it matters. Usually in bigger systems there can be messages traveling through several functions, so you have to deal with that anyway.
As for cold starts, you just need a good UI to accommodate them. At first I was worried about it in our apps, but later on you just get used to the fact that some actions can take ~2 s to execute (a cold start). And you should show a "loading" state in the UI regardless, because you don't know whether the function will take ~100 ms or 3 s due to a bad connection.

Task Scheduler function impact on system

I am in the process of understanding task scheduler functions. For example, I am working on a 32-bit Infineon AURIX TriCore controller whose task scheduler is designed for a 5 ms slice. Now, if I design my application to run in a 10 ms task scheduler function instead of the 5 ms one, what kind of data should I take into account?
For example, the impact on CPU run time, CPU load analysis, etc.?
In other words, how does changing the task scheduler at the low-level code affect code execution?
In short, the smaller the task slice time, the smoother the multitasking will appear to the user. On the other hand, more task switches increase the time spent switching tasks instead of running them.
Longer slice times with many tasks mean a longer interval before the same task is revisited (i.e., more jerky behavior).
(Note: I normally use 1ms task switches on very low end MCUs with very good results with around 5-10 total tasks.)
If you are changing the entire scheduler to run at 10 ms instead of 5 ms, then you should consider whether the software will still be able to detect changes in your system in time. For example, if you have something like an ambient temperature sensor, a meaningful change in value within 5 ms is extremely unlikely; but if you are measuring something like wheel speed, 10 ms might be too slow. Similarly, consider whether you can still control the actuators to create the desired response in the system. If these two things are covered, and provided the task execution time (not the frequency) is within limits, CPU load will not be a problem.
Note: if your application has code that calculates delays under the assumption that the task runs every 5 ms, then that code needs to be changed.
On the other hand, if you are adding a 10 ms task in addition to the 5 ms task, then you should consider the following:
1. Adding a new time slice adds an additional context-switch operation, which adds delay.
2. Depending on the tasks' priorities and preemptive/cooperative behavior, one task can block the other for some duration, which can create lag in the functionality or even cause it to malfunction. This is only a possibility, though, and need not happen.
3. A context switch also means you now need more stack area than before; depending on your project's utilization, this may become an issue.
You need to analyse your software to come to a conclusion on points 2 and 3. I have given some examples which could help in the analysis.
Ex1: If the execution time (not the frequency) of your 10 ms task is negligible compared to the execution time of your 5 ms task, then the better option is to schedule that functionality in the 5 ms task as well, as this saves you the context-switch time and the extra stack.
Ex2: If the execution time of your 10 ms task is comparatively long and it has lower priority than the 5 ms task (and is preemptible by it), it is better to keep the functionality in the 10 ms task, as this helps the 5 ms task finish its execution before its next slice.
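(As a rough way to quantify point 1 and the two examples, the classic check is CPU utilization, i.e. the sum of execution time over period for each task, plus the context-switch overhead per slice. A small sketch with made-up numbers; the execution and switch times below are placeholders you would replace with values measured on the target:)

# Back-of-the-envelope CPU load for a fixed-slice scheduler:
#   U = sum(execution_time / period) over all tasks + switching overhead.
# All numbers are placeholders; measure your own execution and context-switch
# times on the target (e.g. with a free-running timer).
tasks = [
    {"name": "task_5ms",  "period_ms": 5.0,  "exec_ms": 1.2},
    {"name": "task_10ms", "period_ms": 10.0, "exec_ms": 3.0},
]
context_switch_ms = 0.01   # cost of one switch (placeholder)

task_load = sum(t["exec_ms"] / t["period_ms"] for t in tasks)
# one switch per task activation per period
switch_load = sum(context_switch_ms / t["period_ms"] for t in tasks)

print(f"task load:   {task_load:6.1%}")
print(f"switch load: {switch_load:6.1%}")
print(f"total:       {task_load + switch_load:6.1%}")  # keep well below 100%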

How to run Airflow DAG for specific number of times?

How do I run an Airflow DAG a specified number of times?
I tried using TriggerDagRunOperator, and this operator works for me.
In the callable function we can check states and decide whether to continue or not.
However, the current count and states need to be maintained.
Using the above approach I am able to repeat the DAG run (see the sketch after this question).
I need an expert opinion: is there any other, more established way to run an Airflow DAG X number of times?
Thanks.
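(For context, a minimal sketch of the self-retriggering pattern described above, assuming the older Airflow 1.x TriggerDagRunOperator API in which python_callable receives the context and a proposed dag_run object and returns it to trigger again, or None to stop; the DAG id, MAX_RUNS and the counter carried in the run conf are illustrative:)

# Sketch of a self-retriggering DAG that stops after N runs, using the
# Airflow 1.x-style TriggerDagRunOperator. DAG id, N and the conf key are
# illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.dagrun_operator import TriggerDagRunOperator
from airflow.operators.dummy_operator import DummyOperator

MAX_RUNS = 10

def maybe_retrigger(context, dag_run_obj):
    # The run counter is carried in the run conf; default to 1 for the first run.
    conf = context["dag_run"].conf or {}
    run_number = conf.get("run_number", 1)
    if run_number >= MAX_RUNS:
        return None                      # returning None means: do not trigger again
    dag_run_obj.payload = {"run_number": run_number + 1}
    return dag_run_obj

with DAG("repeat_n_times", schedule_interval=None,
         start_date=datetime(2019, 1, 1), catchup=False) as dag:
    work = DummyOperator(task_id="do_work")
    retrigger = TriggerDagRunOperator(
        task_id="retrigger_self",
        trigger_dag_id="repeat_n_times",
        python_callable=maybe_retrigger,
    )
    work >> retrigger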
I'm afraid that Airflow is ENTIRELY about time-based scheduling.
You can set a schedule to None and then use the API to trigger runs, but you'd be doing that externally, and thus maintaining the counts and states that determine when and why to trigger externally.
When you say that your DAG may have 5 tasks which you want to run 10 times and a run takes 2 hours and you cannot schedule it based on time, this is confusing. We have no idea what the significance of 2 hours is to you, or why it must be 10 runs, nor why you cannot schedule it to run those 5 tasks once a day. With a simple daily schedule it would run once a day at approximately the same time, and it won't matter that it takes a little longer than 2 hours on any given day. Right?
You could set the start_date to 11 days ago (a fixed date though, don't set it dynamically), and the end_date to today (also fixed) and then add a daily schedule_interval and a max_active_runs of 1 and you'll get exactly 10 runs and it'll run them back to back without overlapping while changing the execution_date accordingly, then stop. Or you could just use airflow backfill with a None scheduled DAG and a range of execution datetimes.
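(A minimal sketch of that fixed-window option, assuming Airflow 1.x-style arguments; the dates and DAG id are placeholders and, as noted above, the dates must be fixed literals rather than computed at parse time:)

# Sketch of the "fixed window" option: a daily schedule with fixed
# start_date/end_date and max_active_runs=1 gives a bounded number of
# back-to-back, non-overlapping runs, then stops. Dates and ids are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    dag_id="ten_runs_back_to_back",
    start_date=datetime(2019, 6, 1),      # fixed, not dynamic
    end_date=datetime(2019, 6, 11),       # fixed; bounds the number of runs
    schedule_interval="@daily",
    max_active_runs=1,                    # never overlap runs
)

task = DummyOperator(task_id="do_work", dag=dag)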
Do you mean that you want it to run every 2 hours continuously, but sometimes it will be running longer and you don't want it to overlap runs? Well, you definitely can schedule it to run every 2 hours (0 0/2 * * *) and set the max_active_runs to 1, so that if the prior run hasn't finished the next run will wait then kick off when the prior one has completed. See the last bullet in https://airflow.apache.org/faq.html#why-isn-t-my-task-getting-scheduled.
If you want your DAG to run exactly every 2 hours on the dot [give or take some scheduler lag, yes that's a thing] and to leave the prior run going, that's mostly the default behavior, but you could add depends_on_past to some of the important tasks that themselves shouldn't be run concurrently (like creating, inserting to, or dropping a temp table), or use a pool with a single slot.
There isn't any feature to kill the prior run if your next schedule is ready to start. It might be possible to skip the current run if the prior one hasn't completed yet, but I forget how that's done exactly.
That's basically most of your options there. Also, you could create manual dag_runs for an unscheduled DAG, creating 10 at a time whenever you feel like it (using the UI or CLI instead of the API, but the API might be easier).
Do any of these suggestions address your concerns? Because it's not clear why you want a fixed number of runs, how frequently, or with what schedule and conditions, it's difficult to provide specific recommendations.
This functionality isn't natively supported by Airflow
But by exploiting the meta-db, we can cook up this functionality ourselves:
We can write a custom operator / PythonOperator that,
before running the actual computation, checks whether 'n' runs for the task (TaskInstance table) already exist in the meta-db (refer to task_command.py for help),
and if they do, just skips the task (raises AirflowSkipException).
This excellent article can be used for inspiration: Use apache airflow to run task exactly once
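(A rough sketch of that check, assuming Airflow 1.x internals such as provide_session, the TaskInstance model and AirflowSkipException, and intended as the python_callable of a PythonOperator with provide_context=True; the limit n is a placeholder:)

# Sketch: skip the task if it already has n successful TaskInstances in the
# meta-db. Assumes Airflow 1.x internals; n below is a placeholder.
from airflow.exceptions import AirflowSkipException
from airflow.models import TaskInstance
from airflow.utils.db import provide_session
from airflow.utils.state import State

@provide_session
def run_at_most_n_times(n=10, session=None, **context):
    ti = context["ti"]
    successful_runs = (
        session.query(TaskInstance)
        .filter(
            TaskInstance.dag_id == ti.dag_id,
            TaskInstance.task_id == ti.task_id,
            TaskInstance.state == State.SUCCESS,
        )
        .count()
    )
    if successful_runs >= n:
        raise AirflowSkipException(f"Already ran {successful_runs} times, skipping.")
    # ... actual computation goes here ...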
Note
The downside of this approach is that it assumes historical runs of the task (TaskInstances) are preserved forever (and correctly).
In practice, though, I've often found task_instances to be missing (we have catchup set to False).
Furthermore, on large Airflow deployments one might need to set up routine cleanup of the meta-db, which would make this approach impossible.

Hadoop - job submission time on large data

Has anyone faced problems submitting a job on large data? The data is around 5-10 TB uncompressed, spread across approximately 500K files. When we try to submit a simple Java MapReduce job, it mostly spends more than an hour in the getSplits() function call, and it takes multiple hours for the job to appear in the JobTracker. Is there any possible solution to this problem?
With 500K files, you are spending a lot of time walking the directory tree to find all these files, which then need to be assigned to a list of InputSplits (the result of getSplits()).
As Thomas points out in his answer, if your machine performing the job submission has a low amount of memory assigned to the JVM, then you're going to see issues with the JVM performing garbage collection to try and find the memory required to build up the splits for these 500K files.
To make matters worse, if these 500K files are splittable and larger than a single block size, you'll get even more input splits to process them (for a file of, say, 1 GB with a block size of 256 MB, you'll by default get 4 map tasks to process that file, assuming the input format and file compression support splitting). If this applies to your job (look at the number of map tasks spawned for your job; are there more than 500K?), then you can force fewer mappers to be created by amending the mapred.min.split.size configuration property to a size larger than the current block size (setting it to 1 GB for the previous example means you'll get a single mapper to process the file rather than 4). This will help the performance of the getSplits() method because the resulting list of splits will be smaller, requiring less memory.
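(To make the split arithmetic concrete, here is the standard rule as a tiny sketch: FileInputFormat computes the split size as max(minSize, min(maxSize, blockSize)); the sizes below are the example figures from the paragraph above, and edge details like the last-split slack are ignored:)

# Split-count arithmetic for the example above: 1 GB file, 256 MB block size.
# FileInputFormat computes splitSize = max(minSize, min(maxSize, blockSize)).
import math

def num_splits(file_size, block_size, min_split_size=1, max_split_size=2**63 - 1):
    split_size = max(min_split_size, min(max_split_size, block_size))
    return math.ceil(file_size / split_size)

GB = 1024 ** 3
MB = 1024 ** 2

print(num_splits(1 * GB, 256 * MB))                        # 4 splits -> 4 map tasks
print(num_splits(1 * GB, 256 * MB, min_split_size=1 * GB)) # 1 split after raising
                                                           # mapred.min.split.size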
The second symptom of your problem is the time it takes to serialize the input splits to a file (client side), and then the deserialization time at the JobTracker end. 500K+ splits are going to take time, and the JobTracker will have similar GC issues if it has a low JVM memory limit.
It largely depends on how "strong" your submission server (or your laptop client) is; maybe you need to upgrade RAM and CPU to make the getSplits() call faster.
I believe you ran into swap issues there, and the computation therefore takes multiple times longer than usual.

Long time between submit and start time of a command in OpenCL

I'm running a kernel on a big array. When I profile the clEnqueueNDRangeKernel command, the execution time (end − start) is 0.001 ms, but the time between submit and start (start − submit) is around 120 ms, and it varies with the size of the input data. What happens between the time a command is submitted and the time it starts to execute? Is it reasonable to get such a large gap?
OpenCL operates asynchronously. That is to say, when you ask for a piece of work to be done, it may not happen at that time; it will happen at some time in the future. This is a little weird, especially when you start profiling things, but it works like this so that the CPU can queue up lots of work for the OpenCL device and then go do something else while the work is done.
For example:
clEnqueueWriteBuffer(blah);
clEnqueueNDRangeKernel(blah);
clEnqueueReadBuffer(blah, but blocking_read = CL_TRUE);
Here, the writeBuffer and the NDRange will probably appear to take very small amounts of time. All they'll do is record what needs to be done. The blocking readBuffer will take a long time, because it has to wait for the result of the read. For that read to complete, the write and the kernel execution have to finish before the read can even start.
Now the read itself might be very small, but because it's waiting for everything before it to finish, the time it appears to take depends on the amount of work in the commands ahead of it.
I don't quite understand what you're measuring from your question, but I expect what you're seeing is this effect. The time for work is being charged to other functions because they have to wait for previous work to finish.
Knowing which functions cause the CPU to wait on the GPU is one of the big tricks when it comes to writing high-performance code. Any time you introduce a wait like this, the CPU stops doing any useful work, and the GPU is likely to go idle whilst the CPU prepares the next lump of work. Sometimes there's no alternative, and you just have to wait.
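(If it helps, the queued/submit/start/end timestamps the question is measuring can be read directly from an event once profiling is enabled on the queue. A small sketch using PyOpenCL rather than the C API for brevity; the kernel and array size are placeholders, and the same information is available via clGetEventProfilingInfo in C:)

# Sketch: read the per-command QUEUED/SUBMIT/START/END timestamps (nanoseconds)
# from an OpenCL event, using PyOpenCL. Kernel and array size are placeholders.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(
    ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

prg = cl.Program(ctx, """
__kernel void scale(__global float *a) {
    int gid = get_global_id(0);
    a[gid] *= 2.0f;
}
""").build()

a = np.random.rand(1_000_000).astype(np.float32)
a_buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                  hostbuf=a)

evt = prg.scale(queue, a.shape, None, a_buf)
evt.wait()

p = evt.profile
print("queued -> submit:", (p.submit - p.queued) / 1e6, "ms")
print("submit -> start: ", (p.start - p.submit) / 1e6, "ms")  # the large gap
print("start  -> end:   ", (p.end - p.start) / 1e6, "ms")     # the kernel itself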
