How to get the previous job statuses after performing a wrong action in Autosys

I received a request to put the jobs running on a particular machine ON_ICE. I pulled the list of jobs on that machine and put them ON_ICE. A few minutes later I received another request to take the jobs OFF_ICE, but by mistake I took all jobs OFF_ICE instead of only those on the specified machine.
So I mistakenly took OFF_ICE other jobs too, which were supposed to stay ON_ICE.
Now, how can I find out which jobs were ON_ICE before that request?
Can anyone help me with this, please?
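There is no built-in "undo" for a bulk OFF_ICE, but the event processor logs every event it handles, so one possible way to reconstruct the earlier state is to search that log for the JOB_ON_ICE events sent the first time and re-ice those jobs. A rough sketch, assuming a classic AutoSys setup where the event demon log sits at $AUTOUSER/out/event_demon.$AUTOSERV (the path on your install may differ):

# find the jobs that were put ON_ICE before the mistaken bulk OFF_ICE
grep JOB_ON_ICE $AUTOUSER/out/event_demon.$AUTOSERV

# put each affected job back ON_ICE
sendevent -E JOB_ON_ICE -J <job_name>

# confirm its status
autorep -J <job_name>

If database tracking was enabled beforehand, the autotrack utility can also report which events were sent to which jobs.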

Related

nginx: only 1 worker process ever handles requests

I have the following configuration:
worker_processes 4;
But I noticed that requests always hit only 1 worker.
I am testing on a local CentOS VM. I am making curl calls to a specific port; I put 1000 curl requests in a file and ran them from multiple terminal windows.
But all of them hit only 1 worker. Is there a way I can get at least more than 1 worker involved? Can someone please share their knowledge on this?
https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/
In the epoll-and-accept the load balancing algorithm differs: Linux seems to choose the last added process, a LIFO-like behavior. The process added to the waiting queue most recently will get the new connection. This behavior causes the busiest process, the one that only just went back to event loop, to receive the majority of the new connections. Therefore, the busiest worker is likely to get most of the load.
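The same post describes SO_REUSEPORT as the way to get even balancing: each worker gets its own listening socket and the kernel spreads new connections across them. nginx exposes this as the reuseport flag on the listen directive (available since nginx 1.9.1). A minimal sketch, assuming a plain HTTP server on port 8080:

worker_processes 4;

events {
    worker_connections 1024;
}

http {
    server {
        # reuseport gives every worker its own accept queue,
        # so the kernel balances new connections across workers
        listen 8080 reuseport;

        location / {
            return 200 "ok\n";
        }
    }
}

It is also worth checking that the load actually arrives over many separate connections; with HTTP keep-alive, many requests over one connection will all land on the same worker.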

Apache Flink - End-to-end testing: how to terminate the input source

I've used Apache Flink for batch processing for a while, but now we want to convert this batch job to a streaming job. The problem I run into is how to run end-to-end tests.
How it worked in a batch job
When using batch processing we created end-to-end tests using Cucumber.
We would fill up the HBase table we read from
Run the batch job
Wait for it to finish
Verify the result
The problem in a streaming job
We would like to do something similar with the streaming job, except a streaming job does not really finish.
So:
Fill up the message queue we read from
Run the streaming job
Wait for it to finish (how?)
Verify the result
We could just wait 5 seconds after every test and assume everything has been processed by then, but that would slow everything down a lot.
Question:
What are some ways or best practices to run end-to-end tests on a streaming Flink job without forcibly terminating it after x seconds?
Most Flink DataStream sources, if they are reading from a finite input, will inject a watermark with value Long.MAX_VALUE when they reach the end, after which the job terminates on its own.
The Flink training exercises illustrate one approach to end-to-end testing of Flink jobs. I suggest cloning the GitHub repo and looking at how the tests are set up. They use a custom source and sink and redirect the input and output for testing.
This topic is also discussed a bit in the documentation.
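A minimal sketch of that pattern (the elements, map logic, and sink below are placeholders, not the training repo's actual code): feed the job from a bounded source so it finishes on its own, let env.execute() block until then, and assert on what a collecting sink recorded.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class EndToEndTest {

    // Collects everything the pipeline emits; static so the sink instances
    // inside the (local) Flink runtime and the test share the same list.
    static final List<String> RESULTS = new CopyOnWriteArrayList<>();

    static class CollectSink implements SinkFunction<String> {
        @Override
        public void invoke(String value, Context context) {
            RESULTS.add(value);
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Bounded source: the job terminates once these elements are consumed,
        // so execute() returns instead of running forever.
        env.fromElements("a", "b", "c")
           .map(String::toUpperCase)   // stand-in for the real job logic
           .addSink(new CollectSink());

        env.execute("end-to-end test");

        // The job has finished; verify the results.
        if (!RESULTS.containsAll(List.of("A", "B", "C"))) {
            throw new AssertionError("unexpected results: " + RESULTS);
        }
    }
}

The static-list sink only works when the job runs in the same JVM as the test (the local mini-cluster), which is exactly the end-to-end test scenario described above.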

EC2 Instance Requires Daily Restart

I am running a WordPress blog on an AWS t2.micro EC2 instance running Amazon Linux. However, most days I wake to an email saying that my blog is offline. When this happens I cannot SSH into the EC2 instance, yet the AWS dashboard shows it as online and none of the metrics look too suspicious.
The time I was notified about the blog being down was just after the start of the first plateau on the CPU Utilization graph - 4:31am.
A restart from the AWS control panel/app fixes things for a day or two, however I would like to have a more permanent fix.
Can anyone suggest any changes I can make to my instance to get it running more reliably?
[Edit - February 2018]
This has started happening again after being fine for a few months. Each morning this week I have woken up to an alert that my blog is offline; a reboot of the server brings it back online. This morning I was able to investigate and could SSH in. Running top showed that neither httpd nor mysqld was running.
My CloudWatch metrics for the last 72 hours show the bigger spikes where I rebooted the instance. As you can see, although there are spikes in CPU usage, they aren't huge, and the CPU Credit Balance metric barely dips.
As this question has had so many views, I thought I would post about the workaround I have used to overcome this issue.
I still do not know why my blog goes offline, but knowing that rebooting the EC2 instance recovered it, I decided to automate that reboot.
There are three parts to this solution:
Detect the "blog offline" email from Jetpack and get it into AWS. I created a rule in my Gmail to handle this, forwarding the email to an address monitored by AWS SES, which publishes the notification to SNS.
The SNS notification triggers an AWS Lambda function.
The Lambda function reboots the EC2 instance (see the sketch below).
Now I usually get a "blog back online" email within a few minutes of the original "blog offline" email.
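A minimal sketch of such a reboot Lambda in Java (the class name and the INSTANCE_ID environment variable are illustrative assumptions, not details from the post), using the AWS SDK v1 EC2 client:

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.RebootInstancesRequest;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SNSEvent;

public class RebootBlogInstance implements RequestHandler<SNSEvent, String> {

    @Override
    public String handleRequest(SNSEvent event, Context context) {
        // INSTANCE_ID is an assumed environment variable, e.g. "i-0abc..."
        String instanceId = System.getenv("INSTANCE_ID");

        // Ask EC2 to reboot the instance; the call returns immediately.
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        ec2.rebootInstances(new RebootInstancesRequest().withInstanceIds(instanceId));

        return "Reboot requested for " + instanceId;
    }
}

The function's execution role also needs the ec2:RebootInstances permission for this call to succeed.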

IIS holding requests in a queue instead of processing them

I'm executing a load test against an application hosted in Azure. It's a cloud service with 3 instances behind an internal load balancer (hash-based load balancing mode).
When I execute the load test, IIS queues requests even though the requests/sec and total current requests are quite low. I'm not sure what the problem could be.
Any suggestions?
I'm adding a few screenshots of performance counters which might help you decide.
Edit-1: Per request from Rohit Rajan,
The Cloud Service has 2 instances (meaning 2 VMs), each with 14 GB of RAM and 8 cores.
I'm executing a step load pattern, starting with 100 users and adding 100-150 users every 5 minutes, over 4-5 hours until the load reaches 10,000 VUs.
Calls to external systems are written asynchronously; database calls are synchronous.
There is no straightforward answer to your question, so one possible way forward is additional investigation.
Based on your explanation, there seems to be a bottleneck within the application which is causing the requests to queue up.
To investigate this, collect a memory dump when you see the requests queuing up and then use DebugDiag to run a hang analysis on it.
There are several ways to gather the memory dump.
Task Manager
Procdump.exe
Debug Diagnostics
Process Explorer
Once you have the memory dump you can install Debug Diagnostics and run the analysis on it. It will generate a report that can help you get started.
Debug Diagnostics download: https://www.microsoft.com/en-us/download/details.aspx?id=49924
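For example, a full user-mode dump of the IIS worker process can be captured with Procdump roughly like this (the output path is an assumption; if several application pools each run their own w3wp.exe, pass the PID from Task Manager instead of the process name):

procdump -ma w3wp C:\dumps\w3wp_full.dmp

The -ma switch writes a full memory dump, which is generally what a hang analysis needs.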

How do I kill running map tasks on Amazon EMR?

I have a job running using Hadoop 0.20 on 32 spot instances. It has been running for 9 hours with no errors. It has processed 3800 tasks during that time, but I have noticed that just two tasks appear to be stuck and have been running alone for a couple of hours (apparently responding because they don't time out). The tasks don't typically take more than 15 minutes. I don't want to lose all the work that's already been done, because it costs me a lot of money. I would really just like to kill those two tasks and have Hadoop either reassign them or just count them as failed. Until they stop, I cannot get the reduce results from the other 3798 maps!
But I can't figure out how to do that. I have considered trying to figure out which instances are running the tasks and then terminate those instances, but
I don't know how to figure out which instances are the culprits
I am afraid it will have unintended effects.
How do I just kill individual map tasks?
Generally, on a Hadoop cluster you can kill a particular task by issuing:
hadoop job -kill-task [attempt_id]
This will kill the given map task and resubmit it on a different node with a new attempt id.
To get the attempt_id, navigate in the JobTracker's web UI to the map task in question, click on it, and note its id (e.g. attempt_201210111830_0012_m_000000_0).
SSH to the master node, as mentioned by Lorand, and execute:
bin/hadoop job -list
bin/hadoop job -kill <JobID>
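Putting the two answers together, a session on the master node might look like this (the ids are examples). Note that hadoop job -fail-task exists as an alternative to -kill-task; the difference is that a failed attempt counts against the task's allowed retries, while a killed one does not:

bin/hadoop job -list
bin/hadoop job -kill-task attempt_201210111830_0012_m_000000_0
bin/hadoop job -fail-task attempt_201210111830_0012_m_000000_1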
